Utility to calculate clusters in a set of features

Name

cluster - Utility to calculate clusters in a set of features

Synopsis

cluster [options] K DATASET RESULTS

Description

Calculates clusters in the set of features stored in the DATASET matrix file. The number of clusters is given by K, and the results are written to a set of files whose names are prefixed by RESULTS.

The only clustering algorithm currently supported is k-means.
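
As an illustration of what k-means does, here is a minimal C++ sketch of the standard (Lloyd's) iteration: assign each feature to its nearest centroid, then move each centroid to the mean of its members, and repeat until no assignment changes. It is not taken from cluster.cpp. The names Feature, squared_distance and kmeans are hypothetical, and seeding the centroids from the first k rows and measuring with squared Euclidean distance are assumptions made to keep the example small and deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One row of the dataset: a single feature vector (hypothetical type name).
using Feature = std::vector<double>;

// Squared Euclidean distance between two feature vectors of equal length.
double squared_distance(const Feature& a, const Feature& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Returns, for each row of data, the index (0..k-1) of its cluster.
// Assumption: centroids are seeded from the first k rows.
std::vector<std::size_t> kmeans(const std::vector<Feature>& data,
                                std::size_t k,
                                std::size_t max_iterations = 100) {
    assert(k > 0 && k <= data.size());
    const std::size_t dims = data[0].size();
    std::vector<Feature> centroids(
        data.begin(), data.begin() + static_cast<std::ptrdiff_t>(k));
    std::vector<std::size_t> labels(data.size(), 0);

    for (std::size_t iter = 0; iter < max_iterations; ++iter) {
        // Assignment step: attach each row to its nearest centroid.
        bool changed = false;
        for (std::size_t i = 0; i < data.size(); ++i) {
            std::size_t best = 0;
            double best_distance = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < k; ++c) {
                const double d = squared_distance(data[i], centroids[c]);
                if (d < best_distance) {
                    best_distance = d;
                    best = c;
                }
            }
            if (labels[i] != best) {
                labels[i] = best;
                changed = true;
            }
        }
        // Converged once a full pass changes no assignment.
        if (iter > 0 && !changed) break;

        // Update step: move each centroid to the mean of its members.
        std::vector<Feature> sums(k, Feature(dims, 0.0));
        std::vector<std::size_t> counts(k, 0);
        for (std::size_t i = 0; i < data.size(); ++i) {
            for (std::size_t d = 0; d < dims; ++d) {
                sums[labels[i]][d] += data[i][d];
            }
            ++counts[labels[i]];
        }
        for (std::size_t c = 0; c < k; ++c) {
            if (counts[c] == 0) continue;  // an empty cluster keeps its centroid
            for (std::size_t d = 0; d < dims; ++d) {
                centroids[c][d] = sums[c][d] / static_cast<double>(counts[c]);
            }
        }
    }
    return labels;
}
```

Calling kmeans(data, 2) on four two-dimensional rows that form two well-separated pairs should place each pair in its own cluster. A different seeding strategy may give different results on less clearly separated data.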

The options supported are as follows:

-a Automatically determine the best value for k, up to a maximum of K. Each value of k is tried in turn and the goodness of the fit evaluated. More clusters will inevitably fit the data more closely. To offset this, the Schwarz Criterion is used, which adds a penalty that increases with the value of k. The value of k chosen is the one at which the Schwarz Criterion stops decreasing.

-b Use binary mode for reading and writing matrices (quicker and smaller, but not as flexible).

-d fn The distance function to use. fn can be one of: basic, euclidean, cosine.

-s Summarise the clusters. This lists the location and size of the sections in the DATASET which are from the same cluster. This is intended to be used for sequential data, such as time-series audio features.
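
The selection rule described for -a can be sketched in C++ as follows. The exact formula used by cluster.cpp is not given here, so schwarz_score assumes one common form of the Schwarz Criterion: the distortion (the sum of squared distances from each feature to its centroid) plus a penalty of k * dims * ln(n), where n is the number of features and dims their dimensionality. Both function names are hypothetical. choose_k only encodes the stopping rule: take the k at which the score stops decreasing.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed form of the score: distortion plus a penalty of
// (free parameters) * ln(number of features), where the free parameters
// are the k centroids of dims coordinates each. Lower is better.
double schwarz_score(double distortion, std::size_t n,
                     std::size_t dims, std::size_t k) {
    assert(n > 0);
    return distortion +
           static_cast<double>(k * dims) * std::log(static_cast<double>(n));
}

// scores[i] holds the score obtained for k = i + 1.
// Returns the k at which the score stops decreasing, or the largest k
// tried if the score decreases all the way.
std::size_t choose_k(const std::vector<double>& scores) {
    assert(!scores.empty());
    for (std::size_t i = 1; i < scores.size(); ++i) {
        if (scores[i] >= scores[i - 1]) {
            return i;  // scores[i - 1] belongs to k = i
        }
    }
    return scores.size();
}
```

Because the penalty grows linearly with k while the distortion usually falls more and more slowly, the score typically reaches a minimum at a moderate k, which is the point choose_k looks for.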
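
For the -d option, the euclidean and cosine functions can be illustrated by the C++ sketch below. The function names are hypothetical and the code is not taken from cluster.cpp. Cosine distance is taken here to be 1 minus the cosine similarity, which is an assumption about the convention used. The basic function is not defined in this page, so it is not shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One feature vector (hypothetical type name).
using Feature = std::vector<double>;

// Euclidean distance: square root of the sum of squared differences.
double euclidean_distance(const Feature& a, const Feature& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Cosine distance, assumed here to mean 1 - cosine similarity.
// Returns 1.0 if either vector is all zeros, since the angle between
// the vectors is then undefined.
double cosine_distance(const Feature& a, const Feature& b) {
    assert(a.size() == b.size());
    double dot = 0.0;
    double norm_a = 0.0;
    double norm_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    if (norm_a == 0.0 || norm_b == 0.0) {
        return 1.0;
    }
    return 1.0 - dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```

The two measures can disagree: vectors pointing in the same direction but with different magnitudes are far apart under euclidean_distance and close to zero apart under cosine_distance, so the choice affects which features end up in the same cluster.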
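
The kind of summary that -s describes amounts to grouping consecutive rows that share a cluster. The C++ sketch below shows that idea only: the Section structure and the summarise function are hypothetical, and the actual output format of the utility is not documented here.

```cpp
#include <cstddef>
#include <vector>

// A run of consecutive rows assigned to the same cluster (hypothetical).
struct Section {
    std::size_t start;    // index of the first row in the run
    std::size_t length;   // number of rows in the run
    std::size_t cluster;  // cluster index shared by the run
};

// Collapses a per-row list of cluster indices into runs.
std::vector<Section> summarise(const std::vector<std::size_t>& labels) {
    std::vector<Section> sections;
    for (std::size_t i = 0; i < labels.size(); ++i) {
        if (!sections.empty() && sections.back().cluster == labels[i]) {
            ++sections.back().length;  // extend the current run
        } else {
            sections.push_back({i, 1, labels[i]});  // start a new run
        }
    }
    return sections;
}
```

For sequential data such as audio frames, each Section then corresponds to a stretch of time during which the features stayed in one cluster.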

Remarks

Implemented by cluster.cpp.

Copyright ©1996-2006 Steven Blackburn