Reduce the dimensionality of a set of features


fastmap - Reduce the dimensionality of a set of features


fastmap [options] DATASET OUTPUT


The FastMap algorithm reduces the number of dimensions in a data set whilst attempting to maintain relative distances between the data points, as published in a paper by Christos Faloutsos and King-Ip Lin. It could be thought of as a cheap Multi-Dimensional Scaling which, in some ways, it is. The mapping is performed in linear time (MDS isn't) which makes it suitable for large datasets. This linearity is at the cost of acheiving only an approximation of the best mapping. As the algorithm is intended for mapping high-dimensional data sets into two or three dimensions for visualisation, this loss is not often important.

The algorithm reduces the dimensionality by one by finding two distant 'pivot' objects in the data set and mapping all objects in relation to the pivot points. The procedure is then repeated until the desired number of dimensions is acheived.

The fastmap program is a wrapper for the utility::fastmap class. The options supported are as follows:

-b Use binary mode for reading and writing matrices (quicker and smaller, but not as flexible)

-d fn The distance function to use. Can be: basic; euclidean; cosine.

-k The number of dimensions to convert to. Defaults to 3

-p The first line of the text file is normally assumed to be a header containing the attributes of the matrix (i.e. number of columns / rows). This option indicates the file is plain text file without the header and that the parameters should be automatically deduced.

-s Calculate the stress induced by the mapping. This gives a goodness of fit of the resulting mapping. The closer to zero, the better the map. Calculating the stress involves a pair-wise comparison of all elements in the data set: as such, this option can take some time to complete.

Implemented by fastmap.cpp.
The FastMap algorithm is published in a paper by Christos Faloutsos and King-Ip (David) Lin. "FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets", SIGMOD Record (ACM Special Interest Group on Management of Data), 24, 2, issn:0163-5808, pp. 163--174, June, 1995.

Copyright ©1996-2006 Steven Blackburn - About MaART - MaART on SourceForge