mlpack_kmeans(1) - Linux man page

Name

kmeans - k-means clustering

Synopsis

 kmeans [-h] [-v] -c int -i string [-e] [-f] [-p] [-l] [-m int] [-o string] [-O double] [-s int]

Description

This program performs K-Means clustering on the given dataset, storing the learned cluster assignments either as a column of labels in the file containing the input dataset or in a separate file. Empty clusters are not allowed by default; when a cluster becomes empty, the point furthest from the centroid of the cluster with maximum variance is taken to fill that cluster.
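The empty-cluster policy described above can be sketched as follows. This is a minimal illustration, not the program's actual implementation: the helper name fill_empty_cluster is hypothetical, and "variance" is assumed here to mean the mean squared Euclidean distance of a cluster's points to its centroid.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def fill_empty_cluster(points, assignments, empty_cluster, num_clusters):
    """Move one point into `empty_cluster` and return that point's index.

    Hypothetical helper: the donor is the cluster with the largest variance
    (assumed to be the mean squared distance to its centroid), and the point
    moved is the donor's member furthest from the donor's centroid.
    `assignments` is modified in place.
    """
    donor, donor_variance, donor_centroid = None, -1.0, None
    for c in range(num_clusters):
        members = [p for p, a in zip(points, assignments) if a == c]
        if len(members) < 2:
            continue  # skip empty clusters and clusters that would become empty
        ctr = centroid(members)
        variance = sum(squared_distance(p, ctr) for p in members) / len(members)
        if variance > donor_variance:
            donor, donor_variance, donor_centroid = c, variance, ctr
    if donor is None:
        raise ValueError("no cluster has a point to spare")

    donor_indices = [i for i, a in enumerate(assignments) if a == donor]
    furthest = max(
        donor_indices,
        key=lambda i: squared_distance(points[i], donor_centroid),
    )
    assignments[furthest] = empty_cluster
    return furthest


# Cluster 2 is empty; cluster 1 has the larger spread, so it donates a point.
points = [(0, 0), (0, 1), (10, 0), (10, 10), (10, 26)]
assignments = [0, 0, 1, 1, 1]
moved = fill_empty_cluster(points, assignments, empty_cluster=2, num_clusters=3)
```

In this example cluster 1 has centroid (10, 12), and the point (10, 26) lies furthest from it, so that point is reassigned to the empty cluster.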

Required Options

--clusters (-c) [int]: Number of clusters to find.
--inputFile (-i) [string]: Input dataset to perform clustering on.

Options

--allow_empty_clusters (-e): Allow empty clusters to be created.
--fast_kmeans (-f): Use the experimental fast k-means algorithm by Pelleg and Moore.
--help (-h): Default help info.
--in_place (-p): If specified, a column of the learned cluster assignments will be added to the input dataset file. In this case, --outputFile is not necessary.
--info [string]: Get help on a specific module or option. Default value ''.
--labels_only (-l): Write only the labels to the output file.
--max_iterations (-m) [int]: Maximum number of iterations before K-Means terminates. Default value 1000.
--outputFile (-o) [string]: File to write output labels or labeled data to. Default value 'output.csv'.
--overclustering (-O) [double]: Finds (overclustering * clusters) clusters, then merges them together until only the desired number of clusters is left. Default value 1.
--seed (-s) [int]: Random seed. If 0, 'std::time(NULL)' is used. Default value 0.
--verbose (-v): Display informational messages and the full list of parameters and timers at the end of execution.
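As a rough illustration of the --overclustering option, the sketch below computes the initial cluster count and then merges clusters down to the requested number. This page does not say how the count is rounded or which clusters are merged, so both are assumptions: the count is truncated to an integer, and the two closest centroids are merged repeatedly, weighted by cluster size. The function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_cluster_count(clusters, overclustering=1.0):
    """Number of clusters to find before merging.

    Assumption: the product is truncated to an integer and never drops
    below the requested number of clusters.
    """
    return max(clusters, int(overclustering * clusters))


def merge_to_target(centroids, sizes, target):
    """Merge the closest pair of centroids until `target` clusters remain.

    Assumption: merging strategy is greedy closest-pair; the merged centroid
    is the size-weighted mean of the two originals.
    """
    centroids = [list(c) for c in centroids]
    sizes = list(sizes)
    while len(centroids) > target:
        pairs = (
            (a, b)
            for a in range(len(centroids))
            for b in range(a + 1, len(centroids))
        )
        i, j = min(
            pairs,
            key=lambda pair: squared_distance(centroids[pair[0]], centroids[pair[1]]),
        )
        total = sizes[i] + sizes[j]
        centroids[i] = [
            (x * sizes[i] + y * sizes[j]) / total
            for x, y in zip(centroids[i], centroids[j])
        ]
        sizes[i] = total
        del centroids[j]  # j > i, so index i is unaffected
        del sizes[j]
    return centroids, sizes


# With -c 2 -O 2, four clusters are found first and then merged down to two.
k_initial = initial_cluster_count(2, 2.0)
merged_centroids, merged_sizes = merge_to_target(
    [[0, 0], [1, 0], [10, 0], [11, 0]], [1, 1, 2, 2], target=2
)
```

With the default value of 1, the initial count equals the requested count and no merging takes place.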

Additional Information

For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of MLPACK.