KM Algorithm Settings
The k-Means (KM) algorithm supports these settings:
Number of Clusters is the maximum number of leaf clusters generated by the algorithm. The default is 10. k-Means tries to generate this number of leaf clusters.
Growth Factor is a number greater than 1 and less than or equal to 5. This value specifies the growth factor for memory allocated to hold cluster data; the default is 2.
Convergence Tolerance must be between 0.001 (slow build) and 0.1 (fast build); the default is 0.01. Increasing this value builds models faster, but possibly with lower accuracy.
Distance Function specifies how the algorithm calculates distance. The default distance function is Euclidean; other distance functions are Cosine and Fast Cosine.
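To make the difference between the distance functions concrete, here is a minimal Python sketch of Euclidean and Cosine distance on plain lists of numbers. The function names and the handling of zero-length vectors are assumptions made for illustration; this is not the product's implementation, and Fast Cosine is not modeled because its definition is not given here.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """One minus the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat a zero vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)

point = [3.0, 4.0]
scaled = [6.0, 8.0]  # same direction as point, twice the length

print(euclidean_distance(point, scaled))  # magnitude matters: 5.0
print(cosine_distance(point, scaled))     # direction only: approximately 0
```

Euclidean distance treats the two vectors as far apart because their magnitudes differ, while Cosine distance treats them as nearly identical because they point in the same direction.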
Number of Iterations must be between 2 (fast build) and 30 (slow build); the default is 3. This value is the maximum number of iterations for the k-Means algorithm.
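The following Python sketch shows how a convergence tolerance and an iteration cap can interact in a plain Lloyd-style k-means loop. It assumes that convergence is measured as the largest centroid movement in one pass; the exact convergence measure used by the product is not stated here, and the function name kmeans and its parameters are hypothetical.

```python
import math

def kmeans(points, centroids, max_iterations=3, tolerance=0.01):
    """Plain Lloyd-style k-means; returns (centroids, iterations_run)."""
    centroids = [list(c) for c in centroids]
    iterations_run = 0
    for _ in range(max_iterations):
        iterations_run += 1
        # Assignment step: group each point with its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        shift = 0.0
        for i, group in enumerate(groups):
            if not group:
                continue  # keep an empty cluster's centroid where it is
            new_c = [sum(col) / len(group) for col in zip(*group)]
            shift = max(shift, math.dist(new_c, centroids[i]))
            centroids[i] = new_c
        # Assumed stopping rule: largest centroid movement below tolerance.
        if shift < tolerance:
            break
    return centroids, iterations_run

points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
start = [(0.0, 0.0), (6.0, 6.0)]
centroids, iterations_run = kmeans(points, start, max_iterations=30)
print(centroids, iterations_run)
```

With well-separated data such as this, the loop stops long before the cap of 30 because the centroids stop moving. A larger tolerance ends the loop earlier, and a smaller one lets it run closer to the iteration limit.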
Min Percent Attribute Support is a number greater than or equal to 0.0 and less than or equal to 1.0. This value is used to filter out rule predicates that do not meet the support threshold; setting this value too high can result in very short or even empty rules.
The default value is 0.1. The default value allows you to highlight the more important predicates instead of producing a long list of predicates that have very low support.
In extreme cases, for very sparse data, all attribute predicates may be filtered out so that no rule is produced. If no rule is produced, you can lower the support threshold and rebuild the model to make the algorithm produce rules even if the predicate support is very low.
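The effect of the support threshold can be sketched in Python as a simple filter. The predicate strings, the support values, and the function name filter_predicates are all hypothetical, and how the product computes a predicate's support is not described here; the sketch only shows how raising or lowering the threshold changes the resulting rule.

```python
def filter_predicates(predicate_support, min_support=0.1):
    """Keep only the predicates whose support meets the threshold."""
    if not 0.0 <= min_support <= 1.0:
        raise ValueError("min_support must be between 0.0 and 1.0")
    return [name for name, support in predicate_support.items()
            if support >= min_support]

# Hypothetical predicates for one cluster, with made-up support values.
support = {
    "AGE in [30, 40)": 0.62,
    "INCOME >= 50000": 0.08,
    "REGION = 'WEST'": 0.03,
}

print(filter_predicates(support))        # default 0.1 keeps only the AGE predicate
print(filter_predicates(support, 0.7))   # too high: the rule is empty
print(filter_predicates(support, 0.02))  # lowered: all three predicates remain
```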
Number of Histogram Bins is a positive integer; the default value is 10. This value specifies the number of bins in the attribute histogram produced by k-Means. The bin boundaries for each attribute are computed globally on the entire training data set. The binning method is equi-width. All attributes have the same number of bins, except attributes with a single value, which have only one bin.
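As an illustration of equi-width binning with globally computed boundaries, the Python sketch below derives bin edges from the whole training set and then counts one cluster's values against those same edges. The function names and the sample data are assumptions, and placing the maximum value in the last bin is a choice made for this sketch rather than documented behavior.

```python
def equi_width_edges(values, num_bins=10):
    """Bin boundaries of equal width spanning the full range of values."""
    low, high = min(values), max(values)
    if low == high:
        return [low, high]  # single-valued attribute: one bin
    width = (high - low) / num_bins
    edges = [low + i * width for i in range(num_bins)]
    edges.append(high)  # use the exact maximum as the final edge
    return edges

def histogram(values, edges):
    """Count values per bin; the top edge is assigned to the last bin."""
    num_bins = len(edges) - 1
    low, high = edges[0], edges[-1]
    counts = [0] * num_bins
    for v in values:
        if high == low:
            index = 0
        else:
            index = int((v - low) / (high - low) * num_bins)
            index = max(0, min(index, num_bins - 1))
        counts[index] += 1
    return counts

training_ages = [20, 25, 30, 35, 40, 60]   # entire training set (made up)
cluster_ages = [20, 25, 30]                # members of one cluster

edges = equi_width_edges(training_ages, num_bins=4)
print(edges)                           # boundaries at 20, 30, 40, 50, 60
print(histogram(cluster_ages, edges))  # [2, 1, 0, 0]
```

Because the edges come from the full training set, the histograms of different clusters share the same boundaries and can be compared bin by bin.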
Split Criterion is either Variance or Size. The default is Variance.
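One way to picture the two split criteria is as alternative scoring rules for choosing which cluster to divide next. The Python sketch below assumes a hierarchical build that splits the highest-scoring cluster, and it uses one-dimensional data for brevity; the function name pick_cluster_to_split and the sample clusters are hypothetical, not the product's implementation.

```python
from statistics import pvariance

def pick_cluster_to_split(clusters, criterion="variance"):
    """Return the index of the cluster with the highest split score."""
    if criterion == "size":
        score = len            # the cluster with the most members
    elif criterion == "variance":
        score = pvariance      # the most spread-out cluster (1-D data)
    else:
        raise ValueError("criterion must be 'variance' or 'size'")
    return max(range(len(clusters)), key=lambda i: score(clusters[i]))

clusters = [
    [1.0, 1.1, 0.9, 1.0, 1.05],  # large but tight
    [0.0, 10.0],                 # small but widely spread
]

print(pick_cluster_to_split(clusters, "size"))      # 0
print(pick_cluster_to_split(clusters, "variance"))  # 1
```

Under these assumptions, Size favors dividing crowded clusters, while Variance favors dividing clusters whose members are far apart.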