dimxu posted on 2012-4-10 19:07
Could someone explain this in more detail? I'm a beginner and don't really understand these steps.
Many thanks.
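It depends on what exactly you want to do. If you only care about the result rather than the algorithm behind it, standardize the variables with proc stdize first and then cluster with proc cluster; the results tend to be more reliable that way, because variables measured on larger scales no longer dominate the distances. For choosing the method in proc cluster, see the documentation excerpt below. If your data set is fairly large, I'd suggest using proc fastclus instead.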
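A minimal sketch of the workflow described above (standardize, then cluster), plus the proc fastclus alternative for larger data. The data set names (work.mydata, work.std, work.tree, and so on), the variables x1-x3, the sample values, and the choice of three clusters are all made up for illustration; substitute your own.

```sas
/* Hypothetical sample data: one ID variable and three numeric     */
/* variables on very different scales.                             */
data work.mydata;
   input id $ x1 x2 x3;
   datalines;
a 1.0 200 0.5
b 1.2 210 0.4
c 5.0 900 2.1
d 5.3 880 2.0
e 9.8 150 7.5
f 9.5 160 7.7
;
run;

/* Step 1: standardize each variable to mean 0, std. deviation 1.  */
proc stdize data=work.mydata out=work.std method=std;
   var x1-x3;
run;

/* Step 2: hierarchical clustering on the standardized data.       */
/* METHOD=WARD is only an example; any listed method can be used.  */
proc cluster data=work.std method=ward outtree=work.tree noprint;
   var x1-x3;
   id id;
run;

/* Step 3: cut the tree into a chosen number of clusters           */
/* (3 is an assumption) and save the cluster assignments.          */
proc tree data=work.tree nclusters=3 out=work.clusters noprint;
run;

/* Alternative for large data: k-means style clustering.           */
proc fastclus data=work.std maxclusters=3 maxiter=100
              out=work.fc_clusters;
   var x1-x3;
run;
```

proc cluster needs to compare every pair of observations, which is why proc fastclus is usually the more practical choice once the number of observations gets large.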
PROC CLUSTER METHOD=name <options> ;
The PROC CLUSTER statement starts the CLUSTER procedure, specifies a clustering method, and optionally specifies details for clustering methods, data sets, data processing, and displayed output.
The METHOD= specification determines the clustering method used by the procedure. Any one of the following 11 methods can be specified for name:
AVERAGE | AVE
requests average linkage (group average, unweighted pair-group method using arithmetic averages, UPGMA). Distance data are squared unless you specify the NOSQUARE option.
CENTROID | CEN
requests the centroid method (unweighted pair-group method using centroids, UPGMC, centroid sorting, weighted-group method). Distance data are squared unless you specify the NOSQUARE option.
COMPLETE | COM
requests complete linkage (furthest neighbor, maximum method, diameter method, rank order typal analysis). To reduce distortion of clusters by outliers, the TRIM= option is recommended.
DENSITY | DEN
requests density linkage, which is a class of clustering methods using nonparametric probability density estimation. You must also specify either the K=, R=, or HYBRID option to indicate the type of density estimation to be used. See also the MODE= and DIM= options in this section.
EML
requests maximum-likelihood hierarchical clustering for mixtures of spherical multivariate normal distributions with equal variances but possibly unequal mixing proportions. Use METHOD=EML only with coordinate data. See the PENALTY= option for details. The NONORM option does not affect the reported likelihood values but does affect other unrelated criteria. The EML method is much slower than the other methods in the CLUSTER procedure.
FLEXIBLE | FLE
requests the Lance-Williams flexible-beta method. See the BETA= option in this section.
MCQUITTY | MCQ
requests McQuitty’s similarity analysis (weighted average linkage, weighted pair-group method using arithmetic averages, WPGMA).
MEDIAN | MED
requests Gower’s median method (weighted pair-group method using centroids, WPGMC). Distance data are squared unless you specify the NOSQUARE option.
SINGLE | SIN
requests single linkage (nearest neighbor, minimum method, connectedness method, elementary linkage analysis, or dendritic method). To reduce chaining, you can use the TRIM= option with METHOD=SINGLE.
TWOSTAGE | TWO
requests two-stage density linkage. You must also specify the K=, R=, or HYBRID option to indicate the type of density estimation to be used. See also the MODE= and DIM= options in this section.
WARD | WAR
requests Ward’s minimum-variance method (error sum of squares, trace W). Distance data are squared unless you specify the NOSQUARE option. To reduce distortion by outliers, the TRIM= option is recommended. See the NONORM option.
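Some of the methods listed above need additional options. The sketch below shows three such cases on a small made-up data set (work.demo, variables x1 and x2, and every option value are assumptions chosen for illustration, not recommendations). The STD option of proc cluster is used here to standardize the variables inside the procedure.

```sas
/* Hypothetical sample data. */
data work.demo;
   input x1 x2;
   datalines;
1.0 2.0
1.1 2.2
0.9 1.9
5.0 6.1
5.2 5.9
4.8 6.0
9.0 1.0
9.1 1.2
;
run;

/* Density linkage: a density estimate is required, here the      */
/* kth-nearest-neighbor estimate with K=3.                        */
proc cluster data=work.demo method=density k=3 std
             outtree=work.tree_den noprint;
   var x1 x2;
run;

/* Flexible-beta method with an explicit BETA= value.             */
proc cluster data=work.demo method=flexible beta=-0.5 std
             outtree=work.tree_fle noprint;
   var x1 x2;
run;

/* Single linkage with trimming to reduce chaining. TRIM= relies  */
/* on a density estimate, so K= (or R=) is given as well.         */
proc cluster data=work.demo method=single trim=10 k=3 std
             outtree=work.tree_sin noprint;
   var x1 x2;
run;
```

Each OUTTREE= data set can then be passed to proc tree to draw the dendrogram or to extract cluster assignments.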