Program code
# K-means clustering
# Example: clustering the iris data set
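library(cluster) # load the cluster package (kmeans itself lives in the base stats package)
library(useful) # load useful, which supplies the plot method for kmeans objects used below
data(iris)
dim(iris) # show the dimensions of the data
head(iris) # the last column, Species, holds the true class of each flower
table(iris$Species) # count how often each value occurs in the Species column
iris_new <- iris[, -5]; head(iris_new) # drop the last column
irisK3 <- kmeans(iris_new, 3); irisK3 # run k-means with k = 3. Note: in a real problem k must be chosen with care, and cluster numbering can change between runs because the starting centers are random
t = table(iris$Species, irisK3$cluster); t # cross-tabulate the true species against the cluster assignments
plot(irisK3, iris, class = "Species") # plot the clusters together with the original species for comparison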
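The note about choosing k can be made concrete with an elbow plot. The following is only a minimal sketch, not part of the original script: the seed, the range 2:8, and nstart = 20 are arbitrary assumptions, and the names x, ks and wss are introduced here purely for illustration.

```r
# Elbow-method sketch: total within-cluster sum of squares for several k.
# Seed, candidate range and nstart are assumptions, not values from the text.
set.seed(123)
x <- iris[, -5]   # numeric columns only, as in iris_new
ks <- 2:8         # candidate numbers of clusters
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)

# Look for the k after which the curve flattens out.
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

A bend in the curve suggests a reasonable k; the elbow is a heuristic and should be checked against domain knowledge.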
Output
> irisK3<-kmeans(iris_new,3);irisK3 # run k-means clustering
K-means clustering with 3 clusters of sizes 50, 38, 62
Cluster means:
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.006000    3.428000     1.462000    0.246000
2     6.850000    3.073684     5.742105    2.071053
3     5.901613    2.748387     4.393548    1.433871
Clustering vector:
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[33] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 2 3 3 3 3 3 3 3 3 3 3 3
[65] 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[97] 3 3 3 3 2 3 2 2 2 2 3 2 2 2 2 2 2 3 3 2 2 2 2 3 2 3 2 3 2 2 3 3
[129] 2 2 2 2 2 3 2 2 2 2 3 2 2 2 3 2 2 2 3 2 2 3
Within cluster sum of squares by cluster:
[1] 15.15100 23.87947 39.82097
(between_SS / total_SS = 88.4 %)
Available components:
[1] "cluster" "centers" "totss" "withinss"
[5] "tot.withinss" "betweenss" "size" "iter"
[9] "ifault"
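The percentage printed as between_SS / total_SS can be recomputed from the components listed above. This is a hedged sketch: the seed and nstart = 25 are assumptions added for reproducibility, and fit and ratio are illustrative names, so the figure matches the printed one only if the same partition is found.

```r
# Recompute between_SS / total_SS from the components of a kmeans fit.
# Seed and nstart are assumptions; the original run used neither.
set.seed(42)
fit <- kmeans(iris[, -5], centers = 3, nstart = 25)

# The total sum of squares splits into within-cluster and between-cluster parts.
stopifnot(isTRUE(all.equal(fit$totss, fit$tot.withinss + fit$betweenss)))

ratio <- fit$betweenss / fit$totss
round(100 * ratio, 1)   # share of total variation explained by the clusters, in percent
```

A higher ratio means tighter clusters, but it always grows with k, so it cannot be used on its own to pick k.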
> t=table(iris$Species,irisK3$cluster);t
              1  2  3
  setosa     50  0  0
  versicolor  0  2 48
  virginica   0 36 14
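## The table shows that the clustering works fairly well: all 50 setosa fall in cluster 1, while 2 versicolor and 14 virginica land in the other species' cluster, so 134 of 150 flowers (about 89%) are grouped with their own species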
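The cross-tabulation can be reduced to a single agreement figure by giving each cluster the species that dominates it. The sketch below re-enters the counts from the table above by hand, so it does not depend on the random start; tab, majority and agreement are names introduced here for illustration.

```r
# Counts copied from the species-by-cluster table above.
tab <- matrix(c(50,  0,  0,
                 0,  2, 48,
                 0, 36, 14),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("setosa", "versicolor", "virginica"),
                              c("1", "2", "3")))

# Dominant species in each cluster (columns are clusters).
majority <- rownames(tab)[apply(tab, 2, which.max)]

# Share of flowers that sit in the cluster dominated by their own species.
agreement <- sum(apply(tab, 2, max)) / sum(tab)
majority              # "setosa" "virginica" "versicolor"
round(agreement, 3)   # 0.893
```

This majority-vote measure is only one simple way to score a clustering against known labels; it assumes each cluster stands for exactly one species.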
Plot: