Following my introduction to PCA, I will demonstrate how to apply and visualize PCA in R. Many packages and functions can perform PCA in R; in this post I use the function prcomp from the stats package. I will also show how to visualize PCA in R using base R graphics. My favorite visualization function for PCA, however, is ggbiplot, which is implemented by Vince Q. Vu and available on GitHub. Please let me know if you have better ways to visualize PCA in R.
- # Load data
- data(iris)
- head(iris, 3)

- # log transform
- log.ir <- log(iris[, 1:4])
- ir.species <- iris[, 5]
-
- # apply PCA - scale. = TRUE is highly
- # advisable, but default is FALSE.
- ir.pca <- prcomp(log.ir,
-                  center = TRUE,
-                  scale. = TRUE)
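To see what center = TRUE and scale. = TRUE do, the call above can be reproduced by hand. The following is a minimal sketch rather than a copy of the prcomp internals: it standardizes the log-transformed data with scale() and then takes a singular value decomposition with svd(). The names z, sv and manual.sdev are introduced here for illustration only.

```r
# Rebuild the inputs so this block runs on its own
log.ir <- log(iris[, 1:4])
ir.pca <- prcomp(log.ir, center = TRUE, scale. = TRUE)

# Center and scale by hand, then decompose the standardized matrix
z  <- scale(log.ir, center = TRUE, scale = TRUE)
sv <- svd(z)

# Singular values divided by sqrt(n - 1) give the PC standard deviations
manual.sdev <- sv$d / sqrt(nrow(z) - 1)
all.equal(manual.sdev, ir.pca$sdev)

# The right singular vectors are the loadings; compare absolute values
# because the sign of each component is arbitrary
all.equal(abs(sv$v), abs(unname(ir.pca$rotation)))
```

Without scale. = TRUE, the variables with the largest variances would dominate the first components, which is why scaling is usually advisable when the variables are on different scales.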

- # print method
- print(ir.pca)

- # summary method
- summary(ir.pca)
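The proportions that summary() reports can be recomputed from the sdev component, and the same object can be plotted with base R graphics. A rough sketch follows; var.prop and out.file are illustrative names, and writing to a temporary PDF is only an assumption made so that the code also runs in a non-interactive session.

```r
# Rebuild the inputs so this block runs on its own
log.ir <- log(iris[, 1:4])
ir.pca <- prcomp(log.ir, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each PC
var.prop <- ir.pca$sdev^2 / sum(ir.pca$sdev^2)
round(cumsum(var.prop), 3)

# Base R graphics: scree plot and biplot side by side
out.file <- tempfile(fileext = ".pdf")  # hypothetical output location
pdf(out.file, width = 10, height = 5)
op <- par(mfrow = c(1, 2))
plot(ir.pca, type = "l", main = "Scree plot")  # variances of the PCs
biplot(ir.pca, scale = 0, cex = 0.6)           # scores and loadings
par(op)
invisible(dev.off())
```

Dropping the pdf() and dev.off() lines draws both plots on the current screen device instead.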

- # Predict PCs
- predict(ir.pca,
-         newdata = tail(log.ir, 2))
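As far as I can tell, predict() on a prcomp object amounts to applying the stored centers and scales to the new data and then multiplying by the rotation matrix. The sketch below checks this by hand; new.obs and manual.scores are names assumed here for illustration.

```r
# Rebuild the inputs so this block runs on its own
log.ir <- log(iris[, 1:4])
ir.pca <- prcomp(log.ir, center = TRUE, scale. = TRUE)
new.obs <- tail(log.ir, 2)

# Standardize with the training centers and scales, then rotate onto the PCs
manual.scores <- scale(new.obs,
                       center = ir.pca$center,
                       scale  = ir.pca$scale) %*% ir.pca$rotation

# Compare with predict(); names are dropped so only the values are checked
all.equal(unname(manual.scores),
          unname(predict(ir.pca, newdata = new.obs)))
```

Because these two rows were part of the data used to fit the PCA, the same scores also appear as the last two rows of ir.pca$x.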

- library(devtools)
- # current devtools expects the "user/repo" form
- install_github("vqv/ggbiplot")
-
- library(ggbiplot)
- g <- ggbiplot(ir.pca, obs.scale = 1, var.scale = 1,
-               groups = ir.species, ellipse = TRUE,
-               circle = TRUE)
- g <- g + scale_color_discrete(name = '')
- g <- g + theme(legend.direction = 'horizontal',
-                legend.position = 'top')
- print(g)

- library(caret)
- # Box-Cox transform, center, scale, then PCA in one step
- trans <- preProcess(iris[, 1:4],
-                     method = c("BoxCox", "center",
-                                "scale", "pca"))
- PC <- predict(trans, iris[, 1:4])
- # Retained PCs
- head(PC, 3)