楼主: ReneeBK
3728 17

[休闲其它] 【独家发布】R In Action:Data Analysis and Graphics with R [推广有奖]

  • 1关注
  • 62粉丝

VIP

已卖:4897份资源

学术权威

14%

还不是VIP/贵宾

-

TA的文库  其他...

R资源总汇

Panel Data Analysis

Experimental Design

威望
1
论坛币
49635 个
通用积分
55.6937
学术水平
370 点
热心指数
273 点
信用等级
335 点
经验
57805 点
帖子
4005
精华
21
在线时间
582 小时
注册时间
2005-5-8
最后登录
2023-11-26

楼主
ReneeBK 发表于 2014-11-10 04:27:57 |AI写论文

+2 论坛币
k人 参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群
赵安豆老师微信:zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

感谢您参与论坛问题回答

经管之家送您两个论坛币!

+2 论坛币


[size=+1]R in Action
Data Analysis and Graphics with R
Second edition of this book is now available

Robert I. Kabacoff

August 2011 | 472 pages
ISBN: 9781935182399

http://www.manning.com/kabacoff/
二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

关键词:Graphics Analysis GRAPHIC Analysi alysis 2011

已有 1 人评分经验 收起 理由
苹果六人行 + 80 精彩帖子

总评分: 经验 + 80   查看全部评分

本帖被以下文库推荐

沙发
ReneeBK 发表于 2014-11-10 04:28:30
  1. #------------------------------------------------------#
  2. # R in Action: Chapter 1                               #
  3. #------------------------------------------------------#

  4. # In the following code q() is commented out so that
  5. # you don't quit the session

  6. # Listing 1.1 - A Sample R session

  7. age <- c(1, 3, 5, 2, 11, 9, 3, 9, 12, 3)
  8. weight <- c(4.4, 5.3, 7.2, 5.2, 8.5, 7.3, 6, 10.4,
  9.     10.2, 6.1)
  10. mean(weight)
  11. sd(weight)
  12. cor(age, weight)
  13. plot(age, weight)
  14. # q()

  15. # Listing 1.2 - An example of commands used to manage
  16. # the R Workspace. Change the next line to one of your
  17. # directories

  18. setwd("C:/myprojects/project1")
  19. options()
  20. options(digits=3)
  21. x <- runif(20)
  22. summary(x)
  23. hist(x)
  24. savehistory()
  25. save.image()
  26. # q()

  27. # Listing 1.3 - Working with a new package

  28. help.start()
  29. install.packages("vcd")
  30. help(package = "vcd")
  31. library(vcd)
  32. help(Arthritis)
  33. Arthritis
  34. example(Arthritis)
  35. # q()
复制代码


藤椅
ReneeBK 发表于 2014-11-10 04:35:30
  1. #--------------------------------------------------------#
  2. # R in Action: Chapter 2                                 #
  3. #--------------------------------------------------------#

  4. #  Creating vectors

  5. a <- c(1, 2, 5, 3, 6, -2, 4)
  6. b <- c("one", "two", "three")
  7. c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)

  8. # Using vector subscripts

  9. a <- c(1, 2, 5, 3, 6, -2, 4)
  10. a[3]
  11. a[c(1, 3, 5)]
  12. a[2:6]

  13. # Listing 2.1 - Creating Matrices

  14. y <- matrix(1:20, nrow = 5, ncol = 4)
  15. y
  16. cells <- c(1, 26, 24, 68)
  17. rnames <- c("R1", "R2")
  18. cnames <- c("C1", "C2")
  19. mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE,
  20.     dimnames = list(rnames, cnames))
  21. mymatrix
  22. mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = FALSE,
  23.     dimnames = list(rnames, cnames))
  24. mymatrix

  25. # Listing 2.2 - Using matrix subscripts

  26. x <- matrix(1:10, nrow = 2)
  27. x
  28. x[2, ]
  29. x[, 2]
  30. x[1, 4]
  31. x[1, c(4, 5)]

  32. # Listing 2.3 - Creating an array

  33. dim1 <- c("A1", "A2")
  34. dim2 <- c("B1", "B2", "B3")
  35. dim3 <- c("C1", "C2", "C3", "C4")
  36. z <- array(1:24, c(2, 3, 4), dimnames = list(dim1,
  37.     dim2, dim3))
  38. z

  39. # Listing 2.4 - Creating a dataframe

  40. patientID <- c(1, 2, 3, 4)
  41. age <- c(25, 34, 28, 52)
  42. diabetes <- c("Type1", "Type2", "Type1", "Type1")
  43. status <- c("Poor", "Improved", "Excellent", "Poor")
  44. patientdata <- data.frame(patientID, age, diabetes,
  45.     status)
  46. patientdata

  47. # Listing 2.5 - Specifying elements of a dataframe

  48. patientdata[1:2]
  49. patientdata[c("diabetes", "status")]
  50. patientdata$age

  51. # Listing 2.6 - Using factors

  52. patientID <- c(1, 2, 3, 4)
  53. age <- c(25, 34, 28, 52)
  54. diabetes <- c("Type1", "Type2", "Type1", "Type1")
  55. status <- c("Poor", "Improved", "Excellent", "Poor")
  56. diabetes <- factor(diabetes)
  57. status <- factor(status, order = TRUE)
  58. patientdata <- data.frame(patientID, age, diabetes,
  59.     status)
  60. str(patientdata)
  61. summary(patientdata)

  62. #  Listing 2.7 - Creating a list

  63. g <- "My First List"
  64. h <- c(25, 26, 18, 39)
  65. j <- matrix(1:10, nrow = 5)
  66. k <- c("one", "two", "three")
  67. mylist <- list(title = g, ages = h, j, k)
  68. mylist
复制代码


板凳
ReneeBK 发表于 2014-11-10 04:43:44
  1. #-----------------------------------------------------#
  2. # R in Action: Chapter 3                              #
  3. # requires that the Hmisc package has been installed  #
  4. # install.packages('Hmisc')                           #
  5. #-----------------------------------------------------#

  6. # pause after each graph
  7. par(ask = TRUE)

  8. # --Section 3.1--

  9. attach(mtcars)
  10. plot(wt, mpg)
  11. abline(lm(mpg ~ wt))
  12. title("Regression of MPG on Weight")
  13. detach(mtcars)

  14. # --Section 3.2--

  15. dose <- c(20, 30, 40, 45, 60)
  16. drugA <- c(16, 20, 27, 40, 60)
  17. drugB <- c(15, 18, 25, 31, 40)
  18. plot(dose, drugA, type = "b")

  19. # --Section 3.3--

  20. opar <- par(no.readonly = TRUE)
  21. par(lty = 2, pch = 17)
  22. plot(dose, drugA, type = "b")
  23. par(opar)

  24. plot(dose, drugA, type = "b", lty = 2, pch = 17)
  25. plot(dose, drugA, type = "b", lyt = 3, lwd = 3, pch = 15,
  26.     cex = 2)

  27. n <- 10
  28. mycolors <- rainbow(n)
  29. pie(rep(1, n), labels = mycolors, col = mycolors)
  30. mygrays <- gray(0:n/n)
  31. pie(rep(1, n), labels = mygrays, col = mygrays)

  32. # Listing 3.1 - Using graphical parameters to control
  33. # graph appearance

  34. dose <- c(20, 30, 40, 45, 60)
  35. drugA <- c(16, 20, 27, 40, 60)
  36. drugB <- c(15, 18, 25, 31, 40)
  37. opar <- par(no.readonly = TRUE)
  38. par(pin = c(2, 3))
  39. par(lwd = 2, cex = 1.5)
  40. par(cex.axis = 0.75, font.axis = 3)
  41. plot(dose, drugA, type = "b", pch = 19, lty = 2, col = "red")
  42. plot(dose, drugB, type = "b", pch = 23, lty = 6, col = "blue",
  43.     bg = "green")
  44. par(opar)

  45. # --Section 3.4--

  46. plot(dose, drugA, type = "b", col = "red", lty = 2,
  47.     pch = 2, lwd = 2, main = "Clinical Trials for Drug A",
  48.     sub = "This is hypothetical data",
  49.     xlab = "Dosage", ylab = "Drug Response", xlim = c(0, 60),
  50.     ylim = c(0, 70))


  51. # Listing 3.2 - An Example of Custom Axes

  52. x <- c(1:10)
  53. y <- x
  54. z <- 10/x
  55. opar <- par(no.readonly = TRUE)
  56. par(mar = c(5, 4, 4, 8) + 0.1)

  57. plot(x, y, type = "b", pch = 21, col = "red", yaxt = "n",
  58.     lty = 3, ann = FALSE)
  59. lines(x, z, type = "b", pch = 22, col = "blue", lty = 2)
  60. axis(2, at = x, labels = x, col.axis = "red", las = 2)
  61. axis(4, at = z, labels = round(z, digits = 2), col.axis = "blue",
  62.     las = 2, cex.axis = 0.7, tck = -0.01)
  63. mtext("y=1/x", side = 4, line = 3, cex.lab = 1, las = 2,
  64.     col = "blue")
  65. title("An Example of Creative Axes", xlab = "X values",
  66.     ylab = "Y=X")
  67. par(opar)

  68. # Listing 3.3 - Comparing Drug A and Drug B response by dose

  69. dose <- c(20, 30, 40, 45, 60)
  70. drugA <- c(16, 20, 27, 40, 60)
  71. drugB <- c(15, 18, 25, 31, 40)
  72. opar <- par(no.readonly = TRUE)
  73. par(lwd = 2, cex = 1.5, font.lab = 2)
  74. plot(dose, drugA, type = "b", pch = 15, lty = 1, col = "red",
  75.     ylim = c(0, 60), main = "Drug A vs. Drug B", xlab = "Drug Dosage",
  76.     ylab = "Drug Response")
  77. lines(dose, drugB, type = "b", pch = 17, lty = 2,
  78.     col = "blue")
  79. abline(h = c(30), lwd = 1.5, lty = 2, col = "grey")
  80. library(Hmisc)
  81. minor.tick(nx = 3, ny = 3, tick.ratio = 0.5)
  82. legend("topleft", inset = 0.05, title = "Drug Type",
  83.     c("A", "B"), lty = c(1, 2), pch = c(15, 17), col = c("red",
  84.         "blue"))
  85. par(opar)

  86. # --Section 3.4.5--

  87. # Example of labeling points

  88. attach(mtcars)
  89. plot(wt, mpg, main = "Milage vs. Car Weight", xlab = "Weight",
  90.     ylab = "Mileage", pch = 18, col = "blue")
  91. text(wt, mpg, row.names(mtcars), cex = 0.6, pos = 4,
  92.     col = "red")
  93. detach(mtcars)

  94. # View font families
  95. opar <- par(no.readonly = TRUE)
  96. par(cex = 1.5)
  97. plot(1:7, 1:7, type = "n")
  98. text(3, 3, "Example of default text")
  99. text(4, 4, family = "mono", "Example of mono-spaced text")
  100. text(5, 5, family = "serif", "Example of serif text")
  101. par(opar)

  102. # --Section 3.5--

  103. # combining graphs

  104. # Figure 3.14
  105. attach(mtcars)
  106. opar <- par(no.readonly = TRUE)
  107. par(mfrow = c(2, 2))
  108. plot(wt, mpg, main = "Scatterplot of wt vs. mpg")
  109. plot(wt, disp, main = "Scatterplot of wt vs disp")
  110. hist(wt, main = "Histogram of wt")
  111. boxplot(wt, main = "Boxplot of wt")
  112. par(opar)
  113. detach(mtcars)

  114. # Figure 3.15
  115. attach(mtcars)
  116. opar <- par(no.readonly = TRUE)
  117. par(mfrow = c(3, 1))
  118. hist(wt)
  119. hist(mpg)
  120. hist(disp)
  121. par(opar)
  122. detach(mtcars)

  123. # Figure 3.16
  124. attach(mtcars)
  125. layout(matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE))
  126. hist(wt)
  127. hist(mpg)
  128. hist(disp)
  129. detach(mtcars)

  130. # Figure 3.17
  131. attach(mtcars)
  132. layout(matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE),
  133.     widths = c(3, 1), heights = c(1, 2))
  134. hist(wt)
  135. hist(mpg)
  136. hist(disp)
  137. detach(mtcars)

  138. # Listing 3.4 - Fine placement of figures in a graph

  139. opar <- par(no.readonly = TRUE)
  140. par(fig = c(0, 0.8, 0, 0.8))
  141. plot(mtcars$wt, mtcars$mpg, xlab = "Miles Per Gallon",
  142.     ylab = "Car Weight")
  143. par(fig = c(0, 0.8, 0.55, 1), new = TRUE)
  144. boxplot(mtcars$wt, horizontal = TRUE, axes = FALSE)
  145. par(fig = c(0.65, 1, 0, 0.8), new = TRUE)
  146. boxplot(mtcars$mpg, axes = FALSE)
  147. mtext("Enhanced Scatterplot", side = 3, outer = TRUE,
  148.     line = -3)
  149. par(opar)
复制代码


报纸
ReneeBK 发表于 2014-11-10 04:47:18
  1. #---------------------------------------------------------#
  2. # R in Action: Chapter 4                                  #
  3. # requires that the reshape and sqldf packages have       #
  4. # been installed                                          #
  5. # install.packages(c('reshape', 'sqldf'))                 #
  6. #---------------------------------------------------------#

  7. # Listing 4.1 - Creating the leadership data frame

  8. manager <- c(1, 2, 3, 4, 5)
  9. date <- c("10/24/08", "10/28/08", "10/1/08", "10/12/08",
  10.     "5/1/09")
  11. gender <- c("M", "F", "F", "M", "F")
  12. age <- c(32, 45, 25, 39, 99)
  13. q1 <- c(5, 3, 3, 3, 2)
  14. q2 <- c(4, 5, 5, 3, 2)
  15. q3 <- c(5, 2, 5, 4, 1)
  16. q4 <- c(5, 5, 5, NA, 2)
  17. q5 <- c(5, 5, 2, NA, 1)
  18. leadership <- data.frame(manager, date, gender, age,
  19.     q1, q2, q3, q4, q5, stringsAsFactors = FALSE)
  20.    
  21. # the individual vectors are no longer needed
  22. rm(manager, date, gender, age, q1, q2, q3, q4, q5)

  23. # Listing 4.2 - Creating new variables

  24. mydata <- data.frame(x1 = c(2, 2, 6, 4), x2 = c(3,
  25.     4, 2, 8))
  26. mydata$sumx <- mydata$x1 + mydata$x2
  27. mydata$meanx <- (mydata$x1 + mydata$x2)/2

  28. attach(mydata)
  29. mydata$sumx <- x1 + x2
  30. mydata$meanx <- (x1 + x2)/2
  31. detach(mydata)

  32. mydata <- transform(mydata, sumx = x1 + x2, meanx = (x1 +
  33.     x2)/2)

  34. # --Section 4.3--

  35. # Recoding variables

  36. leadership$agecat[leadership$age > 75] <- "Elder"
  37. leadership$agecat[leadership$age > 45 &
  38.     leadership$age <= 75] <- "Middle Aged"
  39. leadership$agecat[leadership$age <= 45] <- "Young"

  40. # or more compactly

  41. leadership <- within(leadership, {
  42.     agecat <- NA
  43.     agecat[age > 75] <- "Elder"
  44.     agecat[age >= 55 & age <= 75] <- "Middle Aged"
  45.     agecat[age < 55] <- "Young"
  46. })

  47. # --Section 4.4--

  48. # Renaming variables with the reshape package
  49. library(reshape)
  50. rename(leadership, c(manager = "managerID", date = "testDate"))

  51. # --Section 4.5--

  52. # Applying the is.na() function
  53. is.na(leadership[, 6:10])

  54. # recode 99 to missing for the variable age
  55. leadership[leadership$age == 99, "age"] <- NA
  56. leadership

  57. # Using na.omit() to delete incomplete observations
  58. newdata <- na.omit(leadership)
  59. newdata

  60. # --Section 4.6--

  61. mydates <- as.Date(c("2007-06-22", "2004-02-13"))

  62. # Converting character values to dates

  63. strDates <- c("01/05/1965", "08/16/1975")
  64. dates <- as.Date(strDates, "%m/%d/%Y")

  65. myformat <- "%m/%d/%y"
  66. leadership$date <- as.Date(leadership$date, myformat)

  67. # Useful date functions

  68. Sys.Date()
  69. date()

  70. today <- Sys.Date()
  71. format(today, format = "%B %d %Y")
  72. format(today, format = "%A")

  73. # Calculations with with dates

  74. startdate <- as.Date("2004-02-13")
  75. enddate <- as.Date("2009-06-22")
  76. days <- enddate - startdate

  77. # Date functions and formatted printing

  78. today <- Sys.Date()
  79. format(today, format = "%B %d %Y")
  80. dob <- as.Date("1956-10-10")
  81. format(dob, format = "%A")

  82. # --Section 4.7--

  83. # Listing 4.5 - Converting from one data type to another

  84. a <- c(1, 2, 3)
  85. a
  86. is.numeric(a)
  87. is.vector(a)
  88. a <- as.character(a)
  89. a
  90. is.numeric(a)
  91. is.vector(a)
  92. is.character(a)

  93. # --Section 4.8--

  94. # Sorting a dataset

  95. attach(leadership)
  96. newdata <- leadership[order(age), ]
  97. newdata
  98. detach(leadership)

  99. attach(leadership)
  100. newdata <- leadership[order(gender, -age), ]
  101. newdata
  102. detach(leadership)

  103. # -- Section 4.10--

  104. # Selecting variables

  105. newdata <- leadership[, c(6:10)]

  106. myvars <- c("q1", "q2", "q3", "q4", "q5")
  107. newdata <- leadership[myvars]

  108. myvars <- paste("q", 1:5, sep = "")
  109. newdata <- leadership[myvars]

  110. # Dropping variables

  111. myvars <- names(leadership) %in% c("q3", "q4")
  112. newdata <- leadership[!myvars]

  113. newdata <- leadership[c(-7, -8)]

  114. # You could use the following to delete q3 and q4
  115. # from the leadership dataset (commented out so
  116. # the rest of the code in this file will work)
  117. #
  118. # leadership$q3 <- leadership$q4 <- NULL

  119. # Selecting observations

  120. # Listing 4.6 - Selecting Observations

  121. newdata <- leadership[1:3, ]

  122. newdata <- leadership[which(leadership$gender == "M" &
  123.     leadership$age > 30), ]

  124. attach(leadership)
  125. newdata <- leadership[which(leadership$gender == "M" &
  126.     leadership$age > 30), ]
  127. detach(leadership)

  128. # Selecting observations based on dates

  129. leadership$date <- as.Date(leadership$date, "%m/%d/%y")
  130. startdate <- as.Date("2009-01-01")
  131. enddate <- as.Date("2009-10-31")
  132. newdata <- leadership[leadership$date >= startdate &
  133.     leadership$date <= enddate, ]

  134. # Using the subset() function

  135. newdata <- subset(leadership, age >= 35 | age < 24,
  136.     select = c(q1, q2, q3, q4))
  137. newdata <- subset(leadership, gender == "M" & age >
  138.     25, select = gender:q4)

  139. # --Section 4.11--

  140. # Listing 4.7 - Using SQL statements to manipulate data frames

  141. library(sqldf)
  142. newdf <- sqldf("select * from mtcars where carb=1 order by mpg",
  143.     row.names = TRUE)
  144. newdf <- sqldf("select avg(mpg) as avg_mpg, avg(disp) as avg_disp,
  145.     gear from mtcars where cyl in (4, 6) group by gear")
  146.    
复制代码


地板
ReneeBK 发表于 2014-11-10 04:49:04
  1. #----------------------------------------------------#
  2. # R in Action: Chapter 5                             #
  3. #----------------------------------------------------#

  4. # Listing 5.1 -  Calculating the mean and
  5. # standard deviation

  6. x <- c(1, 2, 3, 4, 5, 6, 7, 8)
  7. mean(x)
  8. sd(x)
  9. n <- length(x)
  10. meanx <- sum(x)/n
  11. css <- sum((x - meanx)**2)            
  12. sdx <- sqrt(css / (n-1))
  13. meanx
  14. sdx

  15. # Listing 5.2 - Generating pseudo-random numbers from
  16. # a uniform distribution

  17. runif(5)
  18. runif(5)
  19. set.seed(1234)                                                     
  20. runif(5)
  21. set.seed(1234)                                                      
  22. runif(5)

  23. # Listing 5.3 - Generating data from a multivariate
  24. # normal distribution

  25. library(MASS)
  26. options(digits=3)
  27. set.seed(1234)

  28. mean <- c(230.7, 146.7, 3.6)                                          
  29. sigma <- matrix( c(15360.8, 6721.2, -47.1,                              
  30.                     6721.2, 4700.9, -16.5,
  31.                      -47.1,  -16.5,   0.3), nrow=3, ncol=3)

  32. mydata <- mvrnorm(500, mean, sigma)                                    
  33. mydata <- as.data.frame(mydata)                                         
  34. names(mydata) <- c("y", "x1", "x2")                                       

  35. dim(mydata)                                                            
  36. head(mydata, n=10)   

  37. # Listing 5.4 - Applying functions to data objects

  38. a <- 5
  39. sqrt(a)
  40. b <- c(1.243, 5.654, 2.99)
  41. round(b)
  42. c <- matrix(runif(12), nrow=3)
  43. c
  44. log(c)
  45. mean(c)

  46. # Listing 5.5 - Applying a function to the rows
  47. # (columns) of a matrix

  48. mydata <- matrix(rnorm(30), nrow=6)
  49. mydata
  50. apply(mydata, 1, mean)     
  51. apply(mydata, 2, mean)
  52. apply(mydata, 2, mean, trim=.4)   

  53. # Listing 5.6 - A solution to the learning example

  54. options(digits=2)

  55. Student <- c("John Davis", "Angela Williams",
  56.     "Bullwinkle Moose", "David Jones",
  57.     "Janice Markhammer", "Cheryl Cushing",
  58.     "Reuven Ytzrhak", "Greg Knox", "Joel England",
  59.     "Mary Rayburn")
  60. Math <- c(502, 600, 412, 358, 495, 512, 410, 625, 573, 522)
  61. Science <- c(95, 99, 80, 82, 75, 85, 80, 95, 89, 86)
  62. English <- c(25, 22, 18, 15, 20, 28, 15, 30, 27, 18)
  63. roster <- data.frame(Student, Math, Science, English,
  64.     stringsAsFactors=FALSE)
  65.    
  66. z <- scale(roster[,2:4])
  67. score <- apply(z, 1, mean)                                            
  68. roster <- cbind(roster, score)

  69. y <- quantile(score, c(.8,.6,.4,.2))                                   
  70. roster$grade[score >= y[1]] <- "A"                                    
  71. roster$grade[score < y[1] & score >= y[2]] <- "B"
  72. roster$grade[score < y[2] & score >= y[3]] <- "C"
  73. roster$grade[score < y[3] & score >= y[4]] <- "D"
  74. roster$grade[score < y[4]] <- "F"

  75. name <- strsplit((roster$Student), " ")                                
  76. lastname <- sapply(name, "[", 2)
  77. firstname <- sapply(name, "[", 1)

  78. roster <- cbind(firstname,lastname, roster[,-1])
  79. roster <- roster[order(lastname,firstname),]

  80. roster

  81. # Listing 5.7 - A switch example

  82. feelings <- c("sad", "afraid")
  83. for (i in feelings)
  84.     print(
  85.       switch(i,
  86.              happy  = "I am glad you are happy",
  87.              afraid = "There is nothing to fear",
  88.              sad    = "Cheer up",
  89.              angry  = "Calm down now"
  90.     )
  91.   )
  92.   
  93. # Listing 5.8 - mystats(): a user-written function
  94. # for summary statistics

  95. mystats <- function(x, parametric=TRUE, print=FALSE) {
  96.   if (parametric) {
  97.     center <- mean(x); spread <- sd(x)
  98.   } else {
  99.     center <- median(x); spread <- mad(x)
  100.   }
  101.   if (print & parametric) {
  102.     cat("Mean=", center, "\n", "SD=", spread, "\n")
  103.   } else if (print & !parametric) {
  104.     cat("Median=", center, "\n", "MAD=", spread, "\n")
  105.   }
  106.   result <- list(center=center, spread=spread)
  107.   return(result)
  108. }

  109. # trying it out
  110. set.seed(1234)
  111. x <- rnorm(500)
  112. y <- mystats(x)
  113. y <- mystats(x, parametric=FALSE, print=TRUE)

  114. # Another switch example
  115. mydate <- function(type="long") {
  116.     switch(type,
  117.     long =  format(Sys.time(), "%A %B %d %Y"),
  118.     short = format(Sys.time(), "%m-%d-%y"),
  119.     cat(type, "is not a recognized type\n"))
  120. }
  121. mydate("long")
  122. mydate("short")
  123. mydate()
  124. mydate("medium")

  125. # Listing 5.9 - Transposing a dataset

  126. cars <- mtcars[1:5, 1:4]      
  127. cars
  128. t(cars)

  129. # Listing 5.10 - Aggregating data

  130. options(digits=3)
  131. attach(mtcars)
  132. aggdata <-aggregate(mtcars, by=list(cyl,gear),
  133.     FUN=mean, na.rm=TRUE)
  134. aggdata
复制代码


7
ReneeBK 发表于 2014-11-10 04:51:26
  1. #----------------------------------------------------------------#
  2. # R in Action: Chapter 6                                         #
  3. # requires that the vcd, plotrix, sm, vioplot packages have been #
  4. # installed                                                      #
  5. # install.packages(c('vcd', 'plotrix', 'sm', 'vioplot'))         #
  6. #----------------------------------------------------------------#

  7. # pause after each graph
  8. par(ask = TRUE)

  9. # save original graphic settings
  10. opar <- par(no.readonly = TRUE)

  11. # Load vcd package
  12. library(vcd)

  13. # Get cell counts for improved variable
  14. counts <- table(Arthritis$Improved)
  15. counts

  16. # Listing 6.1 - Simple bar plot

  17. # simple bar plot
  18. barplot(counts, main = "Simple Bar Plot", xlab = "Improvement",
  19.     ylab = "Frequency")

  20. # horizontal bar plot
  21. barplot(counts, main = "Horizontal Bar Plot", xlab = "Frequency",
  22.     ylab = "Improvement", horiz = TRUE)


  23. # get counts for Improved by Treatment table
  24. counts <- table(Arthritis$Improved, Arthritis$Treatment)
  25. counts

  26. # Listing 6.2 - Stacked and groupde bar plots

  27. # stacked barplot
  28. barplot(counts, main = "Stacked Bar Plot", xlab = "Treatment",
  29.     ylab = "Frequency", col = c("red", "yellow", "green"),
  30.     legend = rownames(counts))

  31. # grouped barplot
  32. barplot(counts, main = "Grouped Bar Plot", xlab = "Treatment",
  33.     ylab = "Frequency", col = c("red", "yellow", "green"),
  34.     legend = rownames(counts),
  35.     beside = TRUE)


  36. # Listing 6.3 - Mean bar plots

  37. states <- data.frame(state.region, state.x77)
  38. means <- aggregate(states$Illiteracy,
  39.     by = list(state.region),
  40.     FUN = mean)
  41. means

  42. means <- means[order(means$x), ]
  43. means

  44. barplot(means$x, names.arg = means$Group.1)
  45. title("Mean Illiteracy Rate")

  46. # Listing 6.4 - Fitting labels in bar plots

  47. par(mar = c(5, 8, 4, 2))
  48. par(las = 2)
  49. counts <- table(Arthritis$Improved)

  50. barplot(counts, main = "Treatment Outcome", horiz = TRUE,
  51.     cex.names = 0.8, names.arg = c("No Improvement",
  52.     "Some Improvement", "Marked Improvement"))

  53. # Section --6.1.5 Spinograms--

  54. library(vcd)
  55. attach(Arthritis)
  56. counts <- table(Treatment, Improved)
  57. spine(counts, main = "Spinogram Example")
  58. detach(Arthritis)


  59. # Listing 6.5 - Pie charts

  60. par(mfrow = c(2, 2))
  61. slices <- c(10, 12, 4, 16, 8)
  62. lbls <- c("US", "UK", "Australia", "Germany", "France")

  63. pie(slices, labels = lbls, main = "Simple Pie Chart")

  64. pct <- round(slices/sum(slices) * 100)
  65. lbls2 <- paste(lbls, " ", pct, "%", sep = "")
  66. pie(slices, labels = lbls2, col = rainbow(length(lbls)),
  67.     main = "Pie Chart with Percentages")

  68. library(plotrix)
  69. pie3D(slices, labels = lbls, explode = 0.1, main = "3D Pie Chart ")

  70. mytable <- table(state.region)
  71. lbls <- paste(names(mytable), "\n", mytable, sep = "")
  72. pie(mytable, labels = lbls,
  73.     main = "Pie Chart from a Table\n (with sample sizes)")

  74. # restore original graphic parameters
  75. par(opar)

  76. # fan plots
  77. library(plotrix)
  78. slices <- c(10, 12, 4, 16, 8)
  79. lbls <- c("US", "UK", "Australia", "Germany", "France")
  80. fan.plot(slices, labels = lbls, main = "Fan Plot")


  81. # Listing 6.6 - Histograms

  82. par(mfrow = c(2, 2))

  83. hist(mtcars$mpg)

  84. hist(mtcars$mpg, breaks = 12, col = "red",
  85.     xlab = "Miles Per Gallon",
  86.     main = "Colored histogram with 12 bins")

  87. hist(mtcars$mpg, freq = FALSE, breaks = 12, col = "red",
  88.     xlab = "Miles Per Gallon",
  89.     main = "Histogram, rug plot, density curve")
  90. rug(jitter(mtcars$mpg))
  91. lines(density(mtcars$mpg), col = "blue", lwd = 2)

  92. # Histogram with Superimposed Normal Curve
  93. # (Thanks to Peter Dalgaard)
  94. x <- mtcars$mpg
  95. h <- hist(x, breaks = 12, col = "red",
  96.     xlab = "Miles Per Gallon",
  97.     main = "Histogram with normal curve and box")
  98. xfit <- seq(min(x), max(x), length = 40)
  99. yfit <- dnorm(xfit, mean = mean(x), sd = sd(x))
  100. yfit <- yfit * diff(h$mids[1:2]) * length(x)
  101. lines(xfit, yfit, col = "blue", lwd = 2)
  102. box()

  103. # restore original graphic parameters
  104. par(opar)

  105. # Listing 6.7 - Kernel density plot

  106. par(mfrow = c(2, 1))
  107. d <- density(mtcars$mpg)

  108. plot(d)

  109. d <- density(mtcars$mpg)
  110. plot(d, main = "Kernel Density of Miles Per Gallon")
  111. polygon(d, col = "red", border = "blue")
  112. rug(mtcars$mpg, col = "brown")

  113. # restore original graphic parameters
  114. par(opar)

  115. # Listing 6.8 - Comparing kernel density plots

  116. par(lwd = 2)
  117. library(sm)
  118. attach(mtcars)

  119. cyl.f <- factor(cyl, levels = c(4, 6, 8),
  120.     labels = c("4 cylinder", "6 cylinder", "8 cylinder"))

  121. sm.density.compare(mpg, cyl, xlab = "Miles Per Gallon")
  122. title(main = "MPG Distribution by Car Cylinders")

  123. colfill <- c(2:(2 + length(levels(cyl.f))))
  124. cat("Use mouse to place legend...", "\n\n")
  125. legend(locator(1), levels(cyl.f), fill = colfill)
  126. detach(mtcars)
  127. par(lwd = 1)

  128. # --Section 6.5--

  129. boxplot(mpg ~ cyl, data = mtcars,
  130.     main = "Car Milage Data",
  131.     xlab = "Number of Cylinders",
  132.     ylab = "Miles Per Gallon")

  133. boxplot(mpg ~ cyl, data = mtcars, notch = TRUE,
  134.     varwidth = TRUE, col = "red",
  135.     main = "Car Mileage Data",
  136.     xlab = "Number of Cylinders",
  137.     ylab = "Miles Per Gallon")

  138. # Listing 6.9 - Box plots for two crossed factors

  139. mtcars$cyl.f <- factor(mtcars$cyl, levels = c(4, 6,
  140.     8), labels = c("4", "6", "8"))

  141. mtcars$am.f <- factor(mtcars$am, levels = c(0, 1),
  142.     labels = c("auto", "standard"))

  143. boxplot(mpg ~ am.f * cyl.f, data = mtcars,
  144.     varwidth = TRUE, col = c("gold", "darkgreen"),
  145.     main = "MPG Distribution by Auto Type",
  146.     xlab = "Auto Type")

  147. # Listing 6.10 - Violin plots

  148. library(vioplot)
  149. x1 <- mtcars$mpg[mtcars$cyl == 4]
  150. x2 <- mtcars$mpg[mtcars$cyl == 6]
  151. x3 <- mtcars$mpg[mtcars$cyl == 8]
  152. vioplot(x1, x2, x3,
  153.     names = c("4 cyl", "6 cyl", "8 cyl"),
  154.     col = "gold")
  155. title("Violin Plots of Miles Per Gallon")

  156. # --Section 6.6--

  157. dotchart(mtcars$mpg, labels = row.names(mtcars),
  158.     cex = 0.7,
  159.     main = "Gas Milage for Car Models",
  160.     xlab = "Miles Per Gallon")

  161. # Listing 6.11 - sorted colored grouped dot chart

  162. x <- mtcars[order(mtcars$mpg), ]
  163. x$cyl <- factor(x$cyl)
  164. x$color[x$cyl == 4] <- "red"
  165. x$color[x$cyl == 6] <- "blue"
  166. x$color[x$cyl == 8] <- "darkgreen"
  167. dotchart(x$mpg, labels = row.names(x), cex = 0.7,
  168.     pch = 19, groups = x$cyl,
  169.     gcolor = "black", color = x$color,
  170.     main = "Gas Milage for Car Models\ngrouped by cylinder",
  171.     xlab = "Miles Per Gallon")
  172.    
复制代码


8
ReneeBK 发表于 2014-11-10 05:19:04
  1. #--------------------------------------------------------------------#
  2. # R in Action: Chapter 7                                             #
  3. # requires that the npmc, ggm, gmodels, vcd, Hmisc,                  #
  4. # pastecs, psych, doBy, and reshape packages have been installed     #
  5. # install.packages(c('npmc', 'ggm', 'gmodels', 'vcd', 'Hmisc',       #
  6. #     'pastecs', 'psych', 'doBy', 'reshape'))                        #
  7. #---------------------------------------------------------------------

  8. vars <- c("mpg", "hp", "wt")
  9. head(mtcars[vars])

  10. # Listing 7.1 - descriptive stats via summary

  11. summary(mtcars[vars])

  12. # Listing 7.2 - descriptive stats via sapply()

  13. mystats <- function(x, na.omit = FALSE) {
  14.     if (na.omit)
  15.         x <- x[!is.na(x)]
  16.     m <- mean(x)
  17.     n <- length(x)
  18.     s <- sd(x)
  19.     skew <- sum((x - m)^3/s^3)/n
  20.     kurt <- sum((x - m)^4/s^4)/n - 3
  21.     return(c(n = n, mean = m, stdev = s, skew = skew, kurtosis = kurt))
  22. }

  23. sapply(mtcars[vars], mystats)

  24. # Listing 7.3 - Descriptive statistics (Hmisc package)

  25. library(Hmisc)
  26. describe(mtcars[vars])

  27. # Listing 7.4 - Descriptive statistics (pastecs package)

  28. library(pastecs)
  29. stat.desc(mtcars[vars])

  30. # Listing 7.5 - Descriptive statistics (psych package)

  31. library(psych)
  32. describe(mtcars[vars])

  33. # Listing 7.6 - Descriptive statistics by group with aggregate()

  34. aggregate(mtcars[vars], by = list(am = mtcars$am), mean)
  35. aggregate(mtcars[vars], by = list(am = mtcars$am), sd)

  36. # Listing 7.7 - Descriptive statistics by group via by()

  37. dstats <- function(x)(c(mean=mean(x), sd=sd(x)))
  38. by(mtcars[vars], mtcars$am, dstats)

  39. # Listing 7.8 Summary statists by group (doBy package)

  40. library(doBy)
  41. summaryBy(mpg + hp + wt ~ am, data = mtcars, FUN = mystats)

  42. # Listing 7.9 - Summary statistics by group (psych package)

  43. library(psych)
  44. describe.by(mtcars[vars], mtcars$am)

  45. # Listing 1.10 Summary statistics by group (reshape package)

  46. library(reshape)
  47. dstats <- function(x) (c(n = length(x), mean = mean(x),
  48.     sd = sd(x)))
  49. dfm <- melt(mtcars, measure.vars = c("mpg", "hp",
  50.     "wt"), id.vars = c("am", "cyl"))
  51. cast(dfm, am + cyl + variable ~ ., dstats)

  52. # Section --7.2--

  53. # get Arthritis data
  54. library(vcd)

  55. # one way table

  56. mytable <- with(Arthritis, table(Improved))
  57. mytable
  58. prop.table(mytable)
  59. prop.table(mytable)*100


  60. # two way table

  61. mytable <- xtabs(~ Treatment+Improved, data=Arthritis)
  62. mytable
  63. margin.table(mytable, 1)
  64. prop.table(mytable, 1)
  65. margin.table(mytable, 2)
  66. prop.table(mytable, 2)
  67. prop.table(mytable)
  68. addmargins(mytable)
  69. admargins(prop.table(mytable))
  70. addmargins(prop.table(mytable, 1), 2)
  71. addmargins(prop.table(mytable, 2, 1)

  72. # Listing 7.11 - Two-way table using CrossTable

  73. library(gmodels)
  74. CrossTable(Arthritis$Treatment, Arthritis$Improved)

  75. # Listing 7.12 - Three-way contingency table

  76. mytable <- xtabs(~ Treatment+Sex+Improved, data=Arthritis)
  77. mytable
  78. ftable(mytable)
  79. margin.table(mytable, 1)
  80. margin.table(mytable, 2)
  81. margin.table(mytable, 3)
  82. margin.table(mytable, c(1,3))
  83. ftable(prop.table(mytable, c(1, 2)))
  84. ftable(addmargins(prop.table(mytable, c(1, 2)), 3))

  85. gtable(addmargins(prop.table(mytable, c(1, 2)), 3)) * 100

  86. # Listing 7.13 - Chis-square test of independence

  87. library(vcd)
  88. mytable <- xtabs(~Treatment+Improved, data=Arthritis)
  89. chisq.test(mytable)
  90. mytable <- xtabs(~Improved+Sex, data=Arthritis)
  91. chisq.test(mytable)

  92. # Fisher's exact test

  93. mytable <- xtabs(~Treatment+Improved, data=Arthritis)
  94. fisher.test(mytable)

  95. # Chochran-Mantel-Haenszel test

  96. mytable <- xtabs(~Treatment+Improved+Sex, data=Arthritis)
  97. mantelhaen.test(mytable)

  98. # Listing 7.14 - Measures of association for a two-way table

  99. library(vcd)
  100. mytable <- xtabs(~Treatment+Improved, data=Arthritis)
  101. assocstats(mytable)


  102. # Listing 7.15 - converting a table into a flat file via table2flat

  103. table2flat <- function(mytable) {
  104.     df <- as.data.frame(mytable)
  105.     rows <- dim(df)[1]
  106.     cols <- dim(df)[2]
  107.     x <- NULL
  108.     for (i in 1:rows) {
  109.         for (j in 1:df$Freq[i]) {
  110.             row <- df[i, c(1:(cols - 1))]
  111.             x <- rbind(x, row)
  112.         }
  113.     }
  114.     row.names(x) <- c(1:dim(x)[1])
  115.     return(x)
  116. }

  117. # Listing 7.16 - Using table2flat with published data

  118. treatment <- rep(c("Placebo", "Treated"), 3)
  119. improved <- rep(c("None", "Some", "Marked"), each = 2)
  120. Freq <- c(29, 13, 7, 7, 7, 21)
  121. mytable <- as.data.frame(cbind(treatment, improved, Freq))
  122. mydata <- table2flat(mytable)
  123. head(mydata)

  124. # Listing 7.17 - Covariances and correlations

  125. states <- state.x77[, 1:6]
  126. cov(states)
  127. cor(states)
  128. cor(states, method="spearman")

  129. x <- states[, c("Population", "Income", "Illiteracy", "HS Grad")]
  130. y <- states[, c("Life Exp", "Murder")]
  131. cor(x, y)

  132. # partial correlation of population and murder rate, controlling
  133. # for income, illiteracy rate, and HS graduation rate

  134. library(ggm)
  135. pcor(c(1, 5, 2, 3, 6), cov(states))

  136. # Listing 7.18 - Testing correlations for significance

  137. cor.test(states[, 3], states[, 5])

  138. # Listing 7.19 - Correlation matrix and tests of significance via corr.test

  139. library(psych)
  140. corr.test(states, use = "complete")

  141. # --Section 7.4--

  142. # independent t-test

  143. library(MASS)
  144. t.test(Prob ~ So, data=UScrime)

  145. # dependent t-test

  146. library(MASS)
  147. sapply(UScrime[c("U1", "U2")], function(x) (c(mean = mean(x),
  148.     sd = sd(x))))
  149. with(UScrime, t.test(U1, U2, paired = TRUE))

  150. # --Section 7.5--

  151. # Wilcoxon two group comparison

  152. with(UScrime, by(Prob, So, median))
  153. wilcox.test(Prob ~ So, data=UScrime)
  154. sapply(UScrime[c("U1", "U2")], median)
  155. with(UScrime, wilcox.test(U1, U2, paired = TRUE))

  156. # Kruskal Wallis test

  157. states <- as.data.frame(cbind(state.region, state.x77))
  158. kruskal.test(Illiteracy ~ state.region, data=states)

  159. # Listing 7.20 - Nonparametric multiple comparisons

  160. class <- state.region
  161. var <- state.x77[, c("Illiteracy")]
  162. mydata <- as.data.frame(cbind(class, var))
  163. rm(class,var)
  164. library(npmc)
  165. summary(npmc(mydata), type = "BF")
  166. aggregate(mydata, by = list(mydata$class), median)
复制代码


9
ReneeBK 发表于 2014-11-10 05:21:49
  1. #------------------------------------------------------------------------#
  2. # R in Action: Chapter 8                                                 #
  3. # requires that the car, gvlma, MASS, leaps packages have been installed #
  4. # install.packages(c('car', 'gvlma', 'MASS', 'leaps'))                   #
  5. #------------------------------------------------------------------------#

  6. # pause on each graph
  7. par(ask = TRUE)

  8. # save current graphical parameters
  9. opar <- par(no.readonly = TRUE)

  10. # Listing 8.1 - simple linear regression

  11. fit <- lm(weight ~ height, data = women)
  12. summary(fit)
  13. women$weight
  14. fitted(fit)
  15. residuals(fit)

  16. # scatter plot of height by weight

  17. plot(women$height, women$weight, main = "Women Age 30-39",
  18.     xlab = "Height (in inches)", ylab = "Weight (in pounds)")
  19. # add the line of best fit
  20. abline(fit)

  21. # Listing 8.2 - Polynomial regression

  22. fit2 <- lm(weight ~ height + I(height^2), data = women)
  23. summary(fit2)

  24. plot(women$height, women$weight, main = "Women Age 30-39",
  25.     xlab = "Height (in inches)", ylab = "Weight (in lbs)")
  26. lines(women$height, fitted(fit2))

  27. # scatterplot for women data

  28. library(car)
  29. scatterplot(weight ~ height, data = women, spread = FALSE,
  30.     lty.smooth = 2, pch = 19, main = "Women Age 30-39", xlab = "Height (inches)",
  31.     ylab = "Weight (lbs.)")

  32. # Listing 8.3 - Examining bivariate relationship

  33. states <- as.data.frame(state.x77[, c("Murder", "Population",
  34.     "Illiteracy", "Income", "Frost")])
  35.    
  36. cor(states)

  37. library(car)
  38. scatterplotMatrix(states, spread = FALSE, lty.smooth = 2,
  39.     main = "Scatterplot Matrix")
  40.    
  41. # Listing 8.4 - Multiple linear regression

  42. fit <- lm(Murder ~ Population + Illiteracy + Income +
  43.     Frost, data = states)
  44.    
  45. # Listing 8.5 Multiple linear regression with a significant
  46. # interaction term

  47. fit <- lm(mpg ~ hp + wt + hp:wt, data = mtcars)
  48. summary(fit)

  49. library(effects)
  50. plot(effect("hp:wt", fit, list(wt = c(2.2, 3.2, 4.2))),
  51.     multiline = TRUE)
  52.    
  53. # --Section 8.3--

  54. fit <- lm(Murder ~ Population + Illiteracy + Income +
  55.     Frost, data=states)
  56. confint(fit)

  57. # simple regression diagnostics

  58. fit <- lm(weight ~ height, data = women)
  59. par(mfrow = c(2, 2))
  60. plot(fit)
  61. par(opar)

  62. # regression diagnostics for quadratic fit

  63. newfit <- lm(weight ~ height + I(height^2), data = women)
  64. par(mfrow = c(2, 2))
  65. plot(newfit)
  66. par(opar)

  67. # regression diagnostics for quadratic fit
  68. # with deleted observations

  69. newfit <- lm(weight ~ height + I(height^2), data = women[-c(13, 15),])
  70. par(mfrow = c(2, 2))
  71. plot(newfit)
  72. par(opar)

  73. # basic regression diagnostics for states data

  74. fit <- lm(Murder ~ Population + Illiteracy + Income +
  75.     Frost, data = states)
  76. par(mfrow = c(2, 2))
  77. plot(fit)
  78. par(opar)

  79. # Assessing normality
  80. library(car)
  81. fit <- lm(Murder ~ Population + Illiteracy + Income +
  82.     Frost, data = states)
  83. qqPlot(fit, labels = FALSE, simulate = TRUE, main = "Q-Q Plot")


  84. # Listing 8.6 Function for plotting studentized residuals

  85. residplot <- function(fit, nbreaks=10) {
  86.     z <- rstudent(fit)
  87.     hist(z, breaks=nbreaks, freq=FALSE,
  88.     xlab="Studentized Residual",
  89.     main="Distribution of Errors")
  90.     rug(jitter(z), col="brown")
  91.     curve(dnorm(x, mean=mean(z), sd=sd(z)),
  92.         add=TRUE, col="blue", lwd=2)
  93.     lines(density(z)$x, density(z)$y,
  94.         col="red", lwd=2, lty=2)
  95.     legend("topright",
  96.         legend = c( "Normal Curve", "Kernel Density Curve"),
  97.         lty=1:2, col=c("blue","red"), cex=.7)
  98. }

  99. residplot(fit)

  100. #  Durbin Watson test for Autocorrelated Errors

  101. durbinWatsonTest(fit)

  102. # Component + Residual Plots

  103. crPlots(fit, one.page = TRUE, ask = FALSE)

  104. # Listing 8.7 - Assessing homoscedasticity

  105. library(car)
  106. ncvTest(fit)
  107. spreadLevelPlot(fit)

  108. # Listing 8.8 - Global test of linear model assumptions

  109. library(gvlma)
  110. gvmodel <- gvlma(fit)
  111. summary(gvmodel)

  112. # Library 8.9 - Evaluating multi-collinearity

  113. vif(fit)
  114. sqrt(vif(fit)) > 2

  115. # --Section 8.4--

  116. # Assessing outliers

  117. library(car)
  118. outlierTest(fit)

  119. # Index plot of hat values
  120. # use the mouse to identify points interactively

  121. hat.plot <- function(fit){
  122.     p <- length(coefficients(fit))
  123.     n <- length(fitted(fit))
  124.     plot(hatvalues(fit), main = "Index Plot of Hat Values")
  125.     abline(h = c(2, 3) * p/n, col = "red", lty = 2)
  126.     identify(1:n, hatvalues(fit), names(hatvalues(fit)))
  127. }

  128. hat.plot(fit)

  129. # Cook's D Plot
  130. # identify D values > 4/(n-k-1)

  131. cutoff <- 4/(nrow(states) - length(fit$coefficients) - 2)
  132. plot(fit, which = 4, cook.levels = cutoff)
  133. abline(h = cutoff, lty = 2, col = "red")

  134. # Added variable plots
  135. # use the mouse to identify points interactively

  136. avPlots(fit, ask = FALSE, onepage = TRUE, id.method = "identify")

  137. # Influence Plot
  138. # use the mouse to identify points interactively

  139. influencePlot(fit, id.method = "identify", main = "Influence Plot",
  140.     sub = "Circle size is proportial to Cook's Distance")

  141. # Listing 8.10 - Box-Cox Transformation to normality

  142. library(car)
  143. summary(powerTransform(states$Murder))

  144. # Box-Tidwell Transformations to Linearity

  145. library(car)
  146. boxTidwell(Murder ~ Population + Illiteracy, data = states)

  147. # Listing 8.11 - Comparing nested models using the anova function

  148. fit1 <- lm(Murder ~ Population + Illiteracy + Income +
  149.     Frost, data = states)
  150. fit2 <- lm(Murder ~ Population + Illiteracy, data = states)
  151. anova(fit2, fit1)

  152. # Listing 8.12 - Comparing models with the Akaike Information Criterion

  153. fit1 <- lm(Murder ~ Population + Illiteracy + Income +
  154.     Frost, data = states)
  155. fit2 <- lm(Murder ~ Population + Illiteracy, data = states)
  156. AIC(fit1, fit2)

  157. # Listing 8.13 - Backward stepwise selection

  158. library(MASS)
  159. fit1 <- lm(Murder ~ Population + Illiteracy + Income +
  160.     Frost, data = states)
  161. stepAIC(fit, direction = "backward")

  162. # Listing 8.14 - All subsets regression
  163. # use the mouse to place the legend interactively
  164. # in the second plot

  165. library(leaps)
  166. leaps <- regsubsets(Murder ~ Population + Illiteracy +
  167.     Income + Frost, data = states, nbest = 4)
  168. plot(leaps, scale = "adjr2")

  169. library(car)
  170. subsets(leaps, statistic = "cp",
  171.     main = "Cp Plot for All Subsets Regression")
  172. abline(1, 1, lty = 2, col = "red")

  173. # Listing 8.15 - Function for k-fold cross-validated R-square
  174. shrinkage <- function(fit, k = 10) {
  175.     require(bootstrap)
  176.     # define functions
  177.     theta.fit <- function(x, y) {
  178.         lsfit(x, y)
  179.     }
  180.     theta.predict <- function(fit, x) {
  181.         cbind(1, x) %*% fit$coef
  182.     }
  183.    
  184.     # matrix of predictors
  185.     x <- fit$model[, 2:ncol(fit$model)]
  186.     # vector of predicted values
  187.     y <- fit$model[, 1]
  188.    
  189.     results <- crossval(x, y, theta.fit, theta.predict, ngroup = k)
  190.     r2 <- cor(y, fit$fitted.values)^2
  191.     r2cv <- cor(y, results$cv.fit)^2
  192.     cat("Original R-square =", r2, "\n")
  193.     cat(k, "Fold Cross-Validated R-square =", r2cv, "\n")
  194.     cat("Change =", r2 - r2cv, "\n")
  195. }

  196. # using shrinkage()

  197. fit <- lm(Murder ~ Population + Income + Illiteracy +
  198.     Frost, data = states)
  199. shrinkage(fit)

  200. fit2 <- lm(Murder ~ Population + Illiteracy, data = states)
  201. shrinkage(fit2)

  202. #  Calculating standardized regression coefficients
  203. zstates <- as.data.frame(scale(states))
  204. zfit <- lm(Murder ~ Population + Income + Illiteracy +
  205.     Frost, data = zstates)
  206. coef(zfit)

  207. # Listing 8.16 - relweights() function for calculating relative
  208. # importance of predictors

  209. ########################################################################
  210. # The relweights function determines the relative importance of each   #
  211. # independent variable to the dependent variable in an OLS regression. #
  212. # The code is adapted from an SPSS program generously provided by      #
  213. # Dr. Johnson.                                                         #
  214. #                                                                      #
  215. # See Johnson (2000, Multivariate Behavioral Research, 35, 1-19) for   #
  216. # an explanation of how the relative weights are derived.              #
  217. ########################################################################
  218. relweights <- function(fit, ...) {
  219.     R <- cor(fit$model)
  220.     nvar <- ncol(R)
  221.     rxx <- R[2:nvar, 2:nvar]
  222.     rxy <- R[2:nvar, 1]
  223.     svd <- eigen(rxx)
  224.     evec <- svd$vectors
  225.     ev <- svd$values
  226.     delta <- diag(sqrt(ev))
  227.    
  228.     # correlations between original predictors and new orthogonal variables
  229.     lambda <- evec %*% delta %*% t(evec)
  230.     lambdasq <- lambda^2
  231.    
  232.     # regression coefficients of Y on orthogonal variables
  233.     beta <- solve(lambda) %*% rxy
  234.     rsquare <- colSums(beta^2)
  235.     rawwgt <- lambdasq %*% beta^2
  236.     import <- (rawwgt/rsquare) * 100
  237.     lbls <- names(fit$model[2:nvar])
  238.     rownames(import) <- lbls
  239.     colnames(import) <- "Weights"
  240.    
  241.     # plot results
  242.     barplot(t(import), names.arg = lbls, ylab = "% of R-Square",
  243.         xlab = "Predictor Variables", main = "Relative Importance of Predictor Variables",
  244.         sub = paste("R-Square = ", round(rsquare, digits = 3)),
  245.         ...)
  246.     return(import)
  247. }

  248. # using relweights()

  249. fit <- lm(Murder ~ Population + Illiteracy + Income +
  250.     Frost, data = states)
  251. relweights(fit, col = "lightgrey")
复制代码


10
ReneeBK 发表于 2014-11-10 05:24:39
  1. #------------------------------------------------------------------ #
  2. # R in Action: Chapter 9                                            #
  3. # requires that the multcomp, gplots, car, HH, effects,             #
  4. #     rrcov, mvoutlier, MASS packages have been installed           #
  5. # install.packages(c('multcomp', 'gplots', 'car', 'HH', 'effects',  #
  6. #     'rrcov', 'mvoutlier', 'MASS'))                                #
  7. #-------------------------------------------------------------------#

  8. # pause for each graph
  9. par(ask = TRUE)

  10. # save original graphical parameters
  11. opar <- par(no.readonly = TRUE)

  12. # Listing 9.1 - One-way ANOVA

  13. library(multcomp)
  14. attach(cholesterol)
  15. table(trt)
  16. aggregate(response, by = list(trt), FUN = mean)
  17. aggregate(response, by = list(trt), FUN = sd)
  18. fit <- aov(response ~ trt)
  19. summary(fit)
  20. library(gplots)
  21. plotmeans(response ~ trt, xlab = "Treatment", ylab = "Response",
  22.     main = "Mean Plot\nwith 95% CI")
  23. detach(cholesterol)

  24. # Listing 9.2 - Tukey HSD pairwise group comparisons

  25. TukeyHSD(fit)
  26. par(las = 2)
  27. par(mar = c(5, 8, 4, 2))
  28. plot(TukeyHSD(fit))
  29. par(opar)

  30. # Multiple comparisons the multcomp package

  31. library(multcomp)
  32. par(mar = c(5, 4, 6, 2))
  33. tuk <- glht(fit, linfct = mcp(trt = "Tukey"))
  34. plot(cld(tuk, level = 0.05), col = "lightgrey")
  35. par(opar)

  36. # Assessing normality

  37. library(car)
  38. qqPlot(lm(response ~ trt, data = cholesterol), simulate = TRUE,
  39.     main = "QQ Plot", labels = FALSE)

  40. # Assessing homogeneity of variances

  41. bartlett.test(response ~ trt, data = cholesterol)

  42. # Assessing outliers

  43. library(car)
  44. outlierTest(fit)

  45. # Listing 9.3 - One-way ANCOVA

  46. data(litter, package = "multcomp")
  47. attach(litter)
  48. table(dose)
  49. aggregate(weight, by = list(dose), FUN = mean)
  50. fit <- aov(weight ~ gesttime + dose)
  51. summary(fit)

  52. # Obtaining adjusted means

  53. library(effects)
  54. effect("dose", fit)

  55. # Listing 9.4 - Multiple comparisons using user supplied contrasts

  56. library(multcomp)
  57. contrast <- rbind(`no drug vs. drug` = c(3, -1, -1, -1))
  58. summary(glht(fit, linfct = mcp(dose = contrast)))

  59. # Listing 9.5 - Testing for Homegeneity of Regression Slopes

  60. library(multcomp)
  61. fit2 <- aov(weight ~ gesttime * dose)
  62. summary(fit2)

  63. # Visualizing a one-way ANCOVA

  64. library(HH)
  65. ancova(weight ~ gesttime + dose, data = litter)

  66. # Listing 9.6 - Two way ANOVA

  67. attach(ToothGrowth)
  68. table(supp, dose)
  69. aggregate(len, by = list(supp, dose), FUN = mean)
  70. aggregate(len, by = list(supp, dose), FUN = sd)
  71. fit <- aov(len ~ supp * dose)
  72. summary(fit)

  73. # plotting interactions

  74. interaction.plot(dose, supp, len, type = "b", col = c("red",
  75.     "blue"), pch = c(16, 18),
  76.     main = "Interaction between Dose and Supplement Type")
  77.    
  78. library(gplots)
  79. plotmeans(len ~ interaction(supp, dose, sep = " "),
  80.     connect = list(c(1, 3, 5), c(2, 4, 6)),
  81.     col = c("red", "darkgreen"),
  82.     main = "Interaction Plot with 95% CIs",
  83.     xlab = "Treatment and Dose Combination")
  84.    
  85. library(HH)
  86. interaction2wt(len ~ supp * dose)

  87. # Listing 9.7 - Repeated measures ANOVA with one between
  88. # and within groups factor

  89. w1b1 <- subset(CO2, Treatment == "chilled")
  90. fit <- aov(uptake ~ (conc * Type) + Error(Plant/(conc)),
  91.     w1b1)
  92. summary(fit)

  93. par(las = 2)
  94. par(mar = c(10, 4, 4, 2))
  95. with(w1b1, interaction.plot(conc, Type, uptake, type = "b",
  96.     col = c("red", "blue"), pch = c(16, 18), main = "Interaction Plot for Plant Type and Concentration"))
  97. boxplot(uptake ~ Type * conc, data = w1b1, col = (c("gold",
  98.     "green")), main = "Chilled Quebec and Mississippi Plants",
  99.     ylab = "Carbon dioxide uptake rate (umol/m^2 sec)")
  100. par(opar)

  101. # Listing 9.8 - One-way MANOVA

  102. library(MASS)
  103. attach(UScereal)
  104. y <- cbind(calories, fat, sugars)
  105. aggregate(y, by = list(shelf), FUN = mean)
  106. cov(y)
  107. fit <- manova(y ~ shelf)
  108. summary(fit)
  109. summary.aov(fit)

  110. # Listing 9.9 - Assessing multivariate normality
  111. # identify points interactively with the mouse

  112. center <- colMeans(y)
  113. n <- nrow(y)
  114. p <- ncol(y)
  115. cov <- cov(y)
  116. d <- mahalanobis(y, center, cov)
  117. coord <- qqplot(qchisq(ppoints(n), df = p), d, main = "QQ Plot Assessing Multivariate Normality",
  118.     ylab = "Mahalanobis D2")
  119. abline(a = 0, b = 1)
  120. identify(coord$x, coord$y, labels = row.names(UScereal))

  121. # multivariate outliers
  122. library(mvoutlier)
  123. outliers <- aq.plot(y)
  124. outliers

  125. # Listing 9.10 - Robust one-way MANOVA
  126. # this may take a while...

  127. library(rrcov)
  128. Wilks.test(y, shelf, method = "mcd")

  129. # Listing 9.11 - A regression approach to the ANOVA
  130. # problem in section 9.3

  131. fit.lm <- lm(response ~ trt, data=cholesterol)
  132. summary(fit.lm)

  133. contrasts(cholesterol$trt)
复制代码


您需要登录后才可以回帖 登录 | 我要注册

本版微信群
jg-xs1
拉您进交流群
GMT+8, 2025-12-25 23:09