Original poster: ReneeBK

[Leisure & Other] [Exclusive Release] Data Manipulation with R [Promotion Rewards]

https://www.packtpub.com/big-data-and-business-intelligence/data-manipulation-r

Data Manipulation with R

January 2014

#2
ReneeBK posted on 2014-11-11 06:39:41

Chapter 1:

##################
# Code Snippet-1
##################

# Constants
2
"July"
NULL
NA
NaN
Inf

# An object can be created from existing objects.
# To make the result reproducible (the same numbers every time the
# following code is run), we need to set a seed value.
set.seed(123)
rnorm(9) + runif(9)

##################
# Code Snippet-2
##################

# Storing R objects in variables and then checking their mode

num.obj <- seq(from=1,to=10,by=2)
mode(num.obj)
logical.obj <- c(TRUE,TRUE,FALSE,TRUE,FALSE)
mode(logical.obj)
character.obj <- c("a","b","c")
mode(character.obj)


##################
# Code Snippet-3
##################

# R object containing both numeric and logical elements:
# the logical values are converted to numbers (TRUE = 1, FALSE = 0)
xz <- c(1, 3, TRUE, 5, FALSE, 9)
xz
mode(xz)
# R object containing character, numeric and logical elements:
# every element is converted to character
xw <- c(1,2,TRUE,FALSE,"a")
xw
mode(xw)

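Snippet-3 relies on c() converting every element to the most general type present. The sketch below, with purely illustrative object names, makes that ordering visible with typeof(): logical values give way to numbers, and numbers give way to character strings.

```r
# c() converts all elements to the most general type present:
# logical < integer < double < character
lgl_num <- c(TRUE, 2.5)    # TRUE becomes 1
num_chr <- c(1, "a")       # 1 becomes "1"
lgl_chr <- c(FALSE, "a")   # FALSE becomes "FALSE", not "0"

typeof(lgl_num)   # "double"
typeof(num_chr)   # "character"
lgl_num           # 1.0 2.5
lgl_chr           # "FALSE" "a"
```

This is also why xw in Snippet-3 contains the strings "TRUE" and "FALSE" rather than "1" and "0".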
##################
# Code Snippet-4
##################

num.obj <- seq(from=1,to=10,by=2)
logical.obj <- c(TRUE,TRUE,FALSE,TRUE,FALSE)
character.obj <- c("a","b","c")
# only the first test returns TRUE for a numeric object
is.numeric(num.obj)
is.logical(num.obj)
is.character(num.obj)

##################
# Code Snippet-5
##################

mode(mean)
# We can also test whether "mean" is a function
is.function(mean)

##################
# Code Snippet-6
##################

num.obj <- seq(from=1,to=10,by=2)
logical.obj <- c(TRUE,TRUE,FALSE,TRUE,FALSE)
character.obj <- c("a","b","c")
class(num.obj)
class(logical.obj)
class(character.obj)

##################
# Code Snippet-7
##################

# Output omitted due to space limitations
num.obj <- seq(from=1,to=10,by=2)
set.seed(1234) # to make the matrix reproducible
mat.obj <- matrix(runif(9),ncol=3,nrow=3)
mode(num.obj)
mode(mat.obj)
class(num.obj)
class(mat.obj)
# prints a numeric object
print(num.obj)
# prints a matrix object
print(mat.obj)

##################
# Code Snippet-8
##################

character.obj <- c("a","b","c")
character.obj
is.factor(character.obj)

# Converting a character object into a factor object using as.factor()
factor.obj <- as.factor(character.obj)
factor.obj
is.factor(factor.obj)
# "numeric": a factor is stored internally as integer codes
mode(factor.obj)
class(factor.obj)

##################
# Code Snippet-9
##################

# creating a vector of numeric elements with the c() function
num.vec <- c(1,3,5,7)
num.vec
mode(num.vec)
class(num.vec)
is.vector(num.vec)

##################
# Code Snippet-10
##################

# Vector with mixed elements: all elements are converted to character
num.char.vec <- c(1,3,"five",7)
num.char.vec
mode(num.char.vec)
class(num.char.vec)
is.vector(num.char.vec)

##################
# Code Snippet-11
##################

# combining multiple vectors
comb.vec <- c(num.vec,num.char.vec)
mode(comb.vec)

# creating a named vector
named.num.vec <- c(x1=1,x2=3,x3=5)
named.num.vec

##################
# Code Snippet-12
##################

# a single value is a vector of length one
unit.vec <- 9
is.vector(unit.vec)


##################
# Code Snippet-13
##################

# creating a vector of numbers
# and then converting it to character and logical
numbers.vec <- c(-3,-2,-1,0,1,2,3)
numbers.vec
num2char <- as.character(numbers.vec)
num2char
# zero becomes FALSE, every non-zero number becomes TRUE
num2logical <- as.logical(numbers.vec)
num2logical

# creating a character vector
# and then converting it to numeric and logical
char.vec <- c("1","3","five","7")
char.vec
# "five" cannot be converted and becomes NA (with a warning)
char2num <- as.numeric(char.vec)
char2num
char2logical <- as.logical(char.vec)
char2logical

# logical to character conversion
logical.vec <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
logical.vec
logical2char <- as.character(logical.vec)
logical2char

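In Snippet-13, as.logical(char.vec) returns only NA values. As documented in ?as.logical, a character string is converted only when it is one of "T", "TRUE", "true", "True", "F", "FALSE", "false" or "False". A short sketch of the consequence, and of one workaround that goes through numeric first (the input vectors are illustrative):

```r
# Character input: only a fixed set of spellings is recognized;
# everything else, including "1" and "0", becomes NA
chr_flags <- as.logical(c("TRUE", "true", "T", "False", "F", "yes", "1"))
chr_flags   # TRUE TRUE TRUE FALSE FALSE NA NA

# To turn numeric-looking strings into logicals, convert to numeric first
num_flags <- as.logical(as.numeric(c("1", "0", "3")))
num_flags   # TRUE FALSE TRUE
```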
##################
# Code Snippet-14
##################

# creating a vector and accessing elements
vector1 <- c(1,3,5,7,9)
vector1
# accessing the second element of "vector1"
vector1[2]

# accessing three elements starting from the second element
vector1[2:4]

# another way of creating a vector: "from" is the starting point,
# "to" is the end point and "by" is the increment
vector2 <- seq(from=2, to=10, by=2)
is.vector(vector2)

##################
# Code Snippet-15
##################

# creating a factor variable by passing only one argument to factor()
factor1 <- factor(c(1,2,3,4,5,6,7,8,9))
factor1
levels(factor1)
# note: labels() returns the element indices ("1", "2", ...),
# not the factor labels; use levels() for those
labels(factor1)

# creating a factor with user-supplied labels to display
factor2 <- factor(c(1,2,3,4,5,6,7,8,9),labels=letters[1:9])
factor2
levels(factor2)
labels(factor2)

##################
# Code Snippet-16
##################

# creating a numeric factor and trying to calculate its mean:
# mean() returns NA with a warning because a factor is not numeric
num.factor <- factor(c(5,7,9,5,6,7,3,5,3,9,7))
num.factor
mean(num.factor)

##################
# Code Snippet-17
##################

num.factor <- factor(c(5,7,9,5,6,7,3,5,3,9,7))
num.factor

# as.numeric() only returns the internal integer codes of the factor
as.numeric(num.factor)
# now look at the levels of the factor
levels(num.factor)
as.character(num.factor)

# to convert "num.factor" back to its numeric values there are two methods
# method 1:
mean(as.numeric(as.character(num.factor)))

# method 2:
mean(as.numeric(levels(num.factor)[num.factor]))

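The pitfall behind Snippet-17 is easier to see with a factor whose levels are not 1, 2, 3, ...; the values below are made up for illustration. The last conversion is the form the ?factor help page recommends, since it converts each level only once and then indexes by the factor's integer codes.

```r
f <- factor(c(10, 20, 10, 30))

codes  <- as.numeric(f)                 # 1 2 1 3: internal codes, not the values
vals_1 <- as.numeric(as.character(f))   # 10 20 10 30
vals_2 <- as.numeric(levels(f))[f]      # 10 20 10 30: indexing by f uses its codes

mean(codes)    # 1.75, which is meaningless here
mean(vals_2)   # 17.5
```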
##################
# Code Snippet-18
##################

# creating vectors of different variables and then a data frame
var1 <- c(101,102,103,104,105)
var2 <- c(25,22,29,34,33)
var3 <- c("Non-Diabetic", "Diabetic", "Non-Diabetic", "Non-Diabetic", "Diabetic")
var4 <- factor(c("male","male","female","female","male"))

# now we create a data frame from two numeric vectors,
# one character vector and one factor
diab.dat <- data.frame(var1,var2,var3,var4)
diab.dat

##################
# Code Snippet-19
##################

# class of each column before creating the data frame
class(var1)
class(var2)
class(var3)
class(var4)

##################
# Code Snippet-20
##################

# class of each column after creating the data frame
# (var3 is "factor" before R 4.0.0, where stringsAsFactors defaulted to TRUE,
# and "character" from R 4.0.0 onwards)
class(diab.dat$var1)
class(diab.dat$var2)
class(diab.dat$var3)
class(diab.dat$var4)

# now create the data frame specifying stringsAsFactors=FALSE
# so that character columns stay character
diab.dat.2 <- data.frame(var1,var2,var3,var4,stringsAsFactors=FALSE)
diab.dat.2
class(diab.dat.2$var3)

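Whether var3 ends up as a factor in Snippet-20 depends on the R version, because the default of stringsAsFactors changed from TRUE to FALSE in R 4.0.0. A minimal sketch that passes the argument explicitly, so the column class no longer depends on that default:

```r
var3 <- c("Non-Diabetic", "Diabetic", "Non-Diabetic", "Non-Diabetic", "Diabetic")

# Explicit settings give the same result in every R version
as_chr    <- data.frame(var3, stringsAsFactors = FALSE)
as_factor <- data.frame(var3, stringsAsFactors = TRUE)

class(as_chr$var3)       # "character"
class(as_factor$var3)    # "factor"
levels(as_factor$var3)   # "Diabetic" "Non-Diabetic" (sorted alphabetically)
```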
##################
# Code Snippet-21
##################

# data frame to matrix conversion
mat.diab <- as.matrix(diab.dat)
mat.diab
class(mat.diab)
mode(mat.diab)

# matrix multiplication is not possible with this newly created matrix:
# as.matrix() turned the mixed-type data frame into a character matrix,
# so %*% stops with an error (try() lets the rest of the script run)
try(t(mat.diab) %*% mat.diab)

# creating a matrix with numeric elements only
# To produce the same matrix every time, we set a seed value
set.seed(12345)
num.mat <- matrix(rnorm(9),nrow=3,ncol=3)
num.mat
class(num.mat)
mode(num.mat)

# matrix multiplication
t(num.mat) %*% num.mat

##################
# Code Snippet-22
##################

mat.array <- array(dim=c(2,2,3))

# To produce the same results every time, we set a seed value
set.seed(12345)
mat.array[,,1] <- rnorm(4)
mat.array[,,2] <- rnorm(4)
mat.array[,,3] <- rnorm(4)
mat.array

##################
# Code Snippet-23
##################

var1 <- c(101,102,103,104,105)
var2 <- c(25,22,29,34,33)
var3 <- c("Non-Diabetic", "Diabetic", "Non-Diabetic", "Non-Diabetic", "Diabetic")
var4 <- factor(c("male","male","female","female","male"))
diab.dat <- data.frame(var1,var2,var3,var4)

mat.array <- array(dim=c(2,2,3))
set.seed(12345)

mat.array[,,1] <- rnorm(4)
mat.array[,,2] <- rnorm(4)
mat.array[,,3] <- rnorm(4)

# creating a list
obj.list <- list(elem1=var1,elem2=var2,elem3=var3,elem4=var4,elem5=diab.dat,elem6=mat.array)
obj.list


##################
# Code Snippet-24
##################

missing_dat <- data.frame(v1=c(1,NA,0,1),v2=c("M","F",NA,"M"))
missing_dat
is.na(missing_dat$v1)
is.na(missing_dat$v2)
any(is.na(missing_dat))

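Snippet-24 stops at detecting missing values. As a sketch of the usual next steps, colSums(is.na()) counts the missing values per column and complete.cases() flags the rows that have none; the data frame is the same missing_dat as above.

```r
missing_dat <- data.frame(v1 = c(1, NA, 0, 1), v2 = c("M", "F", NA, "M"))

# number of missing values in each column
na_per_column <- colSums(is.na(missing_dat))
na_per_column                  # v1: 1, v2: 1

# TRUE for rows that contain no missing value
complete.cases(missing_dat)    # TRUE FALSE FALSE TRUE

# keep only the complete rows (na.omit() returns the same rows)
missing_dat[complete.cases(missing_dat), ]
```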


#3
ReneeBK posted on 2014-11-11 06:44:23

Chapter 2:

##################
# Code Snippet-1
##################

# Before running the following command, set the working directory to the
# data location using setwd(), for example setwd("d:/chap2").

# skip=2 ignores the first two lines of the file
anscombe <- read.csv("CSVanscombe.csv",skip=2)

##################
# Code Snippet-2
##################

# importing a csv file that contains both numeric and character variables,
# first with the default settings and then with stringsAsFactors=FALSE

iris_a <- read.csv("iris.csv")
str(iris_a)

##################
# Code Snippet-3
##################

# Now using stringsAsFactors=FALSE
iris_b <- read.csv("iris.csv",stringsAsFactors=FALSE)
str(iris_b)

##################
# Code Snippet-4
##################

# semicolon-separated file: the separator is given with sep=";"
iris_semicolon <- read.csv("iris_semicolon.csv",stringsAsFactors=FALSE,sep=";")
str(iris_semicolon)

##################
# Code Snippet-5
##################

# tab-delimited file: read.csv() with sep="\t", or read.table() with a header
anscombe_tab <- read.csv("anscombe.txt",sep="\t")
anscombe_tab_2 <- read.table("anscombe.txt",header=TRUE)

##################
# Code Snippet-6
##################

# loading the xlsx package
library(xlsx)
# importing xlsxanscombe.xlsx
anscombe_xlsx <- read.xlsx2("xlsxanscombe.xlsx",sheetIndex=1)

##################
# Code Snippet-7
##################

# loading the robjects.RData file
load("robjects.RData")

# checking whether the objects were imported correctly
objects()

##################
# Code Snippet-8
##################

# the foreign package reads Stata (.dta) files
library(foreign)
iris_stata <- read.dta("iris_stata.dta")

##################
# Code Snippet-9
##################

# creating an R object whose value is "datamanipulation"
char.obj <- "datamanipulation"

# creating a factor variable by extracting each single letter from the
# character string with substring(). Note: nchar() gives the number of
# characters in a character object.
factor.obj <- factor(substring(char.obj,1:nchar(char.obj),1:nchar(char.obj)),levels=letters)

# displaying the levels of the factor variable (all 26 letters)
levels(factor.obj)

# displaying the data using the table() function
table(factor.obj)


##################
# Code Snippet-10
##################

# re-creating a factor variable from an existing factor variable;
# this drops the levels that do not occur in the data
factor.obj1 <- factor(factor.obj)

# displaying the levels of the new factor variable
levels(factor.obj1)

# displaying the data using the table() function
table(factor.obj1)
factor.obj1


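Snippet-10 drops the unused levels by calling factor() again. Base R's droplevels() does the same job and states the intent more directly; a short sketch that rebuilds factor.obj from Snippet-9 (the helper name chars is only for illustration):

```r
char.obj <- "datamanipulation"
chars <- substring(char.obj, 1:nchar(char.obj), 1:nchar(char.obj))
factor.obj <- factor(chars, levels = letters)   # all 26 letters as levels

nlevels(factor.obj)                 # 26
used <- droplevels(factor.obj)      # keeps only the letters that occur
nlevels(used)                       # 10
levels(used)                        # "a" "d" "i" "l" "m" "n" "o" "p" "t" "u"
```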
##################
# Code Snippet-11
##################

# creating a numeric variable by drawing 100 random numbers
# from the normal distribution
set.seed(1234) # setting the seed to reproduce the example
numvar <- rnorm(100)

# creating a factor variable with 5 distinct categories
# (breaks=5 splits the range of numvar into 5 equal-width intervals)

num2factor <- cut(numvar,breaks=5)
class(num2factor)
levels(num2factor)
table(num2factor)

##################
# Code Snippet-11 (continued)
##################

# creating a factor with given labels
num2factor <- cut(numvar,breaks=5,labels=c("lowest group","lower middle group", "middle group", "upper middle", "highest group"))

# displaying the data in tabular form
data.frame(table(num2factor))

# creating a factor variable using nested conditional statements
num2factor <- factor(ifelse(numvar<=-1.37,1,ifelse(numvar<=-0.389,2,ifelse(numvar<=0.592,3,ifelse(numvar<=1.57,4,5)))),labels=c("(-2.35,-1.37]", "(-1.37,-0.389]", "(-0.389,0.592]", "(0.592,1.57]", "(1.57,2.55]"))

# displaying the data using the table() function
table(num2factor)

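With breaks = 5, cut() derives the cut points from the range of the data. When fixed cut points are wanted instead, they can be passed as a vector; the breaks and labels below are illustrative choices, and -Inf/Inf make sure every value falls into some interval.

```r
set.seed(1234)
numvar <- rnorm(100)

# right-closed intervals: (-Inf,-1], (-1,0], (0,1], (1,Inf]
grp <- cut(numvar,
           breaks = c(-Inf, -1, 0, 1, Inf),
           labels = c("below -1", "-1 to 0", "0 to 1", "above 1"))
table(grp)
```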
##################
# Code Snippet-12
##################

# creating a date object using the built-in as.Date() function
as.Date("1970-01-01")

# looking at the internal value of the date object (days since 1970-01-01)
as.numeric(as.Date("1970-01-01"))

# 2 January 1970 is stored as 1: one day has elapsed since the origin
as.Date("1970-01-02")
as.numeric(as.Date("1970-01-02"))

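Because a Date is stored as a count of days since 1970-01-01, ordinary arithmetic works on it. A few illustrative operations on arbitrary dates:

```r
d1 <- as.Date("1970-01-01")
d2 <- as.Date("1970-03-01")

d2 - d1                                        # Time difference of 59 days
as.numeric(difftime(d2, d1, units = "days"))   # 59
d1 + 31                                        # "1970-02-01"
seq(d1, by = "month", length.out = 3)          # "1970-01-01" "1970-02-01" "1970-03-01"
```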
##################
# Code Snippet-13
##################

# creating a date object by specifying the date format
# (%b is the abbreviated month name, which depends on the current locale)
as.Date("Jan-01-1970",format="%b-%d-%Y")

##################
# Code Snippet-14
##################

# loading the lubridate package
library(lubridate)

# creating a date object using the mdy() function
mdy("Jan-01-1970")

##################
# Code Snippet-15
##################

# creating a vector of dates written in heterogeneous formats
hetero_date <- c("second chapter due on 2013, august, 24", "first chapter submitted on 2013, 08, 18", "2013 aug 23")

# parsing the character dates and converting them to valid dates
ymd(hetero_date)

##################
# Code Snippet-16
##################

# the third date is now written in day-month-year order,
# which does not follow the year-month-day pattern that ymd() expects
hetero_date <- c("second chapter due on 2013, august, 24", "first chapter submitted on 2013, 08, 18", "23 aug 2013")
ymd(hetero_date)

##################
# Code Snippet-17
##################

# Creating a date object using base R functionality
date <- as.POSIXct("23-07-2013",format = "%d-%m-%Y", tz = "UTC")
date

# extracting the month from the date object
as.numeric(format(date, "%m"))

# manipulating the month by replacing month 7 with 8
date <- as.POSIXct(format(date,"%Y-8-%d"), tz = "UTC")
date

# The same operation using the lubridate package
date <- dmy("23-07-2013")
date

month(date)

month(date) <- 8
date

##################
# Code Snippet-18
##################

# accessing the current system date and time
current_time <- now()
current_time

# changing the time zone to "GMT"
current_time_gmt <- with_tz(current_time,"GMT")
current_time_gmt

# rounding the date to the nearest day
round_date(current_time_gmt,"day")

# rounding the date to the nearest month
round_date(current_time_gmt,"month")

# rounding the date to the nearest year
round_date(current_time_gmt,"year")

##################
# Code Snippet-19
##################

# creating a 10-element vector
num10 <- c(3,2,5,3,9,6,7,9,2,3)
# accessing the 5th element
num10[5]

# checking, element by element, whether the values of num10 are greater than 6
num10>6

# keeping only the values greater than 6
num10[num10>6]

# a negative subscript removes the first element, "3"
num10[-1]

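Logical subscripts as in Snippet-19 can be turned into positions with which(), and several conditions can be combined with &. A small sketch on the same num10 vector:

```r
num10 <- c(3,2,5,3,9,6,7,9,2,3)

which(num10 > 6)                # positions: 5 7 8
num10[which(num10 > 6)]         # values: 9 7 9, same as num10[num10 > 6]
num10[num10 > 2 & num10 < 7]    # 3 5 3 6 3
num10[-c(1, 2)]                 # drops the first two elements
```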
##################
# Code Snippet-20
##################

# creating a data frame with 2 variables
data_2variable <- data.frame(x1=c(2,3,4,5,6),x2=c(5,6,7,8,1))

# accessing only the first row
data_2variable[1,]

# accessing only the first column
data_2variable[,1]

# accessing the first row and first column
data_2variable[1,1]

##################
# Code Snippet-21
##################

list_obj <- list(dat=data_2variable,vec.obj=c(1,2,3))
list_obj

# accessing the second element of the list_obj object
list_obj[[2]]
list_obj[[2]][1]

# accessing the dataset from the list object
list_obj$dat

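Snippet-21 uses double brackets. The difference from single brackets is that [ returns a sub-list, while [[ and $ return the element itself, which is why list_obj[[2]][1] works. A short sketch with the same list_obj:

```r
data_2variable <- data.frame(x1 = c(2,3,4,5,6), x2 = c(5,6,7,8,1))
list_obj <- list(dat = data_2variable, vec.obj = c(1, 2, 3))

class(list_obj[2])      # "list": a list holding one element
class(list_obj[[2]])    # "numeric": the element itself
list_obj[["vec.obj"]]   # the same element, selected by name
list_obj$dat$x2[3]      # chained access: third value of column x2, i.e. 7
```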


#4
ReneeBK posted on 2014-11-11 06:49:20

Chapter 3:

##################
# Code Snippet-1
##################

# Note that a negative 5 is used in the split step: it discards the fifth
# column of the iris data, which contains the "Species" information and is
# not needed for calculating the means.

iris.set <- iris[iris$Species=="setosa",-5]
iris.versi <- iris[iris$Species=="versicolor",-5]
iris.virg <- iris[iris$Species=="virginica",-5]

# calculating the mean of each piece (the apply step)
mean.set <- colMeans(iris.set)
mean.versi <- colMeans(iris.versi)
mean.virg <- colMeans(iris.virg)

# combining the output (the combine step)
mean.iris <- rbind(mean.set,mean.versi,mean.virg)

# adding row names so that the output is easy to understand
rownames(mean.iris) <- c("setosa","versicolor","virginica")

##################
# Code Snippet-2
##################

# split-apply-combine using a loop:
# each iteration represents the split step,
# the mean calculation within each iteration represents the apply step,
# and rbind() in each iteration represents the combine step

mean.iris.loop <- NULL
for(species in unique(iris$Species))
{
  iris_sub <- iris[iris$Species==species,]
  column_means <- colMeans(iris_sub[,-5])
  mean.iris.loop <- rbind(mean.iris.loop,column_means)
}

# adding row names so that the output is easy to understand
rownames(mean.iris.loop) <- unique(iris$Species)

##################
# Code Snippet-3
##################

mean.iris.loop <- NULL
for(species in unique(iris$Species))
{
  iris_sub <- iris[iris$Species==species,]
  column_means <- colMeans(iris_sub[,-5])
  mean.iris.loop <- rbind(mean.iris.loop,column_means)
}
rownames(mean.iris.loop) <- unique(iris$Species)

mean.iris.loop

# The same mean calculation, this time using the plyr package:
library(plyr)
ddply(iris,~Species,function(x) colMeans(x[,-which(colnames(x)=="Species")]))

# the loop result again, for comparison
mean.iris.loop

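The same split-apply-combine steps can also be written with base R functions only: split() for the split step, lapply() for the apply step and do.call(rbind, ...) for the combine step. A minimal sketch (the object names are illustrative):

```r
# split: one data frame per species, without the Species column
pieces <- split(iris[, -5], iris$Species)

# apply: column means within each piece
means <- lapply(pieces, colMeans)

# combine: one row per species; the list names become the row names
mean.iris.split <- do.call(rbind, means)
mean.iris.split
```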
##################
# Code Snippet-4
##################

# the class of the iris3 dataset is array
class(iris3)
# dimensions of the iris3 dataset
dim(iris3)

##################
# Code Snippet-5
##################

# Calculating the column means for each species; the output is a data frame
iris_mean <- adply(iris3,3,colMeans)

class(iris_mean)
iris_mean

##################
# Code Snippet-6
##################

# again we calculate the means, but this time the output is an array
iris_mean <- aaply(iris3,3,colMeans)
class(iris_mean)
iris_mean

# note that the class shows "matrix" (c("matrix", "array") from R 4.0.0 on),
# since the output is a two-dimensional array, i.e. a matrix

# Now calculating the means again with a list as output
iris_mean <- alply(iris3,3,colMeans)
class(iris_mean)
iris_mean

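For comparison, base R's apply() over the third margin of iris3 should give the same species-by-measurement means as aaply() above. apply() returns one column per species, so the sketch transposes the result to put species in rows:

```r
# iris3 is a 50 x 4 x 3 array: flowers x measurements x species
mean_by_species <- t(apply(iris3, 3, colMeans))
dim(mean_by_species)        # 3 4
mean_by_species
```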
##################
# Code Snippet-7
##################

# converting the 3-dimensional array to a 2-dimensional data frame
iris_dat <- adply(iris3, .margins=3)
class(iris_dat)
str(iris_dat)

##################
# Code Snippet-8
##################

# Function to calculate a summary of each column:
# minimum, mean, median, maximum and standard deviation
fivenum.summary <- function(x)
{
  results <- data.frame(min=apply(x,2,min),
                        mean=apply(x,2,mean),
                        median=apply(x,2,median),
                        max=apply(x,2,max),
                        sd=apply(x,2,sd))
  return(results)
}

# Calculating these summaries with a for loop in base R:
# initialize the output list object
all_stats <- list()

# the for loop runs once for each species
for(i in 1:dim(iris3)[3])
{
  sub_data <- iris3[,,i]
  all_stat_species <- fivenum.summary(sub_data)
  all_stats[[i]] <- all_stat_species
}

# class of the output object
class(all_stats)
all_stats

# Calculating the same statistics, this time using the alply() function from the plyr package:
all_stats <- alply(iris3,3,fivenum.summary)

class(all_stats)
all_stats

##################
# Code Snippet-9
##################

# defining the parameter set
parameter.dat <- data.frame(n=c(25,50,100,200,400),mean=c(0,2,3.5,2.5,0.1),sd=c(1,1.5,2,5,2))

# displaying the parameter set

parameter.dat

# generating random normal variates using base R
# set the seed to make the example reproducible
set.seed(12345)

# initialize an empty list object to store the generated variables
dat <- list()
for(i in 1:nrow(parameter.dat))
{
  dat[[i]] <- rnorm(n=parameter.dat[i,1],
                    mean=parameter.dat[i,2],sd=parameter.dat[i,3])
}

# estimating the mean from the newly generated data
estmean <- lapply(dat,mean)
estmean

# Performing the same task as above, this time using the plyr package;
# mlply() passes the columns n, mean and sd as arguments to rnorm()

dat_plyr <- mlply(parameter.dat,rnorm)
estmean_plyr <- llply(dat_plyr,mean)
estmean_plyr

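Base R's Map() can play the role of mlply() here: it calls rnorm() once per row, passing n, mean and sd by name. With the same seed and the same call order, the sketch below should reproduce the draws of the for loop in Snippet-9 (the names dat_map and estmean_map are illustrative).

```r
parameter.dat <- data.frame(n = c(25, 50, 100, 200, 400),
                            mean = c(0, 2, 3.5, 2.5, 0.1),
                            sd = c(1, 1.5, 2, 5, 2))

set.seed(12345)
# one rnorm(n = ..., mean = ..., sd = ...) call per row of parameter.dat
dat_map <- Map(rnorm, n = parameter.dat$n, mean = parameter.dat$mean,
               sd = parameter.dat$sd)

lengths(dat_map)                    # 25 50 100 200 400
estmean_map <- sapply(dat_map, mean)
estmean_map
```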


#5
ReneeBK posted on 2014-11-11 06:54:57

Chapter 4:

##################
# Code Snippet-1
##################

# Example of typical two-dimensional data

# A demo dataset "students" with a typical layout. It contains two
# students' exam scores in "math", "literature" and "language" in
# different term exams.
students <- data.frame(sid=c(1,1,2,2),
                       exmterm=c(1,2,1,2),
                       math=c(50,65,75,69),
                       literature=c(40,45,55,59),
                       language=c(70,80,75,78))
students

##################
# Code Snippet-2
##################

library(reshape)
# Example of molten data
molten_students <- melt.data.frame(students,id.vars=c("sid","exmterm"))

##################
# Code Snippet-3
##################

# Reshaping the dataset using the base R reshape() function
wide_students <- reshape(students,direction="wide",idvar="sid",timevar="exmterm")
wide_students

# Now reshaping back to long format; the "reshapeWide" attribute stored by
# the previous call supplies idvar, timevar and the varying columns
long_students <- reshape(wide_students,direction="long")
long_students

##################
# Code Snippet-4
##################

# original data
students

# Melting by specifying both id and measured variables
melt(students,id.vars=c("sid","exmterm"),measure.vars=c("math","literature","language"))

# Melting by specifying only the id variables
# (all remaining columns are treated as measured variables)
melt(students,id.vars=c("sid","exmterm"))

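To see what melt() does to students, the same long layout (id columns, a variable column and a value column, stacked one measured variable after another) can be built by hand in base R. This sketch only mirrors the layout of the result; it is not how the reshape package implements melt(), and the object names are illustrative.

```r
students <- data.frame(sid = c(1, 1, 2, 2), exmterm = c(1, 2, 1, 2),
                       math = c(50, 65, 75, 69),
                       literature = c(40, 45, 55, 59),
                       language = c(70, 80, 75, 78))
measure <- c("math", "literature", "language")

# id columns repeated once per measured variable; variable/value stacked
molten <- data.frame(
  sid      = rep(students$sid, times = length(measure)),
  exmterm  = rep(students$exmterm, times = length(measure)),
  variable = rep(measure, each = nrow(students)),
  value    = unlist(students[measure], use.names = FALSE),
  stringsAsFactors = FALSE
)
molten   # 12 rows: 4 for math, then 4 for literature, then 4 for language
```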
##################
# Code Snippet-5
##################

# Melting the students data
molten_students <- melt(students,id.vars=c("sid","exmterm"))
molten_students

# going back to the original data
cast(molten_students,sid+exmterm~variable)

# The same operation, using ... to stand for all remaining variables on the row side
cast(molten_students,...~variable)

# Rearranging the data so that each student (sid) gets a separate column
cast(molten_students,...~sid)

# Rearranging the data so that each exam term (exmterm) gets a separate column
cast(molten_students,...~exmterm)

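The cast() calls above only rearrange the scores. When a summary is wanted instead, for example each student's average per subject across both terms, base R's aggregate() with the formula interface is one option; a sketch on the same students data:

```r
students <- data.frame(sid = c(1, 1, 2, 2), exmterm = c(1, 2, 1, 2),
                       math = c(50, 65, 75, 69),
                       literature = c(40, 45, 55, 59),
                       language = c(70, 80, 75, 78))

# average score per student and subject across both terms
avg_by_student <- aggregate(cbind(math, literature, language) ~ sid,
                            data = students, FUN = mean)
avg_by_student
# sid 1: math 57.5, literature 42.5, language 75.0
# sid 2: math 72.0, literature 57.0, language 76.5
```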


#6
ReneeBK posted on 2014-11-11 06:55:40

Chapter 5:

##################
# Code Snippet-1
##################

# Trying to create a vector of zeros of length 2^31-1. Note that the computer
# used for this example has 8 GB of RAM, a Core i5 processor and 64-bit
# Windows 7 Professional. Such a vector of doubles needs about 16 GB, so the
# allocation is expected to fail with a "cannot allocate vector" error here.

x <- rep(0, 2^31-1)
2^31

# 2^31 is larger than the largest representable integer (2^31-1),
# so converting it to integer produces NA (with a warning)

as.integer(2^31)

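The NA in Snippet-1 comes from the integer type: R integers are 32-bit, so the largest one is .Machine$integer.max, i.e. 2^31 - 1. Since R 3.0.0, 64-bit builds also support "long vectors" with more elements than that, so on such builds vector length is limited mainly by available memory rather than by this bound. A quick check:

```r
.Machine$integer.max                  # 2147483647
.Machine$integer.max == 2^31 - 1      # TRUE
as.integer(2^31 - 1)                  # still representable as an integer
suppressWarnings(as.integer(2^31))    # NA: one past the integer range
```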
##################
# Code Snippet-2
##################

# loading the RODBC package
library(RODBC)

# creating a connection to the database through the ODBC data source
# ("xlopen") created earlier

xldb <- odbcConnect("xlopen")

# The only argument odbcConnect() requires is the name of the ODBC
# data source (DSN).

# With the connection created, we use it to import data

xldata <- sqlFetch(xldb, "CSVanscombe")

# Note that "CSVanscombe" is the Excel worksheet name.

odbcClose(xldb) # closing the database connection

##################
# Code Snippet-3
##################

# loading the RODBC package
library(RODBC)

# connecting to the database
access_con <- odbcConnect("accessdata")

# importing each table as a separate R data frame
coverage_page <- sqlFetch(access_con, "coverpage")
ques1 <- sqlFetch(access_con, "questionnaire1")
ques2 <- sqlFetch(access_con, "questionnaire2")

odbcClose(access_con) # closing the database connection

##################
# Code Snippet-4 (filehash package)
##################

library(filehash)
# creating a database and initializing a connection to it
dbCreate("exampledb")
filehash_db <- dbInit("exampledb")

# storing an object under the key "xx" and reading it back
dbInsert(filehash_db, "xx", rnorm(50))
value <- dbFetch(filehash_db, "xx")
summary(value)

dbInsert(filehash_db, "y", 4709)
dbDelete(filehash_db, "xx")
dbList(filehash_db)
dbExists(filehash_db, "xx")

# $ and [[ ]] can also be used to store and retrieve objects
filehash_db$x <- runif(100)
summary(filehash_db$x)
summary(filehash_db[["x"]])
filehash_db$y <- rnorm(100, 2)
dbList(filehash_db)

# To run the following lines, make sure the working directory is set properly:
# it should be the folder where the file "anscombe.txt" is stored

# dumpDF() stores the columns of the data frame in the database "massivedata";
# db2env() then exposes them through an environment
dumpDF(read.table("anscombe.txt", header=TRUE), dbName="massivedata")
massive_environment <- db2env(db="massivedata")

fit <- with(massive_environment, lm(Y1~X1))
with(massive_environment, summary(Y1))
with(massive_environment, Y1[1] <- 99)

##################
# Code Snippet-5 (ff package)
##################
library(ff)
# creating a file-backed ff vector of 10 doubles
file1 <- ff(filename="file1", length=10,vmode="double")
str(file1)

# loading the rivers data
data(rivers)
file1[1:10] <- rivers[1:10]

# Note that file1 is an ff object, whereas
# file1[...] returns a standard R vector
str(file1)

# Sampling can also be performed on ff objects if required;
# set the seed to reproduce the example
set.seed(1337)
sample(file1,5,replace=FALSE)

gc()

##################
# Code Snippet-6 (sqldf package)
##################

# Selecting the rows of the iris dataset where sepal width > 3
# and storing them in the subiris data frame.
# Note: older versions of sqldf translated the dots in column names to
# underscores (Sepal.Width became Sepal_Width), which these queries rely on;
# newer versions keep the dots, and the names then have to be quoted,
# e.g. "Sepal.Width".

library(sqldf)
subiris <- sqldf("select * from iris where Sepal_Width > 3")
head(subiris)
nrow(subiris)

subiris2 <- sqldf("select Sepal_Length,Petal_Length,Species from iris where Petal_Length > 1.4")
nrow(subiris2)

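For comparison, the two queries above can be written with base R's subset(), which uses the original dotted column names. This is a sketch; the row counts should match the sqldf() results, although the row names differ because subset() keeps the original row numbers.

```r
# Base R equivalents of the two sqldf() queries above
subiris_base <- subset(iris, Sepal.Width > 3)
subiris2_base <- subset(iris, Petal.Length > 1.4,
                        select = c(Sepal.Length, Petal.Length, Species))
nrow(subiris_base)
nrow(subiris2_base)
```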
# Before running the following line, make sure the working directory is set properly.
# Importing only Sepal width and Petal width along with the species information,
# for rows where Petal width is greater than 0.4
iriscsv <- read.csv.sql("iris.csv",sql="select Sepal_Width,Petal_Width,Species from file where Petal_Width > 0.4")
head(iriscsv)

# Do not use dots in column names within the SQL statement: SQL reads
# Sepal.Width as column "Width" of a table "Sepal", so the following line gives an error
try(read.csv.sql("iris.csv",sql="select Sepal.Width,Petal.Width,Species from file where Petal.Width > 0.4"))

# drawing a random sample of size 10 from the iris data stored in the iris.csv file
iris_sample <- read.csv.sql("iris.csv",sql="select * from file order by random(*) limit 10")
iris_sample

# Calculating group-wise means of the iris data
iris_avg <- sqldf("select Species, avg(Sepal_Length),avg(Sepal_Width),avg(Petal_Length),avg(Petal_Width) from iris group by Species")

colnames(iris_avg) <- c("Species","Sepal_L","Sepal_W","Petal_L","Petal_W")
iris_avg

# The base R counterpart to perform the same operation is
aggregate(iris[,-5],list(iris$Species),mean)

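aggregate() also has a formula interface, where . ~ Species means "every other column, grouped by Species"; unlike the data frame version above, it keeps the original column names. A sketch of the same group-wise means:

```r
iris_avg_base <- aggregate(. ~ Species, data = iris, FUN = mean)
iris_avg_base
# setosa row: Sepal.Length 5.006, Sepal.Width 3.428, Petal.Length 1.462, Petal.Width 0.246
```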


#7
zhangwenqian1 posted on 2014-11-11 07:31:03
Let me take a look.


#8
spss1010 posted on 2014-11-11 09:14:01
Where is the book?


#9
tracymicky posted on 2014-11-11 09:59:21
have a look


#10
lonestone posted on 2014-11-12 01:45:06
Quoting ReneeBK (posted on 2014-11-11 06:31):
https://www.packtpub.com/big-data-and-business-intelligence/data-manipulation-r
Data Manipulation w ...
good

