Original poster: ReneeBK

[Leisure / Other] [Exclusive release] Data Manipulation with R

#1 (original post)

ReneeBK posted on 2014-11-11 06:31:14

https://www.packtpub.com/big-data-and-business-intelligence/data-manipulation-r

Data Manipulation with R

January 2014

#2

ReneeBK posted on 2014-11-11 06:39:41

Chapter 1:

##################
# Code Snipped-1
##################
# Constant
2
"July"
NULL
NA
NaN
Inf
# An object can also be created from existing objects.
# To make the result reproducible, i.e. to get the same numbers every time
# the following code is run, we need to set a seed value
set.seed(123)
rnorm(9)+runif(9)
##################
# Code Snipped-2
##################
# Storing R object into a variable and then see the mode
num.obj <- seq(from=1,to=10,by=2)
mode(num.obj)
logical.obj<-c(TRUE,TRUE,FALSE,TRUE,FALSE)
mode(logical.obj)
character.obj <- c("a","b","c")
mode(character.obj)
##################
# Code Snipped-3
##################
# R object containing both numeric and logical element
xz <- c(1, 3, TRUE, 5, FALSE, 9)
xz
mode(xz)
# R object containing character, numeric and logical elements
xw <- c(1,2,TRUE,FALSE,"a")
xw
mode(xw)
##################
# Code Snipped-4
##################
num.obj <- seq(from=1,to=10,by=2)
logical.obj<-c(TRUE,TRUE,FALSE,TRUE,FALSE)
character.obj <- c("a","b","c")
is.numeric(num.obj)
is.logical(num.obj)
is.character(num.obj)
##################
# Code Snipped-5
##################
mode(mean)
# Also we can test whether "mean" is function or not as follows
is.function(mean)
##################
# Code Snipped-6
##################
num.obj <- seq(from=1,to=10,by=2)
logical.obj<-c(TRUE,TRUE,FALSE,TRUE,FALSE)
character.obj <- c("a","b","c")
class(num.obj)
class(logical.obj)
class(character.obj)
##################
# Code Snipped-7
##################
# Output omitted due to space limitation
num.obj <- seq(from=1,to=10,by=2)
set.seed(1234) # To make the matrix reproducible
mat.obj <- matrix(runif(9),ncol=3,nrow=3)
mode(num.obj)
mode(mat.obj)
class(num.obj)
class(mat.obj)
# prints a numeric object
print(num.obj)
#prints a matrix object
print(mat.obj)
##################
# Code Snipped-8
##################
character.obj <- c("a","b","c")
character.obj
is.factor(character.obj)
# Converting character object into factor object using as.factor()
factor.obj <- as.factor(character.obj)
factor.obj
is.factor(factor.obj)
mode(factor.obj)
class(factor.obj)
##################
# Code Snipped-9
##################
# creating vector of numeric element with "c" function
num.vec <- c(1,3,5,7)
num.vec
mode(num.vec)
class(num.vec)
is.vector(num.vec)
##################
# Code Snipped-10
##################
# Vector with mixed elements
num.char.vec <- c(1,3,"five",7)
num.char.vec
mode(num.char.vec)
class(num.char.vec)
is.vector(num.char.vec)
##################
# Code Snipped-11
##################
# combining multiple vectors
comb.vec <- c(num.vec,num.char.vec)
mode(comb.vec)
# creating named vector
named.num.vec <- c(x1=1,x2=3,x3=5)
named.num.vec
##################
# Code Snipped-12
##################
# vector of single element
unit.vec <- 9
is.vector(unit.vec)
##################
# Code Snipped-13
##################
# creating a vector of numbers
# and then convert it to logical and character
numbers.vec <- c(-3,-2,-1,0,1,2,3)
numbers.vec
num2char <- as.character(numbers.vec)
num2char
num2logical <- as.logical(numbers.vec)
num2logical
# creating character vector
#and then convert it to numeric and logical
char.vec <- c("1","3","five","7")
char.vec
char2num <- as.numeric(char.vec)
char2num
char2logical <- as.logical(char.vec)
char2logical
# logical to character conversion
logical.vec <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
logical.vec
logical2char <- as.character(logical.vec)
logical2char
##################
# Code Snipped-14
##################
# creating a vector and accessing elements
vector1 <- c(1,3,5,7,9)
vector1
# accessing second elements of "vector1"
vector1[2]
# accessing three elements starting from second element
vector1[2:4]
# another way of creating vector. Here "from" is the starting point of the vector and "to" is the
# end point of the vector and "by" is increment
vector2 <- seq(from=2, to=10, by=2)
is.vector(vector2)
##################
# Code Snipped-15
##################
#creating factor variable with only one argument with factor()
factor1 <- factor(c(1,2,3,4,5,6,7,8,9))
factor1
levels(factor1)
labels(factor1)
# creating a factor with user-supplied labels to display as its levels
factor2 <- factor(c(1,2,3,4,5,6,7,8,9),labels=letters[1:9])
factor2
levels(factor2)
labels(factor2)
##################
# Code Snipped-16
##################
# creating a numeric factor and trying to find its mean;
# mean() on a factor returns NA with a warning
num.factor <- factor(c(5,7,9,5,6,7,3,5,3,9,7))
num.factor
mean(num.factor)
##################
# Code Snipped-17
##################
num.factor <- factor(c(5,7,9,5,6,7,3,5,3,9,7))
num.factor
#as.numeric() function only returns internal values of the factor
as.numeric(num.factor)
# now see the levels of the factor
levels(num.factor)
as.character(num.factor)
# to convert "num.factor" back to its numeric values there are two methods
# method-1:
mean(as.numeric(as.character(num.factor)))
# method-2:
mean(as.numeric(levels(num.factor)[num.factor]))
##################
# Code Snipped-18
##################
#creating vector of different variables and then create data frame
var1 <- c(101,102,103,104,105)
var2 <- c(25,22,29,34,33)
var3 <- c("Non-Diabetic", "Diabetic", "Non-Diabetic", "Non-Diabetic", "Diabetic")
var4 <- factor(c("male","male","female","female","male"))
# now we will create data frame using two numeric vector one
# character vector and one factor
diab.dat <- data.frame(var1,var2,var3,var4)
diab.dat
##################
# Code Snipped-19
##################
#class of each column before creating data frame
class(var1)
class(var2)
class(var3)
class(var4)
##################
# Code Snipped-20
##################
# class of each column after creating data frame
class(diab.dat$var1)
class(diab.dat$var2)
class(diab.dat$var3)
class(diab.dat$var4)
# now create the data frame specifying stringsAsFactors=FALSE
# (this is the default from R 4.0.0 on, where var3 above is already character)
diab.dat.2 <- data.frame(var1,var2,var3,var4,stringsAsFactors=FALSE)
diab.dat.2
class(diab.dat.2$var3)
##################
# Code Snipped-21
##################
# data frame to matrix conversion
mat.diab <- as.matrix(diab.dat)
mat.diab
class(mat.diab)
mode(mat.diab)
# matrix multiplication is not possible with this newly created
# character matrix; try() lets the script continue past the error
try(t(mat.diab) %*% mat.diab)
# creating a matrix with numeric elements only
# To produce the same matrix over time we set a seed value
set.seed(12345)
num.mat <- matrix(rnorm(9),nrow=3,ncol=3)
num.mat
class(num.mat)
mode(num.mat)
# matrix multiplication
t(num.mat) %*% num.mat
##################
# Code Snipped-22
##################
mat.array=array(dim=c(2,2,3))
# To produce the same results over time we set a seed value
set.seed(12345)
mat.array[,,1]<-rnorm(4)
mat.array[,,2]<-rnorm(4)
mat.array[,,3]<-rnorm(4)
mat.array
##################
# Code Snipped-23
##################
var1 <- c(101,102,103,104,105)
var2 <- c(25,22,29,34,33)
var3 <- c("Non-Diabetic", "Diabetic", "Non-Diabetic", "Non-Diabetic", "Diabetic")
var4 <- factor(c("male","male","female","female","male"))
diab.dat <- data.frame(var1,var2,var3,var4)
mat.array=array(dim=c(2,2,3))
set.seed(12345)
mat.array[,,1]<-rnorm(4)
mat.array[,,2]<-rnorm(4)
mat.array[,,3]<-rnorm(4)
# creating list
obj.list <- list(elem1=var1,elem2=var2,elem3=var3,elem4=var4,elem5=diab.dat,elem6=mat.array)
obj.list
##################
# Code Snipped-24
##################
missing_dat <- data.frame(v1=c(1,NA,0,1),v2=c("M","F",NA,"M"))
missing_dat
is.na(missing_dat$v1)
is.na(missing_dat$v2)
any(is.na(missing_dat))
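
Snippet 24 only detects missing values. As a minimal sketch in base R (the object names `na_per_column`, `complete_rows`, `clean_dat` and `mean_v1` are illustrative and do not come from the book), the same `missing_dat` can be summarised per column and reduced to its complete rows:

```r
# Same toy data as in Snippet 24
missing_dat <- data.frame(v1 = c(1, NA, 0, 1), v2 = c("M", "F", NA, "M"))

# Number of missing values in each column
na_per_column <- colSums(is.na(missing_dat))

# TRUE for rows that contain no NA at all
complete_rows <- complete.cases(missing_dat)

# Keep only the complete rows (na.omit(missing_dat) keeps the same rows)
clean_dat <- missing_dat[complete_rows, ]

# mean() propagates NA unless na.rm = TRUE is given
mean_v1 <- mean(missing_dat$v1, na.rm = TRUE)
```

Whether dropping incomplete rows is appropriate depends on the analysis; this only shows the mechanics.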

#3

ReneeBK posted on 2014-11-11 06:44:23

Chapter 2:

##################
# Code Snipped-1
##################
# Before running the following command we need to set the data
# location using setwd(). For example setwd("d:/chap2").
anscombe <- read.csv("CSVanscombe.csv",skip=2)
##################
# Code Snipped-2
##################
# import a csv file that contains both numeric and character variables,
# first using the defaults and then using stringsAsFactors=FALSE
iris_a <- read.csv("iris.csv")
str(iris_a)
##################
# Code Snipped-3
##################
# Now using stringsAsFactors=FALSE
iris_b <- read.csv("iris.csv",stringsAsFactors=FALSE)
str(iris_b)
##################
# Code Snipped-4
##################
iris_semicolon <- read.csv("iris_semicolon.csv",stringsAsFactors=FALSE,sep=";")
str(iris_semicolon)
##################
# Code Snipped-5
##################
anscombe_tab <- read.csv("anscombe.txt",sep="\t")
anscombe_tab_2 <- read.table("anscombe.txt",header=TRUE)
##################
# Code Snipped-6
##################
# Calling xlsx library
library(xlsx)
# importing xlsxanscombe.xlsx
anscombe_xlsx <- read.xlsx2("xlsxanscombe.xlsx",sheetIndex=1)
##################
# Code Snipped-7
##################
# loading robjects.RData file
load("robjects.RData")
# to see whether the objects are imported correctly
objects()
##################
# Code Snipped-8
##################
library(foreign)
iris_stata <- read.dta("iris_stata.dta")
##################
# Code Snipped-9
##################
# creating an R objects whose value is "datamanipulation"
char.obj <- "datamanipulation"
# creating a factor variable by extracting each single letter from the
# character string. To extract each single letter substring() function
# has been used. Note: nchar() function give number of character count
# in an character type R object
factor.obj <- factor(substring(char.obj,1:nchar(char.obj),1:nchar(char.obj)),levels=letters)
# Displaying levels of the factor variable
levels(factor.obj)
# Displaying the data using table() function
table(factor.obj)
##################
# Code Snipped-10
##################
# re-creating factor variable from existing factor variable
factor.obj1 <- factor(factor.obj)
# Displaying levels of the new factor variable
levels(factor.obj1)
# displaying data using table() function
table(factor.obj1)
factor.obj1
##################
# Code Snipped-11
##################
# creating a numeric variable by taking 100 random numbers
# from normal distribution
set.seed(1234) # setting seed to reproduce the example
numvar <- rnorm(100)
# creating factor variable with 5 distinct category
num2factor <- cut(numvar,breaks=5)
class(num2factor)
levels(num2factor)
table(num2factor)
##################
# Code Snipped-11
##################
# creating factor with given labels
num2factor <- cut(numvar,breaks=5,labels=c("lowest group","lower middle group", "middle group", "upper middle", "highest group"))
# displaying the data in tabular form
data.frame(table(num2factor))
# creating factor variable using conditional statement
num2factor <- factor(ifelse(numvar<=-1.37,1,
ifelse(numvar<=-0.389,2,
ifelse(numvar<=0.592,3,
ifelse(numvar<=1.57,4,5)))),
labels=c("(-2.35,-1.37]", "(-1.37,-0.389]", "(-0.389,0.592]", "(0.592,1.57]", "(1.57,2.55]"))
# displaying data using table function
table(num2factor)
##################
# Code Snipped-12
##################
# creating date object using built in as.Date() function
as.Date("1970-01-01")
# looking at the internal value of date object
as.numeric(as.Date("1970-01-01"))
# For the second of January 1970 the number of elapsed days is 1.
as.Date("1970-01-02")
as.numeric(as.Date("1970-01-02"))
##################
# Code Snipped-13
##################
# creating date object specifying format of date
as.Date("Jan-01-1970",format="%b-%d-%Y")
##################
# Code Snipped-14
##################
# loading lubridate package
library(lubridate)
# creating date object using mdy() function
mdy("Jan-01-1970")
##################
# Code Snipped-15
##################
# creating heterogeneous date object
hetero_date <- c("second chapter due on 2013, august, 24", "first chapter submitted on 2013, 08, 18", "2013 aug 23")
# parsing the character date object and convert to valid date
ymd(hetero_date)
##################
# Code Snipped-16
##################
hetero_date <- c("second chapter due on 2013, august, 24", "first chapter submitted on 2013, 08, 18", "23 aug 2013")
ymd(hetero_date)
##################
# Code Snipped-17
##################
# Creating date object using base R functionality
date <- as.POSIXct("23-07-2013",format = "%d-%m-%Y", tz = "UTC")
date
# extracting month from the date object
as.numeric(format(date, "%m"))
# manipulating month by replacing month 7 to 8
date <- as.POSIXct(format(date,"%Y-8-%d"), tz = "UTC")
date
# The same operation is done using lubridate package
date <- dmy("23-07-2013")
date
month(date)
month(date) <- 8
date
##################
# Code Snipped-18
##################
# accessing system date and time
current_time <- now()
current_time
# changing time zone to "GMT"
current_time_gmt <- with_tz(current_time,"GMT")
current_time_gmt
# rounding the date to nearest day
round_date(current_time_gmt,"day")
# rounding the date to nearest month
round_date(current_time_gmt,"month")
# rounding date to nearest year
round_date(current_time_gmt,"year")
##################
# Code Snipped-19
##################
# creating a 10 element vector
num10 <- c(3,2,5,3,9,6,7,9,2,3)
# accessing 5th element
num10[5]
# logical vector flagging which values of num10 are greater than 6
num10>6
# keeping only values greater than 6
num10[num10>6]
# use of negative subscript removes first element "3"
num10[-1]
##################
# Code Snipped-20
##################
# creating a data frame with 2 variables
data_2variable <- data.frame(x1=c(2,3,4,5,6),x2=c(5,6,7,8,1))
# accessing only first row
data_2variable[1,]
# accessing only first column
data_2variable[,1]
# accessing first row and first column
data_2variable[1,1]
##################
# Code Snipped-21
##################
list_obj<- list(dat=data_2variable,vec.obj=c(1,2,3))
list_obj
# accessing second element of the list_obj objects
list_obj[[2]]
list_obj[[2]][1]
# accessing dataset from the list object
list_obj$dat
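
The difference between single and double brackets in Snippet 21 is easy to miss. A small base R sketch (the names `sub_list`, `element`, `x2_col` and `big_rows` are made up for illustration) shows what each form returns:

```r
# Same objects as in Snippets 20 and 21
data_2variable <- data.frame(x1 = c(2, 3, 4, 5, 6), x2 = c(5, 6, 7, 8, 1))
list_obj <- list(dat = data_2variable, vec.obj = c(1, 2, 3))

# Single brackets keep the container: the result is still a list of length 1
sub_list <- list_obj[2]

# Double brackets extract the element itself: here a numeric vector
element <- list_obj[[2]]

# A data frame column can be extracted by name in the same way
x2_col <- list_obj$dat[["x2"]]

# Rows can be selected with a logical condition, as with vectors in Snippet 19
big_rows <- list_obj$dat[list_obj$dat$x1 > 4, ]
```

`class(sub_list)` and `class(element)` can be printed to confirm the two results differ.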

#4

ReneeBK posted on 2014-11-11 06:49:20

Chapter 3:

##################
# Code Snipped-1
##################
# notice that during split step a negative 5 is used within the code,
# this negative 5 has been used to discard fifth column of the iris data
# that contains "species" information and we do not need that column to calculate mean.
iris.set <- iris[iris$Species=="setosa",-5]
iris.versi <- iris[iris$Species=="versicolor",-5]
iris.virg <- iris[iris$Species=="virginica",-5]
# calculating mean for each piece ( The apply step)
mean.set <- colMeans(iris.set)
mean.versi <- colMeans(iris.versi)
mean.virg <- colMeans(iris.virg)
# combining the output (The combine step)
mean.iris <- rbind(mean.set,mean.versi,mean.virg)
# giving row names so that the output could be easily understood
rownames(mean.iris) <- c("setosa","versicolor","virginica")
##################
# Code Snipped-2
##################
# split-apply-combine using loop
# each iteration represents split
# mean calculation within each iteration represents apply step
# rbind command in each iteration represents combine step
mean.iris.loop <- NULL
for(species in unique(iris$Species))
{
iris_sub <- iris[iris$Species==species,]
column_means <- colMeans(iris_sub[,-5])
mean.iris.loop <- rbind(mean.iris.loop,column_means)
}
# giving row names so that the output could be easily understood
rownames(mean.iris.loop) <- unique(iris$Species)
##################
# Code Snipped-3
##################
mean.iris.loop <- NULL
for(species in unique(iris$Species))
{
iris_sub <- iris[iris$Species==species,]
column_means <- colMeans(iris_sub[,-5])
mean.iris.loop <- rbind(mean.iris.loop,column_means)
}
rownames(mean.iris.loop) <- unique(iris$Species)
mean.iris.loop
#The same mean calculation, but this time using the plyr package:
library(plyr)
ddply(iris,~Species,function(x) colMeans(x[,-which(colnames(x)=="Species")]))
mean.iris.loop
##################
# Code Snipped-4
##################
# class of iris3 dataset is array
class(iris3)
# dimension of iris3 dataset
dim(iris3)
##################
# Code Snipped-5
##################
# Calculate column mean for each species and output will be data frame
iris_mean <- adply(iris3,3,colMeans)
class(iris_mean)
iris_mean
##################
# Code Snipped-6
##################
# again we will calculate the mean but this time output will be an array
iris_mean <- aaply(iris3,3,colMeans)
class(iris_mean)
iris_mean
# note that here the class is showing "matrix" (c("matrix","array") from R 4.0.0 on),
# since the output is a two dimensional array which represents a matrix
# Now calculate mean again with output as list
iris_mean <- alply(iris3,3,colMeans)
class(iris_mean)
iris_mean
##################
# Code Snipped-7
##################
# converting 3 dimensional array to a 2 dimensional data frame
iris_dat <- adply(iris3, .margins=3)
class(iris_dat)
str(iris_dat)
##################
# Code Snipped-8
##################
# Function to calculate five number summary
fivenum.summary <- function(x)
{
results <-data.frame(min=apply(x,2,min),
mean=apply(x,2,mean),
median=apply(x,2,median),
max=apply(x,2,max),
sd=apply(x,2,sd))
return(results)
}
# Calculating the five summary statistics with a for loop in base R:
# initialize the output list object
all_stats <- list()
# the for loop will run for each species
for(i in 1:dim(iris3)[3])
{
sub_data <- iris3[,,i]
all_stat_species <- fivenum.summary(sub_data)
all_stats[[i]] <- all_stat_species
}
# class of the output object
class(all_stats)
all_stats
# Let's calculate the same statistics, but this time using the alply() function from the plyr package:
all_stats <- alply(iris3,3,fivenum.summary)
class(all_stats)
all_stats
##################
# Code Snipped-9
##################
# define parameter set
parameter.dat <- data.frame(n=c(25,50,100,200,400),mean=c(0,2,3.5,2.5,0.1),sd=c(1,1.5,2,5,2))
# displaying parameter set
parameter.dat
# random normal variate generate using base R
# set seed to make the example reproducible
set.seed(12345)
# initialize blank list object to store the generated variable
dat <- list()
for(i in 1:nrow(parameter.dat))
{
dat[[i]] <- rnorm(n=parameter.dat[i,1],
mean=parameter.dat[i,2],sd=parameter.dat[i,3])
}
# estimating mean from the newly generated data
estmean <- lapply(dat,mean)
estmean
# Performing same task as above but this time use plyr package
dat_plyr <- mlply(parameter.dat,rnorm)
estmean_plyr <- llply(dat_plyr,mean)
estmean_plyr
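
For comparison with `mlply()` in Snippet 9, here is a minimal base R sketch of the same idea using `Map()`, which calls `rnorm()` once per row of the parameter set. The names `dat_map` and `estmean_map` are illustrative only and are not from the book:

```r
# Same parameter set as in Snippet 9
parameter.dat <- data.frame(n = c(25, 50, 100, 200, 400),
                            mean = c(0, 2, 3.5, 2.5, 0.1),
                            sd = c(1, 1.5, 2, 5, 2))

# set seed to make the example reproducible
set.seed(12345)

# Map() walks the three columns in parallel and passes them to rnorm()
# as its first three arguments (n, mean, sd); the result is a list
dat_map <- Map(rnorm, parameter.dat$n, parameter.dat$mean, parameter.dat$sd)

# vapply() returns a plain numeric vector of estimated means
estmean_map <- vapply(dat_map, mean, numeric(1))
estmean_map
```

`Map()` always returns a list, so it plays the role of the `m*l` combination in plyr naming; the `vapply()` step is where the list is collapsed to a vector.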

#5

ReneeBK posted on 2014-11-11 06:54:57

Chapter 4:

##################
# Code Snipped-1
##################
# Example of typical two dimensional data
# A demo dataset "students" with typical layout. This data contains
# two students' exam score of "math", "literature" and "language" in
# different term exam.
students <- data.frame(sid=c(1,1,2,2),
exmterm=c(1,2,1,2),
math=c(50,65,75,69),
literature=c(40,45,55,59),
language=c(70,80,75,78))
students
##################
# Code Snipped-2
##################
library(reshape)
# Example of molten data
molten_students <- melt.data.frame(students,id.vars=c("sid","exmterm"))
##################
# Code Snipped-3
##################
# Reshaping dataset using reshape function
wide_students <- reshape(students,direction="wide",idvar="sid",timevar="exmterm")
wide_students
# Now again reshape to long format
long_students <- reshape(wide_students,direction="long",idvar="id")
long_students
##################
# Code Snipped-4
##################
# original data
students
# Melting by specifying both id and measured variables
melt(students,id.vars=c("sid","exmterm"), measure.vars=c("math","literature","language"))
# Melting by specifying only id variables
melt(students,id=c("sid","exmterm"))
##################
# Code Snipped-5
##################
# Melting students data
molten_students <- melt(students,id.vars=c("sid","exmterm"))
molten_students
# return back to original data
cast(molten_students,sid+exmterm~variable)
# Now the same operation but specifying only the column variable;
# "..." stands for all remaining variables
cast(molten_students,...~variable)
# We now rearrange the data where sid is now separate column for each student
cast(molten_students,...~sid)
# Again rearranging the data where exmterm is now separate column for each term
cast(molten_students,...~exmterm)
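
To make the melt and cast steps less of a black box, the following is a rough base R sketch of what they do to the `students` data, without the reshape package. It is not how reshape is implemented; the names `measure`, `molten_base` and `avg_scores` are illustrative assumptions:

```r
# Same demo data as in Snippet 1
students <- data.frame(sid = c(1, 1, 2, 2),
                       exmterm = c(1, 2, 1, 2),
                       math = c(50, 65, 75, 69),
                       literature = c(40, 45, 55, 59),
                       language = c(70, 80, 75, 78))
measure <- c("math", "literature", "language")

# "Melt": repeat the id columns once per measured variable and
# stack the measured columns into a single value column
molten_base <- data.frame(
  students[rep(seq_len(nrow(students)), times = length(measure)),
           c("sid", "exmterm")],
  variable = rep(measure, each = nrow(students)),
  value = unlist(students[measure], use.names = FALSE),
  row.names = NULL)

# "Cast" with aggregation: mean score per student and subject,
# averaged over the two exam terms
avg_scores <- tapply(molten_base$value,
                     list(molten_base$sid, molten_base$variable),
                     mean)
avg_scores
```

The molten form has one row per student, term and subject (12 rows here), which is what makes the later aggregation a one-liner.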

#6

ReneeBK posted on 2014-11-11 06:55:40

Chapter 5:

##################
# Code Snipped-1
##################
# Trying to create a vector of zeros with length 2^31-1. Note that the
# computer used for this example has 8GB of RAM, a Core i5 processor and
# 64-bit Windows 7 Professional edition.
x <- rep(0, 2^31-1)
2^31
# 2^31 exceeds the largest value an R integer can hold
# (.Machine$integer.max is 2^31-1), so the coercion below produces NA with a warning
as.integer(2^31)
##################
# Code Snipped-2
##################
# calling ODBC library into R
library(RODBC)
# creating a connection to the database using the RODBC package and the
# ODBC data source name (DSN) we created earlier.
xldb<- odbcConnect("xlopen")
# In the odbcConnect() function the minimum argument required
# is the name of the ODBC data source.
# Now the connection created, using that connection we will import data
xldata<- sqlFetch(xldb, "CSVanscombe")
# Note here that "CSVanscombe"is the Excel worksheet name.
odbcClose(xldb) # closing the database connection
##################
# Code Snipped-3
##################
# calling odbc library
library(RODBC)
# connecting with database
access_con<- odbcConnect("accessdata")
# import separate table as separate R data frame
coverage_page<- sqlFetch(access_con, "coverpage")
ques1 <- sqlFetch(access_con, "questionnaire1")
ques2 <- sqlFetch(access_con, "questionnaire2")
odbcClose(access_con) # closing the database connection
##################
# Code Snipped-4 for filehash package
##################
library(filehash)
dbCreate("exampledb")
filehash_db<- dbInit("exampledb")
dbInsert(filehash_db, "xx", rnorm(50))
value<- dbFetch(filehash_db, "xx")
summary(value)
dbInsert(filehash_db, "y", 4709)
dbDelete(filehash_db, "xx")
dbList(filehash_db)
dbExists(filehash_db, "xx")
filehash_db$x<- runif(100)
summary(filehash_db$x)
summary(filehash_db[["x"]])
filehash_db$y<- rnorm(100, 2)
dbList(filehash_db)
# To run the following line make sure the working directory is set properly.
# The working directory should be the folder where the file "anscombe.txt" is stored
dumpDF(read.table("anscombe.txt", header=TRUE), dbName="massivedata")
massive_environment<- db2env(db="massivedata")
fit<- with(massive_environment, lm(Y1~X1))
with(massive_environment, summary(Y1))
with(massive_environment, Y1[1] <- 99)
##################
# Code Snipped-5 for ff package
##################
library(ff)
file1 <- ff(filename="file1", length=10,vmode="double")
str(file1)
# calling rivers data
data(rivers)
file1[1:10] <- rivers[1:10]
# Note that here file1 is an ff object whereas
# file1[...] returns default R vector
str(file1)
# We can perform sampling if required on the ff objects:
# set seed to reproduce the example
set.seed(1337)
sample(file1,5,replace=FALSE)
gc()
##################
# Code Snipped-6 for sqldf package
##################
# Selecting the rows from iris dataset where sepal width > 3
# and storing them in the subiris data frame
library(sqldf)
subiris<- sqldf("select * from iris where Sepal_Width> 3")
head(subiris)
nrow(subiris)
subiris2<- sqldf("select Sepal_Length,Petal_Length,Species from iris where Petal_Length> 1.4")
nrow(subiris2)
# Before running the following line, make sure the working directory is set properly
# import only Sepal width and Petal width along with species information where Petal width is greater than 0.4
iriscsv<-read.csv.sql("iris.csv",sql="select Sepal_Width,Petal_Width,Species from file where Petal_Width>0.4")
head(iriscsv)
# do not use underscore as within variable name it will give error, here is the example
iriscsv<-read.csv.sql("iris.csv",sql="select Sepal.Width,Petal.Width,Species from file where Petal.Width>0.4")
# we can draw a random sample of size 10 from iris data that are stored in iris.csv file.
iris_sample<- read.csv.sql("iris.csv",sql="select * from file order by random(*) limit 10")
iris_sample
# Calculate group wise mean from iris data
iris_avg<-sqldf("select Species, avg(Sepal_Length),avg(Sepal_Width),avg(Petal_Length),avg(Petal_Width) from iris group by Species")
colnames(iris_avg) <- c("Species","Sepal_L","Sepal_W","Petal_L","Petal_W")
iris_avg
# The base R counterpart to perform same operation is
aggregate(iris[,-5],list(iris$Species),mean)
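
In the same spirit as the `aggregate()` call above, the two `sqldf()` selections on the in-memory `iris` data also have base R counterparts. This is a minimal sketch; the `*_base` object names are illustrative, and it uses the original dotted column names of `iris` rather than the underscore names used in the SQL strings:

```r
# Counterpart of: select * from iris where Sepal_Width > 3
subiris_base <- subset(iris, Sepal.Width > 3)

# Counterpart of: select Sepal_Length, Petal_Length, Species
#                 from iris where Petal_Length > 1.4
subiris2_base <- subset(iris, Petal.Length > 1.4,
                        select = c(Sepal.Length, Petal.Length, Species))

# Group-wise means using the formula interface of aggregate();
# "." stands for all columns other than Species
iris_avg_base <- aggregate(. ~ Species, data = iris, FUN = mean)

nrow(subiris_base)
nrow(subiris2_base)
iris_avg_base
```

The base R versions work on data already loaded into memory, so they do not replace `read.csv.sql()`, whose point is to filter a file before it is fully imported.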

#7

zhangwenqian1 posted on 2014-11-11 07:31:03

Taking a look

#8

spss1010 posted on 2014-11-11 09:14:01

Where is the book?

#9

tracymicky posted on 2014-11-11 09:59:21

have a look

#10

Posted on 2014-11-12 01:45:06 from mobile

ReneeBK posted on 2014-11-11 06:31
https://www.packtpub.com/big-data-and-business-intelligence/data-manipulation-r
Data Manipulation w ...

good
