Original poster: 林随机漫步


[Study notes] R Quick Start II: A Summary of SQL-like Data Operations

林随机漫步, posted 2019-9-18 22:02:41

Vector: c()

Matrix: matrix()

Data frame: data.frame()

Time series (xts): xts()

Factor: factor() (supplementary)

1.1 Vector: c()
> x <- c(1:10)
> x
 [1]  1  2  3  4  5  6  7  8  9 10
1.2 Matrix: matrix()
# Usage: matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
# With these defaults, the call produces a 1x1 matrix whose only element is NA
#--- Example ---#
> matrix(c(1,2,3, 11,12,13), nrow = 2, ncol = 3, byrow = TRUE, dimnames = list(c("row1", "row2"), c("C.1", "C.2", "C.3")))

     C.1 C.2 C.3
row1   1   2   3
row2  11  12  13

# nrow = 2 and ncol = 3 define a matrix with 2 rows and 3 columns
# byrow = TRUE fills the matrix with c(1,2,3, 11,12,13) row by row; the default is to fill column by column
# dimnames = list(c("row1", "row2"), c("C.1", "C.2", "C.3")) sets the row names and column names
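
To make the effect of byrow concrete, the following minimal sketch builds the same six values both ways and compares the first row. The object names vals, m_row and m_col are chosen here for illustration only.

```r
vals <- c(1, 2, 3, 11, 12, 13)

# byrow = TRUE: values fill the first row, then the second row
m_row <- matrix(vals, nrow = 2, ncol = 3, byrow = TRUE)

# Default (byrow = FALSE): values fill the first column, then the next one
m_col <- matrix(vals, nrow = 2, ncol = 3)

print(m_row)
print(m_col)

# First row under each filling order
m_row[1, ]  # 1 2 3
m_col[1, ]  # 1 3 12
```

The same data therefore ends up in different cells depending on byrow, which is worth checking whenever a matrix is built from a flat vector.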

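1.3 Data frame: data.frame()
# "<-" is the assignment operator; here the vector c(11:15) is assigned to the object x
> x <- c(11:15)
> y <- c(1:5)
# Store the vectors x and y in a data frame, naming the columns xf and yf

> data.frame(xf = x, yf = y)

  xf yf
1 11  1
2 12  2
3 13  3
4 14  4
5 15  5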

1.4 Time series (xts): xts()

> library(xts)
> x <- c(11:15)
> xts(x, order.by = as.Date('2019-09-14') + 1:5)
           [,1]
2019-09-15   11
2019-09-16   12
2019-09-17   13
2019-09-18   14
2019-09-19   15

For a detailed introduction to the xts type, see the article "可扩展的时间序列xts" (Extensible Time Series xts): http://blog.fens.me/r-xts/

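1.5 Factor: factor() (supplementary)
A factor can be thought of as a categorical variable.
# A factor is stored internally as integer codes plus a set of levels; it is not treated as a numeric vector (is.numeric() returns FALSE)
# factor(x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x), nmax = NA)
# Note: levels and labels correspond by position: the i-th entry of levels is displayed with the i-th entry of labels

# Example 1
> x <- c("Man", "Male", "Man","Lady","Female")

> xf <- factor(x, levels = c("Male","Man" ,"Lady","Female"),labels = c("Male","Male","Female","Female"))
> xf

[1] Male   Male   Male   Female Female
Levels: Male Female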
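
The positional mapping between levels and labels can be inspected directly. The sketch below rebuilds the factor from Example 1 and looks at its levels and underlying codes. It assumes R 3.4.0 or later, where duplicated labels are allowed and merge several levels into one.

```r
x <- c("Man", "Male", "Man", "Lady", "Female")
xf <- factor(x,
             levels = c("Male", "Man", "Lady", "Female"),
             labels = c("Male", "Male", "Female", "Female"))

levels(xf)       # the two merged labels
as.integer(xf)   # underlying integer codes, one per element
is.numeric(xf)   # FALSE: a factor is not treated as a numeric vector
table(xf)        # number of observations per level
```

Because both "Male" and "Man" map to the label "Male", the four input levels collapse into two output levels.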

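In R, summary() and str() are the functions most commonly used to get an overview of a data set.
summary() leans toward descriptive statistics: it returns the minimum, maximum, quartiles, mean and median of each numeric column. str() leans toward structure: it returns the dimensions of the data set, the type of each column, and so on. Used together, summary() and str() give a first picture of the data.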

> data(iris)
> head(iris,10)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa

> summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500

> str(iris)

'data.frame':  150 obs. of  5 variables:
$ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species    : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
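
When only part of this overview is needed, the same information can be pulled out piece by piece. The following is a small sketch using base R functions on the built-in iris data; the name col_types is introduced here for illustration.

```r
data(iris)

dim(iris)                          # number of rows and columns

col_types <- sapply(iris, class)   # class of each column
col_types

summary(iris$Sepal.Length)         # descriptive statistics for one numeric column
table(iris$Species)                # counts for the factor column
```

This is handy in scripts, where the values can be stored and checked rather than read off the console.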

3. Modifying / replacing / redefining data

Modify specific cells, modify a specific column, or make several related changes with within().

leadership$age[leadership$age==99] <- NA

leadership$agecat2 <- NA
leadership <- within(leadership, {
  agecat2[age > 75] <- "Elder"
  agecat2[age >= 55 & age <= 75] <- "Middle Aged"
  agecat2[age < 55] <- "Young"
})
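
The post does not show how leadership is defined, so the snippet above cannot be run as it stands. The sketch below supplies a small made-up data frame (the manager column and all values are assumptions for illustration) and applies the same recoding steps.

```r
# Hypothetical data: `leadership` is not defined in the post, so these
# values are invented purely for illustration.
leadership <- data.frame(
  manager = 1:5,
  age     = c(32, 45, 99, 60, 80)
)

# Recode the placeholder value 99 as missing
leadership$age[leadership$age == 99] <- NA

# Create the new column first, then fill it inside within()
leadership$agecat2 <- NA
leadership <- within(leadership, {
  agecat2[age > 75]              <- "Elder"
  agecat2[age >= 55 & age <= 75] <- "Middle Aged"
  agecat2[age < 55]              <- "Young"
})

leadership
```

Rows whose age is NA keep agecat2 as NA, because a missing value in the logical index selects nothing when a single value is assigned.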

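4 Data merging

Merging data sets is one of the most common data operations, for example combining two tables that come from different sources but have a similar structure.
4.1 Merging vectors
# To merge one-dimensional vectors, pass them to c(), separated by ","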

> x <- c(11:20)
> y <- c(1:10)
> c(x,y)
[1] 11 12 13 14 15 16 17 18 19 20  1  2  3  4  5  6  7  8  9 10
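
One detail worth keeping in mind is that c() returns a single atomic vector, so mixing types converts everything to the most general one. A minimal sketch, with nums, chars and combined as illustrative names:

```r
nums  <- c(11:13)
chars <- c("a", "b")

combined <- c(nums, chars)
combined          # every element is now a character string
class(combined)   # "character"
```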

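4.2 cbind: column-wise merge (equal length)

Summary: cbind requires the same number of rows and joins column by column, by position; rows are not matched on any key.

# Create test data
> ID1 <- c(1:4)
> ID2 <- c(2:5)
> name<-c("A","B","C","D")
> score<-c(8,22,7,6)
> student1<-data.frame(ID1,name)
> student2<-data.frame(ID2,score)
# Combine student1 and student2 column-wise
> cbind(student1,student2)

  ID1 name ID2 score
1   1    A   2     8
2   2    B   3    22
3   3    C   4     7
4   4    D   5     6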
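
Since cbind pairs rows purely by position, it is easy to end up with rows whose IDs do not agree. The sketch below rebuilds the test data and checks this, and also shows what happens when the row counts differ; combined and res are names introduced here for illustration.

```r
student1 <- data.frame(ID1 = 1:4, name = c("A", "B", "C", "D"))
student2 <- data.frame(ID2 = 2:5, score = c(8, 22, 7, 6))

combined <- cbind(student1, student2)

# Number of rows where the two ID columns actually agree
sum(combined$ID1 == combined$ID2)   # 0: no row is matched by key

# cbind fails for data frames with different row counts (4 vs 3 here)
res <- try(cbind(student1, student2[1:3, ]), silent = TRUE)
inherits(res, "try-error")          # TRUE
```

When rows need to be aligned on an ID, a key-based join such as merge is the safer choice.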

4.3 rbind: row-wise merge
Summary: rbind stacks rows; the data sets must have the same column names.

> # Create test data student1
> ID <- c(1:4)
> score <- c(8,22,7,33)
> student1<-data.frame(ID,score)
> # Create test data student2
> ID <- c("A","B","C","D")
> score <- c(11,2,55,3)
> student2<-data.frame(ID,score)
# Stack by row; note that the data sets must have the same column names
> rbind(student1,student2)
  ID score
1  1     8
2  2    22
3  3     7
4  4    33
5  A    11
6  B     2
7  C    55
8  D     3
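
Two side effects of rbind on data frames are easy to miss: column types are reconciled, and mismatched column names are rejected. The following sketch illustrates both; the data frame other and its column names are hypothetical, and stringsAsFactors = FALSE is set explicitly so the behavior does not depend on the R version's default.

```r
student1 <- data.frame(ID = 1:4, score = c(8, 22, 7, 33))
student2 <- data.frame(ID = c("A", "B", "C", "D"), score = c(11, 2, 55, 3),
                       stringsAsFactors = FALSE)

stacked <- rbind(student1, student2)
class(stacked$ID)    # "character": the integer IDs were converted

# A data frame whose column names differ cannot be stacked directly
other <- data.frame(id = 5:6, points = c(1, 2))
res <- try(rbind(student1, other), silent = TRUE)
inherits(res, "try-error")   # TRUE
```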

4.4 merge
# merge signature:
# merge(x, y, by = intersect(names(x), names(y)),
#       by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
#       sort = TRUE, suffixes = c(".x",".y"), no.dups = TRUE,
#       incomparables = NULL, ...)
# The by argument sets the join key: by = "ID" joins on one column, by = c("ID","NAME", ...) joins on several columns
# all = FALSE (the default) gives an inner join, all = TRUE a full outer join, all.x = TRUE a left join, and all.y = TRUE a right join

#--- merge usage ---#
> # Create test data
> ID1 <- c(1:4)
> ID2 <- c(2:5)
> name<-c("A","B","C","D")
> score<-c(8,22,7,6)
> student1<-data.frame(ID1,name)
> student2<-data.frame(ID2,score)

> # Full outer join (all = TRUE): keep all rows from both sides; unmatched cells become NA
> merge(student1,student2,by.x = "ID1", by.y = "ID2",all=TRUE)
  ID1 name score
1   1    A    NA
2   2    B     8
3   3    C    22
4   4    D     7
5   5 <NA>     6

> # Left join: keep all rows from the left side plus the matching data from y
> merge(student1,student2,by.x = "ID1", by.y = "ID2",all.x=TRUE)
  ID1 name score
1   1    A    NA
2   2    B     8
3   3    C    22
4   4    D     7

> # Right join: keep all rows from the right side plus the matching data from x
> merge(student1,student2,by.x = "ID1", by.y = "ID2",all.y=TRUE)
  ID1 name score
1   2    B     8
2   3    C    22
3   4    D     7
4   5 <NA>     6
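
For completeness, here is a sketch of the inner join on the same test data, which keeps only the IDs present in both tables. It relies on all = FALSE being merge's default; the variable names inner and counts are chosen here for illustration.

```r
student1 <- data.frame(ID1 = 1:4, name = c("A", "B", "C", "D"))
student2 <- data.frame(ID2 = 2:5, score = c(8, 22, 7, 6))

# Inner join: all = FALSE is the default, so only IDs 2, 3 and 4 remain
inner <- merge(student1, student2, by.x = "ID1", by.y = "ID2")
inner

# Row counts of the four join types, side by side
counts <- c(
  inner = nrow(inner),
  left  = nrow(merge(student1, student2, by.x = "ID1", by.y = "ID2", all.x = TRUE)),
  right = nrow(merge(student1, student2, by.x = "ID1", by.y = "ID2", all.y = TRUE)),
  full  = nrow(merge(student1, student2, by.x = "ID1", by.y = "ID2", all = TRUE))
)
counts
```

Comparing row counts like this is a quick sanity check that the intended join type was actually requested.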

Joins are mainly done with the merge function and the *_join functions in the dplyr package.

In addition, the sqldf function (SQL) can also perform joins.
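
As a rough illustration of the dplyr route, the sketch below repeats the four joins on the earlier test data with inner_join, left_join, right_join and full_join. It assumes the dplyr package is installed and is skipped otherwise; the name key is introduced here for illustration.

```r
student1 <- data.frame(ID1 = 1:4, name = c("A", "B", "C", "D"))
student2 <- data.frame(ID2 = 2:5, score = c(8, 22, 7, 6))

# Only runs when dplyr is available (an assumption about the environment)
if (requireNamespace("dplyr", quietly = TRUE)) {
  key <- c("ID1" = "ID2")   # column in student1 = column in student2

  inner <- dplyr::inner_join(student1, student2, by = key)
  left  <- dplyr::left_join(student1, student2, by = key)
  right <- dplyr::right_join(student1, student2, by = key)
  full  <- dplyr::full_join(student1, student2, by = key)

  # Row counts per join type
  print(sapply(list(inner = inner, left = left, right = right, full = full), nrow))
}
```

The named vector passed to by plays the same role as by.x and by.y in merge.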
