Original poster: oliyiyi

Solve common R problems efficiently with data.table

oliyiyi posted on 2016-1-9 18:46:28

(This article was first published on Jan Gorecki - R, and kindly contributed to R-bloggers)

I was recently browsing stackoverflow.com (often called SO) for the most-voted questions under the R tag.
To my surprise, many questions on the first page were already well addressed with the data.table package. I found a few other questions that could benefit from a data.table answer, and therefore went ahead and answered them.
In this post, I'd like to summarise them, along with benchmarks (where possible) and my comments, if any.
Many answers to highly voted questions were posted a while back. data.table is actively developed and has received many improvements in speed and memory usage over recent years, so some of those answers may well perform even better by now.

50 highest-voted questions under the R tag

Here's the list of the top 50 questions. Those for which a data.table answer is available (usually a quite performant one) are marked TRUE.

Rank  Votes  data.table solution  Question title
1     1153                        How to make a great R reproducible example?
2     621    TRUE                 How to sort a dataframe by column(s)?
3     496    TRUE                 R Grouping functions: sapply vs. lapply vs. apply. vs. tappl...
4     429                         How can we make xkcd style graphs?
5     396    TRUE                 How to join (merge) data frames (inner, outer, left, right)?
6     330                         What statistics should a programmer (or computer scientist)...
7     314    TRUE                 Drop columns in R data frame
8     290                         Tricks to manage the available memory in an R session
9     280    TRUE                 Remove rows with NAs in data.frame
10    279    TRUE                 Quickly reading very large tables as dataframes in R
11    263                         How to properly document S4 class slots using Roxygen2?
12    250                         Assignment operators in R: '=' and '<-'
13    236    TRUE                 Drop factor levels in a subsetted data frame
14    234                         Plot two graphs in same plot in R
15    225                         What is the difference between require() and library()?
16    221                         data.table vs dplyr: can one do something well the other can...
17    216                         In R, why is [ better than subset?
18    212                         R function for testing if a vector contains a given element
19    201                         Expert R users, what's in your .Rprofile?
20    197    TRUE                 R list to data frame
21    197                         Rotating and spacing axis labels in ggplot2
22    197                         How to Correctly Use Lists in R?
23    192                         How to convert a factor to an integer\numeric without a loss...
24    184                         How can I read command line parameters from an R script?
25    184                         How to unload a package without restarting R?
26    182                         Tools for making latex tables in R
27    181                         In R, what is the difference between the [] and [[]] notatio...
28    180                         How can I view the source code for a function?
29    171                         Cluster analysis in R: determine the optimal number of clust...
30    170                         How do I install an R package from source?
31    162                         How do I replace NA values with zeros in R?
32    152                         Counting the number of elements with the values of x in a ve...
33    152                         Write lines of text to a file in R
34    151                         Standard library function in R for finding the mode?
35    150                         How to trim leading and trailing whitespace in R?
36    143                         How to save a plot as image on the disk?
37    139                         Most underused data visualization
38    137    TRUE                 Convert data.frame columns from factors to characters
39    136                         How to find the length of a string in R?
40    134                         Workflow for statistical analysis and report writing
41    132                         Create an empty data.frame
42    130                         adding leading zeros using R
43    129                         Check existence of directory and create if doesn't exist
44    127                         Run R script from command line
45    125    TRUE                 Changing column names of a data frame in R
46    120                         How to set limits for axes in ggplot2 R plots?
47    114                         How to find out which package version is loaded in R?
48    112                         How to plot two histograms together in R?
49    112                         How can 2 strings be concatenated in R
50    112                         How to organize large R programs?
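To give a rough idea of what the TRUE rows in the table refer to, here is a minimal sketch (not copied from the linked answers) of common data.table idioms for several of those tasks, keyed to their rank in the list. The toy tables and column names (dt, lookup, grp, val, and so on) are hypothetical; the functions used (setorder, na.omit, setnames, rbindlist, copy, := and the on= join) are standard data.table features, while timings and edge cases are left to the answers themselves.

```r
# Minimal sketch, not taken from the linked SO answers: data.table idioms for
# several tasks marked TRUE in the top-50 list. All toy data is hypothetical.
library(data.table)

dt <- data.table(id  = c(3L, 1L, 2L, 4L),
                 grp = c("b", "a", "b", NA),
                 val = c(10, NA, 30, 40))

# Rank 2, sorting: reorder rows in place by grp (ascending), then val
# (descending). setorder() places NAs first by default.
setorder(dt, grp, -val)

# Rank 3, grouping: one aggregate per group, computed inside [ ] with `by`
agg <- dt[, .(total = sum(val, na.rm = TRUE)), by = grp]

# Rank 5, joining: X[Y, on = ...] keeps every row of Y (here dt), so this is
# a left join of the lookup table onto dt; unmatched ids get an NA label
lookup <- data.table(id = 1:3, label = c("x", "y", "z"))
joined <- lookup[dt, on = "id"]

# Rank 9, removing rows that contain any NA (na.omit has a data.table method)
complete <- na.omit(dt)

# Ranks 7 and 45, dropping and renaming columns by reference; done on a
# copy so that dt itself stays unchanged
tmp <- copy(dt)
tmp[, val := NULL]
setnames(tmp, "grp", "group")

# Rank 20, list of records to a table
recs <- list(list(a = 1, b = "p"), list(a = 2, b = "q"))
from_list <- rbindlist(recs)

# Rank 38, converting every factor column to character
fdt <- data.table(f = factor(c("u", "v")), n = 1:2)
fcols <- names(fdt)[vapply(fdt, is.factor, logical(1))]
fdt[, (fcols) := lapply(.SD, as.character), .SDcols = fcols]

print(dt)
print(joined)
```

setorder, setnames and := modify the table by reference instead of returning a modified copy, which is a large part of why data.table answers tend to be memory-efficient; that is also why the sketch calls copy() before dropping a column.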

Below are the selected answers where data.table can be applied. Each one is supplied with the usage and timings copied from the linked answer. Click a question title to view the SO question, or follow the answer link for a reproducible example and benchmark details.


Reply from seahhj, posted on 2016-1-10 10:12:38
good material, thanks for sharing
