Original poster: oliyiyi

A use of gsub, reshape2 and sqldf with healthcare data

oliyiyi posted on 2016-1-9 16:51:57

(This article was first published on DataScience+, and kindly contributed to R-bloggers)

Building off other industry-specific posts, I want to use healthcare data to demonstrate the use of R packages. The data can be downloaded here. To read the .CSV file into R, you might read the post on how to import data in R. Packages in R are stored in libraries, and many come pre-installed, but reaching the next level of skill requires knowing when to use new packages and what they contain. With that, let's get to our example.

gsub

When working with vectors and strings, especially when cleaning up data, gsub makes the job much simpler. In my healthcare data, I wanted to convert dollar values to plain numbers (i.e., $21,000 to 21000), and I used gsub as seen below.
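Before turning to the hospital file, here is a minimal standalone sketch of the same pattern. The input strings are made up for illustration and are not taken from the dataset; only gsub and as.numeric from base R are used.

```r
# Hypothetical dollar strings (illustrative values, not from the hospital file)
raw <- c("$21,000", "$13,469", "$1,250,000")

# "[$,]" is a character class: inside brackets, "$" is a literal dollar sign,
# so every "$" and "," is replaced with the empty string
stripped <- gsub("[$,]", "", raw)

# The result is still character; convert it to numeric explicitly
clean <- as.numeric(stripped)

print(clean)
```

Because gsub returns character strings, the as.numeric step is what actually produces values you can average or compare.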

First, read the data into R from the CSV file. I am naming the dataset “hosp”.

hosp <- read.csv("Payment_and_value_of_care_-_Hospital.csv")

In the code below, I remove hospitals without estimates.

hospay <- hosp[hosp$Payment.category != "Not Available" & hosp$Payment.category != "Number of Cases Too Small", ]
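The same kind of filter can also be written with %in%, which may read more easily when several labels need to be excluded. This is only a sketch on a toy data frame: the Hospital column and the "Some category" label are assumptions made for illustration, while the two excluded labels are the ones used in the post.

```r
# Toy stand-in for the hospital data (hypothetical rows)
toy <- data.frame(
  Hospital = c("A", "B", "C", "D"),
  Payment.category = c("Not Available", "Some category",
                       "Number of Cases Too Small", "Some category"),
  stringsAsFactors = FALSE
)

# Labels that mark rows without a usable estimate
drop_labels <- c("Not Available", "Number of Cases Too Small")

# Keep only the rows whose category is not one of the excluded labels
kept <- toy[!(toy$Payment.category %in% drop_labels), ]

print(kept)
```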

Now it's time to remove the dollar signs and commas from the estimate values.

hospay$Payment <- as.numeric(gsub("[$,]", "", hospay$Payment))
hospay$Lower.estimate <- as.numeric(gsub("[$,]", "", hospay$Lower.estimate))
hospay$Higher.estimate <- as.numeric(gsub("[$,]", "", hospay$Higher.estimate))
head(hospay$Payment)
[1] 13469 12863 12308 12222 21376 14740

reshape2

In looking at the data, I wanted to focus on the Payment estimate, so I used the melt() function from reshape2. melt() restructures data from wide to long format, much like unpivoting a pivot table, without losing values.
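To make the wide-to-long idea concrete, here is a small sketch on a made-up table. The column names and numbers are assumptions for illustration only; the call uses melt() from reshape2 with its id.vars, measure.vars and value.name arguments.

```r
library(reshape2)

# Hypothetical wide table: one row per hospital, one column per estimate
wide <- data.frame(
  Hospital = c("A", "B"),
  State = c("AK", "AL"),
  Payment = c(100, 200),
  Lower.estimate = c(90, 180),
  stringsAsFactors = FALSE
)

# id.vars stay as identifying columns; each measure.vars column is stacked
# into rows, with its name in "variable" and its value in "Estimate"
long <- melt(wide,
             id.vars = c("Hospital", "State"),
             measure.vars = c("Payment", "Lower.estimate"),
             value.name = "Estimate")

print(long)
```

Each hospital now appears once per measured column, so two rows and two measures become four rows.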

library(reshape2)
hosp_melt <- melt(data = hospay, id.vars = c(2, 5, 9, 11), measure.vars = 13, value.name = "Estimate")
names(hosp_melt)
[1] "Hospital.name" "State" "Payment.measure.name" "Payment.category" "variable" "Estimate"

sqldf

With my data melted, I wanted to get the average estimate for heart attack patients by state. This is a classic SQL query, so bringing in sqldf allows for that.

library(sqldf)
names(hosp_melt)[3] <- "paymentmeasurename"
hosp_est <- sqldf("select State, avg(Estimate) as Estimate from hosp_melt where paymentmeasurename = 'Payment for heart attack patients' group by State")
head(hosp_est)
  State Estimate
1    AK 20987.60
2    AL 21850.32
3    AR 21758.00
4    AZ 22690.62
5    CA 22707.45
6    CO 21795.30
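As a cross-check that does not depend on sqldf, a grouped mean of this shape can also be computed with base R's aggregate(). This is a swapped-in alternative, not the method used in the post, and the toy rows below (including the "Some other measure" label) are made up for illustration.

```r
# Toy long-format data (hypothetical values, same column names as above)
toy <- data.frame(
  State = c("AK", "AK", "AL", "AL"),
  paymentmeasurename = c("Payment for heart attack patients",
                         "Payment for heart attack patients",
                         "Payment for heart attack patients",
                         "Some other measure"),
  Estimate = c(100, 300, 250, 999),
  stringsAsFactors = FALSE
)

# Equivalent of the SQL "where" clause: keep only heart attack rows
heart <- toy[toy$paymentmeasurename == "Payment for heart attack patients", ]

# Equivalent of "avg(Estimate) ... group by State"
by_state <- aggregate(Estimate ~ State, data = heart, FUN = mean)

print(by_state)
```

Comparing a result like this against the sqldf output for a few states is a quick way to confirm that the filter and grouping behave as intended.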

If you have any questions, feel free to leave a comment below.

To leave a comment for the author, please follow the link and comment on their blog: DataScience+.