Renmin University Economics Forum (人大经济论坛): attachment download

File name: specdata.rar
Download link: https://bbs.pinggu.org/a-1556791.html
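I spent several hours on an assignment, but my skills are too limited, so I am asking for help here. Anyone who knows some R should find these easy to solve. The data needed is in the attachment specdata.rar.
Question 1:
Write a function named 'pollutantmean' that calculates the mean of a pollutant (sulfate or nitrate) across a specified list of monitors (specdata). The function 'pollutantmean' takes three arguments: 'directory', 'pollutant', and 'id'. Ignore any missing values coded as NA. The function prototype is as follows: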
pollutantmean <- function(directory, pollutant, id = 1:332) {
## 'directory' is a character vector of length 1 indicating
## the location of the CSV files

## 'pollutant' is a character vector of length 1 indicating
## the name of the pollutant for which we will calculate the
## mean; either "sulfate" or "nitrate".

## 'id' is an integer vector indicating the monitor ID numbers
## to be used

## Return the mean of the pollutant across all monitors list
## in the 'id' vector (ignoring NA values)
}


Reference output:
pollutantmean("specdata", "nitrate", 70:72)
## [1] 1.706
pollutantmean("specdata", "nitrate", 23)
## [1] 1.281


Here is what I wrote:
pollutantmean <- function(directory, pollutant, id = 1:332) {
  files_list <- dir(directory, full.names = TRUE)
  data <- data.frame()
  # Read all 332 files and stack them into one data frame
  for (i in 1:332) {
    data <- rbind(data, read.csv(files_list[i]))
  }
  # Keep the rows whose monitor ID lies between min(id) and max(id)
  data_subset <- subset(data, data$ID <= max(id) & data$ID >= min(id))
  if (pollutant == "sulfate") {
    result <- mean(data_subset$sulfate, na.rm = TRUE)
  }
  if (pollutant == "nitrate") {
    result <- mean(data_subset$nitrate, na.rm = TRUE)
  }
  return(result)
}

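The results I get are all correct, but:
1. The output seems to have too many digits, and the function may run for too long, so the Coursera system automatically rejected my answer. I would appreciate detailed suggestions on how to fix this.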
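
One way to address the speed problem is to read only the files for the requested monitors rather than all 332 on every call. The sketch below assumes the files are named by zero-padded monitor ID ("001.csv" through "332.csv"), which the post does not state; if the names differ, the `paths` line would need to change. It also selects exactly the IDs in `id`, whereas a filter between min(id) and max(id) would pull in extra monitors for a non-contiguous vector such as c(1, 5).

```r
# Sketch only: read just the requested monitors.
# Assumption: files are named "001.csv", "002.csv", ..., "332.csv".
pollutantmean <- function(directory, pollutant, id = 1:332) {
  paths <- file.path(directory, sprintf("%03d.csv", id))
  # Read each requested file once and keep only the pollutant column.
  values <- unlist(lapply(paths, function(p) read.csv(p)[[pollutant]]))
  # Average over all monitors at once, ignoring missing values.
  mean(values, na.rm = TRUE)
}
```

It would be called the same way as before, for example pollutantmean("specdata", "nitrate", 70:72). If the grader expects the three decimals shown in the reference output, wrapping the result in round(..., 3) would match that format; whether the grader actually requires this is a guess.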

Question 2:
Write a function with the following prototype:
complete <- function(directory, id = 1:332) {
## 'directory' is a character vector of length 1 indicating
## the location of the CSV files

## 'id' is an integer vector indicating the monitor ID numbers
## to be used

## Return a data frame of the form:
##   id nobs
## 1  1  117
## 2  2 1041
## ...
## where 'id' is the monitor ID number and 'nobs' is the
## number of complete cases
}

Example output:
complete("specdata", 30:25)
##   id nobs
## 1 30  932
## 2 29  711
## 3 28  475
## 4 27  338
## 5 26  586
## 6 25  463

What I wrote:
complete <- function(directory, id = 1:332) {
  files_list <- dir(directory, full.names = TRUE)
  data <- data.frame()
  # Read all 332 files and stack them into one data frame
  for (i in 1:332) {
    data <- rbind(data, read.csv(files_list[i]))
  }
  filecom <- vector()
  for (i in id) {
    data_subset <- subset(data, data$ID == i)
    # Columns 2 and 3 hold the two pollutant measurements
    data2 <- data_subset[, 2:3]
    cc <- sum(complete.cases(data2))
    filecom <- rbind(filecom, c(i, cc))
  }
  colnames(filecom) <- c("id", "nobs")
  return(filecom)
}

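1. The results are all correct, but:
> class(complete("specdata", 30:25))
[1] "matrix"
What I want is:
> class(complete("specdata", 30:25))
[1] "data.frame"
2. As before, the computation is very slow, and the output has quite a lot of digits. I would appreciate specific advice.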
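
Both points could be handled by counting complete cases file by file and building the result with data.frame() instead of growing a matrix with rbind(). This is a minimal sketch under the same assumption that files are named by zero-padded monitor ID ("001.csv" and so on), which is not confirmed by the post; the column names sulfate and nitrate are taken from the code earlier in the thread.

```r
# Sketch only: one row per requested monitor, returned as a data frame.
# Assumption: files are named "001.csv", "002.csv", ..., "332.csv".
complete <- function(directory, id = 1:332) {
  paths <- file.path(directory, sprintf("%03d.csv", id))
  # For each file, count the rows where both pollutants are observed.
  nobs <- vapply(paths, function(p) {
    dat <- read.csv(p)
    sum(complete.cases(dat[, c("sulfate", "nitrate")]))
  }, integer(1))
  # data.frame() keeps the rows in the order given by 'id'.
  data.frame(id = id, nobs = unname(nobs))
}
```

If only the class needs fixing, wrapping the existing matrix in as.data.frame() before returning it should also give a data frame, though it would not help with the running time.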

Question 3:
Prototype:
corr <- function(directory, threshold = 0) {
## 'directory' is a character vector of length 1 indicating
## the location of the CSV files

## 'threshold' is a numeric vector of length 1 indicating the
## number of completely observed observations (on all
## variables) required to compute the correlation between
## nitrate and sulfate; the default is 0

## Return a numeric vector of correlations
}

For this one I adapted a friend's code:
corr <- function(directory, threshold = 0) {
  filenames <- list.files("specdata", full.names = TRUE)
  n <- length(filenames)
  cr <- numeric()

  for (i in 1:332) {
    # Read one monitor's file per iteration
    dat <- read.csv(filenames[i])

    # Keep only rows where both pollutants are observed
    datcomplete <- subset(dat, dat$sulfate != "NA" & dat$nitrate != "NA")
    check <- length(datcomplete$ID)
    if (check >= threshold & check > 0) {
      cal <- cor(datcomplete$sulfate, datcomplete$nitrate)
      cr <- c(cr, cal)
    }
  }
  return(cr)
}

cr <- corr("specdata", 150)
head(cr)
## [1] -0.01896 -0.14051 -0.04390 -0.06816 -0.12351 -0.07589
summary(cr)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## -0.2110 -0.0500  0.0946  0.1250  0.2680  0.7630
cr <- corr("specdata", 400)
head(cr)
## [1] -0.01896 -0.04390 -0.06816 -0.07589  0.76313 -0.15783
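But the results are not quite right either: the head() output matches, but the later values differ slightly. I would appreciate detailed advice.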
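
Two things in the code above might explain the small differences, though neither is confirmed by the post. First, the test uses >=, so a monitor with exactly `threshold` complete cases is included; if the assignment intends "more than the threshold", those monitors would add extra values. Second, the directory is hard-coded as "specdata" instead of using the `directory` argument. The sketch below shows one variant that uses a strict comparison and complete.cases(); treating the comparison as strict is an assumption here.

```r
# Sketch only: correlation per monitor, for monitors above the threshold.
corr <- function(directory, threshold = 0) {
  paths <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  cr <- numeric(0)
  for (p in paths) {
    dat <- read.csv(p)
    # Rows where both pollutants are observed.
    ok <- complete.cases(dat[, c("sulfate", "nitrate")])
    # Assumption: a monitor qualifies only with strictly more
    # complete cases than the threshold.
    if (sum(ok) > threshold) {
      cr <- c(cr, cor(dat$sulfate[ok], dat$nitrate[ok]))
    }
  }
  cr
}
```

Reading each file once inside the loop also avoids building one large combined data frame, which is where most of the time goes in the earlier two functions.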

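Taking this course with no programming experience at all is painful, so I hope someone experienced can help. If there is a more concise way to write any of this, please tell me directly; my approach may not have been a good one to begin with.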

