File name: specdata.rar
Download link: https://bbs.pinggu.org/a-1556791.html
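I spent several hours on this assignment, but my skills are really too limited, so I can only ask for help here. Anyone who knows a bit of R should find these easy to solve. The required data is in the attachment (specdata.rar).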
Question 1:

Write a function named 'pollutantmean' that calculates the mean of a pollutant (sulfate or nitrate) across a specified list of monitors (specdata). The function 'pollutantmean' takes three arguments: 'directory', 'pollutant', and 'id'. Ignore any missing values coded as NA. The function prototype is:

    pollutantmean <- function(directory, pollutant, id = 1:332) {
        ## 'directory' is a character vector of length 1 indicating
        ## the location of the CSV files

        ## 'pollutant' is a character vector of length 1 indicating
        ## the name of the pollutant for which we will calculate the
        ## mean; either "sulfate" or "nitrate".

        ## 'id' is an integer vector indicating the monitor ID numbers
        ## to be used

        ## Return the mean of the pollutant across all monitors list
        ## in the 'id' vector (ignoring NA values)
    }

Reference answers:

    pollutantmean("specdata", "nitrate", 70:72)
    ## [1] 1.706
    pollutantmean("specdata", "nitrate", 23)
    ## [1] 1.281

What I wrote:

    pollutantmean <- function(directory,pollutant,id=1:332){
        files_list <- dir(directory, full.names=T)
        data <- data.frame()
        for (i in 1:332){
            data <- rbind(data,read.csv(files_list[i]))
        }
        data_subset <- subset(data, data$ID<=max(id)&data$ID<=max(id)&data$ID>=min(id))
        if(pollutant=="sulfate"){
            result<-mean(data_subset$sulfate, na.rm=T)
        }
        if(pollutant=="nitrate"){
            result<-mean(data_subset$nitrate, na.rm=T)
        }
        return (result)
    }

My results are all correct, but:

1. The result seems to have too many digits, and the computation may take too long, so the Coursera grader automatically rejected my answer. I would appreciate detailed suggestions for fixing it.

Question 2:

Write a function with the following prototype:

    complete <- function(directory, id = 1:332) {
        ## 'directory' is a character vector of length 1 indicating
        ## the location of the CSV files

        ## 'id' is an integer vector indicating the monitor ID numbers
        ## to be used

        ## Return a data frame of the form:
        ## id nobs
        ## 1  117
        ## 2  1041
        ## ...
        ## where 'id' is the monitor ID number and 'nobs' is the
        ## number of complete cases
    }

Sample answer:

    complete("specdata", 30:25)
    ##   id nobs
    ## 1 30  932
    ## 2 29  711
    ## 3 28  475
    ## 4 27  338
    ## 5 26  586
    ## 6 25  463

What I wrote:

    complete<- function(directory,id=1:332){
        files_list <- dir(directory, full.names=T)
        data <- data.frame()
        for (i in 1:332){
            data <- rbind(data,read.csv(files_list[i]))
        }
        filecom<-vector()
        for (i in id){
            data_subset<-subset(data,data$ID==i)
            data2<-data_subset[,2:3]
            cc<-sum(complete.cases(data2))
            filecom<-rbind(filecom,c(i,cc))
        }
        colnames(filecom)<-c("id","nobs")
        return (filecom)
    }

1. The results are all correct, but:

    > class(complete("specdata", 30:25))
    [1] "matrix"

I want to get:

    > class(complete("specdata", 30:25))
    [1] "data.frame"

2. Again, the computation is very slow, and there are quite a few digits. I would appreciate specific suggestions.

Question 3:

Prototype:

    corr <- function(directory, threshold = 0) {
        ## 'directory' is a character vector of length 1 indicating
        ## the location of the CSV files

        ## 'threshold' is a numeric vector of length 1 indicating the
        ## number of completely observed observations (on all
        ## variables) required to compute the correlation between
        ## nitrate and sulfate; the default is 0

        ## Return a numeric vector of correlations
    }

For this one I borrowed a friend's version:

    corr <- function(directory,threshold=0){
        filenames <- list.files("specdata", full.names=TRUE)
        n <-length(filenames)
        cr <- numeric()
        for (i in 1:332) {
            dat <- data.frame(lapply (filenames[i], read.csv))
            datcomplete <- subset(dat, dat$sulfate != "NA" & dat$nitrate != "NA")
            check <- length(datcomplete$ID)
            if (check >= threshold & check>0) {
                cal <- cor(datcomplete$sulfate,datcomplete$nitrate)
                cr <- c(cr, cal)
            }
        }
        return(cr)
    }

    cr <- corr("specdata", 150)
    head(cr)
    ## [1] -0.01896 -0.14051 -0.04390 -0.06816 -0.12351 -0.07589
    summary(cr)
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    ## -0.2110 -0.0500  0.0946  0.1250  0.2680  0.7630
    cr <- corr("specdata", 400)
    head(cr)
    ## [1] -0.01896 -0.04390 -0.06816 -0.07589  0.76313 -0.15783

But the result is not quite right either: the head is correct, but the later values are slightly off. I would appreciate detailed suggestions.

Taking this course with no programming experience at all is really painful, so I hope an expert can help. If there is a more concise way to write these, please just tell me directly; my approach may not have been very good to begin with.
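One possible direction for all three questions, as a rough sketch rather than a definitive answer: read only the files that are actually requested instead of calling rbind on all 332 files every time, and build the result of 'complete' with data.frame() so its class comes out as "data.frame". The sketch assumes the files in the directory are named 001.csv to 332.csv and contain the columns sulfate and nitrate (the column names are taken from the code above; the file naming is an assumption). The helper monitor_files is a made-up name for illustration. In corr, the comparison uses strictly greater than the threshold, which is also an assumption; the friend's version uses >=, and that difference may be one reason the later values do not line up.

```r
# Hypothetical helper: build file paths for the requested monitor IDs.
# Assumption: files are named 001.csv, 002.csv, ..., 332.csv.
monitor_files <- function(directory, id) {
  file.path(directory, sprintf("%03d.csv", id))
}

pollutantmean <- function(directory, pollutant, id = 1:332) {
  # Read only the requested monitors and pull out the pollutant column
  values <- unlist(lapply(monitor_files(directory, id), function(f) {
    read.csv(f)[[pollutant]]
  }))
  mean(values, na.rm = TRUE)
}

complete <- function(directory, id = 1:332) {
  # Count rows where both sulfate and nitrate are present, one file per id
  nobs <- vapply(monitor_files(directory, id), function(f) {
    dat <- read.csv(f)
    sum(complete.cases(dat[, c("sulfate", "nitrate")]))
  }, integer(1), USE.NAMES = FALSE)
  # data.frame() keeps the order of 'id' and gives class "data.frame"
  data.frame(id = id, nobs = nobs)
}

corr <- function(directory, threshold = 0) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  cr <- numeric()
  for (f in files) {
    dat <- read.csv(f)
    ok <- complete.cases(dat[, c("sulfate", "nitrate")])
    # Assumption: a monitor qualifies only if it has MORE complete
    # cases than the threshold (strictly greater than)
    if (sum(ok) > threshold) {
      cr <- c(cr, cor(dat$sulfate[ok], dat$nitrate[ok]))
    }
  }
  cr
}
```

On the digits: the reference answers look like they are simply printed with fewer digits, so something like round(pollutantmean("specdata", "nitrate", 23), 3) may be enough for comparing by eye. Note also that selecting by min(id) and max(id), as in the original pollutantmean, would pick up extra monitors for an id vector such as c(1, 5, 9), whereas reading the files by id avoids that.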