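Question 1:
Write a function named 'pollutantmean' that calculates the mean of a pollutant (sulfate or nitrate) across a specified list of monitors in the specdata directory. The function 'pollutantmean' takes three arguments: 'directory', 'pollutant', and 'id'. Ignore any missing values coded as NA. The function prototype is: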
pollutantmean <- function(directory, pollutant, id = 1:332) {
## 'directory' is a character vector of length 1 indicating
## the location of the CSV files
## 'pollutant' is a character vector of length 1 indicating
## the name of the pollutant for which we will calculate the
## mean; either "sulfate" or "nitrate".
## 'id' is an integer vector indicating the monitor ID numbers
## to be used
## Return the mean of the pollutant across all monitors listed
## in the 'id' vector (ignoring NA values)
}
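A minimal sketch of pollutantmean. It assumes the CSV files have columns named sulfate and nitrate, and that the files sort so the i-th file belongs to monitor i (true for zero-padded names such as 001.csv; that naming is an assumption, not something stated above).

```r
pollutantmean <- function(directory, pollutant, id = 1:332) {
  # Assumption: the i-th CSV file in sorted order holds monitor i
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)[id]
  # Read only the selected monitors and stack them into one data frame
  data <- do.call(rbind, lapply(files, read.csv))
  # Mean over all observations pooled across monitors, dropping NAs
  mean(data[[pollutant]], na.rm = TRUE)
}
```

This pools every non-NA observation from the selected monitors before averaging, which is not the same as averaging each monitor's own mean when monitors have different numbers of observations.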
Question 2:
Write a function with the following prototype:
complete <- function(directory, id = 1:332) {
## 'directory' is a character vector of length 1 indicating
## the location of the CSV files
## 'id' is an integer vector indicating the monitor ID numbers
## to be used
## Return a data frame of the form:
## id nobs
## 1 117
## 2 1041
## ...
## where 'id' is the monitor ID number and 'nobs' is the
## number of complete cases
}
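What I wrote (corrected):
complete <- function(directory, id = 1:332) {
  files_list <- dir(directory, full.names = TRUE)
  # Read every file once and stack them. The original loop called
  # read.csv(files_list) with the whole vector instead of files_list[i],
  # which fails, and hard-coded 1:332 instead of using the files found
  data <- do.call(rbind, lapply(files_list, read.csv))
  nobs <- numeric(length(id))
  for (k in seq_along(id)) {
    data_subset <- data[data$ID == id[k], c("sulfate", "nitrate")]
    # A complete case is a row where both pollutants are non-NA
    nobs[k] <- sum(complete.cases(data_subset))
  }
  # The prototype asks for a data frame, not the matrix that rbind() builds
  data.frame(id = id, nobs = nobs)
}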
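Once all files are stacked, the per-monitor loop can also be replaced by tapply(), which sums the complete.cases() flags within each ID. The toy values below are made up for illustration only.

```r
# Toy stacked data for two monitors (values made up for illustration)
data <- data.frame(
  sulfate = c(1.2, NA, 3.4, 2.0, 0.9),
  nitrate = c(0.5, 0.7, NA, 1.1, 0.4),
  ID      = c(1, 1, 1, 2, 2)
)
# TRUE where both pollutants are observed on that row
ok <- complete.cases(data[, c("sulfate", "nitrate")])
# Count the TRUE flags within each monitor ID
counts <- tapply(ok, data$ID, sum)
result <- data.frame(id = as.integer(names(counts)), nobs = as.vector(counts))
result
#   id nobs
# 1  1    1
# 2  2    2
```

Note that tapply() returns the IDs in sorted order and omits any ID with no rows, so to honour the order of the id argument you would index the result with counts[as.character(id)].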
Question 3:
Prototype:
corr <- function(directory, threshold = 0) {
## 'directory' is a character vector of length 1 indicating
## the location of the CSV files
## 'threshold' is a numeric vector of length 1 indicating the
## number of completely observed observations (on all
## variables) required to compute the correlation between
## nitrate and sulfate; the default is 0
## Return a numeric vector of correlations
}
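A minimal sketch of corr. It assumes every CSV file in directory has sulfate and nitrate columns, and that a monitor contributes only when its number of complete cases is strictly greater than threshold (the prototype does not say whether the comparison is strict, so that is an assumption).

```r
corr <- function(directory, threshold = 0) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  cr <- numeric(0)
  for (f in files) {
    dat <- read.csv(f)
    # Rows where both pollutants were observed
    ok <- complete.cases(dat[, c("sulfate", "nitrate")])
    # Assumption: only monitors with more than `threshold` complete cases count
    if (sum(ok) > threshold) {
      cr <- c(cr, cor(dat$sulfate[ok], dat$nitrate[ok]))
    }
  }
  cr
}
```

If no monitor passes the threshold, the function returns a zero-length numeric vector rather than NULL.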
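For this one I borrowed from a friend's solution (corrected):
corr <- function(directory, threshold = 0) {
  # Use the directory argument instead of the hard-coded "specdata"
  filenames <- list.files(directory, full.names = TRUE)
  n <- length(filenames)
  cr <- numeric()
  # Loop over the files actually found rather than a fixed 1:332
  for (i in seq_len(n)) {
    # Read only the i-th file; data.frame(lapply(filenames, read.csv))
    # re-read every file on each pass and tried to bind them side by side
    dat <- read.csv(filenames[i])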