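(2) Is there a more efficient way to read the JSON files, extract the data, and write it out? Could the loops be reduced, or the work run in multiple threads?
(3) When I write the result df_all to a MySQL database with dbWriteTable(con1,"data_test",df_all,overwrite=TRUE) from the RMySQL package, the Chinese text comes out garbled. I tried dbSendQuery(con1,"set names utf8") to switch the encoding, but that did not solve it.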
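One thing that may be worth checking for the garbled text in (3): on Windows, R character vectors are often stored in the native locale encoding rather than UTF-8, so declaring utf8 on the connection does not help if the bytes being sent are not UTF-8. The sketch below converts every character column with base R's enc2utf8() before writing. The helper names to_utf8_df and write_utf8_table are hypothetical (they are not part of RMySQL or DBI), and whether this resolves the problem also depends on the character set of the database and table, which the post does not state.

```r
# Hypothetical helper: re-encode all character columns of a data frame as UTF-8.
to_utf8_df <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) {
    df[is_chr] <- lapply(df[is_chr], enc2utf8)
  }
  df
}

# Hypothetical wrapper: declare the connection encoding, then write the
# re-encoded data. `con` is assumed to be an open RMySQL connection such as
# con1 in the post; this function is only defined here, not called.
write_utf8_table <- function(con, name, df) {
  DBI::dbExecute(con, "SET NAMES utf8")
  DBI::dbWriteTable(con, name, to_utf8_df(df), overwrite = TRUE)
}

# Intended use, with the names from the post:
# write_utf8_table(con1, "data_test", df_all)
```

If the data still looks garbled after this, the table's own character set and the client used to view the rows are the next things to inspect.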
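Below is what I wrote with a rather clumsy approach. It runs correctly, but how can I optimize the code to make it more efficient? If R is to use multiple threads, is there a recommended way to do it? Thanks! The code is as follows: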
    library(jsonlite)

    # Output columns, in the order they are filled in below
    Indexnames <- c("publication_number","earliest_publication_date","title","title_zh_cn","title_en",
                    "abstract","abstract_zh_cn","abstract_en","applicants_address","applicants_countries")
    M <- length(Indexnames)
    setwd('E:/tempdata/data/')
    filenames <- dir()
    df_all <- data.frame()
    for (h in filenames){                      # first-level folders
      filenames_1 <- dir(paste0(h,'/'))
      for (j in filenames_1){                  # second-level folders
        file_list <- dir(paste0(h,'/',j,'/'))
        file_list <- file_list[grepl("patent",file_list)]   # keep only the patent files
        for (k in file_list){
          # each file holds one JSON record per line
          data <- jsonlite::stream_in(file(paste0(h,'/',j,'/',k)))
          N <- NROW(data)
          df_empty <- data.frame(matrix(ncol = M, nrow = N, dimnames = list(c(),Indexnames)))
          for (i in seq_len(N)){               # seq_len() also copes with an empty file
            df_empty[i,1] <- data[i,"publication_number"]
            df_empty[i,2] <- data[i,"earliest_publication_date"]
            # column 3 is the title, column 4 the abstract: three language variants each
            df_empty[i,3] <- data[i,3][[1]][1]
            df_empty[i,4] <- data[i,3][[2]][1]
            df_empty[i,5] <- data[i,3][[3]][1]
            df_empty[i,6] <- data[i,4][[1]][1]
            df_empty[i,7] <- data[i,4][[2]][1]
            df_empty[i,8] <- data[i,4][[3]][1]
            # column 5 holds the applicants
            df_empty[i,9] <- data[i,5][[1]]$address[1]
            df_empty[i,10] <- data[i,5][[1]]$countries
          }
          rm(data)
          df_all <- rbind(df_all,df_empty)     # grows df_all once per file
        }
      }
    }
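As a possible direction for question (2), here is a minimal sketch that drops the per-row loop and the repeated rbind. It is not a drop-in replacement: it assumes, as the indexing in the code above suggests, that column 3 (title) and column 4 (abstract) are nested data frames with three columns each, and that column 5 (applicants) is a list column whose elements are data frames with address and countries fields. The function names extract_fields, read_patent_file and collect_patents, and the cores argument, are hypothetical. Unlike the original, list.files() with recursive = TRUE searches at any depth rather than exactly two levels, and only the first applicant's address and country are kept.

```r
library(jsonlite)

# Pull the ten fields out of one parsed file, column by column (no row loop).
extract_fields <- function(data) {
  if (NROW(data) == 0) return(NULL)
  title      <- data[[3]]   # assumed: nested data frame with 3 columns
  abstract   <- data[[4]]   # assumed: nested data frame with 3 columns
  applicants <- data[[5]]   # assumed: list column of data frames
  # First value of `field` for one applicants entry, or NA if it is missing.
  first_or_na <- function(x, field) {
    v <- unlist(x[[field]], use.names = FALSE)
    if (length(v) == 0) NA_character_ else as.character(v[1])
  }
  data.frame(
    publication_number        = data[["publication_number"]],
    earliest_publication_date = data[["earliest_publication_date"]],
    title          = as.character(title[[1]]),
    title_zh_cn    = as.character(title[[2]]),
    title_en       = as.character(title[[3]]),
    abstract       = as.character(abstract[[1]]),
    abstract_zh_cn = as.character(abstract[[2]]),
    abstract_en    = as.character(abstract[[3]]),
    applicants_address   = vapply(applicants, first_or_na, character(1),
                                  field = "address", USE.NAMES = FALSE),
    applicants_countries = vapply(applicants, first_or_na, character(1),
                                  field = "countries", USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Read one line-delimited JSON file and extract its fields.
read_patent_file <- function(path) {
  data <- jsonlite::stream_in(file(path), verbose = FALSE)
  extract_fields(data)
}

# Process every file whose name contains "patent" under `root`.
# With cores > 1 the files are spread over a socket cluster from the
# base `parallel` package, which also works on Windows.
collect_patents <- function(root, cores = 1L) {
  paths <- list.files(root, pattern = "patent", recursive = TRUE, full.names = TRUE)
  if (cores > 1L) {
    cl <- parallel::makeCluster(cores)
    on.exit(parallel::stopCluster(cl), add = TRUE)
    # Workers start empty, so the helper functions are copied to them first.
    parallel::clusterExport(cl, c("extract_fields", "read_patent_file"),
                            envir = environment(read_patent_file))
    parts <- parallel::parLapply(cl, paths, read_patent_file)
  } else {
    parts <- lapply(paths, read_patent_file)
  }
  parts <- Filter(Negate(is.null), parts)   # drop empty files
  do.call(rbind, parts)                     # bind once, at the end
}

# Intended use, with the folder from the post:
# df_all <- collect_patents("E:/tempdata/data/", cores = 4L)
```

Binding once at the end avoids copying df_all for every file, and the column-wise extraction replaces N single-cell assignments per file. Whether several worker processes help in practice depends on how much of the time goes to disk reads rather than parsing, so it is worth timing both settings on a sample of the data.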









