Original poster: ReneeBK
ReneeBK posted on 2017-5-22 04:08:42

JSTORr

Simple exploratory text mining and document clustering of journal articles from JSTOR's Data for Research service.

Objective

The aim of this package is to provide some simple functions in R to explore changes in word frequencies over time in a specific journal archive. It is designed to solve the problem of finding patterns and trends in the unstructured text content of a large number of scholarly journal articles from the JSTOR archive.

Currently there are functions to explore changes in:

  • a single word (i.e. plot the relative frequency of a 1-gram over time)
  • two words independently (i.e. plot the relative frequency of two 1-grams over time)
  • sets of words (i.e. plot the relative frequency of a single group of multiple 1-grams over time)
  • correlations between two words over time (i.e. plot the correlation of two 1-grams over time)
  • correlations between two sets of words over time (i.e. plot the correlation of two sets of multiple 1-grams over time)
  • all of the above with bigrams (a sequence of two words)
  • the most frequent words by n-year ranges of documents (i.e. top words in all documents published in 2-, 5- or 10-year ranges, or whatever range you like)
  • the top n words correlated with a word by n-year ranges of documents (i.e. the top 20 words associated with the word 'pirate' in 5-year ranges)
  • various methods (k-means, PCA, affinity propagation) to detect clusters in a set of documents containing a word or set of words
  • topic models, using the lda package for a full R solution or the Java-based MALLET program (if installing that is an option; currently implemented here for Windows only)
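To make the first few items concrete, here is a minimal base R sketch of what "relative frequency over time" and "correlation of two 1-grams over time" can mean. It does not use the package's own functions; the toy data frame docs and the helpers rel_freq() and word_correlation() are hypothetical names invented for this illustration, and the counts are made up.

```r
# Hypothetical toy corpus: one row per article, with its publication year,
# its total number of word tokens, and raw counts of two 1-grams.
docs <- data.frame(
  year   = c(1991, 1994, 1998, 2001, 2005, 2009),
  total  = c(100, 200, 100, 100, 200, 100),
  pirate = c(2, 0, 1, 6, 8, 5),
  ship   = c(4, 2, 3, 5, 6, 1)
)

# Relative frequency of a word per n-year range:
# occurrences of the word divided by all tokens published in that range.
rel_freq <- function(docs, word, width = 10) {
  period <- floor(docs$year / width) * width
  tapply(docs[[word]], period, sum) / tapply(docs$total, period, sum)
}

# Correlation of two words per n-year range, computed across the
# per-article relative frequencies of the articles in each range.
word_correlation <- function(docs, word1, word2, width = 10) {
  period <- floor(docs$year / width) * width
  f1 <- docs[[word1]] / docs$total
  f2 <- docs[[word2]] / docs$total
  sapply(split(seq_len(nrow(docs)), period),
         function(i) cor(f1[i], f2[i]))
}

pirate_freq <- rel_freq(docs, "pirate")                 # 1990s: 3/400, 2000s: 19/400
pirate_ship <- word_correlation(docs, "pirate", "ship") # one value per decade
```

Plotting either result against the range labels, for example with plot(as.numeric(names(pirate_freq)), pirate_freq, type = "b"), gives the kind of trend line the list describes.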

This package will be useful to researchers who want to explore the history of ideas in an academic field, and investigate changes in word and phrase use over time, and between different journals.

How to install

First, make sure you've got Hadley Wickham's excellent devtools package installed. If you haven't got it, you can get it with these lines in your R console:

install.packages(pkgs = "devtools", dependencies = TRUE)

Then, use the install_github() function to fetch this package from github:

library(devtools)
# download and install the package (do this only once ever per computer)
install_github("benmarwick/JSTORr")

Error messages relating to rJava on Windows can probably be fixed by following exactly the instructions here. On OSX, try running R CMD javareconf at the command line, then run install.packages("rJava", type = 'source') in R.
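If you are not sure whether the installation steps above have already been done on a given computer, a quick check along the following lines may help. This is only a sketch: it reports which of the packages named above are installed and does not install anything itself.

```r
# Packages mentioned in the installation steps above.
pkgs <- c("devtools", "rJava", "JSTORr")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise;
# quietly = TRUE suppresses its messages.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are installed.")
}
```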


First, go to JSTOR's Data for Research service and make a request for data. The DfR service makes available large numbers of journal articles in a format that is convenient for text mining. When making a request for data to use with this package, you must choose:

  • CSV as the 'output format', not XML, which is the default
  • Word Counts and bigrams as the 'Data Type'

Second, once you've downloaded and unzipped the zip file that is the 'full dataset' from DfR, you can start R and work through the steps in the next section. Using RStudio is highly recommended when working with this package, because it makes the plot output much easier to manage.
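Before using the package, it can be useful to peek at the unzipped word-count files directly. The sketch below is base R only and rests on assumptions: it supposes one CSV file per article with a word column followed by a count column (the header names WORDCOUNTS and WEIGHT, the file names, and the folder name are all made up here), and it builds a tiny stand-in folder so the example runs on its own. Point dfr_dir at your real unzipped folder and check the actual layout of your download.

```r
# Hypothetical stand-in for an unzipped DfR download: a folder holding one
# word-count CSV per article. Header names and file names are assumptions.
dfr_dir <- file.path(tempdir(), "dfr_wordcounts")
dir.create(dfr_dir, showWarnings = FALSE)
writeLines(c("WORDCOUNTS,WEIGHT", "pirate,3", "ship,5"),
           file.path(dfr_dir, "wordcounts_article1.CSV"))
writeLines(c("WORDCOUNTS,WEIGHT", "pirate,1", "island,2"),
           file.path(dfr_dir, "wordcounts_article2.CSV"))

# Find every CSV file in the folder, whatever the case of the extension.
files <- list.files(dfr_dir, pattern = "\\.csv$",
                    ignore.case = TRUE, full.names = TRUE)

# Read one file into a long table: article id, word, count.
read_one <- function(path) {
  x <- read.csv(path, stringsAsFactors = FALSE)
  data.frame(article = basename(path), word = x[[1]], count = x[[2]],
             stringsAsFactors = FALSE)
}

all_counts <- do.call(rbind, lapply(files, read_one))

# Total count of each word across all articles, most frequent first.
totals <- tapply(all_counts$count, all_counts$word, sum)
totals <- totals[order(totals, decreasing = TRUE)]
```

A long table like all_counts, joined to each article's publication year, is the kind of input that word-frequency-over-time summaries are built from.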

JSTORr-master.zip (5.02 MB)

