Original poster: Lisrelchen

[Text Mining] textTinyR

1
Lisrelchen posted on 2017-7-6 23:14:43

textTinyR

The textTinyR package consists of text pre-processing functions for small or big data files. More details on the functionality of textTinyR can be found in the blog post and in the package vignette. The R package can be installed on the following operating systems: Linux, Mac and Windows. However, there are some limitations:

  • there is no support for Chinese, Japanese, Korean, Thai, or other languages with ambiguous word boundaries.
  • there are no support functions for UTF locales on Windows, meaning that only English character strings or files can be input and pre-processed.

System Requirements (for Unix OSs)
Debian/Ubuntu

sudo apt-get install libboost-all-dev

sudo apt-get update

sudo apt-get install libboost-locale-dev


Fedora

yum install boost-devel


Macintosh OSX/brew

UPDATE 25-05-2017: The current CRAN version of the package can only be installed on Linux and Windows. If the boost locale libraries are installed properly on your operating system, use the devtools::install_github(repo = 'mlampros/textTinyR', clean = TRUE) function to download the textTinyR package.


On Macintosh OS X the boost library is installed using the Homebrew package manager.

If the boost library has already been installed using brew install boost, it must first be removed using the following command,


brew uninstall boost


Then the formula for the boost library should be modified using a text editor (TextEdit, TextMate, etc.). On macOS Sierra the formula is saved in:


/usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/boost.rb


The user should open the boost.rb formula and replace the following code chunk, which begins at approximately line 71,


# layout should be synchronized with boost-python
args = ["--prefix=#{prefix}",
        "--libdir=#{lib}",
        "-d2",
        "-j#{ENV.make_jobs}",
        "--layout=tagged",
        "--user-config=user-config.jam",
        "install"]

if build.with? "single"
  args << "threading=multi,single"
else
  args << "threading=multi"
end

with the following code chunk,


# layout should be synchronized with boost-python
args = ["--prefix=#{prefix}",
        "--libdir=#{lib}",
        "-d2",
        "-j#{ENV.make_jobs}",
        "--layout=system",
        "--user-config=user-config.jam",
        "threading=multi",
        "install"]

#if build.with? "single"
#  args << "threading=multi,single"
#else
#  args << "threading=multi"
#end

Then the user should save the changes, close the file and run,


brew update


to apply the changes.
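Before building, it may be worth confirming that the saved formula still contains the edit. The following is only a minimal sketch for a POSIX shell: the function name formula_is_patched is hypothetical (it is not part of Homebrew), and it checks nothing beyond the two layout strings that appear in the code chunks above.

```shell
# Hypothetical helper (not part of Homebrew): succeeds only if the formula
# file contains --layout=system and no longer contains --layout=tagged.
formula_is_patched() {
    grep -qF -e '--layout=system' "$1" && ! grep -qF -e '--layout=tagged' "$1"
}

# Example call, using the formula path given earlier in this post:
# formula_is_patched /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/boost.rb \
#     && echo "formula is patched" || echo "formula is NOT patched"
```

If the check reports that the formula is not patched, the edit would need to be repeated before running the install command.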


Then the user should open a new terminal (console) and type the following command, which builds and installs the boost library from source using the modified formula (note: there are two dashes before build-from-source),


brew install /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/boost.rb --build-from-source


That's it.
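To sanity-check the result, one could look for the boost locale library files that textTinyR needs. This is a sketch under assumptions, not an official check: the helper name has_boost_locale is hypothetical, and the directory to inspect (Homebrew's lib directory, obtained here through brew --prefix) is assumed rather than taken from the original instructions.

```shell
# Hypothetical helper: succeeds if the directory given as $1 contains at least
# one file whose name starts with libboost_locale (e.g. a .dylib or .a file).
has_boost_locale() {
    for f in "$1"/libboost_locale*; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Example call (the lib directory is an assumption about the local setup):
# has_boost_locale "$(brew --prefix)/lib" && echo "boost locale found"
```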


Installation of the textTinyR package (CRAN, GitHub)

To install the package from CRAN use,

install.packages('textTinyR', clean = TRUE)


2
MouJack007 posted on 2017-7-6 23:41:20
Thanks to the original poster for sharing!

3
MouJack007 posted on 2017-7-6 23:43:09

4
soccy posted on 2017-7-7 00:15:31

5
ReneeBK posted on 2017-7-7 00:46:09
Examples
library(textTinyR)

# fs <- big_tokenize_transform$new(verbose = FALSE)

#---------------
# file splitter:
#---------------

# fs$big_text_splitter(input_path_file = "input.txt",
#                      output_path_folder = "/folder/output/",
#                      end_query = "endword", batches = 5,
#                      trimmed_line = FALSE)

6
ReneeBK posted on 2017-7-7 00:47:14
Examples
library(textTinyR)

# bc = bytes_converter(input_path_file = 'some_file.txt', unit = "MB")

7
ReneeBK posted on 2017-7-7 00:47:54
Examples
library(textTinyR)

sentence1 = 'this is one sentence'
sentence2 = 'this is a similar sentence'

# cosine distance between the two sentences
cds = cosine_distance(sentence1, sentence2)

8
ReneeBK posted on 2017-7-7 00:48:30
Examples
library(textTinyR)

# random 10 x 10 dense matrix of zeros and ones
tmp = matrix(sample(0:1, 100, replace = TRUE), 10, 10)

# convert the dense matrix to a sparse matrix
sp_mat = dense_2sparse(tmp)

9
ReneeBK posted on 2017-7-7 00:48:55
Examples
library(textTinyR)

word1 = 'one_word'
word2 = 'two_words'

# dice distance between the two words, based on character 2-grams
dts = dice_distance(word1, word2, n_grams = 2)

10
ReneeBK posted on 2017-7-7 00:49:49
Examples
library(textTinyR)

word1 = 'one_word'
word2 = 'two_words'

# Levenshtein (edit) distance between the two words
lvs = levenshtein_distance(word1, word2)
