Original poster: oliyiyi

Python vs R: 4 Implementations of Same Machine Learning Technique

This post actually covers two R versions (standard and improved), a Python version, and a Perl version of a new machine learning technique recently published here. We asked for help translating the original Perl script to Python and R, and decided to work with Naveenkumar Ramaraju, who is currently pursuing a master's in Data Science at Indiana University. The Python and R versions are his.

We believe that this code comparison and translation will be very valuable to anyone learning Python or R with the purpose of applying it to data science and machine learning.



The code

The source code is easy to read and has deliberately been made longer than needed to provide enough detail, avoid complicated iterations, and facilitate maintenance. The main output file is hdt-out2.txt. The input data set is HDT-data3.txt. You need to read this article (see section 4 after clicking; it has been updated) to check out what the code is trying to accomplish. In short, it is an algorithm to classify blog posts as popular or not based on extracted features (mostly keywords in the title).
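As a rough illustration of the idea in the last sentence (classifying posts from title keywords), here is a minimal Python sketch. It is not the code linked from this post: the keyword list, the sample titles and page views, the threshold, and all function names are made up for illustration, and the actual technique is the one described in the linked article.

```python
from collections import defaultdict

# Hypothetical keyword features; the real feature list is in the linked article.
KEYWORDS = ("python", "machine learning", "data")


def title_to_node(title):
    """Map a title to a tuple of 0/1 flags, one flag per keyword."""
    lowered = title.lower()
    return tuple(int(keyword in lowered) for keyword in KEYWORDS)


def fit(posts):
    """Average the page views of all posts that share the same flag tuple."""
    totals = defaultdict(lambda: [0, 0.0])  # node -> [post count, view sum]
    for title, views in posts:
        entry = totals[title_to_node(title)]
        entry[0] += 1
        entry[1] += views
    return {node: view_sum / count for node, (count, view_sum) in totals.items()}


def is_popular(title, node_means, threshold):
    """Call a title popular if its node was seen and averages >= threshold."""
    node = title_to_node(title)
    return node in node_means and node_means[node] >= threshold


# Made-up training data: (title, page views).
posts = [
    ("Python for data science", 900),
    ("Data cleaning with Python", 700),
    ("Office party photos", 100),
]
node_means = fit(posts)
print(is_popular("Learning Python with data", node_means, threshold=500))
print(is_popular("Holiday schedule", node_means, threshold=500))
```

Grouping posts by their flag tuple keeps the sketch close to the dictionary-based style mentioned for the Perl, Python, and R scripts, but the scoring rule here (a plain mean against a fixed threshold) is only an assumption.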

The code was originally written in Perl and translated to Python and R by Naveenkumar Ramaraju. Perl and Python run faster than R. Click on the relevant link below to access the source code, available as a text file.

For those learning Python or R, this is a great opportunity.

Note regarding the R implementation

Required library: hash (R has no built-in hash or dictionary type without imports). You can use either of the script files below.

  • The standard version is a literal translation of the Perl code, keeping the same variable names as far as possible.
  • The improved version uses functions, more data frames, and a more R-like approach to reduce running time (about 30% faster) and line count. Variable names differ from the Perl version, and the output file uses a comma (,) as the delimiter between IDs.
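For readers comparing the three languages: where Perl uses a hash and R needs the hash package, Python can use its built-in dict. The sketch below is only an illustration of that correspondence. The node labels, post IDs, and tab-separated line layout are assumptions, not taken from the actual scripts; only the comma between IDs follows the note above.

```python
# Hypothetical (node, post ID) pairs; real keys and IDs come from the input file.
pairs = [("N1", 101), ("N2", 102), ("N1", 103)]

# A dict plays the role of the Perl hash (or of the R 'hash' package object).
ids_by_node = {}
for node, post_id in pairs:
    ids_by_node.setdefault(node, []).append(post_id)

# One line per node, with IDs joined by commas as in the improved R version.
lines = [
    node + "\t" + ",".join(str(post_id) for post_id in ids)
    for node, ids in ids_by_node.items()
]
print("\n".join(lines))
```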

Instructions to run: Place the R file and HDT-data3.txt (the input file) in the root folder of your R environment. Execute the .R file in RStudio or from the command line:

> Rscript HDT_improved.R

R is known to be slow at text parsing. The code could be optimized further with data frames if all inputs were either within double quotes or unquoted.
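The quoting issue mentioned above (some fields quoted, others not) can be seen in a small Python sketch using the standard csv module. The sample rows and the comma-delimited layout are assumptions made for illustration; they are not the actual format of HDT-data3.txt.

```python
import csv
import io

# Made-up sample: one quoted field containing a comma, the rest unquoted.
sample = (
    "id,title,pageviews\n"
    '1,"Python vs R, compared",900\n'
    "2,Plain title,100\n"
)

# csv.reader accepts quoted and unquoted fields in the same file.
rows = list(csv.reader(io.StringIO(sample)))
for row in rows[1:]:
    print(row)
```

A naive split on commas would break the first title in two, which is why inconsistent quoting forces a slower, more careful parser.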



#2 kkwei, posted 2017-2-25 09:16:22
niubility


#4 fengyg, posted 2017-2-25 11:16:21
kankan


#10 ekscheng, posted 2017-2-25 14:44:19
Thanks for sharing
