Original poster: oliyiyi

Moving from R to Python: The Libraries You Need to Know


Are you considering a move from R to Python? Here are the libraries you need to know, how they stack up against their R counterparts, and why you should learn them.




This post originally appeared on the Yhat blog. Yhat is a Brooklyn-based company whose goal is to make data science applicable for developers, data scientists, and businesses alike. Yhat provides a software platform for deploying and managing predictive algorithms as REST APIs, while eliminating the painful engineering obstacles associated with production environments, such as testing, versioning, scaling, and security.


Why the switch?


One of my favorite parts of machine learning in Python is that it got the benefit of observing the R community and then emulating the best parts of it. I'm a big believer that a language is only as helpful as its libraries. So in this post I'm going to go over some critical packages that I use almost every time I work in R, and their counterpart(s) in Python.

glm, knn, randomForest, e1071 -> scikit-learn


One thing that is both a blessing and a curse in R is that the machine learning algorithms are generally segmented by package. Instead of having a single library (or a small set of libraries) that implements the common algorithms, each algorithm gets its own package. It's sort of nice because you can find very esoteric, cutting-edge implementations of algorithms, but it can be a pain for day-to-day use, where you might be switching between algorithms. This pain is something that Python's scikit-learn solves really well. scikit-learn provides a common set of ML algorithms all under the same API. It makes switching between, say, LogisticRegression and GradientBoostingClassifier a one-liner.
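To make the shared-API point concrete, here is a minimal sketch. The synthetic dataset, the variable names, and the choice of the two estimators are assumptions for illustration and do not come from the original post; GradientBoostingClassifier is scikit-learn's class name for gradient boosting on classification problems.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data, assumed purely for illustration.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Both estimators expose the same fit / predict / score methods,
# so swapping one for the other only changes the constructor line.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out data

for name, accuracy in scores.items():
    print(f"{name}: accuracy = {accuracy:.3f}")
```

The loop body never mentions which algorithm it is training, which is roughly what you give up in R when each algorithm lives in its own package with its own calling conventions.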

reshape/reshape2, plyr/dplyr -> pandas


This was actually the subject of one of our first posts. pandas took the best parts of data munging in R and turned them into a Python package. This includes its own implementation of a data frame, along with ways to modify and restructure it. Basically, it took the best parts of reshape/reshape2 and plyr/dplyr and Pythonified them!
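As a rough illustration of how the R verbs map onto pandas, the sketch below reshapes and summarizes a tiny data frame. The data and column names are made up for the example, and the R function names in the comments are the approximate analogues rather than exact equivalents.

```python
import pandas as pd

# Toy data, assumed for illustration only.
wide = pd.DataFrame({
    "city": ["NYC", "NYC", "LA", "LA"],
    "year": [2015, 2016, 2015, 2016],
    "sales": [10, 12, 7, 9],
    "returns": [1, 2, 1, 3],
})

# Roughly reshape2::melt: go from wide to long format.
long_df = wide.melt(
    id_vars=["city", "year"], var_name="metric", value_name="value"
)

# Roughly reshape2::dcast: go from long back to wide.
back = long_df.pivot_table(
    index=["city", "year"], columns="metric", values="value"
).reset_index()

# Roughly dplyr's group_by() followed by summarise().
totals = wide.groupby("city", as_index=False).agg(total_sales=("sales", "sum"))

print(long_df)
print(back)
print(totals)
```

Note that pivot_table aggregates with the mean by default, so the round-tripped values in back come out as floats.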

ggplot2 -> ggplot + seaborn + bokeh


One thing that R still does better than Python is plotting. Hands down, R is better in just about every facet. Even so, Python plotting has matured, though it's a fractured community. If you like the ggplot-style syntax, then look no further than Yhat's own ggplot. If you're after highly statistical and technical plots, then reach for seaborn. And if you're in the market for some super slick, great-looking interactive plots, then try out bokeh.

stringr -> nothing


String manipulation in "base R" is nearly as unintuitive as it is silly. Any time I'm working with strings in R I do 2 things (in order):

  • briefly nod in appreciation to New Zealand for producing Hadley Wickham
  • load stringr with library(stringr)


Much obliged, New Zealand

stringr is an absolute lifesaver. It's well written, performant (at least I think so), and easy to install (don't overlook this last item: if people can't install your software, there's no sense in making it).

OK, so the stringr appreciation monologue is complete. The good news for you is that Python is so great for string manipulation that you don't really need a string library! It has a fantastic built-in regular expressions library, re, a rich set of methods on the built-in str type, and a built-in helper module appropriately called string. So lucky for you, Python comes with all string-related batteries included!
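For readers coming from stringr, here is a small sketch of the built-in equivalents. The sample string and variable names are assumptions for illustration, and the stringr functions named in the comments are approximate counterparts.

```python
import re

raw = "  Hadley Wickham, New Zealand  "

trimmed = raw.strip()                              # like str_trim()
upper = trimmed.upper()                            # like str_to_upper()
parts = [p.strip() for p in trimmed.split(",")]    # like str_split()
has_nz = "New Zealand" in trimmed                  # like str_detect() with a fixed pattern
replaced = re.sub(r"\s+", "_", trimmed)            # like str_replace_all()
words = re.findall(r"[A-Z][a-z]+", trimmed)        # like str_extract_all()

print(trimmed)
print(upper)
print(parts)
print(has_nz)
print(replaced)
print(words)
```

Plain str methods cover trimming, case changes, splitting, and fixed-substring checks; re is only needed once a pattern is involved.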

RStudio -> Rodeo


To many users, RStudio is synonymous with R. And why not? It's a great IDE for data analysis in R. Historically speaking, there haven't been a lot of comparable options for Python. Of course, this is no longer the case. We released the very first version of Rodeo just over a year ago and released version 2.0 for Windows, OS X, and Linux about a month ago.

"Ever since we've used RStudio, we've been looking for an IDE like it for Python. We went through IDEs such as Sublime Text and Spyder, none of which suited our likings. We searched and found Rodeo and couldn't have been more pleased with the IDE." -Stephen Hsu, University of California, Berkeley


Download Rodeo!

Knitr -> Jupyter


Knitr is a great way to create reproducible and highly visual analyses using R. It's been a staple in RStudio for a while now. In the Python world, the most analogous package is Jupyter. Jupyter notebooks provide an interactive environment for programming in Python (and other languages) that focuses on reproducibility and visualization, and there is even a kernel for R!

sqldf -> pandasql


sqldf is a great way for SQL users to comfortably manipulate data in R. I myself used it when I first started learning R. Way back when, Yhat actually built a similar package for Python called pandasql. Same concept: write SQL queries against your data frames, get data frames back! Fast-forward 3 years and pandasql has over 256 stars on GitHub :). Not bad for a library with only 358 lines of code!





lzguo568, posted 2017-2-25 09:19:31:
good book

h2h2, posted 2017-2-25 10:12:08:
Thanks for sharing

fengyg, posted 2017-2-25 11:05:09:
Taking a look

yangke74, posted 2017-2-25 12:19:43:
Good book, showing my support
