Original poster: oliyiyi

Eight Things an R User Will Find Frustrating When Trying to Learn Python

When speaking with clients and other R users at events such as LondonR and EARL, I’ve noticed an increasing trend in people looking to learn some Python as the next step in their data science journey. At Mango most of our consultants are pretty happy using either language, but as an R user of 12 or so years I’ve only ever dabbled with Python. Recently, however, I found myself having to learn quickly, so I thought I’d share some of my observations.

Before you stop reading, I should say that I am fully aware that there are many blog posts covering the high-level pros and cons of each language. For this post I thought I’d get down to the nitty-gritty. What does an R user really experience when trying to pick up Python? In particular, what does an R user who comes from a statistics background experience?

Personally I found eight (I wanted 10 but Python is too good) and here they are:

  • Lack of Hadley. So there is a Wes, but there is a lot of duplication in functionality between packages. To start with, you import statistics and find the mean function, only to find it has been rewritten for pandas. Later you find that everyone has their own idea of the best way to implement cross-validation. All very confusing when you start out. This brings me on to:
  • Plotting. I had heard a lot of good things about matplotlib and seaborn, but ggplot2 is streets ahead (IMHO). I would even go as far as to say that ggplot2 has a shallower learning curve.
  • IDEs. Hats off to RStudio for changing the R world when it comes to IDEs. I remember a time before RStudio when the R GUI, StatET and Tinn-R were the norm. How things have improved. Sadly, Python is not quite there yet. As an RStudio user I opted for Spyder. It’s OK, but the script editor needs some work. The integration in Jupyter Notebook seems much better when I chat with colleagues, but I’m just not a big fan of notebooks.
  • Namespaces. I’ve lost count of the number of times I’ve told trainees on an intro to R course that masking very rarely trips you up as a user (unless you’re building packages, it really doesn’t). Let’s just say that in Python you have to be careful. Bring too much in and you’ll overwrite your own objects and cause chaos. This means you bring in things as and when you need them. Having to explicitly import OS utilities in order to change the working directory and so on is frustrating. That said, Python’s capabilities are a little better than R’s in this area.
  • Object Orientation. I’ve grown to love R’s flexible S3 classes, with lines like:

```
> x <- 5
> class(x) <- "just_made_this_up"
> x
[1] 5
attr(,"class")
[1] "just_made_this_up"
```


In Python I am never quite sure what methods exist for an object and when to just go functional. You also really have to know about classes to work with Python effectively, whereas a casual R user can get by without even knowing that R has a class system.
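
One way to answer "what methods exist for this object?" from the console is Python's built-in introspection. The sketch below is only illustrative: the class `JustMadeThisUp` and the helper `public_methods` are hypothetical names made up here to echo the S3 example, not part of any library.

```python
# Hypothetical class echoing the S3 example; not from any library.
class JustMadeThisUp:
    def __init__(self, value):
        self.value = value

    def describe(self):
        # A method is called on the object itself: x.describe()
        return f"just_made_this_up: {self.value}"


def public_methods(obj):
    """List callable attributes whose names do not start with an underscore."""
    return [name for name in dir(obj)
            if not name.startswith("_") and callable(getattr(obj, name))]


x = JustMadeThisUp(5)
methods = public_methods(x)        # methods available on x
list_methods = public_methods([])  # methods available on a built-in list

# Built-in functions such as len() are the "functional" route. They work by
# calling a special method (here __len__) if the object defines one.
has_len = hasattr(x, "__len__")
```

In an interactive session, `dir(x)` and `help(x)` give the same information without a helper; the helper just filters out the double-underscore names that tend to swamp the listing.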

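To make the namespaces point above concrete, here is a minimal sketch using only the standard library. It shows an import silently replacing a name you had already defined, and the explicit `import os` needed to inspect or change the working directory. The variable names are illustrative assumptions, not taken from the original post.

```python
import os
import tempfile

# An object of your own that happens to be called "mean".
mean = 3.2

# Importing the name directly rebinds "mean" without any warning,
# so the float defined above is no longer reachable under that name.
from statistics import mean
overwritten = callable(mean)

# Keeping the module prefix avoids the clash.
import statistics
mean_value = 3.2
average = statistics.mean([2, 4, 6])

# Working-directory helpers live in the os module (R: getwd() / setwd()).
start_dir = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    moved = os.path.samefile(os.getcwd(), tmp)
    os.chdir(start_dir)  # go back before the temporary directory is removed
```

The same rebinding happens on a larger scale with wildcard imports, which is one reason for the habit of bringing things in only as and when you need them.
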
  • Reliance on R. On my recent project I was using the best of the statistical capabilities in Python. First off, I should say that it’s basically all there (except for stepwise GLMs, for some bizarre reason). However, although I’ve always known that most of the statistical modelling capabilities in Python have been ported from R, the documentation is pretty lazy and most of it just points you at the R documentation. The example datasets are even the same! Speaking of the documentation:
  • Help documentation. I can only speak for the more popular packages in the two languages, but the R documentation is much more plentiful and generally contains a lot more examples.
  • Zero-based arrays. I couldn’t write a list without this coming up. I do love it when smug coders who have developed in other languages tell me that R is the exception here by indexing from 1. However, as a human being I count from 1 and this will always make more sense to me. Ending at n-1 is also confusing. Compare:

```
# R
x = seq(2, 10, by = 2)
x[1:3] # Select first 3 elements
# [1] 2 4 6
```

```
# Python
x = list(range(2, 11, 2))
x[0:3] # Select first 3 elements
# [2, 4, 6]
```

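The "ending at n-1" behaviour follows from Python slices being half-open: `x[start:stop]` includes position `start` and excludes position `stop`. A short sketch of what that convention buys you, reusing the list from the comparison above (the other variable names are illustrative):

```python
x = list(range(2, 11, 2))   # the values 2, 4, 6, 8, 10

first_three = x[0:3]        # positions 0, 1, 2; the stop index 3 is excluded
same_thing = x[:3]          # a missing start defaults to 0
rest = x[3:]                # picks up exactly where x[:3] stopped

# The length of a slice is stop - start, and adjacent slices never overlap.
slice_length = len(x[1:4])  # 4 - 1 = 3
rejoined = x[:3] + x[3:]    # equal to the original list

# Negative indices count from the end (in R, a negative index drops
# elements instead).
last = x[-1]
```

As a rule of thumb when translating, R's `x[1:3]` becomes Python's `x[0:3]`: shift the start down by one and leave the end as it is.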

What I was impressed by was how extensively the statistical capabilities in R have been ported to Python (I wasn’t expecting the mixed modelling or survival analysis capabilities to be anything like those in R, for example). However, as an existing R user there really is no point in switching to Python for statistics. The only benefit would be if you were using Python for, say, extensive web-scraping and you wanted to be consistent. If that’s your reason, though, then let me point you towards Chris Musselle’s blog post, “Integrating Python and R Part II – Executing R from Python and Vice Versa”. And don’t forget that you can also just use rvest.

So my advice would be: if you’re going to try to learn Python, don’t learn it with the intention of using it to build models. Learn it because it’s a more flexible all-round programming language and you have some heavy lifting to do. Just find something that’s hard to do in R and try using Python for that. Otherwise you’ll end up like me, writing a whingy blog post!

Bio: Andy Nicholls is Head of Data Science at Mango Solutions, as well as a well-rounded consultant and statistician.



2
evergreen5893 posted on 2017-2-25 08:56:21
Bump

3
h2h2 posted on 2017-2-25 10:12:58
Thanks for sharing

4
huiyujuanjuan posted on 2017-2-25 11:03:23
thanks for sharing.

5
fengyg (verified company account) posted on 2017-2-25 11:04:41
Having a look

6
albertwishedu posted on 2017-2-25 13:55:10

7
albertwishedu posted on 2017-2-25 13:55:28

8
ekscheng posted on 2017-2-25 14:45:14

9
ekscheng posted on 2017-2-25 14:46:25
Thanks for sharing

10
ekscheng posted on 2017-2-25 14:46:41
Worth trying
