Original poster: oliyiyi

R or Python? Consider learning both

The key to becoming a data science professional is understanding the underlying data science concepts and working to expand your programming toolbox as much as you can. Hence, one should understand when to use Python and when to pick R, rather than mastering just one language.


By Martijn Theuwissen, DataCamp.
rpy2 and rPython

The Python package rpy2 allows you to call R from within Python. So when you are working in Python, but there is an R package that you like for certain types of analysis, you can use rpy2 to bridge R and Python. If you are interested in the details of how this is done, you can check out a quick guide here. For R there is a similar package: rPython. rPython allows you to run Python code, make function calls, and assign and retrieve variables from R.

It should be noted, though, that such strategies can harm the readability of code, making it more difficult to communicate your code to others. Nonetheless, if you annotate and document your work well, both rpy2 and rPython can bridge the R and Python universes in your work environment.
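As a rough illustration of the rpy2 bridge described above, here is a minimal sketch. It assumes that both R and the rpy2 package are installed; the helper name r_mean is hypothetical and exists only for this example.

```python
def r_mean(values):
    """Compute the mean of `values` by calling R's mean() through rpy2."""
    # Imported lazily so that merely defining this helper does not
    # require R or rpy2 to be present on the machine.
    import rpy2.robjects as ro

    r_vector = ro.FloatVector(values)  # convert the Python list to an R numeric vector
    r_mean_fn = ro.r["mean"]           # look up R's built-in mean() function
    result = r_mean_fn(r_vector)       # R returns a vector of length one
    return result[0]
```

Calling r_mean([1.0, 2.0, 3.0, 4.0]) hands the list to R and brings the result back as a plain Python number. The same pattern applies to functions from R packages, which is where the bridge is most useful.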

Microsoft Azure

Broadly speaking, Microsoft Azure is a collection of cloud-based services revolving around data management tasks. One very interesting service related to our topic is the Microsoft Azure Machine Learning Studio. Microsoft Azure ML essentially makes it easier to manage machine learning projects with a graphical interface and prebuilt algorithms. However, the ML Studio also supports custom code. That code can be written in either R or Python and easily integrated into a project. So ML Studio is a space where both R and Python code can be implemented for a project, thereby mitigating the limitation of using only one or the other.

Dataiku

Dataiku Data Science Studio (DSS) is a product developed by Dataiku that simplifies a lot of data-related operations for companies trying to leverage their data. DSS has a projects feature that allows you to put Python and R code into one data analysis project. This feature creates an environment where someone who knows how to take advantage of both Python and R can implement them in one workflow to produce output that supports business activities or serves clients. A nice example to look at is Spatial Data Analytics, where packages from both R and Python can be integrated into the analysis to generate maps and analyze geographical data.

Where To Begin

In the previous sections we discussed why it’s relevant to learn both R and Python, and how new tools and technologies make it easier than ever to integrate both languages. To end, we want to address the question of where to begin.

Do you start with R or Python? If you have been working with R for a while, keep focusing on R but make sure to also get some Python skills in your toolbox. Similarly, if Python has been your tool of choice thus far, carry on with Python but learn the basics of R as well. However, if you are a complete novice, or even if you know a bit of one of the languages, you need to begin by asking yourself a couple of important questions.

What is your (academic) background?

Generally speaking, if your background is in something other than CS or engineering, R is more appropriate for you. So if you are interested in data science and your background is in a social science like psychology or political science, or a natural science like geography or biology, R is likely to be the way for you. However, if you are a CS person, or an engineer who works a lot with computers, you might prefer Python because it is a general purpose language. This is by no means a universal guideline, but if you are well versed in programming you might begin thinking of R as a limited programming language. The truth is, R was designed more like a tool with programming capabilities than a programming language. Regarding R as an open source alternative to Stata or SAS would be more appropriate here.

What are your needs?

If you mostly find yourself in an academic setting and you are in need of a tool for data analysis, R is the way to go. However, as a professional, Python would be a more likely contender for you simply because Python is more widely applied in the industry, though R is beginning to gain traction as well. Keep in mind that Python is a general purpose language that is also widely used in CS, engineering, and other disciplines, often as a complement or alternative to other programming languages or commercial software like MATLAB.

The best way to go about answering this question for yourself is to look around you and see what people are using. For example, if you are in some data science related course, ask your professor or instructor what they prefer for personal use, or what they think is more often used in their department or discipline.

What is your life plan?

Perhaps the most vital question is: where do you hope to end up in the next 5 or 10 years? We already established that data science is interdisciplinary, but data science also lives between the worlds of academia and business.

You might have already guessed that if you aspire to be an academic, beginning with R might be wiser, and if you want to work in the industry, you should begin with Python. By the time you get there, you will probably already know both anyway because of the intersection of R and Python users that we mentioned earlier.

If you plan to be in academia, you most likely won’t have to develop front-end programs, manage databases, or things of that sort. You will be doing research, in which case the main factor will be your preference for a particular language. Still, in data science it is becoming standard for statisticians, computer scientists, engineers, health researchers, and others to collaborate, so your personal preferences might not be as big a factor.

In the business world, data science teams often use R internally. However, when it comes to dealing with Big Data and data product development, the use of other technologies and programming languages such as Python is inevitable.

Taking the first step with DataCamp

DataCamp.com is a resource where you can learn the fundamentals of both Python and R for free, all within your browser. The courses consist of tutorial videos coupled with interactive exercises, so you will be learning by doing. This approach to learning is an effective way to take your first steps, so go ahead and begin your path towards becoming a data scientist!



Reply #1
oliyiyi posted on 2016-7-5 17:55:30
Learn Data Science, not Programming

With the outbreak of the Data Science revolution, a “war” between R enthusiasts and Python fanatics emerged. As a result, Python and R have been compared and contrasted a thousand times with detailed listings of their respective advantages and weaknesses (e.g. see our infographic for a refresher).

All this “warfare” led to the misconception that as a data science learner and enthusiast you should relentlessly focus on mastering either R or Python. This is bad advice. The actual key to becoming a data science professional is understanding the underlying data science concepts and working towards expanding your programming toolbox as much as you can. In other words, you should aim to learn the fundamentals of both R (see Introduction to R) and Python (see Introduction to Python for Data Science), one after the other.

So while it is certainly true that it is important to know the differences between R and Python, today it is more relevant to understand how you can leverage the knowledge of both based on your understanding of fundamental data science concepts. In this post we hope to explain to you why you should learn both, and give you some ideas about how to begin.

R vs Python, different brushes

Why are you choosing between R and Python in the first place?

Most likely you are in need of a tool that will allow you to perform data analysis, do statistical computations, and in general be a data science practitioner. So knowing R or Python is just one component of a bigger whole, which is comprised of knowledge from disciplines such as statistics, computer science, engineering, mathematics, and even graphic design. There is a reason why most data science curricula begin with a computing tool, but never end with one.

You should think of R and Python as two different brushes that will allow you to better express yourself in data science projects, and take advantage of their individual unique features. Surely the brushes have different grip and texture, but they are also very similar and will allow you to do so much more.

Do not choose between R & Python, learn both

In general, you shouldn’t be choosing between R and Python, but instead should be working towards having both in your toolbox. Investing your time into acquiring working knowledge of the two languages is worthwhile and practical for multiple reasons.

It strengthens your data science communication skills

Both R and Python have strong online communities, such as R-bloggers and python.org, dedicated to the respective languages. Looking at these sites, you can get the impression that the R and Python communities are completely disjoint. Needless to say, that is not the case.

In the real world of data science, Python and R users intersect a lot. So whichever industry or discipline you are interested in you are likely to run into projects done in both languages. To appreciate it all you need to have at least a basic understanding of both R and Python. Furthermore, by mastering both, you have the advantage and versatility of presenting and communicating effectively regardless of whether your audience is more comfortable with R or Python. So if you strive to become a data scientist, you will eventually need to be fairly familiar with both languages, and most likely a whole lot more.

It boosts your data science career

Knowing both R and Python will open doors to more job opportunities. Some companies, or departments within companies, might prefer Python, while others like to work with R. Imagine that you are a perfect fit for the job, except that you know R while the company requires you to know Python. Wouldn’t that suck? Generally, professionals from the industry encourage entrants to acquire as many tools and skills as they can. Most of the time you won’t be expected to be a complete master of R or Python, but displaying your commitment and passion by having learned at least some of both will only give you bonus points.

It is not that hard

You can think of Python and R as Spanish and Italian; they are both very different and very similar at the same time. They have a different syntax and have their own (technical) advantages, but at the same time they become very similar when appropriate Python packages are used (numpy, pandas, …). For example:

Suppose you want to load CSV files. In R you have a couple of options, one of which is read_csv(…) from the readr package. In Python you can use a function from the pandas library with the code pd.read_csv(…). Spot the difference!

Also, both Python and R are what are considered «scripting languages», which allow you to write snippets of executable code without a separate compile step, unlike Java for example. Next, they both have libraries and packages that you load into your environment to add functionality and do the tasks you need to complete. In addition, when working with both you will find that your workflow for the two languages is very similar, as are the documentation and communities surrounding them.
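To make the comparison concrete, here is a small sketch of the Python side. The column names and values are made up for illustration, and an in-memory string stands in for a CSV file on disk.

```python
import io

import pandas as pd

# Hypothetical data; io.StringIO lets us treat a string like an open file.
csv_text = "name,score\nana,90\nben,85\n"

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)            # number of rows and columns
print(df["score"].mean())  # average of the numeric column
```

With a real file you would pass its path instead, for example pd.read_csv("data.csv"), which reads almost the same as the R call.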

Where the R and Python Worlds Cross

In the past, one could argue that although R and Python are two very useful tools you could learn, it’s not true that one can paint on the same canvases with them. Today, thanks to new tools and technologies, that argument is becoming more and more invalid.

More and more, we see that the R and Python universes are starting to overlap, thereby mitigating the need to choose between the two languages. Let’s look at some examples of technologies and tools that let you leverage knowledge of both languages and thus cross the borders between the R and Python worlds.

Jupyter Notebooks

Let’s begin with the Jupyter project. The Jupyter Notebook is essentially a tool that allows you to write and share executable code in a variety of programming languages. The name «Ju-Pyt-er» is derived from Julia, Python, and R, which immediately tells you that these three languages are the focus, though today these online notebooks support something like 40 different languages.

When working on a project in Jupyter, you can document both Python and R in the same format and share these notebooks with your colleagues, clients, students, or whoever. Jupyter is not an IDE and doesn’t attempt to replace RStudio, or Rodeo for Python. What Jupyter does is give you a universal space where you can display your work in either language, and hence organize your work more efficiently when implementing both R and Python for a project.

If you are interested in how to use Jupyter with R, read these posts from Continuum Analytics and Revolution Analytics to get started, or see an example of what you can do with them. There is also a nice guide from quant-econ.net that you might find useful.
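To show what "the same format" means in practice, here is a minimal sketch that builds two bare-bones notebook documents as plain JSON, one for each language. It follows the general layout of the version 4 notebook format. The helper make_notebook is hypothetical, and the kernel names "python3" and "ir" are assumptions based on the usual defaults for the Python and R kernels, so they may differ on your installation.

```python
import json


def make_notebook(kernel_name, display_name, language, code):
    """Build a minimal notebook dict with one markdown cell and one code cell."""
    return {
        "cells": [
            {
                "cell_type": "markdown",
                "metadata": {},
                "source": f"Analysis written in {display_name}",
            },
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": code,
            },
        ],
        "metadata": {
            "kernelspec": {
                "name": kernel_name,
                "display_name": display_name,
                "language": language,
            }
        },
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# Assumed kernel names: "python3" for Python, "ir" for the R kernel.
py_nb = make_notebook("python3", "Python 3", "python", "print(sum([1, 2, 3]))")
r_nb = make_notebook("ir", "R", "R", "print(sum(c(1, 2, 3)))")

# Both documents share one layout; only the kernel and the cell source differ.
print(sorted(py_nb) == sorted(r_nb))
print(json.dumps(r_nb["metadata"], indent=1))
```

Writing either dict to a file ending in .ipynb with json.dump should give you something Jupyter can open, provided the matching kernel is installed.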



Reply #2
maxiaoan (employment verified) posted on 2016-7-6 14:04:53
Thanks, though it was a bit of a struggle to read.
