Original poster: 幻雪风扬

[Selected Posts] A Complete Learning Path for Data Science, Python Edition

Original poster
幻雪风扬 posted on 2015-1-16 11:19:44

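From novice to Kaggler: a complete learning path for data science, Python edition. There are no shortcuts; you have to work through it step by step: learn the Python basics, learn regular expressions, learn NumPy, SciPy, Matplotlib and Pandas, learn some data visualization, learn Scikit-learn and machine learning, practice and practice again, then deep learning. http://t.cn/RZ92R5G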





Keywords: python, data science, scikit-learn, Matplotlib, Python basics


2
hanszhu posted on 2015-2-10 21:45:30
Step 0: Warming up

Before starting your journey, the first question to answer is:

Why use Python?

or

How would Python be useful?

Watch the first 30 minutes of this talk by Jeremy, founder of DataRobot, at PyCon Ukraine 2014 to get an idea of how useful Python can be.


Step 1: Setting up your machine

Now that you have made up your mind, it is time to set up your machine. The easiest way to proceed is to just download Anaconda from Continuum.io. It comes packaged with most of the things you will ever need. The major downside of taking this route is that you will need to wait for Continuum to update their packages, even when an update to the underlying libraries is already available. If you are a beginner, that should hardly matter.

If you face any challenges in installing, you can find more detailed instructions for various OS here


Step 2: Learn the basics of Python language

You should start by understanding the basics of the language, its libraries and data structures. The Python track from Codecademy is one of the best places to start your journey. By the end of this course, you should be comfortable writing small scripts in Python and should also understand classes and objects.

Specifically learn: Lists, Tuples, Dictionaries, List comprehensions, Dictionary comprehensions
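
As a quick illustration of the structures listed above, here is a minimal sketch. All names and values are made up for this example and do not come from any of the linked courses.

```python
# Illustrative values only; none of these names come from the linked courses.
scores = [72, 88, 95, 61]            # list: ordered and mutable
point = (3, 4)                       # tuple: ordered and immutable
ages = {"ann": 31, "bob": 27}        # dictionary: key -> value mapping

# List comprehension: keep only the scores of 70 or more.
passed = [s for s in scores if s >= 70]

# Dictionary comprehension: map each name to the age next year.
next_year = {name: age + 1 for name, age in ages.items()}

# Tuples unpack directly into variables.
x, y = point

print(passed)      # [72, 88, 95]
print(next_year)   # {'ann': 32, 'bob': 28}
print(x + y)       # 7
```

Comprehensions are usually preferred over an explicit loop with `append` when the result is a simple filter or transformation of an existing collection.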

Assignment: Solve the Python tutorial questions on HackerRank. These should get your brain thinking about Python scripting.

Alternate resources: If interactive coding is not your style of learning, you can also look at The Google Class for Python. It is a two-day class series and also covers some of the parts discussed later.


Step 3: Learn Regular Expressions in Python

You will need to use them a lot for data cleansing, especially if you are working with text data. The best way to learn regular expressions is to go through the Google class and keep this cheat sheet handy.
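
To give a feel for the kind of cleansing meant here, the following is a small sketch using the standard `re` module. The record format and the sample lines are assumptions invented for this example, not part of the Google class or the baby names exercise.

```python
import re

# Hypothetical messy records; the format is made up for this sketch.
raw_lines = [
    "  Alice ,  2015-01-16 , score: 91 ",
    "BOB,2015-02-10,score:78",
    "carol , 2015-09-14 , score : 85",
    "not a valid record",
]

# Optional whitespace around separators; named groups capture each field.
pattern = re.compile(
    r"^\s*(?P<name>[A-Za-z]+)\s*,\s*"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s*,\s*"
    r"score\s*:\s*(?P<score>\d+)\s*$"
)

records = []
for line in raw_lines:
    match = pattern.match(line)
    if match:  # lines that do not fit the expected shape are skipped
        records.append((
            match.group("name").title(),   # normalize capitalization
            match.group("date"),
            int(match.group("score")),     # convert digits to a number
        ))

print(records)
```

Compiling the pattern once and using named groups keeps the extraction readable when the same expression is applied to many lines.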

Assignment: Do the baby names exercise

If you still need more practice, follow this tutorial for text cleaning. It will challenge you on various steps involved in data wrangling.

Step 4: Learn Scientific libraries in Python – NumPy, SciPy, Matplotlib and Pandas

This is where the fun begins! Here is a brief introduction to various libraries. Let’s start practicing some common operations.

  • Practice the NumPy tutorial thoroughly, especially NumPy arrays. This will form a good foundation for things to come.
  • Next, look at the SciPy tutorials. Go through the introduction and the basics, and do the remaining ones based on your needs.
  • If you guessed Matplotlib tutorials next, you are wrong! They are too comprehensive for our needs here. Instead, look at this ipython notebook till Line 68 (i.e. till animations).
  • Finally, let us look at Pandas. Pandas provides DataFrame functionality (like R) for Python. This is also where you should spend a good amount of time practicing. Pandas will become your most effective tool for all mid-size data analysis. Start with a short introduction, 10 minutes to pandas. Then move on to a more detailed tutorial on pandas.
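
For a first taste of NumPy arrays before working through the tutorial, here is a minimal sketch. The numbers are made up, and it assumes NumPy is installed (it ships with Anaconda).

```python
import numpy as np

# Made-up measurements: 3 samples (rows) by 2 features (columns).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

print(data.shape)                    # (3, 2)

# Vectorized arithmetic applies to every element at once, with no Python loop.
col_means = data.mean(axis=0)        # mean of each column
centered = data - col_means          # broadcasting subtracts the means row-wise

# A boolean mask selects the rows that satisfy a condition.
large_rows = data[data[:, 1] > 15]

print(col_means.tolist())            # [2.0, 20.0]
print(centered[:, 0].tolist())       # [-1.0, 0.0, 1.0]
print(large_rows.shape)              # (2, 2)
```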

You can also look at Exploratory Data Analysis with Pandas and Data munging with Pandas
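
The sketch below shows what a tiny round of data munging and exploration can look like in pandas. The table, column names and values are hypothetical, chosen only to keep the example self-contained.

```python
import pandas as pd

# Made-up sales table; one price is missing on purpose.
df = pd.DataFrame({
    "store": ["north", "north", "south", "south"],
    "units": [3, 5, 2, 4],
    "price": [10.0, None, 8.0, 12.0],
})

# Munging: fill the missing price with the column median, then derive a column.
df["price"] = df["price"].fillna(df["price"].median())
df["revenue"] = df["units"] * df["price"]

# Exploration: summary statistics and a per-store aggregate.
print(df.describe())
revenue_by_store = df.groupby("store")["revenue"].sum()
print(revenue_by_store)
```

Filling with the median is only one of several possible strategies for missing values; which one is appropriate depends on the data.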

Additional Resources:

  • If you need a book on Pandas and NumPy, try “Python for Data Analysis” by Wes McKinney
  • There are a lot of tutorials as part of Pandas documentation. You can have a look at them here

Assignment: Solve this assignment from the CS109 course at Harvard.


Step 5: Effective Data Visualization

Go through this lecture from CS109. You can ignore the initial 2 minutes, but what follows after that is awesome! Follow this lecture up with this assignment.


Step 6: Learn Scikit-learn and Machine Learning

Now we come to the meat of this entire process. Scikit-learn is the most useful library in Python for machine learning. Here is a brief overview of the library. Go through lecture 10 to lecture 18 of the CS109 course from Harvard. You will go through an overview of machine learning, supervised learning algorithms such as regression, decision trees and ensemble modeling, and unsupervised learning algorithms such as clustering. Follow each lecture with the assignment from that lecture.
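
To make the supervised versus unsupervised distinction concrete, here is a short sketch using scikit-learn's bundled iris dataset. The choice of dataset, the decision tree depth and the number of clusters are assumptions made for this example; they are not taken from the CS109 lectures.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Supervised: hold out a test set, fit on the rest, score on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
accuracy = accuracy_score(y_test, tree.predict(X_test))
print(f"decision tree test accuracy: {accuracy:.2f}")

# Unsupervised: group the same samples into 3 clusters without using y.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)
sizes = sorted(int((cluster_labels == k).sum()) for k in range(3))
print("cluster sizes:", sizes)
```

Most scikit-learn estimators follow the same `fit` and `predict` pattern, so swapping in a different model usually means changing only the line that constructs it.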


Additional Resources:

Assignment: Try out this challenge on Kaggle


Step 7: Practice, practice and practice

Congratulations, you made it!

You now have all the technical skills you need. It is a matter of practice, and what better place to practice than competing with fellow data scientists on Kaggle? Go, dive into one of the live competitions currently running on Kaggle and give all that you have learnt a try!


Step 8: Deep Learning

Now that you have learnt most machine learning techniques, it is time to give Deep Learning a shot. There is a good chance that you already know what Deep Learning is, but if you still need a brief intro, here it is.

I am new to deep learning myself, so please take these suggestions with a pinch of salt. The most comprehensive resource is deeplearning.net. You will find everything there: lectures, datasets, challenges, tutorials. You can also give the course from Geoff Hinton a try to understand the basics of neural networks.


P.S. In case you need to use Big Data libraries, give Pydoop and PyMongo a try. They are not included here because the Big Data learning path is an entire topic in itself.


3
愚人山 posted on 2015-9-14 13:48:08
thanks mate

4
epath posted on 2015-9-16 12:19:47
It's all in English!!!

5
benxiaohai415 posted on 2015-11-30 09:09:18
Please translate it, I can't read it.

6
techie01 posted on 2015-12-5 22:17:21
Thank you, OP. You are a good person.

7
liuhanhong posted on 2016-1-3 23:41:38
Thanks, OP, but why is it all in English? Just looking at English gives me a headache.

8
zhang_ch posted on 2016-1-4 14:02:40
Thanks, OP~

9
adjfa posted on 2016-1-4 22:19:08
good one, thanks a lot!

10
bjbluefish posted on 2016-1-8 16:27:25
Learned something.
