Original poster: oliyiyi

Top 10 Data Science Resources on Github

The top 10 data science projects on Github consist chiefly of tutorials and educational resources for learning and doing data science. Have a look at the resources others are using and learning from.

By Matthew Mayo, KDnuggets.

In our latest inspection of Github repositories, we focus on "data science" projects. Unlike the other searches we have performed over the past several months, nearly all of the repositories that show up (listed by number of stars* in descending order) are resources for learning data science, as opposed to tools for doing it. As such, this is much less a software listing than a collection of tutorials and educational resources. There are, however, a few software surprises in here as well, such as a data science-oriented IDE and a great notebook-related project.
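For readers who want to rerun a similar ranking themselves, here is a minimal sketch using GitHub's public REST search endpoint, which accepts `sort=stars` and `order=desc`. It assumes that a phrase search for "data science" approximates the article's query (the exact search used is not stated), and the helper names `build_search_url` and `top_repos` are hypothetical. Results will reflect current star counts, not the March 2016 snapshot below.

```python
import json
import urllib.parse
import urllib.request

# GitHub REST API endpoint for searching repositories.
SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(query: str, per_page: int = 10) -> str:
    """Build a repository search URL sorted by star count, highest first.

    Hypothetical helper; the query string is an assumption, not the article's.
    """
    params = {"q": query, "sort": "stars", "order": "desc", "per_page": per_page}
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def top_repos(query: str, n: int = 10) -> list[tuple[str, int, int]]:
    """Return (full_name, stars, forks) for the top-n repositories by stars.

    Makes a live, unauthenticated request, so GitHub rate limits apply.
    """
    req = urllib.request.Request(
        build_search_url(query, per_page=n),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # Each search result item carries star and fork counts.
    return [
        (item["full_name"], item["stargazers_count"], item["forks_count"])
        for item in data["items"]
    ]
```

Calling `top_repos('"data science"')` and printing each tuple with its rank yields a list in the same "stars, forks" shape used for the entries below.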

As with our previous Github Top 10 lists, we include the standard informational note: according to a recent KDnuggets survey, 73% of data scientists used open source tools in the 12 months prior to the survey. While the following repositories focus mainly on learning resources, our previous lists were software-heavy; in any case, open source learning materials are the new black, and a main source of learning for data scientists these days.


Image: Research Hubs

1. Data Science iPython Notebooks

Stars: 5169, Forks: 902

Donne Martin has put together a great (and apparently wildly popular) collection of IPython notebook tutorials. The repo describes itself best:

Continually updated data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

2. The Open Source Data Science Masters

Stars: 4338, Forks: 2624

This is the official repository for the curriculum of the Open Source Data Science Masters, the brainchild of data scientist Clare Corthell, designed as an open source alternative to formal data science education. Accordingly, the repo collects materials for anyone pursuing this alternative route to data science mastery.

The open-source curriculum for learning Data Science. Foundational in both theory and technologies, the OSDSM breaks down the core competencies necessary to making use of data.

3. Rodeo

Stars: 2540, Forks: 229

Rodeo is a data science IDE developed by yhat and currently at version 1.0. Rodeo's philosophy builds on IPython notebooks:

We originally built Rodeo because we like the Jupyter Notebook for presentations and tutorials, but thought it was a bit clunky for daily work. We wanted a one-stop IDE for Python with a good text editor, a simple plot window and a terminal with autocomplete.

4. Data Science Blogs

Stars: 2307, Forks: 259

This is a simple but extensive list of data science blogs in alphabetical order. You'll find all the big blogs in here (including KDnuggets, of course), but also many smaller, off-the-beaten-path selections. The repo appears to be updated often, with the most recent updates made only hours before this writing.

5. Awesome Data Science

Stars: 2142, Forks: 529

This is another entry in the "Awesome..." brand of curated lists. Straight to the point:

An open source Data Science repository to learn and apply towards solving real world problems.

As with other Awesome lists (what, exactly, makes these lists more "awesome" than others?), its countless resources are broken down into several categories.

6. Data Science Specialization

Stars: 1986, Forks: 20800

This is a collection of the resources for the Johns Hopkins Data Science Specialization on Coursera. A wildly popular course sequence with names like Roger Peng, Jeff Leek, and Brian Caffo attached to it, it has taught data science and R to thousands of learners. The resources used in all of its courses are collected here.

7. Data Science Specialization Community Site

Stars: 1153, Forks: 2307

This is a community-curated content companion site for the Johns Hopkins Data Science Specialization on Coursera.

A couple students have created quality content around the subjects we discuss, and many of these materials are so good we feel that they should be shared with all of our students. This site is meant to serve as a central directory for community created content.

If you have a resource which would be useful to others in the program, a pull request can be submitted in order to have it included in the curated knowledge pages list.

8. Spark Notebook

Stars: 1087, Forks: 258

Andy Petrella forked scala-notebook and refactored it for massive dataset analysis with Apache Spark, and this is the result. From the repo:

The tool allows performing reproducible analysis with Scala, Apache Spark and more.

This is achieved through an interactive web-based editor that can combine Scala code, SQL queries, Markup or even JavaScript in a collaborative manner.

9. Learn Data Science

Stars: 993, Forks: 541

Nitin Borwankar has put together another compilation of resources for learning data science. It is a collection of IPython notebooks on machine learning, specifically covering:

  • Linear Regression
  • Logistic Regression
  • Random Forests
  • K-Means Clustering

It appears to be a beginner's guide to fundamental concepts in machine learning, but a well-crafted one.

10. Data Science at the Command Line

Stars: 948, Forks: 260

This repository contains the virtual machine, data, scripts, and custom command-line tools used in the book Data Science at the Command Line.

Included is the Data Science Toolbox, a virtual environment for data science. Author Jeroen Janssens' brand of data science includes the interplay of Python, R, numerous packages, and command line utilities. If you have read the book, or reading these few lines has captured your interest, give the repo a look.

* As viewed 6:00 PM EST, March 21, 2016.
Reply from hjtoh, posted 2016-7-3 08:40:31 (quoting oliyiyi's post of 2016-7-3 08:01):
Thanks for sharing.