Original poster: oliyiyi

The 3 Biggest Mistakes on Learning Data Science

oliyiyi posted on 2019-5-6 23:55:20

Hello! It’s me again. I’ve discussed parts of what I’m going to mention here in other articles, but now I want to give a few directions on what’s not data science and how not to learn it.

So let’s start with the basics.


What is Data Science?

Data science is not just knowing some programming languages, math and statistics, and having “domain knowledge”.

The time has come. We’ve created a new field, or something like it. There are a lot of things to say and study in this field. The name doesn’t matter; maybe “data science” is just a temporary name for a bigger field. But the scientific study of data, getting insights from it and then being able to predict something, is the present and future of the world.

I’ll focus on business-related definitions and proposals for data science. Maybe these apply to the field as a whole, but the ideas in this article are about data science for business.

I’m going to propose three things:

  • Data science is a science
  • There are awful ways to learn data science
  • Using well-made cheat sheets can help you do data science in a systematic way

Data Science is a science

I know this may be controversial for some people, but stick with me. What I want to say here is that data science is of course linked to the business, but in the end it is a science, or in the process of becoming one.

I defined data science before as:

[…] The resolution to Business / Organizations problems through mathematics, programming and the scientific method that involves the creation of hypotheses, experiments and tests through the analysis of data and the generation of predictive models. It is responsible for transforming these problems into well-posed questions that can also respond to the initial hypothesis in a creative way. It must also include the effective communication of the results obtained and how the solution adds value to the Business / Organization.

I’m stating here a description and definition of data science as a science. I think describing data science as a science is very useful because, if that’s the case, every project in this field should be at least:

  • Reproducible: Necessary to make it easy to test others’ work and analysis.
  • Fallible: Data science and science don’t look for the truth, they look for knowledge, so every project can be replaced or improved in the future. No solution is the ultimate solution.
  • Collaborative: Data scientists don’t exist alone. They need a team, and this team makes it possible to develop intelligent solutions. Collaboration is a big part of science, and data science should not be an exception.
  • Creative: Most of what data scientists do is new research, new approaches or new takes on different solutions, so their environment should be very creative and easy to work in. Creativity is crucial in science; it is the only way we can find solutions to hard and complex problems.
  • Compliant with regulations: Right now there are a lot of regulations in science and not that many in data science, but there will be more in the future. It is important that the projects we build take these different types of regulations into account, so that we can create clean and acceptable solutions to the problems.
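
The “Reproducible” point can be made concrete with a small sketch. It assumes Python; `run_experiment` is a hypothetical stand-in for an analysis step and does not come from the original article. The idea is only that fixing the random seed lets someone else re-run the work and get the same numbers.

```python
import random
import statistics


def run_experiment(seed: int, n: int = 1000) -> float:
    """Hypothetical analysis step: draw a random sample and summarize it."""
    rng = random.Random(seed)  # local generator, so global state cannot leak in
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.mean(sample)


first = run_experiment(seed=42)
second = run_experiment(seed=42)
other = run_experiment(seed=7)

print(first == second)  # same seed, so the two runs agree exactly
print(first == other)   # a different seed gives a different sample
```

In a real project the seed would typically live next to the code and data versions, so that a reviewer can repeat the exact run.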

If we don’t follow those basic principles it would be very hard to conduct a proper data science practice. Data science should be implemented in a way that enables decision making to follow a systematic process. But more on that later.


How NOT to Learn Data Science. The big 3.


If you are here, you are probably learning data science right now, or you have taken some MOOCs or even classes in the field. I’m not going to speak badly here about platforms or courses; I think we can learn something even from the worst courses.


1. Seeing and seeing without practicing

If you are taking a class on anything related to data science, like math, statistics or programming, and you are just sitting there listening, well, you are wasting your time.

Data science needs practice. Everything you learn, practice it and try it out, even if the professor doesn’t tell you to. This is fundamental to really comprehending things, and when you are working in the field you will be doing a lot of different practical stuff.

Good knowledge of statistics, math and Python won’t by itself make you a successful data scientist. You need more: you need to master your craft and be able to use these tools to solve business problems. So if you are learning something new and you want to understand it for real, find a scenario where you can apply it or play with it.
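
As one illustration of “playing with it”, here is a small sketch, assuming Python. It takes a statement from a statistics class (the standard deviation of a sample mean shrinks like sigma / sqrt(n)) and checks it by simulation. The function name, sample sizes and number of trials are made up for this example.

```python
import random
import statistics


def spread_of_sample_means(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Empirical standard deviation of the mean of n standard-normal draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


for n in (4, 16, 64):
    theory = 1 / n ** 0.5  # sigma / sqrt(n) with sigma = 1
    print(n, round(spread_of_sample_means(n), 3), round(theory, 3))
```

Seeing the simulated and theoretical columns line up (or not) tends to stick better than reading the formula alone.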


2. Creating models in a crazy way

We get the data from the “outside world” and our body and brain analyze the raw data we got, and then we “interpret” things.

Source: https://towardsdatascience.com/going-beyond-with-agile-data-science-fcff5aaa9f0c


What is this “interpretation”? Just what we’ve learned about how to react, think, feel and understand from the information we are getting. When we understand something, we are decoding the parts that form this complex thing and transforming the raw data we got in the beginning into something useful and simple.

We do this by modeling. This is the process of understanding the “reality”, the world around us, by creating a higher-level prototype that describes the things we are seeing, hearing and feeling. But it’s a representation, not the “actual” or “real” thing.

So think before you do this:

model_i_created_in_5_seconds.fit(X, y)

Modeling is very important in the machine learning and data science space, but models must have a purpose, and you have to understand them before using them. Know what they assume about the data before training them, understand the different metrics they use to learn, how to evaluate them and more.
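
A minimal sketch of what “thinking before fitting” can look like, assuming Python and a made-up synthetic dataset. It is not the article’s method and uses no particular library: a one-feature least-squares line is fit by hand, evaluated on held-out data, and compared against a trivial baseline. All names here (`fit_line`, `mean_squared_error`, the data itself) are illustrative.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (one feature)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic data, made up for illustration: y = 2 + 3x + noise
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]

# Hold out the last 25% so the model is judged on data it never saw.
split = int(len(xs) * 0.75)
x_train, x_test = xs[:split], xs[split:]
y_train, y_test = ys[:split], ys[split:]

intercept, slope = fit_line(x_train, y_train)
model_pred = [intercept + slope * x for x in x_test]

# Baseline: always predict the training mean. A model that cannot beat
# this has not learned anything useful from x.
baseline = sum(y_train) / len(y_train)
baseline_pred = [baseline] * len(y_test)

model_mse = mean_squared_error(y_test, model_pred)
baseline_mse = mean_squared_error(y_test, baseline_pred)
print("model MSE:   ", model_mse)
print("baseline MSE:", baseline_mse)
```

The choices made explicit here (a linear assumption, squared error as the metric, a held-out test set, a baseline to beat) are exactly the kind of things worth knowing about a model before calling `.fit`.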

To build that understanding, there’s no harm in reading the documentation of libraries like Scikit-Learn:

A tutorial on statistical-learning for scientific data processing - scikit-learn 0.20.3…
Machine learning is a technique with a growing importance, as the size of the datasets experimental sciences are facing… (scikit-learn.org)

Apache Spark:

MLlib: Main Guide - Spark 2.4.1 Documentation
Due to licensing issues with runtime proprietary binaries, we do not include netlib-java's native proxies by default… (spark.apache.org)

Tensorflow:

TensorFlow Guide | TensorFlow Core | TensorFlow
sessions, which are TensorFlow's mechanism for running dataflow graphs across one or more local or remote devices. If… (www.tensorflow.org)

And more. They’ll lead you to articles, papers and more blog posts, and most of them even have practical examples of how to do modeling in machine learning and statistical learning.

Also, there are great videos in the field that will take you from zero to hero in minutes, like the ones from my friend Brandon Rohrer.





3. “Yeah, I’m a lone wolf. I can study and do everything by myself”

Remember that one of the characteristics I proposed before is that data science is a collaborative field. Well, so is studying it!

I’m not saying that you need to start a course with your BFF, but make use of what online platforms give us today. We have forums, chats, discussion boards and more, where you can meet people learning the same things you are learning. It will be much easier to learn with more people, and don’t be afraid of asking questions.

Ask as many questions as you need to understand something, and don’t rest until you do. Don’t harass people either, but if you ask politely most people will be more than happy to help you.

Here are great resources (apart from the ones built into the MOOCs and courses) to find people learning the same things as you:

Stack Overflow - Where Developers Learn, Share, & Build Careers
Stack Overflow is the largest, most trusted online community for developers to learn, share their programming… (stackoverflow.com)

Quora
Quora is a place to gain and share knowledge. It's a platform to ask questions and connect with people who… (www.quora.com)

Deep Cognition Community
An active community working together to drive growth and innovation through AI. (community.deepcognition.ai)

r/datascience
r/datascience: A place for data science practitioners and professionals to discuss and debate data science career… (www.reddit.com)



