楼主: oliyiyi
151 5

The Third Wave Data Scientist [推广有奖]

版主

大师

86%

还不是VIP/贵宾

-

TA的文库  其他...

计量文库

威望
6
论坛币
628790 个
学术水平
1326 点
热心指数
1434 点
信用等级
1232 点
经验
326683 点
帖子
8793
精华
66
在线时间
4915 小时
注册时间
2007-5-21
最后登录
2019-5-19

初级学术勋章 初级热心勋章 初级信用勋章 中级信用勋章 中级学术勋章 中级热心勋章 高级热心勋章 高级学术勋章 高级信用勋章 特级热心勋章 特级学术勋章 特级信用勋章

oliyiyi 发表于 2019-5-6 23:53:05 |显示全部楼层

By Dominik Haitz, IONOS.

Introduction

Drew Conway’s visualization of the data science skill set is an often cited classic. Different opinions and the versatility of the role have spawned numerous variations:

Various data science Venn diagrams. Image courtesy of Google Images. Source: https://sinews.siam.org/Details-Page/a-timely-focus-on-data-science

There seems to be no consensus on the data science skill set. Additionally, as the field evolves, shortcomings become obvious and new challenges arise. How can we describe this evolution?

The first wave of data scientists happendbefore data became big and before data science was actually a thing (pre-2010s): Statisticians and analysts who had always been around, doing a lot of what modern data scientists are doing, but accompanied by less hype.

Second wave: Large-scale data collection created a demand for smart minds who can work magic and turn all this big data into big money. Companies were still figuring out what kind of people to employ and often turned to science graduates. While the second wave data scientists did a lot right, their carefully crafted models often ended up as PoCs and failed to bring about actual change.

Now, at the end of the 2010s, amidst the hype around deep learning and AI, enter the third wave of data scientists: Experimenting and innovating, efficiently seeking out business value und bridging the deployment gap to create great data products. What skills are required here?

The skill portfolio of the third wave data scientist.

1. Business Mindset

The business mindset is the centerpiece of the data science skill set, as it sets goals and applies the other skills to reach them. Patrick McKenzie states in this blog post:

Engineers are hired to create business value, not to program things: Businesses do things for irrational and political reasons all the time […], but in the main they converge on doing things which increase revenue or reduce costs.

Likewise, data scientists are hired to create business value, not just to build models. Ask yourself: How will the outcome of my work influence company decisions? What do I have to do to maximize this effect? With this entrepreneurial spirit, the third wave data scientist does not only produce actionable insights, but also seeks that they bring about real change.

Look where the money flows in your organization — the divisions with the largest cost or revenue will likely offer the highest financial leverage. However, business value is a fuzzy concept: It goes beyond cost and revenue of the current fiscal year. Experimenting and creating an innovative data culture will increase a company’s long-term competitiveness.

Prioritizing your workand knowing when to stop is the key to efficiency. Think of diminishing returns: Is it worth spending weeks to tweak a model for another 0.2% of precision? Quite often, good enough is the real perfect.

Domain expertise, which makes up a third of Conway’s skill set, is by no means to be neglected — however, you’ll almost everywhere have to learn it on the job. This includes knowledge about your industry as well as all the company processes, naming schemes and peculiarities. This knowledge does not only set the frame conditions for your work, but it is often indispensable to understand and interpret your data.

Keep it simple, stupid
Mat Velloso@matvelloso





Half of the time when companies say they need "AI" what they really need is a SELECT clause with GROUP BY.

You're welcome.



6,661

2:53 AM - May 31, 2018
Twitter Ads info and privacy





2,667 people are talking about this











Look out for the low hanging fruit and quick wins. A simple SQL query on existing data warehouse might yield valuable insights unbeknownst to product managers or executives. Don’t fall into the trap of doing “buzzword-driven data science”, focusing on state-of-the-art deep learning where a beautifully simple regression model would be sufficient — and much less work to build, implement and maintain. Know the complicated things, but do not overcomplicate things.

2. Software Engineering Craftsmanship

The notion of (second wave) data scientists needing only “hacking skills” instead of proper software engineering has been repeatedly critizised. Lack of readability, modularity or versioning hinders collaboration, reproducibility and productionizing.

Instead, learn the craft from proper software engineers. Test your code and use version control. Follow an established coding style (e.g. PEP8) and learn how to use an IDE (e.g. PyCharm). Try pair programming. Modularize and document your code, use meaningful variable names and refactor, refactor, refactor.

Bridge the deployment gap for agile prototyping of data products: Learn to use tools for logging and monitoring. Know how to build a REST API (e.g. using Flask) to provide your results to others. Learn how to ship your work inside a Docker container or deploy it to a platform like Heroku. Instead of letting your models rot on your laptop, wrap them into data-driven services that fit snugly into your company’s IT landscape.

3. Statistics and Algorithms Toolbox

Data scientists have to thoroughly understand the basic concepts in statistics and particularly in machine learning (A STEM university education is probably the best way to acquire this foundation). There are tons of resources on what’s important, so I’m not gonna delve further into this here. You will often have to explain algorithms or concepts like statistical uncertainty to your clients, or red-flag an insight because of a confusion between correlation and causation.



缺少币币的网友请访问有奖回帖集合
http://bbs.pinggu.org/thread-3990750-1-1.html
stata SPSS
oliyiyi 发表于 2019-5-6 23:54:20 |显示全部楼层
4. Soft Skills

As people skills are as important to productivity as technical skills, the third wave data scientist makes conscious efforts to improve in these areas.

Work well with others

Consult your peers — most people are happy to help or give advice. Treat others on par: You might have a fancy degree and an understanding of sophisticated algorithms, but others have experience that you don’t (This sounds like basic social advice, but who hasn’t met an arrogant IT professional?).

Understand your client

Ask the right follow-up questions. If a client or your boss wants you to calculate some key figures or create some chart, ask “Why? What for? What do you want to achieve? What actions will you take, depending on the result?” to better understand the core of the problem. Then figure out how to get there together — is there another, better way to reach that goal than the one proposed?

Navigate company politics

Network, not because you expect others to benefit you in your career, but because you are an approachable person. Connect with people with similar work topics. If there are no platforms for this in your company, create them.Identify key stakeholders and figure out how to help them with their problems. Invite others early on and make them part of the change process. Keep in mind: A company is not a rational entity, but an assembly of often irrational human beings.

Communicate your results

Ramp up your visualization and presentation skills. Communicate from a client perspective: How can I precisely answer their question? Learn to communicate at different levels and sum up the details of your work. People are easily intrigued by fancy multidimensional plots, but often a simple bar chart conveys a message more efficiently. Showcase your results: When people see what you’re doing, and see that you’re doing great work, they will trust you.

Evaluate yourself

Communicate your goals and problems and actively seek out advice. Find role models inside and outside the data science community and learn from them.


缺少币币的网友请访问有奖回帖集合
http://bbs.pinggu.org/thread-3990750-1-1.html
回复

使用道具 举报

sunyiping 发表于 2019-5-7 00:21:55 |显示全部楼层
学习学习!
回复

使用道具 举报

redflame 发表于 2019-5-7 00:30:55 |显示全部楼层
书呢?
回复

使用道具 举报

oliyiyi 发表于 2019-5-7 06:57:43 |显示全部楼层
redflame 发表于 2019-5-7 00:30
书呢?
这是一篇分享文章
回复

使用道具 举报

redflame 发表于 2019-5-7 09:04:13 |显示全部楼层
oliyiyi 发表于 2019-5-7 06:57
这是一篇分享文章
哦,明白了。很好的一篇总结文章。谢谢~
已有 1 人评分论坛币 收起 理由
oliyiyi + 5 精彩帖子

总评分: 论坛币 + 5   查看全部评分

回复

使用道具 举报

您需要登录后才可以回帖 登录 | 我要注册

京ICP备16021002-2号 京B2-20170662号 京公网安备 11010802022788号 论坛法律顾问:王进律师 知识产权保护声明   免责及隐私声明

GMT+8, 2019-5-20 03:09