人大经济论坛 › 论坛 › 计量经济学与统计论坛五区 › 计量经济学与统计软件 › LATEX论坛 › The Third Wave Data Scientist

CDA数据分析研究院

商业数据分析与大数据领航教育品牌



经管云课堂

经管/金融/财会/社科/名师公开课



学术培训

Stata 空间计量 SSCI Python

贵宾：通行论坛特权+数据库权限
+案例库+下载特权 VIP：论坛特权+更多下载次数
+ccerdata数据库+更高阅读权限+……

发帖

楼主: oliyiyi

1032 6

The Third Wave Data Scientist [推广有奖]

1关注
184
粉丝

版主

泰斗

还不是VIP/贵宾

TA的文库 其他...

计量文库

威望: 7 级
论坛币: 271951 个
通用积分: 31269.3519
学术水平: 1435 点
热心指数: 1554 点
信用等级: 1345 点
经验: 383775 点
帖子: 9598
精华: 66
在线时间: 5468 小时
注册时间: 2007-5-21
最后登录: 2024-4-18

oliyiyi 发表于 2019-5-6 23:53:05 |显示全部楼层 |坛友微信交流群

是否 +2 论坛币

k人参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群

赵安豆老师微信：zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

立即领取

感谢您参与论坛问题回答

经管之家送您两个论坛币！

+2 论坛币

By Dominik Haitz, IONOS.

Introduction

Drew Conway’s visualization of the data science skill set is an often cited classic. Different opinions and the versatility of the role have spawned numerous variations:

Various data science Venn diagrams. Image courtesy of Google Images. Source: https://sinews.siam.org/Details-Page/a-timely-focus-on-data-science

There seems to be no consensus on the data science skill set. Additionally, as the field evolves, shortcomings become obvious and new challenges arise. How can we describe this evolution?

The first wave of data scientists happendbefore data became big and before data science was actually a thing (pre-2010s): Statisticians and analysts who had always been around, doing a lot of what modern data scientists are doing, but accompanied by less hype.

Second wave: Large-scale data collection created a demand for smart minds who can work magic and turn all this big data into big money. Companies were still figuring out what kind of people to employ and often turned to science graduates. While the second wave data scientists did a lot right, their carefully crafted models often ended up as PoCs and failed to bring about actual change.

Now, at the end of the 2010s, amidst the hype around deep learning and AI, enter the third wave of data scientists: Experimenting and innovating, efficiently seeking out business value und bridging the deployment gap to create great data products. What skills are required here?

The skill portfolio of the third wave data scientist.

1. Business Mindset

The business mindset is the centerpiece of the data science skill set, as it sets goals and applies the other skills to reach them. Patrick McKenzie states in this blog post:

Engineers are hired to create business value, not to program things: Businesses do things for irrational and political reasons all the time […], but in the main they converge on doing things which increase revenue or reduce costs.

Likewise, data scientists are hired to create business value, not just to build models. Ask yourself: How will the outcome of my work influence company decisions? What do I have to do to maximize this effect? With this entrepreneurial spirit, the third wave data scientist does not only produce actionable insights, but also seeks that they bring about real change.

Look where the money flows in your organization — the divisions with the largest cost or revenue will likely offer the highest financial leverage. However, business value is a fuzzy concept: It goes beyond cost and revenue of the current fiscal year. Experimenting and creating an innovative data culture will increase a company’s long-term competitiveness.

Prioritizing your workand knowing when to stop is the key to efficiency. Think of diminishing returns: Is it worth spending weeks to tweak a model for another 0.2% of precision? Quite often, good enough is the real perfect.

Domain expertise, which makes up a third of Conway’s skill set, is by no means to be neglected — however, you’ll almost everywhere have to learn it on the job. This includes knowledge about your industry as well as all the company processes, naming schemes and peculiarities. This knowledge does not only set the frame conditions for your work, but it is often indispensable to understand and interpret your data.

Keep it simple, stupid

Mat Velloso@matvelloso

Half of the time when companies say they need "AI" what they really need is a SELECT clause with GROUP BY.

You're welcome.

6,661
2:53 AM - May 31, 2018
Twitter Ads info and privacy

2,667 people are talking about this

Look out for the low hanging fruit and quick wins. A simple SQL query on existing data warehouse might yield valuable insights unbeknownst to product managers or executives. Don’t fall into the trap of doing “buzzword-driven data science”, focusing on state-of-the-art deep learning where a beautifully simple regression model would be sufficient — and much less work to build, implement and maintain. Know the complicated things, but do not overcomplicate things.

2. Software Engineering Craftsmanship

The notion of (second wave) data scientists needing only “hacking skills” instead of proper software engineering has been repeatedly critizised. Lack of readability, modularity or versioning hinders collaboration, reproducibility and productionizing.

Instead, learn the craft from proper software engineers. Test your code and use version control. Follow an established coding style (e.g. PEP8) and learn how to use an IDE (e.g. PyCharm). Try pair programming. Modularize and document your code, use meaningful variable names and refactor, refactor, refactor.

Bridge the deployment gap for agile prototyping of data products: Learn to use tools for logging and monitoring. Know how to build a REST API (e.g. using Flask) to provide your results to others. Learn how to ship your work inside a Docker container or deploy it to a platform like Heroku. Instead of letting your models rot on your laptop, wrap them into data-driven services that fit snugly into your company’s IT landscape.

3. Statistics and Algorithms Toolbox

Data scientists have to thoroughly understand the basic concepts in statistics and particularly in machine learning (A STEM university education is probably the best way to acquire this foundation). There are tons of resources on what’s important, so I’m not gonna delve further into this here. You will often have to explain algorithms or concepts like statistical uncertainty to your clients, or red-flag an insight because of a confusion between correlation and causation.

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

相关帖子

• CDA数据分析师认证考试

缺少币币的网友请访问有奖回帖集合：
https://bbs.pinggu.org/thread-3990750-1-1.html

使用道具举报

oliyiyi 发表于 2019-5-6 23:54:20 |显示全部楼层 |坛友微信交流群

4. Soft Skills

As people skills are as important to productivity as technical skills, the third wave data scientist makes conscious efforts to improve in these areas.

Work well with others

Consult your peers — most people are happy to help or give advice. Treat others on par: You might have a fancy degree and an understanding of sophisticated algorithms, but others have experience that you don’t (This sounds like basic social advice, but who hasn’t met an arrogant IT professional?).

Understand your client

Ask the right follow-up questions. If a client or your boss wants you to calculate some key figures or create some chart, ask “Why? What for? What do you want to achieve? What actions will you take, depending on the result?” to better understand the core of the problem. Then figure out how to get there together — is there another, better way to reach that goal than the one proposed?

Navigate company politics

Network, not because you expect others to benefit you in your career, but because you are an approachable person. Connect with people with similar work topics. If there are no platforms for this in your company, create them.Identify key stakeholders and figure out how to help them with their problems. Invite others early on and make them part of the change process. Keep in mind: A company is not a rational entity, but an assembly of often irrational human beings.

Communicate your results

Ramp up your visualization and presentation skills. Communicate from a client perspective: How can I precisely answer their question? Learn to communicate at different levels and sum up the details of your work. People are easily intrigued by fancy multidimensional plots, but often a simple bar chart conveys a message more efficiently. Showcase your results: When people see what you’re doing, and see that you’re doing great work, they will trust you.

Evaluate yourself

Communicate your goals and problems and actively seek out advice. Find role models inside and outside the data science community and learn from them.

缺少币币的网友请访问有奖回帖集合：
https://bbs.pinggu.org/thread-3990750-1-1.html

使用道具举报