
Is Data Scientist the right career path for you? Candid advice


oliyiyi posted on 2016-8-10 09:38:22

It is no surprise today that Data Scientist (or a related role such as Data Manager, Statistician, or Data Analyst) is one of the most sought-after career paths. In response to this cross-industry trend, several top universities have started programs dedicated to Data Science.

Allured by the tremendous opportunities, great compensation, and visibility to business leaders, many people are moving toward the Data Scientist career path without carefully assessing the day-to-day responsibilities of the role, the attitude it requires, and the balance of technical and business skills it demands.

To give data science aspirants a clear, realistic picture of the data scientist role, which they can assess against their own personality and career ambitions, I recently discussed the question with Paco Nathan, a data science expert with 25+ years of industry experience. His candid, detailed response is likely to be an eye-opener for many.

Paco Nathan’s short bio is provided at the end of the post.

Anmol Rajpurohit: Data Scientist has been termed the sexiest job of the 21st century. Do you agree? What advice would you give to people thinking of a long career in Data Science?

Paco Nathan: I don’t agree. Not many people have the breadth of skills to perform the role, nor the patience that is absolutely needed to acquire those skills, nor the desire to get there.

As a self test:
  • prepare an analysis and visualization of an unknown data set, while impatient stakeholders watch over your shoulder and ask pointed questions; be prepared to make quantitative arguments about the confidence of the results
  • describe “loss function” and “regularization term” each in 25 words or less, with a compare/contrast of several examples, and show how to structure a range of trade-offs for model transparency, predictive power, and resource requirements
  • pitch a reorg proposal to an executive staff session which implies firing some ranking people
  • interview 3-4 different departments that are hostile to your project, to tease out the metadata for datasets that they’ve been reluctant to release
  • build, test, and deploy a mission-critical app with real-time SLAs, efficiently across a 1000+ node cluster
  • troubleshoot intermittent bugs in somebody else’s code which is at least 2000 lines long, without their assistance
  • leverage ensemble approaches to enhance a predictive model that you’re working on
  • work on a deadline in pair programming with people from 3-4 different fields completely disjoint from the work that you’ve done

If one doesn’t feel absolutely comfortable performing each of those listed above, right now, then my advice is to avoid “Data Science” as a career.
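
As an illustration of the second item in the self test (this sketch is not part of the interview), a loss function measures how badly a model fits the data, while a regularization term penalizes model complexity. The toy data, the two candidate weight vectors, and all function names below are hypothetical and chosen only to show the trade-off.

```python
# Illustrative sketch, not from the interview: a squared-error loss plus
# an L1 or L2 regularization term for a linear model y ≈ w · x.
# All names and numbers here are made up for the example.

def squared_error_loss(weights, rows, targets):
    """Mean squared error: how badly the model fits the data."""
    total = 0.0
    for x, y in zip(rows, targets):
        prediction = sum(w * xi for w, xi in zip(weights, x))
        total += (prediction - y) ** 2
    return total / len(rows)

def l2_penalty(weights):
    """Ridge-style term: penalizes large weights smoothly."""
    return sum(w * w for w in weights)

def l1_penalty(weights):
    """Lasso-style term: penalizes absolute size, which favors sparse weights."""
    return sum(abs(w) for w in weights)

def objective(weights, rows, targets, penalty, strength):
    """Fit (loss) plus strength times complexity (regularization term)."""
    return squared_error_loss(weights, rows, targets) + strength * penalty(weights)

# Toy data set (hypothetical).
rows = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
targets = [2.0, 0.0, 2.0]

exact_fit = (2.0, 0.0)      # fits the data perfectly, but has larger weights
small_weights = (1.0, 0.0)  # fits worse, but has smaller weights

for strength in (0.1, 1.0):
    a = objective(exact_fit, rows, targets, l2_penalty, strength)
    b = objective(small_weights, rows, targets, l2_penalty, strength)
    preferred = "exact_fit" if a < b else "small_weights"
    print(f"strength={strength}: preferred candidate is {preferred}")
```

With a weak penalty the exact fit scores better; with a stronger one the smaller weights do. That is the kind of trade-off between predictive power and model simplicity that the self test asks a candidate to explain.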

The term Data Scientist was “sexy” as a new role circa 2012 in the sense of DJ Patil, Hilary Mason, et al. However, not everyone gets a chunk of a $4B IPO! (full disclosure: I got invited 3x to join LI prior to their IPO but stubbornly pursued other opportunities; what an excellent team there!)

Circa 2012: that was then, this is now. Actual work in Data Science entails:
  • some opportunities to innovate from a “greenfield” state, but not often
  • mostly being called into an existing project — which is somehow at risk
  • having to speak truth to power (not fun, but the essence of the role)


To echo what DJ and others have articulated so well before: most data-related problems are social/organizational (e.g., data silos, lack of metadata, matrix org in-fighting, etc.) or else the key insights probably would have been apparent within that organization already.

I have a hunch that much of the interesting work in e-commerce has played out already: big players will continue to reap big revenue, but the work to be done now is mostly outside of Silicon Valley. Or rather, other industries are coming here to learn, partner, purchase, etc.

For example, Monsanto launched a private equity firm in SF that, practically speaking, can invest more money at more favorable terms into Ag data ventures than just about any VC firm. Meanwhile, VCs in the area have all but ignored data-related ventures in domains that matter, with the exception of Khosla. In the past several months Monsanto has acquired business units within SV: Climate Corp, Solum, etc., which by the way were funded by Khosla. Expect more of that trend.

From my perspective, the big issues in data now are not in ad-tech, but in real issues: food supply, drought/flooding, energy security, health care, telecom, transportation apart from oil dependency, smarter manufacturing, deforestation monitoring, oceanographic analysis, etc.

Also, IT budgets are still enormously flawed w.r.t. data insights. Too much budget goes into the priesthood of “data engineering”, and far too much budget tends to be earmarked for data that’s already cleaned up. Also, I find that the notion of “Product Management” in SV is almost antithetically opposed to effective use of data: in many cases product managers are incentivized to discourage use of data within companies.

Hence our value is generally going to be realized at:
  • writing code to prepare data
  • automating processes to improve feature engineering and model tournaments
  • speaking truth to power

The first speaks to IT budgets earmarked the wrong way, and the second speaks to Product Management being almost systematically hostile to effective use of data. The third speaks to the fact that several of my biggest contributions as a data scientist have been to provide exec staff with hard evidence to fire other executives and get the company back on track. Again, industry disruptions have impact.
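
The "model tournaments" mentioned in the list above can be read as scoring several candidate models on the same held-out data and keeping the winner. The following is a minimal sketch under that assumption, not something described in the interview; the candidate models, the toy data, and every function name are hypothetical.

```python
# Illustrative sketch, not from the interview: a tiny "model tournament"
# that compares candidate models by k-fold cross-validated error.
# The data and the two candidates are made up for the example.

def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_line(xs, ys):
    """Candidate: ordinary least-squares line through one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def cross_validated_mse(fit, xs, ys, folds=5):
    """Average squared error on held-out folds (fold = index modulo folds)."""
    errors = []
    pairs = list(zip(xs, ys))
    for fold in range(folds):
        train = [p for i, p in enumerate(pairs) if i % folds != fold]
        held_out = [p for i, p in enumerate(pairs) if i % folds == fold]
        model = fit([x for x, _ in train], [y for _, y in train])
        errors.extend((model(x) - y) ** 2 for x, y in held_out)
    return sum(errors) / len(errors)

def run_tournament(candidates, xs, ys, folds=5):
    """Score every candidate the same way and return the best one."""
    scores = {name: cross_validated_mse(fit, xs, ys, folds)
              for name, fit in candidates.items()}
    winner = min(scores, key=scores.get)
    return winner, scores

# Toy data lying on a straight line (hypothetical).
xs = [float(i) for i in range(10)]
ys = [3.0 * x + 1.0 for x in xs]

winner, scores = run_tournament({"mean": fit_mean, "line": fit_line}, xs, ys)
print("winner:", winner)
print("scores:", scores)
```

Automating this loop means adding a new candidate or a new engineered feature costs one extra dictionary entry, and every model is judged by the same held-out error rather than by argument.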

For people just starting out, be really careful about where you go to work. If a firm claims to have “excellent engineering” but insufficient use of data circa 2014, then they are *not* the sharpest tools on the workbench; pick some other firm in which to start. Find mentors. Join teams that have strong sponsorship from Finance or Operations (which generally understand data and variance) while perhaps avoiding teams that have sponsorship from Engineering or Marketing (which generally do not understand effective use of data).

Recommendations, not necessarily in order:

  • learn to leverage the evolving Py data stack: IPython, Pandas, scikit-learn, etc.
  • learn how to lead an interdisciplinary team
  • get experience in 1+ domains outside of data/analytics/programming
  • get a good grounding in design and apply it to data visualization
  • do everything you can to become a better writer and speaker (outside of academic confs)
  • participate in meetups; publish blogs, presentations, etc. (hiring managers ignore resumes and look for published content online)
  • get a good grounding in abstract algebra, Bayesian stats, linear algebra, convex optimization
  • study up on algorithms and frameworks for streaming data (the bigger use cases on the horizon are not batch)
  • learn Scalding and functional programming with type safety
  • avoid Business Intelligence (like the plague)
  • avoid anything referred to as “The Hadoop Ecosystem” or “Hadoop as an OS”

Paco Nathan is a "player/coach" in the field of Big Data, having led innovative Data teams on large-scale apps for 10+ years. An expert in distributed systems, machine learning, and Enterprise data workflows, Paco is an O'Reilly author and an advisor for several firms including The Data Guild, Mesosphere, Marinexplore, Agromeda, and TagThisCar. Paco received his BS Math Sci and MS Comp Sci degrees from Stanford University, and has 25+ years technology industry experience ranging from Bell Labs to early-stage start-ups.
hjtoh posted on 2016-8-10 09:42:55 (from mobile)
oliyiyi posted on 2016-8-10 09:38
It is no surprise today that Data Scientist (or related roles such as Data Manager, Statistician, Da ...
It's a good career.
