Original poster: oliyiyi

What statisticians think about data scientists


oliyiyi posted on 2016-1-24 10:40:49


It's interesting to read what some statisticians write about data science on the American Statistical Association (ASA) blog. Most of us don't care about our job title: there are so many breeds of statisticians and data scientists, after all, and they do overlap to some extent. While I was once a statistician, I now call myself a data scientist or business scientist. Anyway, below are some extracts from the very lively and interesting discussions taking place on the ASA blog.

Tommy Jones posted "The Identity of Statistics in Data Science" on the American Statistical Association (ASA) website in December 2015. In his long and very interesting article, he wrote (this is just a tiny extract):

Judging by current statistics curricula, statistics is more closely tied to the mathematics of probability than to fundamentals of data management.[...] As models have become more accurate, they have also become more complex.

Dogling Yan commented:

In that data analyst job, I barely used any statistical models because people don’t really care about p-values. Also, with the size of current datasets, p-values are always very small. The models, analysis methods that most people learned at school are not very useful since the simple model and more valid and complex models tend to give the same conclusion when sample size is large.
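Dogling Yan's point that p-values become tiny on large datasets can be checked with a small calculation. The sketch below assumes a one-sample z-test with a known standard deviation; the effect size (0.01 standard deviations), the sample sizes, and the function name are illustrative choices, not figures from the comment.

```python
import math

def two_sided_p_value(effect, sd, n):
    """Two-sided p-value of a one-sample z-test for a mean shift of `effect`,
    assuming the population standard deviation `sd` is known."""
    z = effect / (sd / math.sqrt(n))  # standardized test statistic
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# The same tiny effect (0.01 standard deviations) at growing sample sizes.
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: p = {two_sided_p_value(0.01, 1.0, n):.3g}")
```

Because z grows with the square root of n, the same negligible shift is far from significant at n = 100 but has a p-value below 1e-20 at n = 1,000,000, which is why a small p-value alone says little about practical importance on big data.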

My comment:

As a data scientist, I work on making models (or rather, the absence of models in favor of data-driven systems) simpler, not more sophisticated, and fit for black-box processing of big data in production mode. That is, robustness is more important than 100% accuracy, especially if your data is only 70% accurate. I also work on designing a new statistical framework that is free of mathematics, traditional probability theory, random variables, and so on, so that anyone who knows Excel can learn it and use it, even to compute confidence intervals or build more elaborate forecasting systems. It will be published in my upcoming book, Data Science 2.0.
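The idea of a confidence interval that needs no distributional formula can be illustrated with a percentile bootstrap, a standard resampling technique: resample the data with replacement, recompute the mean each time, and read the interval off the sorted results, steps that could also be done in a spreadsheet. This is a swapped-in technique for illustration, not the method from Data Science 2.0; the function name, number of resamples, seed, and sample data are all assumptions.

```python
import random
import statistics

def bootstrap_percentile_ci(data, level=0.95, n_resamples=2000, seed=0):
    """Percentile-bootstrap confidence interval for the mean.

    Resample `data` with replacement, compute each resample's mean,
    sort the means, and return the empirical (1-level)/2 and
    (1+level)/2 percentiles. No distributional formula is used.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(data)
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    lo = means[round(tail * n_resamples)]
    hi = means[round((1 - tail) * n_resamples) - 1]
    return lo, hi

sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]  # illustrative data only
print(bootstrap_percentile_ci(sample))
```

The only operations involved are sampling, averaging, and sorting, which is the sense in which such intervals are accessible without probability theory; the trade-off is that they can be unreliable for very small samples.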

Jennifer Lewis Priestley also posted on the ASA blog in January 2016: "Data Science: The Evolution or the Extinction of Statistics?"

In this article, she wrote:

While data scientists can do a great many things I can’t do—mainly in the areas of coding, API development, web scraping, and machine learning—they would be hard pressed to compete with a PhD student in statistics in supervised modeling techniques or variable reduction methods.

My comment:

Read my article about a fast, efficient, combinatorial algorithm for feature selection that uses predictive power to jointly select variables. It is the data science approach to variable reduction and variable generation. Likewise, supervised modeling, which also belongs to machine learning, is not foreign to data scientists. Read about my automated indexation/tagging algorithm, used for taxonomy creation/maintenance or cataloguing: it clusters n data points in O(n) time, can cluster billions of web pages very quickly, and is also used to turn unstructured data into structured data.
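The post does not describe how the indexation/tagging algorithm works, so the following is only a hypothetical sketch of why tag-based clustering can run in a single pass: each document is reduced to a tag (here assumed to be its two most frequent non-stopword words) and dropped into the bucket for that tag, so the work grows linearly with the number of documents. The stopword list, the two-word tag, and the function names are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical stopword list, just for this illustration.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}

def tag(doc, k=2):
    """Hypothetical tag: the k most frequent non-stopword tokens, sorted so
    the same set of words always yields the same key."""
    words = [w for w in doc.lower().split() if w not in STOPWORDS]
    top = [w for w, _ in Counter(words).most_common(k)]
    return tuple(sorted(top))

def cluster_by_tag(docs, k=2):
    """Single pass over the documents: each one is appended to its tag's
    bucket, so total work is linear in the number of documents
    (for documents of bounded length)."""
    clusters = defaultdict(list)
    for i, doc in enumerate(docs):
        clusters[tag(doc, k)].append(i)
    return dict(clusters)

pages = [
    "python pandas python pandas tutorial",
    "pandas python guide to pandas python",
    "soccer match report soccer match",
]
print(cluster_by_tag(pages))
# {('pandas', 'python'): [0, 1], ('match', 'soccer'): [2]}
```

No pairwise distances are computed, which is what avoids the quadratic cost of many classical clustering methods; the price is that clusters are only as good as the tagging rule.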

Here is my reply to Peter, who commented on LinkedIn that "the feature selection method mentioned in the blog is still a heuristic method i.e. no guarantee to find the optimal subset of variables."

Peter, data scientists are usually interested in local optima that are easy to detect and provide almost the same yield as the global optimum, which has two drawbacks: (1) it could be an unstable optimum, and (2) it might take far more time to compute if the data set is immense.
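To make the trade-off in this reply concrete, here is a generic greedy forward selection; it is not the combinatorial algorithm referenced in the post. "Predictive power" is assumed here to be the training accuracy of a lookup table on discrete features, where each combination of selected feature values predicts its most common target value. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def predictive_power(rows, target, features):
    """Assumed score: training accuracy of a lookup table keyed on the
    selected features, where each cell predicts its most common target."""
    cells = defaultdict(Counter)
    for row, y in zip(rows, target):
        cells[tuple(row[f] for f in features)][y] += 1
    correct = sum(c.most_common(1)[0][1] for c in cells.values())
    return correct / len(target)

def greedy_forward_selection(rows, target, candidates, max_features=3):
    """Add the single best feature at each step; stop when nothing improves.
    Needs roughly len(candidates) * max_features score evaluations, versus
    2**len(candidates) - 1 subsets for an exhaustive search."""
    selected = []
    best = predictive_power(rows, target, selected)  # baseline: majority class
    while len(selected) < max_features:
        scores = {f: predictive_power(rows, target, selected + [f])
                  for f in candidates if f not in selected}
        if not scores:
            break
        feature, score = max(scores.items(), key=lambda kv: kv[1])
        if score <= best:  # no strict improvement: stop at this local optimum
            break
        selected.append(feature)
        best = score
    return selected, best

rows = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 0},
    {"a": 1, "b": 1, "c": 1},
]
target = [0, 0, 1, 1]  # toy data: the target simply copies feature "a"
print(greedy_forward_selection(rows, target, ["a", "b", "c"]))  # (['a'], 1.0)
```

Stopping at the first step without a strict improvement is what keeps the search cheap on wide data sets; it is also why, as Peter notes, the result carries no guarantee of being the best possible subset.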

About the author: Vincent Granville worked for Visa, eBay, Microsoft, Wells Fargo, NBC, a few startups, and various organizations to optimize business problems, boost ROI, or develop ROI attribution models, developing new techniques and systems to leverage modern big data and deliver added value. Vincent owns several patents, has published in top scientific journals, has raised VC funding, and has founded a few startups. The most recent one, Data Science Central, is growing exponentially and delivers a substantial profit margin. Vincent also manages his own self-funded research lab, focusing on simplifying, unifying, modernizing, automating, scaling, and dramatically optimizing statistical techniques. Vincent's focus is on producing robust, automatable tools, APIs, and algorithms that can be used and understood by the layman, and at the same time adapted to modern big, fast-flowing, unstructured data. Vincent is a post-graduate from Cambridge University.



Reply #1
hjtoh posted on 2016-1-24 10:48:04 from mobile
We're all one family here, so there's no need to talk as if we were two camps.

Reply #2
seahhj posted on 2016-1-24 14:22:35
good material, thanks for sharing
