Original poster: oliyiyi

Predictive Science vs Data Science


oliyiyi posted on 2017-2-17 15:01:49
Last edited by oliyiyi on 2017-2-28 15:22


To be clear, data science is much more than prediction or classification. It includes other machine learning techniques, such as clustering and frequent itemset mining. It also includes data visualization and data storytelling. It can encompass the various aspects of traditional data mining frameworks, like the KDD Process, including data selection, preprocessing, and transformation. Data science can also include other algorithms and approaches to data-related tasks beyond what I have mentioned here.

I have previously and holistically defined data science as follows:

Data science is a multifaceted discipline, which encompasses machine learning and other analytic processes, statistics and related branches of mathematics, increasingly borrows from high performance scientific computing, all in order to ultimately extract insight from data and use this new-found information to tell stories.

When considering "predictive science" vs. data science, it is only the slender, prediction-related slice of data science that I am measuring it against. In fact, disassembling data science into constituent "sciences" (clustering science, for example) would certainly help express what exactly it is we do, at the obvious expense of a sexy umbrella buzzword.

But taking a step back, it is inarguable that data is input, a raw material. In this sense, data science places the emphasis on the "what" in predictive processes. While the data is a prime ingredient in the predictive puzzle, and possibly the most difficult to procure or otherwise come across, "data science" seems to neglect the other major component, the algorithms, as well as the interesting insights that come out the other end.

Algorithms are transformative processes. So what about algorithmic science? This focuses on the tools, the "how," and is firmly rooted in computer science. Again, this falls short of accurately describing the holistic predictive process; data is abandoned in favor of the processes which transform it into prediction. Any successful description would likely focus on the end result.

The outcome of the holistic predictive process is the prediction. Or is it the hypothesis? I don't mean this in a general "hypothesis vs. prediction" sort of way, but rather: is the prediction or the hypothesis the more valuable output of a particular classifier or model?
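To make the distinction concrete, here is a minimal sketch. It assumes "hypothesis" is read in the usual machine learning sense, meaning the fitted function itself, and "prediction" as that function applied to a new input. The data and the helper names `fit_line` and `predict` are made up for illustration and are not from the original post.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(hypothesis, x):
    """Apply a fitted hypothesis (slope, intercept) to a new input."""
    slope, intercept = hypothesis
    return slope * x + intercept


# Hypothetical toy data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

hypothesis = fit_line(xs, ys)          # the learned relationship
prediction = predict(hypothesis, 5.0)  # one output of that relationship

print("hypothesis (slope, intercept):", hypothesis)
print("prediction for x = 5:", prediction)
```

The hypothesis says something about how the inputs relate to the output in general, while the prediction is a single answer for a single case. Which of the two is worth more depends on whether the goal is understanding or action.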

Whether prediction or hypothesis, one of these two will be the most interesting piece of the holistic predictive science puzzle. Predictive science (or prediction science, if that strikes you better) sounds pretty good. But really, isn't that just "science"? That seems very non-specific.

What about statistics? Are we applied statisticians? Sourced from Wikipedia:

"Applied statistics" comprises descriptive statistics and the application of inferential statistics.

Add in prescriptive statistics, and this seems like a step in the right direction. However, the emphasis in this case is on the application of statistical processes at the expense of... well, not much, really. Yet I would argue that this actually does not place the proper emphasis on inferential and prescriptive statistics, and perhaps implies too much reliance on descriptive, and thus seems to also fall short in describing the science of prediction.

Predictive analytics? Maybe the closest fit, but this term seems closer to the business world at this point than to the world of science. I don't see this term brought up in research at all, and it generally seems to be the sole domain of big business. And that's fine for what it is, but what it is does not seem to put science at its forefront (though, clearly, science underlies its usage).

I don't know that there is a solution. To be fair, I don't even know that this is a problem that exists outside of my own head. But I think everything above boils down to the following, and can be generalized beyond the prediction aspect of data science: Does the term data science actually represent anything of value to us, the data scientists, or to everyone else?

I don't purport to have a recommendation, and even if I did I'm sure it would be passed over. Which is fine. But as someone who is not terribly excited by, or comfortable with, the term "data science," I think it's worth being introspective about what it is we do, and how we categorize those tasks. Sure, there is a convenience in being able to put a name to a broad profession of somewhat related tasks, but do we lose sight of the trees for the forest?

And when it comes to the very complex science of prediction, data may be the new oil, and algorithms the special sauce, but their paired predictive power is where the actual money is, both figuratively and literally.




alvin2 posted on 2017-2-18 21:17:16

Taking a look. Earning a forum coin.

叫我水手123456 (verified student) posted on 2017-2-18 23:03:03

Hahahahahahahahaha

kkkm_db posted on 2017-2-19 01:16:54

Taking a look. Earning a forum coin.

cszcszcsz posted on 2017-2-19 05:55:26

R Learning Path: From beginner to expert in R in 7 steps

sdhb posted on 2017-2-19 14:28:48

need coins

sdhb posted on 2017-2-19 14:29:34

more coins

sdhb posted on 2017-2-19 14:30:03

more more coins

sdhb posted on 2017-2-19 14:30:19

more more more coins

ccwwccww posted on 2017-2-21 08:29:58

Thanks for sharing.

