Original poster: oliyiyi

Data Science for Beginners 1: The 5 questions data science answers

Get a quick introduction to data science from Data Science for Beginners in five short videos. This video series is helpful if you're interested in doing data science - or work with people who do data science - and you want to start with some basic concepts.

This first video is about the kinds of questions that data science can answer. Data science predicts answers to questions using a number or category. To get the most out of the series, watch the videos in order.

Transcript: The 5 questions data science answers


Hi! Welcome to the video series Data Science for Beginners.

Data Science can be intimidating, so I'll introduce the basics here without any equations or computer programming jargon.

In this first video, we'll talk about "The 5 questions data science answers."

Data Science uses numbers and names (also known as categories or labels) to predict answers to questions.

It might surprise you, but there are only five questions that data science answers:

  • Is this A or B?
  • Is this weird?
  • How much – or – How many?
  • How is this organized?
  • What should I do now?

Each one of these questions is answered by a separate family of machine learning methods, called algorithms.

It's helpful to think about an algorithm as a recipe and your data as the ingredients. An algorithm tells how to combine and mix the data in order to get an answer. Computers are like a blender. They do most of the hard work of the algorithm for you and they do it pretty fast.

Question 1: Is this A or B? uses classification algorithms


Let's start with the question: Is this A or B?

This family of algorithms is called two-class classification.

It's useful for any question that has just two possible answers.

For example:

  • Will this tire fail in the next 1,000 miles: Yes or no?
  • Which brings in more customers: a $5 coupon or a 25% discount?

This question can also be rephrased to include more than two options: Is this A or B or C or D, etc.? This is called multiclass classification and it's useful when you have several—or several thousand—possible answers. Multiclass classification chooses the most likely one.
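For readers who do want to see what this looks like in code, here is a minimal sketch of one simple classification method, nearest centroid. It is only an illustration, not the method the video uses: the tire measurements, the labels, and the function names `fit_centroids` and `classify` are all made up for this example.

```python
# Toy classifier: nearest centroid. All numbers below are invented.
from statistics import mean

# (tread depth in mm, tire age in years) -> did the tire fail?
training = [
    ((7.5, 1.0), "no"),
    ((6.8, 2.0), "no"),
    ((7.0, 1.5), "no"),
    ((2.1, 6.0), "yes"),
    ((1.8, 7.0), "yes"),
    ((2.5, 5.5), "yes"),
]

def fit_centroids(rows):
    """Average the feature vectors of each class."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*points))
        for label, points in by_label.items()
    }

def classify(centroids, features):
    """Return the label whose centroid is closest to the new example."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

centroids = fit_centroids(training)
print(classify(centroids, (2.0, 6.5)))  # a worn, old tire
print(classify(centroids, (7.2, 1.2)))  # a nearly new tire
```

Nothing in the sketch is specific to two classes: with three or more labels in the training data, `classify` still picks the closest one, which is the multiclass case.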

Question 2: Is this weird? uses anomaly detection algorithms


The next question data science can answer is: Is this weird? This question is answered by a family of algorithms called anomaly detection.

If you have a credit card, you’ve already benefitted from anomaly detection. Your credit card company analyzes your purchase patterns, so that they can alert you to possible fraud. Charges that are "weird" might be a purchase at a store where you don't normally shop or buying an unusually pricey item.

This question can be useful in lots of ways. For instance:

  • If you have a car with pressure gauges, you might want to know: Is this pressure gauge reading normal?
  • If you're monitoring the internet, you’d want to know: Is this message from the internet typical?

Anomaly detection flags unexpected or unusual events or behaviors. It gives clues where to look for problems.
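As a rough sketch of the credit card example, one very simple way to flag a "weird" charge is to ask how far it sits from the customer's usual spending, measured in standard deviations. The charge amounts, the threshold of 3, and the function name `is_weird` are assumptions for illustration; real fraud systems use far richer signals.

```python
# Toy anomaly detector: flag a charge that is far from the usual amount.
from statistics import mean, stdev

# Invented purchase history for one customer, in dollars.
past_charges = [12.50, 8.75, 23.10, 15.00, 9.99, 18.40, 11.25, 20.00]

def is_weird(amount, history, threshold=3.0):
    """True if the amount is more than `threshold` standard deviations
    away from the average of the past charges."""
    typical = mean(history)
    spread = stdev(history)
    return abs(amount - typical) / spread > threshold

print(is_weird(16.00, past_charges))   # an ordinary purchase
print(is_weird(950.00, past_charges))  # an unusually pricey item
```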

Question 3: How much? or How many? uses regression algorithms


Machine learning can also predict the answer to How much? or How many? The algorithm family that answers this question is called regression.

Regression algorithms make numerical predictions, such as:

  • What will the temperature be next Tuesday?
  • What will my fourth quarter sales be?

They help answer any question that asks for a number.
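To make the sales example concrete, here is a minimal sketch of the simplest regression method, fitting a straight line by least squares and extending it one quarter ahead. The sales figures and the function name `fit_line` are invented for illustration, and a real forecast would need more data and more care.

```python
# Toy regression: fit a straight line to past sales, then extrapolate.

def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Invented sales (in thousands) for the first three quarters.
quarters = [1, 2, 3]
sales = [100.0, 118.0, 143.0]

slope, intercept = fit_line(quarters, sales)
forecast_q4 = slope * 4 + intercept
print(f"Predicted fourth quarter sales: {forecast_q4:.1f}")
```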

Question 4: How is this organized? uses clustering algorithms


Now the last two questions are a bit more advanced.

Sometimes you want to understand the structure of a data set - How is this organized? For this question, you don’t have examples that you already know outcomes for.

There are a lot of ways to tease out the structure of data. One approach is clustering. It separates data into natural "clumps," for easier interpretation. With clustering there is no one right answer.

Common examples of clustering questions are:

  • Which viewers like the same types of movies?
  • Which printer models fail the same way?

By understanding how data is organized, you can better understand - and predict - behaviors and events.
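One common clustering method is k-means, sketched below on a single column of numbers. The viewing hours, the choice of two clusters, the starting centers, and the function name `kmeans_1d` are all assumptions for this example. Note that nobody tells the algorithm which viewer belongs to which group; it finds the "clumps" on its own.

```python
# Toy clustering: k-means on one-dimensional data.
from statistics import mean

def kmeans_1d(values, centers, rounds=10):
    """Repeat two steps: assign each value to its nearest center,
    then move each center to the mean of the values assigned to it."""
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ended up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Invented data: hours of action movies each viewer watched last month.
hours = [1, 2, 3, 20, 22, 25]

# Start with one center at the smallest value and one at the largest.
centers, groups = kmeans_1d(hours, centers=[min(hours), max(hours)])
print(groups)
```

A different number of clusters or different starting centers can give a different grouping, which is one reason there is no single right answer with clustering.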

Question 5: What should I do now? uses reinforcement learning algorithms


The last question – What should I do now? – uses a family of algorithms called reinforcement learning.

Reinforcement learning was inspired by how the brains of rats and humans respond to punishment and rewards. These algorithms learn from outcomes, and decide on the next action.

Typically, reinforcement learning is a good fit for automated systems that have to make lots of small decisions without human guidance.

Questions it answers are always about what action should be taken - usually by a machine or a robot. Examples are:

  • If I'm a temperature control system for a house: Adjust the temperature or leave it where it is?
  • If I'm a self-driving car: At a yellow light, brake or accelerate?
  • For a robot vacuum: Keep vacuuming, or go back to the charging station?

Reinforcement learning algorithms gather data as they go, learning from trial and error.
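The trial-and-error idea can be sketched with one of the simplest reinforcement learning strategies, called epsilon-greedy: most of the time take the action that has worked best so far, and occasionally try a random one. The robot vacuum scenario below, with its reward values, step count, and function names, is a made-up illustration and leaves out much of what a real system needs, such as tracking the changing state of the world.

```python
# Toy reinforcement learning: epsilon-greedy choice between two actions.
import random

ACTIONS = ["keep_vacuuming", "return_to_dock"]

def reward(action):
    """Hypothetical situation: the battery is low, so docking is
    rewarded (+1) and continuing to vacuum is punished (-1)."""
    return 1.0 if action == "return_to_dock" else -1.0

def learn(steps=200, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}  # running average reward per action
    count = {a: 0 for a in ACTIONS}    # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)          # explore: try something
        else:
            action = max(ACTIONS, key=value.get)  # exploit: best so far
        r = reward(action)
        count[action] += 1
        # Nudge the running average toward the reward just observed.
        value[action] += (r - value[action]) / count[action]
    return value

value = learn()
best_action = max(value, key=value.get)
print(best_action)
```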

So that's it - The 5 questions data science can answer.


Reply 1: h2h2, posted 2016-7-27 15:12:12
Thanks for sharing.
