OP: Lisrelchen

[Blog Picks] Building a decision tree from scratch - a beginner tutorial


Building a decision tree from scratch - a beginner tutorial, by Patrick L. Lê

Suppose you have a population. You want to divide this population into relevant subgroups based on specific features characterizing each subgroup, so that you can accurately predict outcomes associated with each subgroup. For instance, you could:

  • Take the list of the people on the Titanic and, by dividing them into subgroups depending on specific criteria (e.g. female vs male, passengers in 1st class vs 2nd and 3rd class, age group, ...), determine whether they were likely to survive or not.
  • Look at the people who bought a product on your e-commerce website, divide this population into segments depending on specific features (e.g. returning visitors vs new visitors, location, ...), and determine whether future visitors are likely to buy your product or not.

In sum, you want to create a model that predicts the value of a target variable (e.g. survive/die; buy/not buy) based on simple decision rules inferred from the data features (e.g. female vs male, age, etc.).

The result is a decision tree, which has the great advantage of being easy to visualize and simple to understand. For instance, the picture below, from Wikipedia, shows the probability that passengers of the Titanic survived depending on their sex, age, and number of spouses or siblings aboard. Note how each branching is based on answering a question (the decision rule) and how the graph looks like an inverted tree.
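Such trees are typically grown greedily: at each node, pick the split that maximizes information gain, i.e. the drop in the entropy of the target labels. A minimal sketch of that computation on an invented toy data set (the counts below are for illustration only, not real Titanic figures):

```python
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels.
    total = len(labels)
    return -sum((labels.count(l) / total) * log2(labels.count(l) / total)
                for l in set(labels))

# Toy rows: (sex, survived?) -- invented counts for illustration.
rows = [('female', 'yes'), ('female', 'yes'), ('female', 'yes'),
        ('male', 'no'), ('male', 'no'), ('male', 'yes')]

parent = entropy([r[1] for r in rows])
females = [r[1] for r in rows if r[0] == 'female']
males   = [r[1] for r in rows if r[0] == 'male']
p = len(females) / len(rows)  # weight of the first child set
gain = parent - p * entropy(females) - (1 - p) * entropy(males)
print(round(gain, 3))  # 0.459: splitting on sex removes half the uncertainty
```

The split with the highest gain wins; a pure subgroup (all 'yes' for the females above) contributes zero entropy, which is why sex is such a strong first split in the Titanic example.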


Attachment: 9.pdf (4.4 MB)



Keywords: Decision, Tutorial, Building, beginner, scratch

Reply #1 — Lisrelchen, posted 2016-8-19 01:58:27
def buildtree(rows, scoref=entropy):
  # rows: the data set -- the whole data on the first call, a subset in recursive calls
  # scoref: the impurity measure; entropy by default
  if len(rows) == 0: return decisionnode()  # empty set -> empty leaf
  current_score = scoref(rows)

  # Track the best split found so far
  best_gain = 0.0
  best_criteria = None
  best_sets = None

  column_count = len(rows[0]) - 1  # number of attribute columns;
                                   # -1 because the last column is the target attribute
  for col in range(0, column_count):
    # Collect the distinct values occurring in this column
    column_values = {}
    for row in rows:
      column_values[row[col]] = 1
    # Now try dividing the rows on each value of this column
    for value in column_values.keys():
      (set1, set2) = divideset(rows, col, value)  # the two candidate child sets

      # Information gain of this split
      p = float(len(set1)) / len(rows)  # size of set1 relative to its parent
      gain = current_score - p * scoref(set1) - (1 - p) * scoref(set2)
      if gain > best_gain and len(set1) > 0 and len(set2) > 0:  # both sets must be non-empty
        best_gain = gain
        best_criteria = (col, value)
        best_sets = (set1, set2)

  # Create the sub-branches
  if best_gain > 0:
    trueBranch = buildtree(best_sets[0], scoref)   # pass scoref down the recursion
    falseBranch = buildtree(best_sets[1], scoref)  # (the original dropped it, silently reverting to entropy)
    return decisionnode(col=best_criteria[0], value=best_criteria[1],
                        tb=trueBranch, fb=falseBranch)
  else:
    return decisionnode(results=uniquecounts(rows))
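The buildtree function above relies on helpers the post does not show: decisionnode, divideset, entropy, and uniquecounts (the names match Toby Segaran's Programming Collective Intelligence, which this code appears to follow). A hedged sketch of compatible definitions for the first two, inferred from how buildtree uses them; entropy and uniquecounts are standard Shannon-entropy and label-count helpers:

```python
class decisionnode:
    # col: index of the attribute this node tests; value: the split value;
    # results: dict of label counts (set on leaves only); tb/fb: true/false branches.
    def __init__(self, col=-1, value=None, results=None, tb=None, fb=None):
        self.col = col
        self.value = value
        self.results = results
        self.tb = tb
        self.fb = fb

def divideset(rows, column, value):
    # Split rows into two sets: numeric values compare with >=,
    # everything else with equality.
    if isinstance(value, (int, float)):
        split = lambda row: row[column] >= value
    else:
        split = lambda row: row[column] == value
    set1 = [row for row in rows if split(row)]
    set2 = [row for row in rows if not split(row)]
    return (set1, set2)

rows = [['female', 29, 'survived'], ['male', 40, 'died'], ['male', 9, 'survived']]
print(divideset(rows, 1, 18))  # splits on age >= 18
```

With these in place (plus entropy and uniquecounts), the recursive buildtree above runs on any list of rows whose last column is the target label.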

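Once built, the tree predicts by walking a new observation from the root to a leaf, answering the decision rule at each node. A self-contained sketch under the same assumed node layout (the classify logic again follows Segaran's book; the hand-built tree and data are invented for illustration):

```python
class decisionnode:
    # Minimal node: col/value define the test; results marks a leaf.
    def __init__(self, col=-1, value=None, results=None, tb=None, fb=None):
        self.col, self.value, self.results = col, value, results
        self.tb, self.fb = tb, fb

def classify(observation, tree):
    # Walk down until a leaf (results is not None) is reached.
    if tree.results is not None:
        return tree.results
    v = observation[tree.col]
    if isinstance(v, (int, float)):
        branch = tree.tb if v >= tree.value else tree.fb
    else:
        branch = tree.tb if v == tree.value else tree.fb
    return classify(observation, branch)

# Hand-built tree: split on column 0 ('female'?); leaves hold label counts.
tree = decisionnode(col=0, value='female',
                    tb=decisionnode(results={'survived': 3}),
                    fb=decisionnode(results={'died': 2, 'survived': 1}))
print(classify(['female', 28], tree))  # {'survived': 3}
```

The leaf's label counts can be read as class probabilities (e.g. the false branch above predicts 'died' with probability 2/3), which matches how the Wikipedia Titanic tree annotates its leaves.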

Reply #2 — fengyg (verified business account), posted 2016-8-19 07:41:07
kankan


Reply #3 — r9205009, posted 2016-8-19 08:30:00
THS


Reply #4 — ekscheng, posted 2016-8-19 09:06:13

