OP: ReneeBK

[Case Study] Data Mining using Decision Trees


OP
ReneeBK posted on 2016-5-12 10:27:54

  1. # Data Mining using Decision Trees
  2. #### Part 1: Decision Trees -------------------

  3. ## Understanding Decision Trees ----
  4. # calculate entropy of a two-class segment
  5. -0.60 * log2(0.60) - 0.40 * log2(0.40)   # entropy of a 60/40 split, about 0.971 bits

  6. curve(-x * log2(x) - (1 - x) * log2(1 - x),
  7.       col="red", xlab = "x", ylab = "Entropy", lwd=4)

  8. ## Example: Identifying Risky Bank Loans ----
  9. ## Step 2: Exploring and preparing the data ----
  10. credit <- read.csv("credit.csv")
  11. str(credit)

  12. # look at two characteristics of the applicant
  13. table(credit$checking_balance)
  14. table(credit$savings_balance)

  15. # look at two characteristics of the loan
  16. summary(credit$months_loan_duration)
  17. summary(credit$amount)

  18. # look at the class variable
  19. table(credit$default)

  20. # create a random sample for training and test data
  21. # use set.seed to use the same random number sequence as the tutorial
  22. set.seed(12345)
  23. credit_rand <- credit[order(runif(1000)), ]

  24. # compare the credit and credit_rand data frames
  25. summary(credit$amount)
  26. summary(credit_rand$amount)
  27. head(credit$amount)
  28. head(credit_rand$amount)

  29. # split the data frames
  30. credit_train <- credit_rand[1:900, ]
  31. credit_test  <- credit_rand[901:1000, ]

  32. # check the proportion of class variable
  33. prop.table(table(credit_train$default))
  34. prop.table(table(credit_test$default))

  35. ## Step 3: Training a model on the data ----
  36. # build the simplest decision tree
  37. library(C50)
  38. credit_model <- C5.0(credit_train[-17], credit_train$default)

  39. # display simple facts about the tree
  40. credit_model

  41. # display detailed information about the tree
  42. summary(credit_model)

  43. ## Step 4: Evaluating model performance ----
  44. # create a factor vector of predictions on test data
  45. credit_pred <- predict(credit_model, credit_test)

  46. # cross tabulation of predicted versus actual classes
  47. library(gmodels)
  48. CrossTable(credit_test$default, credit_pred,
  49.            prop.chisq = FALSE, prop.c = FALSE, prop.r = FALSE,
  50.            dnn = c('actual default', 'predicted default'))

  51. ## Step 5: Improving model performance ----

  52. ## Boosting the accuracy of decision trees
  53. # boosted decision tree with 10 trials
  54. credit_boost10 <- C5.0(credit_train[-17], credit_train$default,
  55.                        trials = 10)
  56. credit_boost10
  57. summary(credit_boost10)

  58. credit_boost_pred10 <- predict(credit_boost10, credit_test)
  59. CrossTable(credit_test$default, credit_boost_pred10,
  60.            prop.chisq = FALSE, prop.c = FALSE, prop.r = FALSE,
  61.            dnn = c('actual default', 'predicted default'))

  62. # boosted decision tree with 100 trials (not shown in text)
  63. credit_boost100 <- C5.0(credit_train[-17], credit_train$default,
  64.                         trials = 100)
  65. credit_boost_pred100 <- predict(credit_boost100, credit_test)
  66. CrossTable(credit_test$default, credit_boost_pred100,
  67.            prop.chisq = FALSE, prop.c = FALSE, prop.r = FALSE,
  68.            dnn = c('actual default', 'predicted default'))

  69. ## Making some mistakes more costly than others
  70. # create a cost matrix
  71. error_cost <- matrix(c(0, 1, 4, 0), nrow = 2)
  72. error_cost

  73. # apply the cost matrix to the tree
  74. credit_cost <- C5.0(credit_train[-17], credit_train$default,
  75.                           costs = error_cost)
  76. credit_cost_pred <- predict(credit_cost, credit_test)

  77. CrossTable(credit_test$default, credit_cost_pred,
  78.            prop.chisq = FALSE, prop.c = FALSE, prop.r = FALSE,
  79.            dnn = c('actual default', 'predicted default'))
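A brief follow-up to Steps 4 and 5 (not part of the original script): with the prediction objects above still in the workspace, the overall test-set accuracy of each model can be read off directly rather than from the cross tables.

# overall accuracy on the 100 held-out applicants
mean(credit_pred == credit_test$default)           # single C5.0 tree
mean(credit_boost_pred10 == credit_test$default)   # boosted, 10 trials
mean(credit_cost_pred == credit_test$default)      # cost-sensitive tree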


2
ReneeBK posted on 2016-5-12 10:32:28
  1. /**
  2. *
  3. */

  4. package dt;

  5. import java.util.*;

  6. public class DecisionTree {
  7.   /**
  8.    * Contains the set of available attributes.
  9.    */
  10.   private LinkedHashSet<String> attributes;

  11.   /**
  12.    * Maps an attribute name to a set of possible decisions for that attribute.
  13.    */
  14.   private Map<String, Set<String> > decisions;
  15.   private boolean decisionsSpecified;

  16.   /**
  17.    * Contains the examples to be processed into a decision tree.
  18.    *
  19.    * The 'attributes' and 'decisions' member variables should be updated
  20.    * prior to adding examples that refer to new attributes or decisions.
  21.    */
  22.   private Examples examples;

  23.   /**
  24.    * Indicates if the provided data has been processed into a decision tree.
  25.    *
  26.    * This value is initially false, and is reset any time additional data is
  27.    * provided.
  28.    */
  29.   private boolean compiled;

  30.   /**
  31.    * Contains the top-most attribute of the decision tree.
  32.    *
  33.    * For a tree where the decision requires no attributes,
  34.    * the rootAttribute yields a boolean classification.
  35.    *
  36.    */
  37.   private Attribute rootAttribute;

  38.   private Algorithm algorithm;

  39.   public DecisionTree() {
  40.     algorithm = null;
  41.     examples = new Examples();
  42.     attributes = new LinkedHashSet<String>();
  43.     decisions = new HashMap<String, Set<String> >();
  44.     decisionsSpecified = false;
  45.   }

  46.   private void setDefaultAlgorithm() {
  47.     if ( algorithm == null )
  48.       setAlgorithm(new ID3Algorithm(examples));
  49.   }

  50.   public void setAlgorithm(Algorithm algorithm) {
  51.     this.algorithm = algorithm;
  52.   }

  53.   /**
  54.    * Saves the array of attribute names in an insertion ordered set.
  55.    *
  56.    * The ordering of attribute names is used when addExamples is called to
  57.    * determine which values correspond with which names.
  58.    *
  59.    */
  60.   public DecisionTree setAttributes(String[] attributeNames) {
  61.     compiled = false;

  62.     decisions.clear();
  63.     decisionsSpecified = false;

  64.     attributes.clear();

  65.     for ( int i = 0 ; i < attributeNames.length ; i++ )
  66.       attributes.add(attributeNames[i]);

  67.     return this;
  68.   }

  69.   /**
  70.    */
  71.   public DecisionTree setDecisions(String attributeName, String[] decisions) {
  72.     if ( !attributes.contains(attributeName) ) {
  73.       // TODO some kind of warning or something
  74.       return this;
  75.     }

  76.     compiled = false;
  77.     decisionsSpecified = true;

  78.     Set<String> decisionsSet = new HashSet<String>();
  79.     for ( int i = 0 ; i < decisions.length ; i++ )
  80.       decisionsSet.add(decisions[i]);

  81.     this.decisions.put(attributeName, decisionsSet);

  82.     return this;
  83.   }

  84.   /**
  85.    */
  86.   public DecisionTree addExample(String[] attributeValues, boolean classification) throws UnknownDecisionException {
  87.     String[] attributes = this.attributes.toArray(new String[0]);

  88.     if ( decisionsSpecified )
  89.       for ( int i = 0 ; i < attributeValues.length ; i++ )
  90.         if ( !decisions.get(attributes[i]).contains(attributeValues[i]) ) {
  91.           throw new UnknownDecisionException(attributes[i], attributeValues[i]);
  92.         }

  93.     compiled = false;

  94.     examples.add(attributes, attributeValues, classification);
  95.    
  96.     return this;
  97.   }

  98.   public DecisionTree addExample(Map<String, String> attributes, boolean classification) throws UnknownDecisionException {
  99.     compiled = false;

  100.     examples.add(attributes, classification);

  101.     return this;
  102.   }

  103.   public boolean apply(Map<String, String> data) throws BadDecisionException {
  104.     compile();

  105.     return rootAttribute.apply(data);
  106.   }

  107.   private Attribute compileWalk(Attribute current, Map<String, String> chosenAttributes, Set<String> usedAttributes) {
  108.     // if the current attribute is a leaf, then there are no decisions and thus no
  109.     // further attributes to find.
  110.     if ( current.isLeaf() )
  111.       return current;

  112.     // get decisions for the current attribute (from this.decisions)
  113.     String attributeName = current.getName();

  114.     // remove this attribute from all further consideration
  115.     usedAttributes.add(attributeName);

  116.     for ( String decisionName : decisions.get(attributeName) ) {
  117.       // overwrite the attribute decision for each value considered
  118.       chosenAttributes.put(attributeName, decisionName);

  119.       // find the next attribute to choose for the considered decision
  120.       // build the subtree from this new attribute, pre-order
  121.       // insert the newly-built subtree into the open decision slot
  122.       current.addDecision(decisionName, compileWalk(algorithm.nextAttribute(chosenAttributes, usedAttributes), chosenAttributes, usedAttributes));
  123.     }

  124.     // remove the attribute decision before we walk back up the tree.
  125.     chosenAttributes.remove(attributeName);

  126.     // return the subtree so that it can be inserted into the parent tree.
  127.     return current;
  128.   }

  129.   public void compile() {
  130.     // skip compilation if already done.
  131.     if ( compiled )
  132.       return;

  133.     // if no algorithm is set beforehand, select the default one.
  134.     setDefaultAlgorithm();

  135.     Map<String, String> chosenAttributes = new HashMap<String, String>();
  136.     Set<String> usedAttributes = new HashSet<String>();

  137.     if ( !decisionsSpecified )
  138.       decisions = examples.extractDecisions();

  139.     // find the root attribute (either leaf or non)
  140.     // walk the tree, adding attributes as needed under each decision
  141.     // save the original attribute as the root attribute.
  142.     rootAttribute = compileWalk(algorithm.nextAttribute(chosenAttributes, usedAttributes), chosenAttributes, usedAttributes);

  143.     compiled = true;
  144.   }

  145.   public String toString() {
  146.     compile();

  147.     if ( rootAttribute != null )
  148.       return rootAttribute.toString();
  149.     else
  150.       return "";
  151.   }

  152.   public Attribute getRoot() {
  153.     return rootAttribute;
  154.   }
  155. }
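  // Usage sketch (not part of the original file; the attribute names and values
  // below are made up, and exception handling is omitted for brevity):
  //   DecisionTree tree = new DecisionTree();
  //   tree.setAttributes(new String[]{"outlook", "windy"})
  //       .addExample(new String[]{"sunny", "false"}, true)
  //       .addExample(new String[]{"rainy", "true"}, false);
  //   Map<String, String> query = new HashMap<String, String>();
  //   query.put("outlook", "sunny"); query.put("windy", "false");
  //   boolean result = tree.apply(query);   // compiles the tree, then classifies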


https://github.com/saebyn/java-decision-tree/

3
ReneeBK posted on 2016-5-12 10:33:40

Decision Tree in Java

  1. /**
  2. *
  3. */

  4. package dt;

  5. import java.util.*;


  6. class Examples {
  7.   class Example {
  8.     private Map<String, String> values;
  9.     private boolean classifier;
  10.   
  11.     public Example(String[] attributeNames, String[] attributeValues,
  12.                    boolean classifier) {
  13.       assert(attributeNames.length == attributeValues.length);
  14.       values = new HashMap<String, String>();

  15.       for ( int i = 0 ; i < attributeNames.length ; i++ ) {
  16.         values.put(attributeNames[i], attributeValues[i]);
  17.       }
  18.    
  19.       this.classifier = classifier;
  20.     }

  21.     public Example(Map<String, String> attributes, boolean classifier) {
  22.       this.classifier = classifier;
  23.       this.values = attributes;
  24.     }

  25.     public Set<String> getAttributes() {
  26.       return values.keySet();
  27.     }

  28.     public String getAttributeValue(String attribute) {
  29.       return values.get(attribute);
  30.     }

  31.     public boolean matchesClass(boolean classifier) {
  32.       return classifier == this.classifier;
  33.     }
  34.   }

  35.   private List<Example> examples;

  36.   public Examples() {
  37.     examples = new LinkedList<Example>();
  38.   }
  39.   
  40.   public void add(String[] attributeNames, String[] attributeValues,
  41.                   boolean classifier) {
  42.     examples.add(new Example(attributeNames, attributeValues, classifier));
  43.   }

  44.   public void add(Map<String, String> attributes, boolean classifier) {
  45.     examples.add(new Example(attributes, classifier));
  46.   }

  47.   /**
  48.    * Returns the number of examples where the attribute has the specified
  49.    * 'decision' value
  50.    */
  51.   int countDecisions(String attribute, String decision) {
  52.     int count = 0;

  53.     for ( Example e : examples ) {
  54.       if ( e.getAttributeValue(attribute).equals(decision) )
  55.         count++;
  56.     }

  57.     return count;
  58.   }

  59.   /**
  60.    * Returns a map from each attribute name to a set of all values used in the
  61.    * examples for that attribute.
  62.    */
  63.   public Map<String, Set<String> > extractDecisions() {
  64.     Map<String, Set<String> > decisions = new HashMap<String, Set<String> >();

  65.     for ( String attribute : extractAttributes() ) {
  66.       decisions.put(attribute, extractDecisions(attribute));
  67.     }

  68.     return decisions;
  69.   }

  70.   public int countNegative(String attribute, String decision,
  71.                            Map<String, String> attributes) {
  72.     return countClassifier(false, attribute, decision, attributes);
  73.   }
  74.   
  75.   public int countPositive(String attribute, String decision,
  76.                            Map<String, String> attributes) {
  77.     return countClassifier(true, attribute, decision, attributes);
  78.   }
  79.   
  80.   public int countNegative(Map<String, String> attributes) {
  81.     return countClassifier(false, attributes);
  82.   }
  83.   
  84.   public int countPositive(Map<String, String> attributes) {
  85.     return countClassifier(true, attributes);
  86.   }
  87.   
  88.   public int count(String attribute, String decision, Map<String, String> attributes) {
  89.     attributes = new HashMap(attributes);
  90.     attributes.put(attribute, decision);

  91.     return count(attributes);
  92.   }
  93.   
  94.   public int count(Map<String, String> attributes) {
  95.     int count = 0;

  96. nextExample:
  97.     for ( Example e : examples ) {
  98.       for ( Map.Entry<String, String> attribute : attributes.entrySet() )
  99.         if ( !(e.getAttributeValue(attribute.getKey()).equals(attribute.getValue())) )
  100.           continue nextExample;

  101.       // All of the provided attributes match the example.
  102.       count++;
  103.     }

  104.     return count;
  105.   }
  106.   
  107.   public int countClassifier(boolean classifier, Map<String, String> attributes) {
  108.     int count = 0;

  109. nextExample:
  110.     for ( Example e : examples ) {
  111.       for ( Map.Entry<String, String> attribute : attributes.entrySet() )
  112.         if ( !(e.getAttributeValue(attribute.getKey()).equals(attribute.getValue())) )
  113.           continue nextExample;

  114.       // All of the provided attributes match the example.
  115.       // If the example matches the classifier, then include it in the count.
  116.       if ( e.matchesClass(classifier) )
  117.         count++;
  118.     }

  119.     return count;
  120.   }

  121.   public int countClassifier(boolean classifier, String attribute,
  122.                              String decision, Map<String, String> attributes) {
  123.     attributes = new HashMap(attributes);
  124.     attributes.put(attribute, decision);

  125.     return countClassifier(classifier, attributes);
  126.   }
  127.   
  128.   /**
  129.    * Returns the number of examples.
  130.    */
  131.   public int count() {
  132.     return examples.size();
  133.   }

  134.   /**
  135.    * Returns a set of attribute names used in the examples.
  136.    */
  137.   public Set<String> extractAttributes() {
  138.     Set<String> attributes = new HashSet<String>();

  139.     for ( Example e : examples ) {
  140.       attributes.addAll(e.getAttributes());
  141.     }

  142.     return attributes;
  143.   }

  144.   private Set<String> extractDecisions(String attribute) {
  145.     Set<String> decisions = new HashSet<String>();

  146.     for ( Example e : examples ) {
  147.       decisions.add(e.getAttributeValue(attribute));
  148.     }

  149.     return decisions;
  150.   }
  151. }



https://github.com/saebyn/java-decision-tree/

4
ReneeBK posted on 2016-5-12 10:36:23

Decision Tree using Python

  1. import math

  2. #find item in a list
  3. def find(item, list):
  4.     for i in list:
  5.         if i == item:
  6.             return True
  7.     # not found anywhere in the list
  8.     return False

  9. #find most common value for an attribute
  10. def majority(attributes, data, target):
  11.     #find target attribute
  12.     valFreq = {}
  13.     #find target in data
  14.     index = attributes.index(target)
  15.     #calculate frequency of values in target attr
  16.     for tuple in data:
  17.         if (valFreq.has_key(tuple[index])):
  18.             valFreq[tuple[index]] += 1
  19.         else:
  20.             valFreq[tuple[index]] = 1
  21.     max = 0
  22.     major = ""
  23.     for key in valFreq.keys():
  24.         if valFreq[key]>max:
  25.             max = valFreq[key]
  26.             major = key
  27.     return major

  28. #Calculates the entropy of the given data set for the target attr
  29. def entropy(attributes, data, targetAttr):

  30.     valFreq = {}
  31.     dataEntropy = 0.0
  32.    
  33.     #find index of the target attribute
  34.     i = 0
  35.     for entry in attributes:
  36.         if (targetAttr == entry):
  37.             break
  38.         i += 1
  39.    
  40.     # Calculate the frequency of each of the values in the target attr
  41.     for entry in data:
  42.         if (valFreq.has_key(entry[i])):
  43.             valFreq[entry[i]] += 1.0
  44.         else:
  45.             valFreq[entry[i]]  = 1.0

  46.     # Calculate the entropy of the data for the target attr
  47.     for freq in valFreq.values():
  48.         dataEntropy += (-freq/len(data)) * math.log(freq/len(data), 2)
  49.         
  50.     return dataEntropy

  51. def gain(attributes, data, attr, targetAttr):
  52.     """
  53.     Calculates the information gain (reduction in entropy) that would
  54.     result by splitting the data on the chosen attribute (attr).
  55.     """
  56.     valFreq = {}
  57.     subsetEntropy = 0.0
  58.    
  59.     #find index of the attribute
  60.     i = attributes.index(attr)

  61.     # Calculate the frequency of each of the values in the target attribute
  62.     for entry in data:
  63.         if (valFreq.has_key(entry[i])):
  64.             valFreq[entry[i]] += 1.0
  65.         else:
  66.             valFreq[entry[i]]  = 1.0
  67.     # Calculate the sum of the entropy for each subset of records weighted
  68.     # by their probability of occurring in the training set.
  69.     for val in valFreq.keys():
  70.         valProb        = valFreq[val] / sum(valFreq.values())
  71.         dataSubset     = [entry for entry in data if entry[i] == val]
  72.         subsetEntropy += valProb * entropy(attributes, dataSubset, targetAttr)

  73.     # Subtract the entropy of the chosen attribute from the entropy of the
  74.     # whole data set with respect to the target attribute (and return it)
  75.     return (entropy(attributes, data, targetAttr) - subsetEntropy)

  76. #choose best attribute
  77. def chooseAttr(data, attributes, target):
  78.     best = attributes[0]
  79.     maxGain = 0;
  80.     for attr in attributes:
  81.         newGain = gain(attributes, data, attr, target)
  82.         if newGain>maxGain:
  83.             maxGain = newGain
  84.             best = attr
  85.     return best

  86. #get values in the column of the given attribute
  87. def getValues(data, attributes, attr):
  88.     index = attributes.index(attr)
  89.     values = []
  90.     for entry in data:
  91.         if entry[index] not in values:
  92.             values.append(entry[index])
  93.     return values

  94. def getExamples(data, attributes, best, val):
  95.     examples = [[]]
  96.     index = attributes.index(best)
  97.     for entry in data:
  98.         #find entries with the given value
  99.         if (entry[index] == val):
  100.             newEntry = []
  101.             #add value if it is not in best column
  102.             for i in range(0,len(entry)):
  103.                 if(i != index):
  104.                     newEntry.append(entry[i])
  105.             examples.append(newEntry)
  106.     examples.remove([])
  107.     return examples

  108. def makeTree(data, attributes, target, recursion):
  109.     recursion += 1
  110.     #Returns a new decision tree based on the examples given.
  111.     data = data[:]
  112.     vals = [record[attributes.index(target)] for record in data]
  113.     default = majority(attributes, data, target)

  114.     # If the dataset is empty or the attributes list is empty, return the
  115.     # default value. When checking the attributes list for emptiness, we
  116.     # need to subtract 1 to account for the target attribute.
  117.     if not data or (len(attributes) - 1) <= 0:
  118.         return default
  119.     # If all the records in the dataset have the same classification,
  120.     # return that classification.
  121.     elif vals.count(vals[0]) == len(vals):
  122.         return vals[0]
  123.     else:
  124.         # Choose the next best attribute to best classify our data
  125.         best = chooseAttr(data, attributes, target)
  126.         # Create a new decision tree/node with the best attribute and an empty
  127.         # dictionary object--we'll fill that up next.
  128.         tree = {best:{}}
  129.    
  130.         # Create a new decision tree/sub-node for each of the values in the
  131.         # best attribute field
  132.         for val in getValues(data, attributes, best):
  133.             # Create a subtree for the current value under the "best" field
  134.             examples = getExamples(data, attributes, best, val)
  135.             newAttr = attributes[:]
  136.             newAttr.remove(best)
  137.             subtree = makeTree(examples, newAttr, target, recursion)
  138.    
  139.             # Add the new subtree to the empty dictionary object in our new
  140.             # tree/node we just created.
  141.             tree[best][val] = subtree
  142.    
  143.     return tree
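  # Usage sketch (not part of the original file; the attribute names and rows
  # below are made up purely to show the expected call shape):
  #   attributes = ["outlook", "windy", "play"]
  #   data = [["sunny", "false", "yes"],
  #           ["rainy", "true",  "no"]]
  #   tree = makeTree(data, attributes, "play", 0)
  #   print(tree)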

https://github.com/NinjaSteph/DecisionTree/

5
ReneeBK posted on 2016-5-12 10:39:52
  1. function [classifications] = ClassifyByTree(tree, attributes, instance)
  2. % ClassifyByTree   Classifies data instance by given tree
  3. % args:
  4. %       tree            - tree data structure
  5. %       attributes      - cell array of attribute strings (no CLASS)
  6. %       instance        - data including correct classification (end col.)
  7. % return:
  8. %       classifications     - 2 numbers, first given by tree, 2nd given by
  9. %                             instance's last column
  10. % tree struct:
  11. %       value               - will be the string for the splitting
  12. %                             attribute, or 'true' or 'false' for leaf
  13. %       left                - left pointer to another tree node (left means
  14. %                             the splitting attribute was false)
  15. %       right               - right pointer to another tree node (right
  16. %                             means the splitting attribute was true)

  17. % Store the actual classification
  18. actual = instance(1, length(instance));

  19. % Recursion with 3 cases

  20. % Case 1: Current node is labeled 'true'
  21. % So trivially return the classification as 1
  22. if (strcmp(tree.value, 'true'));
  23.     classifications = [1, actual];
  24.     return
  25. end

  26. % Case 2: Current node is labeled 'false'
  27. % So trivially return the classification as 0
  28. if (strcmp(tree.value, 'false'));
  29.     classifications = [0, actual];
  30.     return
  31. end

  32. % Case 3: Current node is labeled an attribute
  33. % Follow correct branch by looking up index in attributes, and recur
  34. index = find(ismember(attributes,tree.value)==1);
  35. if (instance(1, index)); % attribute is true for this instance
  36.     % Recur down the right side
  37.     classifications = ClassifyByTree(tree.right, attributes, instance);
  38. else
  39.     % Recur down the left side
  40.     classifications = ClassifyByTree(tree.left, attributes, instance);
  41. end

  42. return
  43. end

https://github.com/gwheaton/ID3-Decision-Tree

6
ReneeBK posted on 2016-5-12 10:41:30
  1. % George Wheaton
  2. % EECS 349
  3. % Homework 1 Problem 7
  4. % October 7, 2012

  5. % ID3 Decision Tree Algorithm

  6. function[] = decisiontree(inputFileName, trainingSetSize, numberOfTrials,...
  7.                           verbose)
  8. % DECISIONTREE Create a decision tree by following the ID3 algorithm
  9. % args:
  10. %   inputFileName       - the fully specified path to input file
  11. %   trainingSetSize     - integer specifying number of examples from input
  12. %                         used to train the dataset
  13. %   numberOfTrials      - integer specifying how many times decision tree
  14. %                         will be built from a randomly selected subset
  15. %                         of the training examples
  16. %   verbose             - string that must be either '1' or '0', if '1'
  17. %                         output includes training and test sets, else
  18. %                         it will only contain description of tree and
  19. %                         results for the trials

  20. % Read in the specified text file containing the examples
  21. fid = fopen(inputFileName, 'rt');
  22. dataInput = textscan(fid, '%s');
  23. % Close the file
  24. fclose(fid);

  25. % Reformat the data into attribute array and data matrix of 1s and 0s for
  26. % true or false
  27. i = 1;
  28. % First store the attributes into a cell array
  29. while (~strcmp(dataInput{1}{i}, 'CLASS'));
  30.     i = i + 1;
  31. end
  32. attributes = cell(1,i);
  33. for j=1:i;
  34.     attributes{j} = dataInput{1}{j};
  35. end

  36. % NOTE: The classification will be the final attribute in the data rows
  37. % below
  38. numAttributes = i;
  39. numInstances = (length(dataInput{1}) - numAttributes) / numAttributes;
  40. % Then store the data into matrix
  41. data = zeros(numInstances, numAttributes);
  42. i = i + 1;
  43. for j=1:numInstances
  44.     for k=1:numAttributes
  45.         data(j, k) = strcmp(dataInput{1}{i}, 'true');
  46.         i = i + 1;
  47.     end
  48. end

  49. % Here is where the trials start
  50. for i=1:numberOfTrials;
  51.    
  52.     % Print the trial number
  53.     fprintf('TRIAL NUMBER: %d\n\n', i);
  54.    
  55.     % Split data into training and testing sets randomly
  56.     % Use randsample to get a vector of row numbers for the training set
  57.     rows = sort(randsample(numInstances, trainingSetSize));
  58.     % Initialize two new matrices, training set and test set
  59.     trainingSet = zeros(trainingSetSize, numAttributes);
  60.     testingSetSize = (numInstances - trainingSetSize);
  61.     testingSet = zeros(testingSetSize, numAttributes);
  62.     % Loop through data matrix, copying relevant rows to each matrix
  63.     training_index = 1;
  64.     testing_index = 1;
  65.     for data_index=1:numInstances;
  66.         if (rows(training_index) == data_index);
  67.             trainingSet(training_index, :) = data(data_index, :);
  68.             if (training_index < trainingSetSize);
  69.                 training_index = training_index + 1;
  70.             end
  71.         else
  72.             testingSet(testing_index, :) = data(data_index, :);
  73.             if (testing_index < testingSetSize);
  74.                 testing_index = testing_index + 1;
  75.             end
  76.         end
  77.     end
  78.    
  79.     % If verbose, print out training set
  80.     if (strcmp(verbose, '1'));
  81.         for ii=1:numAttributes;
  82.             fprintf('%s\t', attributes{ii});
  83.         end
  84.         fprintf('\n');
  85.         for ii=1:trainingSetSize;
  86.             for jj=1:numAttributes;
  87.                 if (trainingSet(ii, jj));
  88.                     fprintf('%s\t', 'true');
  89.                 else
  90.                     fprintf('%s\t', 'false');
  91.                 end
  92.             end
  93.             fprintf('\n');
  94.         end
  95.     end
  96.    
  97.     % Estimate the expected prior probability of TRUE and FALSE based on
  98.     % training set
  99.     if (sum(trainingSet(:, numAttributes)) >= trainingSetSize / 2);
  100.         expectedPrior = 'true';
  101.     else
  102.         expectedPrior = 'false';
  103.     end
  104.    
  105.     % Construct a decision tree on the training set using the ID3 algorithm
  106.     activeAttributes = ones(1, length(attributes) - 1);
  107.     new_attributes = attributes(1:length(attributes)-1);
  108.     tree = ID3(trainingSet, attributes, activeAttributes);
  109.    
  110.     % Print out the tree
  111.     fprintf('DECISION TREE STRUCTURE:\n');
  112.     PrintTree(tree, 'root');
  113.    
  114.     % Run tree and expected prior against testing set, recording
  115.     % classifications
  116.     % The second column is for actual classification, first for calculated
  117.     ID3_Classifications = zeros(testingSetSize,2);
  118.     ExpectedPrior_Classifications = zeros(testingSetSize,2);
  119.     ID3_numCorrect = 0; ExpectedPrior_numCorrect = 0;
  120.     for k=1:testingSetSize; %over the testing set
  121.         % Call a recursive function to follow the tree nodes and classify
  122.         ID3_Classifications(k,:) = ...
  123.             ClassifyByTree(tree, new_attributes, testingSet(k,:));
  124.         
  125.         ExpectedPrior_Classifications(k, 2) = testingSet(k,numAttributes);
  126.         if (strcmp(expectedPrior, 'true'));
  127.             ExpectedPrior_Classifications(k, 1) = 1;
  128.         else
  129.             ExpectedPrior_Classifications(k, 1) = 0;
  130.         end
  131.         
  132.         if (ID3_Classifications(k,1) == ID3_Classifications(k, 2)); %correct
  133.             ID3_numCorrect = ID3_numCorrect + 1;
  134.         end
  135.         if (ExpectedPrior_Classifications(k,1) == ExpectedPrior_Classifications(k,2));
  136.             ExpectedPrior_numCorrect = ExpectedPrior_numCorrect + 1;
  137.         end     
  138.     end
  139.    
  140.     % If verbose, print the testing data with final two columns ID3 Class
  141.     % and Prior Class
  142.     if (strcmp(verbose, '1'));
  143.         for ii=1:numAttributes;
  144.             fprintf('%s\t', attributes{ii});
  145.         end
  146.         fprintf('%s\t%s\t', 'ID3 Class', 'Prior Class');
  147.         fprintf('\n');
  148.         for ii=1:testingSetSize;
  149.             for jj=1:numAttributes;
  150.                 if (testingSet(ii, jj));
  151.                     fprintf('%s\t', 'true');
  152.                 else
  153.                     fprintf('%s\t', 'false');
  154.                 end
  155.             end
  156.             if (ID3_Classifications(ii,1));
  157.                 fprintf('%s\t', 'true');
  158.             else
  159.                 fprintf('%s\t', 'false');
  160.             end
  161.             if (ExpectedPrior_Classifications(ii,1));
  162.                 fprintf('%s\t', 'true');
  163.             else
  164.                 fprintf('%s\t', 'false');
  165.             end
  166.             fprintf('\n');
  167.         end
  168.     end
  169.    
  170.     % Calculate the proportions correct and print out
  171.     if (testingSetSize);
  172.         ID3_Percentage = round(100 * ID3_numCorrect / testingSetSize);
  173.         ExpectedPrior_Percentage = round(100 * ExpectedPrior_numCorrect / testingSetSize);
  174.     else
  175.         ID3_Percentage = 0;
  176.         ExpectedPrior_Percentage = 0;
  177.     end
  178.     ID3_Percentages(i) = ID3_Percentage;
  179.     ExpectedPrior_Percentages(i) = ExpectedPrior_Percentage;
  180.    
  181.     fprintf('\tPercent of test cases correctly classified by an ID3 decision tree = %d\n' ...
  182.         , ID3_Percentage);
  183.     fprintf('\tPercent of test cases correctly classified by using prior probabilities from the training set = %d\n\n' ...
  184.         , ExpectedPrior_Percentage);
  185. end

  186. meanID3 = round(mean(ID3_Percentages));
  187. meanPrior = round(mean(ExpectedPrior_Percentages));

  188. % Print out remaining details
  189. fprintf('example file used = %s\n', inputFileName);
  190. fprintf('number of trials = %d\n', numberOfTrials);
  191. fprintf('training set size for each trial = %d\n', trainingSetSize);
  192. fprintf('testing set size for each trial = %d\n', testingSetSize);
  193. fprintf('mean performance (percentage correct) of decision tree over all trials = %d\n', meanID3);
  194. fprintf('mean performance (percentage correct) of prior probability from training set = %d\n\n', meanPrior);
  195. end
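  % Example invocation sketch (not part of the original file; the file name and
  % argument values below are illustrative only):
  %   decisiontree('tennis-examples.txt', 10, 5, '0');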


https://github.com/gwheaton/ID3-Decision-Tree

7
ReneeBK posted on 2016-5-12 10:43:47

Decision Tree using Javascript

  1. //ID3 Decision Tree Algorithm


  2. //main algorithm and prediction functions

  3. var id3 = function(_s,target,features){
  4.     var targets = _.unique(_s.pluck(target));
  5.     if (targets.length == 1){
  6.         console.log("end node! "+targets[0]);
  7.         return {type:"result", val: targets[0], name: targets[0],alias:targets[0]+randomTag() };
  8.     }
  9.     if(features.length == 0){
  10.         console.log("returning the most dominant feature!!!");
  11.         var topTarget = mostCommon(_s.pluck(target));
  12.         return {type:"result", val: topTarget, name: topTarget, alias: topTarget+randomTag()};
  13.     }
  14.     var bestFeature = maxGain(_s,target,features);
  15.     var remainingFeatures = _.without(features,bestFeature);
  16.     var possibleValues = _.unique(_s.pluck(bestFeature));
  17.     console.log("node for "+bestFeature);
  18.     var node = {name: bestFeature,alias: bestFeature+randomTag()};
  19.     node.type = "feature";
  20.     node.vals = _.map(possibleValues,function(v){
  21.         console.log("creating a branch for "+v);
  22.         var _newS = _(_s.filter(function(x) {return x[bestFeature] == v}));
  23.         var child_node = {name:v,alias:v+randomTag(),type: "feature_value"};
  24.         child_node.child =  id3(_newS,target,remainingFeatures);
  25.         return child_node;
  26.         
  27.     });
  28.     return node;
  29. }

  30. var predict = function(id3Model,sample) {
  31.     var root = id3Model;
  32.     while(root.type != "result"){
  33.         var attr = root.name;
  34.         var sampleVal = sample[attr];
  35.         var childNode = _.detect(root.vals,function(x){return x.name == sampleVal});
  36.         root = childNode.child;
  37.     }
  38.     return root.val;
  39. }



  40. //necessary math functions

  41. var entropy = function(vals){
  42.     var uniqueVals = _.unique(vals);
  43.     var probs = uniqueVals.map(function(x){return prob(x,vals)});
  44.     var logVals = probs.map(function(p){return -p*log2(p) });
  45.     return logVals.reduce(function(a,b){return a+b},0);
  46. }

  47. var gain = function(_s,target,feature){
  48.     var attrVals = _.unique(_s.pluck(feature));
  49.     var setEntropy = entropy(_s.pluck(target));
  50.     var setSize = _s.size();
  51.     var entropies = attrVals.map(function(n){
  52.         var subset = _s.filter(function(x){return x[feature] === n});
  53.         return (subset.length/setSize)*entropy(_.pluck(subset,target));
  54.     });
  55.     var sumOfEntropies =  entropies.reduce(function(a,b){return a+b},0);
  56.     return setEntropy - sumOfEntropies;
  57. }

  58. var maxGain = function(_s,target,features){
  59.     return _.max(features,function(e){return gain(_s,target,e)});
  60. }

  61. var prob = function(val,vals){
  62.     var instances = _.filter(vals,function(x) {return x === val}).length;
  63.     var total = vals.length;
  64.     return instances/total;
  65. }

  66. var log2 = function(n){
  67.     return Math.log(n)/Math.log(2);
  68. }


  69. var mostCommon = function(l){
  70.    return  _.sortBy(l,function(a){
  71.         return count(a,l);
  72.     }).reverse()[0];
  73. }

  74. var count = function(a,l){
  75.     return _.filter(l,function(b) { return b === a}).length
  76. }

  77. var randomTag = function(){
  78.     return "_r"+Math.round(Math.random()*1000000).toString();
  79. }

  80. //Display logic

  81. var drawGraph = function(id3Model,divId){
  82.     var g = new Array();
  83.     g = addEdges(id3Model,g).reverse();
  84.     window.g = g;
  85.     var data = google.visualization.arrayToDataTable(g.concat(g));
  86.     var chart = new google.visualization.OrgChart(document.getElementById(divId));
  87.     google.visualization.events.addListener(chart, 'ready',function(){
  88.     _.each($('.google-visualization-orgchart-node'),function(x){
  89.         var oldVal = $(x).html();
  90.         if(oldVal){
  91.             var cleanVal = oldVal.replace(/_r[0-9]+/,'');
  92.             $(x).html(cleanVal);
  93.         }
  94. });
  95.     });
  96.     chart.draw(data, {allowHtml: true});

  97. }

  98. var addEdges = function(node,g){
  99.     if(node.type == 'feature'){
  100.         _.each(node.vals,function(m){
  101.             g.push([m.alias,node.alias,'']);
  102.             g = addEdges(m,g);
  103.         });
  104.         return g;
  105.     }
  106.     if(node.type == 'feature_value'){

  107.         g.push([node.child.alias,node.alias,'']);
  108.         if(node.child.type != 'result'){
  109.             g = addEdges(node.child,g);
  110.         }
  111.         return g;
  112.     }
  113.     return g;
  114. }


  115. var renderSamples = function(samples,$el,model,target,features){
  116.     _.each(samples,function(s){
  117.         var features_for_sample = _.map(features,function(x){return s[x]});
  118.         $el.append("<tr><td>"+features_for_sample.join('</td><td>')+"</td><td><b>"+predict(model,s)+"</b></td><td>actual: "+s[target]+"</td></tr>");
  119.     })
  120. }

  121. var renderTrainingData = function(_training,$el,target,features){
  122.     _training.each(function(s){
  123.         $el.append("<tr><td>"+_.map(features,function(x){return s[x]}).join('</td><td>')+"</td><td>"+s[target]+"</td></tr>");
  124.     })
  125. }

  126. var calcError = function(samples,model,target){
  127.     var total = 0;
  128.     var correct = 0;
  129.     _.each(samples,function(s){
  130.         total++;
  131.         var pred = predict(model,s);
  132.         var actual = s[target];
  133.         if(pred == actual){
  134.             correct++;
  135.         }
  136.     });
  137.     return correct/total;
  138. }
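  // Usage sketch (not part of the original file; assumes Underscore.js is loaded,
  // and the sample objects and feature names below are made up):
  //   var training = _([{outlook: "sunny", windy: "false", play: "no"},
  //                     {outlook: "rainy", windy: "true",  play: "no"}]);
  //   var model = id3(training, "play", ["outlook", "windy"]);
  //   predict(model, {outlook: "sunny", windy: "false"});   // -> "no"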


https://github.com/willkurt/ID3-Decision-Tree/tree/master/js

8
ReneeBK posted on 2016-5-12 10:54:07
  1. > library("party")

  2. > str(iris)
  3. 'data.frame':   150 obs. of  5 variables:
  4. $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
  5. $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
  6. $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
  7. $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
  8. $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

  1. > iris_ctree <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data=iris)

  2. > print(iris_ctree)

  3.          Conditional inference tree with 4 terminal nodes

  4. Response:  Species
  5. Inputs:  Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
  6. Number of observations:  150

  7. 1) Petal.Length <= 1.9; criterion = 1, statistic = 140.264
  8.   2)*  weights = 50
  9. 1) Petal.Length > 1.9
  10.   3) Petal.Width <= 1.7; criterion = 1, statistic = 67.894
  11.     4) Petal.Length <= 4.8; criterion = 0.999, statistic = 13.865
  12.       5)*  weights = 46
  13.     4) Petal.Length > 4.8
  14.       6)*  weights = 8
  15.   3) Petal.Width > 1.7
  16.     7)*  weights = 46

  17. > plot(iris_ctree)
  1. > plot(iris_ctree, type="simple")
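A quick sanity check that is not shown above but follows directly from the fitted object: cross-tabulate the fitted classes against the true species.

> table(predict(iris_ctree), iris$Species)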

http://www.rdatamining.com/examples/decision-tree

9
ReneeBK posted on 2016-5-12 10:56:02

Decision Tree using R

  1. # Load the party package. It will automatically load other dependent packages.
  2. library(party)

  3. # Print some records from data set readingSkills.
  4. print(head(readingSkills))
  5. # When we execute the above code, it produces the following result and chart:

  6. #   nativeSpeaker   age   shoeSize      score
  7. # 1           yes     5   24.83189   32.29385
  8. # 2           yes     6   25.95238   36.63105
  9. # 3            no    11   30.42170   49.60593
  10. # 4           yes     7   28.66450   40.28456
  11. # 5           yes    11   31.88207   55.46085
  12. # 6           yes    10   30.07843   52.83124
  13. # Loading required package: methods
  14. # Loading required package: grid
  15. # ...............................
  16. # ...............................

  17. # Example:
  18. # We will use the ctree() function to create the decision tree and view its graph.

  19. # Load the party package. It will automatically load other dependent packages.
  20. library(party)

  21. # Create the input data frame.
  22. input.dat <- readingSkills[c(1:105),]

  23. # Give the chart file a name.
  24. png(file = "decision_tree.png")

  25. # Create the tree.
  26.   output.tree <- ctree(
  27.   nativeSpeaker ~ age + shoeSize + score,
  28.   data = input.dat)

  29. # Plot the tree.
  30. plot(output.tree)

  31. # Save the file.
  32. dev.off()

http://www.tutorialspoint.com/r/r_decision_tree.htm
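
A hedged follow-up that is not in the tutorial: since only rows 1 to 105 were used for training, the remaining rows of readingSkills can be scored with the fitted tree.

# score the held-out rows with the fitted tree and compare to the truth
test.dat <- readingSkills[-(1:105), ]
pred     <- predict(output.tree, newdata = test.dat)
table(pred, test.dat$nativeSpeaker)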
