OP: Lisrelchen

Machine Learning using Random Forests (Python)


Lisrelchen, posted 2015-5-18 01:06:21

Random forests are an ensemble learning method for classification, regression, and other tasks. They operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random forests correct for decision trees' habit of overfitting to their training set.
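A minimal illustration of that aggregation step (the per-tree outputs below are made up for the example, not produced by any real model):

```python
from statistics import mode, mean

# Hypothetical per-tree outputs for a single test point
class_votes = ['red', 'white', 'red', 'red', 'white']  # five classification trees
reg_preds = [5.1, 4.8, 5.3, 5.0]                       # four regression trees

print(mode(class_votes))   # majority vote -> 'red'
print(mean(reg_preds))     # averaged prediction, about 5.05
```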

The algorithm for inducing a random forest was developed by Leo Breiman[1] and Adele Cutler, and "Random Forests" is their trademark. The method combines Breiman's "bagging" idea with random selection of features, introduced independently by Ho and by Amit and Geman, in order to construct a collection of decision trees with controlled variance.

The selection of a random subset of features is an example of the random subspace method, which, in Ho's formulation, is a way to implement the "stochastic discrimination" approach to classification proposed by Eugene Kleinberg.
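A sketch of that feature-subset draw, using hypothetical sizes (11 attributes, matching the wine-quality data used below, with 4 drawn — roughly the 1/3 rule of thumb for regression):

```python
import random

random.seed(1)
n_features = 11   # total attributes available
n_subspace = 4    # attributes seen by one tree

# each tree in the forest gets its own random feature subset
subset = sorted(random.sample(range(n_features), n_subspace))
print(subset)     # four distinct indices in [0, 11)
```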

import urllib.request
import random
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plot

#read data into iterable
target_url = ("http://archive.ics.uci.edu/ml/machine-learning-"
              "databases/wine-quality/winequality-red.csv")
data = urllib.request.urlopen(target_url)

xList = []
labels = []
names = []
firstLine = True
for line in data:
    line = line.decode('utf-8')
    if firstLine:
        names = line.strip().split(";")
        firstLine = False
    else:
        #split on semicolon
        row = line.strip().split(";")
        #put labels in separate array
        labels.append(float(row[-1]))
        #remove label from row
        row.pop()
        #convert row to floats
        floatRow = [float(num) for num in row]
        xList.append(floatRow)

nrows = len(xList)
ncols = len(xList[0])

#take fixed test set 30% of sample
random.seed(1)  #set seed so results are the same each run
nSample = int(nrows * 0.30)
idxTest = random.sample(range(nrows), nSample)
idxTest.sort()
idxTrain = [idx for idx in range(nrows) if idx not in idxTest]

#define test and training attribute and label sets
xTrain = [xList[r] for r in idxTrain]
xTest = [xList[r] for r in idxTest]
yTrain = [labels[r] for r in idxTrain]
yTest = [labels[r] for r in idxTest]

#train a series of models on random subsets of the training data
#collect the models in a list and check error of composite as list grows

#maximum number of models to generate
numTreesMax = 30
#tree depth - typically at the high end
treeDepth = 12
#pick how many attributes will be used in each model;
#the authors recommend about 1/3 of the attributes for regression problems
nAttr = 4

#initialize lists to hold models, their feature subsets, and their predictions
modelList = []
indexList = []
predList = []
nTrainRows = len(yTrain)

for iTrees in range(numTreesMax):
    modelList.append(DecisionTreeRegressor(max_depth=treeDepth))
    #take random sample of attributes
    idxAttr = random.sample(range(ncols), nAttr)
    idxAttr.sort()
    indexList.append(idxAttr)
    #take a random sample of training rows (drawn with replacement)
    idxRows = []
    for i in range(int(0.5 * nTrainRows)):
        idxRows.append(random.choice(range(len(xTrain))))
    idxRows.sort()
    #build training set restricted to the sampled rows and attributes
    xRfTrain = []
    yRfTrain = []
    for i in range(len(idxRows)):
        temp = [xTrain[idxRows[i]][j] for j in idxAttr]
        xRfTrain.append(temp)
        yRfTrain.append(yTrain[idxRows[i]])
    modelList[-1].fit(xRfTrain, yRfTrain)
    #restrict xTest to attributes selected for training
    xRfTest = []
    for xx in xTest:
        temp = [xx[i] for i in idxAttr]
        xRfTest.append(temp)
    latestOutSamplePrediction = modelList[-1].predict(xRfTest)
    predList.append(list(latestOutSamplePrediction))

#build cumulative prediction from first "n" models
mse = []
allPredictions = []
for iModels in range(len(modelList)):
    #average the predictions of the first "iModels + 1" trees
    prediction = []
    for iPred in range(len(xTest)):
        prediction.append(sum([predList[i][iPred]
                               for i in range(iModels + 1)]) / (iModels + 1))
    allPredictions.append(prediction)
    errors = [(yTest[i] - prediction[i]) for i in range(len(yTest))]
    mse.append(sum([e * e for e in errors]) / len(yTest))

nModels = [i + 1 for i in range(len(modelList))]
plot.plot(nModels, mse)
plot.axis('tight')
plot.xlabel('Number of Trees in Ensemble')
plot.ylabel('Mean Squared Error')
plot.ylim((0.0, max(mse)))
plot.show()
print('Minimum MSE')
print(min(mse))

#printed output
#Depth 1
#Minimum MSE
#0.52666715461
#Depth 5
#Minimum MSE
#0.426116327584
#Depth 12
#Minimum MSE
#0.38508387863
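For comparison, scikit-learn's RandomForestRegressor bundles the bootstrap row sampling, feature subsetting, and prediction averaging that the listing does by hand. The sketch below uses synthetic data (a made-up stand-in for the 11 wine attributes, so it runs without network access); the hyperparameter values mirror the listing:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the wine data: 11 features, signal in the first one
rng = np.random.RandomState(1)
X = rng.uniform(size=(200, 11))
y = X[:, 0] * 3 + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=30,   # numTreesMax
                              max_depth=12,      # treeDepth
                              max_features=4,    # nAttr
                              random_state=1)
model.fit(X[:140], y[:140])                      # ~70% train split
preds = model.predict(X[140:])
mse = float(np.mean((preds - y[140:]) ** 2))
print(mse)   # small relative to the variance of y, since the signal is learnable
```

Note one difference in spirit: scikit-learn draws a full-size bootstrap of the rows and redraws `max_features` candidates at every split, whereas the listing draws half-size row samples and a single feature subset per tree.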
Reference
  • Michael Bowles, Machine Learning in Python: Essential Techniques for Predictive Analysis, Wiley, 2015.

