Random Forests
LEO BREIMAN (originator of random forests)
Statistics Department, University of California, Berkeley, CA 94720
Abstract. Random forests are a combination of tree predictors such that each tree depends on the values of a
random vector sampled independently and with the same distribution for all trees in the forest. The generalization
error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization
error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation
between them. Using a random selection of features to split each node yields error rates that compare
favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International
Conference, ∗ ∗ ∗, 148–156), but are more robust with respect to noise. Internal estimates monitor error,
strength, and correlation and these are used to show the response to increasing the number of features used in
the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to
regression.
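
As a rough illustration of the procedure summarized above, the following minimal sketch grows each tree on a bootstrap sample, considers only a random subset of features at every node, and uses the cases left out of each bootstrap sample (out-of-bag cases) as an internal error estimate. All function names, the toy data, the Gini split criterion, and the depth cap are assumptions made for this example, not details taken from the abstract.

```python
import random
from collections import Counter

def gini(labels):
    # Gini impurity of a list of class labels.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, n_try, rng, depth=0, max_depth=8):
    # Stop on pure nodes or at the depth cap (the cap is an assumption of this sketch).
    if len(set(y)) == 1 or depth >= max_depth:
        return majority(y)
    best = None
    # Only a random subset of n_try features is examined at this node.
    for f in rng.sample(range(len(X[0])), n_try):
        # Candidate thresholds: every observed value except the largest,
        # so both children are guaranteed to be non-empty.
        for t in sorted(set(row[f] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # sampled features were all constant here
        return majority(y)
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       n_try, rng, depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       n_try, rng, depth + 1, max_depth))

def predict_tree(tree, row):
    # Internal nodes are tuples (feature, threshold, left, right); leaves are labels.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def fit_forest(X, y, n_trees=25, n_try=1, seed=0):
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = build_tree([X[i] for i in idx], [y[i] for i in idx], n_try, rng)
        forest.append((tree, set(idx)))  # remember which cases were in-bag
    return forest

def predict_forest(forest, row):
    # The forest predicts by unweighted majority vote over its trees.
    return majority([predict_tree(tree, row) for tree, _ in forest])

def oob_error(forest, X, y):
    # Each case is voted on only by trees whose bootstrap sample excluded it.
    wrong = counted = 0
    for i, row in enumerate(X):
        votes = [predict_tree(tree, row) for tree, in_bag in forest if i not in in_bag]
        if votes:
            counted += 1
            wrong += majority(votes) != y[i]
    return wrong / counted if counted else float("nan")

def make_data(n=120, seed=1):
    # Toy two-class data: features 0 and 1 carry signal, feature 2 is noise.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(2)
        X.append([rng.gauss(3.0 * label, 1.0), rng.gauss(3.0 * label, 1.0),
                  rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_data()
    forest = fit_forest(X, y, n_trees=25, n_try=1)
    print("out-of-bag error estimate:", oob_error(forest, X, y))
```

Rerunning `fit_forest` with different values of `n_try` and comparing the out-of-bag estimates gives a small-scale version of the experiment the abstract mentions, in which the number of features used in the splitting is varied.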
This article was published in:
Machine Learning, 45, 5–32, 2001
© 2001 Kluwer Academic Publishers. Manufactured in The Netherlands.









