[Frontier Topics] AdaBoost help request

Original poster

dabap posted on 2012-8-3 00:39:37

10 forum coins

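Taking tree building as an example: one purpose of methods such as boosting and bagging is to split a large dataset into smaller training sets, fit a tree on each, and then let the trees vote on the final result. For bagging, the training data are selected at random. My question is how boosting selects training samples from the original (large) dataset. I suspect it has something to do with a distribution, but I am not clear on the details. Any help would be appreciated, thanks ☺
Surely it cannot be training directly on the full dataset; that would seem to defeat the purpose.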

Best answer: ltx5151


Reply 1

ltx5151 posted on 2012-8-3 00:39:38

Hi,

First, please note that AdaBoost is not the same concept as boosting. Boosting is a more general idea in machine learning, and AdaBoost is one particular boosting algorithm.

I don't think AdaBoost uses only part of the training sample. It uses the whole sample to build each weak learner, reweighting the observations between rounds so that misclassified points count more, and then combines all the weak learners to get a better classifier.
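To make the point about using the whole sample concrete, here is a minimal sketch of AdaBoost in plain Python. It assumes 1-D inputs, labels in {-1, +1}, and decision stumps as weak learners. The function names (`fit_stump`, `adaboost`, `predict`) and the toy data are made up for illustration and do not come from any library. Note that every round sees all the points; only the weights change.

```python
import math

def fit_stump(xs, ys, weights):
    """Pick the stump (threshold, polarity, weighted_error) with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            # The stump predicts `polarity` when x >= threshold, else -polarity.
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (polarity if x >= threshold else -polarity) != y)
            if best is None or err < best[2]:
                best = (threshold, polarity, err)
    return best

def stump_predict(stump, x):
    threshold, polarity, _ = stump
    return polarity if x >= threshold else -polarity

def adaboost(xs, ys, n_rounds):
    """Train on ALL points every round; only the sample weights are updated."""
    n = len(xs)
    weights = [1.0 / n] * n                       # start from a uniform distribution
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(xs, ys, weights)
        err = min(max(stump[2], 1e-12), 1 - 1e-12)  # clamp to keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)     # vote strength of this stump
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified points, decrease the others.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]      # renormalise to a distribution
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): no single stump can separate these labels.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, n_rounds=3)
train_predictions = [predict(ensemble, x) for x in xs]
```

The "distribution" mentioned in the question corresponds to the `weights` list here: it is a distribution over the training points, not a rule for drawing a subset of them.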

I guess what you are really referring to is stochastic boosting rather than AdaBoost. Stochastic boosting has a random sub-sampling step to train each base learner. It takes advantage of the randomness to reduce overfitting and speed up training. For details, see Friedman 1999.
Bagging and boosting are ensemble learning methods that have become popular over the last 15 years. Typically, though, boosting can be more powerful than bagging. Bagging was originally designed to achieve variance reduction, and you can view it as a specific bootstrap method. That perspective may be closer to what you are really concerned about.
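As a rough illustration of the sub-sampling step, here is a small sketch in plain Python of boosting with squared loss where each round fits a regression stump to the residuals of a random subsample drawn without replacement. It is a simplified toy under stated assumptions (1-D inputs with distinct values, squared loss, stumps as base learners), not Friedman's full algorithm, and all names (`stochastic_boost`, `boosted_predict`, `bootstrap_indices`, the parameter defaults) are hypothetical. A bagging-style bootstrap draw is included only for contrast.

```python
import random

def fit_regression_stump(xs, residuals):
    """Least-squares stump on 1-D data: returns (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(xs))[1:]:   # skip the smallest x so the left side is never empty
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def stump_value(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean

def stochastic_boost(xs, ys, n_rounds=50, learning_rate=0.1, subsample=0.5, seed=0):
    """Each round trains on a random fraction of the data, then updates every prediction."""
    rng = random.Random(seed)
    n = len(xs)
    base = sum(ys) / n                      # initial model: the mean response
    preds = [base] * n
    stumps = []
    k = max(2, int(subsample * n))          # subsample size (at least two points)
    for _ in range(n_rounds):
        idx = rng.sample(range(n), k)       # random subsample WITHOUT replacement
        sub_x = [xs[i] for i in idx]
        sub_r = [ys[i] - preds[i] for i in idx]   # residuals = negative gradient of squared loss
        stump = fit_regression_stump(sub_x, sub_r)
        stumps.append(stump)
        preds = [p + learning_rate * stump_value(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps

def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, x) for s in stumps)

def bootstrap_indices(n, rng):
    """For contrast: bagging draws n indices WITH replacement for each tree."""
    return [rng.randrange(n) for _ in range(n)]
```

The design difference is in what each base learner sees: bagging fits independent trees on bootstrap samples and averages them, while the boosting loop fits each new stump to what the current ensemble still gets wrong, using only a fraction of the rows per round.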


Reply 2

dabap posted on 2012-8-3 13:58:09

Quoting ltx5151, posted on 2012-8-3 00:39:
Hi,

Firstly, please note that Adaboost is not the same concept as boosting. Boosting is a more ge ...

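Thanks, your reply was very helpful. I am currently working on remote-sensing image classification of forest disturbance, mainly using binary trees. The data volume is too large, though: multiplying rows by columns gives several million samples, so applying AdaBoost to all of them is not very practical, and I need a voting method along the lines of bagging. Because of bagging's accuracy limits, I would like to find a better method. Thanks for your suggestion. Is there an R package for stochastic boosting? Thanks again.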