Mahout currently has two implementations of Bayesian classifiers. One is the traditional Naive Bayes approach, and the other is called Complementary Naive Bayes.
Implementations
- Naive Bayes (MAHOUT-9)
- Complementary Naive Bayes (MAHOUT-60)
The Naive Bayes implementations in Mahout follow the paper http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf. Before we get to the actual algorithm, let's discuss the terminology.
Given an input set of classified documents:
- j = 0 to N features
- k = 0 to L labels
- The Normalized Frequency for a term (feature) in a document is calculated by dividing the term frequency by the root mean square of the term frequencies in that document.
- The Weight-Normalized Tf for a given feature in a given label is the sum of the Normalized Frequency of the feature across all the documents in the label.
- The Weight-Normalized Tf-Idf for a given feature in a label is the Tf-Idf calculated using the standard Idf multiplied by the Weight-Normalized Tf.
We calculate the sum of W-N-Tf-Idf for all the features in a label; this is called Sigma_k, or sumLabelWeight.
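The normalization steps above can be sketched as follows. This is an illustrative Python sketch, not Mahout's actual (Java) implementation; the function names `normalized_frequencies` and `weight_normalized_tf` are hypothetical.

```python
import math
from collections import defaultdict

def normalized_frequencies(term_counts):
    """Divide each term frequency by the root mean square of the
    term frequencies in the document (hypothetical helper)."""
    rms = math.sqrt(sum(c * c for c in term_counts.values()) / len(term_counts))
    return {term: c / rms for term, c in term_counts.items()}

def weight_normalized_tf(docs_in_label):
    """Weight-Normalized Tf for a label: sum the normalized frequency
    of each feature across every document carrying that label."""
    tf = defaultdict(float)
    for doc in docs_in_label:
        for term, nf in normalized_frequencies(doc).items():
            tf[term] += nf
    return dict(tf)
```

Multiplying each entry of `weight_normalized_tf` by the standard Idf of the feature would then give the Weight-Normalized Tf-Idf, and summing those products over a label's features gives Sigma_k.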
For Bayes
Weight = Log [ ( W-N-Tf-Idf + alpha_i ) / ( Sigma_k + N ) ]
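Plugging the quantities defined above into this formula is straightforward; a minimal sketch (the function name and argument names are hypothetical, `alpha_i` is the smoothing parameter from the formula):

```python
import math

def bayes_weight(wntfidf, sigma_k, n, alpha_i=1.0):
    """Bayes weight for one (feature, label) pair:
    log( (W-N-Tf-Idf + alpha_i) / (Sigma_k + N) ),
    where N is the number of features."""
    return math.log((wntfidf + alpha_i) / (sigma_k + n))
```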
For CBayes
We calculate the sum of W-N-Tf-Idf across all labels for a given feature. We call this sumFeatureWeight, or Sigma_j.
We also sum the W-N-Tf-Idf weights over every (feature, label) pair in the training set. Call this Sigma_jSigma_k.
The final weight is calculated as
Weight = Log [ ( Sigma_j - W-N-Tf-Idf + alpha_i ) / ( Sigma_jSigma_k - Sigma_k + N ) ]
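The CBayes formula uses the complement statistics: subtracting the pair's own weight from Sigma_j, and the label's Sigma_k from Sigma_jSigma_k, leaves the feature's weight in all the *other* labels. A sketch in the same hypothetical style as above:

```python
import math

def cbayes_weight(wntfidf, sigma_j, sigma_k, sigma_jk, n, alpha_i=1.0):
    """CBayes weight for one (feature, label) pair:
    log( (Sigma_j - W-N-Tf-Idf + alpha_i) / (Sigma_jSigma_k - Sigma_k + N) ),
    i.e. the feature's smoothed weight in the complement of the label."""
    return math.log((sigma_j - wntfidf + alpha_i) / (sigma_jk - sigma_k + n))
```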
Examples
In Mahout's example code, there are two samples that can be used:
- Wikipedia Bayes Example - Classify Wikipedia data.
- Twenty Newsgroups - Classify the classic Twenty Newsgroups data.