# Naive Bayes: A Baseline Model for Machine Learning Classification Performance

Posted by oliyiyi on 2019-5-8 19:42:49

By Asel Mendis, KDnuggets.

## Bayes' Theorem

$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$

The equation above is Bayes' theorem. It gives the probability of an event A given that a related event B has occurred, P(A|B). It does this by combining our prior knowledge about A, P(A), with how likely B is when A occurs, P(B|A).

Let's explore the parts of Bayes' theorem:

• P(A|B) - Posterior probability: the conditional probability that event A occurs given that event B has occurred.
• P(A) - Prior probability: the probability of event A.
• P(B) - Evidence: the probability of event B.
• P(B|A) - Likelihood: the conditional probability of B occurring given that event A has occurred.

Now, let's explore the parts of Bayes' theorem through the eyes of someone doing machine learning:

• P(A|B) - Posterior probability: the conditional probability of the response variable (target variable) given the training data inputs.
• P(A) - Prior probability: the probability of the response variable (target variable).
• P(B) - Evidence: the probability of the training data.
• P(B|A) - Likelihood: the conditional probability of the training data given the response variable.

$P(c|x) = \frac{P(x|c)\,P(c)}{P(x)}$

• P(c|x) - Posterior probability of the target/class (c) given predictors (x).
• P(c) - Prior probability of the class (target).
• P(x|c) - Probability of the predictor (x) given the class/target (c).
• P(x) - Prior probability of the predictor (x).
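The evidence term is the easiest one to lose track of. By the law of total probability, it is the numerator summed over every class. This is why it can be dropped when we only want to compare classes, and why dividing by it makes the posteriors sum to one.

As a check, here are the counts from the tennis example that follows: rainy on 3 of the 9 "yes" days and on 2 of the 5 "no" days. They reproduce the 5/14 used there.

$$P(x) = \sum_{c} P(x|c)\,P(c) \qquad\Rightarrow\qquad P(c|x) = \frac{P(x|c)\,P(c)}{\sum_{c'} P(x|c')\,P(c')}$$

$$P(outlook=rainy) = \tfrac{3}{9}\cdot\tfrac{9}{14} + \tfrac{2}{5}\cdot\tfrac{5}{14} = \tfrac{3}{14} + \tfrac{2}{14} = \tfrac{5}{14}$$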

Example of using Bayes' theorem:
I'll be using the tennis weather dataset.
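The post reads the data from tennis.csv but does not include the file. This minimal standard-library sketch recreates it so that the `pd.read_csv('tennis.csv')` call can run as written. It assumes the CSV holds exactly the 14 rows of the DataFrame printed below, under the header outlook,temp,humidity,windy,play. The file name and column order are assumptions taken from that printout.

```python
import csv

# Rows transcribed from the DataFrame printed in the post (index 0-13):
# (outlook, temp, humidity, windy, play)
ROWS = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]

# Write the header plus the rows; csv writes the booleans as "True"/"False"
with open("tennis.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["outlook", "temp", "humidity", "windy", "play"])
    writer.writerows(ROWS)
```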

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

tennis = pd.read_csv('tennis.csv')
tennis
```

```
     outlook  temp humidity  windy play
0      sunny   hot     high  False   no
1      sunny   hot     high   True   no
2   overcast   hot     high  False  yes
3      rainy  mild     high  False  yes
4      rainy  cool   normal  False  yes
5      rainy  cool   normal   True   no
6   overcast  cool   normal   True  yes
7      sunny  mild     high  False   no
8      sunny  cool   normal  False  yes
9      rainy  mild   normal  False  yes
10     sunny  mild   normal   True  yes
11  overcast  mild     high   True  yes
12  overcast   hot   normal  False  yes
13     rainy  mild     high   True   no
```

Let's look at each category as a frequency table:

```python
outlook = tennis.groupby(['outlook', 'play']).size()
temp = tennis.groupby(['temp', 'play']).size()
humidity = tennis.groupby(['humidity', 'play']).size()
windy = tennis.groupby(['windy', 'play']).size()
play = tennis.play.value_counts()

print(temp)
print('------------------')
print(humidity)
print('------------------')
print(windy)
print('------------------')
print(outlook)
print('------------------')
print('play')
print(play)
```

```
temp  play
cool  no      1
      yes     3
hot   no      2
      yes     2
mild  no      2
      yes     4
dtype: int64
------------------
humidity  play
high      no      4
          yes     3
normal    no      1
          yes     6
dtype: int64
------------------
windy  play
False  no      2
       yes     6
True   no      3
       yes     3
dtype: int64
------------------
outlook   play
overcast  yes     4
rainy     no      2
          yes     3
sunny     no      3
          yes     2
dtype: int64
------------------
play
yes    9
no     5
Name: play, dtype: int64
```

What is the probability of playing tennis given that it is rainy?

• P(outlook=rainy|play=yes) = frequency of (outlook=rainy) when (play=yes) / frequency of (play=yes) = 3/9
• P(play=yes) = frequency of (play=yes) / total(play) = 9/14
• P(outlook=rainy) = frequency of (outlook=rainy) / total(outlook) = 5/14

$P(play=yes|outlook=rainy) = \frac{P(outlook=rainy|play=yes)\,P(play=yes)}{P(outlook=rainy)}$

```python
(3/9)*(9/14)/(5/14)
```

```
0.6
```

The probability of playing tennis when it is rainy is 60%. The process is very simple once you have the frequencies for each category.

Here is a simple function to help newcomers remember the parts of Bayes' equation:

```python
def bayestheorem():
    print('Posterior [P(c|x)] - Posterior probability of the target/class (c) given predictors (x)')
    print('Prior [P(c)] - Prior probability of the class (target)')
    print('Likelihood [P(x|c)] - Probability of the predictor (x) given the class/target (c)')
    print('Evidence [P(x)] - Prior probability of the predictor (x)')
```

Here is a simple function that calculates the posterior probability for you. You must still find each part of Bayes' equation yourself.

```python
def bayesposterior(prior, likelihood, evidence, string):
    print('Prior=', prior)
    print('Likelihood=', likelihood)
    print('Evidence=', evidence)
    print('Equation =', '(Prior*Likelihood)/Evidence')
    print(string, (prior*likelihood)/evidence)
```

Let's find the posterior probability another way, this time using contingency tables in Python:

```python
ct = pd.crosstab(tennis['outlook'], tennis['play'], margins=True)
# Rename the "All" margins so the totals can be addressed by label
ct.columns = ["no", "yes", "rowtotal"]
ct.index = ["overcast", "rainy", "sunny", "coltotal"]
print(ct)
```

```
          no  yes  rowtotal
overcast   0    4         4
rainy      2    3         5
sunny      3    2         5
coltotal   5    9        14
```

Dividing by the grand total gives the joint probabilities:

```python
ct / ct.loc["coltotal", "rowtotal"]
```

```
                no       yes  rowtotal
overcast  0.000000  0.285714  0.285714
rainy     0.142857  0.214286  0.357143
sunny     0.214286  0.142857  0.357143
coltotal  0.357143  0.642857  1.000000
```

To normalize by the column totals only, which gives P(outlook|play):

```python
ct / ct.loc["coltotal"]
```

```
           no       yes  rowtotal
overcast  0.0  0.444444  0.285714
rainy     0.4  0.333333  0.357143
sunny     0.6  0.222222  0.357143
coltotal  1.0  1.000000  1.000000
```

To normalize by the row totals only, which gives P(play|outlook):

```python
ct.div(ct["rowtotal"], axis=0)
```

```
                no       yes  rowtotal
overcast  0.000000  1.000000       1.0
rainy     0.400000  0.600000       1.0
sunny     0.600000  0.400000       1.0
coltotal  0.357143  0.642857       1.0
```

These tables are all pandas DataFrame objects.
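Every term in the single-feature calculation is a count divided by a count. So Bayes' theorem here reduces to the share of rainy days on which tennis was played. This minimal standard-library sketch uses exact fractions, with no pandas, to show that both routes give 3/5. The two lists are the outlook and play columns of the 14 rows above.

```python
from fractions import Fraction

# outlook and play columns of the tennis data, in row order 0-13
outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
           "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

n = len(play)
n_yes = play.count("yes")
n_rainy = outlook.count("rainy")
n_rainy_yes = sum(1 for o, p in zip(outlook, play) if o == "rainy" and p == "yes")

likelihood = Fraction(n_rainy_yes, n_yes)  # P(outlook=rainy | play=yes) = 3/9
prior = Fraction(n_yes, n)                 # P(play=yes) = 9/14
evidence = Fraction(n_rainy, n)            # P(outlook=rainy) = 5/14

posterior = likelihood * prior / evidence  # Bayes' theorem
direct = Fraction(n_rainy_yes, n_rainy)    # share of rainy days labelled "yes"
print(posterior, direct)                   # prints: 3/5 3/5
```

The row-normalized table above shows the same 0.6 in its rainy/yes cell.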
Using pandas subsetting together with the bayesposterior function I made, we arrive at the same conclusion:

```python
# Select each term by label from the renamed contingency table
bayesposterior(prior=ct.loc["coltotal", "yes"] / ct.loc["coltotal", "rowtotal"],       # P(play=yes) = 9/14
               likelihood=ct.loc["rainy", "yes"] / ct.loc["coltotal", "yes"],          # P(outlook=rainy|play=yes) = 3/9
               evidence=ct.loc["rainy", "rowtotal"] / ct.loc["coltotal", "rowtotal"],  # P(outlook=rainy) = 5/14
               string='Probability of Tennis given Rain =')
```

```
Prior= 0.6428571428571429
Likelihood= 0.3333333333333333
Evidence= 0.35714285714285715
Equation = (Prior*Likelihood)/Evidence
Probability of Tennis given Rain = 0.6
```

## Naive Bayes Algorithm

Naive Bayes is a supervised machine learning algorithm based on Bayes' theorem. It works on the principles of conditional probability. It also makes the "naive" assumption that the predictors are independent of one another given the class. Naive Bayes can be used for both binary and multi-class classification. To make a prediction, it uses the probability of each attribute value within each class.

### Example

What is the probability of playing tennis when it is sunny, hot, highly humid and windy? Using the tennis dataset, we will apply the Naive Bayes method to predict the probability of someone playing tennis under these weather conditions.

```python
pd.crosstab(tennis['outlook'], tennis['play'], margins=True)
```

```
play      no  yes  All
outlook
overcast   0    4    4
rainy      2    3    5
sunny      3    2    5
All        5    9   14
```

```python
pd.crosstab(tennis['temp'], tennis['play'], margins=True)
```

```
play  no  yes  All
temp
cool   1    3    4
hot    2    2    4
mild   2    4    6
All    5    9   14
```

```python
pd.crosstab(tennis['humidity'], tennis['play'], margins=True)
```

```
play      no  yes  All
humidity
high       4    3    7
normal     1    6    7
All        5    9   14
```

```python
pd.crosstab(tennis['windy'], tennis['play'], margins=True)
```

```
play   no  yes  All
windy
False   2    6    8
True    3    3    6
All     5    9   14
```

```python
pd.crosstab(index=tennis['play'], columns="count", margins=True)
```

```
col_0  count  All
play
no         5    5
yes        9    9
All       14   14
```

Now, using the contingency tables above, we will go through how the Naive Bayes algorithm calculates the posterior probability.

Calculate P(x|play=yes). Here x refers to all the predictors: 'outlook', 'temp', 'humidity' and 'windy'.
• P(sunny|play=yes) = 2/9
• P(hot|play=yes) = 2/9
• P(high|play=yes) = 3/9
• P(True|play=yes) = 3/9

$P(x|play=yes) = P(sunny|play=yes)\,P(hot|play=yes)\,P(high|play=yes)\,P(True|play=yes)$

```python
p_x_yes = (2/9)*(2/9)*(3/9)*(3/9)
print('The probability of the predictors given playing tennis is', '%.3f' % p_x_yes)
```

```
The probability of the predictors given playing tennis is 0.005
```

Calculate P(x|play=no) using the same method:

• P(sunny|play=no) = 3/5
• P(hot|play=no) = 2/5
• P(high|play=no) = 4/5
• P(True|play=no) = 3/5

$P(x|play=no) = P(sunny|play=no)\,P(hot|play=no)\,P(high|play=no)\,P(True|play=no)$

```python
p_x_no = (3/5)*(2/5)*(4/5)*(3/5)
print('The probability of the predictors given not playing tennis is', '%.3f' % p_x_no)
```

```
The probability of the predictors given not playing tennis is 0.115
```

Calculate P(play=yes) and P(play=no):

• P(play=yes) = 9/14
• P(play=no) = 5/14

```python
yes = 9/14
no = 5/14
print('The probability of playing tennis is', '%.3f' % yes)
print('The probability of not playing tennis is', '%.3f' % no)
```

```
The probability of playing tennis is 0.643
The probability of not playing tennis is 0.357
```
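The fractions above were read off the contingency tables by hand. This minimal standard-library sketch counts them directly from the 14 rows instead. It uses exact fractions so the results can be compared with 2/9, 3/5 and so on without rounding. `class_rows`, `likelihood` and `prior` are hypothetical helper names, not part of the original post.

```python
from fractions import Fraction

# The 14 rows of the tennis data: (outlook, temp, humidity, windy, play)
ROWS = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]

def class_rows(cls):
    """Feature tuples (everything except the label) of the rows labelled cls."""
    return [row[:-1] for row in ROWS if row[-1] == cls]

def likelihood(x, cls):
    """P(x | cls) under the naive assumption: product of per-feature relative frequencies."""
    rows = class_rows(cls)
    p = Fraction(1)
    for i, value in enumerate(x):
        matches = sum(1 for r in rows if r[i] == value)
        p *= Fraction(matches, len(rows))
    return p

def prior(cls):
    """P(cls): share of all rows carrying this label."""
    return Fraction(len(class_rows(cls)), len(ROWS))

x = ("sunny", "hot", "high", True)   # the query from the example
p_x_yes = likelihood(x, "yes")
p_x_no = likelihood(x, "no")
print(p_x_yes, p_x_no)               # 4/729 72/625
print(prior("yes"), prior("no"))     # 9/14 5/14
```

One caveat: for a query with outlook=overcast, `likelihood(..., "no")` would be exactly 0, because overcast never occurs with play=no. This is why practical implementations usually add a smoothing constant. The discrete naive Bayes classes in scikit-learn, such as MultinomialNB and BernoulliNB, expose it as `alpha`.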

Calculate the probability of playing and not playing tennis given the predictors. The evidence P(x) is the same for both classes, so it can be left out when comparing them. The quantities computed below are therefore proportional to the posteriors rather than equal to them:

$P(play=yes|x) \propto P(x|play=yes)\,P(play=yes)$

$P(play=no|x) \propto P(x|play=no)\,P(play=no)$

```python
yes_x = p_x_yes*yes
print('The probability of playing tennis given the predictors is', '%.3f' % yes_x)
no_x = p_x_no*no
print('The probability of not playing tennis given the predictors is', '%.3f' % no_x)
```

```
The probability of playing tennis given the predictors is 0.004
The probability of not playing tennis given the predictors is 0.041
```

These two numbers do not sum to 1. Dividing each by their sum, which equals P(x), gives the actual posteriors: about 0.079 for playing and 0.921 for not playing.

The prediction is whichever probability is higher:

```python
if yes_x > no_x:
    print('The probability of playing tennis is higher when the outlook is sunny, '
          'the temperature is hot, there is high humidity and it is windy.')
else:
    print('The probability of not playing tennis is higher when the outlook is sunny, '
          'the temperature is hot, there is high humidity and it is windy.')
```

```
The probability of not playing tennis is higher when the outlook is sunny, the temperature is hot, there is high humidity and it is windy.
```

## Types of Naive Bayes Algorithm

Python's scikit-learn provides several Naive Bayes models, including the following three:

• Gaussian - Gaussian NB assumes that the features (predictors) are continuous and that, within each class, each feature follows a Gaussian (normal) distribution.
• Multinomial - Multinomial NB is suited to discrete data such as frequencies and counts. Spam filtering and text/document classification are two well-known use cases.
• Bernoulli - Bernoulli NB is similar to Multinomial NB, except that it is designed for boolean/binary features. Like the multinomial method, it can be used for spam filtering and document classification where the features are binary terms (e.g., whether a word occurs in a document, represented as True or False).

Let's implement a Multinomial and a Gaussian model with scikit-learn:

```python
from sklearn.naive_bayes import GaussianNB, BernoulliNB, MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import *
```
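Gaussian NB keeps the structure of the tennis calculation: a class prior times a product of per-feature terms, normalized by their sum. It replaces each frequency count with a normal density whose mean and variance are estimated per class and per feature.

This is a minimal from-scratch sketch of that idea, not scikit-learn's code. `fit_gaussian_nb`, `gaussian_pdf` and `predict_proba` are hypothetical names, and the data are made-up toy numbers, not from the post.

```python
import math
from statistics import mean, pvariance

def fit_gaussian_nb(X, y):
    """Per class: prior, plus per-feature mean and population variance."""
    model = {}
    for cls in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == cls]
        columns = list(zip(*rows))  # one tuple of values per feature
        model[cls] = (
            len(rows) / len(y),
            [mean(col) for col in columns],
            [pvariance(col) for col in columns],
        )
    return model

def gaussian_pdf(v, mu, var):
    """Normal density N(v; mu, var)."""
    return math.exp(-((v - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict_proba(model, x):
    """Posteriors P(c | x): prior times the product of per-feature densities, normalized."""
    scores = {}
    for cls, (prior, mus, variances) in model.items():
        score = prior
        for v, mu, var in zip(x, mus, variances):
            score *= gaussian_pdf(v, mu, var)
        scores[cls] = score
    total = sum(scores.values())  # plays the role of the evidence P(x)
    return {cls: s / total for cls, s in scores.items()}

# Made-up toy data: [temperature in degrees C, humidity in %] with a play label
X = [[30.0, 85.0], [27.0, 90.0], [29.0, 88.0], [21.0, 65.0], [20.0, 70.0], [23.0, 60.0]]
y = ["no", "no", "no", "yes", "yes", "yes"]
model = fit_gaussian_nb(X, y)
proba = predict_proba(model, [22.0, 68.0])
print(max(proba, key=proba.get))  # prints: yes
```

Real implementations typically work with log-probabilities to avoid underflow when there are many features. scikit-learn's GaussianNB also adds a small `var_smoothing` term to the variances, so results from this sketch will not match it exactly.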