
Naive Bayes Classifier
- Supervised learning, classification model.
- Probabilistic classification model.
- Uses Bayes' theorem to calculate probabilities.
- Makes classifications using the posterior decision rule: the class with the highest posterior probability wins.
Bayes Rule
P(A|B) = (P(B|A) * P(A)) / P(B)
Where:
- P(A|B) is the posterior probability: the probability of event A (e.g., Hire/Not Hire) given evidence B (the features such as age, gender, etc.).
- P(B|A) is the likelihood: the probability of evidence B given that event A is true.
- P(A) is the prior probability: the initial probability of event A occurring.
- P(B) is the evidence: the total probability of the evidence (features), regardless of the class label.
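As a quick worked example of the rule (every number below is invented purely for illustration):

```python
# Hypothetical numbers: suppose 40% of past candidates were hired,
# 70% of hired candidates had Skill = ML, and 50% of all candidates did.
p_a = 0.4          # prior P(A): P(Hire)
p_b_given_a = 0.7  # likelihood P(B|A): P(Skill=ML | Hire)
p_b = 0.5          # evidence P(B): P(Skill=ML)

# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 2))  # 0.56
```

Knowing the candidate has Skill = ML raises the hire probability from the prior 0.4 to a posterior of 0.56.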
What are prior and posterior probabilities?
Prior Probability:
It represents what is believed before seeing new evidence; it takes into account only what is already known (e.g., a fair coin, die, or deck of cards).
Posterior Probability:
The probability of the event that has to be calculated from the data, after the evidence has been taken into account.
How does Bayes classification work?
Training Data:
Assume a new data point is taken for testing/prediction: Age = 26, Gender = Male, Occupation = Self, Skill = ML.
We need to predict whether or not to hire this person.
The prediction is made with Bayes' rule using the available training data.
Y = Hire / Not Hire
Y = 0 (Not Hire)
Probability of Age = 26, given that Y = 0 (Not Hire) *
Probability of Gender = Male, given that Y = 0 (Not Hire) *
Probability of Occupation = Self, given that Y = 0 (Not Hire) *
Probability of Skill = ML, given that Y = 0 (Not Hire)
Formula:
P(Y = 0 | X) ∝ P(Age = 26 | Y = 0) * P(Gender = Male | Y = 0) * P(Occupation = Self | Y = 0) * P(Skill = ML | Y = 0) * P(Y = 0)
Note the direction of the conditioning: Naive Bayes multiplies the per-feature likelihoods P(feature | class) together with the class prior P(Y = 0).
Similarly, it calculates the other posterior as well, where
Y = 1 (Hire)
Probability of Age = 26, given that Y = 1 (Hire) *
Probability of Gender = Male, given that Y = 1 (Hire) *
Probability of Occupation = Self, given that Y = 1 (Hire) *
Probability of Skill = ML, given that Y = 1 (Hire)
Formula:
P(Y = 1 | X) ∝ P(Age = 26 | Y = 1) * P(Gender = Male | Y = 1) * P(Occupation = Self | Y = 1) * P(Skill = ML | Y = 1) * P(Y = 1)
Decision:
If P(Y = 0 | X) is, say, 0.75 and P(Y = 1 | X) is 0.25, the new data point is classified as '0', since 0.75 > 0.25.
If P(Y = 1 | X) is, say, 0.75 and P(Y = 0 | X) is 0.25, the new data point is classified as '1', since 0.75 > 0.25.
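The two-class computation for the hire example can be sketched in Python. Every probability below is a made-up placeholder; a real model would estimate each one from counts in the training data.

```python
# Invented class priors and per-feature likelihoods for the new point
# (Age=26, Gender=Male, Occupation=Self, Skill=ML).
priors = {0: 0.5, 1: 0.5}
likelihoods = {
    0: {"Age=26": 0.2, "Gender=Male": 0.6, "Occupation=Self": 0.5, "Skill=ML": 0.3},
    1: {"Age=26": 0.4, "Gender=Male": 0.5, "Occupation=Self": 0.4, "Skill=ML": 0.7},
}

scores = {}
for y in (0, 1):
    score = priors[y]
    # Multiply the per-feature likelihoods under the independence assumption
    for feature_prob in likelihoods[y].values():
        score *= feature_prob
    scores[y] = score

# Posterior decision rule: pick the class with the larger score
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

The scores are unnormalized posteriors (the common evidence term P(X) is dropped), which is fine because only their comparison matters for the decision.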
Why is it called "Naive Bayes"?
It is "naive" because it assumes the features (Age, Gender, Occupation, Skill) are conditionally independent given the class. That assumption rarely holds exactly, but it is what allows the likelihood to be written as a simple product of per-feature probabilities.
Case Study
- We will be using the SMS Spam dataset
- First, we import the required libraries.
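The exact import list is not preserved in these notes; a typical set for this workflow (assuming the usual pandas/NLTK stack seen in the outputs below) would be:

```python
import string       # punctuation list used during text cleaning

import numpy as np  # numeric helpers
import pandas as pd # dataframes for the SMS data
import nltk         # stop-word corpus (downloaded in a later step)
```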
- Load the data
- Check the shape
(5574, 2)
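A load-and-shape sketch follows. The real file name, separator, and column names are assumptions (the SMS Spam Collection is typically a two-column tab-separated file); a tiny hand-built frame with the same two columns stands in here so the snippet runs without the file.

```python
import pandas as pd

# The notebook presumably does something like:
#   data = pd.read_csv("SMSSpamCollection", sep="\t", names=["type", "text"])
# Stand-in frame with the same columns (label "type", message "text"):
data = pd.DataFrame({
    "type": ["ham", "spam", "ham"],
    "text": ["Go until jurong point, crazy..", "WINNER!! Claim your prize", "Ok"],
})
print(data.shape)  # (3, 2) here; the full dataset gives (5574, 2)
```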
Check the first 5 rows of the dataset
Check the text of the first message
Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...
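The two inspection steps above can be sketched as below (using the same small stand-in frame, since the original load cell is not preserved):

```python
import pandas as pd

data = pd.DataFrame({
    "type": ["ham", "spam", "ham"],
    "text": ["Go until jurong point, crazy..", "WINNER!! Claim your prize", "Ok"],
})

print(data.head())      # first 5 rows (all 3 here)
print(data["text"][0])  # text of the first message
```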
Describe the data
Group the data based on the type
Add a text_length column to the dataset
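The describe/group/length steps above might look like this (again on the small stand-in frame; the column name `text_length` matches the code shown further down):

```python
import pandas as pd

data = pd.DataFrame({
    "type": ["ham", "spam", "ham"],
    "text": ["Go until jurong point, crazy..", "WINNER!! Claim your prize", "Ok"],
})

print(data.describe())                  # count/unique/top/freq per column
print(data.groupby("type").describe())  # the same summary, split by message type

# New column holding the character length of each message
data["text_length"] = data["text"].apply(len)
```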

Check for outliers

# stats description of the length columns
data['text_length'].describe()
Shortest Message: Ok
Longest Message: For me the love should start with attraction.i should feel that I need her every time around me.she should be the first thing which comes in my thoughts ...
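The shortest/longest lookup can be reconstructed with `idxmin`/`idxmax` on the length column (stand-in data again; in the full dataset the shortest message is also "Ok"):

```python
import pandas as pd

data = pd.DataFrame({
    "text": ["Go until jurong point, crazy..", "WINNER!! Claim your prize", "Ok"],
})
data["text_length"] = data["text"].apply(len)

print(data["text_length"].describe())  # count, mean, std, min, quartiles, max

# Locate the shortest and longest messages by character count
shortest = data.loc[data["text_length"].idxmin(), "text"]
longest = data.loc[data["text_length"].idxmax(), "text"]
print(shortest)
print(longest)
```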
Text Preprocessing
Remove Punctuations
This is a sample message to remove punctuations
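A common way to strip punctuation, matching the sample output above, is to filter characters against `string.punctuation`:

```python
import string

message = "This is a sample message! To remove, punctuations..."

# Keep only characters that are not in string.punctuation
no_punct = "".join(ch for ch in message if ch not in string.punctuation)
print(no_punct)  # This is a sample message To remove punctuations
```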
Remove Stop Words
[nltk_data] Downloading package stopwords to /root/nltk_data... [nltk_data] Package stopwords is already up-to-date!
List all the stop words in English
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", ...
Remove punctuations and stop words
- Apply the text-processing function to the text column
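Putting the two cleaning steps together, the text-processing function and its application could be sketched as follows. A small hand-picked stop-word set stands in for `stopwords.words("english")` so the snippet runs without NLTK.

```python
import string

import pandas as pd

# Stand-in for NLTK's English stop-word list
stop_words = {"this", "is", "a", "to", "the", "in"}

def text_process(message):
    """Remove punctuation, then drop stop words; return the remaining tokens."""
    no_punct = "".join(ch for ch in message if ch not in string.punctuation)
    return [word for word in no_punct.split() if word.lower() not in stop_words]

data = pd.DataFrame({"text": ["This is a sample message, to remove punctuations!"]})
data["clean_text"] = data["text"].apply(text_process)
print(data["clean_text"][0])  # ['sample', 'message', 'remove', 'punctuations']
```

The cleaned token lists are what would then be fed to a vectorizer and a Naive Bayes model in the remainder of the case study.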