Path: blob/master/Data Science Essentials for Data Analysts/Naive_Bayes_Crop_Recommendation .ipynb
Naive Bayes Crop Recommendation
What is Naive Bayes?
Naive Bayes is a probabilistic machine learning algorithm based on Bayes' Theorem. It predicts the class with the highest posterior probability given the input features. The main idea behind the Naive Bayes classifier is to use Bayes' Theorem to classify data based on the probabilities of the different classes given the features of the data. It is used mostly in high-dimensional problems such as text classification.
Posterior probability is the probability of a class after observing the data, calculated using Bayes’ theorem by combining prior probability and likelihood.
Bayes Theorem
Bayes' Theorem provides a principled way to reverse conditional probabilities. For a class $C$ and feature vector $X$, it is defined as:

$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}$$

where $P(C)$ is the prior probability of the class, $P(X \mid C)$ is the likelihood of the features given the class, and $P(X)$ is the evidence.
Assumption of Naive Bayes
Feature independence: This means that when we are trying to classify something, we assume that each feature (or piece of information) in the data does not affect any other feature.
Continuous features are normally distributed: If a feature is continuous, then it is assumed to be normally distributed within each class.
Discrete features have multinomial distributions: If a feature is discrete, then it is assumed to have a multinomial distribution within each class.
Features are equally important: All features are assumed to contribute equally to the prediction of the class label.
No missing data: The data should not contain any missing values.
Gaussian Naive Bayes Formula
In Gaussian Naive Bayes, continuous values associated with each feature are assumed to be distributed according to a Gaussian distribution. A Gaussian distribution is also called a Normal distribution. When plotted, it gives a bell-shaped curve which is symmetric about the mean of the feature values. The class-conditional likelihood of a feature value $x_i$ is:

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$

where $\mu_y$ and $\sigma_y^2$ are the mean and variance of the feature within class $y$.
Multinomial Naive Bayes
Multinomial Naive Bayes is used when features represent the frequency of terms (such as word counts) in a document. It is commonly applied in text classification, where term frequencies are important.
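As a quick illustration, here is a minimal sketch of Multinomial Naive Bayes on a toy word-count matrix using scikit-learn. The documents, terms, and labels below are invented for illustration and are not from the crop dataset:

```python
from sklearn.naive_bayes import MultinomialNB
import numpy as np

# Toy term-frequency matrix: each row is a document, each column a word count.
# (Illustrative data only.)
X = np.array([
    [3, 0, 1],   # "spam"-like documents: dominated by term 0
    [4, 1, 0],
    [0, 3, 2],   # "ham"-like documents: dominated by term 1
    [1, 4, 1],
])
y = np.array(["spam", "spam", "ham", "ham"])

clf = MultinomialNB()       # multinomial likelihood over term counts
clf.fit(X, y)
print(clf.predict([[5, 0, 1]]))  # a new document with counts skewed to term 0
```

Because the new document's counts concentrate on term 0, the multinomial likelihood favours the first class.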
Hand‑Calculated Naive Bayes Example (Step‑by‑Step)
Problem : Predict crop for:
Temperature = 24°C
Humidity = 80%
Classes: Rice, Wheat
Step 1: Prior Probabilities
| Crop | Samples | Prior P(C) |
|---|---|---|
| Rice | 60 | 0.60 |
| Wheat | 40 | 0.40 |
Step 2: Feature Statistics (from training data)
Temperature
| Crop | Mean (μ) | Variance (σ²) |
|---|---|---|
| Rice | 22 | 4 |
| Wheat | 26 | 9 |
Humidity
| Crop | Mean (μ) | Variance (σ²) |
|---|---|---|
| Rice | 82 | 4 |
| Wheat | 70 | 16 |
Step 3: Likelihood Calculation
Temperature Likelihood
[ P(24|Rice) = 0.121 ]
[ P(24|Wheat) = 0.106 ]
Humidity Likelihood
[ P(80|Rice) = 0.121 ]
[ P(80|Wheat) = 0.0044 ]
Step 4: Posterior Probability (unnormalized; the common denominator P(X) is omitted)
Rice
[ P(Rice|X) = 0.60 \times 0.121 \times 0.121 = 0.00878 ]
Wheat
[ P(Wheat|X) = 0.40 \times 0.106 \times 0.0044 = 0.00019 ]
Final Decision
| Crop | Posterior Probability |
|---|---|
| Rice | 0.00878 |
| Wheat | 0.00019 |
Predicted Crop = Rice
This is exactly what Gaussian Naive Bayes computes internally.
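The hand calculation above can be checked numerically. This sketch plugs the priors, means, and variances from the tables in Steps 1–2 into the Gaussian likelihood and multiplies them out as in Step 4:

```python
import math

def gaussian_pdf(x, mean, var):
    """Gaussian likelihood P(x | class) with class-conditional mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

priors = {"Rice": 0.60, "Wheat": 0.40}
stats = {  # (mean, variance) per feature, from the training-data tables above
    "Rice":  {"temp": (22, 4), "hum": (82, 4)},
    "Wheat": {"temp": (26, 9), "hum": (70, 16)},
}

x_temp, x_hum = 24, 80   # the query point
scores = {}
for crop in priors:
    t_mean, t_var = stats[crop]["temp"]
    h_mean, h_var = stats[crop]["hum"]
    # Unnormalized posterior: prior x likelihood(temp) x likelihood(humidity)
    scores[crop] = (priors[crop]
                    * gaussian_pdf(x_temp, t_mean, t_var)
                    * gaussian_pdf(x_hum, h_mean, h_var))

print(scores)
print("Predicted:", max(scores, key=scores.get))
```

Running this reproduces the Rice score of about 0.0088 and confirms Rice as the predicted crop.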
Let us now model the complete dataset using Python libraries.
Load Dataset
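A minimal loading sketch follows. The CSV file name and the column set are assumptions about the crop dataset; to keep the sketch self-contained and runnable, a tiny in-memory stand-in with the same assumed columns is used in place of the real file:

```python
import io
import pandas as pd

# In the actual notebook this would typically be:
#   df = pd.read_csv("Crop_recommendation.csv")   # file name is an assumption
# Tiny in-memory stand-in with assumed columns, so the sketch runs anywhere:
csv_data = """temperature,humidity,ph,rainfall,label
22.1,82.3,6.5,200.1,rice
26.4,70.2,7.0,80.5,wheat
23.9,81.7,6.4,210.3,rice
"""
df = pd.read_csv(io.StringIO(csv_data))

print(df.shape)   # (rows, columns)
print(df.head())  # first few rows for a quick sanity check
```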
Exploratory Data Analysis
The workflow:
1. Import data
2. Prep it (nulls, shape, delete, add)
3. Visualize
4. Model training
5. Model prediction
6. Model evaluation
7. Model correction
8. Test it on new data
9. Finally, deploy it for pilot
We have 22 unique crops in our data, with each crop equally represented (a balanced dataset).
Correlation Heatmap
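A correlation heatmap starts from the pairwise Pearson correlations between numeric features. This sketch computes them with pandas on a small stand-in DataFrame (the real notebook would use the loaded crop data; column names are assumptions):

```python
import io
import pandas as pd

# Stand-in numeric data with assumed crop-dataset columns.
csv_data = """temperature,humidity,ph,rainfall
22.1,82.3,6.5,200.1
26.4,70.2,7.0,80.5
23.9,81.7,6.4,210.3
28.0,65.5,7.2,60.0
"""
df = pd.read_csv(io.StringIO(csv_data))

# Pairwise Pearson correlation matrix over numeric columns.
corr = df.corr(numeric_only=True)
print(corr.round(2))

# Typical visualization (requires seaborn and matplotlib):
#   import seaborn as sns; import matplotlib.pyplot as plt
#   sns.heatmap(corr, annot=True, cmap="coolwarm"); plt.show()
```

Highly correlated feature pairs stand out as strong off-diagonal cells; dropping one feature of such a pair is one way to act on the heatmap.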
Feature – Label Split
Train Test Split
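The two steps above can be sketched together: separate the features from the `label` column, then split into train and test sets. The DataFrame below is a small synthetic stand-in; its column names are assumptions about the crop dataset:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the crop DataFrame (assumed column names).
df = pd.DataFrame({
    "temperature": [22, 26, 24, 28, 21, 27, 23, 25],
    "humidity":    [82, 70, 80, 65, 85, 68, 81, 72],
    "label":       ["rice", "wheat", "rice", "wheat",
                    "rice", "wheat", "rice", "wheat"],
})

X = df.drop(columns="label")   # feature matrix
y = df["label"]                # target vector

# stratify=y keeps the class balance the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)
print(X_train.shape, X_test.shape)
```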
Model Training
Model Evaluation
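Training and evaluation with scikit-learn's `GaussianNB` can be sketched as follows, again on a synthetic stand-in DataFrame (column names and values are illustrative assumptions):

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumed columns).
df = pd.DataFrame({
    "temperature": [22, 26, 24, 28, 21, 27, 23, 25],
    "humidity":    [82, 70, 80, 65, 85, 68, 81, 72],
    "label":       ["rice", "wheat", "rice", "wheat",
                    "rice", "wheat", "rice", "wheat"],
})
X, y = df.drop(columns="label"), df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Fit Gaussian Naive Bayes: it estimates a per-class mean and variance
# for each feature, exactly as in the hand-worked example.
crop_pred = GaussianNB()
crop_pred.fit(X_train, y_train)

y_hat = crop_pred.predict(X_test)
acc = accuracy_score(y_test, y_hat)
print("Accuracy:", acc)
```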
Prediction on New Data
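For a new observation, wrap its values in a DataFrame with the same column names as the training features, then call `predict` (and optionally `predict_proba`). The training data here is a synthetic stand-in:

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in training data (assumed columns).
train = pd.DataFrame({
    "temperature": [22, 26, 24, 28, 21, 27],
    "humidity":    [82, 70, 80, 65, 85, 68],
    "label":       ["rice", "wheat", "rice", "wheat", "rice", "wheat"],
})
model = GaussianNB().fit(train[["temperature", "humidity"]], train["label"])

# New observation: column names must match the training features.
new_data = pd.DataFrame({"temperature": [24], "humidity": [80]})
print(model.predict(new_data))        # predicted crop label
print(model.predict_proba(new_data))  # per-class probabilities
```

This is the library analogue of the hand-worked query point (24 °C, 80 % humidity) from the step-by-step example.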
Rework on this sheet and share:
- Add insights after every block
- Select only uncorrelated features
- Add a few more visuals
- Name your model Crop_Pred
- Update this using K-Fold cross-validation
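For the K-Fold task, one possible sketch uses `StratifiedKFold` with `cross_val_score` so that each fold preserves the class balance. The data below is a synthetic stand-in with assumed column names:

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumed columns).
df = pd.DataFrame({
    "temperature": [22, 26, 24, 28, 21, 27, 23, 25, 20, 29],
    "humidity":    [82, 70, 80, 65, 85, 68, 81, 72, 86, 64],
    "label":       ["rice", "wheat"] * 5,
})
X, y = df[["temperature", "humidity"]], df["label"]

# 5 stratified folds: each fold keeps the rice/wheat ratio of the full data.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(GaussianNB(), X, y, cv=cv, scoring="accuracy")
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

Averaging over folds gives a more stable accuracy estimate than a single train/test split, which matters for a dataset of this size.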