GitHub Repository: suyashi29/python-su
Path: blob/master/Data Science Essentials for Data Analysts/Naive_Bayes_Crop_Recommendation .ipynb
Kernel: Python 3 (ipykernel)

Naive Bayes Crop Recommendation

What is Naive Bayes?

Naive Bayes is a probabilistic machine learning algorithm based on Bayes' Theorem. It predicts the class with the highest posterior probability given the input features. The core idea is to use Bayes' Theorem to estimate the probability of each class given the features, under the assumption that the features are conditionally independent. It is widely used in high-dimensional problems such as text classification.

Posterior probability is the probability of a class after observing the data, calculated using Bayes’ theorem by combining prior probability and likelihood.

Bayes Theorem

Bayes’ Theorem provides a principled way to reverse conditional probabilities. It is defined as:

$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}$$

where $P(C \mid X)$ is the posterior, $P(X \mid C)$ is the likelihood, $P(C)$ is the prior, and $P(X)$ is the evidence.
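As a quick numeric sketch (using the priors and temperature likelihoods from the worked example later in this notebook), Bayes' theorem can be applied directly, with the evidence obtained via the law of total probability:

```python
# Priors and temperature likelihoods taken from the worked example below
p_rice, p_wheat = 0.60, 0.40
lik_rice, lik_wheat = 0.121, 0.106   # P(temperature = 24 | crop)

# Evidence P(X) via the law of total probability
evidence = lik_rice * p_rice + lik_wheat * p_wheat

# Bayes' theorem: posterior = likelihood * prior / evidence
post_rice = lik_rice * p_rice / evidence
print(round(post_rice, 3))  # posterior probability of Rice given temperature = 24
```

Note that the evidence term only rescales the scores; the class ranking is already decided by likelihood × prior.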

Assumption of Naive Bayes

  • Feature independence: This means that when we are trying to classify something, we assume that each feature (or piece of information) in the data does not affect any other feature.

  • Continuous features are normally distributed: If a feature is continuous, then it is assumed to be normally distributed within each class.

  • Discrete features have multinomial distributions: If a feature is discrete, then it is assumed to have a multinomial distribution within each class.

  • Features are equally important: All features are assumed to contribute equally to the prediction of the class label.

  • No missing data: The data should not contain any missing values.

Gaussian Naive Bayes Formula

In Gaussian Naive Bayes, continuous values associated with each feature are assumed to follow a Gaussian distribution. A Gaussian distribution is also called a Normal distribution; when plotted, it gives a bell-shaped curve that is symmetric about the mean of the feature values. The class-conditional likelihood of a feature value $x_i$ given class $y$ is:

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$
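This Gaussian likelihood takes only a few lines of Python to evaluate (a sketch; the mean and variance values below match the worked example later in this notebook):

```python
import math

def gaussian_pdf(x, mu, var):
    """Gaussian (normal) likelihood of x given a class with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# P(temperature = 24 | Rice) with mu = 22, sigma^2 = 4
print(round(gaussian_pdf(24, 22, 4), 3))  # -> 0.121
```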

Multinomial Naive Bayes

Multinomial Naive Bayes is used when features represent the frequency of terms (such as word counts) in a document. It is commonly applied in text classification, where term frequencies are important.
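A minimal sketch of Multinomial Naive Bayes on word counts. The documents and labels here are toy data invented purely for illustration; they are not part of the crop dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy documents and labels, invented for illustration
docs = ["rain rain irrigation", "drought dry soil", "rain soil moisture", "dry drought heat"]
labels = ["wet", "dry", "wet", "dry"]

vec = CountVectorizer()
X_counts = vec.fit_transform(docs)   # term-frequency (word count) features

clf = MultinomialNB()
clf.fit(X_counts, labels)
print(clf.predict(vec.transform(["rain soil"])))  # -> ['wet']
```

The same `CountVectorizer` must be reused at prediction time so that the new document is mapped to the vocabulary learned during training.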

Hand‑Calculated Naive Bayes Example (Step‑by‑Step)

Problem: Predict the crop for:

  • Temperature = 24°C

  • Humidity = 80%

Classes: Rice, Wheat


Step 1: Prior Probabilities

| Crop  | Samples | Prior P(C) |
|-------|---------|------------|
| Rice  | 60      | 0.60       |
| Wheat | 40      | 0.40       |

Step 2: Feature Statistics (from training data)

Temperature

| Crop  | Mean (μ) | Variance (σ²) |
|-------|----------|---------------|
| Rice  | 22       | 4             |
| Wheat | 26       | 9             |

Humidity

| Crop  | Mean (μ) | Variance (σ²) |
|-------|----------|---------------|
| Rice  | 82       | 4             |
| Wheat | 70       | 16            |

Step 3: Likelihood Calculation

Temperature Likelihood

$$P(24 \mid \text{Rice}) = 0.121$$

$$P(24 \mid \text{Wheat}) = 0.106$$

Humidity Likelihood

$$P(80 \mid \text{Rice}) = 0.121$$

$$P(80 \mid \text{Wheat}) \approx 0.0044$$


Step 4: Posterior Probability

Rice

$$P(\text{Rice} \mid X) \propto 0.60 \times 0.121 \times 0.121 = 0.00878$$

Wheat

$$P(\text{Wheat} \mid X) \propto 0.40 \times 0.106 \times 0.0044 \approx 0.00019$$


Final Decision

| Crop  | Posterior (unnormalized) |
|-------|--------------------------|
| Rice  | 0.00878                  |
| Wheat | 0.00019                  |

Predicted Crop = Rice

This is exactly what Gaussian Naive Bayes computes internally.
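The four steps above can be reproduced in a few lines, using the priors, means, and variances from the tables. The scores here are unnormalized posteriors (prior × likelihoods), which is all that is needed to pick the winning class:

```python
import math

def gaussian_pdf(x, mu, var):
    """Gaussian likelihood of x given a class with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

x_temp, x_hum = 24, 80
params = {
    # crop: (prior, (temp_mu, temp_var), (hum_mu, hum_var))
    "Rice":  (0.60, (22, 4), (82, 4)),
    "Wheat": (0.40, (26, 9), (70, 16)),
}

scores = {}
for crop, (prior, (t_mu, t_var), (h_mu, h_var)) in params.items():
    # Step 3 + Step 4: prior times the two Gaussian likelihoods
    scores[crop] = prior * gaussian_pdf(x_temp, t_mu, t_var) * gaussian_pdf(x_hum, h_mu, h_var)

print(scores)
print("Predicted crop:", max(scores, key=scores.get))  # -> Rice
```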

Let us model the complete dataset using Python libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

Load Dataset

df = pd.read_csv("Crop_recommendation.csv")
df.head()

Exploratory Data Analysis

df.describe()

Workflow for this notebook:

  • Import data

  • Prep it (nulls, shape, delete, add)

  • Visualize

  • Train model

  • Predict

  • Evaluate

  • Correct model

  • Test it on new data

  • Finally, deploy it for a pilot

df.describe(include="object")

We have 22 unique crops in our data, each making up the same share of samples (a balanced dataset).

df.hist(figsize=(14,10))
plt.show()
Image in a Jupyter notebook

Correlation Heatmap

plt.figure(figsize=(10,8))
sns.heatmap(df.drop('label', axis=1).corr(), annot=True, cmap='coolwarm')
plt.show()
Image in a Jupyter notebook

Feature – Label Split

X = df.drop("label", axis=1)
y = df["label"]

Train Test Split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

Model Training

model = GaussianNB()
model.fit(X_train, y_train)

Model Evaluation

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
Accuracy: 0.9954545454545455
              precision    recall  f1-score   support

       apple       1.00      1.00      1.00        20
      banana       1.00      1.00      1.00        20
   blackgram       1.00      1.00      1.00        20
    chickpea       1.00      1.00      1.00        20
     coconut       1.00      1.00      1.00        20
      coffee       1.00      1.00      1.00        20
      cotton       1.00      1.00      1.00        20
      grapes       1.00      1.00      1.00        20
        jute       0.91      1.00      0.95        20
 kidneybeans       1.00      1.00      1.00        20
      lentil       1.00      1.00      1.00        20
       maize       1.00      1.00      1.00        20
       mango       1.00      1.00      1.00        20
   mothbeans       1.00      1.00      1.00        20
    mungbean       1.00      1.00      1.00        20
   muskmelon       1.00      1.00      1.00        20
      orange       1.00      1.00      1.00        20
      papaya       1.00      1.00      1.00        20
  pigeonpeas       1.00      1.00      1.00        20
 pomegranate       1.00      1.00      1.00        20
        rice       1.00      0.90      0.95        20
  watermelon       1.00      1.00      1.00        20

    accuracy                           1.00       440
   macro avg       1.00      1.00      1.00       440
weighted avg       1.00      1.00      1.00       440

Prediction on New Data

new_data = pd.DataFrame({
    "N": [85], "P": [40], "K": [45],
    "temperature": [22], "humidity": [81],
    "ph": [6.8], "rainfall": [210]
})
model.predict(new_data)
array(['rice'], dtype='<U11')

Rework this sheet and share it

  • Add insights after every block

  • Select only uncorrelated features

  • Add a few more visuals

  • Name your Model as Crop_Pred

  • Update this using K-fold cross-validation
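For the K-fold task, a minimal sketch with stratified 5-fold cross-validation. In the notebook you would pass the `X` and `y` built above; the `make_classification` call here is only a stand-in so the snippet runs on its own:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Stand-in data so the sketch is self-contained; replace with the X, y built above
X, y = make_classification(n_samples=220, n_features=7, n_informative=5,
                           n_classes=4, random_state=42)

Crop_Pred = GaussianNB()  # model name as requested in the exercise
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One accuracy score per fold; the mean is a more robust estimate than a single split
scores = cross_val_score(Crop_Pred, X, y, cv=cv, scoring="accuracy")
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Stratification keeps every fold's class proportions equal, which matters here because the crop dataset is balanced across all 22 labels.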