GitHub Repository: suyashi29/python-su
Path: blob/master/Data Science Essentials for Data Analysts/Naive_Bayes_Crop_Recommendation .ipynb
Kernel: Python 3 (ipykernel)

Naive Bayes Crop Recommendation

What is Naive Bayes?

Naive Bayes is a probabilistic machine learning algorithm based on Bayes' Theorem. It predicts the class with the highest posterior probability given the input features. The core idea is to use Bayes' Theorem to estimate the probability of each class given the features, under the assumption that the features are conditionally independent. It is widely used in high-dimensional problems such as text classification.

Posterior probability is the probability of a class after observing the data, calculated using Bayes’ theorem by combining prior probability and likelihood.

Bayes Theorem

Bayes’ Theorem provides a principled way to reverse conditional probabilities. It is defined as:

$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}$$

where $P(C \mid X)$ is the posterior, $P(X \mid C)$ is the likelihood, $P(C)$ is the prior, and $P(X)$ is the evidence.
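As a quick numeric sketch (using the priors and temperature likelihoods from the worked example later in this notebook), Bayes' theorem can be applied directly, with the evidence obtained via the law of total probability:

```python
# Priors and temperature likelihoods taken from the worked example below
p_rice, p_wheat = 0.60, 0.40
lik_rice, lik_wheat = 0.121, 0.106   # P(temperature = 24 | crop)

# Evidence P(X) via the law of total probability
evidence = lik_rice * p_rice + lik_wheat * p_wheat

# Bayes' theorem: posterior = likelihood * prior / evidence
post_rice = lik_rice * p_rice / evidence
print(round(post_rice, 3))  # posterior probability of Rice given temperature = 24
```

Note that the evidence term only rescales the scores; the class ranking is already decided by likelihood × prior.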

Assumption of Naive Bayes

  • Feature independence: This means that when we are trying to classify something, we assume that each feature (or piece of information) in the data does not affect any other feature.

  • Continuous features are normally distributed: If a feature is continuous, then it is assumed to be normally distributed within each class.

  • Discrete features have multinomial distributions: If a feature is discrete, then it is assumed to have a multinomial distribution within each class.

  • Features are equally important: All features are assumed to contribute equally to the prediction of the class label.

  • No missing data: The data should not contain any missing values.

Gaussian Naive Bayes Formula

In Gaussian Naive Bayes, continuous values associated with each feature are assumed to follow a Gaussian distribution. A Gaussian distribution is also called a Normal distribution; when plotted, it gives a bell-shaped curve that is symmetric about the mean of the feature values. The class-conditional likelihood of a feature value $x_i$ given class $y$ is:

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$
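This Gaussian likelihood takes only a few lines of Python to evaluate (a sketch; the mean and variance values below match the worked example later in this notebook):

```python
import math

def gaussian_pdf(x, mu, var):
    """Gaussian (normal) likelihood of x given a class with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# P(temperature = 24 | Rice) with mu = 22, sigma^2 = 4
print(round(gaussian_pdf(24, 22, 4), 3))  # -> 0.121
```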

Multinomial Naive Bayes

Multinomial Naive Bayes is used when features represent the frequency of terms (such as word counts) in a document. It is commonly applied in text classification, where term frequencies are important.
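A minimal sketch of Multinomial Naive Bayes on word counts. The documents and labels here are toy data invented purely for illustration; they are not part of the crop dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy documents and labels, invented for illustration
docs = ["rain rain irrigation", "drought dry soil", "rain soil moisture", "dry drought heat"]
labels = ["wet", "dry", "wet", "dry"]

vec = CountVectorizer()
X_counts = vec.fit_transform(docs)   # term-frequency (word count) features

clf = MultinomialNB()
clf.fit(X_counts, labels)
print(clf.predict(vec.transform(["rain soil"])))  # -> ['wet']
```

The same `CountVectorizer` must be reused at prediction time so that the new document is mapped to the vocabulary learned during training.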

Hand‑Calculated Naive Bayes Example (Step‑by‑Step)

Problem: Predict the crop for:

  • Temperature = 24°C

  • Humidity = 80%

Classes: Rice, Wheat


Step 1: Prior Probabilities

| Crop  | Samples | Prior P(C) |
|-------|---------|------------|
| Rice  | 60      | 0.60       |
| Wheat | 40      | 0.40       |

Step 2: Feature Statistics (from training data)

Temperature

| Crop  | Mean (μ) | Variance (σ²) |
|-------|----------|---------------|
| Rice  | 22       | 4             |
| Wheat | 26       | 9             |

Humidity

| Crop  | Mean (μ) | Variance (σ²) |
|-------|----------|---------------|
| Rice  | 82       | 4             |
| Wheat | 70       | 16            |

Step 3: Likelihood Calculation

Temperature Likelihood

$$P(24 \mid \text{Rice}) = 0.121$$

$$P(24 \mid \text{Wheat}) = 0.106$$

Humidity Likelihood

$$P(80 \mid \text{Rice}) = 0.121$$

$$P(80 \mid \text{Wheat}) \approx 0.0044$$


Step 4: Posterior Probability

Rice

$$P(\text{Rice} \mid X) \propto 0.60 \times 0.121 \times 0.121 = 0.00878$$

Wheat

$$P(\text{Wheat} \mid X) \propto 0.40 \times 0.106 \times 0.0044 \approx 0.00019$$


Final Decision

| Crop  | Posterior (unnormalized) |
|-------|--------------------------|
| Rice  | 0.00878                  |
| Wheat | 0.00019                  |

Predicted Crop = Rice

This is exactly what Gaussian Naive Bayes computes internally.
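The four steps above can be reproduced in a few lines, using the priors, means, and variances from the tables. The scores here are unnormalized posteriors (prior × likelihoods), which is all that is needed to pick the winning class:

```python
import math

def gaussian_pdf(x, mu, var):
    """Gaussian likelihood of x given a class with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

x_temp, x_hum = 24, 80
params = {
    # crop: (prior, (temp_mu, temp_var), (hum_mu, hum_var))
    "Rice":  (0.60, (22, 4), (82, 4)),
    "Wheat": (0.40, (26, 9), (70, 16)),
}

scores = {}
for crop, (prior, (t_mu, t_var), (h_mu, h_var)) in params.items():
    # Step 3 + Step 4: prior times the two Gaussian likelihoods
    scores[crop] = prior * gaussian_pdf(x_temp, t_mu, t_var) * gaussian_pdf(x_hum, h_mu, h_var)

print(scores)
print("Predicted crop:", max(scores, key=scores.get))  # -> Rice
```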

Let us model the complete dataset using Python libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

Load Dataset

df = pd.read_csv("Crop_recommendation.csv")
df.head()

Exploratory Data Analysis

df.describe()

Workflow for this notebook:

  • Import data

  • Prep it (nulls, shape, delete, add)

  • Visualize

  • Train model

  • Predict

  • Evaluate

  • Correct model

  • Test it on new data

  • Finally, deploy it for a pilot

df.describe(include="object")

We have 22 unique crops in our data, each making up the same share of samples (a balanced dataset).

df.hist(figsize=(14,10))
plt.show()
Image in a Jupyter notebook

Correlation Heatmap

plt.figure(figsize=(10,8))
sns.heatmap(df.drop('label', axis=1).corr(), annot=True, cmap='coolwarm')
plt.show()
Image in a Jupyter notebook

Feature – Label Split

X = df.drop("label", axis=1)
y = df["label"]

Train Test Split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

Model Training

model = GaussianNB()
model.fit(X_train, y_train)

Model Evaluation

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
Accuracy: 0.9954545454545455
              precision    recall  f1-score   support

       apple       1.00      1.00      1.00        20
      banana       1.00      1.00      1.00        20
   blackgram       1.00      1.00      1.00        20
    chickpea       1.00      1.00      1.00        20
     coconut       1.00      1.00      1.00        20
      coffee       1.00      1.00      1.00        20
      cotton       1.00      1.00      1.00        20
      grapes       1.00      1.00      1.00        20
        jute       0.91      1.00      0.95        20
 kidneybeans       1.00      1.00      1.00        20
      lentil       1.00      1.00      1.00        20
       maize       1.00      1.00      1.00        20
       mango       1.00      1.00      1.00        20
   mothbeans       1.00      1.00      1.00        20
    mungbean       1.00      1.00      1.00        20
   muskmelon       1.00      1.00      1.00        20
      orange       1.00      1.00      1.00        20
      papaya       1.00      1.00      1.00        20
  pigeonpeas       1.00      1.00      1.00        20
 pomegranate       1.00      1.00      1.00        20
        rice       1.00      0.90      0.95        20
  watermelon       1.00      1.00      1.00        20

    accuracy                           1.00       440
   macro avg       1.00      1.00      1.00       440
weighted avg       1.00      1.00      1.00       440

Prediction on New Data

new_data = pd.DataFrame({
    "N": [85], "P": [40], "K": [45],
    "temperature": [22], "humidity": [81],
    "ph": [6.8], "rainfall": [210]
})
model.predict(new_data)
array(['rice'], dtype='<U11')

Rework this sheet and share it

  • Add insights after every block

  • Select only uncorrelated features

  • Add a few more visuals

  • Name your Model as Crop_Pred

  • Update this using K-fold cross-validation
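For the K-fold task, a minimal sketch with stratified 5-fold cross-validation. In the notebook you would pass the `X` and `y` built above; the `make_classification` call here is only a stand-in so the snippet runs on its own:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Stand-in data so the sketch is self-contained; replace with the X, y built above
X, y = make_classification(n_samples=220, n_features=7, n_informative=5,
                           n_classes=4, random_state=42)

Crop_Pred = GaussianNB()  # model name as requested in the exercise
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One accuracy score per fold; the mean is a more robust estimate than a single split
scores = cross_val_score(Crop_Pred, X, y, cv=cv, scoring="accuracy")
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Stratification keeps every fold's class proportions equal, which matters here because the crop dataset is balanced across all 22 labels.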