Path: blob/master/ML Regression Analysis/Step-by-Step Implementation for Polynomial Regression on Fuel Data.ipynb
Handle Categorical Variables:
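If the dataset contains categorical variables, convert them to numerical values before fitting the model. Two common approaches are LabelEncoder and one-hot encoding. LabelEncoder assigns each category an integer, which implies an ordering that a regression model will treat as meaningful; one-hot encoding avoids this for unordered categories. Here's an example of using LabelEncoder for a column: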
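A minimal sketch of that encoding step, assuming a pandas DataFrame with a column named `Fuel Type`; the column name and the category codes are hypothetical and not taken from the notebook's dataset. A one-hot alternative with `pd.get_dummies` is included for comparison.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample; the column name "Fuel Type" and its codes are assumptions.
df = pd.DataFrame({"Fuel Type": ["X", "Z", "D", "X", "E"]})

# LabelEncoder maps each distinct category to an integer (categories are sorted first).
encoder = LabelEncoder()
df["Fuel Type Encoded"] = encoder.fit_transform(df["Fuel Type"])

# One-hot alternative: one indicator column per category, no implied ordering.
one_hot = pd.get_dummies(df["Fuel Type"], prefix="Fuel")

print(df)
print(list(encoder.classes_))
print(one_hot.columns.tolist())
```

`encoder.classes_` records which category each integer stands for, so the mapping can be reversed later with `encoder.inverse_transform`.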
Linear regression: $Y = mx + c$
Polynomial regression (degree 2): $Y = m_2 x^2 + m_1 x + c$
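To make the degree-2 form concrete, here is a small sketch that fits a quadratic with scikit-learn's `PolynomialFeatures` and `LinearRegression`. The data is synthetic, generated from made-up coefficients (2, 3, 5) purely for illustration; it is not the fuel dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data generated from y = 2x^2 + 3x + 5 (made-up coefficients).
x = np.arange(1, 7, dtype=float).reshape(-1, 1)
y = 2 * x.ravel() ** 2 + 3 * x.ravel() + 5

# degree=2 expands each x into the columns [x, x^2]; the intercept c is
# fitted by LinearRegression itself, hence include_bias=False.
poly = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly.fit_transform(x)

model = LinearRegression().fit(x_poly, y)
m1, m2 = model.coef_      # coefficients of x and x^2
c = model.intercept_
print(f"Y = {m2:.2f}x^2 + {m1:.2f}x + {c:.2f}")
```

The model is still linear in its coefficients; only the inputs are transformed, which is why ordinary least squares can fit the curve.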
1. Model Performance Metrics
Mean Squared Error (MSE): 219.32
What it means: On average, the squared difference between the predicted and actual CO2 emissions is 219.32.
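Interpretation: Lower MSE means better model accuracy. Because MSE is in squared units, it is easier to judge through its square root: RMSE = √219.32 ≈ 14.8 g/km. Against a CO2 emission scale that in your case ranges roughly between 100–400 g/km, that is a typical error of about 4–15% of the emission value, which is reasonable.
📌 MSE is in squared units of the target variable, (g/km)².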
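As a rough illustration of how MSE and its square root are computed, the sketch below uses NumPy on four hypothetical actual/predicted pairs; these values are invented and are not the notebook's test set. The last line takes the square root of the reported 219.32.

```python
import numpy as np

# Hypothetical actual and predicted CO2 emissions in g/km (not the notebook's data).
y_true = np.array([200.0, 250.0, 300.0, 180.0])
y_pred = np.array([210.0, 240.0, 320.0, 175.0])

mse = np.mean((y_true - y_pred) ** 2)  # units: (g/km)^2
rmse = np.sqrt(mse)                    # units: g/km, same as the target

# Square root of the MSE reported above, for comparison with the 100-400 g/km scale.
reported_rmse = np.sqrt(219.32)

print(mse, rmse, round(reported_rmse, 2))
```

`sklearn.metrics.mean_squared_error(y_true, y_pred)` computes the same quantity as `mse` here.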
R² Score: e.g., 0.8772 (if you got this earlier)
What it means: Your model explains ~87.72% of the variability in CO2 emissions based on input features.
Interpretation:
R² = 1 means perfect fit.
R² = 0 means the model performs no better than a horizontal line (mean of target).
R² ≈ 0.87 means it's a good fit, though there's still ~12% variance not captured, maybe due to noise or unmodeled factors like vehicle weight or fuel type.
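The three reference points above follow directly from the definition R² = 1 − SS_res / SS_tot. The sketch below writes that out in NumPy; the helper name `r2_manual` and the sample values are hypothetical.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Hypothetical values in g/km (not the notebook's data).
y_true = np.array([200.0, 250.0, 300.0, 180.0])
y_pred = np.array([210.0, 240.0, 320.0, 175.0])

r2_model = r2_manual(y_true, y_pred)
r2_perfect = r2_manual(y_true, y_true)  # predictions equal the targets
r2_mean = r2_manual(y_true, np.full_like(y_true, y_true.mean()))  # horizontal line

print(r2_model, r2_perfect, r2_mean)
```

`sklearn.metrics.r2_score` implements the same formula; on held-out data R² can also be negative when a model does worse than predicting the mean.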
2. Graph – Regression Curve
You plotted Engine Size vs CO2 Emissions:
Red points: Actual CO2 values from the test set.
Blue points: Predicted CO2 values from your model.
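If the blue dots closely follow the red, the model is generalizing well. A large scatter, or a systematic gap between the two (for example, predictions that run consistently low at large engine sizes), suggests the model is underfitting. Overfitting shows up differently: the model fits the training set much better than it fits this test set.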
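A plot of this kind can be sketched with matplotlib as follows. The arrays are hypothetical stand-ins for the test-set engine sizes, actual values, and model predictions; in the notebook they would come from the train/test split and `model.predict`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical test-set values (engine size in litres, CO2 in g/km).
engine_size = np.array([1.5, 2.0, 2.4, 3.0, 3.5, 4.0])
y_test = np.array([160.0, 190.0, 215.0, 250.0, 270.0, 300.0])
y_pred = np.array([165.0, 185.0, 210.0, 245.0, 280.0, 295.0])

fig, ax = plt.subplots()
ax.scatter(engine_size, y_test, color="red", label="Actual")
ax.scatter(engine_size, y_pred, color="blue", label="Predicted")
ax.set_xlabel("Engine Size")
ax.set_ylabel("CO2 Emissions (g/km)")
ax.legend()

# Residuals make a systematic gap easier to spot than the raw scatter does.
residuals = y_test - y_pred
print(residuals)
```

In a notebook the figure renders inline; from a script, `fig.savefig(...)` writes it to disk.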
Interpretation: A vehicle with these specs is estimated to emit ~260 grams of CO2 per kilometer, based on the training data and polynomial regression.
This prediction relies on patterns learned from the dataset — not real-world physics.
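For reference, a single-vehicle prediction of this kind can be produced roughly as sketched here. Everything in the example is an assumption: the three features (engine size, cylinders, fuel consumption), the synthetic training data, and the made-up formula that generates it. It shows the mechanics only and does not reproduce the ~260 g/km figure.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic training data; features and formula are hypothetical.
rng = np.random.default_rng(0)
engine = rng.uniform(1.0, 6.0, 50)             # litres
cylinders = rng.integers(3, 9, 50).astype(float)
fuel = rng.uniform(5.0, 20.0, 50)              # L/100 km
co2 = 20 * engine + 5 * cylinders + 15 * fuel + 0.5 * engine**2

X = np.column_stack([engine, cylinders, fuel])

# The pipeline applies the same polynomial expansion at fit and predict time.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
model.fit(X, co2)

# One new vehicle: 3.5 L engine, 6 cylinders, 10 L/100 km (made-up specs).
new_vehicle = np.array([[3.5, 6.0, 10.0]])
predicted_co2 = model.predict(new_vehicle)[0]
print(f"Estimated CO2: {predicted_co2:.1f} g/km")
```

Wrapping the transformer and the regressor in one pipeline avoids a common mistake: forgetting to apply `PolynomialFeatures` to the new row before calling `predict`.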
4. What Could Improve the Model?
Issue | Recommendation |
---|---|
Slight underfit | Try degree=3 in PolynomialFeatures |
Overfit | Try Ridge or Lasso regression |
Features too correlated | Use PCA or remove redundant variables |
Dataset limited | More rows (real data), include new features (e.g. vehicle weight, fuel type) |
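
The first two recommendations can be compared side by side with cross-validation. The sketch below is one way to do that, assuming a single engine-size feature and synthetic data; the candidate names, `alpha=1.0`, and the data-generating formula are illustrative choices, not settings from the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data: engine size vs CO2 with Gaussian noise (made-up relationship).
rng = np.random.default_rng(42)
X = rng.uniform(1.0, 6.0, size=(200, 1))
y = 100 + 40 * X.ravel() + rng.normal(0, 10, 200)

candidates = {
    "degree=2, linear": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False), LinearRegression()
    ),
    "degree=3, linear": make_pipeline(
        PolynomialFeatures(degree=3, include_bias=False), LinearRegression()
    ),
    # Ridge penalises large coefficients; scaling first keeps the penalty
    # comparable across x, x^2 and x^3.
    "degree=3, ridge": make_pipeline(
        PolynomialFeatures(degree=3, include_bias=False),
        StandardScaler(),
        Ridge(alpha=1.0),
    ),
}

# Mean 5-fold cross-validated R^2 for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}
for name, score in scores.items():
    print(f"{name}: mean R^2 = {score:.4f}")
```

Judging the candidates on cross-validated scores rather than training scores is what separates a genuine improvement from a higher degree that merely memorises the training rows.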