Employee Appraisal Regression Analysis
This notebook explores a dataset with employee appraisal data and builds a Linear Regression model.
Features used:
Rating
Behavior
Experience
Band (encoded as Band_Num)
Target:
Appraisal
Dataset Overview
Encode Categorical Feature (Band)
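A minimal sketch of the encoding step, assuming Band holds ordered letter labels. The labels "A", "B", "C", their order, and the sample rows are hypothetical and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample rows; the band labels "A", "B", "C" are assumed.
df = pd.DataFrame({
    "Band": ["A", "C", "B", "A"],
    "Rating": [4, 2, 3, 5],
})

# An explicit mapping keeps the numeric codes in a chosen, meaningful order
# (assumed here: A < B < C).
band_order = {"A": 1, "B": 2, "C": 3}
df["Band_Num"] = df["Band"].map(band_order)

print(df)
```

An explicit dictionary is used here rather than an automatic encoder so that the numeric order is controlled by hand; any label missing from the mapping becomes NaN, which makes unexpected values easy to spot.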
Visuals
A Pair Plot (or scatterplot matrix) is a visualization tool that shows relationships between multiple numerical variables in a dataset, plotted pairwise. It is widely used in Exploratory Data Analysis (EDA) to quickly identify correlations, trends, clusters, or outliers.
What It Shows:
Scatterplots: Each off-diagonal cell shows a scatterplot between two features.
Histograms / KDEs: The diagonal shows distributions (usually histograms or kernel density estimates) for each variable.
Color/Hue (optional): Categorical features can be used to color points and separate distributions by class (e.g., hue='Band').
Use Cases:
Spot linear or nonlinear relationships
Detect clusters or groupings
Observe correlations
Identify outliers
See distribution shapes of each feature
Limitations:
Can get crowded with too many variables (use with 5–6 numerical features max)
Doesn't show feature importance — just relationships
Data Preparation
Train Linear Regression Model
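The data preparation and training steps can be sketched as below. The data is synthetic and generated from an assumed linear rule purely so the example runs on its own; the 80/20 split, the `random_state` values, and the coefficients in the generating rule are all illustrative choices, not the notebook's.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the notebook's dataset).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Rating": rng.integers(1, 6, n),
    "Behavior": rng.integers(1, 6, n),
    "Experience": rng.integers(0, 16, n),
    "Band_Num": rng.integers(1, 4, n),
})
# Assumed linear rule plus noise, used only to give the model something to fit.
df["Appraisal"] = (10 * df["Rating"] + 2 * df["Behavior"] + 5 * df["Experience"]
                   + 3 * df["Band_Num"] + 20 + rng.normal(0, 2, n))

# Features and target, as listed at the top of the notebook.
X = df[["Rating", "Behavior", "Experience", "Band_Num"]]
y = df["Appraisal"]

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# R^2 on the held-out rows.
r2 = r2_score(y_test, model.predict(X_test))
print(dict(zip(X.columns, model.coef_.round(2))), round(model.intercept_, 2), round(r2, 3))
```

The fitted `model.coef_` and `model.intercept_` are the numbers that go into a linear equation of the kind shown next.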
Linear equation for Appraisal Prediction
Appraisal = 10.2 * Rating + 5 * Experience + 1.5 * Behavior + 3 * Band_Num + 24.88
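As a worked example, the equation can be evaluated for a single employee. The sketch below assumes the abbreviations in the equation stand for Rating, Experience, Behavior, and Band_Num; the function name `predict_appraisal` and the input values are hypothetical.

```python
def predict_appraisal(rating, experience, behavior, band_num):
    """Evaluate the notebook's linear equation for one employee."""
    return 10.2 * rating + 5 * experience + 1.5 * behavior + 3 * band_num + 24.88


# Hypothetical employee: rating 4, 3 years of experience, behavior 5, band 2.
# Term by term: 40.8 + 15 + 7.5 + 6 + 24.88
prediction = predict_appraisal(rating=4, experience=3, behavior=5, band_num=2)
print(round(prediction, 2))
```

Each coefficient is the change in predicted Appraisal for a one-unit increase in that feature with the others held fixed, so one extra year of experience adds 5 to the prediction.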
K-Fold Cross-Validation
K-Fold Cross-Validation is a technique used to evaluate the performance of a machine learning model by splitting the dataset into K roughly equal parts (folds):
The model is trained on K-1 folds and validated on the remaining fold.
This process is repeated K times, each time using a different fold as the validation set.
The final performance is the average of all K validation scores.
Why use it?
It helps detect overfitting, because every score comes from data the model was not trained on.
Provides a more robust estimate of model performance.
Ensures that every data point is used for both training and validation.
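The procedure above can be sketched with scikit-learn's `KFold` and `cross_val_score`. The data is synthetic, and K = 5, the shuffle, and the seeds are illustrative assumptions rather than the notebook's settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (NOT the notebook's dataset): 4 features, linear target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 4))
y = X @ np.array([10.0, 2.0, 5.0, 3.0]) + 20 + rng.normal(0, 2, 100)

# K = 5 folds; shuffling avoids any ordering effect in the rows.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

# One R^2 score per fold: train on 4 folds, validate on the remaining one.
scores = cross_val_score(LinearRegression(), X, y, cv=kfold, scoring="r2")

# Count how often each row lands in a validation fold.
val_counts = np.zeros(len(X), dtype=int)
for _, val_idx in kfold.split(X):
    val_counts[val_idx] += 1

print("Fold scores:", scores.round(3))
print("Mean score:", scores.mean().round(3))
print("Each row validated exactly once:", bool((val_counts == 1).all()))
```

A large spread between fold scores would suggest that performance depends heavily on which rows end up in the validation fold.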
Insights & Conclusion
Experience and Rating positively impact Appraisal.
Appraisal is somewhat influenced by Band.
Linear Regression performs reasonably well.
Cross-validation helps validate the model across different subsets.
Quick Practice:
Check model performance using two parameters, then three parameters.
Vary the train/test split.
Name your model Appraisalsys.