Path: blob/master/ML Regression Analysis/2 Numpy for Linear Regression .ipynb
What is Linear Regression?
Linear regression is an approach for modeling the relationship between two (simple linear regression) or more variables (multiple linear regression).
In simple linear regression, one variable is considered the predictor or independent variable, while the other variable is viewed as the outcome or dependent variable.
Formula:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon$$

Where:

Symbol | Description |
---|---|
$Y$ | Dependent variable (target) |
$\beta_0$ | Intercept (bias term) |
$\beta_1, \dots, \beta_n$ | Coefficients (slopes for each independent variable) |
$X_1, \dots, X_n$ | Independent variables (features) |
$\varepsilon$ | Error term (residual) |
Example Use Cases
Use Case | Description |
---|---|
House Price Prediction | Predict house price based on size, location, number of bedrooms, etc. |
Sales Forecasting | Estimate future sales using past sales data, advertising spend, and seasonality |
Student Performance Prediction | Predict exam scores based on hours studied, attendance, and prior grades |
Health Risk Assessment | Estimate risk score based on age, BMI, smoking habits, and family history |
Energy Consumption Estimation | Predict electricity usage from temperature, time of day, and appliance use |
Why Linear Regression?
To find the parameters so that the model best fits the data.
Forecasting an effect
Determining a trend
Assumptions of Linear Regression
Linear relationship. One of the most important assumptions is that a linear relationship exists between the dependent and the independent variables.
No auto-correlation / independence. The residuals (error terms) are independent of each other; in other words, there is no correlation between consecutive error terms in time-series data.
No multicollinearity. The independent variables shouldn't be correlated with each other. If multicollinearity exists among the independent variables, it is difficult to interpret the model's individual coefficients.
Homoscedasticity. The residuals have constant variance at every level of x. The absence of this property is known as heteroscedasticity.
Normal distribution of error terms. The last assumption that needs to be checked for linear regression is that the error terms are normally distributed.
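Since the notebook's code cells are not shown here, the following is a minimal NumPy sketch of checking two of these assumptions (zero-mean residuals, and comparable residual spread across the range of x) on the sample data used later; `np.polyfit` stands in for the fitting step:

```python
import numpy as np

# Sample data (the same x/y values used in the example below)
x = np.array([9, 10, 11, 12, 10, 9, 9, 10, 12, 11], dtype=float)
y = np.array([10, 11, 14, 13, 15, 11, 12, 11, 13, 15], dtype=float)

# Fit a least-squares line; np.polyfit returns [slope, intercept]
b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

# With an intercept term, least-squares residuals always average to ~0
print(round(residuals.mean(), 10))

# Crude homoscedasticity check: residual spread should be of similar
# magnitude in the lower and upper halves of the x range
lower = residuals[x <= np.median(x)]
upper = residuals[x > np.median(x)]
print(lower.std(), upper.std())
```

In practice, residual-versus-fitted plots and formal tests are used; the split-in-half comparison above is only a rough illustration.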
$\text{Weight}(y) = b_1 \cdot \text{Height}(x) + b_0$

where $b_0$ is the intercept and $b_1$ is the slope. Rearranging $y = b_1 x + b_0$ gives $b_0 = y - b_1 x$ and $b_1 = \dfrac{y - b_0}{x}$.
Mathematical Approach
Simple Linear Regression
Let's assume that the two variables are linearly related.
Find a linear function that predicts the response value (y) as accurately as possible as a function of the feature or independent variable (x).
x = [9, 10, 11, 12, 10, 9, 9, 10, 12, 11]
y = [10, 11, 14, 13, 15, 11, 12, 11, 13, 15]
Consider x as the feature vector, i.e. x = [x_1, x_2, …, x_n],
and y as the response vector, i.e. y = [y_1, y_2, …, y_n],
for n observations (in the above example, n = 10).
Now, the task is to find the line that best fits the above scatter plot, so that we can predict the response for any new feature value (i.e. a value of x not present in the dataset). This line is called the regression line.
Here,

$h(x_i) = b_0 + b_1 x_i$

represents the predicted response value for the $i$-th observation. $b_0$ and $b_1$ are the regression coefficients and represent the y-intercept and slope of the regression line respectively. The least-squares estimates are

$b_1 = \dfrac{SS_{xy}}{SS_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}$

where $SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ is the sum of cross-deviations of $y$ and $x$, and $SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$ is the sum of squared deviations of $x$.
Predict Weight for a given Age
Converting X and Y into array using Numpy
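The notebook's data cell is not shown, so the age/weight values below are illustrative placeholders; the point is the list-to-array conversion with `np.array`:

```python
import numpy as np

# Illustrative age/weight data (the notebook's original values are not shown)
age = [2, 3, 5, 7, 9, 11, 13, 15]
weight = [6.0, 8.1, 12.3, 16.8, 21.0, 25.4, 29.9, 34.5]

# np.array converts the Python lists into NumPy arrays, which support the
# vectorized arithmetic used by the coefficient formulas
X = np.array(age, dtype=float)
Y = np.array(weight, dtype=float)
print(X.shape, Y.shape)  # (8,) (8,)
```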
Creating a function to determine regression coef
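A sketch of such a function, implementing the $SS_{xy}/SS_{xx}$ formulas above (the name `estimate_coef` is my choice, not necessarily the notebook's):

```python
import numpy as np

def estimate_coef(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # SS_xy = sum((x - mean_x)*(y - mean_y)) = sum(x*y) - n*mean_x*mean_y
    ss_xy = np.sum(x * y) - n * x.mean() * y.mean()
    # SS_xx = sum((x - mean_x)**2) = sum(x*x) - n*mean_x**2
    ss_xx = np.sum(x * x) - n * x.mean() ** 2
    b1 = ss_xy / ss_xx             # slope
    b0 = y.mean() - b1 * x.mean()  # intercept
    return b0, b1

# Example with the x/y data from above
x = [9, 10, 11, 12, 10, 9, 9, 10, 12, 11]
y = [10, 11, 14, 13, 15, 11, 12, 11, 13, 15]
b0, b1 = estimate_coef(x, y)
print(b0, b1)
```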
To Plot Regression line
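A plotting sketch, assuming Matplotlib is available (the colors and filename are arbitrary choices); it saves the figure to a file instead of calling `plt.show()` so it also runs headless:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render without a display
import matplotlib.pyplot as plt

x = np.array([9, 10, 11, 12, 10, 9, 9, 10, 12, 11], dtype=float)
y = np.array([10, 11, 14, 13, 15, 11, 12, 11, 13, 15], dtype=float)
b1, b0 = np.polyfit(x, y, deg=1)  # slope, intercept

# Scatter the observations and overlay the fitted regression line
plt.scatter(x, y, color="m", marker="o", label="observations")
plt.plot(x, b0 + b1 * x, color="g", label="regression line")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.savefig("regression_line.png")
```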
Conclusion
Weight = 2.21(Age) + 1.34
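Using the reported coefficients, prediction is a single multiply-add (the helper name `predict_weight` is illustrative):

```python
# Coefficients from the fitted line reported above
b0, b1 = 1.34, 2.21

def predict_weight(age):
    """Predict weight from age using the fitted line weight = b1*age + b0."""
    return b1 * age + b0

print(predict_weight(10))  # 2.21 * 10 + 1.34
```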
Define the linear equation:
y=5x+4
Add noise:
y = 5x + 4 + ε, where ε ∼ N(0, σ²)
Generate data points: Create a set of x values and compute corresponding noisy y values.
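The three steps above can be sketched with NumPy as follows (the seed and σ = 2 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility

# Generate x values and noisy y = 5x + 4 + eps, with eps ~ N(0, sigma^2)
sigma = 2.0
x = np.linspace(0, 10, 100)
eps = rng.normal(loc=0.0, scale=sigma, size=x.size)
y = 5 * x + 4 + eps

# A least-squares fit should approximately recover slope 5 and intercept 4
slope, intercept = np.polyfit(x, y, deg=1)
print(round(slope, 2), round(intercept, 2))
```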