GitHub Repository: suyashi29/python-su
Path: blob/master/Data Analysis using Python/Numpy for Simple Linear Regression.ipynb
Kernel: Python 3

Linear regression is a statistical approach for modelling the relationship between a dependent variable and a given set of independent variables.

  • Simple Linear Regression: an approach for predicting a response using a single feature.

Why Linear Regression?

  • To find the parameters so that the model best fits the data.

  • Forecasting an effect

  • Determining a Trend

How do we determine the best-fit line?

  • The line for which the error between the predicted values and the observed values is minimum is called the best-fit line or the regression line. These errors are also called residuals; the least-squares objective is written out after this list.

  • The residuals can be visualized as the vertical lines from the observed data points to the regression line.
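For reference, a minimal statement of what "best fit" means here, using the same notation as the code below (b_0 for the intercept, b_1 for the slope): the coefficients are chosen to minimize the sum of squared residuals,

$$\min_{b_0,\, b_1} \; \sum_{i=1}^{n} \big(y_i - (b_0 + b_1 x_i)\big)^2$$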


Question: Find a linear function that predicts the response value (y) as accurately as possible as a function of the feature or independent variable (x).

x = [9, 10, 11, 12, 10, 9, 9, 10, 12, 11]
y = [10, 11, 14, 13, 15, 11, 12, 11, 13, 15]

x is the feature vector, i.e. x = [x_1, x_2, …, x_n],

y is the response vector, i.e. y = [y_1, y_2, …, y_n],

for n observations (in the above example, n = 10).

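The model and the standard closed-form least-squares estimates, written in terms of the same quantities SS_xy and SS_xx computed in the code below (with x̄, ȳ the sample means), are:

$$\hat{y} = b_0 + b_1 x, \qquad b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum_{i} x_i y_i - n\,\bar{x}\,\bar{y}}{\sum_{i} x_i^2 - n\,\bar{x}^2}, \qquad b_0 = \bar{y} - b_1\,\bar{x}$$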

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
%matplotlib inline

x = [9, 10, 11, 12, 10, 9, 9, 10, 12, 11]
y = [10, 11, 14, 13, 15, 11, 12, 11, 13, 15]

plt.scatter(x, y, edgecolors='r')
plt.xlabel('feature vector', color="r")
plt.ylabel('response vector', color="g")
plt.show()
[Output: scatter plot of the feature vector against the response vector]
import numpy as np
import matplotlib.pyplot as plt

x = np.array(x)
y = np.array(y)

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x, m_y = np.mean(x), np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations
    x = np.array([9, 10, 11, 12, 10, 9, 9, 10, 12, 11])
    y = np.array([10, 11, 14, 13, 15, 11, 12, 11, 13, 15])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
Estimated coefficients:
b_0 = 3.5619834710743117
b_1 = 0.8677685950413289
[Output: scatter plot of the observations with the fitted regression line]
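As a quick cross-check of the estimates above, NumPy's np.polyfit with degree 1 performs an equivalent least-squares fit and should reproduce the same slope and intercept (a short sketch using the same data):

import numpy as np

x = np.array([9, 10, 11, 12, 10, 9, 9, 10, 12, 11])
y = np.array([10, 11, 14, 13, 15, 11, 12, 11, 13, 15])

# np.polyfit returns coefficients highest degree first: [slope, intercept]
b_1, b_0 = np.polyfit(x, y, 1)
print("b_0 =", b_0)  # expected ≈ 3.562
print("b_1 =", b_1)  # expected ≈ 0.868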