suyashi29
GitHub Repository: suyashi29/python-su
Path: blob/master/ML Regression Analysis/5 Non-Linear-Regression.ipynb
Kernel: Python 3 (ipykernel)

Non-Linear Regression Analysis

Non-linear regression is a statistical technique used to model relationships between variables when the data does not follow a straight-line (linear) pattern. Unlike linear regression, which assumes a linear relationship between independent (predictor) and dependent (response) variables, non-linear regression is used when the relationship is more complex.


Key Characteristics

  • The model parameters appear in a non-linear way.

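  • Can handle curves such as exponential, logarithmic, and logistic relationships. Polynomial curves are also non-linear in x, but they are linear in their coefficients, so they can be fitted with ordinary linear regression.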

  • More flexible than linear regression, but parameter estimation generally has no closed-form solution and requires iterative methods (e.g., Gauss-Newton, Levenberg–Marquardt, or gradient descent).


General Non-linear Regression Model

$$y = f(x, \beta) + \epsilon$$

Where:

  • $y$: dependent variable

  • $x$: independent variable(s)

  • $\beta$: parameters to estimate

  • $f(\cdot)$: a non-linear function in the parameters

  • $\epsilon$: error term

Example of a non-linear model:

$$y = \beta_0 e^{\beta_1 x} + \epsilon$$
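The sketch below fits the example model above. It generates synthetic data from assumed parameters $\beta_0 = 2$ and $\beta_1 = 0.3$, chosen only for illustration. It then recovers them with SciPy's scipy.optimize.curve_fit, a non-linear least-squares routine that this notebook also uses later. The function name exp_model is hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def exp_model(x, beta_0, beta_1):
    """y = beta_0 * exp(beta_1 * x); non-linear in beta_1."""
    return beta_0 * np.exp(beta_1 * x)


# Synthetic data from assumed "true" parameters beta_0 = 2.0, beta_1 = 0.3
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = exp_model(x, 2.0, 0.3) + rng.normal(0, 0.5, x.size)

# Non-linear least squares, starting the iterations from the guess p0
popt, pcov = curve_fit(exp_model, x, y, p0=[1.0, 0.2])
beta_0_hat, beta_1_hat = popt
print(f"beta_0 = {beta_0_hat:.3f}, beta_1 = {beta_1_hat:.3f}")
```

The starting guess p0 is arbitrary here. With less benign data, a poor guess can stop the iterations from converging.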

Examples of Non-linear Models

  1. Exponential Growth/Decay

    $$y = a e^{bx}$$
  2. Logistic Growth (S-curve)

    $$y = \frac{a}{1 + e^{-b(x-c)}}$$
  3. Michaelis–Menten (used in biology)

    $$y = \frac{V_{\max} x}{K_m + x}$$
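The three models above can be written as plain Python functions. The sketch below does that and then fits the Michaelis–Menten model to hypothetical data generated from assumed values $V_{\max} = 10$ and $K_m = 2$. The function names are assumptions for this example and are not part of the original notebook. So is the starting-guess heuristic: the largest observed rate for $V_{\max}$ and the median concentration for $K_m$.

```python
import numpy as np
from scipy.optimize import curve_fit


def exponential(x, a, b):
    return a * np.exp(b * x)


def logistic(x, a, b, c):
    # Upper asymptote a, steepness b, midpoint c (logistic(c) == a / 2)
    return a / (1 + np.exp(-b * (x - c)))


def michaelis_menten(x, v_max, k_m):
    # Rate approaches v_max as x grows; equals v_max / 2 at x == k_m
    return v_max * x / (k_m + x)


# Hypothetical substrate concentrations and noisy rates (v_max = 10, k_m = 2)
rng = np.random.default_rng(1)
s = np.linspace(0.1, 20, 40)
rate = michaelis_menten(s, 10.0, 2.0) + rng.normal(0, 0.2, s.size)

# Rough starting guesses taken from the data itself
p0 = [rate.max(), np.median(s)]
popt, _ = curve_fit(michaelis_menten, s, rate, p0=p0)
v_max_hat, k_m_hat = popt
print(f"V_max = {v_max_hat:.2f}, K_m = {k_m_hat:.2f}")
```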

Steps to Perform Non-linear Regression

  1. Choose the functional form (based on domain knowledge or data pattern).

  2. Provide initial parameter estimates (important for convergence).

  3. Iteratively estimate parameters using optimization methods like:

    • Gauss-Newton algorithm

    • Levenberg–Marquardt algorithm

  4. Check model fit (R², residual plots, etc.).
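To make step 3 concrete, here is a bare-bones Gauss-Newton loop for the exponential model $y = \beta_0 e^{\beta_1 x}$, written with NumPy only. It is a teaching sketch under simple assumptions: synthetic data, a starting guess close to the truth, and no step damping. It is not a robust solver. Levenberg–Marquardt adds damping so that these iterations are more reliable from poor starting points. The helper name gauss_newton_exp is hypothetical.

```python
import numpy as np


def gauss_newton_exp(x, y, beta, n_iter=50, tol=1e-10):
    """Fit y ~ b0 * exp(b1 * x) with undamped Gauss-Newton iterations."""
    beta = np.asarray(beta, dtype=float)
    for _ in range(n_iter):
        b0, b1 = beta
        e = np.exp(b1 * x)
        residuals = y - b0 * e
        # Jacobian of the model with respect to (b0, b1)
        J = np.column_stack([e, b0 * x * e])
        # Linearize the model and solve the least-squares problem J @ step ~ residuals
        step, *_ = np.linalg.lstsq(J, residuals, rcond=None)
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    return beta


# Synthetic data from assumed parameters b0 = 3.0, b1 = 0.4
rng = np.random.default_rng(2)
x = np.linspace(0, 5, 40)
y = 3.0 * np.exp(0.4 * x) + rng.normal(0, 0.1, x.size)

beta_hat = gauss_newton_exp(x, y, beta=[2.5, 0.35])
residuals = y - beta_hat[0] * np.exp(beta_hat[1] * x)   # step 4: inspect these
print("estimates:", beta_hat, " residual std:", residuals.std())
```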


Applications

  • Biological growth curves

  • Population studies

  • Pharmacokinetics

  • Economics (diminishing returns models)


Key Difference from Linear Regression

  • Linear regression: parameters enter the model linearly.

  • Non-linear regression: parameters appear in non-linear form.
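The distinction concerns the parameters, not the shape of the curve. Take a small sketch on synthetic data, with coefficients 1, 2 and -0.5 assumed for illustration. A quadratic in $x$ is still linear in its coefficients, so np.linalg.lstsq solves it in one least-squares step. Unlike a model such as $\beta_0 e^{\beta_1 x}$, it needs no iterations and no starting guesses.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-3, 3, 60)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.1, x.size)

# Design matrix: each column is a fixed function of x (1, x, x^2),
# and the unknown coefficients multiply those columns linearly.
A = np.column_stack([np.ones_like(x), x, x**2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # closed-form least squares
print("intercept, linear, quadratic:", coef)
```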



How to check whether a problem is linear or non-linear?

  • Inspect visually

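  • Calculate the Pearson correlation coefficient between x and y. If its absolute value is high (a common rule of thumb is above 0.7), a linear model may be adequate. A low value does not by itself prove non-linearity, and a strongly non-linear relationship can still have a correlation near zero, because the coefficient only measures linear association.

  • If the data cannot be fitted accurately by a model that is linear in its parameters, switch to non-linear methods for better accuracy.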
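This sketch shows why the correlation rule above is only a heuristic. The Pearson coefficient measures linear association. A relationship that is entirely determined by $x$ but symmetric, like $y = x^2$ on $[-5, 5]$, can therefore have a correlation of about zero. The residuals of a straight-line fit (via np.polyfit) then show a clear U-shaped pattern, which is the kind of cue visual inspection looks for. The data here is synthetic and noiseless.

```python
import numpy as np

x = np.linspace(-5, 5, 101)        # symmetric around 0
y_linear = 2 * x + 3
y_quadratic = x**2                 # fully determined by x, but not linearly

r_linear = np.corrcoef(x, y_linear)[0, 1]
r_quadratic = np.corrcoef(x, y_quadratic)[0, 1]
print(f"r (linear) = {r_linear:.3f}, r (quadratic) = {r_quadratic:.3f}")

# Residuals of a straight-line fit to the quadratic data
slope, intercept = np.polyfit(x, y_quadratic, 1)
residuals = y_quadratic - (slope * x + intercept)
# Positive at both ends and negative in the middle: a U-shape, not random scatter
print(residuals[0], residuals[50], residuals[-1])
```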

How to model data?

  • Convert to a linear model (e.g., by transforming the variables)

  • Polynomial regression

  • Non-linear regression
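For the first option, a model can sometimes be transformed into a linear one. This minimal sketch assumes an exponential relationship $y = a e^{bx}$ with positive $y$ and multiplicative noise; all values are synthetic. Taking logarithms gives $\log y = \log a + b x$. This is linear in $(\log a, b)$, so np.polyfit can fit it.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 4, 30)
# Assumed true values a = 1.5, b = 0.8; multiplicative noise keeps y > 0
y = 1.5 * np.exp(0.8 * x) * np.exp(rng.normal(0, 0.05, x.size))

# log(y) = log(a) + b * x is a straight line in x
b_hat, log_a_hat = np.polyfit(x, np.log(y), 1)   # returns [slope, intercept]
a_hat = np.exp(log_a_hat)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

One caveat about this design: least squares on $\log y$ minimizes relative rather than absolute errors. The estimates can therefore differ from a direct non-linear fit on $y$, especially when the noise is additive.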

Example: Non-linear Regression (Polynomial)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score
# Step 1: Create synthetic non-linear data
np.random.seed(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 0.5 * X**2 + X + 2 + np.random.normal(0, 2, X.shape)
# Step 2: Convert X to polynomial features (degree 2)
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
# Step 3: Fit the model
model = LinearRegression()
model.fit(X_poly, y)
y_pred = model.predict(X_poly)
# Step 4: Evaluate
print("R² Score:", r2_score(y, y_pred))
print("MSE:", mean_squared_error(y, y_pred))
# Step 5: Visualize
plt.scatter(X, y, label='Original Data')
plt.plot(X, y_pred, color='red', label='Polynomial Fit')
plt.xlabel("X")
plt.ylabel("y")
plt.title("Non-linear Regression (Polynomial)")
plt.legend()
plt.show()
R² Score: 0.9900830823328581
MSE: 3.2471947501886613
[Figure: "Non-linear Regression (Polynomial)": original data (scatter) with the polynomial fit in red; axes X, y]

Importing required libraries

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Although linear regression solves many problems well, it is not suitable for every dataset. First, recall how linear regression models a dataset: it models a linear relationship between a dependent variable y and an independent variable x using a simple equation of degree 1, for example $y = 2x + 3$.

x = np.arange(-5.0, 5.0, 0.1)
# You can adjust the slope and intercept to verify the changes in the graph
y = 2*(x) + 3
y_noise = 2 * np.random.normal(size=x.size)
ydata = y + y_noise
#plt.figure(figsize=(8,6))
plt.plot(x, ydata, 'bo')
plt.plot(x, y, 'r')
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()
[Figure: noisy data (blue dots) around the line y = 2x + 3 (red); axes Independent Variable, Dependent Variable]

Non-linear regression models a relationship between independent variables $x$ and a dependent variable $y$ in which the modeled data follows a non-linear function. Essentially, any relationship that is not linear can be termed non-linear, and it is often represented by a polynomial of degree $k$ (the maximum power of $x$). Such a polynomial is non-linear in $x$, although it remains linear in its coefficients.

$$y = a x^3 + b x^2 + c x + d$$

Non-linear functions can have elements like exponentials, logarithms, fractions, and others. For example: $$y = \log(x)$$

Or they can be even more complicated, such as: $$y = \log(a x^3 + b x^2 + c x + d)$$

Let's take a look at a cubic function's graph.

x = np.arange(-5.0, 5.0, 0.1)
# You can adjust the coefficients to verify the changes in the graph
y = 1*(x**3) + 1*(x**2) + 1*x + 3
y_noise = 20 * np.random.normal(size=x.size)
ydata = y + y_noise
plt.plot(x, ydata, 'bo')
plt.plot(x, y, 'r')
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()
[Figure: noisy data (blue dots) around the cubic y = x^3 + x^2 + x + 3 (red); axes Independent Variable, Dependent Variable]

As you can see, this function has $x^3$ and $x^2$ terms in the independent variable. Also, its graph is not a straight line in the 2D plane, so this is a non-linear function.
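Although the curve is non-linear in $x$, the cubic is linear in its coefficients, so ordinary least squares can fit it. This small sketch regenerates data like the cell above, with a fixed random seed added for reproducibility. np.polyfit with degree 3 then estimates $a$, $b$, $c$, $d$, listed from the highest power down.

```python
import numpy as np

rng = np.random.default_rng(5)   # seed added for reproducibility
x = np.arange(-5.0, 5.0, 0.1)
y = 1*(x**3) + 1*(x**2) + 1*x + 3
ydata = y + 20 * rng.normal(size=x.size)

exact = np.polyfit(x, y, 3)         # noiseless data: recovers [1, 1, 1, 3]
estimate = np.polyfit(x, ydata, 3)  # noisy data: close to, but not exactly, [1, 1, 1, 3]
print("noiseless fit:", exact.round(6))
print("noisy fit:    ", estimate.round(2))
```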

Some other types of non-linear functions are:

Quadratic

$$Y = X^2$$
x = np.arange(-5.0, 5.0, 0.1)
# You can adjust the function to verify the changes in the graph
y = np.power(x, 2)
y_noise = 2 * np.random.normal(size=x.size)
ydata = y + y_noise
plt.plot(x, ydata, 'bo')
plt.plot(x, y, 'r')
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()
[Figure: noisy data (blue dots) around Y = X^2 (red); axes Independent Variable, Dependent Variable]

Exponential

An exponential function with base $c$ is defined by $$Y = a + b c^X$$ where $b \neq 0$, $c > 0$, $c \neq 1$, and $X$ is any real number. The base $c$ is constant and the exponent $X$ is a variable.
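Because $c > 0$, the base can be rewritten in terms of $e$, which links this definition to the exponential growth/decay model listed earlier:

```latex
Y = a + b\,c^{X} = a + b\,e^{X \ln c}
```

With $a = 0$ and $k = \ln c$, this becomes $Y = b\,e^{kX}$. It describes growth when $c > 1$ (so $k > 0$) and decay when $0 < c < 1$ (so $k < 0$).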

X = np.arange(-5.0, 5.0, 0.1)
# You can adjust the function to verify the changes in the graph
Y = np.exp(X)
plt.plot(X, Y)
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()
[Figure: Y = e^X; axes Independent Variable, Dependent Variable]

Logarithmic

The response $y$ is the result of applying a logarithmic map from the input $x$ to the output variable $y$. Its simplest form is: $$y = \log(x)$$

Note that instead of $x$ we can use $X$, which can be a polynomial representation of the $x$'s. In general form it would be written as $$y = \log(X)$$

# log(x) is only defined for x > 0, so the range starts above zero
X = np.arange(0.1, 5.0, 0.1)
Y = np.log(X)
plt.plot(X, Y)
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()
[Figure: Y = log(X); axes Independent Variable, Dependent Variable]

Sigmoidal/Logistic

$$Y = a + \frac{b}{1 + c^{(X-d)}}$$
X = np.arange(-5.0, 5.0, 0.1)
# a = 1, b = -4, c = 3, d = 2 in the sigmoidal formula
Y = 1-4/(1+np.power(3, X-2))
plt.plot(X, Y)
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()
[Figure: sigmoidal curve Y = 1 - 4/(1 + 3^(X-2)); axes Independent Variable, Dependent Variable]

Non-Linear Regression example

In this example, we fit a non-linear model to the data points corresponding to Italy's GDP from 1960 to 2014.

import numpy as np
import pandas as pd
df = pd.read_csv("gdp.csv")
df.head(10)

Plotting the Dataset

This is what the data points look like. The curve resembles either a logistic or an exponential function: growth starts off slow, becomes very significant from 2005 onward, and finally decelerates slightly in the 2010s.

plt.figure(figsize=(8,5))
x_data, y_data = (df["Year"].values, df["Value"].values)
plt.plot(x_data, y_data, 'ro')
plt.ylabel('GDP')
plt.xlabel('Year')
plt.show()
[Figure: GDP by year (red dots); axes Year, GDP]

Choosing a model

From an initial look at the plot, we determine that the logistic function could be a good approximation. It starts with slow growth, grows faster in the middle, and then slows down again at the end, as illustrated below:

X = np.arange(-5.0, 5.0, 0.1)
Y = 1.0 / (1.0 + np.exp(-X))
plt.plot(X, Y)
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()
[Figure: logistic curve Y = 1/(1 + e^(-X)); axes Independent Variable, Dependent Variable]

The formula for the logistic function is the following:

$$\hat{Y} = \frac{1}{1 + e^{-\beta_1 (X - \beta_2)}}$$

$\beta_1$: controls the curve's steepness,

$\beta_2$: slides the curve along the x-axis.

Building The Model

Now, let's build our regression model and initialize its parameters.

def sigmoid(x, Beta_1, Beta_2):
    y = 1 / (1 + np.exp(-Beta_1*(x-Beta_2)))
    return y

Let's look at a sample sigmoid curve that might fit the data:

beta_1 = 0.10
beta_2 = 1990.0
# logistic function
Y_pred = sigmoid(x_data, beta_1, beta_2)
# plot initial prediction against datapoints;
# the 0-1 sigmoid is scaled up to the order of magnitude of the GDP values
plt.plot(x_data, Y_pred*15000000000000.)
plt.plot(x_data, y_data, 'ro')
[Figure: initial sigmoid guess (scaled by 1.5e13) plotted against the GDP data points (red dots)]

Our task here is to find the best parameters for our model. Let's first normalize our x and y:

1/300, 200/300, 300/300
(0.0033333333333333335, 0.6666666666666666, 1.0)
# Let's normalize our data
xdata = x_data/max(x_data)
ydata = y_data/max(y_data)

How do we find the best parameters for our fit line?

We can use curve_fit, which uses non-linear least squares to fit our sigmoid function to the data. It finds the parameter values that minimize the sum of the squared residuals of sigmoid(xdata, *popt) - ydata.

popt holds our optimized parameters, and pcov holds their estimated covariance.
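Since gdp.csv is not reproduced here, the sketch below uses hypothetical normalized data shaped like the GDP curve, with true parameters $\beta_1 = 600$ and $\beta_2 = 0.995$ assumed. It shows two curve_fit details. The first is passing a starting guess with p0, which matters for convergence with a steep sigmoid like this one. The second is turning pcov into approximate one-standard-error uncertainties via the square roots of its diagonal.

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid(x, Beta_1, Beta_2):
    return 1 / (1 + np.exp(-Beta_1 * (x - Beta_2)))


# Hypothetical data on a normalized x-axis (years / 2014), not the real GDP values
rng = np.random.default_rng(6)
xdata = np.linspace(1960, 2014, 55) / 2014
ydata = sigmoid(xdata, 600.0, 0.995) + rng.normal(0, 0.01, xdata.size)

# Starting guess close to the expected steepness and midpoint
popt, pcov = curve_fit(sigmoid, xdata, ydata, p0=[500.0, 0.99])
perr = np.sqrt(np.diag(pcov))      # approximate standard errors of the parameters
print("beta_1 = %.1f +/- %.1f" % (popt[0], perr[0]))
print("beta_2 = %.5f +/- %.5f" % (popt[1], perr[1]))
```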

from scipy.optimize import curve_fit
popt, pcov = curve_fit(sigmoid, xdata, ydata)
# print the final parameters
print(" beta_1 = %f, beta_2 = %f" % (popt[0], popt[1]))
beta_1 = 690.451712, beta_2 = 0.997207
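The printed parameters refer to the normalized axis xdata = x_data / max(x_data). This worked conversion assumes max(x_data) is 2014, the last year in the data. Since $\beta_1 (x / x_{\max} - \beta_2) = (\beta_1 / x_{\max})(x - \beta_2 x_{\max})$, the steepness per year is $\beta_1 / x_{\max}$. The midpoint year, where the fitted curve reaches half of max(y_data), is $\beta_2 x_{\max}$.

```python
# Values printed by the curve_fit cell above (fitted on the normalized year axis)
beta_1_norm = 690.451712
beta_2_norm = 0.997207
x_max = 2014.0   # assumption: max(x_data), the last year in gdp.csv

# beta_1 * (x / x_max - beta_2) == (beta_1 / x_max) * (x - beta_2 * x_max)
beta_1_per_year = beta_1_norm / x_max
midpoint_year = beta_2_norm * x_max
print(f"steepness per year = {beta_1_per_year:.4f}, midpoint year = {midpoint_year:.1f}")
# prints: steepness per year = 0.3428, midpoint year = 2008.4
```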

Now we plot our resulting regression model.

x = np.linspace(1960, 2015, 55)
# normalize with the same constant that was used for xdata
x = x/max(x_data)
plt.figure(figsize=(8,5))
y = sigmoid(x, *popt)
plt.plot(xdata, ydata, 'ro', label='data')
plt.plot(x, y, linewidth=3.0, label='fit')
plt.legend(loc='best')
plt.ylabel('GDP')
plt.xlabel('Year')
plt.show()
[Figure: normalized GDP data (red dots) with the fitted sigmoid curve; axes Year, GDP]

Practice

Can you calculate the accuracy of our model?

# write your code here
# split data into train/test
msk = np.random.rand(len(df)) < 0.7
train_x = xdata[msk]
test_x = xdata[~msk]
train_y = ydata[msk]
test_y = ydata[~msk]
# build the model using train set
popt, pcov = curve_fit(sigmoid, train_x, train_y)
# predict using test set
y_pred = sigmoid(test_x, *popt)
# evaluation
print("Mean absolute error: %.2f" % np.mean(np.absolute(y_pred - test_y)))
print("Mean squared error (MSE): %.2f" % np.mean((y_pred - test_y) ** 2))
Mean absolute error: 0.02
Mean squared error (MSE): 0.00
from sklearn.metrics import r2_score
# r2_score expects (y_true, y_pred)
print("R2-score: %.2f" % r2_score(test_y, y_pred))

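R² should be interpreted with caution for non-linear models. R² compares the variance explained by the model with the total variance. The decomposition that makes it a proportion between 0 and 100% holds for linear least-squares models with an intercept, not for non-linear models in general. For a non-linear model, R² computed as $1 - SS_{res}/SS_{tot}$ can even be negative, so on its own it is not a reliable measure of fit quality.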
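The sketch below makes concrete that R² is not confined to 0–100%. It computes R² by hand as $1 - SS_{res}/SS_{tot}$ and checks it against sklearn.metrics.r2_score on a few made-up numbers. A prediction that does worse than simply predicting the mean gives a negative value. The sketch also shows that r2_score is not symmetric in its arguments, so the true values must come first.

```python
import numpy as np
from sklearn.metrics import r2_score


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot


y_true = np.array([1.0, 2.0, 3.0, 4.0])
good = np.array([1.1, 1.9, 3.2, 3.9])
bad = np.array([4.0, 3.0, 2.0, 1.0])     # worse than always predicting the mean

print(r_squared(y_true, good), r2_score(y_true, good))   # about 0.986
print(r_squared(y_true, bad), r2_score(y_true, bad))     # -3.0: below 0%
print(r2_score(good, y_true))                            # arguments swapped: a different value
```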

When to Use Non-linear Regression:

  • The residuals (errors) of linear regression are not randomly distributed.

  • The relationship between variables is curved or follows known functional forms.

  • You want to model growth, decay, or saturation (e.g., biological systems, marketing funnels, etc.).

| Aspect | Linear Regression | Non-linear Regression |
|---|---|---|
| Equation | Linear | Polynomial, exponential, etc. |
| Visualization | Straight line | Curved line or complex shape |
| Python method | LinearRegression() (scikit-learn) | LinearRegression() + PolynomialFeatures (scikit-learn), or curve_fit (SciPy) |
| Complexity | Low | Medium to high |