suyashi29
GitHub Repository: suyashi29/python-su
Path: blob/master/ML Regression Analysis/ R² (R-squared) in Regression Analysis.ipynb
Kernel: Python 3 (ipykernel)

This notebook shows what R² (the coefficient of determination) means in regression: specifically, how much of the variance in the data is captured by the regression line.

# Importing Required Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

Explanation:

  • numpy: for generating and manipulating numeric arrays.

  • matplotlib.pyplot: for plotting.

  • LinearRegression: to create a regression model.

  • r2_score: to calculate the R² value.

# Generating Synthetic Dataset
np.random.seed(1)
X = 2 * np.random.rand(50, 1)
y = 5 + 2 * X + np.random.randn(50, 1)

This simulates a linear relationship:

y=5+2X+noise

  • np.random.randn adds random noise to mimic real-world imperfections.

  • X is 2D, with shape (50, 1), because scikit-learn expects features as an array of shape (n_samples, n_features).

## Fitting the Linear Regression Model
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)

  • We fit the regression line to the data using LinearRegression.

  • y_pred contains the values the fitted model predicts for each X.
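
As a quick sanity check on the fit, the estimated intercept and slope should land near the true values 5 and 2 used to generate the data. The sketch below uses NumPy's `np.polyfit` as a stand-in for `LinearRegression` (an assumption made to keep the example light on dependencies; for a single feature both compute the ordinary least-squares line). The names `slope` and `intercept` are illustrative and do not appear in the notebook.

```python
import numpy as np

# Recreate the synthetic data with the same seed as above.
np.random.seed(1)
X = 2 * np.random.rand(50, 1)
y = 5 + 2 * X + np.random.randn(50, 1)

# np.polyfit expects 1-D inputs; degree 1 fits a straight line and
# returns coefficients from highest power to lowest: [slope, intercept].
slope, intercept = np.polyfit(X.ravel(), y.ravel(), deg=1)

print(f"estimated slope:     {slope:.3f} (true value 2)")
print(f"estimated intercept: {intercept:.3f} (true value 5)")
```

With the fitted scikit-learn model, the same quantities are available as `model.coef_` and `model.intercept_`. The estimates will not match 2 and 5 exactly because of the added noise.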

## Flattening the Arrays for Plotting
y = y.ravel()
y_pred = y_pred.ravel()

  • y and y_pred are originally in shape (50, 1) → we flatten them to (50,) to avoid shape mismatch in plotting.
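
Shape mismatches are easy to miss because NumPy broadcasting can combine a `(50, 1)` array with a `(50,)` array without raising an error. A minimal sketch with placeholder arrays (the names `col` and `flat` are illustrative, not from the notebook):

```python
import numpy as np

col = np.arange(50, dtype=float).reshape(50, 1)  # column vector, shaped like the model output
flat = col.ravel()                               # the same values as a 1-D array

print(col.shape)   # (50, 1)
print(flat.shape)  # (50,)

# Mixing the two shapes broadcasts to a 50x50 matrix instead of 50 differences.
print((col - flat).shape)          # (50, 50)

# After flattening both sides, the subtraction is element-wise.
print((col.ravel() - flat).shape)  # (50,)
```

Flattening both arrays up front keeps later arithmetic and plotting calls working on plain 1-D data.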

### Mean of y and R² Calculation
y_mean = np.mean(y)
r2 = r2_score(y, y_pred)

y_mean is used to calculate total variance in the actual data (SS_tot).

r2_score() computes the R² value:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$
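
To make the calculation concrete, R² can be worked out by hand as 1 − SS_res / SS_tot. The following is a minimal pure-Python sketch on four made-up points; the helper `r_squared` and the data are hypothetical, for illustration only, and the sketch assumes SS_tot is non-zero.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for two equal-length sequences."""
    y_mean = sum(y_true) / len(y_true)
    # Total sum of squares: squared distances from each point to the mean.
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)
    # Residual sum of squares: squared distances from each point to its prediction.
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.3, 6.9, 9.0]

# SS_tot = 20 and SS_res = 0.14, so R² = 1 - 0.14 / 20.
print(round(r_squared(y_true, y_pred), 3))  # 0.993

# Predicting the mean for every point explains none of the variance.
print(r_squared(y_true, [6.0] * 4))         # 0.0
```

The second call shows the baseline that R² is measured against: a model that always predicts the mean of y scores exactly 0.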

## Plotting the Data, Regression Line, and Mean
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='blue', label='Actual Points')
plt.plot(X, y_pred, color='red', label='Regression Line', linewidth=2)
plt.axhline(y_mean, color='green', linestyle='--', label='Mean of y')
[Figure: scatter plot of the actual points (blue) with the regression line (red) and the mean of y (green dashed line)]

  • scatter(): plots actual data points.

  • plot(): shows the regression line.

  • axhline(): shows the mean of actual y values (used in $SS_{tot}$).

For each data point:

  • Green dotted line: from actual y to mean → part of $SS_{tot}$

  • Black dashed line: from actual y to predicted y → part of $SS_{res}$

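  • These lines visually explain what R² measures:

R² = the share of the total squared green-line lengths (total variance, $SS_{tot}$) that the model explains. The squared black-line lengths ($SS_{res}$) are the part left unexplained, so shorter black lines mean a better fit.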

  • Adds labels, title, grid, legend.

  • Displays the R² score in the title for clarity.

| Component | Meaning | Line Color |
|---|---|---|
| SS_tot | Total variance in actual y | Green (to mean) |
| SS_res | Variance not explained by model | Black (residual) |
| R² Score | $1 - \frac{SS_{res}}{SS_{tot}}$; higher = better fit | |
| Regression Line | Best fit line (predictions) | Red |

plt.figure(figsize=(10, 6))

# Flatten everything to 1-D so every segment endpoint is a scalar.
# If y is still shape (50, 1), y[i] is a length-1 array and [y[i], y_mean]
# raises "ValueError: setting an array element with a sequence" in plt.plot.
x_flat = X.ravel()
y = np.ravel(y)
y_pred = np.ravel(y_pred)

# Actual points
plt.scatter(x_flat, y, color='blue', label='Actual Points')

# Regression line
plt.plot(x_flat, y_pred, color='red', label='Regression Line', linewidth=2)

# Mean of y (for SS_tot)
plt.axhline(y_mean, color='green', linestyle='--', label='Mean of y')

# Visual SS_tot and SS_res lines
for i in range(len(x_flat)):
    plt.plot([x_flat[i], x_flat[i]], [y[i], y_mean], color='green', linestyle='dotted', alpha=0.3)
    plt.plot([x_flat[i], x_flat[i]], [y[i], y_pred[i]], color='black', linestyle='dashed', alpha=0.4)

# Title with the R² score, axis labels, legend, and grid
plt.title(f"Visualizing R²: {r2:.3f} (SS_tot vs SS_res)")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

[Figure: the same plot with dotted green segments from each point to the mean (SS_tot) and dashed black segments from each point to the regression line (SS_res); the title shows the R² score]