Path: blob/master/ML Regression Analysis/ R² (R-squared) in Regression Analysis.ipynb
What R² (the coefficient of determination) means in regression: specifically, how much of the variance in the data is captured by the regression line.
Explanation:
numpy: for generating and manipulating numeric arrays.
matplotlib.pyplot: for plotting.
LinearRegression: to create a regression model.
r2_score: to calculate the R² value.
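For reference, the four imports listed above would typically be written as follows. This is a minimal sketch: the module paths are the usual ones for scikit-learn and matplotlib, and the original code cell is not shown here.

```python
import numpy as np                                  # numeric arrays
import matplotlib.pyplot as plt                     # plotting
from sklearn.linear_model import LinearRegression   # regression model
from sklearn.metrics import r2_score                # R² metric
```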
This simulates a linear relationship:
y = 5 + 2X + noise
np.random.randn adds random noise to mimic real-world imperfections.
X is 2D, with shape (n_samples, n_features), because scikit-learn estimators expect the features in that layout.
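A minimal sketch of how such data could be simulated. Only the relationship y = 5 + 2X + noise and the sample count of 50 come from the text; the seed, the X range of 0 to 10, and the unit noise scale are assumptions chosen for illustration.

```python
import numpy as np

# Assumed for illustration: seed, X range and noise scale are not given in the text.
np.random.seed(0)
n_samples = 50

X = 10 * np.random.rand(n_samples, 1)    # shape (50, 1): one feature column
noise = np.random.randn(n_samples, 1)    # standard normal noise, same shape as X
y = 5 + 2 * X + noise                    # shape (50, 1)

print(X.shape, y.shape)
```

Building X directly as a column keeps it in the layout scikit-learn expects, so no reshape is needed before fitting.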
We fit the regression line to the data using LinearRegression.
y_pred contains the values the fitted model predicts for X.
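The fit and predict steps can be sketched as below. The data setup repeats the assumed seed, X range, and noise scale so that the block runs on its own; it is not necessarily the notebook's exact cell.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed data setup (seed, X range and noise scale are illustrative).
np.random.seed(0)
X = 10 * np.random.rand(50, 1)
y = 5 + 2 * X + np.random.randn(50, 1)

model = LinearRegression()
model.fit(X, y)              # estimates the intercept and slope by least squares
y_pred = model.predict(X)    # shape (50, 1), because y was passed as a 2D array

slope = float(model.coef_.ravel()[0])
intercept = float(np.ravel(model.intercept_)[0])
print("intercept:", intercept, "slope:", slope)
```

With noise of this size, the estimates should land near the true values of 5 and 2 without matching them exactly.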
y and y_pred originally have shape (50, 1); we flatten them to (50,) to avoid a shape mismatch when plotting.
y_mean is the mean of the actual y values; it is the reference point for the total variation in the data (SS_tot).
r2_score() computes the R² value: R² = 1 − SS_res / SS_tot.
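To make the link between y_mean, SS_tot, SS_res and r2_score() concrete, here is a small sketch that computes R² by hand and compares it with scikit-learn's result. The variable names ss_tot, ss_res and r2_manual are introduced here for illustration, and the data setup uses the same assumed seed and ranges as before.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Assumed data setup (seed, X range and noise scale are illustrative).
np.random.seed(0)
X = 10 * np.random.rand(50, 1)
y = 5 + 2 * X + np.random.randn(50, 1)
y_pred = LinearRegression().fit(X, y).predict(X)

# Flatten (50, 1) -> (50,) so element-wise arithmetic and plotting stay 1D.
y_flat = y.flatten()
y_pred_flat = y_pred.flatten()
y_mean = y_flat.mean()

ss_tot = np.sum((y_flat - y_mean) ** 2)        # total sum of squares
ss_res = np.sum((y_flat - y_pred_flat) ** 2)   # residual sum of squares
r2_manual = 1 - ss_res / ss_tot

r2 = r2_score(y_flat, y_pred_flat)             # argument order: (y_true, y_pred)
print(r2_manual, r2)
```

The two values should agree up to floating-point rounding, which is a quick sanity check that the formula and the library call describe the same quantity.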
scatter(): plots actual data points.
plot(): shows the regression line.
axhline(): shows the mean of the actual y values (used in SS_tot).
For each data point:
Green dotted line: from actual y to mean → part of SS_tot
Black dashed line: from actual y to predicted y → part of SS_res
These lines visually explain what R² measures:
R² is the share of the total variation (the squared lengths of the green lines) that the model explains; the shorter the black lines, the better the fit.
Adds labels, title, grid, legend.
Displays the R² score in the title for clarity.

| Component | Meaning | Line Color / Note |
|---|---|---|
| SS_tot | Total variance in actual y | Green (to mean) |
| SS_res | Variance not explained by model | Black (residual) |
| R² Score | 1 − SS_res / SS_tot | Higher = better fit |
| Regression Line | Best fit line (predictions) | Red |
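The plotting steps described above can be sketched as one self-contained block. The colors for the regression line, SS_tot and SS_res segments follow the table; the blue scatter color, the label strings, the figure size, the data setup, and the `Agg` backend line are assumptions made so that the sketch also runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line inside a notebook
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Assumed data setup (seed, X range and noise scale are illustrative).
np.random.seed(0)
X = 10 * np.random.rand(50, 1)
y = 5 + 2 * X + np.random.randn(50, 1)
y_pred = LinearRegression().fit(X, y).predict(X)

# Work with 1D arrays everywhere in the plot to avoid ragged inputs.
x = X.ravel()
y_flat = y.ravel()
y_pred_flat = y_pred.ravel()
y_mean = y_flat.mean()
r2 = r2_score(y_flat, y_pred_flat)

plt.figure(figsize=(8, 5))
plt.scatter(x, y_flat, color="blue", label="Actual data")
order = np.argsort(x)  # draw the line from left to right
plt.plot(x[order], y_pred_flat[order], color="red", label="Regression line")
plt.axhline(y_mean, color="green", linestyle="dotted", label="Mean of y")

for i in range(len(x)):
    # Label only the first segment of each kind so the legend has one entry per kind.
    tot_label = "SS_tot part" if i == 0 else "_nolegend_"
    res_label = "SS_res part" if i == 0 else "_nolegend_"
    plt.plot([x[i], x[i]], [y_flat[i], y_mean],
             color="green", linestyle="dotted", alpha=0.3, label=tot_label)
    plt.plot([x[i], x[i]], [y_flat[i], y_pred_flat[i]],
             color="black", linestyle="dashed", alpha=0.4, label=res_label)

plt.xlabel("X")
plt.ylabel("y")
plt.title(f"R² = {r2:.3f}")
plt.grid(True)
plt.legend()
# In a script, call plt.show() here; notebooks render the figure automatically.
```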
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[7], line 14
12 # Visual SS_tot and SS_res lines
13 for i in range(len(X)):
---> 14 plt.plot([X[i], X[i]], [y[i], y_mean], color='green', linestyle='dotted', alpha=0.3)
15 plt.plot([X[i], X[i]], [y[i], y_pred[i]], color='black', linestyle='dashed', alpha=0.4)
17 # Add legend — now will work since all plots have labels
, in plot(scalex, scaley, data, *args, **kwargs)
2810 @_copy_docstring_and_deprecators(Axes.plot)
2811 def plot(*args, scalex=True, scaley=True, data=None, **kwargs):
-> 2812 return gca().plot(
2813 *args, scalex=scalex, scaley=scaley,
2814 **({"data": data} if data is not None else {}), **kwargs)
, in Axes.plot(self, scalex, scaley, data, *args, **kwargs)
1445 """
1446 Plot y versus x as lines and/or markers.
1447
(...)
1685 (``'green'``) or hex strings (``'#008000'``).
1686 """
1687 kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D)
-> 1688 lines = [*self._get_lines(*args, data=data, **kwargs)]
1689 for line in lines:
1690 self.add_line(line)
, in _process_plot_var_args.__call__(self, data, *args, **kwargs)
309 this += args[0],
310 args = args[1:]
--> 311 yield from self._plot_args(
312 this, kwargs, ambiguous_fmt_datakey=ambiguous_fmt_datakey)
, in _process_plot_var_args._plot_args(self, tup, kwargs, return_kwargs, ambiguous_fmt_datakey)
492 if len(xy) == 2:
493 x = _check_1d(xy[0])
--> 494 y = _check_1d(xy[1])
495 else:
496 x, y = index_of(xy[-1])
, in _check_1d(x)
1347 # plot requires `shape` and `ndim`. If passed an
1348 # object that doesn't provide them, then force to numpy array.
1349 # Note this will strip unit information.
1350 if (not hasattr(x, 'shape') or
1351 not hasattr(x, 'ndim') or
1352 len(x.shape) < 1):
-> 1353 return np.atleast_1d(x)
1354 else:
1355 return x
, in atleast_1d(*args, **kwargs)
, in atleast_1d(*arys)
63 res = []
64 for ary in arys:
---> 65 ary = asanyarray(ary)
66 if ary.ndim == 0:
67 result = ary.reshape(1)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
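A likely reading of this traceback, offered as an assumption because the failing cell's full state is not shown: y was still 2D with shape (50, 1) when the loop ran, so `y[i]` was a length-1 array while `y_mean` was a scalar, and `[y[i], y_mean]` could not be converted into a regular array. The sketch below reproduces that situation on a tiny hypothetical array and shows that flattening first resolves it.

```python
import numpy as np

# Tiny stand-in for the un-flattened y from the notebook (values are made up).
y = np.array([[7.0], [9.0], [11.0]])   # shape (3, 1)
y_mean = y.mean()                      # scalar

# y[0] has shape (1,) and y_mean is a scalar, so the pair is ragged.
try:
    np.asanyarray([y[0], y_mean])
    # Older NumPy releases only warn here and build an object array instead.
    print("no exception raised on this NumPy version")
except ValueError as err:
    print("ValueError:", err)

# Fix: flatten first, so y_flat[0] is a scalar just like y_mean.
y_flat = y.ravel()                     # shape (3,)
segment = np.asanyarray([y_flat[0], y_mean])
print(segment.shape, segment)
```

Under that reading, using the flattened arrays inside the loop (for example `[y_flat[i], y_mean]`, or equivalently `y[i, 0]`) gives `plt.plot` two plain numbers per segment.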