Machine Learning with PyTorch and Scikit-Learn
-- Code Examples
Package version checks
Add the folder to the path so that the check_packages.py script can be loaded:
In [1]:
Check recommended package versions:
In [2]:
Out[2]:
[OK] Your Python version is 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:24:02)
[Clang 11.1.0 ]
[OK] numpy 1.22.1
[OK] mlxtend 0.19.0
[OK] matplotlib 3.5.1
[OK] sklearn 1.0.2
[OK] pandas 1.4.0
Chapter 09 - Predicting Continuous Target Variables with Regression Analysis
Overview
In [3]:
Introducing linear regression
Simple linear regression
In [4]:
Out[4]:
Multiple linear regression
In [5]:
Out[5]:
Exploring the Ames Housing dataset
Loading the Ames Housing dataset into a data frame
Dataset source: http://jse.amstat.org/v19n3/decock/AmesHousing.txt
Dataset documentation: http://jse.amstat.org/v19n3/decock/DataDocumentation.txt
Dataset write-up: http://jse.amstat.org/v19n3/decock.pdf
'Overall Qual': Rates the overall material and finish of the house
'Overall Cond': Rates the overall condition of the house
'Gr Liv Area': Above grade (ground) living area square feet
'Central Air': Central air conditioning
'Total Bsmt SF': Total square feet of basement area
'SalePrice': Sale price $$
In [6]:
Out[6]:
In [7]:
Out[7]:
(2930, 6)
In [8]:
In [9]:
Out[9]:
Overall Qual 0
Overall Cond 0
Total Bsmt SF 1
Central Air 0
Gr Liv Area 0
SalePrice 0
dtype: int64
In [10]:
Out[10]:
Overall Qual 0
Overall Cond 0
Total Bsmt SF 0
Central Air 0
Gr Liv Area 0
SalePrice 0
dtype: int64
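The two outputs above show the per-column missing-value counts before and after cleanup: one missing entry in 'Total Bsmt SF', then none. The following is a minimal sketch of such a step on a small made-up frame, not the Ames data. It assumes that rows with missing entries are simply dropped; imputation would be another option. The name `toy_df` and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Ames data frame; the values are made up for illustration.
toy_df = pd.DataFrame({
    'Gr Liv Area': [1656, 896, 1329, 2110],
    'Total Bsmt SF': [1080.0, np.nan, 1329.0, 2110.0],
    'SalePrice': [215000, 105000, 172000, 244000],
})

# Count the missing entries per column.
missing_before = toy_df.isnull().sum()

# Drop every row that contains at least one missing value (assumed strategy).
clean_df = toy_df.dropna(axis=0)
missing_after = clean_df.isnull().sum()

print(missing_before)
print(missing_after)
```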
Visualizing the important characteristics of a dataset
In [11]:
In [12]:
Out[12]:
In [13]:
Out[13]:
Implementing an ordinary least squares linear regression model
...
Solving regression for regression parameters with gradient descent
In [14]:
In [15]:
In [16]:
In [17]:
Out[17]:
<__main__.LinearRegressionGD at 0x13a60faf0>
In [18]:
Out[18]:
In [19]:
In [20]:
Out[20]:
In [21]:
Out[21]:
Sale price: $292507.07
In [22]:
Out[22]:
Slope: 0.707
Intercept: -0.000
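The `LinearRegressionGD` class named in the output above is not reproduced in this export. The following is a minimal sketch of a full-batch gradient descent regressor of that kind, not necessarily the notebook's exact code. The parameter names `eta` and `n_iter`, the attribute names, and the synthetic data are assumptions. The sketch standardizes both variables, which the near-zero intercept above suggests was also done for the Ames data; in that setting the fitted slope equals the Pearson correlation between feature and target.

```python
import numpy as np


class LinearRegressionGD:
    """Linear regression fitted with full-batch gradient descent (sketch)."""

    def __init__(self, eta=0.1, n_iter=200):
        self.eta = eta        # learning rate (assumed name)
        self.n_iter = n_iter  # number of passes over the training data

    def fit(self, X, y):
        self.w_ = np.zeros(X.shape[1])
        self.b_ = 0.0
        self.losses_ = []
        for _ in range(self.n_iter):
            errors = y - self.predict(X)
            # Record the mean squared error before this update.
            self.losses_.append((errors ** 2).mean())
            # Step in the direction of the negative MSE gradient.
            self.w_ += self.eta * 2.0 * X.T.dot(errors) / X.shape[0]
            self.b_ += self.eta * 2.0 * errors.mean()
        return self

    def predict(self, X):
        return X.dot(self.w_) + self.b_


# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + rng.normal(scale=2.0, size=200)

# Standardize the feature and the target to zero mean and unit variance.
X_std = ((x - x.mean()) / x.std()).reshape(-1, 1)
y_std = (y - y.mean()) / y.std()

gd = LinearRegressionGD(eta=0.1, n_iter=200).fit(X_std, y_std)
print(f'Slope: {gd.w_[0]:.3f}')
print(f'Intercept: {gd.b_:.3f}')
```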
Estimating the coefficient of a regression model via scikit-learn
In [23]:
In [24]:
Out[24]:
Slope: 111.666
Intercept: 13342.979
In [25]:
Out[25]:
Normal Equations alternative:
In [26]:
Out[26]:
Slope: 111.666
Intercept: 13342.979
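The closed-form solution gives the same slope and intercept as the scikit-learn fit above. As a sketch of the computation, the weights satisfy $(X^T X) w = X^T y$, where $X$ has a leading column of ones for the intercept. The data below are synthetic and the variable names are assumptions; `np.linalg.solve` is used here instead of an explicit matrix inverse.

```python
import numpy as np

# Synthetic single-feature data, assumed for illustration only.
rng = np.random.default_rng(1)
X = rng.uniform(500, 3000, size=(100, 1))
y = 110.0 * X[:, 0] + 13000.0 + rng.normal(scale=5000.0, size=100)

# Prepend a column of ones so that the first weight acts as the intercept.
Xb = np.hstack((np.ones((X.shape[0], 1)), X))

# Solve the normal equations (Xb^T Xb) w = Xb^T y for w.
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

print(f'Slope: {w[1]:.3f}')
print(f'Intercept: {w[0]:.3f}')
```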
Fitting a robust regression model using RANSAC
In [27]:
Out[27]:
In [28]:
Out[28]:
Slope: 106.348
Intercept: 20190.093
In [38]:
Out[38]:
37000.0
In [39]:
Out[39]:
In [31]:
Out[31]:
Slope: 105.631
Intercept: 18314.587
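The RANSAC slopes above (106.348 and 105.631) differ from the ordinary least squares slope of 111.666 because RANSAC refits the final model on the detected inliers only. The following sketch shows the idea on synthetic data with deliberately corrupted targets. The hyperparameter values (`max_trials`, `min_samples`, `residual_threshold`, `random_state`) are illustrative assumptions, not the notebook's settings. The base estimator is passed positionally because its keyword name changed between scikit-learn releases.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

# Synthetic data: a clean line y = 2x + 1 with small noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.2, size=200)

# Corrupt the first 20 targets so that they act as outliers.
y[:20] += 50.0

ransac = RANSACRegressor(
    LinearRegression(),      # base estimator fitted on each random subset
    max_trials=100,          # maximum number of random subsets to try
    min_samples=2,           # points drawn per subset
    residual_threshold=1.0,  # maximum absolute residual for an inlier
    random_state=123,
)
ransac.fit(X, y)

inlier_mask = ransac.inlier_mask_
slope = ransac.estimator_.coef_[0]
intercept = ransac.estimator_.intercept_

print(f'Slope: {slope:.3f}')
print(f'Intercept: {intercept:.3f}')
print(f'Inliers: {inlier_mask.sum()} of {len(y)}')
```

The `residual_threshold` is problem-specific: too small a value discards valid points, while too large a value lets outliers back into the fit.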
Evaluating the performance of linear regression models
In [32]:
In [33]:
In [34]:
Out[34]:
In [35]:
Out[35]:
MSE train: 1497216245.85
MSE test: 1516565821.00
In [36]:
Out[36]:
MAE train: 25983.03
MAE test: 24921.29
In [37]:
Out[37]:
R^2 train: 0.77
R^2 test: 0.75
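The MSE, MAE, and R^2 values above compare training and test performance. A sketch of how these three metrics can be computed with scikit-learn follows. The synthetic data, the 70/30 split, and the `random_state` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data with three features, assumed for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123)

model = LinearRegression().fit(X_train, y_train)
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

# Mean squared error: average of the squared residuals.
mse_train = mean_squared_error(y_train, y_train_pred)
mse_test = mean_squared_error(y_test, y_test_pred)

# Mean absolute error: average of the absolute residuals.
mae_train = mean_absolute_error(y_train, y_train_pred)
mae_test = mean_absolute_error(y_test, y_test_pred)

# Coefficient of determination: 1 - SSE / SST.
r2_train = r2_score(y_train, y_train_pred)
r2_test = r2_score(y_test, y_test_pred)

print(f'MSE train: {mse_train:.2f}')
print(f'MSE test: {mse_test:.2f}')
print(f'MAE train: {mae_train:.2f}')
print(f'MAE test: {mae_test:.2f}')
print(f'R^2 train: {r2_train:.2f}')
print(f'R^2 test: {r2_test:.2f}')
```

The MAE is in the units of the target, which makes it easier to interpret than the MSE; the R^2 is unit-free.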
Using regularized methods for regression
In [38]:
Out[38]:
[26251.38276394 804.70816337 41.94651964 11364.80761309
55.67855548]
In [39]:
Out[39]:
MSE train: 1497216262.014, test: 1516576825.348
R^2 train: 0.769, test: 0.752
Ridge regression:
In [40]:
LASSO regression:
In [41]:
Elastic Net regression:
In [42]:
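The three cells above initialize the regularized estimators. As a sketch, the scikit-learn classes `Ridge`, `Lasso`, and `ElasticNet` can be set up as follows. The `alpha` and `l1_ratio` values and the synthetic data are illustrative assumptions, not the notebook's settings.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

# Synthetic data in which only the first two of five features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

ols = LinearRegression().fit(X, y)                      # no penalty
ridge = Ridge(alpha=10.0).fit(X, y)                     # L2 penalty
lasso = Lasso(alpha=0.5).fit(X, y)                      # L1 penalty
elanet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)  # mix of L1 and L2

for name, est in [('OLS', ols), ('Ridge', ridge),
                  ('LASSO', lasso), ('Elastic Net', elanet)]:
    print(name, np.round(est.coef_, 3))
```

The L2 penalty shrinks all coefficients toward zero, while the L1 penalty can set some of them exactly to zero, which acts as a form of feature selection.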
Turning a linear regression model into a curve - polynomial regression
In [43]:
In [44]:
In [45]:
Out[45]:
In [46]:
In [47]:
Out[47]:
Training MSE linear: 569.780, quadratic: 61.330
Training R^2 linear: 0.832, quadratic: 0.982
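The output above shows the quadratic fit reducing the training MSE and raising the training R^2 relative to the linear fit. A sketch of the same comparison on synthetic data follows, using `PolynomialFeatures` to add the squared term. The data and variable names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data with a quadratic trend, assumed for illustration only.
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + 2.0 + rng.normal(scale=1.0, size=50)

# Plain linear fit.
lr = LinearRegression().fit(X, y)

# Expand x into the columns [1, x, x^2], then fit a linear model on them.
quadratic = PolynomialFeatures(degree=2)
X_quad = quadratic.fit_transform(X)
pr = LinearRegression().fit(X_quad, y)

mse_lin = mean_squared_error(y, lr.predict(X))
mse_quad = mean_squared_error(y, pr.predict(X_quad))
r2_lin = r2_score(y, lr.predict(X))
r2_quad = r2_score(y, pr.predict(X_quad))

print(f'Training MSE linear: {mse_lin:.3f}, quadratic: {mse_quad:.3f}')
print(f'Training R^2 linear: {r2_lin:.3f}, quadratic: {r2_quad:.3f}')
```

The model remains linear in its coefficients; only the feature matrix changes. A lower training error for the higher-degree model is expected and does not by itself indicate better generalization.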
Modeling nonlinear relationships in the Ames Housing dataset
In [48]:
Out[48]:
In [49]:
Out[49]:
Dealing with nonlinear relationships using random forests
...
Decision tree regression
In [50]:
Out[50]:
In [51]:
Out[51]:
0.5144569334885711
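A regression tree predicts a piecewise-constant function: each leaf returns the mean target of the training examples that fall into it. The following sketch illustrates this on synthetic data. The `max_depth` value and the data are assumptions, not the notebook's settings.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic nonlinear data, assumed for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# A depth-3 tree has at most 2**3 = 8 leaves.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
y_pred = tree.predict(X)

# Number of distinct predicted values, one per leaf that is reached.
n_levels = len(np.unique(y_pred))
tree_r2 = r2_score(y, y_pred)

print(f'Distinct predictions: {n_levels}')
print(f'R^2 train: {tree_r2:.3f}')
```

A shallow tree underfits smooth trends, while a deep tree can memorize noise, so the depth usually needs tuning.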
Random forest regression
In [52]:
In [53]:
Out[53]:
MAE train: 8305.18
MAE test: 20821.77
R^2 train: 0.98
R^2 test: 0.85
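The gap between the training and test scores above (R^2 of 0.98 versus 0.85) indicates that the forest fits the training data more closely than it generalizes. A sketch of the same train/test comparison on synthetic data follows. The `n_estimators` value, the `random_state` values, and the data are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic nonlinear data with two features, assumed for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.3, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123)

# Average the predictions of 100 trees, each grown on a bootstrap sample.
forest = RandomForestRegressor(n_estimators=100, random_state=1)
forest.fit(X_train, y_train)

y_train_pred = forest.predict(X_train)
y_test_pred = forest.predict(X_test)

mae_train = mean_absolute_error(y_train, y_train_pred)
mae_test = mean_absolute_error(y_test, y_test_pred)
r2_train = r2_score(y_train, y_train_pred)
r2_test = r2_score(y_test, y_test_pred)

print(f'MAE train: {mae_train:.2f}')
print(f'MAE test: {mae_test:.2f}')
print(f'R^2 train: {r2_train:.2f}')
print(f'R^2 test: {r2_test:.2f}')
```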
In [54]:
Out[54]:
Summary
...