Regularized Methods
Feature Scaling
Test/Train split
Ridge, LASSO, and Elastic Net regression methods
In an ordinary linear regression scenario, we start with a linear function of the input features:

$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$

The mean squared error of these predictions is given by:

$$\text{MSE}(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2$$
From this basic formulation, we can introduce regularized methods that add a regularization term to the cost function. We will look at three methods that offer slight variations on this term.
Feature Scaling
To use these methods, we want to scale our data first. Many machine learning algorithms do not perform well when features operate on very different scales. The MinMaxScaler normalizes the data, bringing every value into the range 0 to 1. The StandardScaler instead standardizes each feature to zero mean and unit variance, which makes it less sensitive to extreme values. We will use both on our Ames housing data. To begin, we need to select the numeric columns from the DataFrame so that we transform only those.
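The notebook's code cell is not shown here; below is a minimal sketch of that selection step, assuming the data is loaded with pandas (the file name is a placeholder).

```python
import pandas as pd

# Load the Ames housing data (file name is a placeholder -- adjust to your copy)
housing = pd.read_csv("ames_housing.csv")

# Keep only the numeric columns so the scalers receive purely numeric input
numeric = housing.select_dtypes(include="number")
print(numeric.columns)
```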
Using the Scaler on a DataFrame
Below, we can compare the results of the two scaling transformations by passing a list of column names to the scaler. Note the standard pattern: initialize the object, fit it, and then transform.
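A sketch of that pattern with both scalers; the column names here are assumptions standing in for whichever numeric Ames columns you choose.

```python
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical subset of numeric columns -- substitute your own
cols = ["Gr Liv Area", "Lot Area", "Year Built"]

# Initialize, fit, and transform with each scaler
minmax = MinMaxScaler()
minmax_scaled = minmax.fit_transform(housing[cols])

standard = StandardScaler()
standard_scaled = standard.fit_transform(housing[cols])

# Compare the first few rows of each transformation
print(minmax_scaled[:5])
print(standard_scaled[:5])
```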
Fit a Linear Model on Scaled Data
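The original code cell is omitted; here is a minimal sketch, assuming the scaled array from above and assuming the target column is named SalePrice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Feature: scaled above-ground living area (first column of the scaled array);
# target: logarithm of the sale price (column name is an assumption)
X = standard_scaled[:, [0]]
y = np.log(housing["SalePrice"])

lin_reg = LinearRegression()
lin_reg.fit(X, y)
print(lin_reg.coef_, lin_reg.intercept_)
```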
Splitting the Data
As we have seen, we will tend to overfit the data if we use the entire dataset to fit the model. To account for this, we split our dataset into a training set on which to build the model and a test set on which to evaluate its performance. sklearn has a handy train_test_split function for this; by default it reserves 25% of the data for testing, and we can pass test_size=0.2 for an 80/20 split.
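A short sketch of the split, using the X and y defined above; random_state=42 is an arbitrary seed for reproducibility.

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the data for testing (an 80/20 split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```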
Regularized Methods Comparison
Ridge Regression
Ridge regression adds an L2 penalty on the coefficients to the cost function:

$$J(\theta) = \text{MSE}(\theta) + \alpha \sum_{i=1}^{n} \theta_i^2$$

Many feature coefficients will be shrunk toward small values. A larger $\alpha$ means a larger penalty; $\alpha = 0$ recovers base LinearRegression, and the default for sklearn's implementation is alpha=1.0.
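A hedged sketch of Ridge on the split from above:

```python
from sklearn.linear_model import Ridge

# alpha controls the penalty strength; 1.0 is sklearn's default
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
print(ridge.score(X_test, y_test))  # R^2 on the held-out test set
```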
Lasso Regression
Lasso regression instead penalizes the absolute values of the coefficients (an L1 penalty):

$$J(\theta) = \text{MSE}(\theta) + \alpha \sum_{i=1}^{n} |\theta_i|$$

As a result, we end up in effect setting the coefficients of low-influence variables to exactly zero. Compared to Ridge, we would prefer Lasso when only a few variables have substantial effects.
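A corresponding Lasso sketch; alpha=0.1 is an arbitrary choice, not a tuned value.

```python
from sklearn.linear_model import Lasso

lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
# Coefficients of low-influence features are driven to exactly zero
print(lasso.coef_)
```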
Elastic Net
Elastic Net is a middle ground between Ridge and Lasso: its regularization term mixes both penalties, controlled by a mixing ratio $r$ (l1_ratio in sklearn), where $r = 1$ is equivalent to Lasso and $r = 0$ to Ridge:

$$J(\theta) = \text{MSE}(\theta) + r \alpha \sum_{i=1}^{n} |\theta_i| + \frac{1 - r}{2} \alpha \sum_{i=1}^{n} \theta_i^2$$
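A matching Elastic Net sketch; l1_ratio=0.5 is an arbitrary even mix of the two penalties.

```python
from sklearn.linear_model import ElasticNet

# l1_ratio mixes the penalties: 1.0 is pure Lasso, 0.0 is pure Ridge
elastic = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic.fit(X_train, y_train)
print(elastic.score(X_test, y_test))
```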
PROBLEM
Return to your Ames Data. We have covered a lot of ground today, so let's summarize the things we could do to improve the performance of our original model that compared the Above Ground Living Area to the Logarithm of the Sale Price.
Additional Resources
The last two lessons have drawn heavily on these resources, all of which I strongly recommend:
scikit-learn documentation on supervised learning: http://scikit-learn.org/stable/supervised_learning.html#supervised-learning
Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow
James et al., An Introduction to Statistical Learning: With Applications in R
Philipp K. Janert, Data Analysis with Open Source Tools
University of Michigan Coursera course on machine learning with scikit-learn: https://www.coursera.org/learn/python-machine-learning
Stanford University Coursera course on Machine Learning: https://www.coursera.org/learn/machine-learning