Python Machine Learning - Code Examples
Chapter 4: Building Good Training Datasets – Data Preprocessing
Chapter Outline
Dealing with missing data
  Identifying missing values in tabular data
  Eliminating training examples or features with missing values
  Imputing missing values
  Understanding the scikit-learn estimator API
Handling categorical data
  Nominal and ordinal features
  Creating an example dataset
  Mapping ordinal features
  Encoding class labels
  Performing one-hot encoding on nominal features
Partitioning a dataset into separate training and test sets
Bringing features onto the same scale
Selecting meaningful features
  L1 and L2 regularization as penalties against model complexity
  A geometric interpretation of L2 regularization
  Sparse solutions with L1 regularization
  Sequential feature selection algorithms
Assessing feature importance with random forests
Summary
Please refer to the README.md file in ../ch01 for more information about running the code examples.