Python Machine Learning - Code Examples


## Chapter 4: Building Good Training Datasets – Data Preprocessing

### Chapter Outline

- Dealing with missing data
  - Identifying missing values in tabular data
  - Eliminating training examples or features with missing values
  - Imputing missing values
  - Understanding the scikit-learn estimator API
- Handling categorical data
  - Nominal and ordinal features
  - Creating an example dataset
  - Mapping ordinal features
  - Encoding class labels
  - Performing one-hot encoding on nominal features
- Partitioning a dataset into separate training and test sets
- Bringing features onto the same scale
- Selecting meaningful features
  - L1 and L2 regularization as penalties against model complexity
  - A geometric interpretation of L2 regularization
  - Sparse solutions with L1 regularization
  - Sequential feature selection algorithms
- Assessing feature importance with random forests
- Summary
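
As a rough orientation, the preprocessing topics in the outline above can be chained together in a few lines. This is a minimal sketch and not the chapter's own code: the toy `DataFrame`, its column names, and the `size_mapping` dictionary are invented for illustration, and mean imputation is just one of several possible strategies.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data (invented for this sketch, not from the book)
df = pd.DataFrame({
    "color": ["green", "red", "blue", "red", "green", "blue"],
    "size": ["M", "L", "XL", "M", "L", "XL"],
    "price": [10.1, 13.5, np.nan, 9.8, 12.9, 15.3],
    "label": [0, 1, 1, 0, 1, 0],
})

# Map the ordinal feature to integers that preserve its order
size_mapping = {"M": 1, "L": 2, "XL": 3}  # assumed ordering
df["size"] = df["size"].map(size_mapping)

# One-hot encode the nominal feature
X = pd.get_dummies(df[["color", "size", "price"]], columns=["color"], dtype=float)
y = df["label"].to_numpy()

# Partition into training and test sets (2 of 6 rows held out)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=2, random_state=0, stratify=y
)

# Impute missing values using statistics learned from the training set only
imputer = SimpleImputer(strategy="mean")
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)

# Bring features onto the same scale, again fitting on the training set only
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train_imp)
X_test_std = scaler.transform(X_test_imp)
```

The imputer and the scaler are fitted on the training split and only applied to the test split, so no information from the test set leaks into the preprocessing parameters.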



**Please refer to the [README.md](../ch01/README.md) file in [`../ch01`](../ch01) for more information about running the code examples.**


