rasbt
GitHub Repository: rasbt/machine-learning-book
Path: blob/main/ch04/README.md

Python Machine Learning - Code Examples

Chapter 4: Building Good Training Datasets – Data Preprocessing

Chapter Outline

  • Dealing with missing data

    • Identifying missing values in tabular data

    • Eliminating training examples or features with missing values

    • Imputing missing values

    • Understanding the scikit-learn estimator API

  • Handling categorical data

    • Nominal and ordinal features

    • Creating an example dataset

    • Mapping ordinal features

    • Encoding class labels

    • Performing one-hot encoding on nominal features

  • Partitioning a dataset into separate training and test sets

  • Bringing features onto the same scale

  • Selecting meaningful features

    • L1 and L2 regularization as penalties against model complexity

    • A geometric interpretation of L2 regularization

    • Sparse solutions with L1 regularization

    • Sequential feature selection algorithms

  • Assessing feature importance with random forests

  • Summary

Please refer to the README.md file in ../ch01 for more information about running the code examples.