GitHub Repository: rasbt/machine-learning-book
Path: blob/main/ch08/README.md

Chapter 8: Applying Machine Learning to Sentiment Analysis

Chapter Outline

  • Preparing the IMDb movie review data for text processing

    • Obtaining the IMDb movie review dataset

    • Preprocessing the movie dataset into a more convenient format

  • Introducing the bag-of-words model

    • Transforming words into feature vectors

    • Assessing word relevancy via term frequency-inverse document frequency

    • Cleaning text data

    • Processing documents into tokens

  • Training a logistic regression model for document classification

  • Working with bigger data – online algorithms and out-of-core learning

  • Topic modeling

    • Decomposing text documents with Latent Dirichlet Allocation

    • Latent Dirichlet Allocation with scikit-learn

  • Summary

Please refer to the README.md file in ../ch01 for more information about running the code examples.