Chapter 8: Applying Machine Learning to Sentiment Analysis
Chapter Outline
Preparing the IMDb movie review data for text processing
Obtaining the IMDb movie review dataset
Preprocessing the movie dataset into a more convenient format
Introducing the bag-of-words model
Transforming words into feature vectors
Assessing word relevancy via term frequency-inverse document frequency
Cleaning text data
Processing documents into tokens
Training a logistic regression model for document classification
Working with bigger data – online algorithms and out-of-core learning
Topic modeling
Decomposing text documents with Latent Dirichlet Allocation
Latent Dirichlet Allocation with scikit-learn
Summary
Please refer to the README.md file in ../ch01
for more information about running the code examples.