A PyTorch implementation of this notebook is available here: https://colab.research.google.com/github/probml/pyprobml/blob/master/notebooks/book1/15/rnn_sentiment_torch.ipynb
Bidirectional RNNs for sentiment classification
We use bidirectional RNNs (BiRNNs) to classify IMDB movie reviews by sentiment. This notebook uses some code from the sst2 example in Flax.
Based on sec 15.2 of http://d2l.ai/chapter_natural-language-processing-applications/sentiment-analysis-rnn.html
Data
We use a subset of the Internet Movie Database (IMDB) reviews. There are 20k positive and 20k negative examples.
We tokenize by word and, when building the vocabulary, drop words that occur fewer than 5 times in the training set.
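The vocabulary step can be sketched as follows. This is a minimal illustration, not the notebook's own code: the helper names `build_vocab` and `encode`, the reserved `<pad>` and `<unk>` tokens, and the ordering by frequency are all assumptions made for the example.

```python
from collections import Counter

def build_vocab(tokenized_docs, min_freq=5, reserved=("<pad>", "<unk>")):
    """Map every token seen at least min_freq times to an integer id."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    # Reserved tokens take the lowest ids. Rare words are left out of the
    # vocabulary and fall back to <unk> at lookup time.
    kept = sorted(
        (tok for tok, c in counts.items() if c >= min_freq),
        key=lambda tok: (-counts[tok], tok),
    )
    return {tok: i for i, tok in enumerate(list(reserved) + kept)}

def encode(tokens, vocab):
    """Convert a list of tokens to ids, mapping unknown tokens to <unk>."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in tokens]

# Toy corpus: "dull" appears only once, so it is dropped from the vocabulary.
docs = [["a", "good", "film"]] * 5 + [["a", "dull", "film"]]
vocab = build_vocab(docs, min_freq=5)
ids = encode(["a", "dull", "film"], vocab)
```

Here `dull` is encoded with the `<unk>` id because it falls below the frequency threshold.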
We pad all sequences to length 500, for efficient minibatching.
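A fixed length makes it possible to stack the reviews into a single integer array. The sketch below assumes right-padding with id 0 and truncation of longer reviews. The truncation, the pad id, and the helper name `truncate_pad` are assumptions for illustration, and the toy length of 6 stands in for 500.

```python
import numpy as np

def truncate_pad(ids, max_len=500, pad_id=0):
    """Clip a sequence to max_len, or right-pad it with pad_id."""
    ids = list(ids)[:max_len]
    return ids + [pad_id] * (max_len - len(ids))

# One short and one long sequence, both brought to length 6.
sequences = [[5, 8, 2], [7] * 9]
batch = np.array([truncate_pad(seq, max_len=6) for seq in sequences])
```

Every row of `batch` now has the same length, so the array can be sliced into minibatches directly.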
Putting it all together.
Model
Because we have a small training set, we use pre-trained GloVe word embeddings of dimension 100.
We adapt some code from the sst2 example in Flax to implement a bidirectional LSTM layer.
We create a biRNN, so the t'th word has a representation of size 2*h, where h is the number of hidden units in each direction (the forward and backward states are concatenated). The representation of the sentence is the concatenation of the representations of the first and last words, which has size 4*h. This is mapped to a binary output label.
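The shapes described above can be checked with a small JAX sketch. To stay short it swaps the LSTM cell for a plain tanh RNN cell and handles a single unbatched sequence. The function names, the initialization scale, and the two-logit output layer are assumptions for illustration, not the notebook's implementation.

```python
import jax
import jax.numpy as jnp

def init_cell(key, d, h):
    """Random parameters (W_x, W_h, b) for a tanh RNN cell."""
    k1, k2 = jax.random.split(key)
    return (0.1 * jax.random.normal(k1, (d, h)),
            0.1 * jax.random.normal(k2, (h, h)),
            jnp.zeros(h))

def rnn_scan(params, xs, reverse=False):
    """Run the cell over xs of shape (T, d); return hidden states (T, h)."""
    W_x, W_h, b = params
    def step(h, x):
        h = jnp.tanh(x @ W_x + h @ W_h + b)
        return h, h
    h0 = jnp.zeros(W_h.shape[0])
    # With reverse=True the scan runs from the last step to the first,
    # but the outputs stay aligned with the input positions.
    _, hs = jax.lax.scan(step, h0, xs, reverse=reverse)
    return hs

def birnn_encode(fwd, bwd, xs):
    # Each time step gets [forward state; backward state], size 2*h.
    hs = jnp.concatenate(
        [rnn_scan(fwd, xs), rnn_scan(bwd, xs, reverse=True)], axis=-1)
    # Sentence vector: first and last time steps concatenated, size 4*h.
    return hs, jnp.concatenate([hs[0], hs[-1]])

T, d, h = 7, 100, 16
k_f, k_b, k_x, k_o = jax.random.split(jax.random.PRNGKey(0), 4)
fwd, bwd = init_cell(k_f, d, h), init_cell(k_b, d, h)
xs = jax.random.normal(k_x, (T, d))         # stand-in for embedded tokens
hs, sent = birnn_encode(fwd, bwd, xs)
W_out = 0.1 * jax.random.normal(k_o, (4 * h, 2))
logits = sent @ W_out                       # two class scores
```

A batch of sequences could be handled by wrapping `birnn_encode` in `jax.vmap` over the leading axis of the inputs.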
We initialize the model's embedding layer with the GloVe weights. This layer is frozen during training.
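One way to keep a pre-trained table fixed in JAX is to wrap it in `jax.lax.stop_gradient` inside the loss, so its gradient is exactly zero. The sketch below shows only that idea and is not necessarily how the notebook freezes the layer. The random `glove` matrix, the toy mean-pooling classifier, and the parameter names are assumptions for illustration.

```python
import jax
import jax.numpy as jnp

vocab_size, embed_dim = 10, 100
# Stand-in for the pre-trained GloVe matrix (random here, purely illustrative).
glove = jax.random.normal(jax.random.PRNGKey(0), (vocab_size, embed_dim))

params = {
    "embedding": glove,               # initialized from the pre-trained vectors
    "w": jnp.zeros((embed_dim,)),     # trainable weights of a toy classifier
}

def loss_fn(params, token_ids, label):
    # stop_gradient blocks backpropagation into the table, so it stays frozen.
    table = jax.lax.stop_gradient(params["embedding"])
    feats = table[token_ids].mean(axis=0)
    logit = feats @ params["w"]
    # Binary cross-entropy written in terms of the logit.
    return jnp.logaddexp(0.0, logit) - label * logit

grads = jax.grad(loss_fn)(params, jnp.array([1, 4, 7]), 1.0)
```

The entries of `grads["embedding"]` are all zero, so a gradient step leaves the embeddings unchanged, while `grads["w"]` is nonzero and the classifier still learns.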