Path: blob/master/deep_learning/rnn/1_pytorch_rnn.ipynb
PyTorch Introduction
At its core, PyTorch provides two main features:
An n-dimensional Tensor, similar to a numpy array but able to run on GPUs. PyTorch provides many functions for operating on these Tensors, so it can also be used as a general-purpose scientific computing tool.
Automatic differentiation for building and training neural networks.
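A minimal sketch of both ideas, assuming a recent PyTorch version (the shapes and values here are purely illustrative):

```python
import torch

# An n-dimensional Tensor, similar to a numpy array but GPU-capable
x = torch.randn(3, 4)                 # 3 x 4 tensor of standard normal values
if torch.cuda.is_available():
    x = x.to('cuda')                  # the same operations then run on the GPU

# Automatic differentiation: requires_grad=True tells autograd to track this tensor
w = torch.randn(4, 1, requires_grad=True, device=x.device)
loss = (x @ w).pow(2).sum()           # any scalar built from differentiable ops
loss.backward()                       # fills in w.grad with d(loss)/dw
print(w.grad.shape)                   # torch.Size([4, 1])
```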
Let's dive in by looking at some examples:
Linear Regression
Here we start by defining the linear regression model. Recall that in linear regression, we are minimizing the squared loss.
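A bare-bones sketch of this first version, with the parameters managed by hand and autograd computing the gradients (the toy data and hyperparameters below are purely illustrative):

```python
import torch

# toy data: y = 2x + 3 plus a little noise (illustrative numbers)
X = torch.randn(100, 1)
y = 2 * X + 3 + 0.1 * torch.randn(100, 1)

# model parameters, tracked by autograd
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

optimizer = torch.optim.SGD([w, b], lr=0.1)
for _ in range(200):
    y_pred = X * w + b                   # linear model
    loss = ((y_pred - y) ** 2).mean()    # squared (MSE) loss
    optimizer.zero_grad()
    loss.backward()                      # autograd computes dloss/dw, dloss/db
    optimizer.step()
```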
Linear Regression Version 2
A better way of defining our model is to inherit from the nn.Module class. To use it, all we need to do is define our model's forward pass, and nn.Module will automatically define the backward method for us, with the gradients computed using autograd.
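A sketch of what such a class could look like (the class name and layer sizes are illustrative):

```python
import torch
from torch import nn

class LinearRegression(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.linear = nn.Linear(n_features, 1)  # wraps the weight and bias for us

    def forward(self, x):
        # we only describe the forward pass; autograd handles the backward pass
        return self.linear(x)

model = LinearRegression(n_features=1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
```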
After training our model, we can also save the model's parameters and load them back into the model in the future.
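One common way to do this is via the model's state_dict; a short sketch (the file name and the stand-in model are placeholders):

```python
import torch
from torch import nn

model = nn.Linear(1, 1)                                   # stand-in for the trained model

# save only the learned parameters
torch.save(model.state_dict(), 'linear_regression.pt')    # file name is arbitrary

# later: recreate a model with the same architecture and load the weights back
restored = nn.Linear(1, 1)
restored.load_state_dict(torch.load('linear_regression.pt'))
restored.eval()
```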
Logistic Regression
Let's now look at a classification example. Here we'll define a logistic regression model that takes in a bag-of-words representation of some text and predicts over two labels, "English" and "Spanish".
The next code chunk creates the word-to-index mappings. To build our bag of words (BoW) representation, we need to assign each word in our vocabulary a unique index. Say our entire corpus consists of only two words, "hello" and "world", with "hello" corresponding to index 0 and "world" to index 1. Then the BoW vector for the sentence "hello world hello world" is [2, 2], i.e. the count for the word "hello" sits at position 0 of the array and so on.
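A small sketch of this bookkeeping (the toy corpus and labels below are only for illustration):

```python
import torch

# toy corpus: two sentences with illustrative labels
data = [("hello world hello world".split(), "ENGLISH"),
        ("hola mundo".split(), "SPANISH")]

# assign each word in the vocabulary a unique index
word_to_ix = {}
for sentence, _ in data:
    for word in sentence:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)
# e.g. {'hello': 0, 'world': 1, 'hola': 2, 'mundo': 3}

def make_bow_vector(sentence, word_to_ix):
    # the BoW vector simply counts how often each vocabulary word occurs
    vec = torch.zeros(len(word_to_ix))
    for word in sentence:
        vec[word_to_ix[word]] += 1
    return vec.view(1, -1)           # shape (1, vocab_size): a batch of one

print(make_bow_vector("hello world hello world".split(), word_to_ix))
# tensor([[2., 2., 0., 0.]])
```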
Next we define our model, again inheriting from nn.Module, along with two helper functions that convert our data to torch Tensors so we can use them during training.
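A sketch of the model and one of the helpers, assuming the two labels above and the small vocabulary from the previous sketch (all names and sizes are illustrative):

```python
import torch
from torch import nn
import torch.nn.functional as F

class BoWClassifier(nn.Module):
    def __init__(self, n_labels, vocab_size):
        super().__init__()
        # one affine layer maps the BoW counts to a score per label
        self.linear = nn.Linear(vocab_size, n_labels)

    def forward(self, bow_vector):
        # log-probabilities over the two labels
        return F.log_softmax(self.linear(bow_vector), dim=1)

def make_target(label, label_to_ix):
    # wrap the label index in a LongTensor so it can be fed to NLLLoss
    return torch.LongTensor([label_to_ix[label]])

label_to_ix = {"ENGLISH": 0, "SPANISH": 1}
model = BoWClassifier(n_labels=2, vocab_size=4)   # vocab_size matches word_to_ix above
```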
We are now ready to train this!
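A rough training loop, assuming the data, helpers, and model from the two sketches above (nn.NLLLoss pairs with the model's log_softmax output):

```python
# data, word_to_ix, label_to_ix, make_bow_vector, make_target and model
# are assumed to be defined as in the previous sketches
import torch
from torch import nn

criterion = nn.NLLLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(100):
    for sentence, label in data:
        model.zero_grad()                    # clear gradients from the last step
        bow_vec = make_bow_vector(sentence, word_to_ix)
        target = make_target(label, label_to_ix)
        log_probs = model(bow_vec)
        loss = criterion(log_probs, target)
        loss.backward()
        optimizer.step()
```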
Recurrent Neural Network (RNN)
The idea behind RNNs is to make use of the sequential information that exists in our dataset. In a feedforward neural network, we assume that all inputs and outputs are independent of each other. But for some tasks, this might not be the best way to tackle the problem. For example, in Natural Language Processing (NLP) applications, if we wish to predict the next word in a sentence (one business application of this is SwiftKey), then we can imagine that knowing the word that comes before it comes in handy.
Vanilla RNN
The input will be a sequence of words, and each element is a single word. Because of how matrix multiplication works, we can't simply feed in a word index like 36 directly. Instead, we represent each word as a one-hot vector whose size is the total vocabulary size. For example, the word with index 36 has the value 1 at position 36, and the rest of the values in the vector are all 0's.
In the next section, we'll teach our RNN to produce "ihello" from "hihell".
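A rough sketch of how that setup could look with nn.RNN, where the hidden state is used directly as the per-character scores (the vocabulary, hyperparameters, and variable names are illustrative):

```python
import torch
from torch import nn

# character vocabulary for the toy task
idx2char = ['h', 'i', 'e', 'l', 'o']
char2idx = {c: i for i, c in enumerate(idx2char)}
vocab_size = len(idx2char)

x_data = [char2idx[c] for c in "hihell"]       # input sequence
y_data = [char2idx[c] for c in "ihello"]       # target sequence (shifted by one)

# one-hot encode the input: shape (batch=1, seq_len=6, input_size=5)
inputs = torch.eye(vocab_size)[x_data].unsqueeze(0)
targets = torch.tensor(y_data)

rnn = nn.RNN(input_size=vocab_size, hidden_size=vocab_size, batch_first=True)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=0.1)

for epoch in range(100):
    optimizer.zero_grad()
    outputs, _ = rnn(inputs)                   # (1, 6, vocab_size)
    loss = criterion(outputs.view(-1, vocab_size), targets)
    loss.backward()
    optimizer.step()

predicted = outputs.argmax(dim=2).squeeze()
print(''.join(idx2char[i] for i in predicted.tolist()))  # should converge towards "ihello"
```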
LSTM
The example below uses an LSTM to generate part-of-speech tags. The usage of the LSTM API is essentially the same as the RNN we were using in the last section. Except in this example, we will prepare the word-to-index mapping ourselves, and for the modeling part, we will add an embedding layer before the LSTM layer, a common technique in NLP applications. For each word, instead of using the one-hot way of representing the data (which can be inefficient, and which treats all words as independent entities with no relationships amongst each other), word embeddings compress them into a lower dimension that encodes the semantics of the words, i.e. how similarly each word is used within our given corpus.
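A sketch of such a tagger, with an nn.Embedding layer feeding an nn.LSTM (the vocabulary, tag set, and dimensions below are illustrative):

```python
import torch
from torch import nn
import torch.nn.functional as F

class LSTMTagger(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super().__init__()
        # the embedding layer maps word indices to dense, learnable vectors
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.embedding(sentence)                            # (seq_len, embedding_dim)
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))   # batch of one
        tag_scores = self.hidden2tag(lstm_out.view(len(sentence), -1))
        return F.log_softmax(tag_scores, dim=1)

# usage with a toy vocabulary and tag set
word_to_ix = {"the": 0, "dog": 1, "ate": 2, "apple": 3}
tag_to_ix = {"DET": 0, "NN": 1, "V": 2}
model = LSTMTagger(embedding_dim=6, hidden_dim=6,
                   vocab_size=len(word_to_ix), tagset_size=len(tag_to_ix))
sentence = torch.tensor([word_to_ix[w] for w in ["the", "dog", "ate", "the", "apple"]])
print(model(sentence).shape)   # torch.Size([5, 3]): one score vector per word
```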