Assignment 4: Word Embeddings
Welcome to the fourth programming assignment of Course 2. In this assignment we will show you how to compute word embeddings. In Natural Language Processing (NLP) we cannot rely only on counting the number of positive and negative words, as we did with logistic regression in the last course. Instead we will find a way to represent each word by a vector. The vector can then capture syntactic (i.e. parts of speech) and semantic (i.e. meaning) structure. In this assignment you will explore a classic way of learning embeddings, or representations, of words: the continuous bag of words (CBOW) model. By completing this assignment you will:
Train word vectors from scratch.
Learn how to create batches of data.
Understand how backpropagation works.
Plot and visualize your learned word vectors.
Because it will take a while to train your CBOW model, you will first code the model and verify that it produces the expected outputs. We will then give you some slightly pre-trained vectors, which you will fine-tune on the Shakespeare dataset.
Knowing how to train these models will give you a better understanding of word vectors, which are building blocks to many applications in natural language processing.
1.0 The Continuous Bag of Words Model
Let's take a look at the following sentence: 'I am happy because I am learning'. In continuous bag of words modeling we try to predict the center word given a few context words. For example, if you were to choose a context half-size of, say, $C = 2$, then you would try to predict the word happy given the 2 words before and the 2 words after it: {I, am, because, I}. In other words, you have the (context, center word) pair ({I, am, because, I}, happy).
The structure of your model will look like this:
Figure 1: where $\bar{x}$ is the average of the one-hot vectors of the context words.
Once you have encoded all the context words, you can use their average $\bar{x}$ as the input to your model. The architecture you will be implementing is shown in Figure 2.
Mapping words to indices and indices to words
We provide a helper function to create a dictionary that maps words to indices and indices to words.
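The helper is provided for you in the notebook, but its behavior can be sketched as follows (the name `get_dict` and the sorted-vocabulary indexing are assumptions; the provided helper may differ in detail):

```python
def get_dict(data):
    """Sketch of the provided helper: map each word in the tokenized
    corpus to an index (word2Ind) and each index back to its word
    (Ind2word), using the sorted vocabulary for a stable ordering."""
    words = sorted(set(data))
    word2Ind = {w: i for i, w in enumerate(words)}
    Ind2word = {i: w for i, w in enumerate(words)}
    return word2Ind, Ind2word

tokens = ['i', 'am', 'happy', 'because', 'i', 'am', 'learning']
word2Ind, Ind2word = get_dict(tokens)
print(len(word2Ind))  # 5 distinct words in the vocabulary
```

The two dictionaries are inverses of each other, which is what lets you move between one-hot vector positions and actual words during training and visualization.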
2.0 Training the Model
Initializing the model
You will now initialize two matrices and two vectors.
The first matrix ($W_1$) is of dimension $(N, V)$, where $V$ is the number of words in your vocabulary and $N$ is the dimension of your word vectors.
The second matrix ($W_2$) is of dimension $(V, N)$. The two vectors, $b_1$ and $b_2$, are of dimension $(N, 1)$ and $(V, 1)$ respectively (column vectors). $b_1$ and $b_2$ are the bias vectors of the linear layers with matrices $W_1$ and $W_2$. The overall structure of the model will look as in Figure 1, but at this stage we are just initializing the parameters.
Expected Output: (4, 10) (10, 4) [[0.88330609 0.62367221 0.75094243 0.34889834]]
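A minimal sketch of the initialization, assuming uniform random weights via `np.random.rand` (the graded function may use a different seed or distribution; only the shapes below are confirmed by the expected output):

```python
import numpy as np

def initialize_model(N, V, random_seed=1):
    """Randomly initialize the two weight matrices and two bias vectors
    with the dimensions described above."""
    np.random.seed(random_seed)
    W1 = np.random.rand(N, V)   # (N, V)
    W2 = np.random.rand(V, N)   # (V, N)
    b1 = np.random.rand(N, 1)   # (N, 1)
    b2 = np.random.rand(V, 1)   # (V, 1)
    return W1, W2, b1, b2

W1, W2, b1, b2 = initialize_model(N=4, V=10)
print(W1.shape, W2.shape)  # (4, 10) (10, 4)
```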
Softmax
Before we can start training the model, we need to implement the softmax function as defined in equation 5:

$$ \text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{V} e^{z_j}} \tag{5} $$
Instructions: Implement the softmax function below.
Expected Output: array([0.04712342, 0.94649912, 0.00637746])
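One possible implementation, written to work on a single vector or on a $(V, m)$ batch of column vectors, with the usual max-subtraction trick for numerical stability (an implementation choice, not required by the equation):

```python
import numpy as np

def softmax(z):
    """Softmax over the vocabulary axis (axis 0); subtracting the max
    does not change the result but avoids overflow in np.exp."""
    z = np.asarray(z, dtype=float)
    e_z = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e_z / np.sum(e_z, axis=0, keepdims=True)

print(softmax(np.array([1, 4, -1])))
# ≈ [0.04712342 0.94649912 0.00637746]
```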
Forward propagation
Implement the forward propagation according to equations (1) to (3):

$$ z_1 = W_1 \, \bar{x} + b_1 \tag{1} $$
$$ h = \mathrm{ReLU}(z_1) \tag{2} $$
$$ z_2 = W_2 \, h + b_2 \tag{3} $$

For that, you will use as activation the Rectified Linear Unit (ReLU), given by:

$$ \mathrm{ReLU}(z) = \max(0, z) \tag{4} $$
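These equations translate almost directly into NumPy; the sketch below assumes `x` holds a batch of averaged one-hot context vectors as columns (the function names follow the notebook's conventions, but the graded signature may differ):

```python
import numpy as np

def relu(z):
    """Element-wise Rectified Linear Unit."""
    return np.maximum(0, z)

def forward_prop(x, W1, W2, b1, b2):
    """One forward pass of the CBOW model.
    x: (V, m) batch of averaged one-hot context vectors.
    Returns the logits z (V, m) and the hidden activations h (N, m)."""
    h = relu(W1 @ x + b1)   # equations (1) and (2)
    z = W2 @ h + b2         # equation (3); softmax is applied later
    return z, h
```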
Cost function
We have implemented the cross-entropy cost function for you.
If you want to understand it better, see the explanation linked in the notebook.
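For reference, a sketch of a batch cross-entropy cost consistent with the forward pass (the provided `compute_cost` may differ in signature; here `y` and `yhat` are $(V, m)$ arrays of one-hot targets and softmax outputs):

```python
import numpy as np

def compute_cost(y, yhat, batch_size):
    """Cross-entropy cost averaged over the batch:
    J = -(1/m) * sum over examples and vocabulary of y * log(yhat)."""
    logprobs = y * np.log(yhat)
    return -np.sum(logprobs) / batch_size
```

Because `y` is one-hot, only the log-probability assigned to each true center word contributes to the sum.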
Training the Model
Now that you have understood how the CBOW model works, you will train it.
You created a function for the forward propagation. Now you will implement a function that computes the gradients to backpropagate the errors.
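A sketch of those gradients, using the standard softmax-plus-cross-entropy shortcut $\partial J/\partial z_2 = \hat{y} - y$ and masking the hidden gradient where the ReLU was inactive (the function name and argument order mirror the notebook's style but are assumptions):

```python
import numpy as np

def back_prop(x, yhat, y, h, W1, W2, b1, b2, batch_size):
    """Gradients of the batch cross-entropy cost w.r.t. all parameters.
    x: (V, m) inputs, yhat/y: (V, m) predictions/targets, h: (N, m)."""
    dz = yhat - y                       # (V, m): softmax + cross-entropy shortcut
    dh = W2.T @ dz                      # (N, m): backprop through the second layer
    dh[h <= 0] = 0                      # ReLU gradient: zero where the unit was off
    grad_W1 = dh @ x.T / batch_size     # (N, V)
    grad_W2 = dz @ h.T / batch_size     # (V, N)
    grad_b1 = dh.sum(axis=1, keepdims=True) / batch_size  # (N, 1)
    grad_b2 = dz.sum(axis=1, keepdims=True) / batch_size  # (V, 1)
    return grad_W1, grad_W2, grad_b1, grad_b2
```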
Now that you have implemented a function to compute the gradients, you will implement batch gradient descent over your training set.
Hint: For that, you will use initialize_model and the back_prop function that you just created (and the compute_cost function). You can also use the provided get_batches helper function:

```python
for x, y in get_batches(data, word2Ind, V, C, batch_size):
    ...
```

Also: print the cost after each batch is processed (use batch_size = 128).
Expected Output: iters 15 cost 41.66082959286652
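The whole training loop can be sketched end to end as below. Everything here is a minimal stand-in rather than the graded code: `get_batches` in particular is a toy replacement for the provided helper, and the seed, learning rate, and print format are assumptions, so the costs it prints will not match the expected output above.

```python
import numpy as np

def initialize_model(N, V, random_seed=1):
    np.random.seed(random_seed)
    return (np.random.rand(N, V), np.random.rand(V, N),
            np.random.rand(N, 1), np.random.rand(V, 1))

def softmax(z):
    e_z = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e_z / np.sum(e_z, axis=0, keepdims=True)

def forward_prop(x, W1, W2, b1, b2):
    h = np.maximum(0, W1 @ x + b1)
    return W2 @ h + b2, h

def compute_cost(y, yhat, batch_size):
    return -np.sum(y * np.log(yhat)) / batch_size

def back_prop(x, yhat, y, h, W1, W2, b1, b2, batch_size):
    dz = yhat - y
    dh = W2.T @ dz
    dh[h <= 0] = 0
    return (dh @ x.T / batch_size, dz @ h.T / batch_size,
            dh.sum(axis=1, keepdims=True) / batch_size,
            dz.sum(axis=1, keepdims=True) / batch_size)

def get_batches(data, word2Ind, V, C, batch_size):
    """Toy stand-in for the provided helper: each column of x averages
    the one-hot vectors of the 2C context words; each column of y is
    the one-hot center word."""
    bx, by = [], []
    for i in range(C, len(data) - C):
        y = np.zeros(V); y[word2Ind[data[i]]] = 1
        x = np.zeros(V)
        for w in data[i - C:i] + data[i + 1:i + C + 1]:
            x[word2Ind[w]] += 1.0 / (2 * C)
        bx.append(x); by.append(y)
        if len(bx) == batch_size:
            yield np.array(bx).T, np.array(by).T
            bx, by = [], []

def gradient_descent(data, word2Ind, N, V, num_iters, alpha=0.03,
                     batch_size=128, C=2):
    """Batch gradient descent: forward pass, cost, gradients, update."""
    W1, W2, b1, b2 = initialize_model(N, V)
    iters = 0
    for x, y in get_batches(data, word2Ind, V, C, batch_size):
        z, h = forward_prop(x, W1, W2, b1, b2)
        yhat = softmax(z)
        cost = compute_cost(y, yhat, batch_size)
        print(f"iters: {iters + 1} cost: {cost:.6f}")
        gW1, gW2, gb1, gb2 = back_prop(x, yhat, y, h, W1, W2, b1, b2, batch_size)
        W1 -= alpha * gW1
        W2 -= alpha * gW2
        b1 -= alpha * gb1
        b2 -= alpha * gb2
        iters += 1
        if iters == num_iters:
            break
    return W1, W2, b1, b2
```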
3.0 Visualizing the word vectors
In this part you will visualize the word vectors trained using the function you just coded above.
You can see that woman and queen are next to each other. However, we have to be careful when interpreting these projected word vectors, since PCA depends on the chosen projection, as shown in the following illustration.
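A sketch of the PCA projection via the covariance eigen-decomposition, plus the plotting step as comments. The function name `compute_pca` and the word embedding choice are assumptions about the notebook's helpers; only the PCA math itself is standard.

```python
import numpy as np

def compute_pca(data, n_components=2):
    """Project row vectors onto their first principal components.
    data: (num_words, embedding_dim) array; returns (num_words, n_components)."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)       # (dim, dim) covariance
    evals, evecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]            # largest variance first
    return centered @ evecs[:, order[:n_components]]

# Assumed usage (hypothetical names), plotting with matplotlib:
#   import matplotlib.pyplot as plt
#   embs = (W1.T + W2) / 2        # one common choice of word embedding
#   result = compute_pca(embs[idx, :], 2)   # idx: rows of the chosen words
#   plt.scatter(result[:, 0], result[:, 1])
#   for i, w in enumerate(words):
#       plt.annotate(w, (result[i, 0], result[i, 1]))
#   plt.show()
```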