Word Embeddings: Intro to CBOW model, activation functions and working with Numpy
In this lecture notebook you will be given an introduction to the continuous bag-of-words model, its activation functions and some considerations when working with Numpy.
Let's dive into it!
The continuous bag-of-words model
The CBOW model is based on a neural network, the architecture of which looks like the figure below, as you'll recall from the lecture.
Figure 1
Activation functions
Let's start by implementing the activation functions, ReLU and softmax.
ReLU
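ReLU is used to calculate the values of the hidden layer, in the following formulas:
$$\mathbf{z_1} = \mathbf{W_1}\mathbf{x} + \mathbf{b_1}$$
$$\mathbf{h} = \mathrm{ReLU}(\mathbf{z_1})$$
Let's fix a value for $\mathbf{z_1}$ as a working example.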
Notice that numpy's random.rand function returns a numpy array filled with values drawn from a uniform distribution over [0, 1). Numpy's operations are vectorized, so each value is multiplied by 10 and then has 5 subtracted from it, which gives values in [-5, 5).
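A minimal sketch of how such a working example could be generated is shown here. The variable name `z_1`, the seed value, and the 5x1 shape are assumptions made for illustration, not values taken from the lecture.

```python
import numpy as np

# Seed chosen only so that this sketch is reproducible (an assumption).
np.random.seed(10)

# random.rand(5, 1) gives a 5x1 array of values uniform over [0, 1).
# Multiplying by 10 and subtracting 5 is applied element-wise,
# so the result holds values in [-5, 5).
z_1 = 10 * np.random.rand(5, 1) - 5

print(z_1)
print(z_1.shape)  # (5, 1)
```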
To get the ReLU of this vector, you want all the negative values to become zeros.
First create a copy of this vector.
Now determine which of its values are negative.
You can now simply set all of the values which are negative to 0.
And that's it: you have the ReLU of this vector!
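The three steps above (copy, find the negative values, set them to zero) can be sketched as follows. The input values and the names `h` and `negative` are illustrative assumptions.

```python
import numpy as np

# Illustrative column vector with a mix of negative and positive values.
z_1 = np.array([[-1.5], [4.0], [2.5], [0.5], [-3.0]])

# Step 1: work on a copy so the original vector is left unchanged.
h = z_1.copy()

# Step 2: boolean mask that is True wherever a value is negative.
negative = h < 0
print(negative)

# Step 3: use the mask to overwrite the negative values with 0.
h[negative] = 0
print(h)
```

Indexing with a boolean mask modifies only the selected entries, which is why no explicit loop is needed.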
Now implement ReLU as a function.
And check that it's working.
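One possible way to wrap those steps in a function is sketched below. The function name `relu` and the test vector are assumptions for illustration; your own implementation may differ.

```python
import numpy as np

def relu(z):
    """Return a copy of z with every negative value replaced by 0."""
    result = z.copy()        # leave the caller's array untouched
    result[result < 0] = 0   # zero out the negative entries
    return result

# Illustrative input: the first and last values are negative.
z = np.array([[-1.25], [4.5], [2.25], [1.0], [-3.5]])
print(relu(z))
```

With this input, the first and last entries become 0 and the three positive values pass through unchanged.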
Expected output:
Softmax
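The second activation function that you need is softmax. This function is used to calculate the values of the output layer of the neural network, using the following formulas:
$$\mathbf{z_2} = \mathbf{W_2}\mathbf{h} + \mathbf{b_2}$$
$$\mathbf{\hat{y}} = \mathrm{softmax}(\mathbf{z_2})$$
To calculate softmax of a vector $\mathbf{z}$, the $i$-th component of the resulting vector is given by:
$$\mathrm{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$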
Let's work through an example.
You'll need to calculate the exponentials of each element, both for the numerator and for the denominator.
The denominator is equal to the sum of these exponentials.
And the value of the first element of the softmax output is given by:
This is for one element. You can use numpy's vectorized operations to calculate the values of all the elements of the vector in one go.
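The worked example above can be sketched as follows. The input values and the names `e_z`, `sum_e_z`, `first_element`, and `y_hat` are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative input vector.
z = np.array([9, 8, 11, 10, 8.5])

# Exponential of each element (used in numerator and denominator).
e_z = np.exp(z)

# Denominator: the sum of all the exponentials.
sum_e_z = np.sum(e_z)

# Softmax value of the first element only.
first_element = e_z[0] / sum_e_z
print(first_element)

# Vectorized version: divide every exponential by the same sum.
y_hat = e_z / sum_e_z
print(y_hat)
```

Dividing the array `e_z` by the scalar `sum_e_z` applies the division to every element at once, so the first entry of `y_hat` matches `first_element`.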
Implement the softmax function.
Now check that it works.
Expected output:
Notice that the sum of all these values is equal to 1.
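A minimal sketch of such a function, together with a check that its outputs sum to 1, might look like this. The name `softmax` and the input values are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    """Return the softmax of a vector z."""
    e_z = np.exp(z)            # element-wise exponentials
    return e_z / np.sum(e_z)   # normalize so the values sum to 1

# Illustrative input vector.
z = np.array([9, 8, 11, 10, 8.5])
y_hat = softmax(z)

print(y_hat)
print(np.sum(y_hat))  # equal to 1, up to floating-point rounding
```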
Dimensions: 1-D arrays vs 2-D column vectors
Before moving on to implement forward propagation, backpropagation, and gradient descent in the next lecture notebook, let's have a look at the dimensions of the vectors you've been handling until now.
Create a vector of length 5 filled with zeros.
This is a 1-dimensional array, as revealed by the .shape property of the array.
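As a quick sketch of this, assuming a length of 5 and the hypothetical name `x_array`:

```python
import numpy as np

# Length of the vector (assumed to be 5 for this example).
V = 5

# np.zeros with a single integer returns a 1-dimensional array.
x_array = np.zeros(V)

print(x_array)        # [0. 0. 0. 0. 0.]
print(x_array.shape)  # (5,)
```

The trailing comma in `(5,)` indicates a shape with a single dimension, as opposed to the two dimensions of a matrix.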
To perform matrix multiplication in the next steps, you actually need your column vectors to be represented as a matrix with one column. In numpy, this matrix is represented as a 2-dimensional array.
The easiest way to convert a 1D vector to a 2D column matrix is to set its .shape property to the number of rows and one column, as shown in the next cell.
The shape of the resulting "vector" is:
So you now have a 5x1 matrix that you can use to perform standard matrix multiplication.
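The conversion described above can be sketched as follows. The names `x_column_vector` and `W`, and the 3x5 shape of `W`, are assumptions for illustration. The `reshape` call is shown as a commonly used alternative that returns a reshaped array instead of modifying the shape in place.

```python
import numpy as np

# Start from a 1-dimensional array of length 5.
x_column_vector = np.zeros(5)

# Setting .shape in place turns it into a 5x1 column matrix.
x_column_vector.shape = (5, 1)
print(x_column_vector.shape)  # (5, 1)

# Alternative: reshape leaves the original array's shape unchanged
# and returns an array with the requested shape.
x_reshaped = np.zeros(5).reshape(-1, 1)
print(x_reshaped.shape)  # (5, 1)

# A 3x5 matrix times a 5x1 column vector gives a 3x1 column vector.
W = np.ones((3, 5))
product = W @ x_column_vector
print(product.shape)  # (3, 1)
```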
Congratulations on finishing this lecture notebook! Hopefully you now have a better understanding of the activation functions used in the continuous bag-of-words model, as well as a clearer idea of how to leverage Numpy's power for these types of mathematical computations.
In the next lecture notebook you will get a comprehensive dive into:
Forward propagation.
Cross-entropy loss.
Backpropagation.
Gradient descent.
See you next time!