Path: blob/master/notebooks/book1/20/word_analogies_torch.ipynb
Kernel: Python 3.6.7 64-bit ('base': conda)
A JAX implementation of this notebook is available here: https://colab.research.google.com/github/probml/pyprobml/blob/master/notebooks/book1/20/word_analogies_jax.ipynb
Solving word analogies using pre-trained word embeddings
Based on D2L 14.7
http://d2l.ai/chapter_natural-language-processing-pretraining/similarity-analogy.html
In [2]:
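The code for this cell was not preserved in this export; a minimal sketch consistent with the output below (the import is an assumption):

import torch  # assumed setup import
!mkdir figures  # the 'File exists' message below suggests a shell call like this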
Out[2]:
mkdir: figures: File exists
In [3]:
Get pre-trained word embeddings
Pre-trained embeddings are taken from:
GloVe website: https://nlp.stanford.edu/projects/glove/
fastText website: https://fasttext.cc/
In [4]:
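A sketch of how the embedding archives can be registered for download, mirroring the d2l-style data hub implied by the download messages later in the notebook (the dictionary name is an assumption; the URLs match those messages):

# Assumed: map embedding names to download URLs.
DATA_URL = 'http://d2l-data.s3-accelerate.amazonaws.com/'
DATA_HUB = {
    'glove.6b.50d': DATA_URL + 'glove.6B.50d.zip',
    'glove.6b.100d': DATA_URL + 'glove.6B.100d.zip',
}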
In [7]:
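A minimal sketch of an embedding-loader class in the spirit of D2L's TokenEmbedding; it assumes the zip has already been downloaded and extracted to ../data/<name>/vec.txt, with one token and its vector per line (the file layout is an assumption):

import os
import torch

class TokenEmbedding:
    """Load a pre-trained embedding by name, e.g. 'glove.6b.50d'."""
    def __init__(self, embedding_name):
        self.idx_to_token, self.idx_to_vec = self._load_embedding(embedding_name)
        self.unknown_idx = 0  # index 0 is reserved for '<unk>'
        self.token_to_idx = {t: i for i, t in enumerate(self.idx_to_token)}

    def _load_embedding(self, embedding_name):
        idx_to_token, idx_to_vec = ['<unk>'], []
        data_dir = os.path.join('..', 'data', embedding_name)  # assumed layout
        with open(os.path.join(data_dir, 'vec.txt'), 'r') as f:
            for line in f:
                elems = line.rstrip().split(' ')
                token, vec = elems[0], [float(x) for x in elems[1:]]
                if len(vec) > 1:  # skip header lines (e.g. in fastText files)
                    idx_to_token.append(token)
                    idx_to_vec.append(vec)
        # The unknown token gets an all-zero vector.
        idx_to_vec = [[0.0] * len(idx_to_vec[0])] + idx_to_vec
        return idx_to_token, torch.tensor(idx_to_vec)

    def __getitem__(self, tokens):
        indices = [self.token_to_idx.get(t, self.unknown_idx) for t in tokens]
        return self.idx_to_vec[torch.tensor(indices)]

    def __len__(self):
        return len(self.idx_to_token)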
Get a 50-dimensional GloVe embedding with a vocabulary size of 400k words.
In [8]:
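The download message below corresponds to constructing the 50-dimensional GloVe embedding; plausibly (the variable name is an assumption):

glove_6b50d = TokenEmbedding('glove.6b.50d')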
Out[8]:
Downloading ../data/glove.6B.50d.zip from http://d2l-data.s3-accelerate.amazonaws.com/glove.6B.50d.zip...
In [9]:
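The vocabulary holds 400,000 GloVe words plus one reserved unknown token, giving the 400001 below; plausibly:

len(glove_6b50d)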
Out[9]:
400001
Map from word to index and vice versa.
In [10]:
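A round trip through the two maps, consistent with the output (3367, 'beautiful'):

glove_6b50d.token_to_idx['beautiful'], glove_6b50d.idx_to_token[3367]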
Out[10]:
(3367, 'beautiful')
In [11]:
In [16]:
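This (re-executed) cell downloads the 100-dimensional variant; plausibly:

glove_6b100d = TokenEmbedding('glove.6b.100d')  # assumed variable name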
Out[16]:
Downloading ../data/glove.6B.100d.zip from http://d2l-data.s3-accelerate.amazonaws.com/glove.6B.100d.zip...
In [12]:
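The embedding matrix has one row per vocabulary entry (including '<unk>') and one column per embedding dimension, matching the shape below; plausibly:

glove_6b50d.idx_to_vec.shape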
Out[12]:
torch.Size([400001, 50])
Finding most similar words
In [13]:
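A sketch of k-nearest-neighbour search under cosine similarity; the 1e-9 inside the square root guards against division by zero for the all-zero '<unk>' vector:

def knn(W, x, k):
    # Cosine similarity between the query x and every row of W.
    cos = torch.mv(W, x.reshape(-1,)) / (
        torch.sqrt(torch.sum(W * W, dim=1) + 1e-9) *
        torch.sqrt((x * x).sum()))
    _, topk = torch.topk(cos, k=k)
    return topk, [cos[int(i)] for i in topk]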
In [14]:
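A sketch of the similarity query: fetch k+1 neighbours and drop the first, which is the query word itself; the print format matches the outputs below:

def get_similar_tokens(query_token, k, embed):
    topk, cos = knn(embed.idx_to_vec, embed[[query_token]], k + 1)
    for i, c in zip(topk[1:], cos[1:]):  # skip the query word itself
        print(f'cosine sim={float(c):.3f}: {embed.idx_to_token[int(i)]}')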
In [15]:
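The query word for this cell was not preserved; the neighbours below (woman, boy, another) are consistent with a query such as 'girl', which is a guess:

get_similar_tokens('girl', 3, glove_6b50d)  # query word is a guess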
Out[15]:
cosine sim=0.886: woman
cosine sim=0.856: boy
cosine sim=0.845: another
In [16]:
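The neighbours below (bananas, coconut, pineapple) suggest a fruit query such as 'banana'; again a guess:

get_similar_tokens('banana', 3, glove_6b50d)  # query word is a guess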
Out[16]:
cosine sim=0.815: bananas
cosine sim=0.787: coconut
cosine sim=0.758: pineapple
Word analogies
In [17]:
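A sketch of the analogy solver for "a is to b as c is to ?": it returns the word whose vector is most cosine-similar to vec(b) - vec(a) + vec(c):

def get_analogy(token_a, token_b, token_c, embed):
    vecs = embed[[token_a, token_b, token_c]]
    x = vecs[1] - vecs[0] + vecs[2]
    topk, cos = knn(embed.idx_to_vec, x, 1)
    return embed.idx_to_token[int(topk[0])]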
In [18]:
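The classic gender analogy, consistent with the output 'queen' (the exact call is an assumption):

get_analogy('man', 'woman', 'king', glove_6b50d)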
Out[18]:
'queen'
In [19]:
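Consistent with the output 'daughter'; plausibly:

get_analogy('man', 'woman', 'son', glove_6b50d)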
Out[19]:
'daughter'
In [20]:
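A capital-to-country analogy consistent with the output 'japan'; plausibly:

get_analogy('beijing', 'china', 'tokyo', glove_6b50d)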
Out[20]:
'japan'