A PyTorch implementation of this notebook is available here: https://colab.research.google.com/github/probml/pyprobml/blob/master/notebooks/book1/20/word_analogies_torch.ipynb
Solving word analogies using pre-trained word embeddings
Based on D2L 14.7
http://d2l.ai/chapter_natural-language-processing-pretraining/similarity-analogy.html
In [1]:
In [2]:
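The code in the first two cells is not shown in this export. A minimal setup sketch, assuming the notebook imports JAX plus the standard utilities used in the cells below:

```python
import os
import zipfile
import urllib.request

import numpy as np
import jax.numpy as jnp
```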
Get pre-trained word embeddings
Pretrained embeddings are taken from:
GloVe website: https://nlp.stanford.edu/projects/glove/
fastText website: https://fasttext.cc/
In [3]:
In [4]:
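These cells likely define a container for pre-trained vectors, following the D2L section this notebook is based on. A sketch of such a `TokenEmbedding` class (the class name and file format follow D2L conventions and are assumptions, not the notebook's confirmed code):

```python
class TokenEmbedding:
    """Container for pre-trained word vectors stored as 'token v1 v2 ...' lines."""

    def __init__(self, vec_path):
        self.idx_to_token, self.idx_to_vec = self._load(vec_path)
        self.unknown_idx = 0  # '<unk>' sits at index 0 with an all-zero vector
        self.token_to_idx = {tok: i for i, tok in enumerate(self.idx_to_token)}

    def _load(self, vec_path):
        idx_to_token, vecs = ['<unk>'], []
        with open(vec_path, 'r', encoding='utf-8') as f:
            for line in f:
                elems = line.rstrip().split(' ')
                token, vec = elems[0], [float(e) for e in elems[1:]]
                if len(vec) > 1:  # skip header lines (fastText files start with one)
                    idx_to_token.append(token)
                    vecs.append(vec)
        vecs = [[0.0] * len(vecs[0])] + vecs  # zero vector for '<unk>'
        return idx_to_token, np.array(vecs)

    def __len__(self):
        return len(self.idx_to_token)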
Get a 50-dimensional GloVe embedding, with a vocabulary size of 400k.
In [5]:
Out[5]:
Downloading ../data/glove.6B.50d.zip from http://d2l-data.s3-accelerate.amazonaws.com/glove.6B.50d.zip...
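This cell triggers the download shown above. A sketch of a helper that would produce that message, using the d2l-data URL from the output; the name of the text file inside the archive (`vec.txt`) follows the D2L data-hub convention and is an assumption:

```python
def download_extract_glove(name='glove.6B.50d', data_dir='../data'):
    # Hypothetical helper standing in for the d2l download utility.
    url = f'http://d2l-data.s3-accelerate.amazonaws.com/{name}.zip'
    os.makedirs(data_dir, exist_ok=True)
    zip_path = os.path.join(data_dir, f'{name}.zip')
    if not os.path.exists(zip_path):
        print(f'Downloading {zip_path} from {url}...')
        urllib.request.urlretrieve(url, zip_path)
    extract_dir = os.path.join(data_dir, name)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(extract_dir)
    return extract_dir

glove_6b50d = TokenEmbedding(
    os.path.join(download_extract_glove('glove.6B.50d'), 'vec.txt'))
```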
In [6]:
Out[6]:
400001
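The extra entry beyond the 400,000 GloVe words is the `'<unk>'` token at index 0. A sketch of the check:

```python
len(glove_6b50d)  # 400001 = 400,000 GloVe tokens + 1 for '<unk>'
```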
Map from word to index and vice versa.
In [7]:
Out[7]:
(3367, 'beautiful')
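A sketch of the round-trip lookup consistent with the output above:

```python
glove_6b50d.token_to_idx['beautiful'], glove_6b50d.idx_to_token[3367]
# (3367, 'beautiful')
```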
In [8]:
In [9]:
In [10]:
Out[10]:
(400001, 50)
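These cells presumably move the embedding matrix onto a JAX device array and check its shape; a sketch:

```python
import jax.numpy as jnp

W = jnp.asarray(glove_6b50d.idx_to_vec)
W.shape  # (400001, 50)
```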
Finding most similar words
In [11]:
In [12]:
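These cells likely define a cosine-similarity k-nearest-neighbour search and a convenience wrapper, as in the D2L section. A JAX sketch whose print format matches the outputs below:

```python
import jax.numpy as jnp

def knn(W, x, k):
    """Return indices and cosine similarities of the k rows of W closest to x."""
    # The 1e-9 guards against division by zero for all-zero vectors such as '<unk>'.
    cos = (W @ x) / (jnp.linalg.norm(W, axis=1) * jnp.linalg.norm(x) + 1e-9)
    topk = jnp.argsort(-cos)[:k]
    return topk, cos[topk]

def get_similar_tokens(query, k, embed, W):
    x = W[embed.token_to_idx[query]]
    # Ask for k+1 neighbours: the nearest one is always the query word itself.
    topk, cos = knn(W, x, k + 1)
    for i, c in zip(topk[1:], cos[1:]):
        print(f'cosine sim={float(c):.3f}: {embed.idx_to_token[int(i)]}')
```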
In [13]:
Out[13]:
cosine sim=0.886: woman
cosine sim=0.856: boy
cosine sim=0.845: another
In [14]:
Out[14]:
cosine sim=0.815: bananas
cosine sim=0.787: coconut
cosine sim=0.758: pineapple
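The two result lists above are consistent with calls like the following (the query words 'man' and 'banana' are inferred from the outputs, so they are assumptions):

```python
get_similar_tokens('man', 3, glove_6b50d, W)     # woman, boy, another
get_similar_tokens('banana', 3, glove_6b50d, W)  # bananas, coconut, pineapple
```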
Word analogies
In [15]:
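This cell presumably defines the analogy solver: for a : b :: c : ?, return the word whose vector is nearest (in cosine similarity) to vec(b) - vec(a) + vec(c). A sketch reusing `knn` from above:

```python
def get_analogy(token_a, token_b, token_c, embed, W):
    a, b, c = (W[embed.token_to_idx[t]] for t in (token_a, token_b, token_c))
    topk, _ = knn(W, b - a + c, 1)
    return embed.idx_to_token[int(topk[0])]
```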
In [16]:
Out[16]:
'queen'
In [17]:
Out[17]:
'daughter'
In [18]:
Out[18]:
'japan'
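The three answers above are consistent with calls like the following (the input triples are inferred from the outputs and are therefore assumptions):

```python
get_analogy('man', 'woman', 'king', glove_6b50d, W)      # 'queen'
get_analogy('man', 'woman', 'son', glove_6b50d, W)       # 'daughter'
get_analogy('beijing', 'china', 'tokyo', glove_6b50d, W) # 'japan'
```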