Path: blob/master/lessons/lesson_14/code/solution-code/L14-Solutions.ipynb
Exercise 1a
Write a function that takes a sentence parsed by spacy and identifies whether it mentions a company named 'Google'. Remember, spacy can find entities and labels them as ORG if they are a company. Look at the slides for class 13 if you need a hint.
Bonus (1b)
Parameterize the company name so that the function works for any company.
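One possible shape for 1a and 1b together is sketched below. It assumes `parsed` is a spacy `Doc` or `Span`, and it matches entity text case-insensitively, which is a choice made here rather than something the exercise prescribes.

```python
def mentions_company(parsed, company="Google"):
    """Return True if any ORG entity in `parsed` matches `company`.

    `parsed` is assumed to be a spacy Doc or Span, i.e. anything with
    `.ents` whose items carry `.text` and `.label_`.
    """
    target = company.lower()
    for ent in parsed.ents:
        # Case-insensitive exact match on the entity text (an assumption;
        # a looser substring match may suit noisy tweets better).
        if ent.label_ == "ORG" and ent.text.lower() == target:
            return True
    return False


# Usage, assuming spacy and an English model are installed:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   mentions_company(nlp("Google has released a new phone."))
#   mentions_company(nlp("Verizon announced a new plan."), company="Verizon")
```

Whether a given tweet is actually tagged with an ORG entity depends on the model, so results on informal text can differ from what you expect.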
Exercise 1c
Write a function that takes a sentence parsed by spacy and returns the verbs of the sentence (preferably lemmatized).
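A minimal sketch of such a function, assuming `parsed` is a spacy `Doc` or `Span` whose tokens expose `.pos_` and `.lemma_`:

```python
def get_actions(parsed):
    """Return the lower-cased lemmas of all tokens tagged as verbs."""
    # Tokens tagged AUX (auxiliaries such as "has" in "has released",
    # depending on the model) are not included by this check.
    return [tok.lemma_.lower() for tok in parsed if tok.pos_ == "VERB"]


# Usage, assuming `nlp` is a loaded spacy pipeline:
#   get_actions(nlp("Google released a phone and announced a tablet."))
```

Returning lemmas rather than surface forms means 'released', 'releases' and 'releasing' can all be compared against the single string 'release'.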
Exercise 1d
For each tweet, parse it using spacy and print it out if the tweet has 'release' or 'announce' as a verb. You'll need to use your mentions_company and get_actions functions.
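The loop might look like the sketch below. The two helpers are repeated in compact form so the block runs on its own; `company_action_tweets` is a hypothetical name, and `tweets` (a list of strings) and `nlp` (a loaded spacy pipeline) are assumed to exist already.

```python
def mentions_company(parsed, company="Google"):
    """True if an ORG entity in `parsed` matches `company` (case-insensitive)."""
    return any(
        ent.label_ == "ORG" and ent.text.lower() == company.lower()
        for ent in parsed.ents
    )


def get_actions(parsed):
    """Lower-cased lemmas of the tokens tagged VERB."""
    return [tok.lemma_.lower() for tok in parsed if tok.pos_ == "VERB"]


def company_action_tweets(tweets, nlp, company="Google",
                          actions=("release", "announce")):
    """Return the tweets that mention `company` and use one of `actions` as a verb."""
    wanted = set(actions)
    matches = []
    for text in tweets:
        parsed = nlp(text)  # parse each tweet once and reuse the result
        if mentions_company(parsed, company) and wanted & set(get_actions(parsed)):
            matches.append(text)
    return matches


# Usage, assuming `tweets` and `nlp` are already defined:
#   for tweet in company_action_tweets(tweets, nlp):
#       print(tweet)
```

Note that this only checks that the company and the verb occur in the same tweet, not that the company is the one doing the releasing or announcing.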
Exercise 1e
Write a function that identifies countries. Hint: the entity label for countries is GPE (geopolitical entity).
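A short sketch of this, assuming `parsed` is a spacy `Doc` or `Span`. The names `get_countries` and `mentions_country` are hypothetical. Keep in mind that GPE also covers cities and states, so the result is not limited to countries.

```python
def get_countries(parsed):
    """Return the text of every entity labelled GPE in `parsed`."""
    # GPE includes cities and states as well as countries, so this list
    # can contain more than country names.
    return [ent.text for ent in parsed.ents if ent.label_ == "GPE"]


def mentions_country(parsed, country):
    """True if one of the GPE entities matches `country` (case-insensitive)."""
    return country.lower() in {name.lower() for name in get_countries(parsed)}


# Usage, assuming `nlp` is a loaded spacy pipeline:
#   get_countries(nlp("Talks between Iran and Syria resumed in Geneva."))
```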
Exercise 1f
Re-run (1d) to find tweets in which the country 'Iran' is announcing or releasing something.
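Simply swapping the ORG check for a GPE check finds tweets where 'Iran' and the verb co-occur. The sketch below goes one step further and uses the dependency parse to require that 'Iran' is the grammatical subject of the verb. This is one interpretation of the exercise, `country_performs_action` is a hypothetical name, and only the active-voice `nsubj` relation is handled.

```python
def country_performs_action(parsed, country, actions=("announce", "release")):
    """True if a GPE token matching `country` is the subject of one of `actions`.

    `parsed` is assumed to be a spacy Doc whose tokens expose `.text`,
    `.ent_type_`, `.dep_` and `.head`.
    """
    wanted = set(actions)
    target = country.lower()
    for tok in parsed:
        is_country = tok.ent_type_ == "GPE" and tok.text.lower() == target
        # `tok.head` is the token this one attaches to in the dependency
        # tree; for a subject that is normally the verb.
        if is_country and tok.dep_ == "nsubj" and tok.head.lemma_.lower() in wanted:
            return True
    return False


# Usage, assuming `tweets` and `nlp` are already defined:
#   iran_tweets = [t for t in tweets if country_performs_action(nlp(t), "Iran")]
```

Dependency parses of tweets can be unreliable, so it is worth comparing this against the simpler co-occurrence filter before trusting either.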
Exercise 2
Build a word2vec model of the tweets we have collected using gensim. First take the collection of tweets and tokenize them using spacy.
Exercise 2a
Think about how this should be done.
Should you keep the original case or convert everything to lower-case?
Should you remove punctuation or symbols?
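One set of answers to these questions is sketched below: lower-case everything and drop punctuation, whitespace and URLs. These are choices to experiment with, not the only reasonable ones. `tokenize` is a hypothetical name and `parsed` is assumed to be a spacy `Doc`.

```python
def tokenize(parsed, lowercase=True):
    """Turn a parsed tweet into a list of word strings for word2vec."""
    tokens = []
    for tok in parsed:
        # Skip tokens that carry little meaning of their own in a tweet.
        if tok.is_punct or tok.is_space or tok.like_url:
            continue
        tokens.append(tok.text.lower() if lowercase else tok.text)
    return tokens


# Usage, assuming `tweets` and `nlp` are already defined:
#   tokenized_tweets = [tokenize(nlp(t)) for t in tweets]
```

Lower-casing merges 'Iran' and 'iran' into one vocabulary entry, which gives each word more training examples; the cost is that later lookups must use the lower-cased form too.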
Exercise 2b
Build a word2vec model. Experiment with the window size as well: this is the number of surrounding words on each side that are used as context when modeling a word. What do you think is appropriate for Twitter?
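To make the window size concrete, the sketch below lists the (target, context) word pairs that a given window exposes in one tokenized tweet. It is an illustration in plain Python, not gensim's training code, and `context_pairs` is a hypothetical name.

```python
def context_pairs(tokens, window):
    """List every (target, context) pair within `window` positions of each other."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


tweet = ["iran", "announces", "new", "sanctions"]
narrow = context_pairs(tweet, window=1)  # only direct neighbours are paired
wide = context_pairs(tweet, window=3)    # every word pairs with every other word
print(len(narrow), len(wide))
```

Because tweets are short, a large window quickly covers the whole tweet. With gensim installed (version 4 or later is assumed here; older releases name the first parameter `size`), the model itself can be built with something like `Word2Vec(sentences=tokenized_tweets, vector_size=100, window=3, min_count=2)`, where `tokenized_tweets` is a list of token lists and the specific values are starting points rather than recommendations.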
Exercise 2c
Test your word2vec model with a few similarity functions.
Find words similar to 'Syria'.
Find words similar to 'war'.
Find words similar to 'Iran'.
Find words similar to 'Verizon'.
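Similarity queries on word vectors are typically cosine-similarity rankings. The sketch below shows that ranking in plain Python on a handful of made-up three-dimensional vectors. The vectors are invented purely for illustration and are not output from any trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, topn=3):
    """Rank every other word in `vectors` by cosine similarity to `word`."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy vectors, invented for this example only.
toy_vectors = {
    "syria": [0.9, 0.1, 0.0],
    "iran": [0.8, 0.2, 0.1],
    "war": [0.5, 0.5, 0.0],
    "verizon": [0.0, 0.1, 0.9],
}

for query in ("syria", "war", "iran", "verizon"):
    print(query, [w for w, _ in most_similar(query, toy_vectors)])
```

With a trained gensim model the equivalent query is `model.wv.most_similar('syria', topn=10)`. If the tweets were lower-cased during tokenization, the query word has to be lower-cased as well, and a word that never made it into the vocabulary (for example because of `min_count`) cannot be looked up.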
Exercise 2d
Adjust the choices in (b) and (c) as necessary.
Exercise 3
Filter tweets to those that mention 'Iran' or similar entities and 'war' or similar entities.
Do this using just spacy.
Do this using word2vec similarity scores.
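For the word2vec variant, one approach is to expand each seed word into a set of similar terms and keep the tweets that contain at least one term from each set. The sketch below assumes tweets are already tokenized into lower-cased lists. `expand_terms`, `filter_tweets`, the `similar_fn` callable and the `min_score` threshold are all hypothetical choices made for illustration.

```python
def expand_terms(seed, similar_fn, topn=5, min_score=0.6):
    """Return `seed` plus its neighbours scoring at least `min_score`.

    `similar_fn(word, topn)` is assumed to return (word, score) pairs,
    e.g. a thin wrapper around a word2vec model's similarity query.
    """
    terms = {seed}
    for word, score in similar_fn(seed, topn):
        if score >= min_score:
            terms.add(word)
    return terms


def filter_tweets(tokenized_tweets, group_a, group_b):
    """Keep tweets containing at least one term from each group."""
    kept = []
    for tokens in tokenized_tweets:
        present = set(tokens)
        if group_a & present and group_b & present:
            kept.append(tokens)
    return kept


# Usage, assuming a trained gensim model `model` and `tokenized_tweets`:
#   similar = lambda word, n: model.wv.most_similar(word, topn=n)
#   iran_like = expand_terms("iran", similar)
#   war_like = expand_terms("war", similar)
#   matches = filter_tweets(tokenized_tweets, iran_like, war_like)
```

The threshold controls how far 'similar' stretches: set it too low and unrelated words leak into the term sets, too high and the filter reduces to an exact match on the seed words.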