GitHub Repository: YStrano/DataScience_GA
Path: blob/master/lessons/lesson_14/code/starter-code/L14-Starter-Code.ipynb
Kernel: Python 3
# Unicode Handling
from __future__ import unicode_literals
import codecs

import numpy as np

# spacy is used for pre-processing and traditional NLP
import spacy
nlp = spacy.load('en_core_web_sm')

# Loading the tweet data
filename = '../../assets/dataset/captured-tweets.txt'
tweets = []
for tweet in codecs.open(filename, 'r', encoding="utf-8"):
    tweets.append(tweet)
len(tweets)
4905
tweets[0]
'I made a(n) Small Tourmaline in Paradise Island! https://t.co/cAoW1b6DRc #Gameinsight #Androidgames #Android\n'

Exercise 1a

Write a function that takes a sentence parsed by spacy and identifies whether it mentions a company named 'Google'. Remember, spacy finds named entities and labels organizations, including companies, as ORG. Look at the slides for class 13 if you need a hint.

Bonus (1b)

Parameterize the company name so that the function works for any company.

def mentions_company(parsed):
    # Return True if the sentence contains an organization and that organization is Google
    for entity in parsed.ents:
        # Fill in code here
        pass  # placeholder so the starter code parses; replace with your check
    # Otherwise return False
    return False

# 1b

for tweetnum, tweet in enumerate(tweets[:200]):
    if mentions_company(nlp(tweet)):
        print(tweetnum, '\t', tweet)

def mentions_company(parsed, company='Google'):
    # Your code here
    pass
16    I've entered to win a Google Nexus 6P from ! https://t.co/4vFHfhaBey
20    I LOVE your Google plus page with the other girls! 💜😆
23    RT @ShowerThoughtts: Apple has "air", Amazon has "Fire", Google has "earth", why doesn't Microsoft have "water"?
24    -Looks up on Google 'MikexJeremy' secretly- <33 ;) [@FnafSchimdt,@MikeSchmit10,]#SenpaiBot~
38    Europe could produce a Facebook \' and the Google of healthcare
43    Google, you've failed me for the first time! Why can't you tell me what these crystals are inside my plum? https://t.co/vXmONcnTom
53    RT https://t.co/s2HOfLUV2t #Google '#Android N' Will Not Use #Oracle's #Java APIs https://t.co/fBMNzW4wdT gunjanraik
57    @ImbruedJoint Hector had softly tilted his head towards Stuart in a confused sort of way. "What's Google?" He couldn't possibly know what --
65    RT @FortuneMagazine: Here’s how Google is taking on Uber in 2016 https://t.co/X5STl6P5Xh
67    shenbrood: What's your greatest #digitalmarketing #contentmarketing blunder? Well for me it's the ever changing Google algorith. What abo…
68    https://t.co/tkiiygFXRd What each state #Google d more than... #me #love #picoftheday #follow
70    Europe could produce a Facebook - and the Google of healthcare https://t.co/xEwaWcPRNF
72    https://t.co/GVac2xs9OO Get it: Google's latest free Android wallpaper shows useful phone ... https://t.co/KdbaKUYglP #Android
80    RT @chrismessina: Tell me that Zuck isn't planning to take on Google with Messenger... #ConvComm https://t.co/2ludfQPLTl https://t.co/8QbN…
82    https://t.co/wiRZHnHUYV RT ShowerThoughtts: Apple has "air", Amazon has "Fire", Google has "earth", why doesn't M… https://t.co/O69Q8NpBmK
88    I've entered to win a Google Nexus 6P from @MakeUseOf ! https://t.co/tiM479eSDN #giveaway #competition
91    GoogleExpertUK : #Google supports Swedish Chef from The Muppets! MuppetsStudio #internet #… https://t.co/pUrY7IHU4y) https://t.co/e43mzDlDzw
99    @Xen_Games Google doesn't say what 5 out 7 means why not 7 out of 7?????
100   @MakingStarWars Ugh.
The rights sign messes up the link. Google 'force awakens 70mm' and the IMAX link should send you there.
111   shenbrood: What's your greatest #digitalmarketing #contentmarketing blunder? Well for me it's the ever changing Google algorith. What abo…
113   #creativitybooster: Communicate stupid affiliate at Google. TOP 10 ideas: https://t.co/9Dww7xv6t2 #growthhacking #blogging
116   Just like new college students are at risk for the Freshman 15, Google employees joke about the Google 15 from all the snaks available.
122   Google has no chill https://t.co/Rs2QG2NXQa
142   @Mrkenneyy I saw ur tweet no need for Google lol it's a very large dog
151   RT shenbrood What's your greatest #digitalmarketing #contentmarketing blunder? Well for me it's the ever changing Google algorith. What a…
162   I've entered to win a Google Nexus 6P from @MakeUseOf ! https://t.co/l1VzzbljaW #giveaway #competition
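
One possible way to approach 1a and 1b is sketched below. It is not the official solution: it assumes an exact, case-insensitive match between the entity text and the company name is what you want, which is a choice of this sketch. It relies only on `parsed.ents`, `entity.label_` and `entity.text` from spacy.

```python
def mentions_company(parsed, company='Google'):
    """Return True if an ORG entity in the parsed text matches `company`."""
    # Assumption of this sketch: compare names case-insensitively
    target = company.lower()
    for entity in parsed.ents:
        # label_ is the string form of the entity type, e.g. 'ORG'
        if entity.label_ == 'ORG' and entity.text.lower() == target:
            return True
    # No matching organization was found
    return False
```

With the model loaded as above, `mentions_company(nlp(tweet))` checks for Google and `mentions_company(nlp(tweet), 'Apple')` checks for another company. An exact match skips longer entities such as a product name that merely contains the company; swapping the comparison for `target in entity.text.lower()` would loosen that, at the cost of more false positives.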

Exercise 1c

Write a function that takes a sentence parsed by spacy and returns the verbs of the sentence (preferably lemmatized).

def get_actions(parsed):
    actions = []
    # Your code here. You'll want to use actions.append()
    return actions
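
A minimal sketch of 1c, assuming that "verbs" means tokens whose coarse part-of-speech tag is `VERB` (auxiliaries, which spacy's current models tag as `AUX`, are left out). It iterates over the parsed sentence and reads each token's `pos_` and `lemma_` attributes.

```python
def get_actions(parsed):
    """Return the lemmas of every token tagged as a verb."""
    actions = []
    for token in parsed:
        # pos_ is the coarse part-of-speech tag; lemma_ is the base form
        if token.pos_ == 'VERB':
            actions.append(token.lemma_)
    return actions
```

Returning lemmas means 'announced', 'announces' and 'announcing' should all come back as 'announce', which keeps the later comparison against a fixed verb list simple.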

Exercise 1d

For each tweet, parse it using spacy and print it out if the tweet has 'release' or 'announce' as a verb. You'll need to use your mentions_company and get_actions functions.

for tweet in tweets:
    parsed = nlp(tweet)
    pass
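
The loop for 1d could be organized along these lines. The helper name `find_action_tweets` and its parameters are hypothetical, introduced only for this sketch; it takes the parser and your two functions as arguments so that it does not depend on any particular implementation of them.

```python
def find_action_tweets(tweets, parse, mentions, get_actions,
                       verbs=('release', 'announce')):
    """Return the tweets that pass `mentions` and contain a target verb.

    parse       -- callable turning raw text into a parsed object (e.g. nlp)
    mentions    -- callable(parsed) -> bool (e.g. mentions_company)
    get_actions -- callable(parsed) -> list of verb lemmas
    """
    wanted = set(verbs)
    matches = []
    for tweet in tweets:
        parsed = parse(tweet)
        # Keep the tweet only if both conditions hold
        if mentions(parsed) and wanted & set(get_actions(parsed)):
            matches.append(tweet)
    return matches
```

In the notebook this would be called as `find_action_tweets(tweets, nlp, mentions_company, get_actions)`, followed by a `print` for each returned tweet.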

Exercise 1e

Write a function that identifies countries - HINT: the entity label for countries is GPE (or GeoPolitical Entity)

def mentions_country(parsed, country):
    pass
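
A sketch of 1e under the same assumptions as the company check: an exact, case-insensitive comparison of the entity text. Because the GPE label also covers cities and states, the label alone does not identify a country, so the name is compared as well.

```python
def mentions_country(parsed, country):
    """Return True if a GPE entity in the parsed text matches `country`."""
    # Assumption of this sketch: compare names case-insensitively
    target = country.lower()
    return any(
        entity.label_ == 'GPE' and entity.text.lower() == target
        for entity in parsed.ents
    )
```

For example, `mentions_country(nlp(tweet), 'Iran')` is True only when the model tags a span reading 'Iran' as GPE in that tweet.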

Exercise 1f

Re-run (d) to find country tweets that discuss 'Iran' announcing or releasing.

for tweet in tweets:
    # Use the nlp object loaded at the top of the notebook
    parsed = nlp(tweet)
    pass
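
For 1f, one option is a generator that consumes already-parsed tweets and applies both checks inline. The name `country_action_texts` is hypothetical, and the matching rules (exact, case-insensitive country name; `VERB` tokens only) are assumptions of this sketch rather than part of the exercise.

```python
def country_action_texts(parsed_tweets, country='Iran',
                         verbs=('release', 'announce')):
    """Yield the text of each parsed tweet that names `country` as a GPE
    entity and contains one of the target verbs."""
    target = country.lower()
    wanted = set(verbs)
    for parsed in parsed_tweets:
        names_country = any(
            ent.label_ == 'GPE' and ent.text.lower() == target
            for ent in parsed.ents
        )
        has_verb = any(
            tok.pos_ == 'VERB' and tok.lemma_.lower() in wanted
            for tok in parsed
        )
        if names_country and has_verb:
            yield parsed.text
```

It can be fed from `nlp.pipe(tweets)`, which parses the texts as a stream instead of one call per tweet: `for text in country_action_texts(nlp.pipe(tweets)): print(text)`.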