Introduction to NLP with spaCy
spaCy parses every token in a text and assigns it a set of attributes, including the following:
Text: The original word text.
Lemma: The base form of the word.
POS: The simple part-of-speech tag.
Tag: The detailed part-of-speech tag.
Dep: Syntactic dependency, i.e. the relation between tokens.
Shape: The word shape – capitalisation, punctuation, digits.
is alpha: Does the token consist only of alphabetic characters?
is stop: Is the token part of a stop list, i.e. the most common words of the language?
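The attributes above map onto properties of each token. The following is a minimal sketch, assuming spaCy v3 and the small English model `en_core_web_sm`; the example sentence and the `rows` variable are illustrative and not taken from the original notebook. If the model is not installed, the sketch falls back to a blank English pipeline, in which case lemma, POS, tag and dep come back as empty strings.

```python
import spacy

# Assumption: "en_core_web_sm" is installed. If it is not, fall back to a
# blank English pipeline that only tokenizes (lemma/POS/tag/dep stay empty).
try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    nlp = spacy.blank("en")

doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# One dictionary per token, keyed by the attribute names listed above.
rows = [
    {
        "text": token.text,
        "lemma": token.lemma_,
        "pos": token.pos_,
        "tag": token.tag_,
        "dep": token.dep_,
        "shape": token.shape_,
        "is_alpha": token.is_alpha,
        "is_stop": token.is_stop,
    }
    for token in doc
]

for row in rows:
    print(row)
```

The attributes with a trailing underscore (`lemma_`, `pos_`, and so on) return readable strings; the versions without the underscore return integer IDs.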
Beyond Individual Words
Spacy allows you to extract a number of different document components that may be useful:
- Noun chunks
- Entities and entity types
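Both components are available directly on the processed document. A minimal sketch, assuming spaCy v3 and the `en_core_web_sm` model; the sentence is illustrative. Noun chunks need a dependency parse, so the sketch checks for one and returns an empty list when only the blank fallback pipeline is available.

```python
import spacy

# Assumption: "en_core_web_sm" is installed; otherwise fall back to a blank
# pipeline, which has no parser or entity recognizer.
try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    nlp = spacy.blank("en")

doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# Noun chunks require the dependency parse, so guard the call.
if doc.has_annotation("DEP"):
    noun_chunks = [chunk.text for chunk in doc.noun_chunks]
else:
    noun_chunks = []

# Each entity is a span with a text and a label such as ORG, GPE or MONEY.
entities = [(ent.text, ent.label_) for ent in doc.ents]

print("Noun chunks:", noun_chunks)
print("Entities:", entities)
```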
We can visualize the entities
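One way to do this is with spaCy's built-in `displacy` visualizer. A minimal sketch, assuming spaCy v3 and the `en_core_web_sm` model; the sentence is illustrative. With `jupyter=False` the call returns the markup as a string; inside a notebook, passing `jupyter=True` displays it inline instead.

```python
import spacy
from spacy import displacy

# Assumption: "en_core_web_sm" is installed; with the blank fallback no
# entities are found, so the rendered markup contains no highlights.
try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    nlp = spacy.blank("en")

doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# style="ent" highlights named entities in the running text.
html = displacy.render(doc, style="ent", jupyter=False)

print(html[:300])
```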
Exercise:
Go find a news article and paste it in here: https://explosion.ai/demos/displacy-ent. Check how well the algorithm does.
We can extract individual sentences
The sentence is often the most useful level of analysis
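Sentences are exposed through the document's `sents` iterator. A minimal sketch, assuming spaCy v3 and the `en_core_web_sm` model; the text is illustrative. The blank fallback adds the rule-based `sentencizer` component, which splits on punctuation rather than using the parser.

```python
import spacy

# Assumption: "en_core_web_sm" is installed. The fallback uses a blank
# pipeline plus the rule-based sentencizer so that doc.sents still works.
try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    nlp = spacy.blank("en")
    nlp.add_pipe("sentencizer")

text = (
    "Spacy splits a document into sentences. "
    "Each sentence is a span of tokens. "
    "Spans can be analysed on their own."
)
doc = nlp(text)

# Each sentence is a Span; .text gives its string form.
sentences = [sent.text for sent in doc.sents]

for i, sentence in enumerate(sentences, start=1):
    print(i, sentence)
```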
We can also visualize the dependency tree
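The same `displacy` visualizer can draw the dependency tree. A minimal sketch, assuming spaCy v3 and the `en_core_web_sm` model; the sentence is illustrative. It also lists each token with its dependency label and head, which is the information the drawing encodes. With the blank fallback there is no parse, so the labels are empty and the drawing has no arcs.

```python
import spacy
from spacy import displacy

# Assumption: "en_core_web_sm" is installed; the blank fallback has no
# parser, so dependency labels are empty and no arcs are drawn.
try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    nlp = spacy.blank("en")

doc = nlp("The quick brown fox jumps over the lazy dog")

# (token, dependency label, head token) for every token in the sentence.
arcs = [(token.text, token.dep_, token.head.text) for token in doc]
for arc in arcs:
    print(arc)

# style="dep" renders the tree as SVG; use jupyter=True inside a notebook.
svg = displacy.render(doc, style="dep", jupyter=False)
print(svg[:200])
```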
Why are all these features useful? Let's explore with Pride and Prejudice
Using an example of Pride and Prejudice adapted from:
I didn't run the cell below because of memory constraints
But let's look at where I got it from: