NLP Course 2 Week 1 Lesson: Building The Model - Lecture Exercise 02
Estimated Time: 20 minutes
Candidates from String Edits
Create a list of candidate strings by applying an edit operation
Imports and Data
Splits
Find all the ways you can split a word into two parts!
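A minimal sketch of the split step, assuming the example word 'dearz' used later in this exercise. The names `word` and `splits` are illustrative and may not match the original notebook cell.

```python
# Example word (assumed from the delete walkthrough later in this exercise).
word = "dearz"

# Split at every index from 0 to len(word) inclusive,
# producing one (left, right) pair per split position.
splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]

for left, right in splits:
    print(repr(left), repr(right))
```

A word of length n yields n + 1 splits, including the two where one side is the empty string.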
Delete Edit
Delete a letter from each string in the splits list.
This effectively deletes each possible letter from the original word being edited, one position at a time.
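The delete step can be sketched with a plain loop. This is one possible version, not necessarily the notebook's own cell. It rebuilds the splits so that it runs on its own, and the name `deletes` is an assumption.

```python
word = "dearz"
splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]

deletes = []
for left, right in splits:
    # Skip the final split: its right part is empty, so there is nothing to delete.
    if right:
        # Drop the first character of the right part, then rejoin the two parts.
        deletes.append(left + right[1:])

print(deletes)  # ['earz', 'darz', 'derz', 'deaz', 'dear']
```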
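It's worth taking a closer look at how this executes a 'delete'.
Take the first item from the splits list: its left part is the empty string and its right part is the whole word, 'dearz'.
Dropping the first character of the right part and joining what remains to the left part gives 'earz', so the end result transforms 'dearz' to 'earz' by deleting the first character.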
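The walkthrough above can be traced step by step. The variable names are illustrative only.

```python
# First item of the splits list for 'dearz': empty left part, full word on the right.
left, right = "", "dearz"

# Slicing from index 1 drops the first character of the right part.
remainder = right[1:]
print(remainder)  # earz

# Joining the left part back on gives the delete-edited string.
result = left + remainder
print(result)  # earz
```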
You can use either a loop or a list comprehension to do this for the entire splits list.
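As a sketch, the same delete edit can be written as a single list comprehension. The names `splits` and `deletes` are assumptions, and the splits are rebuilt here so that the example is self-contained.

```python
word = "dearz"
splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]

# For each split with a non-empty right part, drop that part's first character.
deletes = [left + right[1:] for left, right in splits if right]

print(deletes)  # ['earz', 'darz', 'derz', 'deaz', 'dear']
```

The `if right` filter plays the same role as the guard in a loop version: it skips the last split, where there is no character left to delete.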
Ungraded Exercise
You now have a list of candidate strings created after performing a delete edit.
The next step is to filter this list for candidate words found in a vocabulary.
Given the example vocab below, can you think of a way to create a list of candidate words?
Remember, you already have a list of candidate strings, some of which are certainly not actual words you might find in your vocabulary!
So from the list above ('earz', 'darz', 'derz', 'deaz', 'dear'), you're really only interested in 'dear'.
Expected Outcome:
vocab : ['dean', 'deer', 'dear', 'fries', 'and', 'coke']
edits : ['earz', 'darz', 'derz', 'deaz', 'dear']
candidate words : {'dear'}
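One possible way to reach the expected outcome is a set intersection between the vocabulary and the edits. This is only a sketch of one approach; the exercise may intend a different one, and the name `candidate_words` is an assumption.

```python
vocab = ['dean', 'deer', 'dear', 'fries', 'and', 'coke']
edits = ['earz', 'darz', 'derz', 'deaz', 'dear']

# Keep only the edited strings that also appear in the vocabulary.
candidate_words = set(vocab).intersection(edits)

print(candidate_words)  # {'dear'}
```

Using a set also removes duplicates, which matters once several edit types can produce the same candidate string.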
Summary
You've unpacked an integral part of the assignment by breaking down splits and edits, specifically looking at deletes here.
Implementation of the other edit types (insert, replace, switch) follows a similar methodology and should now feel somewhat familiar when you see them.
This bit of the code isn't as intuitive as other sections, so well done!
You should now feel confident facing some of the more technical parts of the assignment at the end of the week.