Path: blob/master/lessons/lesson_15/2016_primary_speeches.ipynb
Kernel: Python [conda env:ga]
In [1]:
Explore the Capitol Words Dataset
This dataset comprises roughly 11,000 speeches (11,376 rows) made in Congress by Representatives and Senators who threw their hats into the ring in the 2016 primaries. Note that the dataset goes back a long way (as early as 1996 for Bernie Sanders).
In [7]:
Out[7]:
In [5]:
Out[5]:
(11376, 8)
In [8]:
Out[8]:
Unnamed: 0 int64
chamber object
congress int64
date datetime64[ns]
speaker_name object
speaker_party object
text object
title object
dtype: object
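The code cells that produced the shape and dtypes above are not shown, so here is a minimal sketch of how such a frame might be loaded and inspected. The file name is not given in this notebook, so the example reads a made-up two-row CSV from memory instead. The placeholder rows and the variable names `raw_csv` and `speeches` are assumptions for illustration only. It also shows where a column like `Unnamed: 0` usually comes from: a saved index column with an empty header.

```python
import io

import pandas as pd

# Made-up stand-in rows; the real file name and contents are not shown here.
# The header starts with a comma, i.e. the first column has no name.
raw_csv = io.StringIO(
    ",chamber,congress,date,speaker_name,speaker_party,text,title\n"
    "0,House,104,1996-01-03,Speaker A,I,placeholder text one,PLACEHOLDER TITLE\n"
    "1,Senate,110,2007-05-10,Speaker B,D,placeholder text two,ANOTHER TITLE\n"
)

# parse_dates converts the 'date' strings into datetime values.
speeches = pd.read_csv(raw_csv, parse_dates=["date"])

print(speeches.shape)   # (rows, columns)
print(speeches.dtypes)  # one dtype per column
```

If the unnamed column is just a saved index, passing `index_col=0` to `pd.read_csv` would keep it out of the columns.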
Who were the speakers and how many speeches did they make in the dataset?
Notice that one speech is incorrectly coded as 'Joe Biden' rather than 'Joseph Biden'.
In [10]:
Out[10]:
Bernie Sanders 2241
Joseph Biden 1854
Rick Santorum 1613
Mike Pence 1238
Lindsey Graham 1158
Hillary Clinton 830
Rand Paul 455
Barack Obama 411
Jim Webb 381
Ted Cruz 365
Marco Rubio 359
John Kasich 316
Lincoln Chafee 154
Joe Biden 1
Name: speaker_name, dtype: int64
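One way to fold the stray 'Joe Biden' label into 'Joseph Biden' before counting is sketched below. The small frame is a toy stand-in (its counts are made up), and the variable names are assumptions; only the `speaker_name` column name comes from the output above.

```python
import pandas as pd

# Toy stand-in for the speaker_name column (counts here are made up).
speeches = pd.DataFrame(
    {"speaker_name": ["Joseph Biden", "Joseph Biden", "Joe Biden", "Bernie Sanders"]}
)

# Map the stray label onto the canonical one; other names pass through unchanged.
speeches["speaker_name"] = speeches["speaker_name"].replace(
    {"Joe Biden": "Joseph Biden"}
)

# Recount speeches per speaker after the fix.
counts = speeches["speaker_name"].value_counts()
print(counts)
```

Doing this before any per-speaker analysis avoids a one-speech "speaker" showing up as its own group.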
Let's look at one speech: the first in the dataset.
In [22]:
Out[22]:
u'Mr. Speaker, 480,000 Federal employees are working without pay, a form of involuntary servitude; 280,000 Federal employees are not working, and they will be paid. Virtually all of these workers have mortgages to pay, children to feed, and financial obligations to meet.\r\nMr. Speaker, what is happening to these workers is immoral, is wrong, and must be rectified immediately. Newt Gingrich and the Republican leadership must not continue to hold the House and the American people hostage while they push their disastrous 7-year balanced budget plan. The gentleman from Georgia, Mr. Gingrich, and the Republican leadership must join Senator Dole and the entire Senate and pass a continuing resolution now, now to reopen Government.\r\nMr. Speaker, that is what the American people want, that is what they need, and that is what this body must do.'
Your Task
Choose some way to analyze this dataset using LDA.
Options include:
Looking at how topics (or key words in topics) change over time for one speaker,
Comparing topics across speakers,
Comparing topics for House vs. Senate, Republican vs. Democrat, etc.,
Checking whether the topics that arise from just the titles are interesting,
or any other interesting idea you have.
In [ ]: