Stuck at home while restaurants are closed and offices are slow to reopen, Americans are adapting by spending more time online and, as a result, consuming more media. The United States is also approaching an election that many Americans may consider as important as any before it.
I want to find out which historically important voter issues Americans are most focused on today, and whether those issues have changed compared with recent election years. For this project I will use the New York Times API to retrieve article data, assuming that this data is representative of the content the average American voter consumes. The analysis will focus on articles published from March through July of each of the four most recent election years (2008, 2012, 2016 and 2020).
- Which historically important voter issues are appearing more or less frequently in today's media coverage compared with recent election years?
- Assuming the New York Times publishes news content based on the average voter's preferences, how have the average voter's core issues changed over time?
The New York Times API provides summary data for all articles published in a given month and year, including the keywords assigned to each article. Collecting these keywords and counting how often each one appears over time makes it possible to plot their frequencies, giving the reader a clearer picture of the average American's media consumption habits.
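A minimal sketch of the keyword-counting step, assuming the NYT Archive API endpoint format `https://api.nytimes.com/svc/archive/v1/{year}/{month}.json` and a response shaped like `{"response": {"docs": [{"keywords": [{"name": ..., "value": ...}]}]}}`. The helper names (`archive_urls`, `fetch_archive`, `count_subject_keywords`, `keyword_share`) are hypothetical, and only keywords whose `name` is `"subject"` are counted here, which is an assumption about which keyword type best captures voter issues. Raw counts are also divided by the number of articles, since publication volume differs from year to year.

```python
import json
import urllib.request
from collections import Counter

# Assumed Archive API URL format; supply your own API key.
ARCHIVE_URL = "https://api.nytimes.com/svc/archive/v1/{year}/{month}.json?api-key={key}"

ELECTION_YEARS = (2008, 2012, 2016, 2020)
MONTHS = (3, 4, 5, 6, 7)  # March through July


def archive_urls(key: str) -> list:
    """Build one archive URL per (election year, month) pair."""
    return [ARCHIVE_URL.format(year=y, month=m, key=key)
            for y in ELECTION_YEARS for m in MONTHS]


def fetch_archive(url: str) -> dict:
    """Download and parse one month of archive data (requires network access)."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


def count_subject_keywords(archive: dict) -> Counter:
    """Count 'subject' keywords across every article in one archive response."""
    counts = Counter()
    for doc in archive.get("response", {}).get("docs", []):
        for kw in doc.get("keywords", []):
            if kw.get("name") == "subject":
                counts[kw.get("value")] += 1
    return counts


def keyword_share(counts: Counter, n_articles: int) -> dict:
    """Normalize raw counts by article volume so different years are comparable."""
    if n_articles == 0:
        return {}
    return {k: v / n_articles for k, v in counts.items()}
```

Summing the per-month `Counter` objects for a year (`Counter` supports `+`) and passing the total article count to `keyword_share` gives one comparable frequency table per election year, ready for plotting.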
I would also like to use natural language processing (NLP) to answer these same questions. Each article returned by the NYT API includes its lead paragraph, so I can use the Natural Language Toolkit (NLTK) to perform my own keyword analysis instead of relying on the NYT's keywords. Using NLP techniques, I will be able to determine for myself which topics are being discussed most.
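As a rough illustration of the lead-paragraph analysis, the sketch below tokenizes each `lead_paragraph`, removes stopwords and counts term frequencies. It uses only the standard library so it runs without downloading NLTK corpora: the regex tokenizer and the short `STOPWORDS` set are stand-ins (assumptions, not NLTK behavior) for `nltk.word_tokenize`, `nltk.corpus.stopwords.words("english")` and `nltk.FreqDist`. It assumes the same archive response shape as the keyword-counting step, and `lead_paragraph_terms` is a hypothetical helper name.

```python
import re
from collections import Counter

# Small illustrative stopword list (includes the news-specific "said");
# NLTK's English stopwords corpus is far more complete.
STOPWORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
             "has", "have", "in", "is", "it", "its", "of", "on", "or", "said",
             "that", "the", "this", "to", "was", "were", "will", "with"}

# Lowercase alphabetic tokens, optionally with an apostrophe suffix ("voter's").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text: str) -> list:
    """Split text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def lead_paragraph_terms(archive: dict, top_n: int = 10) -> list:
    """Return the top_n most frequent non-stopword terms across lead paragraphs."""
    counts = Counter()
    for doc in archive.get("response", {}).get("docs", []):
        lead = doc.get("lead_paragraph") or ""  # some articles have no lead paragraph
        counts.update(t for t in tokenize(lead)
                      if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(top_n)
```

Single-word counts miss multi-word issues such as "health care"; counting bigrams (for example with `nltk.bigrams`) would be a natural next step.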