Project 3: Web APIs & Classification

Description

For Project 3, your goal is twofold:

Using Reddit's API, you'll collect posts from two subreddits of your choosing.
You'll then use NLP to train a classifier that predicts which subreddit a given post came from. This is a binary classification problem.
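One way to frame the two-subreddit setup as a binary problem is to tag every post with a 0/1 target. The sketch below assumes your scraped posts sit in a pandas DataFrame with subreddit, title and selftext columns (field names taken from Reddit's JSON). The helper label_posts, the second subreddit "chess", and the choice to merge title and body into one text column are illustrative assumptions, not part of the assignment.

```python
import pandas as pd


def label_posts(df: pd.DataFrame, positive_subreddit: str) -> pd.DataFrame:
    """Return a copy of df with a combined text column and a 0/1 target column."""
    # A binary problem needs exactly two classes.
    if df["subreddit"].nunique() != 2:
        raise ValueError("expected posts from exactly two subreddits")
    out = df.copy()
    # Merge title and body so the classifier sees all of a post's words.
    out["text"] = out["title"].fillna("") + " " + out["selftext"].fillna("")
    # 1 for the chosen subreddit, 0 for the other one.
    out["target"] = (out["subreddit"] == positive_subreddit).astype(int)
    return out


# Usage with two hypothetical posts ("chess" is a placeholder second subreddit).
posts = pd.DataFrame(
    {
        "subreddit": ["boardgames", "chess"],
        "title": ["Favorite worker placement game?", "Best opening for beginners?"],
        "selftext": ["Looking for suggestions.", None],
    }
)
labeled = label_posts(posts, positive_subreddit="boardgames")
print(labeled[["text", "target"]])
```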

About the API

Reddit's API is fairly straightforward. For example, if I want the posts from /r/boardgames, all I have to do is add .json to the end of the URL: https://www.reddit.com/r/boardgames.json
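A minimal sketch of that request with the requests library follows. It assumes you send a descriptive User-Agent header (Reddit tends to throttle or reject generic ones; the string below is a placeholder) and that the response has Reddit's listing shape, where posts sit in a children list under data and each post's fields sit in its own data object. The helper names are hypothetical; parse_listing is kept apart from the network call so you can try it on a saved response.

```python
def listing_url(subreddit: str) -> str:
    # Adding .json to a subreddit's URL returns its front page as JSON.
    return f"https://www.reddit.com/r/{subreddit}.json"


def fetch_listing(subreddit: str) -> dict:
    import requests  # imported here so parse_listing works without requests installed

    response = requests.get(
        listing_url(subreddit),
        headers={"User-Agent": "project3-scraper/0.1 (placeholder)"},
        timeout=10,
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx, e.g. 429 Too Many Requests
    return response.json()


def parse_listing(listing: dict) -> list:
    """Keep a few fields from each post in a Reddit listing."""
    posts = []
    for child in listing["data"]["children"]:
        post = child["data"]
        posts.append(
            {
                "subreddit": post.get("subreddit"),
                "title": post.get("title", ""),
                "selftext": post.get("selftext", ""),
            }
        )
    return posts
```

Calling parse_listing(fetch_listing("boardgames")) would return one small dict per post on the subreddit's front page.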

To help you get started, we have a primer video on how to use Reddit's API: https://www.youtube.com/watch?v=5Y3ZE26Ciuk

Requirements

Scrape and prepare your data using the requests library.
Create and compare two models. One of these must be a random forest; the other can be a classifier of your choosing (logistic regression, KNN, SVM, etc.).
A Jupyter Notebook with your analysis for a peer audience of data scientists.
An executive summary of the results you found.
A short presentation outlining your process and findings for a semi-technical audience.
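The model comparison could look something like the sketch below, which assumes scikit-learn and pits a random forest against logistic regression (one of the suggested alternatives). Each model is wrapped in a pipeline with a TF-IDF vectorizer, so during cross-validation the vocabulary is learned only from the training folds. TF-IDF features, 3-fold accuracy, the hyperparameters and the compare_models name are illustrative choices, not requirements.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


def compare_models(texts, labels, cv=3):
    """Return mean cross-validated accuracy for each candidate model."""
    models = {
        # Required model: a random forest on TF-IDF features.
        "random_forest": make_pipeline(
            TfidfVectorizer(),
            RandomForestClassifier(n_estimators=200, random_state=42),
        ),
        # Second model of your choosing; logistic regression is one option.
        "logistic_regression": make_pipeline(
            TfidfVectorizer(),
            LogisticRegression(max_iter=1000),
        ),
    }
    scores = {}
    for name, model in models.items():
        # The vectorizer is refit inside each fold, so test folds stay unseen.
        fold_scores = cross_val_score(model, texts, labels, cv=cv, scoring="accuracy")
        scores[name] = fold_scores.mean()
    return scores
```

You would pass it the post texts and their 0/1 labels; it returns a dict of mean accuracies keyed by model name. For the write-up, it also helps to report the majority-class rate so the two scores have a baseline.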

Pro tip 1: Reddit will give you 25 posts per request. To get enough data, you'll need to hit Reddit's API repeatedly (most likely in a for loop). Be sure to call time.sleep() at the end of your loop to allow for a break between requests. THIS IS CRUCIAL.
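A sketch of that loop follows. Sending the identical request again would return the same 25 posts, so the loop passes the after value from each response (Reddit's pagination cursor) into the next request and stops early once after comes back empty. It sleeps at the end of each pass as the tip advises. The 2-second pause, the 40-request default, the placeholder User-Agent and the fetch parameter (which lets you swap in a fake fetcher for testing) are assumptions, not values from the assignment.

```python
import time


def fetch_page(subreddit, after=None):
    """Request one page (25 posts) of a subreddit's listing."""
    import requests  # imported lazily so scrape() can run with a fake fetcher

    response = requests.get(
        f"https://www.reddit.com/r/{subreddit}.json",
        params={"after": after} if after else None,
        headers={"User-Agent": "project3-scraper/0.1 (placeholder)"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


def scrape(subreddit, max_requests=40, pause=2.0, fetch=fetch_page):
    """Collect raw post dicts, one page per request, pausing between requests."""
    posts = []
    after = None
    for _ in range(max_requests):
        listing = fetch(subreddit, after)
        posts.extend(child["data"] for child in listing["data"]["children"])
        after = listing["data"].get("after")
        if after is None:  # no further pages
            break
        time.sleep(pause)  # the break between requests
    return posts
```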

Pro tip 2: The API will cap you at 1,000 posts for each subreddit (assuming the subreddit has that many posts).
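At 25 posts per request, reaching the 1,000-post cap takes 40 requests per subreddit. The small helper below just does that arithmetic and estimates the time spent sleeping; the 2-second pause is an assumed value, not one Reddit specifies.

```python
import math


def plan_scrape(cap=1000, per_request=25, pause=2.0):
    """Return (requests needed to reach the cap, seconds spent in time.sleep)."""
    n_requests = math.ceil(cap / per_request)
    # Assumes one pause after every request, as in a loop that sleeps at the end.
    return n_requests, n_requests * pause


print(plan_scrape())  # (40, 80.0)
```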

Pro tip 3: At the end of each iteration of your loop, be sure to save the results from your scrape as a CSV (JSON from Reddit > pandas DataFrame > CSV). That way, if something goes wrong in your loop, you won't lose all your data.
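One way to follow that JSON > DataFrame > CSV chain inside the loop is sketched below with pandas. It assumes you keep a list of per-page DataFrames and rewrite the whole CSV after each page, so the file on disk always holds everything scraped so far. The COLUMNS list and the helper names are hypothetical.

```python
import pandas as pd

# Hypothetical choice of fields to keep from each post.
COLUMNS = ["name", "subreddit", "title", "selftext"]


def listing_to_frame(listing: dict) -> pd.DataFrame:
    """JSON from Reddit -> pandas DataFrame, one row per post."""
    rows = [child["data"] for child in listing["data"]["children"]]
    return pd.DataFrame(rows).reindex(columns=COLUMNS)


def checkpoint(frames: list, listing: dict, path: str) -> pd.DataFrame:
    """Add one page to frames, then DataFrame -> CSV for everything so far."""
    frames.append(listing_to_frame(listing))
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(path, index=False)  # overwrite with the full set each time
    return combined
```

Inside the scraping loop you would call checkpoint(frames, listing, path) right after each request, with frames starting as an empty list and path pointing at a CSV of your choosing.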

gbkgwyneth/GA-DSI-project-03

Folders and files

Latest commit

History

Repository files navigation

Project 3: Web APIs & Classification

Description

About the API

Requirements

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages