
Python script for retrieving offer information from /r/cscareerquestions salary sharing threads.

anders617/cscareerquestions-salaries

/r/cscareerquestions Salary Scraper

This script scrapes /r/cscareerquestions salary sharing threads for offer information and writes these details to a csv file. Currently it records info for the following fields: company, location, salary, relocation bonus, signing bonus, stock, and total compensation.

Overview

The /r/cscareerquestions subreddit hosts periodic salary sharing threads where people share details of their job offers (like this one).
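The sample output further down suggests that commenters answer a template of labeled fields (Company, Location, Salary, ...). The sketch below shows one way such a comment could be split into the fields listed above. It is not taken from salaries.py: it assumes each field sits on its own line as `Label: value`, it does not handle Markdown formatting such as bold labels, and the helper `parse_offer` and the example labels (for instance "Signing Bonus") are hypothetical.

```python
import re

# Field names matching the columns the script reports.
FIELDS = ("company", "location", "salary", "relocation", "signing", "stock", "total")

# Assumption: each field is on its own line as "<label>: <value>". The label only
# has to start with a field name, so "Signing Bonus:" maps to "signing".
LINE_RE = re.compile(
    r"^[ \t]*(" + "|".join(FIELDS) + r")[^:\n]*:[ \t]*(.*?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_offer(body: str) -> dict:
    """Return {field: value}; fields missing from the comment map to None."""
    offer = dict.fromkeys(FIELDS)
    for label, value in LINE_RE.findall(body):
        key = label.lower()
        if offer[key] is None and value:  # keep the first non-empty answer
            offer[key] = value
    return offer


# Hypothetical comment text built from one of the CSV rows shown later.
comment = """Company: Finance
Location: Boston
Salary: 80k
Relocation/Housing: 5k
Signing Bonus: 5k
Stock bonus: 0
Total comp: 85k"""

print(parse_offer(comment))
```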

If you just want the data and don't need to run the script, see output/salaries.csv.

Commenters don't use a common format when entering data (e.g. they often write text instead of numbers), so none of the fields are strictly numeric. This makes it hard to analyze salary, relocation, signing, or stock figures without serious cleanup of the data. For now, the output is mostly useful as a personal reference for what salaries to expect from various companies.
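As a possible first step toward that cleanup, the sketch below converts common amount formats such as "70k", "$75,000" and "52.5K" into numbers. `parse_amount` is a hypothetical helper, not part of the script; it only looks at the first number in the text and ignores currency and pay period.

```python
import re
from typing import Optional

# First number in the string: either thousands-separated ("75,000") or plain/decimal
# ("52.5"), optionally preceded by "$" and followed by a "k" meaning thousands.
AMOUNT_RE = re.compile(r"\$?\s*(\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)\s*([kK])?")


def parse_amount(text: str) -> Optional[float]:
    """Best-effort conversion of a free-text amount; returns None if no number is found."""
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))
    if match.group(2):  # "70k" -> 70000
        value *= 1000
    return value


for raw in ["70k", "$75,000", "52.5K", "None", "~150k a year? plus"]:
    print(f"{raw!r}: {parse_amount(raw)}")
```

Entries like "5 - 10%" or the typo "$45,00" in the sample output would still come out as 5.0 and 45.0, so values parsed this way need manual review.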

Setup And Run

Clone this repository:

git clone https://github.com/anders617/cscareerquestions-salaries.git

Install the PRAW Reddit API wrapper using either pip or conda:

pip install praw
conda install -c conda-forge praw

Install the python-dotenv library using either pip or conda:

pip install -U python-dotenv
conda install -c conda-forge python-dotenv

Next, you will need credentials to use the Reddit API.

Navigate to https://www.reddit.com/prefs/apps, click the "create app" button, and create an app to get a client id and client secret.

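You can find the CLIENT_ID and CLIENT_SECRET in the locations marked in the screenshot below:

[Screenshot: the Reddit app page with the client id and client secret locations marked]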

Create a new .env file in the same directory as salaries.py with the following contents (using your new client id/secret):

CLIENT_ID='YOUR_CLIENT_ID'
CLIENT_SECRET='YOUR_CLIENT_SECRET'
USER_AGENT='python'
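A sketch of how those three values could be loaded and handed to PRAW, assuming the script reads them with python-dotenv's `load_dotenv` and passes them to `praw.Reddit`. The helpers `read_credentials` and `make_reddit` are hypothetical names, not necessarily how salaries.py is organized.

```python
import os
from typing import Mapping

# The three variables defined in the .env file above.
REQUIRED_KEYS = ("CLIENT_ID", "CLIENT_SECRET", "USER_AGENT")


def read_credentials(environ: Mapping[str, str] = os.environ) -> dict:
    """Collect the Reddit credentials as keyword arguments for praw.Reddit."""
    missing = [key for key in REQUIRED_KEYS if not environ.get(key)]
    if missing:
        raise RuntimeError(f"missing from .env: {', '.join(missing)}")
    # praw.Reddit takes lowercase keywords: client_id, client_secret, user_agent.
    return {key.lower(): environ[key] for key in REQUIRED_KEYS}


def make_reddit():
    """Load .env into os.environ, then build a praw.Reddit client."""
    from dotenv import load_dotenv  # provided by python-dotenv
    import praw

    load_dotenv()  # by default does not overwrite variables already set
    # With only these three values PRAW runs in read-only mode, which is enough
    # for reading public threads and comments.
    return praw.Reddit(**read_credentials())
```

Keeping the check in `read_credentials` means a missing key fails with a clear message before any request is made.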

Run salaries.py in the terminal:

python salaries.py --output=output/salaries.csv --verbose
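The command passes two options. The argparse sketch below accepts them in the same `--output=...` form; the default path and the description of `--verbose` (printing each parsed offer and the summary, as in the sample output) are assumptions, not read from salaries.py.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Scrape /r/cscareerquestions salary sharing threads into a CSV file."
    )
    # Assumption: the default path is illustrative; the README only shows the flag in use.
    parser.add_argument("--output", default="output/salaries.csv",
                        help="path of the CSV file to write")
    # Assumption: --verbose toggles the per-offer printout shown under Output.
    parser.add_argument("--verbose", action="store_true",
                        help="print each parsed offer and summary statistics")
    return parser.parse_args(argv)


args = parse_args(["--output=output/salaries.csv", "--verbose"])
print(args.output, args.verbose)
```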

Output

You should get output similar to the following:

[...]
========================================================
Company: Financial Institution
Location: Charlotte, NC
Salary: 70k
Relocation: None
Signing: None
Stock: 5 - 10%
Total: 77k
========================================================
Company: Health Insurance
Location: Buffalo, NY
Salary: $45,00 (That was a year ago, offer is now $50k)
Relocation: $0
Signing: $0
Stock: $2,300, but since we are a non-profit, bonuses are dependent on meeting our financial goals for the year.
Total: $53,000
========================================================
Company: Northrop Grumman
Location: Richmond VA
Salary: 52.5K
Relocation: None
Signing: None
Stock: I think we get these, a couple thousand if we hit goals.
Total: None
========================================================
Company: Digital Agency
Location: Southern Brazil
Salary: $5.9k (year)
Relocation: None
Signing: None
Stock: None
Total: None
========================================================
Company: SAAS
Location: Chicago
Salary: $75,000
Relocation: $0
Signing: $0
Stock: No stock, yearly bonus depends. My last one was about 1.6k
Total: None
========================================================
718 Salaries Recorded From 718 Relevant Comments (Out Of 4458 Total) In 11 Salary Sharing Threads
16.1% of comments were salaries

10 Most Common Companies:
        Google: 30
        Amazon: 24
        Microsoft: 20
        Big 4: 19
        Finance: 18
        Facebook: 15
        Defense: 10
        IBM: 10
        Capital One: 10
        Fintech: 7

10 Most Common Locations:
        Seattle: 33
        NYC: 27
        Bay Area: 25
        San Francisco: 16
        Chicago: 16
        London: 14
        Toronto: 13
        Redmond, WA: 12
        SF: 12
        Austin, TX: 12
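The summary lines above can be reproduced with `collections.Counter`. This sketch assumes values are counted as exact strings, which would be consistent with "San Francisco" and "SF" appearing as separate entries; `top_values` and `percent` are hypothetical helpers, and the location list is made-up input.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_values(values: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count exact (whitespace-trimmed) strings, ignoring blanks, most frequent first."""
    counts = Counter(v.strip() for v in values if v and v.strip())
    return counts.most_common(n)  # ties keep first-seen order


def percent(part: int, whole: int) -> float:
    return 100 * part / whole if whole else 0.0


locations = ["Seattle", "NYC", "Seattle", "SF", "San Francisco", "Seattle", "NYC", ""]
print("Most Common Locations:")
for name, count in top_values(locations, n=2):
    print(f"\t{name}: {count}")
print(f"{percent(718, 4458):.1f}% of comments were salaries")
```

Merging spellings such as "SF" and "San Francisco" would need an explicit alias table applied before counting.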

Here are the first few lines of output/salaries.csv:

| Date | Company | Salary | Location | Relocation | Signing | Stock | Total | Url |
|---|---|---|---|---|---|---|---|---|
| 2019-09-09 20:38:39 | Amazon Web Services | 112k/yr | Austin Texas | 9k lump sum post tax, miles/meals reimbursed | 38k first year 22k second year | 80k over 4 years | ~150k a year? plus | https://www.reddit.com/r/cscareerquestions/comments/czhew5/official_salary_sharing_thread_for_new_grads/ezqn8rr |
| 2019-09-05 06:19:35 | mature NYC startup | $105,000 | New York | 0 | 0 | 17,000 stock options | $105,000 (valuing options at $0) | https://www.reddit.com/r/cscareerquestions/comments/czhew5/official_salary_sharing_thread_for_new_grads/ez3bre4 |
| 2019-09-04 13:38:07 | Finance | 80k | Boston | 5k | 5k | 0 | 85k | https://www.reddit.com/r/cscareerquestions/comments/czhew5/official_salary_sharing_thread_for_new_grads/eyyx82q |

The full output from a recent run is in output/salaries.csv.
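To work with the file in Python, the standard csv module is enough. This sketch assumes output/salaries.csv is comma-delimited, UTF-8 encoded, and starts with the header row shown above; `read_offers` is a hypothetical helper, and the inline sample reuses the third row so the example runs without the file.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_offers(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV rows into dicts keyed by the header row (Date, Company, Salary, ...)."""
    return list(csv.DictReader(lines))


# To read the real file instead (assumes UTF-8):
#   with open("output/salaries.csv", newline="", encoding="utf-8") as f:
#       offers = read_offers(f)
sample = io.StringIO(
    "Date,Company,Salary,Location,Relocation,Signing,Stock,Total,Url\n"
    "2019-09-04 13:38:07,Finance,80k,Boston,5k,5k,0,85k,"
    "https://www.reddit.com/r/cscareerquestions/comments/czhew5/"
    "official_salary_sharing_thread_for_new_grads/eyyx82q\n"
)
offers = read_offers(sample)

# Example query: every offer whose company text mentions "finance".
for offer in offers:
    if "finance" in offer["Company"].lower():
        print(offer["Date"], offer["Company"], offer["Salary"], offer["Total"])
```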

Modifying The Script

Currently the script only looks at New Grad salary sharing threads, but it can easily be modified to parse any threads you want: edit the submission_ids list in salaries.py so that it contains the ids of the desired salary sharing threads.

The id of a thread appears in its URL. For example, the id of reddit.com/r/cscareerquestions/comments/czhew5/official_salary_sharing_thread_for_new_grads/ is czhew5.
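Looking up ids by hand can be automated. This sketch (the `submission_id` helper is hypothetical) returns the path segment that follows `comments`, so it also works for the comment permalinks stored in the Url column of the CSV.

```python
from urllib.parse import urlparse


def submission_id(url: str) -> str:
    """Return the thread id: the path segment right after 'comments'."""
    # urlparse puts a scheme-less URL entirely in .path, so both forms work.
    segments = [s for s in urlparse(url).path.split("/") if s]
    try:
        return segments[segments.index("comments") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"not a Reddit thread URL: {url!r}") from None


thread_urls = [
    "reddit.com/r/cscareerquestions/comments/czhew5/official_salary_sharing_thread_for_new_grads/",
]
submission_ids = [submission_id(u) for u in thread_urls]
print(submission_ids)
```

PRAW can also look up a thread directly from its URL with `reddit.submission(url=...)`, so keeping a list of URLs instead of ids is another option.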
