
Analysis of job postings in the US related to data science


smerdov/JobPostingsAnalysis


In this repo, I analyze data science job postings scraped from websites such as Indeed and Dice, among others. To process the job descriptions, I tokenized the text with the nltk library, built a word-count matrix, and reweighted it using TF-IDF.
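The preprocessing steps above can be sketched in plain Python as follows. This is a hedged illustration, not the code in main.py: a hypothetical regex tokenizer stands in for nltk so the example has no dependencies, and the IDF weighting is assumed to be the smoothed variant idf(t) = ln((1 + n) / (1 + df(t))) + 1 (the default in scikit-learn's TfidfTransformer), with each row then L2-normalized.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase alphabetic runs only.
# The repo uses nltk; this regex is a dependency-free stand-in.
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def count_matrix(docs):
    """Return a sorted vocabulary and a document-term count matrix (list of rows)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    rows = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok, c in Counter(toks).items():
            row[index[tok]] = c
        rows.append(row)
    return vocab, rows


def tfidf(rows):
    """Reweight raw counts by smoothed IDF (assumed formula), then L2-normalize each row."""
    n_docs = len(rows)
    n_terms = len(rows[0]) if rows else 0
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + df[j])) + 1 for j in range(n_terms)]
    weighted = []
    for row in rows:
        w = [c * idf[j] for j, c in enumerate(row)]
        norm = math.sqrt(sum(x * x for x in w))
        weighted.append([x / norm for x in w] if norm > 0 else w)
    return weighted


# Toy usage with two made-up postings (not data from the repo).
docs = [
    "Build SQL queries and MapReduce jobs on big data",
    "Train machine learning models on data in Python",
]
vocab, counts = count_matrix(docs)
X = tfidf(counts)
```

Terms shared by every posting (here "data" and "on") get the smallest IDF, so distinctive words such as "sql" carry more weight in each row.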

Clustering the TF-IDF representations into groups reveals several kinds of data science jobs (representative terms for each cluster in parentheses):

  • Big-data engineer (mapreduce, sql, big, query)
  • Python machine learning developer (machine, model, analytics)
  • Business-oriented data scientist (business, product, analytics)
  • Data science manager (manage, communicate, business)
  • Data engineer (deploy, engineer, downstream)
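The README does not say which clustering algorithm was used. As an assumption, the sketch below uses spherical k-means (k-means with cosine similarity on L2-normalized vectors), a common choice for TF-IDF data, and labels each cluster by the highest-weighted terms of its centroid, which is one plausible way to obtain keyword lists like those above. The vocabulary and counts are toy values, not data from the repo.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(X, k, n_iter=20):
    """Cluster L2-normalized rows by cosine similarity.

    Simplification: the first k rows are the initial centroids;
    a real run would use random or k-means++ initialization.
    """
    centroids = [normalize(row) for row in X[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each row joins its most similar centroid.
        new_labels = [max(range(k), key=lambda c: dot(row, centroids[c])) for row in X]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: centroid = normalized mean of its member rows.
        for c in range(k):
            members = [row for row, lab in zip(X, labels) if lab == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return labels, centroids


def top_terms(centroid, vocab, n=3):
    """Return the n vocabulary terms with the largest centroid weights."""
    order = sorted(range(len(vocab)), key=lambda j: centroid[j], reverse=True)
    return [vocab[j] for j in order[:n]]


# Toy usage: two "big data" rows and two "machine learning" rows.
vocab = ["sql", "mapreduce", "query", "machine", "model", "python"]
raw = [
    [2, 1, 1, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 1, 2, 1],
]
X = [normalize(r) for r in raw]
labels, centroids = spherical_kmeans(X, k=2)
for c, centroid in enumerate(centroids):
    print(c, top_terms(centroid, vocab))
```

Normalizing the centroids after each update keeps the assignment step a pure cosine comparison, which suits TF-IDF rows whose lengths differ mainly because postings differ in length.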

Feel free to check the code and results in Solution.ipynb; more detailed code is in main.py.