In this project, I scrape a job search feed and use natural language processing to measure how well my resume and cover letter fit each job posting. The fit between my resume + cover letter and the job description is determined via three methods:
- Keyword Analysis
- Similarity Scoring
- Generative Text Summary Comparisons
I used the job search feed on Workopolis as a source of data science jobs to compare against my resume + cover letter. The Beautiful Soup library was used to parse the left navigation column, step through the feed pagination, extract each job link, open the job description, and extract the job text.
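
The scraping steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scraper: the sample HTML, the `job-link` and `next-page` class names, the `example.com` base URL, and the helper names `parse_feed_page` and `extract_job_text` are all assumptions made for the example, and the real feed's markup will differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup: the real feed's tags and class names will differ.
SAMPLE_FEED_HTML = """
<div class="job-list">
  <a class="job-link" href="/jobs/101">Data Scientist</a>
  <a class="job-link" href="/jobs/102">ML Engineer</a>
  <a class="next-page" href="/search?page=2">Next</a>
</div>
"""


def parse_feed_page(html, base_url):
    """Return (job_urls, next_page_url) for one page of search results."""
    soup = BeautifulSoup(html, "html.parser")
    # Collect absolute URLs for every job link on the page.
    job_urls = [urljoin(base_url, a["href"]) for a in soup.select("a.job-link[href]")]
    # The pagination link is None on the last page.
    next_link = soup.select_one("a.next-page[href]")
    next_url = urljoin(base_url, next_link["href"]) if next_link else None
    return job_urls, next_url


def extract_job_text(html):
    """Flatten a job description page to plain text."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text(separator=" ", strip=True)


job_urls, next_url = parse_feed_page(SAMPLE_FEED_HTML, "https://example.com")
```

In a full scraper, each page would be downloaded first and the loop would follow `next_url` until it is `None`, calling `extract_job_text` on every job page.
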
To compare keywords, I compiled a list of common skills from a cross-section of data science job postings. I then counted how often each skill appears in each job posting and in my resume + cover letter, and compared the two counts. The plot below shows the difference in the number of mentions of each skill for a specific job posting. In the matching keywords plot, a bar above zero indicates that my resume + cover letter mentions that skill more often than the job posting, and a bar below zero indicates that the job posting mentions it more often. The missing keywords plot shows keywords in the job posting that are absent from my resume, and the job-specific keywords plot counts keywords that are highlighted in the job posting (if applicable).
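
The keyword comparison described above could look something like the sketch below. The skill list is a short hypothetical stand-in for the compiled one, the function names are invented for the example, and treating a "matching" keyword as one that appears in both documents is an assumption.

```python
import re
from collections import Counter

# Hypothetical skill list; the project's compiled list is longer.
SKILLS = ["python", "sql", "machine learning", "tableau"]


def count_skills(text, skills):
    """Count case-insensitive, whole-phrase mentions of each skill."""
    lowered = text.lower()
    counts = Counter()
    for skill in skills:
        # Lookarounds keep "sql" from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(skill) + r"(?!\w)"
        counts[skill] = len(re.findall(pattern, lowered))
    return counts


def compare_skills(resume_text, job_text, skills):
    """Return (matching, missing) keyword summaries for one job posting."""
    resume = count_skills(resume_text, skills)
    job = count_skills(job_text, skills)
    # Assumption: "matching" means the skill appears in both documents.
    # Positive values: resume mentions it more. Negative: posting mentions it more.
    matching = {s: resume[s] - job[s] for s in skills if resume[s] and job[s]}
    # Skills the posting asks for that the resume never mentions.
    missing = {s: job[s] for s in skills if job[s] and not resume[s]}
    return matching, missing


matching, missing = compare_skills(
    "Python, SQL and machine learning. More Python.",
    "We need SQL, SQL, Python and Tableau.",
    SKILLS,
)
```

Here `matching` holds the per-skill differences that would feed the matching keywords plot, and `missing` holds the counts for the missing keywords plot.
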
I calculate a similarity score using the spaCy library to quantify the similarity of the resume + cover letter to each job posting. The table below ranks each job posting by fit to the resume. The summary also shows the keyword coverage and the weighted keyword score, and lists keywords in the job posting that are missing from the resume.
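
By default, spaCy's `Doc.similarity` is a cosine similarity between document vectors. To show the underlying computation and the ranking step without requiring a spaCy model, the sketch below applies cosine similarity to plain word-count vectors. This is a simplified stand-in, not the project's scoring code, and the function names and sample texts are made up for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase word counts; a crude stand-in for spaCy's document vectors."""
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document is similar to nothing.
    return dot / (norm_a * norm_b)


def rank_jobs(resume_text, jobs):
    """Return (title, score) pairs sorted from best to worst fit."""
    resume_vec = bag_of_words(resume_text)
    scored = [
        (title, cosine_similarity(resume_vec, bag_of_words(text)))
        for title, text in jobs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_jobs(
    "python sql statistics",
    {"Job B": "java spring", "Job A": "python sql dashboards"},
)
```

With spaCy itself, the equivalent call would be `nlp(resume_text).similarity(nlp(job_text))` on a pipeline that ships word vectors (for example `en_core_web_md`); which model the project used is not stated here.
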
The plot below visualizes the resume match to each job posting. The similarity score is on the y-axis and is also shown with a color scale, and the keyword coverage is on the x-axis. The size of each bubble represents the weighted keyword score. Jobs in the upper right corner of the plot with the largest bubbles are the best matches, so applications to those should have the best chance.
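
A bubble chart with this layout can be drawn with Matplotlib along the lines of the sketch below. The numbers are placeholder values for illustration only, not results from the project, and the field names, the `plot_fit` helper, and the `size_scale` factor are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # Headless backend so the sketch runs without a display.
import matplotlib.pyplot as plt

# Placeholder values for illustration only, not results from the project.
JOBS = [
    {"title": "Job A", "coverage": 0.80, "similarity": 0.90, "weighted": 12.0},
    {"title": "Job B", "coverage": 0.45, "similarity": 0.75, "weighted": 6.0},
    {"title": "Job C", "coverage": 0.20, "similarity": 0.60, "weighted": 2.0},
]


def plot_fit(jobs, size_scale=40.0):
    """Scatter keyword coverage (x) against similarity (y, also the color)."""
    x = [job["coverage"] for job in jobs]
    y = [job["similarity"] for job in jobs]
    # Bubble area grows with the weighted keyword score.
    sizes = [job["weighted"] * size_scale for job in jobs]

    fig, ax = plt.subplots(figsize=(6, 4))
    points = ax.scatter(x, y, s=sizes, c=y, cmap="viridis", alpha=0.7)
    fig.colorbar(points, ax=ax, label="Similarity score")
    ax.set_xlabel("Keyword coverage")
    ax.set_ylabel("Similarity score")
    return fig, ax


fig, ax = plot_fit(JOBS)
```

Calling `fig.savefig("resume_fit.png")` afterwards would write the chart to disk; the file name is arbitrary.
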