Candidate_Profile_and_Job_Description_Bidirectional_Matching_System

MUST READ: To protect our candidates, this repo does not contain any dataset that includes real candidate information. If you need test cases for any folder or file, please contact the author (zhiyuzha@usc.edu).
This repo contains all files needed to construct an NLP-based bidirectional matching system between candidate profiles and job descriptions, with the aim of improving both the efficiency and the accuracy of the matching process.
This system was first built during a summer 2022 internship and continues to be improved.
If you have any questions or are interested in the project, please contact the author at zhiyuzha@usc.edu.
VERSION0:AUG 8, 2022;
VERSION1:AUG 23, 2022; (CREATE THE BASIC FRAMEWORK FOR THE LAST STEP)

Main Purpose and Brief Introduction:

Matching systems are among the most widely adopted artificial intelligence systems across industries worldwide. As a world-leading recruiting company, we also want to introduce such a system to fill this gap and improve our clients' experience. We aim to build a bidirectional matching system between recruiters and potential candidates using machine learning (in particular, advanced NLP techniques) to improve the efficiency of recruiting and to grow our start-up's market share.

Basic Workflow and Algorithms:

Core Algorithms: To compare the similarity of two text-based documents (a candidate profile and a job description), we first clean the original data (tokenize the text and remove uninformative words), vectorize the core features (convert text to numerical data), and project the resulting vector onto a predefined recruiting matrix (a matrix that contains all the features we need to evaluate). Finally, we calculate the cosine similarity between the feature vectors and select the candidates with the highest scores.
Basic Workflow:

Read the candidate profile and job description and transform them into a uniform JSON format
Use NLP named entity recognition (NER) to identify and label the keywords found in the first step
Project the key words into predefined recruiting matrix and transform each profile/JD into a vector
Calculate the cosine similarities between profile and JD and recommend based on rankings
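
The last two workflow steps above can be sketched as follows. This is a minimal illustration and not the repo's implementation: the feature list `RECRUITING_FEATURES`, the helper names, and the sample keywords are all hypothetical, and a simple 0/1 presence vector stands in for whatever weighting the real recruiting matrix uses.

```python
import math

# Hypothetical recruiting matrix columns: one slot per feature we evaluate.
RECRUITING_FEATURES = ["python", "sql", "excel", "communication", "teamwork"]


def project(keywords):
    """Project a collection of labeled keywords onto the feature list as a 0/1 vector."""
    found = {k.lower() for k in keywords}
    return [1.0 if f in found else 0.0 for f in RECRUITING_FEATURES]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(jd_keywords, profiles):
    """Return (candidate_id, score) pairs sorted from best to worst match."""
    jd_vec = project(jd_keywords)
    scores = [
        (cid, cosine_similarity(project(kw), jd_vec))
        for cid, kw in profiles.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up example data, for illustration only.
jd = ["Python", "SQL", "Communication"]
profiles = {
    "candidate_a": ["Python", "SQL", "Teamwork"],
    "candidate_b": ["Excel"],
    "candidate_c": ["Python", "SQL", "Communication"],
}
ranking = rank_candidates(jd, profiles)
```

Because the same scoring function is symmetric in its two vectors, ranking job descriptions for a given candidate only requires swapping the roles of the two inputs.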

Introduction of Files and Datasets in this Repo:

This repo contains one PowerPoint file and one folder (which contains all code and data files). This section introduces each of them.

Powerpoint File

File(s): Matching System Detailed Explaination.pptx
Content: A detailed introduction to the whole project and its workflow.

Matching System Folder

Construct Major and Title Datasets Folder

Construct Job Title Dataset:

We use the O*NET and Zety online resources to construct a cleaned dataset that covers as many job titles in the market as possible.

File(s): scrap job title.ipynb -> uncleaned_job_title.xlsx -> clean_jobtitle.ipynb -> title_final.xlsx
Content: Get job titles from https://zety.com/blog/job-titles and clean the data. Results can be found in the corresponding Excel files.
File(s): title_final.xlsx & XXX_job_title.xlsx (6 files) -> add_additional_job_titles.ipynb -> title_final.xlsx (overwrites the previous file of the same name)
Content: Combine job titles scraped from https://www.onetonline.org with the titles from the first step. Results can be found in the corresponding Excel files.
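
As a rough sketch of the kind of merge-and-clean step described above (the notebooks themselves are the reference; the function names and sample titles here are hypothetical, and plain Python lists stand in for the Excel files):

```python
import re


def clean_title(raw):
    """Normalize one scraped title: collapse whitespace, strip, title-case."""
    return re.sub(r"\s+", " ", raw).strip().title()


def merge_titles(*sources):
    """Merge several title lists, dropping blanks and case-insensitive duplicates."""
    seen = set()
    merged = []
    for source in sources:
        for raw in source:
            title = clean_title(raw)
            key = title.lower()
            if title and key not in seen:
                seen.add(key)
                merged.append(title)
    return merged


# Made-up sample rows standing in for the two scraped sources.
zety_titles = ["  data   analyst", "Software Engineer", ""]
onet_titles = ["software engineer", "Registered Nurse"]
titles = merge_titles(zety_titles, onet_titles)
```

Title-casing is a simplification: it would turn an acronym such as "IT Manager" into "It Manager", so a real cleaning step would likely need an exception list.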

Construct Major Dataset

We use the act.org online resource, combined with our own dataset, to construct a cleaned dataset that covers as many college majors as possible.

File(s): webscrap_major.ipynb -> student_major.csv & student_major2.csv -> create student major.ipynb -> major.xlsx
Content: Get college majors from O*NET and clean the data. Results can be found in the corresponding Excel files.
File(s): major.xlsx & more_majors.xlsx -> add_more_majors.ipynb -> temp_merged_major.xlsx
Content: Combine our own major dataset with the dataset from the first step to keep expanding it. Results can be found in the corresponding Excel files.

Construct Skillset Folder

Construct Hardskill Set

We use O*NET online resources, combined with some advanced data processing techniques, to construct a cleaned JSON-format dataset that can be passed to spaCy Named Entity Recognition. It recognizes 143 groups and over 3,000 individual hardskill items.

Folder(s): active_listening/math/reading_comprehension/science/speaking/writing_position
Content: Gather hardskill-related information for different types of positions from the O*NET website. Results can be found in the XXX_skillset.xlsx file in each folder.
Folder(s): Final_merge&analysis_hardskills
Stream of the files: merge_skillset.ipynb -> final_skill_table.xlsx & large software company.csv -> clean_skillset.ipynb -> cleaned_skillset.xlsx -> create_hardskill_dataset.ipynb -> hardskills.json
Content: Clean and expand the hardskill dataset, aiming to cover the different ways a skill may be written in a profile (e.g. Microsoft Powerpoint, Powerpoint and PPT may all point to the same skill). The readable result is in cleaned_skillset.xlsx and the spaCy-usable result is in hardskills.json.
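
The schema of hardskills.json is not documented here. One plausible shape, assuming the file is meant for spaCy's `EntityRuler`, is a list of token patterns in which every alias carries the same `id`. The sketch below builds such patterns in plain Python; the alias groups, the `HARDSKILL` label, and the function name are hypothetical.

```python
import json

# Hypothetical alias groups: canonical skill -> surface forms seen in profiles.
SKILL_ALIASES = {
    "powerpoint": ["Microsoft PowerPoint", "PowerPoint", "PPT"],
    "excel": ["Microsoft Excel", "Excel"],
}


def build_patterns(alias_groups, label="HARDSKILL"):
    """Turn alias groups into token patterns in spaCy's EntityRuler format."""
    patterns = []
    for canonical, aliases in alias_groups.items():
        for alias in aliases:
            # One {"LOWER": ...} dict per whitespace-separated token gives
            # case-insensitive matching of the whole phrase.
            tokens = [{"LOWER": tok.lower()} for tok in alias.split()]
            patterns.append({"label": label, "pattern": tokens, "id": canonical})
    return patterns


patterns = build_patterns(SKILL_ALIASES)
serialized = json.dumps(patterns, indent=2)

# With spaCy v3 installed, the patterns could be loaded roughly like this:
#   import spacy
#   nlp = spacy.blank("en")
#   ruler = nlp.add_pipe("entity_ruler")
#   ruler.add_patterns(patterns)
```

Splitting aliases on whitespace is a simplification: spaCy's tokenizer may split skills that contain punctuation differently, so such entries would need patterns built from spaCy's own tokenization.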

Construct Softskill Set

We use O*NET online resources, combined with some advanced data processing techniques, to construct a cleaned JSON-format dataset that can be used to detect softskills in the original data. It recognizes 40 groups and over 2,000 individual softskill items.

Folder(s): active_listening/math/reading_comprehension/science/speaking/writing_position
Content: Gather softskill-related information (activities, work content and softskills for each position) for different types of positions from the O*NET website. Results can be found in the XXX_skillset.xlsx file in each folder.
Folder(s): merge_activity, merge_softskills, merge_work_content
Content: Merge and clean the activities, softskills and work contents for each type of job.
Folder(s): softskills dataset
Stream of the files: pre_softskills_matrix.xlsx -> USE GOOGLE GET SOFT SKILLS.ipynb (or USE NEUR DATASET TO GET SOFT SKILLS.ipynb (template)) -> final_skill_keyword.xlsx -> softskills.json
Content: Expand the softskill keyword dataset with a Google pre-trained dataset so that it covers different expressions of the same softskill (NEUR is an alternative for this step). The readable result is in final_skill_keyword.xlsx and the NLP-usable format is in softskills.json.
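
The exact expansion method lives in the notebooks. One common way to do this kind of expansion, assumed here purely for illustration, is to take each seed keyword and keep the words whose pre-trained embedding is close to it by cosine similarity. The sketch below uses tiny made-up vectors in place of a real pre-trained model; `TOY_EMBEDDINGS`, `expand_keyword`, and the threshold value are all hypothetical.

```python
import math

# Toy 3-dimensional vectors standing in for real pre-trained embeddings.
TOY_EMBEDDINGS = {
    "teamwork": [0.9, 0.1, 0.0],
    "collaboration": [0.85, 0.2, 0.05],
    "cooperation": [0.8, 0.15, 0.1],
    "leadership": [0.1, 0.9, 0.1],
    "spreadsheet": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def expand_keyword(seed, embeddings, threshold=0.95):
    """Return words at least `threshold` similar to the seed, most similar first."""
    seed_vec = embeddings[seed]
    scored = [(w, cosine(seed_vec, vec)) for w, vec in embeddings.items() if w != seed]
    close = [(w, s) for w, s in scored if s >= threshold]
    return [w for w, _ in sorted(close, key=lambda pair: pair[1], reverse=True)]


expanded = expand_keyword("teamwork", TOY_EMBEDDINGS)
```

With a real model, the threshold would need tuning, and the expanded list would still benefit from a manual review before being written to the keyword file.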

Profile Cleaning Before Input Folder

This part may contain sensitive information. Please contact the author if you need a test case.

Clean Candidate Profile

We use advanced NLP and data processing techniques to handle noisy data in the original candidate profiles and transform each profile into a manageable dataset.

Folder(s): parse from csv
Content: Parse candidate profiles into uniform JSON-format information.
File(s): temp_merged_major.xlsx/ title_final.xlsx -> clean_candidate_profile.ipynb
Content: A class that cleans and stores candidate profiles.
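
As a loose sketch of what a clean-and-store class of this kind might look like (the real class is in clean_candidate_profile.ipynb; the field names, the `title` and `major` keys, and the exact-match lookup are assumptions made for illustration):

```python
from dataclasses import dataclass, field


@dataclass
class CleanedProfile:
    """Stores the cleaned fields of one candidate profile (hypothetical schema)."""
    title: str = ""
    major: str = ""
    unmatched: list = field(default_factory=list)


def _normalize(text):
    """Lowercase and collapse whitespace so lookups ignore formatting noise."""
    return " ".join(text.lower().split())


def clean_profile(raw, known_titles, known_majors):
    """Keep the title and major only if they appear in the reference lists."""
    lookups = {
        "title": {_normalize(t): t for t in known_titles},
        "major": {_normalize(m): m for m in known_majors},
    }
    profile = CleanedProfile()
    for key, lookup in lookups.items():
        value = raw.get(key, "")
        canonical = lookup.get(_normalize(value))
        if canonical is not None:
            setattr(profile, key, canonical)
        elif value.strip():
            # Keep unrecognized values so the reference lists can be extended.
            profile.unmatched.append(value.strip())
    return profile


# Made-up input for illustration only.
profile = clean_profile(
    {"title": "  data  ANALYST ", "major": "Astrology"},
    known_titles=["Data Analyst"],
    known_majors=["Statistics"],
)
```

Recording unmatched values is one design option: it gives a simple way to spot titles or majors that the reference datasets do not yet cover.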

Clean JD Profile

We use advanced NLP and data processing techniques to handle noisy data in the original job descriptions and transform each JD into a manageable dataset.

File(s): matching responsibility.xlsx (test case)/ temp_merged_major.xlsx/title_final.xlsx -> clean_dataset_before_input.ipynb
Content: A class that cleans and stores job descriptions.

Structure and Implementation of Matching System Folder

We use all the information and datasets prepared above to transform each profile into a vector.

File(s): hardskills.json & softskills.json -> project_to_recruiting_matrix.ipynb
Content: A class that performs the transformation (incomplete, still in progress).
File(s): project_to_matrix_example.ipynb
Content: A worked example that uses the class listed above.
File(s): matching_system_workflow.ipynb
Content: A detailed explanation of the matching system workflow (where the two differ, the README.md file takes priority).
