148 changes: 23 additions & 125 deletions README.md
@@ -1,138 +1,36 @@
# ResumeGPT
# 🏠 ResumeGPT

ResumeGPT is a Python package designed to extract structured information from PDF Curriculum Vitae (CV)/resume documents. It uses OCR to read the CV content and leverages the ChatGPT language models (GPT-3.5 and GPT-4) to extract key pieces of information and organize them in a structured, Excel-friendly format.
An AI-powered resume and job matching assistant built with React, TypeScript, and OpenAI.

[![Live Demo](https://img.shields.io/badge/demo-online-green?style=flat&logo=vercel)](https://your-live-link.com)
[![Made with React](https://img.shields.io/badge/React-TypeScript-blue)](https://reactjs.org)

## Features

- Extracts text from PDF CVs: uses OCR to extract each CV's PDF content as plain text.
- Extracts information using GPT: sends the extracted text to GPT, which extracts information according to a predefined prompt.
- Structures information into an Excel file: converts GPT's JSON output into an Excel-friendly format.
---

## 🚀 Features

## Module Overview
- Resume parsing and optimization
- AI-generated cover letters
- Smart job-matching with role suggestions
- Export to PDF
- Responsive and mobile-friendly

![ResumeGPT Workflow](ResumeGPT_Workflow/ResumeGPT_Workflow.PNG)
---

## 🛠️ Tech Stack

1. OCR Reader (CVsReader module): This process reads CVs from a specified directory and extracts the text from PDF files.
- **Frontend:** React, TypeScript, TailwindCSS
- **Backend:** Node.js, Express (or Supabase if applicable)
- **AI:** OpenAI GPT-4
- **Deployment:** Vercel / Netlify

2. Engineered Prompt and ChatGPT Pipeline (CVsInfoExtractor module): This process takes as input the extracted text generated by the OCR Reader and uses ChatGPT to extract specific information in JSON format.
---

3. Extracted Information Structuring (CVsInfoExtractor module): This process takes the JSON output from the ChatGPT Pipeline, which contains the information extracted from each CV. This information is then structured and organized into a clear and easy-to-understand Excel format.
## 📦 Running Locally


## Requirements

1. Python: Python 3.8 or newer.

2. GPT-4 API Access: If the CV content does not fit within GPT-3.5's token limit, the package falls back to GPT-4 to extract the information, so you'll need access to the GPT-4 API.


## How to Use

1. Prepare Your CVs: Make sure all the CVs you want to analyze are in the “CVs” directory.

2. Run the Script: Run the following commands. They will clone the project, prepare the environment, and execute the code.
- Clone the project
```bash
git clone https://github.com/Aillian/ResumeGPT.git
```
- Change into the project directory
```bash
cd ResumeGPT
```
- Create a virtual environment
```bash
python -m venv resumegpt_venv
```
- Activate the virtual environment
```bash
source resumegpt_venv/bin/activate  # on Windows (Git Bash): source resumegpt_venv/Scripts/activate
```
- Upgrade pip version
```bash
pip install --upgrade pip
```
- Install requirements.txt
```bash
pip install -r requirements.txt
```
- Change into the code directory
```bash
cd ResumeGPT
```
- Run main.py and provide the 3 required arguments:
- CVs Directory Path: use "../CVs" to read from the 'CVs' directory
- OpenAI API Key: must include GPT-4 model access (replace the placeholder below with your own key)
- Desired Positions: a comma-separated string with no spaces after the commas, e.g. "Data Scientist,Data Analyst,Data Engineer"
```bash
python main.py "../CVs" "<YOUR_OPENAI_API_KEY>" "Data Scientist,Data Analyst,Data Engineer"
```

3. Examine the Results: After the script finishes, you will find the output in the “Output” directory: two files (CSV and Excel) containing the information extracted from each CV.


## Extracted Information

ResumeGPT is designed to extract 23 features from each CV:

- Education:
1. Education Bachelor University: name of university where bachelor degree was taken
2. Education Bachelor GPA: GPA of bachelor degree (Example: 4.5/5)
3. Education Bachelor Major: major of bachelor degree
4. Education Bachelor Graduation Date: date of graduation from bachelor degree (in format: Month_Name, YYYY)
5. Education Masters University: name of university where masters degree was taken
6. Education Masters GPA: GPA of masters degree (Example: 4.5/5)
7. Education Masters Major: major of masters degree
8. Education Masters Graduation Date: date of graduation from masters degree (in format: Month_Name, YYYY)
9. Education PhD University: name of university where PhD degree was taken
10. Education PhD GPA: GPA of PhD degree (Example: 4.5/5)
11. Education PhD Major: major of PhD degree
12. Education PhD Graduation Date: date of graduation from PhD degree (in format: Month_Name, YYYY)

- Work Experience:
13. Years of Experience: total years of experience in all jobs (Example: 3)
14. Experience Companies: list of all companies that the candidate worked with (Example: [Company1, Company2])
15. Top 5 Responsibilities/Projects Titles: list of top 5 responsibilities/projects titles that the candidate worked on (Example: [Project1, Project2, Project3, Project4, Project5])

- Courses/Certifications:
16. Top 5 Courses/Certifications Titles: list of top 5 courses/certifications titles that the candidate took (Example: [Course1, Course2, Course3, Course4, Course5])

- Skills:
17. Top 3 Technical Skills: list of top 3 technical skills (Example: [Skill1, Skill2, Skill3])
18. Top 3 Soft Skills: list of top 3 soft skills (Example: [Skill1, Skill2, Skill3])

- Employment Status:
19. Current Employment Status: one of the following (Full-time, Part-Time, Intern, Freelancer, Consultant, Unemployed)

- Personal Information:
20. Nationality: nationality of the candidate
21. Current Residence: where the candidate currently lives

- Suitable Position:
22. Suitable Position: the most suitable position for the candidate; the list of positions is taken from the user and dynamically inserted into the prompt

- Rating Score:
23. Candidate Rating (Out of 10): score of the candidate's suitability for the position identified in point 22 (Example: 7.5)


This information is then organized into a structured Excel file.
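
To illustrate how one CV's record becomes a single spreadsheet row, here is a minimal sketch of the flattening step; the field names and the `flatten_record` helper are illustrative, not the package's actual schema:

```python
import json

# Hypothetical GPT output for one CV (three of the 23 features shown)
gpt_output = json.loads("""{
  "Education Bachelor University": "MIT",
  "Years of Experience": 3,
  "Top 3 Technical Skills": ["Python", "SQL", "Docker"]
}""")

def flatten_record(record):
    # Join list values into comma-separated strings so each CV maps to one flat row
    return {key: ", ".join(value) if isinstance(value, list) else value
            for key, value in record.items()}

row = flatten_record(gpt_output)
print(row["Top 3 Technical Skills"])  # Python, SQL, Docker
```

Once every CV is reduced to a flat dict like `row`, writing the CSV/Excel output is a straightforward per-row dump.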


## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Possible additional features and optimizations:
1. Add additional features to the prompt.
2. Handle exceeded token limits by further cleansing the CV content.
3. The code calls the gpt-3.5-turbo model first and falls back to gpt-4 when the token limit is exceeded. This has two problems: it is costly, and the provided API key may not have access to the gpt-4 model.
4. Catch the GPT-4 "service is down" error by retrying the API call after a sleep interval.
5. Can the prompt be shortened to save tokens for the CV content?
6. Separate "Information To Extract" in the prompt into its own file so the user can add new features, which are then dynamically inserted into the prompt; the added features should also appear as column names in "CVs_Info_Extracted.csv".
7. Additional error handling.
8. Extend the usage to other LLMs.
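
For item 4, a retry with exponential backoff could look like the sketch below; `call_with_retry`, its parameters, and the `RuntimeError` stand-in for the service error are all hypothetical, not existing code:

```python
import time

def call_with_retry(call, max_retries=3, base_delay=1.0):
    """Hypothetical helper: retry `call` with exponential backoff when it
    raises a transient error (e.g. a 'service is down' response)."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:              # stand-in for the API's transient error
            if attempt == max_retries - 1:
                raise                     # give up after the final attempt
            time.sleep(base_delay * 2 ** attempt)

# Demo with a stub that fails twice, then succeeds on the third attempt
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("service is down")
    return "ok"

result = call_with_retry(flaky_call, base_delay=0.01)
print(result)  # ok
```

In the real pipeline the `RuntimeError` would be replaced by the specific exception the OpenAI client raises for service outages.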


## License
ResumeGPT is released under the MIT License. See the LICENSE file for more details.
```bash
git clone https://github.com/your-username/resume-gpt
cd resume-gpt
npm install
npm run dev
```
20 changes: 20 additions & 0 deletions ResumeGPT/key_word_analyzer.py
@@ -0,0 +1,20 @@
from sklearn.feature_extraction.text import CountVectorizer

def extract_keywords(text, top_n=30):
    # Keep the top_n most frequent non-stopword terms in the text
    vectorizer = CountVectorizer(stop_words='english', max_features=top_n)
    vectorizer.fit_transform([text])
    return set(vectorizer.get_feature_names_out())

def compare_keywords(resume_text, jd_text):
    # Compare resume terms against job-description terms as sets
    resume_keywords = extract_keywords(resume_text)
    jd_keywords = extract_keywords(jd_text)

    missing_keywords = jd_keywords - resume_keywords
    common_keywords = resume_keywords & jd_keywords

    return {
        "resume_keywords": list(resume_keywords),
        "jd_keywords": list(jd_keywords),
        "common_keywords": list(common_keywords),
        "missing_keywords": list(missing_keywords),
    }
62 changes: 61 additions & 1 deletion ResumeGPT/main.py
@@ -4,6 +4,9 @@
from OCR_Reader import CVsReader
from ChatGPT_Pipeline import CVsInfoExtractor
import sys
from datetime import datetime
import csv
import os
from key_word_analyzer import compare_keywords  # used by the keyword gap analysis below

# Fetching command line arguments
cvs_directory_path_arg, openai_api_key_arg, desired_positions_arg = sys.argv[1], sys.argv[2], sys.argv[3].split(",")
@@ -26,4 +29,61 @@

# Use the extract_cv_info method of the CVsInfoExtractor instance to extract the desired information from the CVs.
# This method presumably returns a list of dataframes, each dataframe corresponding to the extracted information from each CV.
extract_cv_info_dfs = cvs_info_extractor.extract_cv_info()

# Get Job Description from user input
print("\n--- Job Description Keyword Gap Analysis ---")
print("Paste the job description below (press Enter twice to finish):")

jd_lines = []
while True:
    line = input()
    if line == "":
        break
    jd_lines.append(line)
job_description_text = " ".join(jd_lines)

print("\n==== Summary ====")
print(f"Total CVs Processed: {len(cvs_content_df)}")
print(f"Job Description Length: {len(job_description_text.split())} words")



os.makedirs("output", exist_ok=True)
csv_path = "output/keyword_analysis.csv"

with open(csv_path, "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["Filename", "Common Keywords", "Missing Keywords"])

    for index, row in cvs_content_df.iterrows():
        filename = row['Filename']
        results = compare_keywords(row['CV_Text'], job_description_text)
        writer.writerow([
            filename,
            ", ".join(results['common_keywords']),
            ", ".join(results['missing_keywords'])
        ])

print(f"Keyword analysis saved to {csv_path}")
# Optional: export to JSON if --json flag is passed
if "--json" in sys.argv:
    import json

    output_json = []
    for index, row in cvs_content_df.iterrows():
        results = compare_keywords(row['CV_Text'], job_description_text)
        output_json.append({
            "filename": row['Filename'],
            "common_keywords": results['common_keywords'],
            "missing_keywords": results['missing_keywords']
        })

    with open("output/keyword_analysis.json", "w") as json_file:
        json.dump(output_json, json_file, indent=2)

    print("Keyword analysis also saved to output/keyword_analysis.json")


os.makedirs("logs", exist_ok=True)  # ensure the logs directory exists before appending
with open("logs/keyword_gap_log.txt", "a") as f:
    f.write(f"\n--- {datetime.now()} ---\n")
    f.write(f"Resume: (unknown)\n")
    # `results` holds the keywords of the last CV processed in the loop above
    f.write(f"Common: {results['common_keywords']}\n")
    f.write(f"Missing: {results['missing_keywords']}\n")