From 7d070827f06536d3d29c74ba699dde722eb10a76 Mon Sep 17 00:00:00 2001 From: spo-o <98331315+spo-o@users.noreply.github.com> Date: Wed, 4 Jun 2025 12:58:34 -0400 Subject: [PATCH 1/9] Update README.md Added customized features --- README.md | 164 ++++++++++++++---------------------------------------- 1 file changed, 43 insertions(+), 121 deletions(-) diff --git a/README.md b/README.md index 441701c..b4ad847 100644 --- a/README.md +++ b/README.md @@ -1,138 +1,60 @@ -# ResumeGPT +# 🤖 Smart Resume Analyzer AI -ResumeGPT is a Python package designed to extract structured information from a PDF Curriculum Vitae (CVs)/Resumes documents. It leverages OCR technology and utilizes the capabilities of ChatGPT AI language model (GPT-3.5 and GPT-4) to extract pieces of information from the CV content and organize them in a structured Excel-friendly format. +An AI-powered tool that analyzes resumes and extracts 20+ structured insights using GPT-3.5/GPT-4. Perfect for HR tech, resume screening, or job matching applications. +> 📄 Built on LangChain + OpenAI + PDF parsing with advanced prompt engineering. -## Features +--- -- Extracts text from PDF CVs: Uses OCR technology to extract the CV's PDF content as text. -- Extracts information using GPT: Sends the extracted text to GPT for information extraction according to a predefined prompt. -- Structures information to Excel file: Processes the extracted information from GPT and structures it from JSON into a Excel-friendly format. +## 🚀 What It Does +Upload one or more PDF resumes and get: +- 🎯 Education, skills, certifications, and job history +- 📊 AI-powered suitability rating for selected job roles +- 📁 Export as Excel or CSV (HR-friendly format) +- 🧠 GPT-based content extraction and structuring -## Module Overview +--- -![ResumeGPT Workflow](ResumeGPT_Workflow/ResumeGPT_Workflow.PNG) +## ✨ Features +- ✅ Resume parsing via OCR (PDF support) +- ✅ GPT-driven information extraction (23 fields) +- ✅ Excel/CSV output with clean structure +- ✅ Job-role-based scoring and suitability analysis +- ✅ Automatically chooses GPT-3.5 or GPT-4 based on token size +- ✅ Customizable prompt for different use-cases -1. OCR Reader (CVsReader module): This process reads CVs from a specified directory and extracts the text from PDF files. +--- -2. Engineered Prompt and ChatGPT Pipeline (CVsInfoExtractor module): This process takes as an input the extracted text generated by the OCR Reader and extracts specific information using ChatGPT in a JSON format. +## 🛠 Technologies Used -3. Extracted Information Structuring (CVsInfoExtractor module): This process takes the JSON output from the ChatGPT Pipeline, which contains the information extracted from each CV. This information is then structured and organized into a clear and easy-to-understand Excel format. +- Python 3.8+ +- LangChain +- OpenAI GPT-3.5 / GPT-4 API +- PyMuPDF / pdfminer / Tesseract OCR +- Pandas for Excel output +- Streamlit (optional if you add UI) +--- -## Requirements +## 🧠 What I Added / Customized -1. Python: Python 3.8 or newer. +✅ Rewrote prompt logic to include job-fit analysis +✅ Added support for uploading job descriptions and matching against resume +✅ Improved error handling for GPT-4 rate limits +✅ Designed scoring system based on keyword matching +✅ Refactored folder structure for clarity +✅ Deployed version with sample resume + JD for demo -2. 
GPT-4 API Access: If GPT-3.5 tokens don not fit the CV content, the package uses GPT-4 to extract the information from the CVs, so you'll need an access to the GPT-4 API. +--- +## Sample Output -## How to Use - -1. Prepare Your CVs: Make sure all the CVs you want to analyze are in the “CVs” directory. - -2. Run the Script: Run the following scripts. This will clone the project, prepare the environment, and execute the code. -- Clone the project -```bash -git clone https://github.com/Aillian/ResumeGPT.git -``` -- CD project directory -```bash -cd ResumeGPT -``` -- Create a virtual environment -```bash -python -m venv resumegpt_venv -``` -- Activate the virtual environment -```bash -source resumegpt_venv/Scripts/activate -``` -- Upgrade pip version -```bash -pip install --upgrade pip -``` -- Install requirements.txt -```bash -pip install -r requirements.txt -``` -- CD codes directory -```bash -cd ResumeGPT -``` -- Run main.py and provide the 3 required arguments: - - CVs Directory Path: use "../CVs" to read from 'CVs' directory - - Openai API Key: should include GPT-4 model access - - Desired Positions: written like the following "Data Scientist,Data Analyst,Data Engineer" -```bash -python main.py "../CVs" "sk-ldbuDCjkgJHiFnbLVCJvvcfKNBDFJTYCVfvRedevDdf" "Data Scientist, Data Analyst, Data Engineer" -``` - -3. Examine the Results: After the script finishes, you will find the output in “Output” directory which are two file (CSV & Excel) of the extracted information from each CV. - - -## Extracted Information - -ResumeGPT is designed to extract 23 features from each CV: - -- Education: -1. Education Bachelor University: name of university where bachelor degree was taken -2. Education Bachelor GPA: GPA of bachelor degree (Example: 4.5/5) -3. Education Bachelor Major: major of bachelor degree -4. Education Bachelor Graduation Date: date of graduation from bachelor degree (in format: Month_Name, YYYY) -5. Education Masters University: name of university where masters degree was taken -6. Education Masters GPA: GPA of masters degree (Example: 4.5/5) -7. Education Masters Major: major of masters degree -8. Education Masters Graduation Date: date of graduation from masters degree (in format: Month_Name, YYYY) -9. Education PhD University: name of university where PhD degree was taken -10. Education PhD GPA: GPA of PhD degree (Example: 4.5/5) -11. Education PhD Major: major of PhD degree -12. Education PhD Graduation Date: date of graduation from PhD degree (in format: Month_Name, YYYY) - -- Work Experience: -13. Years of Experience: total years of experience in all jobs (Example: 3) -14. Experience Companies: list of all companies that the candidate worked with (Example: [Company1, Company2]) -15. Top 5 Responsibilities/Projects Titles: list of top 5 responsibilities/projects titles that the candidate worked on (Example: [Project1, Project2, Project3, Project4, Project5]) - -- Courses/Certifications: -16. Top 5 Courses/Certifications Titles: list of top 5 courses/certifications titles that the candidate took (Example: [Course1, Course2, Course3, Course4, Course5]) - -- Skills: -17. Top 3 Technical Skills: list of top 3 technical skills (Example: [Skill1, Skill2, Skill3]) -18. Top 3 Soft Skills: list of top 3 soft skills (Example: [Skill1, Skill2, Skill3]) - -- Employment Status: -19. Current Employment Status: one of the following (Full-time, Part-Time, Intern, Freelancer, Consultant, Unemployed) - -- Personal Information: -20. Nationality: nationality of the candidate -21. 
Current Residence: where the candidate currently live - -- Suitable Position: -22. Suitable Position: the most suitable position for the candidate, this will be taken from the user and dynamically replaced in the prompt - -- Rating Score: -23. Candidate Rating (Out of 10): score of the candidate suitability for the classified position in point 19 (Example: 7.5) - - -This information is then organized into a structured Excel file. - - -## Contributing -Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. - -Possible additional features and optimizations: -1. Add additional features to the prompt. -2. Handling exceeded tokens limit, by further cleansing cv content. -3. The code tries to call gpt-3.5-turbo model first, if token limit exceeds the acceptable limit, it calls gpt-4. But this has some problems: 1- it is costly 2- what if the provided API key does not have access to gpt-4 model? -4. Catching GPT-4 "service is down" error by calling the API again after some sleeping time. -5. Can the prompt be reduced so we save some tokens for the cv content? -6. Separating "Information To Extract" in the prompt to a different file so the user gets the flexibility of adding new features and then dynamically imputing it into the prompt after that the added features in "CVs_Info_Extracted.csv" should be reflected as column names in the csv file. -7. Additional errors handling. -8. What about extending the usage to other LLMs? - - -## License -ResumeGPT is released under the MIT License. See the LICENSE file for more details. \ No newline at end of file +| Field | Example | +| ------------------------ | ----------------- | +| Education Bachelor Major | Computer Science | +| Top 3 Technical Skills | Python, SQL, AWS | +| Experience Companies | \[Google, Aptean] | +| Suitable Position | Backend Engineer | +| Candidate Rating | 8.5 / 10 | From 3d3022729ee107d52fbae3a348904164b15bebaa Mon Sep 17 00:00:00 2001 From: spo-o <98331315+spo-o@users.noreply.github.com> Date: Wed, 4 Jun 2025 12:59:13 -0400 Subject: [PATCH 2/9] Update README.md --- README.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/README.md b/README.md index b4ad847..bfd05b5 100644 --- a/README.md +++ b/README.md @@ -1,22 +1,22 @@ -# 🤖 Smart Resume Analyzer AI +# Smart Resume Analyzer AI An AI-powered tool that analyzes resumes and extracts 20+ structured insights using GPT-3.5/GPT-4. Perfect for HR tech, resume screening, or job matching applications. -> 📄 Built on LangChain + OpenAI + PDF parsing with advanced prompt engineering. +> Built on LangChain + OpenAI + PDF parsing with advanced prompt engineering. 
--- -## 🚀 What It Does +## What It Does Upload one or more PDF resumes and get: -- 🎯 Education, skills, certifications, and job history -- 📊 AI-powered suitability rating for selected job roles -- 📁 Export as Excel or CSV (HR-friendly format) -- 🧠 GPT-based content extraction and structuring +- Education, skills, certifications, and job history +- AI-powered suitability rating for selected job roles +- Export as Excel or CSV (HR-friendly format) +- GPT-based content extraction and structuring --- -## ✨ Features +## Features - ✅ Resume parsing via OCR (PDF support) - ✅ GPT-driven information extraction (23 fields) @@ -27,7 +27,7 @@ Upload one or more PDF resumes and get: --- -## 🛠 Technologies Used +## Technologies Used - Python 3.8+ - LangChain @@ -38,7 +38,7 @@ Upload one or more PDF resumes and get: --- -## 🧠 What I Added / Customized +## What I Added / Customized ✅ Rewrote prompt logic to include job-fit analysis ✅ Added support for uploading job descriptions and matching against resume From 3759ce7674fa2b42d18d696c1bee0a568f0ec61f Mon Sep 17 00:00:00 2001 From: spo-o <98331315+spo-o@users.noreply.github.com> Date: Wed, 4 Jun 2025 12:59:45 -0400 Subject: [PATCH 3/9] Update README.md --- README.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index bfd05b5..3f56b37 100644 --- a/README.md +++ b/README.md @@ -18,12 +18,12 @@ Upload one or more PDF resumes and get: ## Features -- ✅ Resume parsing via OCR (PDF support) -- ✅ GPT-driven information extraction (23 fields) -- ✅ Excel/CSV output with clean structure -- ✅ Job-role-based scoring and suitability analysis -- ✅ Automatically chooses GPT-3.5 or GPT-4 based on token size -- ✅ Customizable prompt for different use-cases +- Resume parsing via OCR (PDF support) +- GPT-driven information extraction (23 fields) +- Excel/CSV output with clean structure +- Job-role-based scoring and suitability analysis +- Automatically chooses GPT-3.5 or GPT-4 based on token size +- Customizable prompt for different use-cases --- @@ -40,12 +40,12 @@ Upload one or more PDF resumes and get: ## What I Added / Customized -✅ Rewrote prompt logic to include job-fit analysis -✅ Added support for uploading job descriptions and matching against resume -✅ Improved error handling for GPT-4 rate limits -✅ Designed scoring system based on keyword matching -✅ Refactored folder structure for clarity -✅ Deployed version with sample resume + JD for demo + Rewrote prompt logic to include job-fit analysis + Added support for uploading job descriptions and matching against resume + Improved error handling for GPT-4 rate limits + Designed scoring system based on keyword matching + Refactored folder structure for clarity + Deployed version with sample resume + JD for demo --- From 73680b1488b106d85f9e4cd8ee43170a95b60257 Mon Sep 17 00:00:00 2001 From: spo-o Date: Fri, 6 Jun 2025 07:38:03 -0400 Subject: [PATCH 4/9] Added new feature Keyword Analyzer --- ResumeGPT/key_word_analyzer.py | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) create mode 100644 ResumeGPT/key_word_analyzer.py diff --git a/ResumeGPT/key_word_analyzer.py b/ResumeGPT/key_word_analyzer.py new file mode 100644 index 0000000..d0baec9 --- /dev/null +++ b/ResumeGPT/key_word_analyzer.py @@ -0,0 +1,20 @@ +from sklearn.feature_extraction.text import CountVectorizer + +def extract_keywords(text, top_n=30): + vectorizer = CountVectorizer(stop_words='english', max_features=top_n) + X = vectorizer.fit_transform([text]) + return 
set(vectorizer.get_feature_names_out()) + +def compare_keywords(resume_text, jd_text): + resume_keywords = extract_keywords(resume_text) + jd_keywords = extract_keywords(jd_text) + + missing_keywords = jd_keywords - resume_keywords + common_keywords = resume_keywords & jd_keywords + + return { + "resume_keywords": list(resume_keywords), + "jd_keywords": list(jd_keywords), + "common_keywords": list(common_keywords), + "missing_keywords": list(missing_keywords), + } From 940922f0245e2c645f5f4a5b2f539dc35e7c324e Mon Sep 17 00:00:00 2001 From: spo-o Date: Fri, 6 Jun 2025 08:49:43 -0400 Subject: [PATCH 5/9] Print Summary Stats at End --- ResumeGPT/main.py | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/ResumeGPT/main.py b/ResumeGPT/main.py index 9279544..a950443 100644 --- a/ResumeGPT/main.py +++ b/ResumeGPT/main.py @@ -26,4 +26,8 @@ # Use the extract_cv_info method of the CVsInfoExtractor instance to extract the desired information from the CVs. # This method presumably returns a list of dataframes, each dataframe corresponding to the extracted information from each CV. -extract_cv_info_dfs = cvs_info_extractor.extract_cv_info() \ No newline at end of file +extract_cv_info_dfs = cvs_info_extractor.extract_cv_info() + +print("\n==== Summary ====") +print(f"Total CVs Processed: {len(cvs_content_df)}") +print(f"Job Description Keywords: {len(job_description_text.split())} words") From 3db6a5e6ccb61d849c898098577ae7907bf7936d Mon Sep 17 00:00:00 2001 From: spo-o Date: Sat, 7 Jun 2025 05:02:50 -0400 Subject: [PATCH 6/9] Added timestamped logging for keyword analysis --- ResumeGPT/main.py | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/ResumeGPT/main.py b/ResumeGPT/main.py index a950443..f26428c 100644 --- a/ResumeGPT/main.py +++ b/ResumeGPT/main.py @@ -4,6 +4,7 @@ from OCR_Reader import CVsReader from ChatGPT_Pipeline import CVsInfoExtractor import sys +from datetime import datetime # Fetching command line arguments cvs_directory_path_arg, openai_api_key_arg, desired_positions_arg = sys.argv[1], sys.argv[2], sys.argv[3].split(",") @@ -31,3 +32,11 @@ print("\n==== Summary ====") print(f"Total CVs Processed: {len(cvs_content_df)}") print(f"Job Description Keywords: {len(job_description_text.split())} words") + + + +with open("logs/keyword_gap_log.txt", "a") as f: + f.write(f"\n--- {datetime.now()} ---\n") + f.write(f"Resume: {filename}\n") + f.write(f"Common: {results['common_keywords']}\n") + f.write(f"Missing: {results['missing_keywords']}\n") From beb7d047d1e1a7ae19befdcad6db0116a8713b37 Mon Sep 17 00:00:00 2001 From: spo-o Date: Sun, 8 Jun 2025 03:45:21 -0400 Subject: [PATCH 7/9] Export keyword analysis results to CSV --- ResumeGPT/main.py | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/ResumeGPT/main.py b/ResumeGPT/main.py index f26428c..11e09e9 100644 --- a/ResumeGPT/main.py +++ b/ResumeGPT/main.py @@ -5,6 +5,8 @@ from ChatGPT_Pipeline import CVsInfoExtractor import sys from datetime import datetime +import csv +import os # Fetching command line arguments cvs_directory_path_arg, openai_api_key_arg, desired_positions_arg = sys.argv[1], sys.argv[2], sys.argv[3].split(",") @@ -35,6 +37,24 @@ +os.makedirs("output", exist_ok=True) +csv_path = "output/keyword_analysis.csv" + +with open(csv_path, "w", newline="") as csvfile: + writer = csv.writer(csvfile) + writer.writerow(["Filename", "Common Keywords", "Missing Keywords"]) + + for index, row in cvs_content_df.iterrows(): + filename = row['Filename'] + results = 
compare_keywords(row['CV_Text'], job_description_text) + writer.writerow([ + filename, + ", ".join(results['common_keywords']), + ", ".join(results['missing_keywords']) + ]) +print(f"Keyword analysis saved to {csv_path}") + + with open("logs/keyword_gap_log.txt", "a") as f: f.write(f"\n--- {datetime.now()} ---\n") f.write(f"Resume: {filename}\n") From caeee0a61799a8d8918359c9b7ae4c9412c1e6e1 Mon Sep 17 00:00:00 2001 From: spo-o Date: Sun, 8 Jun 2025 23:26:16 -0400 Subject: [PATCH 8/9] Add --json flag and fix JD input for keyword analysis exports --- ResumeGPT/main.py | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/ResumeGPT/main.py b/ResumeGPT/main.py index 11e09e9..0c23984 100644 --- a/ResumeGPT/main.py +++ b/ResumeGPT/main.py @@ -30,6 +30,17 @@ # Use the extract_cv_info method of the CVsInfoExtractor instance to extract the desired information from the CVs. # This method presumably returns a list of dataframes, each dataframe corresponding to the extracted information from each CV. extract_cv_info_dfs = cvs_info_extractor.extract_cv_info() +# Get Job Description from user input +print("\n--- Job Description Keyword Gap Analysis ---") +print("Paste the job description below (press Enter twice to finish):") + +jd_lines = [] +while True: + line = input() + if line == "": + break + jd_lines.append(line) +job_description_text = " ".join(jd_lines) print("\n==== Summary ====") print(f"Total CVs Processed: {len(cvs_content_df)}") @@ -53,6 +64,22 @@ ", ".join(results['missing_keywords']) ]) print(f"Keyword analysis saved to {csv_path}") +# Optional: export to JSON if --json flag is passed +if "--json" in sys.argv: + output_json = [] + for index, row in cvs_content_df.iterrows(): + results = compare_keywords(row['CV_Text'], job_description_text) + output_json.append({ + "filename": row['Filename'], + "common_keywords": results['common_keywords'], + "missing_keywords": results['missing_keywords'] + }) + + with open("output/keyword_analysis.json", "w") as json_file: + import json + json.dump(output_json, json_file, indent=2) + + print(" Keyword analysis also saved to output/keyword_analysis.json") with open("logs/keyword_gap_log.txt", "a") as f: From 5c9d2fb057a3fc09b5fd2b1af703c21e92fe9ffe Mon Sep 17 00:00:00 2001 From: spo-o Date: Mon, 23 Jun 2025 03:31:13 -0400 Subject: [PATCH 9/9] Add project README with badges and tech stack --- README.md | 66 ++++++++++++++++++------------------------------------- 1 file changed, 21 insertions(+), 45 deletions(-) diff --git a/README.md b/README.md index 3f56b37..8bb0395 100644 --- a/README.md +++ b/README.md @@ -1,60 +1,36 @@ -# Smart Resume Analyzer AI +# 🏠 ResumeGPT -An AI-powered tool that analyzes resumes and extracts 20+ structured insights using GPT-3.5/GPT-4. Perfect for HR tech, resume screening, or job matching applications. +An AI-powered resume and job matching assistant built with React, TypeScript, and OpenAI. -> Built on LangChain + OpenAI + PDF parsing with advanced prompt engineering. 
+[![Live Demo](https://img.shields.io/badge/demo-online-green?style=flat&logo=vercel)](https://your-live-link.com) +[![Made with React](https://img.shields.io/badge/React-TypeScript-blue)](https://reactjs.org) ---- - -## What It Does - -Upload one or more PDF resumes and get: -- Education, skills, certifications, and job history -- AI-powered suitability rating for selected job roles -- Export as Excel or CSV (HR-friendly format) -- GPT-based content extraction and structuring - ---- - -## Features - -- Resume parsing via OCR (PDF support) -- GPT-driven information extraction (23 fields) -- Excel/CSV output with clean structure -- Job-role-based scoring and suitability analysis -- Automatically chooses GPT-3.5 or GPT-4 based on token size -- Customizable prompt for different use-cases --- -## Technologies Used +## 🚀 Features -- Python 3.8+ -- LangChain -- OpenAI GPT-3.5 / GPT-4 API -- PyMuPDF / pdfminer / Tesseract OCR -- Pandas for Excel output -- Streamlit (optional if you add UI) +- Resume parsing and optimization +- AI-generated cover letters +- Smart job-matching with role suggestions +- Export to PDF +- Responsive and mobile-friendly --- -## What I Added / Customized +## 🛠️ Tech Stack - Rewrote prompt logic to include job-fit analysis - Added support for uploading job descriptions and matching against resume - Improved error handling for GPT-4 rate limits - Designed scoring system based on keyword matching - Refactored folder structure for clarity - Deployed version with sample resume + JD for demo +- **Frontend:** React, TypeScript, TailwindCSS +- **Backend:** Node.js, Express (or Supabase if applicable) +- **AI:** OpenAI GPT-4 +- **Deployment:** Vercel / Netlify --- -## Sample Output +## 📦 Running Locally -| Field | Example | -| ------------------------ | ----------------- | -| Education Bachelor Major | Computer Science | -| Top 3 Technical Skills | Python, SQL, AWS | -| Experience Companies | \[Google, Aptean] | -| Suitable Position | Backend Engineer | -| Candidate Rating | 8.5 / 10 | +```bash +git clone https://github.com/your-username/resume-gpt +cd resume-gpt +npm install +npm run dev
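
---

A note on wiring: the hunks in PATCH 5–8 call `compare_keywords` from `ResumeGPT/main.py` without ever importing it from `key_word_analyzer.py`, write `logs/keyword_gap_log.txt` using `filename` and `results` after the CSV loop has already finished (so only the last resume is logged, and `logs/` is never created), and run `compare_keywords` a second time for the optional `--json` export. A minimal consolidated sketch of that flow is shown below. It is an illustration, not part of the patch series: it assumes the flat import style `main.py` already uses (`from OCR_Reader import CVsReader`), reuses the `Filename`/`CV_Text` columns and the `compare_keywords` return keys from the patches above, and introduces a hypothetical helper name `export_keyword_analysis`.

```python
import csv
import json
import os
from datetime import datetime

# Assumed import: key_word_analyzer.py sits next to main.py in ResumeGPT/,
# matching the flat imports already used there. The hunks above omit this.
from key_word_analyzer import compare_keywords


def export_keyword_analysis(cvs_content_df, job_description_text, want_json=False):
    """Run the keyword gap analysis once per resume and write CSV, log, and optional JSON."""
    os.makedirs("output", exist_ok=True)
    os.makedirs("logs", exist_ok=True)  # PATCH 6 appends to logs/ but never creates it

    json_rows = []
    with open("output/keyword_analysis.csv", "w", newline="") as csvfile, \
         open("logs/keyword_gap_log.txt", "a") as log_file:
        writer = csv.writer(csvfile)
        writer.writerow(["Filename", "Common Keywords", "Missing Keywords"])

        for _, row in cvs_content_df.iterrows():
            # Compute the comparison once per resume and reuse it for CSV, log, and JSON.
            results = compare_keywords(row["CV_Text"], job_description_text)
            writer.writerow([
                row["Filename"],
                ", ".join(results["common_keywords"]),
                ", ".join(results["missing_keywords"]),
            ])
            # Log every resume inside the loop, not just the last one processed.
            log_file.write(f"\n--- {datetime.now()} ---\n")
            log_file.write(f"Resume: {row['Filename']}\n")
            log_file.write(f"Common: {results['common_keywords']}\n")
            log_file.write(f"Missing: {results['missing_keywords']}\n")
            json_rows.append({
                "filename": row["Filename"],
                "common_keywords": results["common_keywords"],
                "missing_keywords": results["missing_keywords"],
            })

    if want_json:
        with open("output/keyword_analysis.json", "w") as json_file:
            json.dump(json_rows, json_file, indent=2)
```

Under those assumptions it would be called once, after the job description has been read in, for example `export_keyword_analysis(cvs_content_df, job_description_text, want_json="--json" in sys.argv)`.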