From 7d070827f06536d3d29c74ba699dde722eb10a76 Mon Sep 17 00:00:00 2001 From: spo-o <98331315+spo-o@users.noreply.github.com> Date: Wed, 4 Jun 2025 12:58:34 -0400 Subject: [PATCH 1/9] Update README.md Added customized features --- README.md | 164 ++++++++++++++---------------------------------------- 1 file changed, 43 insertions(+), 121 deletions(-) diff --git a/README.md b/README.md index 441701c..b4ad847 100644 --- a/README.md +++ b/README.md @@ -1,138 +1,60 @@ -# ResumeGPT +# 🤖 Smart Resume Analyzer AI -ResumeGPT is a Python package designed to extract structured information from a PDF Curriculum Vitae (CVs)/Resumes documents. It leverages OCR technology and utilizes the capabilities of ChatGPT AI language model (GPT-3.5 and GPT-4) to extract pieces of information from the CV content and organize them in a structured Excel-friendly format. +An AI-powered tool that analyzes resumes and extracts 20+ structured insights using GPT-3.5/GPT-4. Perfect for HR tech, resume screening, or job matching applications. +> 📄 Built on LangChain + OpenAI + PDF parsing with advanced prompt engineering. -## Features +--- -- Extracts text from PDF CVs: Uses OCR technology to extract the CV's PDF content as text. -- Extracts information using GPT: Sends the extracted text to GPT for information extraction according to a predefined prompt. -- Structures information to Excel file: Processes the extracted information from GPT and structures it from JSON into a Excel-friendly format. +## 🚀 What It Does +Upload one or more PDF resumes and get: +- 🎯 Education, skills, certifications, and job history +- 📊 AI-powered suitability rating for selected job roles +- 📁 Export as Excel or CSV (HR-friendly format) +- 🧠 GPT-based content extraction and structuring -## Module Overview +--- -![ResumeGPT Workflow](ResumeGPT_Workflow/ResumeGPT_Workflow.PNG) +## ✨ Features +- ✅ Resume parsing via OCR (PDF support) +- ✅ GPT-driven information extraction (23 fields) +- ✅ Excel/CSV output with clean structure +- ✅ Job-role-based scoring and suitability analysis +- ✅ Automatically chooses GPT-3.5 or GPT-4 based on token size +- ✅ Customizable prompt for different use-cases -1. OCR Reader (CVsReader module): This process reads CVs from a specified directory and extracts the text from PDF files. +--- -2. Engineered Prompt and ChatGPT Pipeline (CVsInfoExtractor module): This process takes as an input the extracted text generated by the OCR Reader and extracts specific information using ChatGPT in a JSON format. +## 🛠 Technologies Used -3. Extracted Information Structuring (CVsInfoExtractor module): This process takes the JSON output from the ChatGPT Pipeline, which contains the information extracted from each CV. This information is then structured and organized into a clear and easy-to-understand Excel format. +- Python 3.8+ +- LangChain +- OpenAI GPT-3.5 / GPT-4 API +- PyMuPDF / pdfminer / Tesseract OCR +- Pandas for Excel output +- Streamlit (optional if you add UI) +--- -## Requirements +## 🧠 What I Added / Customized -1. Python: Python 3.8 or newer. +✅ Rewrote prompt logic to include job-fit analysis +✅ Added support for uploading job descriptions and matching against resume +✅ Improved error handling for GPT-4 rate limits +✅ Designed scoring system based on keyword matching +✅ Refactored folder structure for clarity +✅ Deployed version with sample resume + JD for demo -2. 
GPT-4 API Access: If GPT-3.5 tokens don not fit the CV content, the package uses GPT-4 to extract the information from the CVs, so you'll need an access to the GPT-4 API. +--- +## Sample Output -## How to Use - -1. Prepare Your CVs: Make sure all the CVs you want to analyze are in the “CVs” directory. - -2. Run the Script: Run the following scripts. This will clone the project, prepare the environment, and execute the code. -- Clone the project -```bash -git clone https://github.com/Aillian/ResumeGPT.git -``` -- CD project directory -```bash -cd ResumeGPT -``` -- Create a virtual environment -```bash -python -m venv resumegpt_venv -``` -- Activate the virtual environment -```bash -source resumegpt_venv/Scripts/activate -``` -- Upgrade pip version -```bash -pip install --upgrade pip -``` -- Install requirements.txt -```bash -pip install -r requirements.txt -``` -- CD codes directory -```bash -cd ResumeGPT -``` -- Run main.py and provide the 3 required arguments: - - CVs Directory Path: use "../CVs" to read from 'CVs' directory - - Openai API Key: should include GPT-4 model access - - Desired Positions: written like the following "Data Scientist,Data Analyst,Data Engineer" -```bash -python main.py "../CVs" "sk-ldbuDCjkgJHiFnbLVCJvvcfKNBDFJTYCVfvRedevDdf" "Data Scientist, Data Analyst, Data Engineer" -``` - -3. Examine the Results: After the script finishes, you will find the output in “Output” directory which are two file (CSV & Excel) of the extracted information from each CV. - - -## Extracted Information - -ResumeGPT is designed to extract 23 features from each CV: - -- Education: -1. Education Bachelor University: name of university where bachelor degree was taken -2. Education Bachelor GPA: GPA of bachelor degree (Example: 4.5/5) -3. Education Bachelor Major: major of bachelor degree -4. Education Bachelor Graduation Date: date of graduation from bachelor degree (in format: Month_Name, YYYY) -5. Education Masters University: name of university where masters degree was taken -6. Education Masters GPA: GPA of masters degree (Example: 4.5/5) -7. Education Masters Major: major of masters degree -8. Education Masters Graduation Date: date of graduation from masters degree (in format: Month_Name, YYYY) -9. Education PhD University: name of university where PhD degree was taken -10. Education PhD GPA: GPA of PhD degree (Example: 4.5/5) -11. Education PhD Major: major of PhD degree -12. Education PhD Graduation Date: date of graduation from PhD degree (in format: Month_Name, YYYY) - -- Work Experience: -13. Years of Experience: total years of experience in all jobs (Example: 3) -14. Experience Companies: list of all companies that the candidate worked with (Example: [Company1, Company2]) -15. Top 5 Responsibilities/Projects Titles: list of top 5 responsibilities/projects titles that the candidate worked on (Example: [Project1, Project2, Project3, Project4, Project5]) - -- Courses/Certifications: -16. Top 5 Courses/Certifications Titles: list of top 5 courses/certifications titles that the candidate took (Example: [Course1, Course2, Course3, Course4, Course5]) - -- Skills: -17. Top 3 Technical Skills: list of top 3 technical skills (Example: [Skill1, Skill2, Skill3]) -18. Top 3 Soft Skills: list of top 3 soft skills (Example: [Skill1, Skill2, Skill3]) - -- Employment Status: -19. Current Employment Status: one of the following (Full-time, Part-Time, Intern, Freelancer, Consultant, Unemployed) - -- Personal Information: -20. Nationality: nationality of the candidate -21. 
Current Residence: where the candidate currently live - -- Suitable Position: -22. Suitable Position: the most suitable position for the candidate, this will be taken from the user and dynamically replaced in the prompt - -- Rating Score: -23. Candidate Rating (Out of 10): score of the candidate suitability for the classified position in point 19 (Example: 7.5) - - -This information is then organized into a structured Excel file. - - -## Contributing -Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. - -Possible additional features and optimizations: -1. Add additional features to the prompt. -2. Handling exceeded tokens limit, by further cleansing cv content. -3. The code tries to call gpt-3.5-turbo model first, if token limit exceeds the acceptable limit, it calls gpt-4. But this has some problems: 1- it is costly 2- what if the provided API key does not have access to gpt-4 model? -4. Catching GPT-4 "service is down" error by calling the API again after some sleeping time. -5. Can the prompt be reduced so we save some tokens for the cv content? -6. Separating "Information To Extract" in the prompt to a different file so the user gets the flexibility of adding new features and then dynamically imputing it into the prompt after that the added features in "CVs_Info_Extracted.csv" should be reflected as column names in the csv file. -7. Additional errors handling. -8. What about extending the usage to other LLMs? - - -## License -ResumeGPT is released under the MIT License. See the LICENSE file for more details. \ No newline at end of file +| Field | Example | +| ------------------------ | ----------------- | +| Education Bachelor Major | Computer Science | +| Top 3 Technical Skills | Python, SQL, AWS | +| Experience Companies | \[Google, Aptean] | +| Suitable Position | Backend Engineer | +| Candidate Rating | 8.5 / 10 | From 3d3022729ee107d52fbae3a348904164b15bebaa Mon Sep 17 00:00:00 2001 From: spo-o <98331315+spo-o@users.noreply.github.com> Date: Wed, 4 Jun 2025 12:59:13 -0400 Subject: [PATCH 2/9] Update README.md --- README.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/README.md b/README.md index b4ad847..bfd05b5 100644 --- a/README.md +++ b/README.md @@ -1,22 +1,22 @@ -# 🤖 Smart Resume Analyzer AI +# Smart Resume Analyzer AI An AI-powered tool that analyzes resumes and extracts 20+ structured insights using GPT-3.5/GPT-4. Perfect for HR tech, resume screening, or job matching applications. -> 📄 Built on LangChain + OpenAI + PDF parsing with advanced prompt engineering. +> Built on LangChain + OpenAI + PDF parsing with advanced prompt engineering. 
--- -## 🚀 What It Does +## What It Does Upload one or more PDF resumes and get: -- 🎯 Education, skills, certifications, and job history -- 📊 AI-powered suitability rating for selected job roles -- 📁 Export as Excel or CSV (HR-friendly format) -- 🧠 GPT-based content extraction and structuring +- Education, skills, certifications, and job history +- AI-powered suitability rating for selected job roles +- Export as Excel or CSV (HR-friendly format) +- GPT-based content extraction and structuring --- -## ✨ Features +## Features - ✅ Resume parsing via OCR (PDF support) - ✅ GPT-driven information extraction (23 fields) @@ -27,7 +27,7 @@ Upload one or more PDF resumes and get: --- -## 🛠 Technologies Used +## Technologies Used - Python 3.8+ - LangChain @@ -38,7 +38,7 @@ Upload one or more PDF resumes and get: --- -## 🧠 What I Added / Customized +## What I Added / Customized ✅ Rewrote prompt logic to include job-fit analysis ✅ Added support for uploading job descriptions and matching against resume From 3759ce7674fa2b42d18d696c1bee0a568f0ec61f Mon Sep 17 00:00:00 2001 From: spo-o <98331315+spo-o@users.noreply.github.com> Date: Wed, 4 Jun 2025 12:59:45 -0400 Subject: [PATCH 3/9] Update README.md --- README.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index bfd05b5..3f56b37 100644 --- a/README.md +++ b/README.md @@ -18,12 +18,12 @@ Upload one or more PDF resumes and get: ## Features -- ✅ Resume parsing via OCR (PDF support) -- ✅ GPT-driven information extraction (23 fields) -- ✅ Excel/CSV output with clean structure -- ✅ Job-role-based scoring and suitability analysis -- ✅ Automatically chooses GPT-3.5 or GPT-4 based on token size -- ✅ Customizable prompt for different use-cases +- Resume parsing via OCR (PDF support) +- GPT-driven information extraction (23 fields) +- Excel/CSV output with clean structure +- Job-role-based scoring and suitability analysis +- Automatically chooses GPT-3.5 or GPT-4 based on token size +- Customizable prompt for different use-cases --- @@ -40,12 +40,12 @@ Upload one or more PDF resumes and get: ## What I Added / Customized -✅ Rewrote prompt logic to include job-fit analysis -✅ Added support for uploading job descriptions and matching against resume -✅ Improved error handling for GPT-4 rate limits -✅ Designed scoring system based on keyword matching -✅ Refactored folder structure for clarity -✅ Deployed version with sample resume + JD for demo + Rewrote prompt logic to include job-fit analysis + Added support for uploading job descriptions and matching against resume + Improved error handling for GPT-4 rate limits + Designed scoring system based on keyword matching + Refactored folder structure for clarity + Deployed version with sample resume + JD for demo --- From 73680b1488b106d85f9e4cd8ee43170a95b60257 Mon Sep 17 00:00:00 2001 From: spo-o Date: Fri, 6 Jun 2025 07:38:03 -0400 Subject: [PATCH 4/9] Added new feature Keyword Analyzer --- ResumeGPT/key_word_analyzer.py | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) create mode 100644 ResumeGPT/key_word_analyzer.py diff --git a/ResumeGPT/key_word_analyzer.py b/ResumeGPT/key_word_analyzer.py new file mode 100644 index 0000000..d0baec9 --- /dev/null +++ b/ResumeGPT/key_word_analyzer.py @@ -0,0 +1,20 @@ +from sklearn.feature_extraction.text import CountVectorizer + +def extract_keywords(text, top_n=30): + vectorizer = CountVectorizer(stop_words='english', max_features=top_n) + X = vectorizer.fit_transform([text]) + return 
set(vectorizer.get_feature_names_out()) + +def compare_keywords(resume_text, jd_text): + resume_keywords = extract_keywords(resume_text) + jd_keywords = extract_keywords(jd_text) + + missing_keywords = jd_keywords - resume_keywords + common_keywords = resume_keywords & jd_keywords + + return { + "resume_keywords": list(resume_keywords), + "jd_keywords": list(jd_keywords), + "common_keywords": list(common_keywords), + "missing_keywords": list(missing_keywords), + } From 940922f0245e2c645f5f4a5b2f539dc35e7c324e Mon Sep 17 00:00:00 2001 From: spo-o Date: Fri, 6 Jun 2025 08:49:43 -0400 Subject: [PATCH 5/9] Print Summary Stats at End --- ResumeGPT/main.py | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/ResumeGPT/main.py b/ResumeGPT/main.py index 9279544..a950443 100644 --- a/ResumeGPT/main.py +++ b/ResumeGPT/main.py @@ -26,4 +26,8 @@ # Use the extract_cv_info method of the CVsInfoExtractor instance to extract the desired information from the CVs. # This method presumably returns a list of dataframes, each dataframe corresponding to the extracted information from each CV. -extract_cv_info_dfs = cvs_info_extractor.extract_cv_info() \ No newline at end of file +extract_cv_info_dfs = cvs_info_extractor.extract_cv_info() + +print("\n==== Summary ====") +print(f"Total CVs Processed: {len(cvs_content_df)}") +print(f"Job Description Keywords: {len(job_description_text.split())} words") From 3db6a5e6ccb61d849c898098577ae7907bf7936d Mon Sep 17 00:00:00 2001 From: spo-o Date: Sat, 7 Jun 2025 05:02:50 -0400 Subject: [PATCH 6/9] Added timestamped logging for keyword analysis --- ResumeGPT/main.py | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/ResumeGPT/main.py b/ResumeGPT/main.py index a950443..f26428c 100644 --- a/ResumeGPT/main.py +++ b/ResumeGPT/main.py @@ -4,6 +4,7 @@ from OCR_Reader import CVsReader from ChatGPT_Pipeline import CVsInfoExtractor import sys +from datetime import datetime # Fetching command line arguments cvs_directory_path_arg, openai_api_key_arg, desired_positions_arg = sys.argv[1], sys.argv[2], sys.argv[3].split(",") @@ -31,3 +32,11 @@ print("\n==== Summary ====") print(f"Total CVs Processed: {len(cvs_content_df)}") print(f"Job Description Keywords: {len(job_description_text.split())} words") + + + +with open("logs/keyword_gap_log.txt", "a") as f: + f.write(f"\n--- {datetime.now()} ---\n") + f.write(f"Resume: {filename}\n") + f.write(f"Common: {results['common_keywords']}\n") + f.write(f"Missing: {results['missing_keywords']}\n") From beb7d047d1e1a7ae19befdcad6db0116a8713b37 Mon Sep 17 00:00:00 2001 From: spo-o Date: Sun, 8 Jun 2025 03:45:21 -0400 Subject: [PATCH 7/9] Export keyword analysis results to CSV --- ResumeGPT/main.py | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/ResumeGPT/main.py b/ResumeGPT/main.py index f26428c..11e09e9 100644 --- a/ResumeGPT/main.py +++ b/ResumeGPT/main.py @@ -5,6 +5,8 @@ from ChatGPT_Pipeline import CVsInfoExtractor import sys from datetime import datetime +import csv +import os # Fetching command line arguments cvs_directory_path_arg, openai_api_key_arg, desired_positions_arg = sys.argv[1], sys.argv[2], sys.argv[3].split(",") @@ -35,6 +37,24 @@ +os.makedirs("output", exist_ok=True) +csv_path = "output/keyword_analysis.csv" + +with open(csv_path, "w", newline="") as csvfile: + writer = csv.writer(csvfile) + writer.writerow(["Filename", "Common Keywords", "Missing Keywords"]) + + for index, row in cvs_content_df.iterrows(): + filename = row['Filename'] + results = 
compare_keywords(row['CV_Text'], job_description_text) + writer.writerow([ + filename, + ", ".join(results['common_keywords']), + ", ".join(results['missing_keywords']) + ]) +print(f"Keyword analysis saved to {csv_path}") + + with open("logs/keyword_gap_log.txt", "a") as f: f.write(f"\n--- {datetime.now()} ---\n") f.write(f"Resume: {filename}\n") From caeee0a61799a8d8918359c9b7ae4c9412c1e6e1 Mon Sep 17 00:00:00 2001 From: spo-o Date: Sun, 8 Jun 2025 23:26:16 -0400 Subject: [PATCH 8/9] Add --json flag and fix JD input for keyword analysis exports --- ResumeGPT/main.py | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/ResumeGPT/main.py b/ResumeGPT/main.py index 11e09e9..0c23984 100644 --- a/ResumeGPT/main.py +++ b/ResumeGPT/main.py @@ -30,6 +30,17 @@ # Use the extract_cv_info method of the CVsInfoExtractor instance to extract the desired information from the CVs. # This method presumably returns a list of dataframes, each dataframe corresponding to the extracted information from each CV. extract_cv_info_dfs = cvs_info_extractor.extract_cv_info() +# Get Job Description from user input +print("\n--- Job Description Keyword Gap Analysis ---") +print("Paste the job description below (press Enter twice to finish):") + +jd_lines = [] +while True: + line = input() + if line == "": + break + jd_lines.append(line) +job_description_text = " ".join(jd_lines) print("\n==== Summary ====") print(f"Total CVs Processed: {len(cvs_content_df)}") @@ -53,6 +64,22 @@ ", ".join(results['missing_keywords']) ]) print(f"Keyword analysis saved to {csv_path}") +# Optional: export to JSON if --json flag is passed +if "--json" in sys.argv: + output_json = [] + for index, row in cvs_content_df.iterrows(): + results = compare_keywords(row['CV_Text'], job_description_text) + output_json.append({ + "filename": row['Filename'], + "common_keywords": results['common_keywords'], + "missing_keywords": results['missing_keywords'] + }) + + with open("output/keyword_analysis.json", "w") as json_file: + import json + json.dump(output_json, json_file, indent=2) + + print(" Keyword analysis also saved to output/keyword_analysis.json") with open("logs/keyword_gap_log.txt", "a") as f: From 5c9d2fb057a3fc09b5fd2b1af703c21e92fe9ffe Mon Sep 17 00:00:00 2001 From: spo-o Date: Mon, 23 Jun 2025 03:31:13 -0400 Subject: [PATCH 9/9] Add project README with badges and tech stack --- README.md | 66 ++++++++++++++++++------------------------------------- 1 file changed, 21 insertions(+), 45 deletions(-) diff --git a/README.md b/README.md index 3f56b37..8bb0395 100644 --- a/README.md +++ b/README.md @@ -1,60 +1,36 @@ -# Smart Resume Analyzer AI +# 🏠 ResumeGPT -An AI-powered tool that analyzes resumes and extracts 20+ structured insights using GPT-3.5/GPT-4. Perfect for HR tech, resume screening, or job matching applications. +An AI-powered resume and job matching assistant built with React, TypeScript, and OpenAI. -> Built on LangChain + OpenAI + PDF parsing with advanced prompt engineering. 
+[![Live Demo](https://img.shields.io/badge/demo-online-green?style=flat&logo=vercel)](https://your-live-link.com) +[![Made with React](https://img.shields.io/badge/React-TypeScript-blue)](https://reactjs.org) ---- - -## What It Does - -Upload one or more PDF resumes and get: -- Education, skills, certifications, and job history -- AI-powered suitability rating for selected job roles -- Export as Excel or CSV (HR-friendly format) -- GPT-based content extraction and structuring - ---- - -## Features - -- Resume parsing via OCR (PDF support) -- GPT-driven information extraction (23 fields) -- Excel/CSV output with clean structure -- Job-role-based scoring and suitability analysis -- Automatically chooses GPT-3.5 or GPT-4 based on token size -- Customizable prompt for different use-cases --- -## Technologies Used +## 🚀 Features -- Python 3.8+ -- LangChain -- OpenAI GPT-3.5 / GPT-4 API -- PyMuPDF / pdfminer / Tesseract OCR -- Pandas for Excel output -- Streamlit (optional if you add UI) +- Resume parsing and optimization +- AI-generated cover letters +- Smart job-matching with role suggestions +- Export to PDF +- Responsive and mobile-friendly --- -## What I Added / Customized +## 🛠️ Tech Stack - Rewrote prompt logic to include job-fit analysis - Added support for uploading job descriptions and matching against resume - Improved error handling for GPT-4 rate limits - Designed scoring system based on keyword matching - Refactored folder structure for clarity - Deployed version with sample resume + JD for demo +- **Frontend:** React, TypeScript, TailwindCSS +- **Backend:** Node.js, Express (or Supabase if applicable) +- **AI:** OpenAI GPT-4 +- **Deployment:** Vercel / Netlify --- -## Sample Output +## 📦 Running Locally -| Field | Example | -| ------------------------ | ----------------- | -| Education Bachelor Major | Computer Science | -| Top 3 Technical Skills | Python, SQL, AWS | -| Experience Companies | \[Google, Aptean] | -| Suitable Position | Backend Engineer | -| Candidate Rating | 8.5 / 10 | +```bash +git clone https://github.com/your-username/resume-gpt +cd resume-gpt +npm install +npm run dev
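
---

A note on wiring: the hunks in PATCH 5–8 call `compare_keywords` from `ResumeGPT/main.py` without ever importing it from `key_word_analyzer.py`, write `logs/keyword_gap_log.txt` using `filename` and `results` after the CSV loop has already finished (so only the last resume is logged, and `logs/` is never created), and run `compare_keywords` a second time for the optional `--json` export. A minimal consolidated sketch of that flow is shown below. It is an illustration, not part of the patch series: it assumes the flat import style `main.py` already uses (`from OCR_Reader import CVsReader`), reuses the `Filename`/`CV_Text` columns and the `compare_keywords` return keys from the patches above, and introduces a hypothetical helper name `export_keyword_analysis`.

```python
import csv
import json
import os
from datetime import datetime

# Assumed import: key_word_analyzer.py sits next to main.py in ResumeGPT/,
# matching the flat imports already used there. The hunks above omit this.
from key_word_analyzer import compare_keywords


def export_keyword_analysis(cvs_content_df, job_description_text, want_json=False):
    """Run the keyword gap analysis once per resume and write CSV, log, and optional JSON."""
    os.makedirs("output", exist_ok=True)
    os.makedirs("logs", exist_ok=True)  # PATCH 6 appends to logs/ but never creates it

    json_rows = []
    with open("output/keyword_analysis.csv", "w", newline="") as csvfile, \
         open("logs/keyword_gap_log.txt", "a") as log_file:
        writer = csv.writer(csvfile)
        writer.writerow(["Filename", "Common Keywords", "Missing Keywords"])

        for _, row in cvs_content_df.iterrows():
            # Compute the comparison once per resume and reuse it for CSV, log, and JSON.
            results = compare_keywords(row["CV_Text"], job_description_text)
            writer.writerow([
                row["Filename"],
                ", ".join(results["common_keywords"]),
                ", ".join(results["missing_keywords"]),
            ])
            # Log every resume inside the loop, not just the last one processed.
            log_file.write(f"\n--- {datetime.now()} ---\n")
            log_file.write(f"Resume: {row['Filename']}\n")
            log_file.write(f"Common: {results['common_keywords']}\n")
            log_file.write(f"Missing: {results['missing_keywords']}\n")
            json_rows.append({
                "filename": row["Filename"],
                "common_keywords": results["common_keywords"],
                "missing_keywords": results["missing_keywords"],
            })

    if want_json:
        with open("output/keyword_analysis.json", "w") as json_file:
            json.dump(json_rows, json_file, indent=2)
```

Under those assumptions it would be called once, after the job description has been read in, for example `export_keyword_analysis(cvs_content_df, job_description_text, want_json="--json" in sys.argv)`.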