148 changes: 23 additions & 125 deletions README.md
@@ -1,138 +1,36 @@
# ResumeGPT
# 🏠 ResumeGPT

ResumeGPT is a Python package designed to extract structured information from PDF Curriculum Vitae (CV)/resume documents. It uses OCR to read the CV content and leverages the ChatGPT language models (GPT-3.5 and GPT-4) to extract key pieces of information and organize them in a structured, Excel-friendly format.
An AI-powered resume and job matching assistant built with React, TypeScript, and OpenAI.

[![Live Demo](https://img.shields.io/badge/demo-online-green?style=flat&logo=vercel)](https://your-live-link.com)
[![Made with React](https://img.shields.io/badge/React-TypeScript-blue)](https://reactjs.org)

## Features

- Extracts text from PDF CVs: uses OCR to extract each CV's PDF content as plain text.
- Extracts information using GPT: sends the extracted text to GPT, which extracts information according to a predefined prompt.
- Structures information into an Excel file: converts GPT's JSON output into an Excel-friendly format.
---

## 🚀 Features

## Module Overview
- Resume parsing and optimization
- AI-generated cover letters
- Smart job-matching with role suggestions
- Export to PDF
- Responsive and mobile-friendly

![ResumeGPT Workflow](ResumeGPT_Workflow/ResumeGPT_Workflow.PNG)
---

## 🛠️ Tech Stack

1. OCR Reader (CVsReader module): This process reads CVs from a specified directory and extracts the text from PDF files.
- **Frontend:** React, TypeScript, TailwindCSS
- **Backend:** Node.js, Express (or Supabase if applicable)
- **AI:** OpenAI GPT-4
- **Deployment:** Vercel / Netlify

2. Engineered Prompt and ChatGPT Pipeline (CVsInfoExtractor module): This process takes as input the extracted text generated by the OCR Reader and uses ChatGPT to extract specific information in JSON format.
---

3. Extracted Information Structuring (CVsInfoExtractor module): This process takes the JSON output from the ChatGPT Pipeline, which contains the information extracted from each CV. This information is then structured and organized into a clear and easy-to-understand Excel format.
## 📦 Running Locally


## Requirements

1. Python: Python 3.8 or newer.

2. GPT-4 API Access: If the CV content does not fit within GPT-3.5's token limit, the package falls back to GPT-4 to extract the information, so you'll need access to the GPT-4 API.


## How to Use

1. Prepare Your CVs: Make sure all the CVs you want to analyze are in the “CVs” directory.

2. Run the Script: Run the following commands. They will clone the project, prepare the environment, and execute the code.
- Clone the project
```bash
git clone https://github.com/Aillian/ResumeGPT.git
```
- Change into the project directory
```bash
cd ResumeGPT
```
- Create a virtual environment
```bash
python -m venv resumegpt_venv
```
- Activate the virtual environment
```bash
source resumegpt_venv/bin/activate  # on Windows (Git Bash): source resumegpt_venv/Scripts/activate
```
- Upgrade pip version
```bash
pip install --upgrade pip
```
- Install requirements.txt
```bash
pip install -r requirements.txt
```
- Change into the code directory
```bash
cd ResumeGPT
```
- Run main.py and provide the 3 required arguments:
- CVs Directory Path: use "../CVs" to read from the 'CVs' directory
- OpenAI API Key: must include GPT-4 model access (replace the placeholder below with your own key)
- Desired Positions: a comma-separated string with no spaces after the commas, e.g. "Data Scientist,Data Analyst,Data Engineer"
```bash
python main.py "../CVs" "<YOUR_OPENAI_API_KEY>" "Data Scientist,Data Analyst,Data Engineer"
```

3. Examine the Results: After the script finishes, you will find the output in the “Output” directory: two files (CSV and Excel) containing the information extracted from each CV.


## Extracted Information

ResumeGPT is designed to extract 23 features from each CV:

- Education:
1. Education Bachelor University: name of university where bachelor degree was taken
2. Education Bachelor GPA: GPA of bachelor degree (Example: 4.5/5)
3. Education Bachelor Major: major of bachelor degree
4. Education Bachelor Graduation Date: date of graduation from bachelor degree (in format: Month_Name, YYYY)
5. Education Masters University: name of university where masters degree was taken
6. Education Masters GPA: GPA of masters degree (Example: 4.5/5)
7. Education Masters Major: major of masters degree
8. Education Masters Graduation Date: date of graduation from masters degree (in format: Month_Name, YYYY)
9. Education PhD University: name of university where PhD degree was taken
10. Education PhD GPA: GPA of PhD degree (Example: 4.5/5)
11. Education PhD Major: major of PhD degree
12. Education PhD Graduation Date: date of graduation from PhD degree (in format: Month_Name, YYYY)

- Work Experience:
13. Years of Experience: total years of experience in all jobs (Example: 3)
14. Experience Companies: list of all companies that the candidate worked with (Example: [Company1, Company2])
15. Top 5 Responsibilities/Projects Titles: list of top 5 responsibilities/projects titles that the candidate worked on (Example: [Project1, Project2, Project3, Project4, Project5])

- Courses/Certifications:
16. Top 5 Courses/Certifications Titles: list of top 5 courses/certifications titles that the candidate took (Example: [Course1, Course2, Course3, Course4, Course5])

- Skills:
17. Top 3 Technical Skills: list of top 3 technical skills (Example: [Skill1, Skill2, Skill3])
18. Top 3 Soft Skills: list of top 3 soft skills (Example: [Skill1, Skill2, Skill3])

- Employment Status:
19. Current Employment Status: one of the following (Full-time, Part-Time, Intern, Freelancer, Consultant, Unemployed)

- Personal Information:
20. Nationality: nationality of the candidate
21. Current Residence: where the candidate currently lives

- Suitable Position:
22. Suitable Position: the most suitable position for the candidate; the list of positions is taken from the user and dynamically inserted into the prompt

- Rating Score:
23. Candidate Rating (Out of 10): score of the candidate's suitability for the position identified in point 22 (Example: 7.5)


This information is then organized into a structured Excel file.
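
To illustrate how one CV's record becomes a single spreadsheet row, here is a minimal sketch of the flattening step; the field names and the `flatten_record` helper are illustrative, not the package's actual schema:

```python
import json

# Hypothetical GPT output for one CV (three of the 23 features shown)
gpt_output = json.loads("""{
  "Education Bachelor University": "MIT",
  "Years of Experience": 3,
  "Top 3 Technical Skills": ["Python", "SQL", "Docker"]
}""")

def flatten_record(record):
    # Join list values into comma-separated strings so each CV maps to one flat row
    return {key: ", ".join(value) if isinstance(value, list) else value
            for key, value in record.items()}

row = flatten_record(gpt_output)
print(row["Top 3 Technical Skills"])  # Python, SQL, Docker
```

Once every CV is reduced to a flat dict like `row`, writing the CSV/Excel output is a straightforward per-row dump.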


## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Possible additional features and optimizations:
1. Add additional features to the prompt.
2. Handle exceeded token limits by further cleansing the CV content.
3. The code calls the gpt-3.5-turbo model first and falls back to gpt-4 when the token limit is exceeded. This has two problems: it is costly, and the provided API key may not have access to the gpt-4 model.
4. Catch the GPT-4 "service is down" error by retrying the API call after a sleep interval.
5. Can the prompt be shortened to save tokens for the CV content?
6. Separate "Information To Extract" in the prompt into its own file so the user can add new features, which are then dynamically inserted into the prompt; the added features should also appear as column names in "CVs_Info_Extracted.csv".
7. Additional error handling.
8. Extend the usage to other LLMs.
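
For item 4, a retry with exponential backoff could look like the sketch below; `call_with_retry`, its parameters, and the `RuntimeError` stand-in for the service error are all hypothetical, not existing code:

```python
import time

def call_with_retry(call, max_retries=3, base_delay=1.0):
    """Hypothetical helper: retry `call` with exponential backoff when it
    raises a transient error (e.g. a 'service is down' response)."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:              # stand-in for the API's transient error
            if attempt == max_retries - 1:
                raise                     # give up after the final attempt
            time.sleep(base_delay * 2 ** attempt)

# Demo with a stub that fails twice, then succeeds on the third attempt
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("service is down")
    return "ok"

result = call_with_retry(flaky_call, base_delay=0.01)
print(result)  # ok
```

In the real pipeline the `RuntimeError` would be replaced by the specific exception the OpenAI client raises for service outages.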


## License
ResumeGPT is released under the MIT License. See the LICENSE file for more details.
```bash
git clone https://github.com/your-username/resume-gpt
cd resume-gpt
npm install
npm run dev
```
20 changes: 20 additions & 0 deletions ResumeGPT/key_word_analyzer.py
@@ -0,0 +1,20 @@
from sklearn.feature_extraction.text import CountVectorizer

def extract_keywords(text, top_n=30):
    # Keep the top_n most frequent non-stopword terms in the text
    vectorizer = CountVectorizer(stop_words='english', max_features=top_n)
    vectorizer.fit_transform([text])
    return set(vectorizer.get_feature_names_out())

def compare_keywords(resume_text, jd_text):
    # Compare resume terms against job-description terms as sets
    resume_keywords = extract_keywords(resume_text)
    jd_keywords = extract_keywords(jd_text)

    missing_keywords = jd_keywords - resume_keywords
    common_keywords = resume_keywords & jd_keywords

    return {
        "resume_keywords": list(resume_keywords),
        "jd_keywords": list(jd_keywords),
        "common_keywords": list(common_keywords),
        "missing_keywords": list(missing_keywords),
    }
62 changes: 61 additions & 1 deletion ResumeGPT/main.py
@@ -4,6 +4,9 @@
from OCR_Reader import CVsReader
from ChatGPT_Pipeline import CVsInfoExtractor
import sys
from datetime import datetime
import csv
import os
from key_word_analyzer import compare_keywords  # used by the keyword gap analysis below

# Fetching command line arguments
cvs_directory_path_arg, openai_api_key_arg, desired_positions_arg = sys.argv[1], sys.argv[2], sys.argv[3].split(",")
@@ -26,4 +29,61 @@

# Use the extract_cv_info method of the CVsInfoExtractor instance to extract the desired information from the CVs.
# This method presumably returns a list of dataframes, each dataframe corresponding to the extracted information from each CV.
extract_cv_info_dfs = cvs_info_extractor.extract_cv_info()

# Get Job Description from user input
print("\n--- Job Description Keyword Gap Analysis ---")
print("Paste the job description below (press Enter twice to finish):")

jd_lines = []
while True:
    line = input()
    if line == "":
        break
    jd_lines.append(line)
job_description_text = " ".join(jd_lines)

print("\n==== Summary ====")
print(f"Total CVs Processed: {len(cvs_content_df)}")
print(f"Job Description Length: {len(job_description_text.split())} words")



os.makedirs("output", exist_ok=True)
csv_path = "output/keyword_analysis.csv"

with open(csv_path, "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["Filename", "Common Keywords", "Missing Keywords"])

    for index, row in cvs_content_df.iterrows():
        filename = row['Filename']
        results = compare_keywords(row['CV_Text'], job_description_text)
        writer.writerow([
            filename,
            ", ".join(results['common_keywords']),
            ", ".join(results['missing_keywords'])
        ])

print(f"Keyword analysis saved to {csv_path}")
# Optional: export to JSON if --json flag is passed
if "--json" in sys.argv:
    import json

    output_json = []
    for index, row in cvs_content_df.iterrows():
        results = compare_keywords(row['CV_Text'], job_description_text)
        output_json.append({
            "filename": row['Filename'],
            "common_keywords": results['common_keywords'],
            "missing_keywords": results['missing_keywords']
        })

    with open("output/keyword_analysis.json", "w") as json_file:
        json.dump(output_json, json_file, indent=2)

    print("Keyword analysis also saved to output/keyword_analysis.json")


os.makedirs("logs", exist_ok=True)  # ensure the logs directory exists before appending
with open("logs/keyword_gap_log.txt", "a") as f:
    f.write(f"\n--- {datetime.now()} ---\n")
    f.write(f"Resume: (unknown)\n")
    # `results` holds the keywords of the last CV processed in the loop above
    f.write(f"Common: {results['common_keywords']}\n")
    f.write(f"Missing: {results['missing_keywords']}\n")