feder-cr · Oct 2, 2024 · Oct 1, 2024
diff --git a/ANSWERS_UTILS.md b/ANSWERS_UTILS.md
@@ -0,0 +1,54 @@
+# Answer Editor and Cleaner
+
+This project consists of two main Python scripts: `answer_editor.py` and `cleanse_answers.py`. These scripts work together to manage and clean a set of questions and answers stored in JSON format.
+
+## answer_editor.py
+
+This script is a Flask web application that provides a user interface for viewing and editing a set of questions and answers.
+
+### Key Features:
+- Uses Flask and Flask-Bootstrap for the web interface
+- Reads and writes data to a JSON file (`answers.json`)
+- Allows viewing all questions and answers
+- Supports editing answers
+- Handles both radio button and text input answers
+- Allows deletion of individual question-answer pairs
+
+### How it works:
+1. The main route (`/`) displays all questions and answers when accessed via GET request
+2. When a POST request is made (i.e., when the form is submitted), it updates the answers in the JSON file
+3. It uses a template (`index.html`, not shown in the provided code) to render the web interface
+
+## cleanse_answers.py
+
+This script is designed to clean and sanitize the questions and answers stored in the JSON file.
+
+### Key Features:
+- Removes duplicate words in questions
+- Converts text to lowercase
+- Removes the `(required)` marker, trailing `(yes/no)`-style suffixes, question marks, double quotes, and backslashes
+- Eliminates non-ASCII characters
+- Removes duplicate questions
+
+### How it works:
+1. Reads the input JSON file (`answers.json`)
+2. Sanitizes each question using the `sanitize_text` function
+3. Removes duplicate questions
+4. Writes the cleansed data to a new JSON file (`cleansed_answers.json`)
+
+## Usage
+
+1. Run `answer_editor.py` to start the web application for viewing and editing answers:
+   ```
+   python answer_editor.py
+   ```
+   Then open a web browser and navigate to `http://localhost:5000`
+
+2. After editing answers, run `cleanse_answers.py` to clean the data:
+   ```
+   python cleanse_answers.py
+   ```
+
+This will create a new file `cleansed_answers.json` with the sanitized data.
+
+Note: Make sure Flask and Flask-Bootstrap are installed (`pip install flask flask-bootstrap`) before running `answer_editor.py`. Both are included in the `requirements.txt` file.
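
ANSWERS_UTILS.md does not show what an entry in `answers.json` looks like. Judging from the keys both scripts read (`type`, `question`, `answer`), a file might resemble the sketch below. The sample questions, the `"text"` type value, and the `round_trip` helper are assumptions for illustration; only `"radio"` appears explicitly in `answer_editor.py`.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample entries; only the three keys and the "radio" type
# value are taken from the scripts in this PR.
sample_answers = [
    {"type": "radio", "question": "are you authorized to work in the us", "answer": "yes"},
    {"type": "text", "question": "years of python experience", "answer": "5"},
]

def round_trip(entries):
    """Write entries as answer_editor.py does (indent=2) and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "answers.json"
        with open(path, "w") as f:
            json.dump(entries, f, indent=2)
        with open(path, "r") as f:
            return json.load(f)

loaded = round_trip(sample_answers)
```

The data survives the write and read unchanged, so the editor and the cleaner can share one file as long as every entry carries all three keys.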
diff --git a/README.md b/README.md
@@ -153,8 +153,25 @@ Auto_Jobs_Applier_AIHawk steps in as a game-changing solution to these challenge
    pip install -r requirements.txt
    ```
 
+6. **Copy example files in data_folder for configuration:**
+   ```bash
+   cp data_folder_example/*.yaml data_folder/
+   ```
+
 ## Configuration
 
+
+### 0. Data Folder
+
+The `data_folder` directory contains all the files necessary for the bot to operate. This folder should be structured as follows:
+
+  ```text
+  data_folder/
+  ├── config.yaml
+  ├── plain_text_resume.yaml
+  └── secrets.yaml
+  ```
+  Examples of each file are provided in the `data_folder_example` directory.
 ### 1. secrets.yaml
 
 This file contains sensitive information. Never share or commit this file to version control.
@@ -624,6 +641,10 @@ yaml.scanner.ScannerError: while scanning a simple key
 
 For further assistance, please create an issue on the [GitHub repository](https://github.com/feder-cr/Auto_Jobs_Applier_AIHawk/issues) with detailed information about your problem, including error messages and your configuration (with sensitive information removed).
 
+**Answer Editor and Cleaner**
+
+See [ANSWERS_UTILS.md](ANSWERS_UTILS.md) for more information on the Answer Editor and Cleaner.
+
 ## Setup Documents
 
 ### Ollama & Gemini Setup
@@ -677,3 +698,4 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
 
 ## Disclaimer
 This tool, Auto_Jobs_Applier_AIHawk, is intended for educational purposes only. The creator assumes no responsibility for any consequences arising from its use. Users are advised to comply with the terms of service of relevant platforms and adhere to all applicable laws, regulations, and ethical guidelines. The use of automated tools for job applications may carry risks, including potential impacts on user accounts. Proceed with caution and at your own discretion.
+
diff --git a/answer_editor.py b/answer_editor.py
@@ -0,0 +1,47 @@
+from flask import Flask, render_template, request, redirect, url_for
+import json
+import os
+from pathlib import Path
+from flask_bootstrap import Bootstrap
+
+app = Flask(__name__)
+Bootstrap(app)
+
+JSON_FILE = Path(__file__).parent / 'answers.json'
+
+@app.route('/', methods=['GET', 'POST'])
+def index():
+    if request.method == 'POST':
+        return update()
+    else:
+        if not JSON_FILE.exists():
+            data = []  # Default empty list if file doesn't exist
+        else:
+            with open(JSON_FILE, 'r') as f:
+                data = json.load(f)
+                app.logger.debug('Loaded answers: %r', data)
+        return render_template('index.html', data=data if isinstance(data, list) else [])
+
+def update():
+    if not JSON_FILE.exists():
+        data = []
+    else:
+        with open(JSON_FILE, 'r') as f:
+            data = json.load(f)
+
+    updated_data = []
+    for i, item in enumerate(data):
+        if f'delete_{i}' not in request.form:
+            if item['type'] == 'radio':
+                item['answer'] = request.form.get(f'answer_{i}_radio', item['answer'])
+            else:
+                item['answer'] = request.form.get(f'answer_{i}', item['answer'])
+            updated_data.append(item)
+
+    with open(JSON_FILE, 'w') as f:
+        json.dump(updated_data, f, indent=2)
+
+    return redirect(url_for('index'))
+
+if __name__ == '__main__':
+    app.run(debug=True)  # debug mode is meant for local development only
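
The update loop in `update()` can be exercised without a running Flask app if it is pulled out into a pure function. The following is a minimal sketch under that assumption, not the PR's code: `apply_form_updates` is a hypothetical name, the `"text"` type value is assumed, and a plain dict stands in for `request.form`. Unlike the original loop, it copies each entry instead of mutating it.

```python
def apply_form_updates(data, form):
    """Return the entries that survive a form submission, with answers updated.

    `data` is the list loaded from answers.json; `form` is any mapping with
    a .get method (request.form in the Flask app, a plain dict here).
    """
    updated = []
    for i, item in enumerate(data):
        if f"delete_{i}" in form:
            continue  # a delete_<i> field drops the entry
        # Radio entries post under answer_<i>_radio, all others under answer_<i>
        field = f"answer_{i}_radio" if item["type"] == "radio" else f"answer_{i}"
        # Fall back to the stored answer when the field is missing from the form
        updated.append({**item, "answer": form.get(field, item["answer"])})
    return updated

entries = [
    {"type": "radio", "question": "q0", "answer": "no"},
    {"type": "text", "question": "q1", "answer": "old"},
    {"type": "text", "question": "q2", "answer": "keep"},
]
form = {"answer_0_radio": "yes", "delete_1": "on", "answer_2": "new"}
result = apply_form_updates(entries, form)
```

Here the first entry takes the radio value, the second is deleted, and the third takes the new text value. Field names are index-based, so the form must be rendered from the same file contents that the POST handler later reloads.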
diff --git a/cleanse_answers.py b/cleanse_answers.py
@@ -0,0 +1,47 @@
+import json
+import re
+
+def sanitize_text(text: str) -> str:
+    # Normalize whitespace, strip punctuation, then drop repeated words
+    text = text.rstrip()
+    text = re.sub(r'\s+', ' ', text)
+    text = text.replace('?', '').replace('"', '').replace('\\', '')
+    words = text.lower().split()
+    unique_words = []
+    for word in words:
+        if word not in unique_words:
+            unique_words.append(word)
+    text = ' '.join(unique_words)
+
+    # Remove common suffixes
+    text = re.sub(r'\s*\(?required\)?', '', text, flags=re.IGNORECASE)
+    text = re.sub(r'(\s*\(?yes\/no\)?|\s*\(?yes\)?|\s*\(?no\)?|\?)$', '', text, flags=re.IGNORECASE)
+    sanitized_text = re.sub(r'[^\x00-\x7F]', '', text)  # drop non-ASCII characters
+    return sanitized_text
+
+def cleanse_answers_json(input_file: str, output_file: str):
+    with open(input_file, 'r') as f:
+        data = json.load(f)
+
+    cleansed_data = []
+    seen_questions = set()
+
+    for item in data:
+        sanitized_question = sanitize_text(item['question'])
+        if sanitized_question not in seen_questions:
+            seen_questions.add(sanitized_question)
+            cleansed_item = {
+                'type': item['type'],
+                'question': sanitized_question,
+                'answer': item['answer']
+            }
+            cleansed_data.append(cleansed_item)
+
+    with open(output_file, 'w') as f:
+        json.dump(cleansed_data, f, indent=4)
+
+if __name__ == "__main__":
+    input_file = "answers.json"
+    output_file = "cleansed_answers.json"
+    cleanse_answers_json(input_file, output_file)
+    print(f"Cleansed answers have been saved to {output_file}")
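
Python's `re` module does not support POSIX bracket classes such as `[:ascii:]`, so an ASCII filter is usually written with an explicit character range. The sketch below shows that approach alongside an order-preserving word de-duplication. The helper names `strip_non_ascii` and `dedupe_words` are hypothetical and are not part of `cleanse_answers.py`.

```python
import re

# Explicit range: matches any character outside 7-bit ASCII.
NON_ASCII = re.compile(r"[^\x00-\x7F]")

def strip_non_ascii(text: str) -> str:
    """Drop every character outside the ASCII range."""
    return NON_ASCII.sub("", text)

def dedupe_words(text: str) -> str:
    """Lowercase the text and keep only the first occurrence of each word."""
    # dict.fromkeys preserves insertion order (guaranteed since Python 3.7)
    return " ".join(dict.fromkeys(text.lower().split()))

cleaned = dedupe_words(strip_non_ascii("Café Café résumé"))
```

`dict.fromkeys` gives the same first-occurrence ordering as the list-based loop in `sanitize_text`, without the repeated membership scans of a list.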
diff --git a/data_folder/config.yaml b/data_folder/config.yaml
diff --git a/data_folder/plain_text_resume.yaml b/data_folder/plain_text_resume.yaml
diff --git a/data_folder/secrets.yaml b/data_folder/secrets.yaml