# Coding Assistant AI

## Table of Contents

- Introduction
- Project Features
- System Architecture
- File Structure
- Setup Instructions
- Running the Application
- Key Components
- Agents
- Tasks
- Tools
- LangGraph Workflow
- Database
- RAG Support
- Logging with Weave and LangSmith
- Dependencies
- Future Enhancements
- Contributing
- License
## Introduction

The Coding Assistant AI is an intelligent assistant application that captures screenshots, extracts coding questions, and answers them using a multi-agent architecture powered by CrewAI and LangGraph. It uses OpenAI's GPT-3.5 to generate answers to coding questions and supports advanced reasoning with RAG (Retrieval Augmented Generation). The assistant handles cases where no question is found and logs its activity using Weave and LangSmith. All interactions, including questions and answers, are stored in a SQLite database for future reference.
## Project Features

- Multi-Agent System: Built using CrewAI, with dedicated agents for screen capture, question extraction, and answering.
- Conditional Workflow: Managed via LangGraph, with branches for different states (e.g., no question found).
- Advanced Reasoning: Supports Retrieval Augmented Generation (RAG) for better context-based answers.
- Screenshot Capture: Takes screenshots when `Ctrl+S` is pressed.
- OCR Support: Extracts coding questions from screenshots using Tesseract OCR.
- Database Storage: Stores questions and answers in a SQLite database for easy retrieval.
- Logging: Uses Weave and LangSmith for structured logging and event tracing.
- RAG Knowledge Base: Supports retrieval of relevant information from a knowledge base to augment GPT-3.5 answers.
## System Architecture

The application is built around a multi-agent system where each agent performs a specific task. The process is orchestrated using LangGraph, which manages the flow from screen capture to question identification and answer generation. The workflow proceeds as follows (a sketch of the shared workflow state appears after the list):
1. Ctrl+S Trigger: The user presses `Ctrl+S`, triggering the screen capture agent.
2. Screenshot and OCR: The screenshot is processed to extract a coding question using OCR.
3. Agent Workflow: If a question is found, the answer agent generates a response using GPT-3.5. If no question is found, a handling agent provides feedback.
4. Database Storage: The question and its corresponding answer are saved in the database.
5. RAG Search: The answer agent can augment the GPT-3.5 response with relevant information from a RAG knowledge base.
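Since `state.py` defines the state shared across these steps, here is a minimal sketch of what that structure might look like; the field names are illustrative assumptions, not the project's actual definitions:

```python
from typing import Optional, TypedDict


class AssistantState(TypedDict):
    """State handed between LangGraph nodes (illustrative field names)."""

    screenshot_path: Optional[str]  # where the captured image was saved
    question: Optional[str]         # coding question extracted by OCR, if any
    answer: Optional[str]           # GPT-3.5 answer, once generated
```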
## File Structure

```
/coding-assistant-ai/
│
├── /screenshots/ # Stores screenshots taken by the assistant
├── agents.py # Defines CrewAI agents for capturing, extracting, and answering questions
├── crew.py # Manages the multi-agent crew and task orchestration
├── database.py # Handles SQLite database connections and Q&A storage
├── graph.py # LangGraph workflow definition
├── main.py # Main entry point for running the application
├── nodes.py # Nodes for LangGraph, handling individual steps in the workflow
├── rag.py # Implements the RAG (Retrieval Augmented Generation) knowledge base
├── state.py # Defines the state structure used by the LangGraph workflow
├── tasks.py # Defines tasks for agents to perform (screen capture, answering questions, etc.)
├── tools.py # Provides reusable tools for agents (screen capture, OCR, answering, etc.)
├── app.log # Log file generated by Weave for event tracking
└── .env              # Environment file for sensitive information (e.g., API keys)
```
## Setup Instructions

### Prerequisites

- Python 3.8+
- Tesseract OCR installed on your machine:
  - Windows: Download Tesseract OCR (see the path note below).
  - Linux: `sudo apt-get install tesseract-ocr`
  - macOS: `brew install tesseract`
- OpenAI API key (for GPT-3.5)
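On Windows, pytesseract may not find the Tesseract binary on its own. A common workaround is to point it at the executable explicitly; the install path below is an assumption, so adjust it to your machine:

```python
import pytesseract

# Hypothetical default install location; change this to wherever Tesseract lives.
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
```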
### Installation

1. Clone the Repository:

   ```bash
   git clone https://github.com/yourusername/coding-assistant-ai.git
   cd coding-assistant-ai
   ```

2. Create a Virtual Environment (optional but recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows use `venv\Scripts\activate`
   ```

3. Install Required Packages:

   ```bash
   pip install -r requirements.txt
   ```

4. Create a `.env` File: add your OpenAI API key and other sensitive information (loaded at startup; see the sketch after this list).

   ```bash
   touch .env
   ```

   Inside `.env`:

   ```
   OPENAI_API_KEY=your_openai_api_key_here
   ```

5. Create the Screenshots Directory:

   ```bash
   mkdir screenshots
   ```
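How the application reads the key from `.env` depends on `main.py`; here is a minimal sketch using python-dotenv, assuming that is the loader in use:

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads key=value pairs from .env in the working directory
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; check your .env file")
```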
## Running the Application

1. Run the application with:

   ```bash
   python main.py
   ```

2. Trigger the Workflow: press `Ctrl+S` to capture a screenshot and start the question identification and answering process.

3. Logging: logs will be written to `app.log` by Weave, capturing all important events.

4. View Questions and Answers: the SQLite database `qa.db` will store the extracted coding questions and their corresponding answers (a query sketch follows this list).
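You can inspect the database with Python's built-in `sqlite3` module. The table and column names below are assumptions; check `database.py` for the actual schema:

```python
import sqlite3

# Table/column names are assumptions; see database.py for the real schema.
with sqlite3.connect("qa.db") as conn:
    for question, answer in conn.execute("SELECT question, answer FROM qa"):
        print(f"Q: {question}\nA: {answer}\n")
```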
## Key Components

### Agents

- Screen Capture Agent: Captures the screen and extracts text using OCR.
- Question Not Found Agent: Handles cases where no question is found and provides feedback.
- Answer Agent: Uses GPT-3.5 to answer coding questions with advanced reasoning and RAG support.
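For reference, a CrewAI agent is declared roughly like this; the role, goal, and backstory below are illustrative, and the real definitions live in `agents.py`:

```python
from crewai import Agent

# Illustrative definition; see agents.py for the actual roles, goals, and tools.
answer_agent = Agent(
    role="Coding Question Answerer",
    goal="Answer extracted coding questions clearly and correctly",
    backstory="An expert programmer who explains solutions step by step.",
    verbose=True,
)
```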
### Tasks

- capture_and_identify_task: Captures the screen and identifies coding questions.
- handle_no_question_task: Provides feedback if no question is found.
- answer_question_task: Generates an answer using GPT-3.5 and RAG.
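A CrewAI task binds a description and an expected output to an agent, and `crew.py` then assembles agents and tasks into a crew. A sketch, reusing the `answer_agent` from the Agents example above (the exact wording and wiring are assumptions):

```python
from crewai import Crew, Task

# Illustrative; the real task definitions live in tasks.py.
answer_question_task = Task(
    description="Answer this coding question: {question}",
    expected_output="A correct, well-explained answer with example code where helpful.",
    agent=answer_agent,  # the Agent from the sketch above
)

# crew.py assembles agents and tasks along these lines and runs them.
crew = Crew(agents=[answer_agent], tasks=[answer_question_task])
result = crew.kickoff()
```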
### Tools

- CaptureScreenTool: Captures the screen and saves it as an image file.
- ExtractQuestionTool: Extracts coding questions from screenshots using Tesseract OCR.
- AnswerQuestionTool: Sends coding questions to GPT-3.5 for answers.
- RAGSearchTool: Supports RAG by searching the knowledge base for relevant information.
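Under the hood, CaptureScreenTool and ExtractQuestionTool likely wrap logic similar to the following Pillow + pytesseract sketch (the function name and save path are illustrative):

```python
from datetime import datetime

import pytesseract
from PIL import ImageGrab


def capture_and_extract(screenshot_dir: str = "screenshots") -> str:
    """Grab the screen, save it, and return any text Tesseract can read."""
    image = ImageGrab.grab()  # full-screen capture; needs extra setup on some Linux systems
    path = f"{screenshot_dir}/capture_{datetime.now():%Y%m%d_%H%M%S}.png"
    image.save(path)
    return pytesseract.image_to_string(image)
```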
### LangGraph Workflow

- wait_for_next_trigger: Waits for the user to press `Ctrl+S`.
- capture_and_identify: Captures the screen and extracts a coding question.
- check_question_found: Determines whether a question was found in the screenshot.
- answer_question: Generates an answer using GPT-3.5.
- store_result: Stores the question and answer in the database.
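Wired together, the graph in `graph.py` plausibly looks like the sketch below. The node bodies are placeholders standing in for the real implementations in `nodes.py`, and the branch labels are assumptions:

```python
from typing import Optional, TypedDict

from langgraph.graph import END, StateGraph


class AssistantState(TypedDict):
    question: Optional[str]
    answer: Optional[str]


def capture_and_identify(state: AssistantState) -> AssistantState:
    return state  # placeholder: the real node captures the screen and runs OCR


def answer_question(state: AssistantState) -> AssistantState:
    return state  # placeholder: the real node calls GPT-3.5 (plus RAG)


def store_result(state: AssistantState) -> AssistantState:
    return state  # placeholder: the real node writes to qa.db


def check_question_found(state: AssistantState) -> str:
    return "found" if state.get("question") else "not_found"


builder = StateGraph(AssistantState)
builder.add_node("capture_and_identify", capture_and_identify)
builder.add_node("answer_question", answer_question)
builder.add_node("store_result", store_result)
builder.set_entry_point("capture_and_identify")
builder.add_conditional_edges(
    "capture_and_identify",
    check_question_found,
    {"found": "answer_question", "not_found": END},
)
builder.add_edge("answer_question", "store_result")
builder.add_edge("store_result", END)

graph = builder.compile()
final_state = graph.invoke({"question": None, "answer": None})
```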
## Database

The SQLite database (`qa.db`) stores all extracted coding questions and their corresponding answers. It is initialized in `database.py` and updated with every interaction.
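A minimal sketch of how `database.py` might initialize and write to the database, assuming a single `qa` table (the schema is an assumption):

```python
import sqlite3

DB_PATH = "qa.db"


def init_db() -> None:
    """Create the qa table if it does not exist yet (assumed schema)."""
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS qa (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   question TEXT NOT NULL,
                   answer TEXT NOT NULL,
                   created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
               )"""
        )


def save_qa(question: str, answer: str) -> None:
    """Store one question/answer pair."""
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            "INSERT INTO qa (question, answer) VALUES (?, ?)",
            (question, answer),
        )
```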
## RAG Support

RAG (Retrieval Augmented Generation) is implemented using the `KnowledgeBase` class in `rag.py`. This supports the retrieval of relevant documents from a knowledge base to provide context-aware answers.
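The README does not specify how retrieval is scored, so the toy `KnowledgeBase` below uses simple keyword overlap; the real `rag.py` may well use embeddings or a vector store instead. Retrieved snippets would then be prepended to the GPT-3.5 prompt as extra context:

```python
from typing import List


class KnowledgeBase:
    """Toy keyword-overlap retriever; illustrative, not the project's implementation."""

    def __init__(self, documents: List[str]) -> None:
        self.documents = documents

    def search(self, query: str, top_k: int = 3) -> List[str]:
        query_terms = set(query.lower().split())
        scored = [
            (len(query_terms & set(doc.lower().split())), doc)
            for doc in self.documents
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for score, doc in scored[:top_k] if score > 0]


kb = KnowledgeBase(["Python lists are mutable.", "Tuples are immutable."])
print(kb.search("Are Python lists mutable?"))
```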
## Logging with Weave and LangSmith

The application uses Weave and LangSmith for logging and event tracing. Key actions such as capturing screenshots, finding questions, and generating answers are logged in `app.log`. This provides a clear audit trail of the assistant's activities.
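Exactly how the project wires this up lives in `main.py`; as a rough sketch, file logging for `app.log` can be configured with the standard library, while LangSmith tracing is typically switched on through environment variables (both details below are assumptions about this project's setup):

```python
import logging
import os

# File logging for app.log (format is an assumption).
logging.basicConfig(
    filename="app.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logging.info("Assistant started")

# LangSmith tracing is commonly enabled via environment variables:
os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_API_KEY"] = "..."  # usually supplied via .env
```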
## Dependencies

Install dependencies via `requirements.txt` or manually as listed below:
- CrewAI: Multi-agent architecture.
- LangGraph: State graph management.
- Weave: Structured logging.
- LangSmith: Event tracing.
- OpenAI: GPT-3.5 API for question answering.
- Pytesseract: OCR library for extracting text from screenshots.
- Pillow: Image handling and manipulation.
- SQLite: Database for storing questions and answers.
- Tesseract: OCR engine for recognizing text from screenshots.
## Future Enhancements

- Expand RAG Knowledge Base: Integrate with a larger dataset or knowledge base to improve the quality of retrieved information.
- Error Handling: Add more robust error handling and retries for tasks such as API calls and image processing.
- Improved Question Identification: Use advanced NLP techniques to better identify and classify coding questions.
- User Interface: Add a simple GUI to interact with the assistant instead of relying on `Ctrl+S` key presses.
## Contributing

If you'd like to contribute to this project, feel free to open an issue or submit a pull request on the GitHub repository. Please ensure all contributions adhere to the project’s code of conduct and follow the contribution guidelines.
## License

This project is licensed under the MIT License - see the `LICENSE` file for more details.
The Coding Assistant AI is a powerful multi-agent system for answering coding questions by combining OCR, GPT-3.5, and retrieval-augmented generation (RAG). With its modular design and extensive logging, it provides a strong foundation for automating coding assistance and task management.
Happy Coding!