SourceCode Analysis using Llama2

This project implements a chatbot using Retrieval Question Answering (QA) techniques. The chatbot is built using Flask, LangChain, Hugging Face Transformers, and Llama2.

Overview

The SourceCode Analysis using Llama2 project is a system designed to analyze source code repositories using advanced natural language processing techniques. The project workflow involves several key steps:

Git Repository Ingestion:
- The system reads a Git repository and stores its contents locally in a designated "repo" folder. This folder serves as the source of code for analysis.
Chunking:
- After ingesting the repository, the system breaks down the source code into smaller, manageable "chunks." These chunks could represent individual files, functions, or sections of code.
Retrieval QA using Llama2:
- With the code chunks prepared, the system applies retrieval-based question-answering (QA) techniques using Llama2. Llama2, a powerful language model, is utilized to understand and process natural language queries related to the source code.
Source Code Analysis:
- The retrieval QA process enables the system to analyze the source code in response to user queries or prompts. Users can inquire about specific code functions, documentation, or implementation details, and the system leverages Llama2 to provide accurate and relevant responses.

Key Components:

GitPython Integration:
- GitPython is used to interact with Git repositories programmatically, facilitating the ingestion of source code into the system.
Chunking Mechanism:
- The system implements a chunking mechanism to break down the source code into smaller units, enabling more focused analysis and retrieval.
GenAI (Llama2) Integration:
- GenAI (Llama2) serves as the core AI model for natural language understanding and question-answering. It powers the retrieval-based QA process, allowing the system to respond to user queries effectively.
Flask Web Interface:
- The project may incorporate a Flask-based web interface where users can interact with the system by submitting queries or prompts related to the source code. The interface provides a user-friendly way to access the analysis capabilities of the system.

Installation

Clone the repository:

git clone https://github.com/deepakthakur-92/SourceCode_Analysis_using_GenAI.git

Create a virtual environment:

conda create -n envname python=3.8 -y
conda activate envname
pip install -r requirements.txt

Download the quantize model and place it in the model/ directory.

## Download the Llama 2 Model:

llama-2-7b-chat.ggmlv3.q4_0.bin


## From the following link:
https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/tree/main

Tools Used

Flask: Web framework used for building the chatbot interface.
LangChain: Python library for natural language processing tasks such as document retrieval and question answering.
Hugging Face Transformers: Library providing pre-trained models and tools for natural language understanding tasks.
Chromadb: Library for efficient similarity search and clustering of dense vectors.
GitPython: Library for interacting with Git repositories programmatically, providing functionality for tasks such as repository management, committing changes, and querying repository history.

Usage

Run the Flask app:

python app.py

Access the chatbot interface through your web browser at http://localhost:8080.
Enter your questions in the input field and click "Submit" to receive answers from the chatbot.

Screenshot:

Benefits:

Efficient Source Code Understanding: By leveraging advanced NLP techniques and AI models like Llama2, the system can understand and interpret source code in a human-like manner, enabling comprehensive analysis and insights.
Enhanced Developer Productivity: Developers and software engineers can benefit from quick access to relevant code information and insights, facilitating faster debugging, code comprehension, and development workflows.
Improved Documentation and Knowledge Management: The system can assist in generating documentation, answering questions, and capturing knowledge about the codebase, contributing to better code maintenance and collaboration.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
research		research
src		src
static		static
templates		templates
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app.py		app.py
image-1.png		image-1.png
image.png		image.png
requirements.txt		requirements.txt
setup.py		setup.py
store_index.py		store_index.py
template.py		template.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SourceCode Analysis using Llama2

Overview

Git Repository Ingestion:

Chunking:

Retrieval QA using Llama2:

Source Code Analysis:

Key Components:

GitPython Integration:

Chunking Mechanism:

GenAI (Llama2) Integration:

Flask Web Interface:

Installation

Tools Used

Usage

Screenshot:

Benefits:

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

deepakthakur-92/SourceCode_Analysis_using_GenAI

Folders and files

Latest commit

History

Repository files navigation

SourceCode Analysis using Llama2

Overview

Git Repository Ingestion:

Chunking:

Retrieval QA using Llama2:

Source Code Analysis:

Key Components:

GitPython Integration:

Chunking Mechanism:

GenAI (Llama2) Integration:

Flask Web Interface:

Installation

Tools Used

Usage

Screenshot:

Benefits:

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages