GitHub RAG Tutorial

Project Overview

This project lets you chat with your starred GitHub repositories so you can quickly find the repo you need. It uses retrieval-augmented generation (RAG) behind the scenes. The process flow is as follows:

OAuth GitHub login
Fetch all your starred repos
Retrieve each repo's README
Create chunks from the README content
Generate embeddings for each chunk
Store the embeddings in a Supabase PostgreSQL vector store with pgvector enabled
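
The chunking step in the flow above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function name `chunk_text`, the character-based windowing, and the default sizes are all assumptions chosen for the example.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap slightly.

    Hypothetical helper: the real backend may chunk by tokens or by
    Markdown structure instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip whitespace-only windows
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which tends to help retrieval. Each returned chunk would then be embedded and stored as one row.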

Prerequisites

Python 3.8+
Node.js 14+
Supabase account (for PostgreSQL with pgvector)
Poetry (for Python dependency management)
npm or pnpm (for Node.js dependency management)

Backend Setup

Navigate to the backend directory:
```
cd backend
```

Install Poetry if you haven't already:

```
curl -sSL https://install.python-poetry.org | python3 -
```

Add Poetry to your PATH:

```
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc
```

Or if using bash:
```
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bash_profile
```

Restart your terminal or run:
```
source ~/.zshrc  # or ~/.bash_profile
```
Install dependencies:
```
poetry install
```
Copy the .env.example file to .env and configure the necessary environment variables:
```
cp .env.example .env
```

Start the backend server:

```
poetry run uvicorn server:app --host 0.0.0.0 --port 8000 --reload
```

Helicone Integration

The backend uses Helicone to monitor its large language model (LLM) calls. Helicone provides insights and analytics on your LLM usage, helping you optimize performance and costs.

To set up Helicone:

Sign up for a Helicone account if you haven't already.
Obtain your Helicone API key.
Add the Helicone API key to your backend/.env file:
```
HELICONE_API_KEY=your_helicone_api_key
```
Ensure that the LLM requests in the backend code are routed through Helicone so that they show up in your Helicone dashboard.

For more detailed information on setting up and using Helicone, please refer to the Helicone Quick Start Guide.
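
As a rough sketch of what routing requests through Helicone can look like, the helper below builds the client settings for Helicone's OpenAI proxy integration. It assumes the backend talks to OpenAI through the official Python SDK (v1 or later), which this README does not state; the helper name `helicone_client_kwargs` is hypothetical, and the base URL and header should be checked against the Helicone Quick Start Guide.

```python
from typing import Any, Dict

# Helicone's proxy endpoint for OpenAI-compatible requests
# (assumption: the backend uses OpenAI as its LLM provider).
HELICONE_OPENAI_BASE_URL = "https://oai.helicone.ai/v1"


def helicone_client_kwargs(helicone_api_key: str) -> Dict[str, Any]:
    """Return keyword arguments that point an OpenAI client at Helicone."""
    if not helicone_api_key:
        raise ValueError("HELICONE_API_KEY is empty")
    return {
        "base_url": HELICONE_OPENAI_BASE_URL,
        "default_headers": {"Helicone-Auth": f"Bearer {helicone_api_key}"},
    }


# Possible usage, assuming the `openai` package is installed:
#   import os
#   from openai import OpenAI
#   client = OpenAI(
#       api_key=os.environ["OPENAI_API_KEY"],
#       **helicone_client_kwargs(os.environ["HELICONE_API_KEY"]),
#   )
```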

Web Frontend Setup

Navigate to the web directory:
```
cd web
```
Install dependencies:
```
npm install
```
or if using pnpm:
```
pnpm install
```
Copy the .env.example file to .env and configure the necessary environment variables:
```
cp .env.example .env
```
Start the development server:
```
npm run dev
```
or with pnpm:
```
pnpm dev
```

Database Setup

This project uses Supabase PostgreSQL with pgvector enabled for storing and querying vector embeddings. To set up your database:

Create a Supabase account if you haven't already.
Create a new project in Supabase.
In your project's SQL editor, enable the pgvector extension:
```
CREATE EXTENSION IF NOT EXISTS vector;
```

Create the necessary tables and functions:

```
-- Create a table for storing repository information
CREATE TABLE repositories (
  id SERIAL PRIMARY KEY,
  github_username TEXT NOT NULL,
  name TEXT NOT NULL,
  full_name TEXT NOT NULL,
  description TEXT,
  url TEXT NOT NULL,
  language TEXT,
  stars INTEGER,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Create a table for storing README chunks
CREATE TABLE readme_chunks (
  id SERIAL PRIMARY KEY,
  repository_id INTEGER REFERENCES repositories(id) ON DELETE CASCADE,
  chunk_index INTEGER NOT NULL,
  content TEXT NOT NULL,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Create a table for storing embeddings
CREATE TABLE embeddings (
  id SERIAL PRIMARY KEY,
  repository_id INTEGER REFERENCES repositories(id) ON DELETE CASCADE,
  chunk_id INTEGER REFERENCES readme_chunks(id) ON DELETE CASCADE,
  embedding vector(1536),
  created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Create indexes for better query performance
CREATE INDEX idx_repositories_github_username ON repositories(github_username);
CREATE INDEX idx_readme_chunks_repository_id ON readme_chunks(repository_id);
CREATE INDEX idx_embeddings_repository_id ON embeddings(repository_id);
CREATE INDEX idx_embeddings_chunk_id ON embeddings(chunk_id);

-- Create a function to search for similar embeddings
CREATE OR REPLACE FUNCTION search_similar_embeddings(query_embedding vector(1536), match_threshold FLOAT, match_count INT)
RETURNS TABLE (
  repository_id INTEGER,
  chunk_id INTEGER,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    e.repository_id,
    e.chunk_id,
    1 - (e.embedding <=> query_embedding) AS similarity
  FROM
    embeddings e
  WHERE
    1 - (e.embedding <=> query_embedding) > match_threshold
  ORDER BY
    similarity DESC
  LIMIT
    match_count;
END;
$$;
```
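
To make the logic of `search_similar_embeddings` easier to follow, here is a small in-memory mirror of it in Python. pgvector's `<=>` operator is cosine distance, so `1 - (a <=> b)` is cosine similarity. The function names and the dictionary-based store are illustrative assumptions; the real search runs inside PostgreSQL.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, i.e. 1 minus the cosine distance used by `<=>`."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Simplification for this sketch; pgvector may behave differently
        # for zero vectors.
        return 0.0
    return dot / (norm_a * norm_b)


def search_similar(
    query: Sequence[float],
    stored: Dict[int, Sequence[float]],  # chunk_id -> embedding
    match_threshold: float,
    match_count: int,
) -> List[Tuple[int, float]]:
    """Filter by threshold, sort by similarity descending, keep the top N."""
    scored = [(chunk_id, cosine_similarity(vec, query)) for chunk_id, vec in stored.items()]
    kept = [pair for pair in scored if pair[1] > match_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:match_count]
```

The three steps in `search_similar` correspond to the `WHERE`, `ORDER BY`, and `LIMIT` clauses of the SQL function.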

In your project settings, find your database connection details and add them to your backend/.env file:
```
DATABASE_URL=your_supabase_postgres_connection_string
```
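
With `DATABASE_URL` set, the search function could be called from Python roughly as shown below. This sketch assumes the `psycopg2` driver is installed, which this README does not confirm; the helper names `to_vector_literal` and `search`, and the default threshold and count, are hypothetical.

```python
import os
from typing import List, Sequence, Tuple

# Calls the SQL function defined in the database setup; the first
# parameter is sent as text and cast to pgvector's `vector` type.
SEARCH_SQL = (
    "SELECT repository_id, chunk_id, similarity "
    "FROM search_similar_embeddings(%s::vector, %s, %s)"
)


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format an embedding as pgvector's text form, e.g. "[0.1,0.2]"."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def search(
    query_embedding: Sequence[float],
    match_threshold: float = 0.75,  # illustrative default
    match_count: int = 5,           # illustrative default
) -> List[Tuple[int, int, float]]:
    """Run the similarity search against the database in DATABASE_URL."""
    import psycopg2  # assumed dependency, imported lazily

    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    try:
        with conn.cursor() as cur:
            cur.execute(
                SEARCH_SQL,
                (to_vector_literal(query_embedding), match_threshold, match_count),
            )
            return cur.fetchall()
    finally:
        conn.close()
```

The query embedding must have the same dimension as the `embedding` column (1536 in the schema above), so it should come from the same embedding model used at ingestion time.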

For more information on using Supabase with pgvector, refer to the Supabase Vector documentation.

GitHub OAuth Configuration

Register a new OAuth application on GitHub:
- Go to your GitHub account settings
- Navigate to "Developer settings" > "OAuth Apps" > "New OAuth App"
- Set the "Authorization callback URL" to http://localhost:3000/api/auth/callback/github
Once registered, you'll receive a Client ID and Client Secret.

Add these credentials to your web/.env file:

```
GITHUB_CLIENT_ID=your_client_id
GITHUB_CLIENT_SECRET=your_client_secret
```

Running the Application

Start the backend server (from the backend directory):

```
poetry run uvicorn server:app --host 0.0.0.0 --port 8000 --reload
```

Start the web frontend (from the web directory):
```
npm run dev
```
or with pnpm:
```
pnpm dev
```
Access the application in your web browser at http://localhost:3000

XamHans/github-rag

Folders and files

Latest commit

History

Repository files navigation

GitHub RAG Tutorial

Project Overview

Table of Contents

Prerequisites

Backend Setup

Helicone Integration

Web Frontend Setup

Database Setup

GitHub OAuth Configuration

Running the Application

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages