Skip to content

reabdi/RAG-Guru

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Goal:

Enhance document handling efficiency and retrieval accuracy using state-of-the-art AI technologies.

  • Document Uploading: Processes involve setting up a temporary cloud-based storage for uploaded files and converting them into a manageable format for further operations.
  • Page Evaluation: The system evaluates documents, which helps in assessing the quality and completeness of the document data.
  • Text Processing: Utilizes advanced machine learning models from Hugging Face for text processing. This includes generating embeddings that represent the document contents in a numerical format suitable for comparison and retrieval.
  • Index Management: Creates and manages an index/vector database using ChroamDB or Pinecone, which stores these embeddings efficiently. This setup allows for quick retrieval of documents based on their content similarity.
  • Question-Answering Capabilities: Integrates machine learning models to extract and retrieve information relevant to user queries, enhancing the accessibility and usability of stored documents.

Steps to follow:

1. Creating the environment:

  • Mac
    python3 -m venv .venv
    source .venv/bin/activate
  • Windows
    python -m venv .venv
    .venv\scripts\activate

2. Installing the pakages

pip install --upgrade --quiet -r requirements.txt

3. Running the Interface

streamlit run main.py

User Interface Layout:

alt text

Notes

Notes for Pincone Index:
Note 1: The dimension of Google's "embedding-001" embeddings is 768.

Note 2: For semantic similarity tasks, the cosine similarity between the embeddings is typically used to measure their relatedness. Embeddings with higher cosine similarity are considered more semantically similar. For retrieval tasks like "retrieval_query" and "retrieval_document", the relative distances between the query and document embeddings matter.
See more here

Note 3: The Pinecode index can be creaed directly from the code too. See this link for more information.

Notes for Google Gemini Models: Note 1: See this link to get more infoamtion about Gemini's available models.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages