Retrieval-Augmented Generation (RAG) is a framework that combines information retrieval with generative AI. It allows models to retrieve relevant information from external sources or databases and use that data to generate more accurate and contextually relevant responses. By leveraging both retrieval and generation, RAG improves the accuracy and reliability of AI models, particularly in providing up-to-date information or handling complex questions.
This project provides an AI-based conversational assistant that leverages Retrieval-Augmented Generation (RAG) to extract knowledge from PDF documents. The system combines text embeddings, vector search, and an LLM to answer user questions. Below is a detailed step-by-step workflow of how the application operates; minimal code sketches illustrating each step follow the list:
- Users provide a PDF file by specifying its path in the notebook. The uploaded file is processed to extract its text using `pdfplumber`, a Python library for extracting text from PDFs.
- The notebook uses the `pdfplumber` library to extract raw text from the uploaded PDF. Each page of the document is parsed, and the resulting text is prepared for further processing.
- The extracted text is split into smaller chunks using `RecursiveCharacterTextSplitter`. This keeps the content manageable for embedding and retrieval, typically with a chunk size of 500 characters and an overlap of 50 characters.
- The chunked text is converted into numerical embeddings using `SpacyEmbeddings`. These embeddings capture the semantic meaning of the chunks, enabling efficient similarity search.
- A vector database is created with the `Chroma` library, where the embeddings are stored. The vector database allows fast and efficient retrieval of relevant information based on user queries.
- A `ConversationalRetrievalChain` is established using LangChain, combining the embeddings stored in Chroma with a conversational memory buffer to track chat history and context.
- The notebook integrates `ChatGoogleGenerativeAI` (Google's Gemini LLM) to generate relevant and intelligent responses to the user's questions based on the chunks retrieved from the vector store.
- Users can input their questions about the uploaded PDF document, and the system responds by retrieving the most relevant chunks from the vector store and generating an answer using the LLM. The conversation history is preserved for context.
- The notebook features an expandable section where users can view the conversation history. This transparency allows users to revisit past queries and responses, fostering a better understanding of the context and flow of the interaction.
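
The sketches below illustrate each step under stated assumptions; they are minimal examples rather than the exact notebook code. First, text extraction with `pdfplumber`; the helper name `extract_pdf_text` and the sample path are hypothetical.

```python
import pdfplumber

def extract_pdf_text(pdf_path: str) -> str:
    """Extract raw text from every page of a PDF."""
    pages = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_text() can return None for pages with no extractable text
            pages.append(page.extract_text() or "")
    return "\n".join(pages)

raw_text = extract_pdf_text("document.pdf")  # hypothetical path to the uploaded PDF
```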
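
Chunking with `RecursiveCharacterTextSplitter`, assuming the classic `langchain.text_splitter` import path (newer releases expose it as `langchain_text_splitters`):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Chunk size and overlap match the values mentioned in the workflow above
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(raw_text)  # list of roughly 500-character strings
```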
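
Embedding the chunks with `SpacyEmbeddings` and storing them in `Chroma`. This assumes the `langchain_community` import paths, that the default spaCy model (`en_core_web_sm`) is installed, and an arbitrary persist directory name.

```python
from langchain_community.embeddings import SpacyEmbeddings
from langchain_community.vectorstores import Chroma

# SpacyEmbeddings uses the en_core_web_sm pipeline by default; install it first with
#   python -m spacy download en_core_web_sm
embeddings = SpacyEmbeddings()

# Build the vector store from the text chunks and expose it as a retriever
vectordb = Chroma.from_texts(texts=chunks, embedding=embeddings, persist_directory="chroma_db")
retriever = vectordb.as_retriever(search_kwargs={"k": 4})  # return the top-4 chunks per query
```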
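
Wiring the retriever, a conversational memory buffer, and Gemini into a `ConversationalRetrievalChain`. The model name `gemini-pro` and the use of the `GOOGLE_API_KEY` environment variable are assumptions; use whatever model and credentials the notebook configures.

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain_google_genai import ChatGoogleGenerativeAI

# Gemini chat model; the API key is typically picked up from the GOOGLE_API_KEY env var
llm = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.3)

# Buffer memory keeps the full chat history so follow-up questions stay in context
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
)
```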
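
Finally, asking a question and inspecting the stored conversation history; the question text is illustrative.

```python
# The chain retrieves relevant chunks, generates an answer, and updates the memory
result = qa_chain.invoke({"question": "What are the key findings of the document?"})
print(result["answer"])

# The memory buffer accumulates each exchange, which the notebook surfaces
# in its expandable conversation-history section
for message in memory.chat_memory.messages:
    print(f"{message.type}: {message.content}")
```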
- **Efficient Knowledge Retrieval:** By leveraging the power of RAG, the system combines retrieval and generation to answer specific questions accurately based on the content of uploaded PDF documents.
- **Scalability and Flexibility:** With text chunking and embeddings, the app can handle large documents while ensuring fast and precise information retrieval.
- **Conversational AI:** The conversation history memory makes the system more interactive, as it keeps track of previous questions and answers, maintaining context over long conversations.
- **Integration of Modern AI Tools:** This project demonstrates the use of advanced tools like `Chroma` for vector storage, LangChain for conversation management, and Google's Gemini LLM for generating human-like answers.