Talk2PDFs is a web application that lets users interact with PDF documents through a chatbot. Users can upload PDFs or provide URLs, and the chatbot will use the extracted content to answer questions.
- Python
- Streamlit - For the web interface.
- Ollama - Using the llama3.2 model for language processing.
- Visual Studio Build Tools - Required for compiling dependencies like ChromaDB.
  - Download and install Visual Studio Build Tools.
  - During installation, make sure to select the C++ build tools workload.
  - After installation, use pip to install ChromaDB.
- PDF Upload/URL Input: Upload PDFs or provide URLs to process and extract text.
- Text Extraction: Extracts and processes text from PDFs for interaction.
- Chatbot Interaction: Ask questions related to the uploaded PDFs and get responses.
- Text Preview: View a snippet of the extracted text before asking questions.
- Real-Time Responses: Quickly get answers based on the content of the documents.
- Integration with Vector Database: Uses ChromaDB for efficient document retrieval.
- Session Memory: The chatbot retains previous interactions during a session for continuity.
- Chat History: Keeps a log of the session’s conversations.
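The text preview, chunking, and session-memory features listed above can be illustrated with a small standard-library sketch. This is not the project's actual code: the names `preview`, `chunk_text`, and `ChatSession` are hypothetical, the chunk sizes are arbitrary, and in the real application the retrieval step is presumably handled by LangChain and ChromaDB rather than by hand.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def preview(text: str, max_chars: int = 300) -> str:
    """Return a short snippet of extracted text (hypothetical helper)."""
    text = " ".join(text.split())  # collapse whitespace left over from PDF extraction
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rstrip() + "..."


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks, e.g. before embedding them."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


@dataclass
class ChatSession:
    """Keeps the question/answer log for one session (assumed design)."""

    history: list[tuple[str, str]] = field(default_factory=list)

    def add_turn(self, question: str, answer: str) -> None:
        self.history.append((question, answer))

    def build_prompt(self, question: str, context_chunks: list[str]) -> str:
        """Combine retrieved chunks, earlier turns, and the new question."""
        context = "\n\n".join(context_chunks)
        past = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.history)
        return (
            f"Context:\n{context}\n\n"
            f"Conversation so far:\n{past}\n\n"
            f"User: {question}\nAssistant:"
        )


if __name__ == "__main__":
    session = ChatSession()
    session.add_turn("What is this document about?", "It describes Talk2PDFs.")
    chunks = chunk_text("Talk2PDFs lets users chat with PDF documents. " * 5, 80, 10)
    print(preview(chunks[0], 40))
    print(session.build_prompt("Which database does it use?", chunks[:1]))
```

Passing the earlier turns back into each prompt is one simple way to give the model continuity within a session; the trade-off is that the prompt grows with every exchange, so a real application would usually cap or summarize the history.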
You can set up the project using the provided setup.py file. This will automatically install the required dependencies listed in requirements.txt.
- Make sure you have a requirements.txt file with the necessary packages.
- Run the following command to install the package:
If you prefer to set up manually:
- Install Streamlit, LangChain, LangChain Community, and ChromaDB:
    pip install streamlit langchain langchain_community chromadb
- Run the application:
    streamlit run application.py
- Run Ollama (LLM):
    ollama run llama3.2
- Create a virtual environment:
    python -m venv venv
- Activate the virtual environment:
  - On Windows:
      venv\Scripts\activate
  - On macOS/Linux:
      source venv/bin/activate
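To confirm that activation worked, one option is to ask the interpreter itself. This is a small sketch, assuming the environment was created with `python -m venv` as shown above; the function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside an environment created by `python -m venv`."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
```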