
A modular Python project that uses LangChain and Groq LLMs to extract and process information from PDFs, text files, directories, and web pages using dedicated loaders and prompt templates.


Nitesh-lng/Data-Loader-LLM


Data Loader LLM 🚀

This project is a lightweight, modular pipeline for extracting and processing data from PDFs, text files, directories, and web pages, using LangChain document loaders and LLMs served through Groq.

🔧 Features

  • ✅ PDF data extraction with PyPDFLoader
  • ✅ Directory-wise PDF processing using DirectoryLoader
  • ✅ Raw .txt file summarization
  • ✅ Web scraping + LLM-based question answering
  • ✅ Uses ChatGroq with DeepSeek or LLaMA-3 models
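Each source type in the feature list pairs with one LangChain community loader. PyPDFLoader and DirectoryLoader are named above; TextLoader and WebBaseLoader are assumptions inferred from the script names text_loader.py and webbase_loader.py. The sketch below is not the repository's code: `choose_loader` and `make_loader` are hypothetical helpers, and the `*.pdf` glob for folders is an assumption. The LangChain imports are deferred into `make_loader` so the dispatch logic runs without those packages installed.

```python
from pathlib import Path
from urllib.parse import urlparse


def choose_loader(source: str) -> str:
    """Return the name of the loader suited to `source` (hypothetical helper)."""
    if urlparse(source).scheme in ("http", "https"):
        return "WebBaseLoader"    # web page: fetched and parsed with bs4
    path = Path(source)
    if path.is_dir():
        return "DirectoryLoader"  # folder containing several PDFs
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        return "PyPDFLoader"      # single PDF, one Document per page
    if suffix == ".txt":
        return "TextLoader"       # plain text, one Document for the whole file
    raise ValueError(f"Unsupported source: {source!r}")


def make_loader(source: str):
    """Build the matching loader; imports are deferred so choose_loader stays dependency-free."""
    from langchain_community.document_loaders import (
        DirectoryLoader,
        PyPDFLoader,
        TextLoader,
        WebBaseLoader,
    )

    kind = choose_loader(source)
    if kind == "WebBaseLoader":
        return WebBaseLoader(source)
    if kind == "DirectoryLoader":
        # Assumption: only top-level PDFs, each parsed with PyPDFLoader.
        return DirectoryLoader(source, glob="*.pdf", loader_cls=PyPDFLoader)
    if kind == "PyPDFLoader":
        return PyPDFLoader(source)
    return TextLoader(source, encoding="utf-8")
```

Usage: `docs = make_loader("data.pdf").load()` returns a list of Documents whatever the source type, so the prompting code downstream does not need to care where the text came from.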

🧱 Directory Structure

.
├── dataloader/
│   ├── directory_loader.py       # Load multiple PDFs from a folder
│   ├── pypdf_loader.py           # Load and query a single PDF
│   ├── text_loader.py            # Summarize .txt files
│   ├── webbase_loader.py         # Extract info from websites
│   ├── extra.py                  # (Optional utility file)
│   ├── text.txt                  # Sample text file
│   ├── data.pdf                  # Sample PDF
│   └── .env                      # Stores your GROQ_API_KEY
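When DirectoryLoader is combined with PyPDFLoader (as in directory_loader.py), every PDF page comes back as a separate Document whose metadata holds the file path under `"source"` and a zero-based `"page"` index. The following is a hedged sketch of regrouping those pages per file so that each PDF can be prompted on its own. `pages_by_file` is a hypothetical helper, and the `Page` dataclass only stands in for LangChain's `Document`, which exposes the same `page_content` and `metadata` attributes.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Page:
    """Stand-in with the same two attributes as langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def pages_by_file(docs: Iterable) -> Dict[str, str]:
    """Join page texts per source file, ordered by page number (hypothetical helper)."""
    grouped: Dict[str, List] = defaultdict(list)
    for doc in docs:
        grouped[doc.metadata.get("source", "<unknown>")].append(doc)
    return {
        source: "\n\n".join(
            d.page_content for d in sorted(pages, key=lambda d: d.metadata.get("page", 0))
        )
        for source, pages in grouped.items()
    }
```

Sorting by `"page"` guards against loaders that return pages out of order; the real Documents from `DirectoryLoader(...).load()` can be passed in directly.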

📦 Requirements

Install dependencies via:

pip install -r requirements.txt

Contents of requirements.txt:

langchain-groq
groq
python-dotenv
langchain_community
pypdf
bs4
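A quick stdlib-only check, after installing, that each requirement is importable might look like the sketch below. Some import names differ from the pip names (python-dotenv imports as `dotenv`, langchain-groq as `langchain_groq`), so the mapping is written out for this specific list. `missing_modules` is a hypothetical helper, not part of the repository.

```python
import importlib.util
from typing import Dict, List

# pip package name -> top-level module it provides
REQUIREMENTS: Dict[str, str] = {
    "langchain-groq": "langchain_groq",
    "groq": "groq",
    "python-dotenv": "dotenv",
    "langchain_community": "langchain_community",
    "pypdf": "pypdf",
    "bs4": "bs4",
}


def missing_modules(requirements: Dict[str, str]) -> List[str]:
    """Return the pip names whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module in requirements.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_modules(REQUIREMENTS)
print("All requirements importable." if not missing else f"Missing: {', '.join(missing)}")
```

`find_spec` only locates the module without importing it, so the check is fast and has no side effects.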

🔑 Setup

  1. Create a .env file in the dataloader/ folder, next to the scripts:

    GROQ_API_KEY=your_groq_api_key_here
  2. From inside dataloader/, run whichever script you need:

    python pypdf_loader.py
    python webbase_loader.py
    python text_loader.py
    python directory_loader.py
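The scripts presumably share one shape: read the key from .env, load documents, fill a prompt, and call ChatGroq. The following is a hedged sketch of that shape for the single-PDF case, not the repository's code. `require_groq_key`, `build_prompt`, `answer`, and the prompt wording are hypothetical, and no model ID is hard-coded because the DeepSeek and LLaMA-3 models Groq serves change over time. The model is passed in as any object with an `invoke` method, so the flow can be exercised with a stub instead of a network call.

```python
import os
from typing import Any, Iterable, Mapping, Optional

# Hypothetical prompt; {context} and {question} are filled with str.format.
PROMPT = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)


def require_groq_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Fail early if GROQ_API_KEY is absent or still the placeholder value."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key or key == "your_groq_api_key_here":
        raise RuntimeError("GROQ_API_KEY is missing; add it to dataloader/.env")
    return key


def build_prompt(page_texts: Iterable[str], question: str) -> str:
    """Concatenate the loaded pages and substitute them into the template."""
    return PROMPT.format(context="\n\n".join(page_texts), question=question)


def answer(llm: Any, page_texts: Iterable[str], question: str) -> str:
    """Send the filled prompt to the model and return the reply text."""
    reply = llm.invoke(build_prompt(page_texts, question))
    # ChatGroq returns an AIMessage (text in .content); a test stub may return a str.
    return getattr(reply, "content", reply)


def main(model: str, pdf_path: str = "data.pdf") -> None:
    """Wire the real components together; call from inside dataloader/."""
    from dotenv import load_dotenv
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_groq import ChatGroq

    load_dotenv()       # copies .env entries into os.environ
    require_groq_key()  # ChatGroq itself reads GROQ_API_KEY from the environment
    pages = [doc.page_content for doc in PyPDFLoader(pdf_path).load()]
    llm = ChatGroq(model=model, temperature=0)
    print(answer(llm, pages, "Tell me all the education institute names of the person"))
```

Usage: `main(model="<a model ID from your Groq console>")`. Keeping `answer` independent of ChatGroq is a design choice for testability, not something the repository is known to do.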

💡 Prompt Examples

  • PDF: Tell me all the education institute names of the person
  • Web: Name of the darkest coffee
  • Text: Summarize the following text
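The three example prompts can be kept as reusable templates. In the sketch below, placing a `{context}` slot after each prompt is an assumption about how the loaded text is supplied. The placeholders use Python's `str.format` syntax, which is also the default template format of LangChain's `PromptTemplate.from_template`, so the same strings work in both. `TEMPLATES`, `render`, and `as_prompt_template` are hypothetical names.

```python
from typing import Dict

# Example prompts from this README, each followed by an assumed {context} slot.
TEMPLATES: Dict[str, str] = {
    "pdf": "Tell me all the education institute names of the person\n\n{context}",
    "web": "Name of the darkest coffee\n\n{context}",
    "text": "Summarize the following text:\n\n{context}",
}


def render(kind: str, context: str) -> str:
    """Fill the chosen template with loaded document text using plain str.format."""
    return TEMPLATES[kind].format(context=context)


def as_prompt_template(kind: str):
    """Wrap the same string as a LangChain PromptTemplate (import deferred)."""
    from langchain_core.prompts import PromptTemplate

    return PromptTemplate.from_template(TEMPLATES[kind])
```

With a ChatGroq instance `llm`, `(as_prompt_template("text") | llm).invoke({"context": text})` pipes the filled prompt into the model.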

📄 License

This project is licensed under the MIT License.


Author: Nitesh Kumar Singh
Built with ❤️ using LangChain, Groq, and Python
