# LLM Question Generator for User-Uploaded Content
An AI-powered web application that automatically generates questions and answers from PDF documents using Large Language Models (LLMs) and vector embeddings. Built with FastAPI and Groq's Llama 3.1 model for fast, accurate Q&A generation.
## Features
- PDF Upload Interface: Drag-and-drop or click-to-browse file upload
- AI Question Generation: Automatically generates relevant questions from PDF content
- Smart Answer Generation: Uses RAG (Retrieval Augmented Generation) with FAISS vector store
- CSV Export: Download generated Q&A pairs in CSV format
- Real-time Progress: Loading indicators and status messages
- Fast Processing: Optimized for speed with configurable question limits (default: 10 questions)
## Tech Stack
- Backend: FastAPI (Python 3.10+)
- LLM: Groq API (Llama 3.1-8b-instant)
- Embeddings: HuggingFace sentence-transformers (all-MiniLM-L6-v2)
- Vector Store: FAISS
- Framework: LangChain
- Frontend: HTML, CSS, Vanilla JavaScript
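The RAG flow implied by this stack splits the extracted PDF text into overlapping chunks before embedding them into FAISS, so that sentences cut at a chunk boundary still appear whole in at least one chunk. A minimal, self-contained illustration of the chunking idea (this is a sketch, not the actual `src/helper.py` code; the sizes are placeholders):

```python
def split_into_chunks(text: str, chunk_size: int = 100, overlap: int = 20):
    """Slide a fixed-size window over the text with some overlap,
    so context spanning a boundary survives in a neighboring chunk."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

chunks = split_into_chunks("a" * 250, chunk_size=100, overlap=20)
# Each chunk shares its last `overlap` characters with the next chunk's start.
```

In the real pipeline, LangChain's text splitters play this role with token-aware sizes.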
## Installation

Clone the repository:

```bash
git clone git@github.com:asmshaon/question-generator.git
cd question-generator
```

Create and activate a virtual environment.

Using venv:

```bash
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

Using conda:

```bash
conda create -n question-generator python=3.10 -y
conda activate question-generator
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

Create a .env file in the project root:

```
GROQ_API_KEY=your_groq_api_key_here
```

Get your Groq API key from: https://console.groq.com/

Run the application:

```bash
python app.py
```

The server will start at http://localhost:8080.
## Usage

- Open your browser and navigate to http://localhost:8080
- Upload a PDF file (drag & drop or click to browse)
- Click the "Upload PDF" button
- Once uploaded, click "Generate Q&A"
- Wait for processing (typically 30-60 seconds for 10 questions)
- Download the generated CSV file
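The downloaded file is plain CSV, so it can be post-processed with the standard library. Note the column names used here (`question`, `answer`) are an assumption about the export format, not confirmed by the source:

```python
import csv
import io

# Hypothetical sample mimicking the assumed export format.
sample = "question,answer\nWhat is FAISS?,A vector similarity search library.\n"

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    print(row["question"], "->", row["answer"])
```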
## Configuration

To change the number of generated questions, edit app.py line 46:

```python
# Generate 10 questions (default)
ques_list = ques_list[:10]

# Generate 20 questions
ques_list = ques_list[:20]

# Generate all questions: comment out or delete the line
```

Other settings live in src/helper.py:

- Temperature: lines 66 and 90 (default: 0.3)
- Chunk sizes: lines 36 and 46
- Model: lines 65 and 89
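For orientation, the model and temperature settings above plug into LangChain's `ChatGroq` constructor roughly like this. This is a sketch of the shape of the call, not the exact code at those lines in src/helper.py, and it requires GROQ_API_KEY to be set:

```python
from langchain_groq import ChatGroq

# Sketch only: parameter names follow langchain-groq's public API;
# the real call sites are src/helper.py lines 65-66 and 89-90.
llm = ChatGroq(
    model="llama-3.1-8b-instant",  # the model named in the tech stack
    temperature=0.3,               # lower = more deterministic output
)
```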
## Project Structure

```
question-generator/
├── app.py              # FastAPI application
├── requirements.txt    # Python dependencies
├── setup.py            # Package setup
├── .env                # Environment variables (create this)
├── .gitignore          # Git ignore rules
├── LICENSE             # MIT License
├── README.md           # This file
├── src/
│   ├── helper.py       # LLM pipeline and processing
│   └── prompt.py       # Prompt templates
├── templates/
│   └── index.html      # Web interface
├── static/
│   ├── docs/           # Uploaded PDFs (gitignored)
│   └── output/         # Generated CSVs (gitignored)
└── data/               # Sample data (optional)
```
## API Endpoints

**Serve the web interface**

Serves the main web interface.

**Upload a PDF**

- Request: FormData with `pdf_file` and `filename`
- Response: `{"msg": "success", "pdf_filename": "path/to/file"}`

**Generate Q&A**

Generates Q&A from the uploaded PDF.

- Request: FormData with `pdf_filename`
- Response: `{"output_file": "path/to/csv"}` or `{"error": "message"}`
## Performance

The application includes several optimizations:
- Uses "stuff" chain type instead of "refine" (3-5x faster)
- Reduced chunk sizes (5000/800 tokens)
- Lower temperature (0.3) for faster responses
- Limited to 10 questions by default
- Progress indicators in console
Processing time varies based on document size and number of questions.
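The "stuff" vs. "refine" speedup is mostly about call count: "stuff" concatenates all retrieved chunks into a single prompt, while "refine" makes one LLM call per chunk, threading a draft answer through each call. A toy model of the difference, where `ask` stands in for a real LLM call:

```python
def stuff_answer(chunks, ask):
    # "stuff": a single call over all chunks joined together
    return ask("\n".join(chunks))

def refine_answer(chunks, ask):
    # "refine": one call per chunk, carrying the draft forward
    draft = ask(chunks[0])
    for chunk in chunks[1:]:
        draft = ask(f"{chunk}\n\nRefine this draft: {draft}")
    return draft

calls = []
def fake_llm(prompt):
    calls.append(prompt)
    return "answer"

stuff_answer(["c1", "c2", "c3"], fake_llm)
print(len(calls))  # 1 call

calls.clear()
refine_answer(["c1", "c2", "c3"], fake_llm)
print(len(calls))  # 3 calls
```

With real network latency per call, fewer calls translates directly into the faster wall-clock times noted above.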
## Troubleshooting

**Port 8080 already in use**

```bash
lsof -ti:8080 | xargs kill -9
```

**Missing dependencies**

```bash
pip install -r requirements.txt
```

**GROQ_API_KEY errors**

Ensure your .env file exists in the project root with a valid GROQ_API_KEY.

**Slow processing**

Reduce the number of questions in app.py line 46 or decrease chunk sizes in src/helper.py.
## Dependencies

Key dependencies include:

- `fastapi` - Web framework
- `uvicorn` - ASGI server
- `langchain` - LLM orchestration
- `langchain-community` - LangChain integrations
- `langchain-groq` - Groq LLM provider
- `langchain-huggingface` - HuggingFace embeddings
- `sentence-transformers` - Embedding models
- `faiss-cpu` - Vector similarity search
- `pypdf` - PDF processing
- `python-dotenv` - Environment management

See requirements.txt for the complete list.
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Author

Abu Saleh Muhammad Shaon
- Email: srabon.php@gmail.com
- GitHub: @asmshaon
## Acknowledgments

- Groq - Fast LLM inference
- LangChain - LLM orchestration framework
- HuggingFace - Embedding models
- FAISS - Vector similarity search
- FastAPI - Modern web framework
## Roadmap

- Support for multiple file formats (DOCX, TXT, etc.)
- Batch processing of multiple files
- Custom prompt templates
- Question difficulty levels
- Export to multiple formats (JSON, Excel, etc.)
- User authentication and history
- API rate limiting and caching