The AI Research Paper Assistant is a Retrieval-Augmented Generation (RAG) application that enables users to interact with AI research papers. Users can upload a research paper, ask domain-specific questions, and receive context-aware answers generated using LLMs.
Key Features:
- Upload research papers in PDF format.
- Semantic search for retrieving relevant sections.
- Summarization of key paper sections.
- Interactive Q&A based on user queries.
- Citation assistance for proper referencing.
Understanding complex AI research papers can be challenging. This project aims to simplify knowledge extraction by enabling users to ask questions and receive relevant, summarized insights.
- Text Extraction: PyPDF2, PDFMiner, Tesseract (OCR)
- Vectorization: Sentence Transformers (all-MiniLM-L6-v2)
- Vector Database: Pinecone, FAISS, or Weaviate
- LLM Integration: OpenAI GPT-4, Flan-T5
- Frontend: Streamlit, Flask, or FastAPI
- Deployment: Streamlit Cloud, Hugging Face Spaces, AWS
- Upload & Process Paper
  - Extract text while preserving section hierarchy.
  - Handle multi-column layouts and citations.
- Preprocessing & Embedding Creation
  - Chunk text (200-500 words per segment).
  - Convert text into embeddings using Sentence Transformers.
  - Store embeddings in a vector database.
- Query Processing & Response Generation
  - User submits a query.
  - Perform semantic search to fetch relevant chunks.
  - Use an LLM (GPT-4 or Flan-T5) to generate answers based on retrieved text.
- Frontend Interaction
  - Users upload papers, enter queries, and view AI-generated responses.
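
The preprocessing step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `chunk_text` and `embed_chunks` are hypothetical helper names, the 300-word default is an assumed value inside the stated 200-500 word range, and merging a short trailing segment into the previous one is an assumed policy. `embed_chunks` also assumes the `sentence-transformers` package is installed; it is imported lazily so the chunking part runs on its own.

```python
def chunk_text(text: str, max_words: int = 300, min_words: int = 200) -> list[str]:
    """Split text into word-based segments of at most max_words words.

    A trailing segment shorter than min_words is merged into the previous
    segment, so a merged segment holds at most max_words + min_words - 1
    words (499 with the defaults).
    """
    words = text.split()
    chunks = [words[i:i + max_words] for i in range(0, len(words), max_words)]
    if len(chunks) > 1 and len(chunks[-1]) < min_words:
        tail = chunks.pop()
        chunks[-1].extend(tail)
    return [" ".join(chunk) for chunk in chunks]


def embed_chunks(chunks: list[str]):
    """Encode chunks with the all-MiniLM-L6-v2 Sentence Transformers model."""
    # Third-party dependency, imported here so chunk_text works without it.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(chunks)  # one embedding vector per chunk
```

With these defaults, a 650-word input yields two segments of 300 and 350 words. The resulting vectors would then be written to whichever vector database the deployment uses.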
Query: "What is the main contribution of the paper?"
Processing: Retrieve key sections (Abstract, Conclusion) and generate a response.
Output: "The main contribution of this paper is the introduction of a novel transformer-based architecture that improves efficiency by 25% while maintaining state-of-the-art performance."
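
The retrieval and prompt-assembly part of this example might look something like the sketch below. It is an assumption-laden illustration: bag-of-words cosine similarity stands in for the Sentence Transformers embeddings and vector database lookup so that the example runs with the standard library alone, and `bag_of_words`, `cosine`, `top_chunks`, `build_prompt`, and the prompt wording are all hypothetical. The prompt it builds would be passed to the chosen LLM, a call that is omitted here.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase alphanumeric tokens (a toy stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble an LLM prompt from the query and the retrieved excerpts."""
    joined = "\n\n".join(context)
    return (
        "Answer the question using only the excerpts below.\n\n"
        f"Excerpts:\n{joined}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

Grounding the prompt in the retrieved excerpts, rather than sending the question alone, is what keeps the generated answer tied to the uploaded paper.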
# Clone the repository
git clone https://github.com/your-repo/ai-research-assistant.git
cd ai-research-assistant
# Install dependencies
pip install -r requirements.txt
# Run the application
streamlit run app.py
- Support for multi-paper retrieval
- Integration with Google Scholar API for external references
- Fine-tuned LLMs for improved accuracy
This project is licensed under the MIT License.
Contributions are welcome! Feel free to open issues or submit pull requests.
Star this repo if you find it useful!