MediPedia is an AI-powered application that provides accurate medical information based on trusted medical resources. It uses Retrieval Augmented Generation (RAG) to answer medical questions by searching through a database of medical knowledge.
- Medical Q&A: Ask any medical question and get accurate information sourced from medical literature
- Source Attribution: Responses include references to the source documents and page numbers
- Chat Interface: User-friendly chat interface built with Streamlit
- Secure Knowledge Base: Information is retrieved from trusted medical resources
- Frontend: Streamlit for the interactive web interface
- NLP & ML:
  - LangChain for orchestrating LLM workflows
  - Hugging Face for embeddings and the LLM (Mistral-7B)
  - FAISS for efficient vector search
- Document Processing:
  - LangChain for document loading and text splitting
  - PDF processing for medical documents
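The text-splitting step can be illustrated with a simple fixed-size splitter. This is only a sketch: the project itself relies on LangChain's splitters, and the `split_text` helper below is a hypothetical stand-in.

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks.

    Simplified stand-in for LangChain's RecursiveCharacterTextSplitter:
    overlap preserves context that would otherwise be cut at chunk edges.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# A 1200-character document yields three overlapping chunks.
chunks = split_text("x" * 1200, chunk_size=500, overlap=50)
```

Each chunk is later embedded and indexed individually, so chunk size trades off retrieval precision (smaller chunks) against context completeness (larger chunks).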
```
medipedia/
├── data/                        # Medical PDF documents
│   └── GaleEncyclopediaOfMedicine.pdf
├── vector_store/                # Vector embeddings database
│   └── db_faiss/                # FAISS vector store
├── create_memory_for_llm.py     # Script to create embeddings from PDFs
├── connect_memory_with_llm.py   # Script to test the LLM with the vector store
├── medipedia.py                 # Main Streamlit application
├── requirements.txt             # Dependencies
└── README.md                    # Project documentation
```
- Data Ingestion: Medical PDFs are loaded and processed into chunks
- Embedding Generation: Text chunks are converted into vector embeddings
- Knowledge Storage: Embeddings are stored in a FAISS vector database
- User Query: User enters a medical question through the Streamlit interface
- Semantic Search: The system finds relevant information from the knowledge base
- Response Generation: The LLM generates a comprehensive answer based on retrieved context
- Source Attribution: References to source documents are provided for transparency
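The retrieval steps above can be sketched end to end. This is a deliberately tiny illustration with hand-made vectors and a two-entry knowledge base; in the project itself, embeddings come from a Hugging Face model and the similarity search runs in FAISS.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy knowledge base: (chunk text, embedding, source metadata).
# Real embeddings come from a Hugging Face model; these are hand-made.
knowledge_base = [
    ("Aspirin is used to reduce fever and pain.", [0.9, 0.1, 0.0], {"source": "page 12"}),
    ("Insulin regulates blood glucose levels.",   [0.1, 0.9, 0.1], {"source": "page 87"}),
]

def retrieve(query_embedding, k=1):
    """Return the k chunks most similar to the query embedding."""
    ranked = sorted(knowledge_base,
                    key=lambda item: cosine(query_embedding, item[1]),
                    reverse=True)
    return ranked[:k]

# A query about fever lands nearest the aspirin chunk.
results = retrieve([0.8, 0.2, 0.1], k=1)
context = results[0][0]            # passed to the LLM as grounding context
source = results[0][2]["source"]   # surfaced to the user for attribution
```

The retrieved `context` is inserted into the LLM prompt so the answer is grounded in the knowledge base, and the attached metadata is what makes source attribution possible.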
- Clone the repository:

```bash
git clone https://github.com/yourusername/medipedia.git
cd medipedia
```

- Create a virtual environment and activate it:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Set up your Hugging Face API token:
Create a `.env` file in the project root with your Hugging Face API token:

```
HF_TOKEN=your_huggingface_token_here
```
- Place your medical PDFs in the `data/` directory.
- Generate the vector database:

```bash
python create_memory_for_llm.py
```

- Start the Streamlit application:

```bash
streamlit run medipedia.py
```

The application will be available at http://localhost:8501.
- Add more medical resources to expand the knowledge base
- Implement multi-modal support for medical images
- Add user authentication for personalized medical information
- Implement search history and favorite responses
- Optimize for mobile devices
This project is licensed under the MIT License - see the LICENSE file for details.
- The Gale Encyclopedia of Medicine for the medical knowledge base
- Hugging Face for providing access to state-of-the-art NLP models
- LangChain for the powerful RAG framework
This project was created as a demonstration of building AI-powered knowledge systems using retrieval augmented generation techniques. It showcases skills in natural language processing, vector embeddings, and building interactive AI applications.
Note: This application is for educational purposes only and should not be used as a substitute for professional medical advice, diagnosis, or treatment.
