DocScribe: Medical QA Chatbot

Introduction

DocScribe is a medical question-and-answer chatbot designed to make medical data easier to query. It is built to give quick, accurate responses to both general medical inquiries and patient-specific questions. Our main goal is to make medical reports more accessible and easier to understand. DocScribe achieves this through:

Transcribing medical reports into a more accessible format.
Answering general medical questions.
Summarizing reports from multiple visits.
Identifying relevant information from patient history.

Architecture

DocScribe's architecture connects users to medical data through three components:

A Jupyter Notebook-based web UI for uploading reports and interacting with them.
Embeddings and indexes built from medical transcripts.
The LangChain and HuggingFace frameworks for processing and answering questions.
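
The embed-and-index step can be illustrated with a minimal sketch. It is not the project's LangChain or HuggingFace code: the bag-of-words `embed` function stands in for a neural encoder, and `TranscriptIndex`, its `search` method, and the sample chunks are hypothetical names and data invented for illustration. The idea it shows is that each transcript chunk is mapped to a vector once, and a question is answered by first retrieving the chunks whose vectors are closest to the question's vector.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TranscriptIndex:
    """In-memory index: stores (chunk, embedding) pairs and ranks them by similarity."""

    def __init__(self, chunks):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, question: str, k: int = 1):
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


# Hypothetical transcript chunks, invented for illustration.
chunks = [
    "Preoperative diagnosis: prostate cancer. Procedure: radical prostatectomy.",
    "Patient reports increased thirst and urination over the past month.",
    "Follow-up visit: incision healing well, no fever, pain controlled.",
]
index = TranscriptIndex(chunks)
top = index.search("What was the preoperative diagnosis?", k=1)
print(top[0])
```

In a real pipeline the retrieved chunks would be passed to the language model as context for the answer, and the toy `embed` function would be replaced by a learned sentence encoder.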

Data Sources

We train our model on several data sources, including:

Medical Transcripts (MTSamples): We used the GPT-3.5 model to generate 4.5k QA prompts from the medical transcripts.
WikiDoc
WikiPatient

Sample Data

Dataset	Instruction	Input	Output
WikiDoc	Answer this question truthfully	Can you provide an overview of the lung's squamous cell carcinoma?	Squamous cell carcinoma of the lung may be classified according to the WHO histological classification system into 4 main types: papillary, clear cell, small cell, and basaloid.
WikiPatient	Answer this question truthfully	When to seek urgent medical care when I have Alstrom syndrome?	Call your healthcare provider if you or your child have symptoms of diabetes such as increased thirst and urination. Seek medical attention promptly if you think that your child cannot see or hear normally.
MTSamples	Based on the given medical transcript generate prompt and answer to train LLM	What was the patient's preoperative diagnosis?	The patient's preoperative diagnosis was prostate cancer.
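
Each row above has an instruction, an input, and an output. A minimal sketch of how such a record could be turned into a single training string follows. The `QARecord` class, the helper functions, and the `### Instruction / ### Input / ### Response` template are assumptions made for illustration; the project's actual prompt format is not stated here and may differ.

```python
from dataclasses import dataclass


@dataclass
class QARecord:
    dataset: str
    instruction: str
    input: str
    output: str


# Hypothetical template; the project's real prompt format may differ.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def build_prompt(record: QARecord) -> str:
    """Render the part of the example the model sees before it answers."""
    return PROMPT_TEMPLATE.format(instruction=record.instruction, input=record.input)


def build_training_text(record: QARecord) -> str:
    """Prompt plus the reference answer, as a single supervised training string."""
    return build_prompt(record) + record.output


record = QARecord(
    dataset="MTSamples",
    instruction="Based on the given medical transcript generate prompt and answer to train LLM",
    input="What was the patient's preoperative diagnosis?",
    output="The patient's preoperative diagnosis was prostate cancer.",
)
text = build_training_text(record)
print(text)
```

At inference time only `build_prompt` would be used, so the model generates the text that follows the response marker.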

Modeling Approach

We chose the Vicuna-13B model and fine-tuned it with LoRA, using the PEFT and bitsandbytes libraries. In our testing, this approach has shown promising results in interpreting medical data.
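
To show why LoRA makes fine-tuning a 13B-parameter model practical, here is a small arithmetic sketch. LoRA keeps a weight matrix frozen and trains only a low-rank update B @ A, so the trainable parameter count for one matrix drops from d_out * d_in to rank * (d_out + d_in). The matrix size of 5120 x 5120 and the rank of 8 are assumed values chosen for illustration, not settings taken from this project.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable weights for a LoRA update B @ A, with B: d_out x rank and A: rank x d_in."""
    return rank * (d_out + d_in)


# Assumed values for illustration: a 5120 x 5120 projection and rank 8.
d = 5120
r = 8
full = full_params(d, d)
lora = lora_params(d, d, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
# prints "full: 26,214,400  lora: 81,920  ratio: 0.3125%"
```

Under these assumptions, each adapted matrix trains well under one percent of the weights that full fine-tuning would update.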

Sample Results

Results & Future Work

DocScribe has performed well at answering medical queries and summarizing patient reports. Future directions include training the model on more medical corpora, incorporating medical image analysis, and exploring applications in clinical research.

Installation

git clone https://github.com/kmnis/DocScribe.git
cd DocScribe
pip install -r requirements.txt

# Start the Jupyter server
jupyter notebook

# In your browser, go to http://localhost:8888/inference and open a notebook

Team

Manish Kumar (mnis@uchicago.edu)
Kargil Thakur (kargil@uchicago.edu)
Ekansh Trivedi (ekansh@uchicago.edu)
