# MediSOAP

This project fine-tunes the Llama2-7B model using LoRA and QLoRA techniques to generate structured SOAP (Subjective, Objective, Assessment, Plan) notes from patient-doctor conversations. The training data consists of transcribed medical dialogues paired with notes in the SOAP format.
## Table of Contents

- Introduction
- Prerequisites
- Installation
- Dataset
- Fine-Tuning Process
- Evaluation
- Usage
- Results
- Contributing
- License
## Introduction

SOAP notes are a standard documentation format that healthcare providers use to record structured entries in a patient's chart. This project automates the generation of SOAP notes from patient-doctor conversations using a fine-tuned Llama2-7B model, leveraging Low-Rank Adaptation (LoRA) and Quantized LoRA (QLoRA) for efficient training.
## Prerequisites

- Python 3.11 or higher
- PyTorch 1.10.0 or higher
- CUDA 10.2 or higher (for GPU support)
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/aman-17/MediSOAP.git
  cd MediSOAP
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
## Dataset

The dataset used for this project is a collection of patient-doctor conversation transcripts formatted into SOAP notes. The dataset must be preprocessed into the required format before training.

To preprocess your custom dataset, follow the format of `train.jsonl`, then:

- Place your raw data files in the `data/` directory.
- Run the preprocessing script:

  ```bash
  python data_preprocessing.py
  ```
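The exact schema of `train.jsonl` is defined by the preprocessing script, but a supervised fine-tuning record typically pairs a conversation with its reference note. A minimal sketch of what one line could look like (the field names `conversation` and `soap_note` are assumptions for illustration, not the project's actual schema):

```python
import json

# Hypothetical example record -- the real field names are defined by
# data_preprocessing.py and train.jsonl, not by this sketch.
record = {
    "conversation": (
        "Doctor: What brings you in today?\n"
        "Patient: I've had a cough for about a week."
    ),
    "soap_note": (
        "S: Patient reports a cough lasting one week.\n"
        "O: Vitals stable; lungs clear on exam.\n"
        "A: Likely viral upper respiratory infection.\n"
        "P: Supportive care; follow up if symptoms persist."
    ),
}

# Each line of a .jsonl file is one independent JSON object.
line = json.dumps(record)
print(line)
```

A `.jsonl` file is simply one such `json.dumps` line per training example, which is why records can be streamed during training without loading the whole file.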
## Fine-Tuning Process

Fine-tuning adapts the pre-trained Llama2-7B and Phi-2 models to this task using the LoRA technique.

- Data preparation: ensure your preprocessed data is in the `data/` directory.
- Training: run the training script:

  ```bash
  python train_phi2.py
  ```
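LoRA's efficiency comes from freezing the base weights and training only a low-rank update ΔW = B·A, so a d×k weight matrix needs r·(d+k) trainable parameters instead of d·k. A back-of-the-envelope sketch (pure Python, no training; the rank r=8 below is a commonly used value, not necessarily what this project's training script configures):

```python
# LoRA replaces a full update of a frozen d x k weight matrix with two
# small trainable factors: B (d x r) and A (r x k), where r << min(d, k).
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for one LoRA-adapted matrix: B has d*r, A has r*k."""
    return d * r + r * k

# Llama2-7B attention projections are 4096 x 4096.
d = k = 4096
r = 8  # assumed rank, for illustration

full = d * k                          # params if the whole matrix were trained
lora = lora_trainable_params(d, k, r)

print(f"full fine-tuning: {full:,} params per matrix")
print(f"LoRA (r={r}): {lora:,} params per matrix ({100 * lora / full:.2f}% of full)")
```

QLoRA applies the same idea on top of a 4-bit quantized base model, shrinking memory for the frozen weights as well, which is what makes 7B-scale fine-tuning feasible on a single consumer GPU.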
## Evaluation

Evaluate the model's performance on a test dataset:

```bash
python evaluate.py --model-path path/to/fine-tuned-model --test-data path/to/test-data
```

Overlap metrics such as BLEU and ROUGE can be used to compare generated notes against the reference notes.
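As a rough illustration of what ROUGE measures, here is a minimal unigram-overlap (ROUGE-1) F1 computation in plain Python; actual evaluation should use a maintained library such as `rouge-score`, and this sketch skips the stemming and tokenization such libraries apply:

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference note and a generated note."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "patient reports cough for one week"
candidate = "patient reports a cough lasting one week"
print(f"ROUGE-1 F1: {rouge1_f1(reference, candidate):.3f}")
```

High n-gram overlap does not guarantee clinical correctness, so automatic metrics are best paired with spot-checks of example outputs.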
## Usage

To generate SOAP notes from new patient-doctor conversations, use the inference script:

```bash
python generate.py --model-path path/to/fine-tuned-model --input path/to/conversation.txt
```

The output will be a structured SOAP note based on the input conversation.
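Downstream code often needs the note split into its four sections. A small sketch of how the generated text could be parsed, assuming the model emits `S:`/`O:`/`A:`/`P:`-prefixed lines (an assumption about the output format, not a guarantee of `generate.py`):

```python
import re

def parse_soap(note: str) -> dict:
    """Split a generated note into Subjective/Objective/Assessment/Plan sections."""
    labels = {"S": "Subjective", "O": "Objective", "A": "Assessment", "P": "Plan"}
    sections = {name: "" for name in labels.values()}
    current = None
    for line in note.splitlines():
        # A new section starts with its letter and a colon, e.g. "S: ..."
        m = re.match(r"^([SOAP]):\s*(.*)$", line.strip())
        if m:
            current = labels[m.group(1)]
            sections[current] = m.group(2)
        elif current:
            # Continuation lines belong to the most recent section.
            sections[current] += " " + line.strip()
    return sections

note = "S: Cough for one week.\nO: Lungs clear.\nA: Viral URI.\nP: Supportive care."
parsed = parse_soap(note)
print(parsed["Assessment"])
```

Keeping the four sections separate makes it easy to render the note into a chart template or to evaluate each section independently.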
## Results

Summarize the results obtained from the model's performance on the test dataset, including key metrics and example outputs.
## Contributing

We welcome contributions from the community. To contribute, please follow these steps:

- Fork the repository.
- Create a new branch (`git checkout -b feature-branch`).
- Make your changes.
- Commit your changes (`git commit -m 'Add new feature'`).
- Push to the branch (`git push origin feature-branch`).
- Create a new Pull Request.
## License

This project is licensed under the MIT License. See the LICENSE file for details.
Feel free to update this README with additional details as needed.