This repository was archived by the owner on Sep 22, 2025. It is now read-only.


@Chamoth (Collaborator) commented Sep 21, 2025

Summary

This PR delivers a complete end-to-end pipeline for the Guardian Monitor AI project, focusing on generating synthetic patient data and developing a Transformer-based summarization model. It establishes the foundation for downstream clinical NLP tasks such as summarization, entity extraction, and pattern discovery.


Key Changes

Synthetic Data Generator

  • Implemented a Python-based data generation script to create realistic, privacy-safe patient records.
  • Each record includes demographics, vitals, medications, ADLs, nursing notes, behavioural tags, and alerts.
  • Dual output formats:
    • CSV for statistical analysis and quick validation.
    • JSON for hierarchical clinical context preservation.
  • Added checks to validate data realism, including:
    • Range checks on vitals and ADLs.
    • Flagging inconsistent medication compliance states.
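The generator's shape can be sketched as below. This is a minimal, hypothetical illustration: the field names, physiological ranges, and ADL scale are stand-ins, not the project's actual schema.

```python
import random

# Plausible physiological ranges, used both for generation and for the
# realism checks. Illustrative values only, not the project's real schema.
VITAL_RANGES = {
    "heart_rate": (50, 110),        # bpm
    "systolic_bp": (90, 160),       # mmHg
    "spo2": (90, 100),              # %
    "temperature_c": (35.5, 38.5),  # Celsius
}

def generate_record(patient_id: int, rng: random.Random) -> dict:
    """Create one privacy-safe synthetic patient record."""
    vitals = {k: round(rng.uniform(lo, hi), 1) for k, (lo, hi) in VITAL_RANGES.items()}
    return {
        "patient_id": patient_id,
        "age": rng.randint(65, 95),
        "vitals": vitals,
        "adl_score": rng.randint(0, 6),            # activities of daily living
        "medication_compliant": rng.random() > 0.1,
        "nursing_note": f"Patient {patient_id} resting comfortably.",
    }

def validate_record(rec: dict) -> list:
    """Range checks on vitals and ADLs; returns a list of violations."""
    issues = []
    for k, (lo, hi) in VITAL_RANGES.items():
        if not lo <= rec["vitals"][k] <= hi:
            issues.append(f"{k} out of range")
    if not 0 <= rec["adl_score"] <= 6:
        issues.append("adl_score out of range")
    return issues

rng = random.Random(42)
records = [generate_record(i, rng) for i in range(100)]
assert all(not validate_record(r) for r in records)
```

The same `VITAL_RANGES` table drives both generation and validation, so the realism checks stay consistent with the generator; the records dict-list then serializes naturally to JSON for hierarchical context or flattens row-wise to CSV.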

Data Preprocessing Pipeline

  • Cleaned and normalized dataset with imputation for missing values.
  • Standardized input format with lightweight tags.
  • Added task-specific prefixes (summarize nursing note:) for instruction tuning.
  • Implemented a 90/10 train-validation split with adaptive token length caps:
    • Max input tokens: 231
    • Max target tokens: 80
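The preprocessing steps above can be sketched as follows. Whitespace tokenization stands in for the real subword tokenizer, and the helper names are illustrative; only the prefix, split ratio, and token caps come from the PR.

```python
import random

PREFIX = "summarize nursing note: "
MAX_INPUT_TOKENS = 231   # adaptive cap from the PR
MAX_TARGET_TOKENS = 80

def preprocess(note: str, summary: str) -> dict:
    """Prefix the input for instruction tuning and cap sequence lengths.

    Whitespace tokens are a stand-in for the model's subword tokenizer.
    """
    inp = (PREFIX + note).split()[:MAX_INPUT_TOKENS]
    tgt = summary.split()[:MAX_TARGET_TOKENS]
    return {"input": " ".join(inp), "target": " ".join(tgt)}

def train_val_split(examples: list, seed: int = 0):
    """Shuffled 90/10 train-validation split."""
    rng = random.Random(seed)
    idx = list(range(len(examples)))
    rng.shuffle(idx)
    cut = int(0.9 * len(examples))
    return [examples[i] for i in idx[:cut]], [examples[i] for i in idx[cut:]]

data = [preprocess(f"note {i}: vitals stable", f"summary {i}") for i in range(100)]
train, val = train_val_split(data)
assert len(train) == 90 and len(val) == 10
```

The task prefix turns each example into an instruction, which is what lets an instruction-tuned model like FLAN-T5 reuse its pretraining behaviour on the new task.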

Model Development

  • Integrated FLAN-T5-Large with LoRA adapters (2.28% trainable params) for efficient fine-tuning.
  • Enabled 4-bit quantization (NF4) for memory optimization and training on limited GPU resources.
  • Targeted key modules for fine-tuning: ["q", "k", "v", "o", "wi_0", "wi_1", "wo"].
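A sketch of this setup, assuming the Hugging Face transformers + peft + bitsandbytes stack. The target modules are the ones listed above; the LoRA rank, alpha, and dropout here are illustrative placeholders, not necessarily the PR's exact values.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization so FLAN-T5-Large fits on limited GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-large", quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters on the attention and feed-forward projections named in
# the PR; r / lora_alpha / lora_dropout are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q", "k", "v", "o", "wi_0", "wi_1", "wo"],
    task_type="SEQ_2_SEQ_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # ~2.28% trainable in the PR's setup
```

Because the base weights stay frozen in 4-bit, only the small LoRA matrices receive gradients, which is what keeps the trainable fraction around 2.28%.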

Training & Hyperparameters

  • Epochs: 3 with early stopping (patience = 2)
  • Batch Size: 4 (gradient accumulation = 2 → effective batch = 8)
  • Optimizer: AdamW, LR = 2e-4 with warmup ratio of 0.06
  • Label Smoothing: 0.1 for stable training
  • Beam search with 4 beams during evaluation
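The hyperparameters above map onto a Hugging Face `Seq2SeqTrainer` roughly as follows; `output_dir` and the dataset/model objects are placeholders, and the argument is named `evaluation_strategy` on older transformers versions.

```python
from transformers import (
    EarlyStoppingCallback,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

args = Seq2SeqTrainingArguments(
    output_dir="guardian-monitor-summarizer",  # placeholder path
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,   # effective batch = 4 * 2 = 8
    learning_rate=2e-4,
    warmup_ratio=0.06,
    label_smoothing_factor=0.1,
    predict_with_generate=True,
    generation_num_beams=4,          # beam search during evaluation
    eval_strategy="epoch",           # evaluation_strategy on older versions
    save_strategy="epoch",
    load_best_model_at_end=True,     # required for early stopping
    metric_for_best_model="eval_loss",
)

trainer = Seq2SeqTrainer(
    model=model,                     # the LoRA-wrapped model from above
    args=args,
    train_dataset=train_ds,          # placeholder datasets
    eval_dataset=val_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()
```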

Evaluation Results

  • Final validation metrics:
    • ROUGE-1: 68.25%
    • ROUGE-2: 48.63%
    • ROUGE-L: 59.07%
  • Training and validation loss decreased consistently, with no signs of overfitting.
  • Qualitative examples show effective summarization, with minor hallucinations to be addressed in future iterations.
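For intuition about what these numbers measure, here is a toy ROUGE-N F1 on whitespace n-grams; the reported scores presumably come from a standard implementation such as the rouge-score package, which also applies stemming and proper tokenization.

```python
from collections import Counter

def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """Toy ROUGE-N F1 over whitespace n-grams (for intuition only)."""
    def ngrams(text: str) -> Counter:
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())   # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

ref = "patient stable vitals within normal range"
hyp = "patient stable vitals are normal"
score = rouge_n(hyp, ref, n=1)   # 4 shared unigrams -> F1 = 8/11
```

ROUGE-1 and ROUGE-2 count unigram and bigram overlap, while ROUGE-L (not shown here) scores the longest common subsequence instead of fixed n-grams.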

