An OpenMRS module that lets clinicians ask natural language questions about a patient's chart and get answers with source citations.
For project background, community discussion, and roadmap, see the wiki project page.
- Java 11+
- OpenMRS Platform 2.8.0+
- Webservices REST module 2.44.0+
- 10GB+ RAM recommended (for LLM inference with the default 8B model)
```bash
mvn package
```

The built `.omod` file is in `omod/target/`.
Download Llama 3.3 8B (Q4_K_M quantization) in GGUF format (~5GB) from Hugging Face.
Place the `.gguf` file inside the OpenMRS application data directory (e.g., `<openmrs-application-data-directory>/chartsearchai/`). Model paths are resolved relative to this directory for security.
Available models:
| Model | RAM Needed | Chat Template | Download |
|---|---|---|---|
| Llama 3.2 3B | ~6GB total | llama3 | GGUF |
| Llama 3.3 8B (default) | ~10GB total | llama3 | GGUF |
| Mistral Nemo 12B | ~12GB total | mistral | GGUF |
Larger models produce more accurate answers with better instruction following. Smaller models use less RAM but may produce lower quality responses. To switch models, update chartsearchai.llm.modelFilePath and chartsearchai.llm.chatTemplate — no rebuild needed.
If embedding pre-filtering is enabled (default), download the all-MiniLM-L6-v2 ONNX model (~90MB) from Hugging Face. You need both model.onnx and vocab.txt from the repository.
Place them alongside the LLM model (e.g., `<openmrs-application-data-directory>/chartsearchai/`).
Copy the `.omod` file into the `modules` folder of the OpenMRS application data directory (e.g., `<openmrs-application-data-directory>/modules/`). The module will be loaded on the next OpenMRS startup.
Set these global properties in Admin > Settings:
| Property | Description |
|---|---|
| chartsearchai.llm.modelFilePath | Relative path (within the OpenMRS application data directory) to the .gguf model file, e.g. chartsearchai/Llama-3.3-8B-Instruct-Q4_K_M.gguf |
| Property | Default | Description |
|---|---|---|
| chartsearchai.embedding.preFilter | true | When true, uses embedding similarity to narrow patient records to the most relevant ones before sending to the LLM. Set to false to send the full chart instead |
| chartsearchai.embedding.topK | 15 | Maximum number of records to retrieve via embedding similarity when pre-filtering is enabled |
| chartsearchai.embedding.similarityRatio | 0.80 | Minimum similarity score as a fraction of the top result's score. Records scoring below this ratio are excluded even if within the topK limit. Must be between 0 and 1 |
| chartsearchai.embedding.modelFilePath | — | Required when pre-filtering is enabled. Relative path to the ONNX model file (all-MiniLM-L6-v2), e.g. chartsearchai/all-MiniLM-L6-v2.onnx |
| chartsearchai.embedding.vocabFilePath | — | Required when pre-filtering is enabled. Relative path to the WordPiece vocab.txt file, e.g. chartsearchai/vocab.txt |
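With the defaults above, a record survives pre-filtering only if it is both within the top 15 matches and scores at least 80% of the best match's score. A minimal sketch of how topK and similarityRatio could combine (class and method names here are illustrative, not the module's actual code):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PreFilterSketch {
    // Illustrative only: keep at most topK records, and drop any record
    // whose score falls below (top score * similarityRatio).
    static List<Integer> filter(double[] scores, int topK, double similarityRatio) {
        // Record indices ranked by descending similarity score.
        List<Integer> ranked = IntStream.range(0, scores.length).boxed()
                .sorted((a, b) -> Double.compare(scores[b], scores[a]))
                .collect(Collectors.toList());
        if (ranked.isEmpty()) {
            return ranked;
        }
        double cutoff = scores[ranked.get(0)] * similarityRatio;
        return ranked.stream()
                .limit(topK)
                .filter(i -> scores[i] >= cutoff)
                .collect(Collectors.toList());
    }
}
```

For scores {0.9, 0.5, 0.85, 0.2} and the default 0.80 ratio, the cutoff is 0.72, so only the records scoring 0.9 and 0.85 pass.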
| Property | Default | Description |
|---|---|---|
| chartsearchai.llm.chatTemplate | llama3 | Chat template for formatting prompts. Presets: llama3, mistral, phi3, chatml, gemma. Or a custom template string with {system} and {user} placeholders |
| chartsearchai.llm.systemPrompt | (built-in clinical prompt) | System prompt that guides how the LLM responds — e.g. answering only the question asked, using only the provided patient records, citing records by number, declining to answer when records lack relevant information, keeping answers concise, and returning structured JSON |
| chartsearchai.llm.timeoutSeconds | 120 | Maximum seconds to wait for LLM inference before timing out |
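A custom chat template is a plain string substitution. The sketch below (illustrative names, not the module's implementation; the example template markers are made up) shows how the {system} and {user} placeholders would be expanded:

```java
class ChatTemplateSketch {
    // Expands a custom template string. Preset names (llama3, mistral,
    // phi3, chatml, gemma) map to templates built into the module; this
    // helper only illustrates the custom-template case.
    static String render(String template, String systemPrompt, String userQuestion) {
        return template.replace("{system}", systemPrompt)
                       .replace("{user}", userQuestion);
    }
}
```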
| Property | Default | Description |
|---|---|---|
| chartsearchai.rateLimitPerMinute | 10 | Maximum queries per user per minute. Set to 0 to disable |
| chartsearchai.cacheTtlMinutes | 0 | Minutes to cache identical (patient, question) answers. Set to 0 to disable (default) |
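A per-user-per-minute limit with a zero-disables switch can be sketched as a fixed-window counter. This is an assumption about the semantics, not the module's actual implementation, which may use a different windowing strategy:

```java
import java.util.HashMap;
import java.util.Map;

class RateLimitSketch {
    // Hypothetical fixed-window limiter mirroring the documented
    // semantics of chartsearchai.rateLimitPerMinute (0 disables).
    private final int limitPerMinute;
    private final Map<String, int[]> windows = new HashMap<>(); // user -> {minute, count}

    RateLimitSketch(int limitPerMinute) {
        this.limitPerMinute = limitPerMinute;
    }

    synchronized boolean allow(String user, long nowMillis) {
        if (limitPerMinute == 0) return true; // 0 disables the limit
        int minute = (int) (nowMillis / 60_000L);
        int[] w = windows.computeIfAbsent(user, k -> new int[]{minute, 0});
        if (w[0] != minute) { w[0] = minute; w[1] = 0; } // new minute window
        return ++w[1] <= limitPerMinute;
    }
}
```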
| Property | Default | Description |
|---|---|---|
| chartsearchai.auditLogRetentionDays | 90 | Audit log entries older than this are purged daily. Set to 0 to retain all |
| Privilege | Purpose |
|---|---|
| AI Query Patient Data | Execute chart search queries |
| View AI Audit Logs | Access the audit log endpoint |
When chartsearchai.embedding.preFilter is true (default), patient records are automatically indexed on first chart access. Subsequent data changes trigger automatic re-indexing via AOP hooks on encounter, obs, condition, diagnosis, allergy, order, program enrollment, medication dispense, and patient merge operations. A bulk backfill task ("Chart Search AI - Embedding Backfill") is also available in Admin > Scheduler > Manage Scheduler if you prefer to pre-index all patients at once.
Switching embedding models: The default model is all-MiniLM-L6-v2 (general-purpose, 384 dimensions). Any BERT-based ONNX embedding model can be used as a drop-in replacement by updating chartsearchai.embedding.modelFilePath and chartsearchai.embedding.vocabFilePath. Embedding dimensions are auto-detected from the model output, so models with any dimension size work without code changes. After switching models, existing embeddings are incompatible — run the "Chart Search AI - Embedding Backfill" task to re-index all patients with the new model.
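Dimension auto-detection works because similarity scoring is itself dimension-agnostic. A generic cosine similarity (illustrative, not the module's code) makes this concrete: nothing in it depends on a fixed vector length.

```java
class CosineSketch {
    // Cosine similarity over vectors of any length; the same code works
    // for a 384-dimension model or any other embedding size.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```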
```http
POST /ws/rest/v1/chartsearchai/search
Content-Type: application/json

{
  "patient": "patient-uuid-here",
  "question": "What medications is this patient on?"
}
```
Response:

```json
{
  "answer": "The patient is currently on Metformin [1] and Lisinopril [3]...",
  "disclaimer": "This response is AI-generated and may not be accurate...",
  "references": [
    { "index": 3, "resourceType": "order", "resourceId": 789, "date": "2025-03-15" },
    { "index": 1, "resourceType": "order", "resourceId": 456, "date": "2025-01-10" }
  ]
}
```

For real-time token-by-token streaming:
```http
POST /ws/rest/v1/chartsearchai/search/stream
Content-Type: application/json
Accept: text/event-stream

{
  "patient": "patient-uuid-here",
  "question": "What medications is this patient on?"
}
```
SSE events:
| Event | Description |
|---|---|
| token | A chunk of the answer text as it is generated |
| done | Final JSON with the complete answer, references (sorted most recent first, with index, resourceType, resourceId, date), and disclaimer |
| error | Error message if something goes wrong |
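A client reads the stream line by line, grouping data payloads under the most recent event name. The simplified parser below is illustrative, not part of the module; full SSE processing also dispatches on blank lines and resets the event type, which this sketch omits:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SseParseSketch {
    // Groups "data:" payloads under the most recent "event:" name.
    static Map<String, List<String>> collect(String stream) {
        Map<String, List<String>> byEvent = new LinkedHashMap<>();
        String event = "message"; // SSE default event type
        for (String line : stream.split("\n")) {
            if (line.startsWith("event:")) {
                event = line.substring("event:".length()).trim();
            } else if (line.startsWith("data:")) {
                byEvent.computeIfAbsent(event, k -> new ArrayList<>())
                       .add(line.substring("data:".length()).trim());
            }
        }
        return byEvent;
    }
}
```

For this endpoint, a client would append each token payload to the displayed answer, then parse the done payload as the final JSON.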
Requires the "View AI Audit Logs" privilege.
```http
GET /ws/rest/v1/chartsearchai/auditlog?patient=...&user=...&fromDate=...&toDate=...&startIndex=0&limit=50
```
All query parameters are optional. fromDate and toDate are epoch milliseconds. Returns paginated results ordered by most recent first, with a totalCount for pagination.
By default, any user with the "AI Query Patient Data" privilege can query any patient. To add patient-level restrictions (e.g., location-based or care-team-based), provide a custom Spring bean that implements the PatientAccessCheck interface:
```xml
<bean id="chartSearchAi.patientAccessCheck"
      class="com.example.LocationBasedPatientAccessCheck"/>
```

This overrides the default permissive implementation.
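As an example, an allow-list based check might look like the sketch below. The real PatientAccessCheck interface lives in the module, and its exact method signature is an assumption here; the stub interface is included only to make the sketch self-contained:

```java
import java.util.Set;

// Hypothetical stand-in for the module's interface; check the module
// source for the real method signature before implementing.
interface PatientAccessCheck {
    boolean canAccess(String userUuid, String patientUuid);
}

// Deny access to any patient not on an explicit allow list.
class AllowListPatientAccessCheck implements PatientAccessCheck {
    private final Set<String> allowedPatientUuids;

    AllowListPatientAccessCheck(Set<String> allowedPatientUuids) {
        this.allowedPatientUuids = allowedPatientUuids;
    }

    @Override
    public boolean canAccess(String userUuid, String patientUuid) {
        return allowedPatientUuids.contains(patientUuid);
    }
}
```

A location-based or care-team-based check would follow the same shape, consulting OpenMRS services instead of a static set.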
See docs/adr.md for architectural decisions and design rationale.
This project is licensed under the MPL 2.0.
Llama 3.3 is licensed under the Llama 3.3 Community License, Copyright (C) Meta Platforms, Inc. All Rights Reserved.