This repository is a proof of concept showing how Philter can be used to redact medical transcriptions generated by a transcription service (e.g., Amazon Transcribe).
Because available samples of medical transcriptions are limited, and because the transcription step itself is not the primary focus here, we use sample text that has already been transcribed. However, this repository also provides the plumbing (and architecture ideas) needed to produce text transcripts from audio files.
The transcriptions used in this repository were acquired from https://www.mtsamples.com/site/pages/sample.asp?Type=21-Endocrinology&Sample=1134-Diabetes+Mellitus+%2D+SOAP+Note+%2D+2 and then modified to include PII.
For an environment where new audio narratives are continuously dropped into an S3 bucket and near real-time processing is required:
- Put audio narratives in an S3 bucket.
- Use a Lambda function to process the audio files and generate transcriptions (which are written back to S3).
- Use a second Lambda function to send the transcriptions to Philter, then store or forward the redacted text as required.
      [ User/System ]
             |
             v (Upload MP3)
    +------------------+
    | S3: Input Bkt    |----(Event: ObjectCreated)----> [ Lambda A: Transcriber ]
    +------------------+                                            |
                                                                    | (Start Job)
                                                                    v
                                                        +-----------------------+
                                                        |   Amazon Transcribe   |
                                                        +-----------+-----------+
                                                                    |
       (JSON Output) <----------------------------------------------+
             |
             v
    +------------------+
    | S3: Raw JSON Bkt |----(Event: ObjectCreated)----> [ Lambda B: Redactor ]
    +------------------+                                            |
                                                                    | (Filter PII)
                                                                    v
                                                        +-----------------------+
                                                        | S3: Final Redacted Bkt|
                                                        +-----------------------+
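The "Lambda A: Transcriber" step above could be sketched roughly as follows. This is a minimal sketch and not code from this repository: the bucket name `example-raw-json-bucket`, the `en-US` language code, and the job-naming scheme are all assumptions made for illustration. It relies on the boto3 `start_transcription_job` call, and boto3 is imported inside the handler so the parameter-building function can be exercised without AWS access.

```python
import re
import urllib.parse
import uuid

# Hypothetical bucket that receives the Transcribe JSON output.
OUTPUT_BUCKET = "example-raw-json-bucket"


def build_job_params(record, output_bucket=OUTPUT_BUCKET, suffix=None):
    """Build StartTranscriptionJob parameters from one S3 event record."""
    bucket = record["s3"]["bucket"]["name"]
    # Object keys in S3 event notifications are URL-encoded.
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    suffix = suffix or uuid.uuid4().hex[:8]
    # Job names only allow letters, digits, '.', '_' and '-'.
    safe_key = re.sub(r"[^0-9a-zA-Z._-]", "-", key)[:180]
    return {
        "TranscriptionJobName": f"{safe_key}-{suffix}",
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "MediaFormat": "mp3",
        "LanguageCode": "en-US",  # assumption: English-language narratives
        "OutputBucketName": output_bucket,
    }


def handler(event, context):
    """Lambda entry point: start one transcription job per uploaded object."""
    import boto3  # imported here so build_job_params is usable without boto3

    transcribe = boto3.client("transcribe")
    for record in event.get("Records", []):
        transcribe.start_transcription_job(**build_job_params(record))
```

The random suffix is there because transcription job names must be unique within an account, so re-uploading the same file would otherwise fail.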
For an environment where transcriptions are received in bulk and processed in batches:
- Create an AWS Step Functions state machine to process the transcriptions in batches.
- Use Amazon EventBridge to trigger the state machine on a schedule (weekly, monthly, etc.).
                        [ Scheduled Trigger ]
                                  |
                                  v
                   +-----------------------------+
                   |     Amazon EventBridge      | <-- Runs once a month
                   |     (Cron: 0 0 1 * ? *)     |
                   +--------------+--------------+
                                  |
                                  v
    +-----------------------------------------------------------+
    |                    AWS STEP FUNCTIONS                     |
    |             (Distributed Map / Orchestration)             |
    |                                                           |
    |  1. List Objects in S3 Input Bucket                       |
    |  2. For Each File Found:                                  |
    |     +-----------------------------------------------+     |
    |     |  [ Choice State ]                             |     |
    |     | Is file extension .mp3? ----(No)----> [Skip]  |     |
    |     |         |                                     |     |
    |     |       (Yes)                                   |     |
    |     |         v                                     |     |
    |     |  [ Transcribe Task ]                          |     |
    |     |  StartTranscriptionJob()                      |     |
    |     +---------+-------------------------------------+     |
    +---------------|-------------------------------------------+
                    |
                    v
    +------------------------------+      +-------------------------+
    |      AMAZON TRANSCRIBE       |      |    AMAZON S3 BUCKETS    |
    |                              |      |                         |
    | - Processes MP3s in Batch    +----->|    [ Output Bucket ]    |
    | - Manages internal queue     |      |   (transcripts.json)    |
    +------------------------------+      +-------------------------+
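The list-and-filter logic inside the state machine (steps 1 and 2 in the diagram) can be approximated in plain Python. This is only an illustration of the Choice state's decision, not a Step Functions definition. The function names are hypothetical, and matching the extension case-insensitively is an assumption. The schedule string uses the EventBridge `cron(...)` form of the expression shown in the diagram.

```python
# EventBridge schedule expression for "00:00 UTC on the 1st of every month".
MONTHLY_SCHEDULE = "cron(0 0 1 * ? *)"


def select_audio_keys(keys, extension=".mp3"):
    """Mimic the Choice state: keep only keys ending in the audio extension."""
    return [key for key in keys if key.lower().endswith(extension)]


def list_bucket_keys(bucket):
    """Yield every object key in the bucket (step 1 in the diagram)."""
    import boto3  # imported here so select_audio_keys is usable without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

For example, `select_audio_keys(list_bucket_keys("example-input-bucket"))` would return the keys that should each get a transcription job; the bucket name there is a placeholder.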
Clone this repository:
git clone https://github.com/philterd/philter-transcriptions.git
Start the Philter docker containers:
docker compose up
Now you can redact the text transcriptions by sending them to Philter:
./redact.sh
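For readers who prefer Python to a shell script, the same idea can be sketched as below. This is not a description of what `redact.sh` does internally. The endpoint `https://localhost:8080/api/explain` is an assumption and should be adjusted to match the docker compose configuration. The `results.transcripts[].transcript` path follows the Amazon Transcribe output format.

```python
import json
import ssl
import urllib.request

# Assumed Philter endpoint; change host, port, and scheme to match your setup.
PHILTER_URL = "https://localhost:8080/api/explain"


def transcript_text(transcribe_output):
    """Join the transcript strings from an Amazon Transcribe result document."""
    transcripts = transcribe_output["results"]["transcripts"]
    return " ".join(item["transcript"] for item in transcripts)


def redact(text, url=PHILTER_URL, verify_tls=True):
    """POST plain text to Philter and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for local testing against a self-signed certificate.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `redact(transcript_text(json.load(open("transcript.json"))), verify_tls=False)` would then return a dictionary; `transcript.json` is a placeholder file name.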
The response will look like the following (trimmed for readability):
{
  "filteredText": "I am asked to see the {{{REDACTED-age}}} patient today with ongoing issues around her diabetic control. Her patient ID is {{{REDACTED-id}}}. We have been fairly aggressively, downwardly adjusting her insulins, both the Lantus insulin, which we had been giving at night as well as her sliding scale Humalog insulin prior to meals. Despite frequent decreases in her insulin regimen, she continues to have somewhat low blood glucoses, most notably in the morning when the glucoses have been in the 70s despite decreasing her Lantus insulin from around 84 units down to 60 units, which is a considerable change. What I cannot explain is why her glucoses have not really climbed at all despite the decrease in insulin. The staff reports to me that her appetite is good and that she is eating as well as ever. I talked to {{{REDACTED-person}}} today. She feels a little fatigued. Otherwise, she is doing well.",
  "piece": 0,
  "context": "none",
  "explanation": {
    "appliedSpans": [
      {
        "characterStart": 22,
        "characterEnd": 33,
        "filterType": "AGE",
        "context": "none",
        "confidence": 0.9,
        "text": "69 year old",
        "replacement": "{{{REDACTED-age}}}"
      },
      {
        "characterStart": 115,
        "characterEnd": 124,
        "filterType": "IDENTIFIER",
        "context": "none",
        "confidence": 0.9,
        "text": "231113490",
        "replacement": "{{{REDACTED-id}}}"
      },
      {
        "characterStart": 801,
        "characterEnd": 805,
        "filterType": "PERSON",
        "context": "none",
        "confidence": 0.9312306046485901,
        "text": "Anna",
        "replacement": "{{{REDACTED-person}}}"
      }
    ],
    "identifiedSpans": [
      {
        "characterStart": 22,
        "characterEnd": 33,
        "filterType": "AGE",
        "context": "none",
        "confidence": 0.9,
        "text": "69 year old",
        "replacement": "{{{REDACTED-age}}}"
      },
      {
        "characterStart": 115,
        "characterEnd": 124,
        "filterType": "IDENTIFIER",
        "context": "none",
        "classification": "custom-identifier",
        "confidence": 0.9,
        "text": "231113490",
        "replacement": "{{{REDACTED-id}}}"
      },
      {
        "characterStart": 801,
        "characterEnd": 805,
        "filterType": "PERSON",
        "context": "none",
        "confidence": 0.9312306046485901,
        "text": "Anna",
        "replacement": "{{{REDACTED-person}}}"
      }
    ]
  },
  "tokens": 153
}
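A response of this shape can be summarized with a few lines of Python, for example to count redactions per filter type or to sanity-check that no redacted value still appears in the output. This is a minimal sketch. The function names are hypothetical, and the leak check is a coarse substring test that can produce false positives for short values.

```python
from collections import Counter


def summarize(response):
    """Count applied redactions per filter type (e.g. AGE, PERSON)."""
    spans = response["explanation"]["appliedSpans"]
    return Counter(span["filterType"] for span in spans)


def leaked_values(response):
    """Return redacted values that still occur verbatim in filteredText."""
    filtered = response["filteredText"]
    spans = response["explanation"]["appliedSpans"]
    return [span["text"] for span in spans if span["text"] in filtered]
```

Applied to the response above, `summarize` would report one span each for `AGE`, `IDENTIFIER`, and `PERSON`.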