Philter with Audio Transcriptions

This repository is a proof of concept showing how Philter can be used to redact medical transcriptions generated by a transcription service (e.g. Amazon Transcribe).

Because available samples of medical transcriptions are limited, and because the transcription step itself is not the primary focus here, we use text samples that have already been transcribed. The repository still provides the plumbing (and architecture ideas) needed to produce text transcripts from audio files.

The transcriptions used in this repository were acquired from https://www.mtsamples.com/site/pages/sample.asp?Type=21-Endocrinology&Sample=1134-Diabetes+Mellitus+%2D+SOAP+Note+%2D+2 and then modified to include PII.

Potential AWS Architectures for Redacting Audio Transcriptions

Continuous Transcription Processing

For an environment where new audio narratives are continuously dropped into an S3 bucket and near real-time processing is required:

1. Put audio narratives in an S3 bucket.
2. Use a Lambda function to process the audio files and generate transcriptions (which are written back to S3).
3. Use a second Lambda function to process the transcriptions and send them to Philter, then store or forward the redacted text as required.
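
The transcription step can be sketched as a small Lambda handler. This is a minimal sketch, not code from this repository: the function names, the output bucket name `example-raw-json-bucket`, and the `en-US` language code are assumptions chosen for illustration. It only shows how an S3 event record might be turned into parameters for the Transcribe `StartTranscriptionJob` call.

```python
import re
from urllib.parse import unquote_plus


def build_transcription_request(record, output_bucket):
    """Build StartTranscriptionJob parameters from one S3 event record."""
    bucket = record["s3"]["bucket"]["name"]
    # S3 event notifications URL-encode object keys (spaces arrive as '+').
    key = unquote_plus(record["s3"]["object"]["key"])
    # Job names may only contain letters, digits, '.', '_' and '-' (max 200 chars).
    job_name = re.sub(r"[^0-9a-zA-Z._-]", "-", key)[:200]
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "MediaFormat": "mp3",
        "LanguageCode": "en-US",  # assumption: English-language narratives
        "OutputBucketName": output_bucket,
    }


def handler(event, context):
    # boto3 is imported lazily so the helper above can be exercised offline.
    import boto3

    transcribe = boto3.client("transcribe")
    for record in event["Records"]:
        # "example-raw-json-bucket" is a hypothetical bucket name.
        params = build_transcription_request(record, "example-raw-json-bucket")
        transcribe.start_transcription_job(**params)
```

Transcription job names must be unique within an account, so a real deployment would likely append a timestamp or request ID to the name derived from the object key.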

[ User/System ]
       |
       v (Upload MP3)
+------------------+
|  S3: Input Bkt   |----(Event: ObjectCreated)----> [ Lambda A: Transcriber ]
+------------------+                                       |
                                                           | (Start Job)
                                                           v
                                                +-----------------------+
                                                |   Amazon Transcribe   |
                                                +-----------+-----------+
                                                            |
       (JSON Output) <--------------------------------------+
             |
             v
+------------------+
| S3: Raw JSON Bkt |----(Event: ObjectCreated)----> [ Lambda B: Redactor ]
+------------------+                                       |
                                                           | (Filter PII)
                                                           v
                                                +-----------------------+
                                                | S3: Final Redacted Bkt|
                                                +-----------------------+
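
Before the redaction Lambda can call Philter, it has to pull the plain text out of the JSON document that Amazon Transcribe writes to S3. The following is a minimal sketch of that step, assuming the standard Transcribe output layout in which the text sits under `results.transcripts`; the function name `extract_transcript` is hypothetical.

```python
import json


def extract_transcript(transcribe_json):
    """Return the plain-text transcript from an Amazon Transcribe output document."""
    document = json.loads(transcribe_json)
    transcripts = document["results"]["transcripts"]
    # Join the entries so the function still works if more than one is present.
    return " ".join(item["transcript"] for item in transcripts)
```

The returned string is what would be sent to Philter for redaction.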

Batch Transcription Processing

For an environment where transcriptions are received in bulk and processed in batches:

Create an AWS Step Functions state machine to process the transcriptions in batches.
Use Amazon EventBridge to trigger the state machine on a schedule (weekly, monthly, etc.).

[ Scheduled Trigger ]
           |
           v
+-----------------------------+
|   Amazon EventBridge        |  <-- Runs once a month
|   (Cron: 0 0 1 * ? *)       |
+--------------+--------------+
               |
               v
+-----------------------------------------------------------+
|                  AWS STEP FUNCTIONS                       |
|  (Distributed Map / Orchestration)                        |
|                                                           |
|  1. List Objects in S3 Input Bucket                       |
|  2. For Each File Found:                                  |
|     +-----------------------------------------------+     |
|     |  [ Choice State ]                             |     |
|     |  Is file extension .mp3? ----(No)----> [Skip] |     |
|     |         |                                     |     |
|     |       (Yes)                                   |     |
|     |         v                                     |     |
|     |  [ Transcribe Task ]                          |     |
|     |  StartTranscriptionJob()                      |     |
|     +---------+-------------------------------------+     |
+---------------|-------------------------------------------+
                |
                v
+------------------------------+      +-------------------------+
|      AMAZON TRANSCRIBE       |      |    AMAZON S3 BUCKETS    |
|                              |      |                         |
|  - Processes MP3s in Batch   +----->| [ Output Bucket ]       |
|  - Manages internal queue    |      | (transcripts.json)      |
+------------------------------+      +-------------------------+
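
The Choice state in the diagram above decides, per object, whether a transcription job should be started. The same decision is sketched here in Python for illustration only; in the architecture above it would live in the state machine definition. The function name is hypothetical, and matching the extension case-insensitively is an assumption.

```python
def select_audio_keys(keys, extension=".mp3"):
    """Split S3 object keys into those to transcribe and those to skip."""
    selected, skipped = [], []
    for key in keys:
        # Assumption: ".MP3" and ".mp3" are treated the same.
        if key.lower().endswith(extension):
            selected.append(key)
        else:
            skipped.append(key)
    return selected, skipped
```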

Running the Demo

Clone this repository:

git clone https://github.com/philterd/philter-transcriptions.git

Start the Philter Docker containers:

docker compose up

Now you can redact the text transcriptions by sending them to Philter:

./redact.sh
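
The contents of redact.sh are not reproduced here. As a rough Python equivalent, the sketch below posts plain text to Philter and returns the parsed JSON response. It assumes Philter is listening on `http://localhost:8080` and exposes an `/api/explain` endpoint that accepts `text/plain`; check docker-compose.yaml and redact.sh for the actual host, port, and path. The function names are hypothetical.

```python
import json
import urllib.request


def build_explain_request(text, base_url="http://localhost:8080"):
    """Build a POST request carrying the text to redact (endpoint is an assumption)."""
    return urllib.request.Request(
        url=f"{base_url}/api/explain",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def redact(text, base_url="http://localhost:8080"):
    """Send text to Philter and return the decoded JSON response."""
    with urllib.request.urlopen(build_explain_request(text, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the containers running, `redact("I talked to Anna today.")` would be expected to return a dictionary with the same shape as the response shown in this section.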

The response will look like the following (trimmed for readability):

{
  "filteredText": "I am asked to see the {{{REDACTED-age}}} patient today with ongoing issues around her diabetic control. Her patient ID is {{{REDACTED-id}}}. We have been fairly aggressively, downwardly adjusting her insulins, both the Lantus insulin, which we had been giving at night as well as her sliding scale Humalog insulin prior to meals. Despite frequent decreases in her insulin regimen, she continues to have somewhat low blood glucoses, most notably in the morning when the glucoses have been in the 70s despite decreasing her Lantus insulin from around 84 units down to 60 units, which is a considerable change. What I cannot explain is why her glucoses have not really climbed at all despite the decrease in insulin. The staff reports to me that her appetite is good and that she is eating as well as ever. I talked to {{{REDACTED-person}}} today. She feels a little fatigued. Otherwise, she is doing well.",
  "piece": 0,
  "context": "none",
  "explanation": {
    "appliedSpans": [
      {
        "characterStart": 22,
        "characterEnd": 33,
        "filterType": "AGE",
        "context": "none",
        "confidence": 0.9,
        "text": "69 year old",
        "replacement": "{{{REDACTED-age}}}"
      },
      {
        "characterStart": 115,
        "characterEnd": 124,
        "filterType": "IDENTIFIER",
        "context": "none",
        "confidence": 0.9,
        "text": "231113490",
        "replacement": "{{{REDACTED-id}}}"
      },
      {
        "characterStart": 801,
        "characterEnd": 805,
        "filterType": "PERSON",
        "context": "none",
        "confidence": 0.9312306046485901,
        "text": "Anna",
        "replacement": "{{{REDACTED-person}}}"
      }
    ],
    "identifiedSpans": [
      {
        "characterStart": 22,
        "characterEnd": 33,
        "filterType": "AGE",
        "context": "none",
        "confidence": 0.9,
        "text": "69 year old",
        "replacement": "{{{REDACTED-age}}}"
      },
      {
        "characterStart": 115,
        "characterEnd": 124,
        "filterType": "IDENTIFIER",
        "context": "none",
        "classification": "custom-identifier",
        "confidence": 0.9,
        "text": "231113490",
        "replacement": "{{{REDACTED-id}}}"
      },
      {
        "characterStart": 801,
        "characterEnd": 805,
        "filterType": "PERSON",
        "context": "none",
        "confidence": 0.9312306046485901,
        "text": "Anna",
        "replacement": "{{{REDACTED-person}}}"
      }
    ]
  },
  "tokens": 153
}
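
A response like the one above can be inspected programmatically, for example to audit which kinds of PII were redacted. This is a minimal sketch with hypothetical function names. It relies only on fields visible in the sample response, and on the observation that in that sample `characterEnd - characterStart` equals the length of the matched text.

```python
from collections import Counter


def count_redactions(response):
    """Count applied redactions per filter type (e.g. AGE, IDENTIFIER, PERSON)."""
    spans = response["explanation"]["appliedSpans"]
    return Counter(span["filterType"] for span in spans)


def offsets_consistent(response):
    """Check that each span's offsets agree with the length of its matched text."""
    spans = response["explanation"]["appliedSpans"]
    return all(
        span["characterEnd"] - span["characterStart"] == len(span["text"])
        for span in spans
    )
```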
