Add whisperx support (including diarization) #123
Conversation
An update - the offline issue is fixed in m-bain/whisperX#1021 and 163505e. Some more performance measurements:
"worker::build": "npm run build --workspace worker; npm run build --workspace worker", | ||
"worker::package": "npm run package --workspace worker", | ||
"worker::start": "AWS_REGION=eu-west-1 STAGE=DEV npm run start --workspace worker", | ||
"worker::start": "APP=transcription-service-gpu-worker AWS_REGION=eu-west-1 STAGE=DEV npm run start --workspace worker", |
todo: add worker-gpu::start, worker-cpu::start
done
```typescript
	name: `${mediaDownloadApp}-temp-volume`,
};
mediaDownloadTask.taskDefinition.addVolume(downloadVolume);
mediaDownloadTask.taskDefinition.addVolume(tempVolume);
```
todo - pull this out into a different PR as it's unrelated to whisperX
done
Looks good 👏
Thanks for the walkthrough
```diff
@@ -7,18 +7,18 @@
     "prettier:check": "prettier . --check",
     "prettier:fix": "prettier . --write",
     "api::build": "npm run build --workspace api",
-    "api::start": "AWS_REGION=eu-west-1 STAGE=DEV npm run start --workspace api",
+    "api::start": "APP=api AWS_REGION=eu-west-1 STAGE=DEV npm run start --workspace api",
```
non-blocking: it looks like we only look at config.app.app in the worker, so it might be simpler to use an env variable like GPU=true, or a processor type?
I went for APP because we are already tagging the instances with APP - it's just the start script that needs changing, whereas with a new variable I'd need to tag the instances accordingly
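For illustration, a minimal sketch of what that branch could look like in the worker, assuming it reads APP from the environment; the fallback app name is a guess, while `transcription-service-gpu-worker` matches the start script above:

```typescript
// Sketch only: the real worker reads this via its config layer (config.app.app).
// The fallback app name is an assumption for illustration.
const app = process.env.APP ?? 'transcription-service-worker';

// The GPU pipeline is identified purely by the existing APP tag, so no
// extra GPU=true variable (and no extra instance tagging) is needed.
const useWhisperX = app === 'transcription-service-gpu-worker';
```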
```typescript
	machineImage: MachineImage.genericLinux({
		'eu-west-1': workerAmi.valueAsString,
	}),
	instanceType: InstanceType.of(InstanceClass.C7G, InstanceSize.XLARGE4),
```
might be a little easier to follow if this object only holds the props in common between the two launch templates
might not be worth the effort now if we get rid of the non-gpu pipeline soon
You're right, I was just being lazy. Resolved in c34debc
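For reference, a sketch of the suggested shape, assuming it sits inside the CDK stack constructor; `gpuWorkerAmi`, the construct ids, and `requireImdsv2` (standing in for whatever props the two templates actually share) are illustrative assumptions:

```typescript
import { CfnParameter } from 'aws-cdk-lib';
import {
	InstanceClass,
	InstanceSize,
	InstanceType,
	LaunchTemplate,
	MachineImage,
} from 'aws-cdk-lib/aws-ec2';

// Two AMI parameters, one per pipeline (gpuWorkerAmi is assumed here).
const workerAmi = new CfnParameter(this, 'workerAmi');
const gpuWorkerAmi = new CfnParameter(this, 'gpuWorkerAmi');

// Only the props common to both launch templates live in this object.
const commonProps = {
	requireImdsv2: true, // placeholder for the genuinely shared props
};

new LaunchTemplate(this, 'cpu-worker-launch-template', {
	...commonProps,
	machineImage: MachineImage.genericLinux({
		'eu-west-1': workerAmi.valueAsString,
	}),
	instanceType: InstanceType.of(InstanceClass.C7G, InstanceSize.XLARGE4),
});

new LaunchTemplate(this, 'gpu-worker-launch-template', {
	...commonProps,
	machineImage: MachineImage.genericLinux({
		'eu-west-1': gpuWorkerAmi.valueAsString,
	}),
	instanceType: InstanceType.of(InstanceClass.G4DN, InstanceSize.XLARGE2),
});
```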
packages/worker/src/transcribe.ts
```typescript
const metadata = extractWhisperXStderrData(result.stderr);
logger.info('Whisper finished successfully', metadata);
return {
	fileName: `${fileName}`,
```
Suggested change:

```diff
-	fileName: `${fileName}`,
+	fileName,
```
```typescript
		return Promise.resolve('auto');
	}
	const dlParams = whisperParams(true, whisperBaseParams.wavPath);
	const { metadata } = await runWhisper(whisperBaseParams, dlParams);
```
this was the case before your change, but are we running whisper twice (or thrice if we're translating)? Once here and once in runTranscription?
might be worth adding a comment to explain
I've added quite a bit of extra documentation here; it's confusing because this bit of code is only used by Giant. 68967d9
thanks!
```typescript
	whisperX: boolean,
): Promise<LanguageCode> => {
	if (whisperX) {
		return Promise.resolve('auto');
```
does whisperx not take a language arg?
whisperx takes so long to start up that I didn't think there was any point running a pre-pass of language detection, so I decided to just detect the language every time
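To make the flow easier to follow, a rough reconstruction of the function under discussion; `whisperParams`, `runWhisper`, `WhisperBaseParams` and `LanguageCode` are the repo's own helpers and types, and the shape of `metadata` is an assumption:

```typescript
const getLanguageCode = async (
	whisperX: boolean,
	whisperBaseParams: WhisperBaseParams,
): Promise<LanguageCode> => {
	if (whisperX) {
		// whisperx starts up too slowly to justify a separate detection
		// pre-pass, so let it auto-detect on the real run instead.
		return 'auto';
	}
	// whisper.cpp path (only used by Giant): run a cheap
	// detect-language-only pass before the full transcription.
	const dlParams = whisperParams(true, whisperBaseParams.wavPath);
	const { metadata } = await runWhisper(whisperBaseParams, dlParams);
	return metadata.languageCode; // assumed field name
};
```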
```diff
@@ -0,0 +1,136 @@
+import torchaudio
```
might be worth adding a comment linking to the amigo role where this is used
What does this change?
This adds support for using https://github.com/m-bain/whisperX on the transcription workers.
The main benefit of whisperx is that it supports diarization, or speaker recognition. This is a frequently requested feature of the transcription tool.
The downside of whisperx is that it needs to be run on a GPU instance. So far I've been running it on a g5.xlarge instance. These instances are roughly 2x the price of the c7g.4xlarge instances we've been using to run whisper.cpp on. I think we might get some savings moving to a g4dn.2xlarge instance, but performance testing would be required for that. See here for the cost figures. I think that the cost increase may be a little more than 2x, as I imagine that GPU instances are less available on the spot market.
Update: performance is actually better on a g4dn.2xlarge than on the g5.xlarge, and it's 25% cheaper at $0.75/hr (which puts the g5.xlarge at around $1.00/hr) - so only around a 30% increase over our existing instances (the spot availability issue probably remains)
As well as cost, the other downside of whisperx is that getting it running (and actually using the gpu) is a right ole faff. To get it working I needed to:
I'm not yet certain that whisperx's performance improvements are sufficient for us to want to use it for all transcripts - it may be that we only want to use it where the user has requested diarization. With that in mind, I've set whisperx up as a totally separate pipeline, with its own SQS queue and autoscaling group. Currently the use of whisperx (and the gpu instances) is controlled by a parameter store parameter, and is enabled for DEV/CODE and disabled for PROD.
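As a sketch of how that toggle might look from the job-submitting side (the parameter name and queue URLs are assumptions, not the service's real values):

```typescript
import { GetParameterCommand, SSMClient } from '@aws-sdk/client-ssm';

// Placeholder queue URLs - the real ones come from the CDK stack.
const CPU_QUEUE_URL =
	'https://sqs.eu-west-1.amazonaws.com/123456789012/task-queue';
const GPU_QUEUE_URL =
	'https://sqs.eu-west-1.amazonaws.com/123456789012/gpu-task-queue';

const ssm = new SSMClient({ region: 'eu-west-1' });

// Enabled for DEV/CODE and disabled for PROD - flipping the parameter
// moves new jobs between the two pipelines.
const getTaskQueueUrl = async (stage: string): Promise<string> => {
	const { Parameter } = await ssm.send(
		new GetParameterCommand({
			Name: `/${stage}/transcription-service/use-whisperx`,
		}),
	);
	return Parameter?.Value === 'true' ? GPU_QUEUE_URL : CPU_QUEUE_URL;
};
```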
I decided to install whisperx onto the base AMI rather than using a containerised version as we are doing with whisper.cpp. Whilst you can fairly easily allow docker containers to make use of the GPU, I didn't see much benefit in adding an extra layer of virtualisation. This choice has led to some awkwardness, as 'useWhisperX' also means 'don't use the whisper.cpp container for ffmpeg or transcription' - it will be good if whisperX proves superior across the board so that we can remove all this conditional logic.
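The conditional logic referred to above boils down to something like this sketch; the whisperx CLI and its --diarize flag are real, but the container image name and the whisper.cpp invocation are placeholders:

```typescript
// Build the transcription command for the current pipeline.
const transcriptionCommand = (
	useWhisperX: boolean,
	wavPath: string,
): string[] => {
	if (useWhisperX) {
		// whisperx is installed directly onto the GPU AMI - no container.
		return ['whisperx', wavPath, '--diarize'];
	}
	// CPU pipeline: whisper.cpp (and ffmpeg) run inside the existing container.
	return ['docker', 'run', 'whisper-cpp-image', './main', '-f', wavPath];
};
```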
Reviewing this PR
The most interesting changes are to the worker app - everything else is just wiring really. And even in the worker app, the change mostly amounts to running whisperx instead of whisper.cpp.
How to test
I've tested this on CODE using an AMI baked on amigo CODE - I'll need to get guardian/amigo#1607 merged before we can get a PROD AMI.
How can we measure success?
Happy users thanks to the new diarization feature!
Current performance tests:
I don't think diarization adds much to the time.
I think the performance boost would be much higher if we could pre-warm the gpu instance somehow, but I need to do more research there.