
Whisper in SageMaker

Repo for experiments on speech transcription.

Refer here for test file descriptions.

All tests were done on a g4dn.xlarge EC2 instance (16GB RAM, NVIDIA T4 GPU with 16GB VRAM).

| Model | Speed | Peak VRAM | Config | Test file |
| --- | --- | --- | --- | --- |
| openai/whisper-large-v3 | 31.5x | 9792MB | fp16, batch=24, SDPA | 1hour.flac |
| distil-whisper/distil-large-v3 | 87.1x | 2667MB | fp16, batch=24, SDPA | 1hour.flac |
| distil-whisper/distil-large-v3 | 62.0x | | fp16, batch=24 | 1hour.flac |
| openai/whisper-large-v3-turbo | 81.8x | 2772MB | fp16, batch=24, SDPA | 1hour.flac |
| openai/whisper-large-v3-turbo | 31.4x | 2772MB | fp16, batch=None, SDPA | 1hour.flac |
| nvidia/canary-1b | OOM | | vanilla | 1hour.flac |
| Systran/faster-whisper-large-v3 | 10.5x | | batch=None, beam_size=1 | 1hour.flac |
| Systran/faster-whisper-large-v3 | 51.8x | | batch=24, beam_size=1 | 1hour.flac |
| Systran/faster-whisper-large-v2 | 50.9x | 10GB | batch=16, beam_size=1 | 1hour.flac |
| Systran/faster-whisper-large-v2 | 34x | 11GB | batch=16, beam_size=5 | 1hour.flac |
| Systran/faster-whisper-large-v2 | OOM | | batch=16, beam_size=5 | long.flac |
| Systran/faster-whisper-large-v2 | 53.5x | 14GB | batch=12, beam_size=5 | long.flac |
| Systran/faster-whisper-large-v2 | 50.9x | 13GB | batch=8, beam_size=5 | long.flac |
| Systran/faster-whisper-large-v2 | 49x | 12GB | batch=6, beam_size=5 | long.flac |
| Systran/faster-whisper-large-v2 | 49x | ~11GB | batch=6, beam_size=5, int8_float16 | long.flac |
| Systran/faster-whisper-large-v2 | 11.9x | 6.8GB | batch=None, beam_size=5 | 1hour.flac |
| Systran/faster-whisper-large-v2 | 18.3x | 6.8GB | batch=None, beam_size=1 | 1hour.flac |
| Systran/faster-distil-whisper-large-v3 | 89.4x | | batch=24, beam_size=1 | 1hour.flac |
| Systran/faster-distil-whisper-large-v3 | 83.7x | | batch=24, beam_size=5 | 1hour.flac |
| deepdml/faster-whisper-large-v3-turbo-ct2 | 21.8x | ~4700MB | batch=None, beam_size=5 | 1hour.flac |
| deepdml/faster-whisper-large-v3-turbo-ct2 | 76.2x | ~10237MB | batch=24, beam_size=5 | 1hour.flac |

Setup

You will need a GPU (e.g. an EC2 instance with a T4 GPU such as ml.g4dn.xlarge, $0.70/hour) with the NVIDIA drivers installed. Installing the CUDA toolkit is unnecessary, as PyTorch bundles its own CUDA runtime.
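To confirm the drivers are installed and the GPU is visible:

nvidia-smi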

Note: Presently, ctranslate2>=4.5.0 is not able to locate and use the cuDNN 9 libraries bundled with PyTorch 2.*.*+cu124. You have to point LD_LIBRARY_PATH to PyTorch's cuDNN libraries at .venv/lib/python3.12/site-packages/nvidia/cudnn/lib, or to your system cuDNN installation.
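For example, assuming a Python 3.12 virtual environment at .venv and running from the repository root (adjust the path to your Python version):

export LD_LIBRARY_PATH="$PWD/.venv/lib/python3.12/site-packages/nvidia/cudnn/lib:$LD_LIBRARY_PATH"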

Create and sync the virtual environment with uv sync, then activate it with source .venv/bin/activate.
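In full:

uv sync
source .venv/bin/activate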

Installing a different version of PyTorch

The steps above install the latest version of PyTorch, which is generally compatible with the latest CUDA drivers. If, for some reason, you need to run an older version of PyTorch (e.g. if you are unable to upgrade your GPU drivers to support a later CUDA runtime version), follow these instructions.

Note: PyTorch ships with its own CUDA runtime API, which will work as long as it is supported by the installed driver version.

To install PyTorch with a specific CUDA runtime version, edit your pyproject.toml file. For example, to install PyTorch compiled against CUDA runtime version 12.1:

[[tool.uv.index]]
url = "https://download.pytorch.org/whl/cu121"
name = "pytorch-cu121"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cu121" }
torchvision = { index = "pytorch-cu121" }
torchaudio = { index = "pytorch-cu121" }

Then run uv add torch==2.5.1+cu121 torchvision torchaudio. The correct versions of torchvision and torchaudio will be installed automatically.

Similarly, for the CUDA 12.4 runtime, modify the index URLs in pyproject.toml accordingly and then run uv add torch==2.5.1+cu124 torchvision torchaudio.
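To verify which build was installed and which CUDA runtime it bundles (a quick check using PyTorch's own version attributes):

python -c "import torch; print(torch.__version__, torch.version.cuda)"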

Benchmark

from transcription_benchmarks import bench, BenchArgs
res = bench(
    "openai/whisper-tiny.en", BenchArgs(test_file="short.flac", device="cpu")
)
print(res.get_text(with_timestamps=True))
# [0:00:00 -> 0:00:04]  Ladies and gentlemen, thank you for being here and for your written representations.
# [0:00:05 -> 0:00:09]  You know what the purpose of this Select Committee is.

Testing

coverage run --branch -m pytest && coverage html

Notes

nvidia-smi and CUDA runtime versions

nvidia-smi does not show the CUDA runtime API version (what is commonly, and confusingly, referred to as the 'CUDA version'). Instead, it shows the CUDA driver API version of the installed driver (e.g. driver 535.182.01 ships with CUDA driver API 12.2 - full table here).
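To see both numbers side by side (the driver version from nvidia-smi, and the bundled runtime version reported by PyTorch):

nvidia-smi --query-gpu=driver_version --format=csv,noheader
python -c "import torch; print(torch.version.cuda)"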

Quality Comparison

  • Systran/faster-whisper-large-v3 translates medical terms better than Systran/faster-whisper-large-v2, but repetition is much more frequent.
  • The distil models are all bad at multilingual speech.
  • faster-whisper performs better than the native Transformers library.
  • The best-performing model for noisy/multilingual environments is Systran/faster-whisper-large-v2, presumably because of the many enhancements faster-whisper makes to filter out low-probability segments (I could not replicate the exact parameters; the outputs of the two differ even with batch_size=None and num_beams=5).
  • The output of faster-whisper is only deterministic if temperature=0. Otherwise, the default parameters retry at progressively higher temperatures when repetition is detected.

FAQ

Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so} Invalid handle. Cannot load symbol cudnnCreateTensorDescriptor

This is caused by ctranslate2>=4.5.0 not being able to locate the cuDNN 9 libraries. There is currently a known incompatibility between PyTorch and CTranslate2 versions.

  • If you are running torch 2.*.*+cu121, pin ctranslate2==4.4.0 (see the command after this list).
  • If you are running torch 2.*.*+cu124 or later, follow the instructions in the Setup section to point LD_LIBRARY_PATH to PyTorch's bundled cuDNN library.
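For the first case, pinning with uv (assuming this repo's uv-managed environment):

uv add ctranslate2==4.4.0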

Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory

Similar to the above, but this time caused by ctranslate2<=4.4.0 not being able to locate cuDNN 8. You can use the libraries bundled with PyTorch 2.*.*+cu121 by adding path/to/your/venv/lib/python3.xx/site-packages/nvidia/cudnn/lib to LD_LIBRARY_PATH.
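To find where the bundled cuDNN libraries actually live in your environment (the exact path depends on your Python version):

find .venv -name 'libcudnn*'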

ImportError: /home/ubuntu/transcription-benchmarks/.venv/lib/python3.12/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12

PyTorch is not using its bundled CUDA runtime libraries; it is instead using the ones on your system, which are incompatible.

Either uninstall cuDNN and CUDA from your system, or unset the LD_LIBRARY_PATH environment variable and ensure that any scripts setting it (e.g. in ~/.bashrc or /etc/profile.d) are removed.
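To inspect and clear the variable for the current shell session (persistent settings must still be removed from your shell init files):

echo "$LD_LIBRARY_PATH"
unset LD_LIBRARY_PATH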

RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH

PyTorch is using your system's cuDNN library rather than its bundled one. Confirm this with ldconfig -p | grep libcudnn.

Either point LD_LIBRARY_PATH to PyTorch's bundled version, or uninstall cuDNN and CUDA from your system.
