LavaSR is a lightweight, high-quality speech enhancement model that turns noisy, low-quality audio into clean, crisp audio, reaching roughly 5000x realtime on GPU and over 60x realtime on CPU.
LavaSR v2 just released: a massive increase in quality and speed, surpassing slow 6 GB diffusion models. Check it out!
lavasr_v2_cropped.mp4
- Extremely fast: reaches speeds over 5000x realtime on GPUs and 50x realtime on CPUs.
- High quality: surpasses diffusion models in quality.
- Efficient: uses just ~500 MB of VRAM, potentially less.
- Universal input: supports any input sampling rate from 8 kHz to 48 kHz.
- Enhancing TTS: LavaSR can considerably enhance the output quality of TTS (text-to-speech) models at almost no computational cost.
- Real-time enhancement: LavaSR allows on-device enhancement of low-quality calls, recordings, etc., while using little memory.
- Restoring datasets: LavaSR can enhance the audio quality of any audio dataset.
Quality comparisons use log-spectral distance (LSD) on the VCTK validation set. Lower is better (more similar to the original 48 kHz file).
| Method | 8→48 kHz | 16→48 kHz | 24→48 kHz |
|---|---|---|---|
| Sinc upsampling | 2.98 | 2.75 | 2.17 |
| AudioSR (diffusion) | 1.13 | 0.98 | 0.82 |
| NU-WAVE2 (diffusion) | 1.10 | 0.94 | 0.87 |
| AP-BWE (previous best) | 0.86 | 0.74 | 0.64 |
| Proposed model | 0.85 | 0.72 | 0.63 |
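As a reference for the metric used above, here is a hedged sketch of one common LSD formulation in NumPy (log power spectra, RMS over frequency bins, mean over frames). The STFT settings (`n_fft`, `hop`) and log base are assumptions, not necessarily those used for the table.

```python
import numpy as np

def log_spectral_distance(ref, est, n_fft=2048, hop=512, eps=1e-8):
    """Frame-wise log-spectral distance between two aligned waveforms.

    Lower is better; identical magnitude spectra give 0.
    """
    def stft_mag(x):
        # Simple framed FFT magnitude (no window, for brevity).
        frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
        return np.abs(np.fft.rfft(frames, axis=-1))

    ref_mag, est_mag = stft_mag(ref), stft_mag(est)
    n = min(len(ref_mag), len(est_mag))
    diff = np.log10(ref_mag[:n] ** 2 + eps) - np.log10(est_mag[:n] ** 2 + eps)
    # RMS over frequency bins, then mean over frames.
    return float(np.mean(np.sqrt(np.mean(diff ** 2, axis=-1))))
```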
Speed comparisons were done on an A100 GPU. A higher realtime factor means faster processing.
| Model | Speed (realtime factor) | Model size |
|---|---|---|
| LavaSR | 5000x realtime | ~50 MB |
| AP-BWE | 300x realtime | ~70 MB |
| FlowHigh | 80x realtime | ~450 MB |
| FlashSR | 14x realtime | ~1000 MB |
| AudioSR | 0.6x realtime | ~6000 MB |
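For reproducing numbers like these, the realtime factor is simply audio duration divided by wall-clock processing time. A minimal measurement sketch; `enhance_fn` is a placeholder callable, not LavaSR's API, and on GPU you would also synchronize (e.g. `torch.cuda.synchronize()`) before reading the clock:

```python
import time
import numpy as np

def realtime_factor(enhance_fn, audio, sr, warmup=1, runs=3):
    """Realtime factor = audio duration / average processing time.

    A factor of 5000x means one second of audio is processed in 0.2 ms.
    """
    for _ in range(warmup):
        enhance_fn(audio)  # warm caches / lazy initialization before timing
    start = time.perf_counter()
    for _ in range(runs):
        enhance_fn(audio)
    elapsed = (time.perf_counter() - start) / runs
    return (len(audio) / sr) / elapsed
```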
You can try it locally, on Colab, or on Spaces.
```
uv pip install git+https://github.com/ysharma3501/LavaSR.git
```
```python
from LavaSR.model import LavaEnhance2

## change device to your torch device type (cuda, mps, etc.)
device = 'cpu'
lava_model = LavaEnhance2("YatharthS/LavaSR", device)
```

```python
import soundfile as sf
from IPython.display import Audio

## Load audio
input_audio, input_sr = lava_model.load_audio('input.wav')

## Enhance audio
output_audio = lava_model.enhance(input_audio).cpu().numpy().squeeze()

## Save audio (both input and output)
sf.write('input.wav', input_audio.cpu().numpy().squeeze(), 16000)
sf.write('output.wav', output_audio, 48000)
```

```python
import soundfile as sf
from IPython.display import Audio

cutoff = None  ## Default is roughly half your sampling rate; lowering it can raise quality but may sound "metallic".
input_sr = 16000  ## Change to any sampling rate you want (8 kHz-48 kHz).
denoise = False  ## Set to True only if your audio has noise you want to filter out.
batch = False  ## Set to True if the audio is very long.

## Load audio
input_audio, input_sr = lava_model.load_audio('input.wav', input_sr=input_sr)

## Enhance audio
output_audio = lava_model.enhance(input_audio, denoise=denoise, batch=batch).cpu().numpy().squeeze()

## Save audio (both input and output)
sf.write('input.wav', input_audio.cpu().numpy().squeeze(), 16000)
sf.write('output.wav', output_audio, 48000)
```

Q: How is this novel?
A: It adapts a Vocos-based architecture for BWE (bandwidth extension / audio upsampling). We also propose a Linkwitz-Riley-inspired refiner that further increases quality significantly.
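For intuition on the Linkwitz-Riley family: an LR4 crossover is two cascaded 2nd-order Butterworth filters, and its low and high bands sum to an allpass (flat-magnitude) response, which makes it attractive for splicing a synthesized high band onto an original low band without a dip or bump at the crossover frequency. A sketch using SciPy; this illustrates the filter family only, not LavaSR's actual refiner:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def lr4_crossover(x, fc, sr):
    """Split x into low/high bands with a 4th-order Linkwitz-Riley crossover.

    Each band is built by applying a 2nd-order Butterworth filter twice;
    both bands are -6 dB at fc, and their sum has flat magnitude.
    """
    sos_lo = butter(2, fc, btype='low', fs=sr, output='sos')
    low = sosfilt(sos_lo, sosfilt(sos_lo, x))    # cascade twice -> LR4 low-pass
    sos_hi = butter(2, fc, btype='high', fs=sr, output='sos')
    high = sosfilt(sos_hi, sosfilt(sos_hi, x))   # cascade twice -> LR4 high-pass
    return low, high
```

Because the band sum is allpass, summing `low` and `high` preserves the signal's energy and magnitude spectrum, altering only its phase.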
Q: How is it so fast?
A: Because it uses the Vocos architecture, which is isotropic and single-pass, it is much faster than time-domain and diffusion-based models.
Q: What is it trained on?
A: It starts from a Vocos prior and is trained for just 50k steps on the VCTK dataset. The dataset is randomly resampled to 8/16/24 kHz, and noise is randomly added.
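The degradation pipeline described above can be sketched as follows; the SNR range and the down/up-sampling implementation are assumptions, not the exact training recipe:

```python
import numpy as np
from scipy.signal import resample_poly

def degrade(clean, sr=48000, rng=np.random.default_rng()):
    """Simulate a low-quality training input: randomly band-limit to
    8/16/24 kHz, then add white noise at a random SNR (assumed 10-30 dB)."""
    low_sr = int(rng.choice([8000, 16000, 24000]))
    # Downsample, then upsample back to sr, discarding high-band content.
    down = resample_poly(clean, low_sr, sr)
    degraded = resample_poly(down, sr, low_sr)
    # Scale white noise to hit the sampled SNR against the degraded signal.
    snr_db = rng.uniform(10, 30)
    noise = rng.standard_normal(len(degraded))
    scale = np.sqrt(np.mean(degraded ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return degraded + scale * noise
```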
- Release model and code
- Hugging Face Spaces demo
- Release a model without the metallic artifact
- Release training code
- Release a model trained on music and general audio
I am currently writing an Interspeech paper on LavaSR; feedback from the community would be greatly appreciated.
The model and code are licensed under the Apache-2.0 license. See LICENSE for details.
Stars/Likes would be appreciated, thank you.
Email: yatharthsharma3501@gmail.com