512x512 Spectrogram Reconstruction Experimentation

Spectrogram Generation

10 seconds of John Belfram's 10 Days of Blue

Stable Diffusion Fine Tune

A fine-tuning test using the DreamBooth method. The dataset is five 512x512 spectrograms of 10-second audio chunks from the same song as above. Trained using diffusers with accelerate for 400 iterations (only a few minutes on a 4090, if I remember correctly).

Pretty good considering such quick finetuning.
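The dataset preparation described above (10-second chunks, each rendered as a 512-column spectrogram) can be sketched roughly as follows. This is only an illustration, not the code used in this repo: the 44.1 kHz sample rate, the 55-second input length, and the helper names `chunk_bounds` and `hop_for_width` are all assumptions.

```python
def chunk_bounds(n_samples, sr, chunk_seconds=10.0):
    """Return (start, end) sample indices of full-length chunks.

    A trailing chunk shorter than `chunk_seconds` is dropped.
    """
    chunk_len = int(round(chunk_seconds * sr))
    return [(s, s + chunk_len)
            for s in range(0, n_samples - chunk_len + 1, chunk_len)]


def hop_for_width(n_samples, width=512):
    """Smallest hop so a centered STFT (1 + n_samples // hop frames)
    fits into `width` spectrogram columns."""
    return n_samples // width + 1


SR = 44100              # assumed sample rate, not stated in this README
n_total = 55 * SR       # hypothetical 55-second excerpt

bounds = chunk_bounds(n_total, SR)
hop = hop_for_width(10 * SR)
frames = 1 + (10 * SR) // hop

print(len(bounds), hop, frames)  # prints: 5 862 512
```

Under these assumptions, a 10-second chunk needs a hop of 862 samples to land on exactly 512 columns. Longer chunks need a larger hop for the same image width, so each column covers more audio.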

Audio reconstruction

Reconstructed with phase information

Up to 10 seconds, the audio is indistinguishable from the original.

chunk_10_seconds.mp4

Above 10 seconds, audio quality gets progressively worse, starting with the low end.

chunk_12_seconds.mp4

Reconstructed without phase information

Using librosa: converting to a Mel spectrogram, then reconstructing with Griffin-Lim.

reconstructed_audio.mp4

Using Riffusion's conversion pipeline, which wraps torchaudio, again converting to Mel and then reconstructing with Griffin-Lim. It is very hard to find any similarity with the original audio.

output_audio.1.mp4

Spectrogram output from the finetuned model, converted to audio using the librosa code

reconstructed_audio_from_finetuned.mp4

It's a start I guess! :D
