danhearn/thesis

512x512 Spectrogram Reconstruction Experimentation

Spectrogram Generation

A 10-second chunk of John Belfram's 10 Days of Blue:

chunk_10_seconds_spectrogram_512x512
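For reference, a minimal sketch of how a 10-second chunk can be turned into a 512x512 spectrogram image. This is not the code used in this repo: it uses plain NumPy rather than librosa, and the sample rate (22050 Hz), FFT size (1022, which yields 512 frequency bins), Hann window, 80 dB dynamic range and the function names are all assumptions. The hop length is chosen so that the chunk fills exactly 512 columns.

```python
import numpy as np


def stft_magnitude(signal, n_fft=1022, n_frames=512):
    """Return a (n_fft // 2 + 1, n_frames) magnitude spectrogram."""
    # Choose the hop so the whole signal is spread over exactly n_frames columns.
    hop = int(np.ceil(len(signal) / n_frames))
    # Zero-pad the tail so the last frame has n_fft samples available.
    needed = (n_frames - 1) * hop + n_fft
    padded = np.pad(signal, (0, max(0, needed - len(signal))))
    window = np.hanning(n_fft)
    frames = np.stack(
        [padded[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # rfft of 1022 samples gives 512 bins; transpose to (frequency, time).
    return np.abs(np.fft.rfft(frames, axis=1)).T


def to_uint8_image(magnitude, top_db=80.0):
    """Map magnitudes to an 8-bit greyscale image with low frequencies at the bottom."""
    db = 20.0 * np.log10(np.maximum(magnitude, 1e-10))
    db = np.maximum(db, db.max() - top_db)  # clip the dynamic range
    span = max(db.max() - db.min(), 1e-12)  # avoid dividing by zero on silence
    scaled = (db - db.min()) / span
    return np.flipud((scaled * 255).astype(np.uint8))


sr = 22050  # assumed sample rate
t = np.arange(10 * sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)  # stand-in for a 10-second audio chunk
image = to_uint8_image(stft_magnitude(tone))
print(image.shape, image.dtype)
```

Note that an image like this stores magnitude only; the phase has to be kept separately or estimated at reconstruction time.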

Stable Diffusion Fine Tune

A fine-tuning test using the DreamBooth method. The dataset is five 512x512 spectrograms of 10-second audio chunks from the same song as above. Trained using diffusers and accelerate for 400 iterations (only a few minutes on a 4090, if I remember correctly).

image_0

Pretty good considering such quick finetuning.

Audio reconstruction

Reconstructed with phase information

For chunks of up to 10 seconds, the reconstructed audio is indistinguishable from the original.

chunk_10_seconds.mp4
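A small sketch of why keeping the phase matters: if the complex STFT is kept, the inverse transform recovers the waveform almost exactly, whereas discarding the phase (keeping only magnitudes) does not. This is a NumPy illustration rather than this repo's code; the helper names, FFT size, hop length and test signal are assumptions.

```python
import numpy as np


def stft(signal, n_fft=1024, hop=256):
    """Complex STFT with a Hann window, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=1024, hop=256):
    """Inverse STFT by windowed overlap-add, normalised by the summed squared window."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = (len(frames) - 1) * hop + n_fft
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)


rng = np.random.default_rng(0)
sr = 8000  # assumed sample rate, kept low so the demo runs quickly
t = np.arange(2 * sr) / sr
audio = np.sin(2 * np.pi * 220.0 * t) + 0.1 * rng.standard_normal(len(t))

spec = stft(audio)
with_phase = istft(spec)             # magnitude and phase
without_phase = istft(np.abs(spec))  # magnitude only, phase set to zero

# Compare away from the edges, where the window weights are close to zero.
inner = slice(1024, len(with_phase) - 1024)
err_with = np.max(np.abs(with_phase[inner] - audio[inner]))
err_without = np.max(np.abs(without_phase[inner] - audio[inner]))
print(f"with phase: {err_with:.2e}, magnitude only: {err_without:.2e}")
```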

Above 10 seconds, audio quality gets progressively worse, starting with the low end.

chunk_12_seconds.mp4
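One thing that changes with chunk length, if the image width is held at 512 columns, is the hop between STFT frames. The arithmetic below is only a sketch under assumed settings (44100 Hz sample rate, FFT size 1022); the actual parameters used here may differ, and it is not confirmed that this is what causes the degradation.

```python
import math

SAMPLE_RATE = 44100  # assumed
N_FFT = 1022         # assumed; gives 512 frequency bins
WIDTH = 512          # image width in pixels, one STFT frame per column


def hop_for_width(duration_s, sample_rate=SAMPLE_RATE, width=WIDTH):
    """Hop length (in samples) needed to fit duration_s into `width` frames."""
    return math.ceil(duration_s * sample_rate / width)


for seconds in (5, 10, 12, 20):
    hop = hop_for_width(seconds)
    # Fraction of each analysis window shared with the next one (0 means no overlap).
    overlap = max(0.0, 1.0 - hop / N_FFT)
    print(f"{seconds:>2} s: hop = {hop} samples, overlap = {overlap:.0%}")
```

Under these assumptions the hop grows from 862 samples at 10 seconds to 1034 samples at 12 seconds, which is larger than the window itself, so neighbouring frames would no longer overlap.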

Reconstructed without phase information

Using librosa: convert to a mel spectrogram, then reconstruct with Griffin-Lim.

reconstructed_audio.mp4
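For context, Griffin-Lim estimates the missing phase by repeatedly inverting the spectrogram and re-analysing the result, keeping the known magnitudes and only updating the phase. Below is a rough NumPy sketch of that loop; it is not the librosa code used here, it works on a linear-frequency magnitude spectrogram (the mel conversion is left out), and the helper names, FFT size, hop length and iteration counts are assumptions. librosa provides this as `librosa.griffinlim` and, for mel input, `librosa.feature.inverse.mel_to_audio`.

```python
import numpy as np


def stft(signal, n_fft=512, hop=128):
    """Complex STFT with a Hann window, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=512, hop=128):
    """Inverse STFT by windowed overlap-add, normalised by the summed squared window."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = (len(frames) - 1) * hop + n_fft
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)


def griffin_lim(magnitude, n_iter=30, n_fft=512, hop=128, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from random phase on the unit circle.
    angles = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        audio = istft(magnitude * angles, n_fft, hop)
        rebuilt = stft(audio, n_fft, hop)
        # Keep the known magnitudes, adopt the phase of the re-analysed signal.
        angles = rebuilt / np.maximum(np.abs(rebuilt), 1e-8)
    return istft(magnitude * angles, n_fft, hop)


def spectral_error(audio, magnitude):
    """Relative mismatch between the STFT magnitude of `audio` and the target."""
    return np.linalg.norm(np.abs(stft(audio)) - magnitude) / np.linalg.norm(magnitude)


sr = 8000  # assumed sample rate, kept low so the demo runs quickly
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 220.0 * t) + 0.5 * np.sin(2 * np.pi * 660.0 * t)
target = np.abs(stft(audio))  # the phase is thrown away here

err_1 = spectral_error(griffin_lim(target, n_iter=1), target)
err_30 = spectral_error(griffin_lim(target, n_iter=30), target)
print(f"spectral error after 1 iteration: {err_1:.4f}, after 30: {err_30:.4f}")
```

Going through a mel spectrogram first adds a second lossy step, because the mel filterbank has to be approximately inverted before the phase can be estimated.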

Using Riffusion's conversion pipeline, which wraps torchaudio: again convert to a mel spectrogram, then reconstruct with Griffin-Lim. It is very hard to hear any similarity with the original audio.

output_audio.1.mp4

Spectrogram output from the fine-tuned model, reconstructed using the librosa code:

reconstructed_audio_from_finetuned.mp4

It's a start I guess! :D
