
Spleeter Pre-Processing #1

@gazugafan

I've been testing this out to automatically generate forced-alignment karaoke lyrics for an open source project. So far the results are super impressive! I've tested a TON of other options, and this is by far the best I've found. Nothing else comes close!

My basic workflow is...

  1. Isolate the vocals using Spleeter. This works really well and leaves you with separate WAV files for the vocals and the accompaniment.
  2. Look up the lyrics on Genius.
  3. Supply AutoLyrixAlign with the lyrics and the original polyphonic music file to get timestamped words.
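
For anyone who wants to reproduce step 1, here is a minimal sketch of how the Spleeter call could be scripted. It assumes the Spleeter 2.x command line (`spleeter separate -p spleeter:2stems -o <dir> <file>`; older releases take the input after `-i`) and Spleeter's default output layout. The helper names `spleeter_command`, `expected_stems`, and `isolate_vocals` are just made up for illustration.

```python
import subprocess
from pathlib import Path


def spleeter_command(song: Path, out_dir: Path) -> list[str]:
    """Build a 2-stem Spleeter CLI call (vocals + accompaniment)."""
    # Assumption: Spleeter 2.x CLI syntax; adjust for the installed version.
    return [
        "spleeter", "separate",
        "-p", "spleeter:2stems",
        "-o", str(out_dir),
        str(song),
    ]


def expected_stems(song: Path, out_dir: Path) -> dict[str, Path]:
    """Default layout: <out_dir>/<song name without extension>/<stem>.wav"""
    stem_dir = out_dir / song.stem
    return {
        "vocals": stem_dir / "vocals.wav",
        "accompaniment": stem_dir / "accompaniment.wav",
    }


def isolate_vocals(song: Path, out_dir: Path) -> Path:
    """Run Spleeter (must be installed) and return the vocal track path."""
    subprocess.run(spleeter_command(song, out_dir), check=True)
    return expected_stems(song, out_dir)["vocals"]
```

Calling `isolate_vocals(Path("song.mp3"), Path("separated"))` should leave the vocals at `separated/song/vocals.wav`, with the accompaniment next to it.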

It works pretty great! I started wondering, though... since Spleeter is fairly new and seems to work really well, have you considered training a new model on just the isolated vocals? Would that give even more accurate results?

In other words... isolate the vocals by processing all of the songs in the training dataset with Spleeter first, and then train the same way you did before (but using just the isolated vocals instead of the original polyphonic audio). And of course, when running the alignment, be sure to pre-process the input with Spleeter too (or assume the input is already isolated vocals from Spleeter).
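
To make the dataset side of the idea concrete, here is a rough sketch of how one might track which songs still need to be run through Spleeter before retraining. Everything about the layout is an assumption on my part: a flat folder of audio files, the extension list, and separated output stored as `<vocals_dir>/<song name>/vocals.wav`. `pending_songs` is a hypothetical helper, not part of Spleeter or AutoLyrixAlign.

```python
from pathlib import Path

# Assumption: the dataset uses these audio formats.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac"}


def pending_songs(dataset_dir: Path, vocals_dir: Path) -> list[Path]:
    """Return dataset songs that do not have an isolated vocal track yet."""
    pending = []
    for song in sorted(dataset_dir.iterdir()):
        if song.suffix.lower() not in AUDIO_EXTENSIONS:
            continue  # skip lyrics files, annotations, etc.
        # Assumed layout: one folder per song containing vocals.wav.
        if not (vocals_dir / song.stem / "vocals.wav").exists():
            pending.append(song)
    return pending
```

Separating only the songs this returns keeps the pass resumable, and training would then read from `vocals_dir` instead of the original audio.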

What do you think? Is this a crazy idea?
