Spleeter Pre-Processing

I've been testing this out to automatically generate forced-alignment karaoke lyrics for an open source project. So far the results are super impressive! I've tested a ton of options, and this is by far the best I've found.

My basic workflow is:
1) Isolate the vocals using Spleeter. This works really well and leaves you with separate WAV files for the vocals and the accompaniment.
2) Look up the lyrics on Genius.
3) Supply AutoLyrixAlign with the lyrics and the original polyphonic music file to get timestamped words.
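
For reference, here is a rough sketch of how step 1 could be scripted. It assumes the `spleeter` command-line tool is installed and on the PATH, that the 2-stem model is used, and that the installed version accepts the input file as a positional argument (older releases expected `-i` instead). The function names are just mine for illustration.

```python
import subprocess
from pathlib import Path


def spleeter_command(audio_path, output_dir):
    """Build the Spleeter CLI call for a 2-stem (vocals/accompaniment) split."""
    # Assumption: a Spleeter release that takes the input file positionally.
    return [
        "spleeter", "separate",
        "-p", "spleeter:2stems",
        "-o", str(output_dir),
        str(audio_path),
    ]


def expected_stems(audio_path, output_dir):
    """Return where Spleeter is expected to write the two WAV files.

    By default Spleeter creates a sub-folder named after the input file.
    """
    stem_dir = Path(output_dir) / Path(audio_path).stem
    return {
        "vocals": stem_dir / "vocals.wav",
        "accompaniment": stem_dir / "accompaniment.wav",
    }


def isolate_vocals(audio_path, output_dir):
    """Run Spleeter and return the path of the isolated vocals."""
    subprocess.run(spleeter_command(audio_path, output_dir), check=True)
    return expected_stems(audio_path, output_dir)["vocals"]
```

Calling `isolate_vocals("song.mp3", "separated")` should leave the vocals at `separated/song/vocals.wav`, with the accompaniment next to it.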

It works pretty well! I started wondering, though: since Spleeter is fairly new and seems to work really well, have you considered training a new model on a dataset of just the isolated vocals? Would that give even more accurate results?

In other words: isolate the vocals by processing all of the songs in the dataset with Spleeter first, and then train the same way you did before, but using just the isolated vocals instead of the original polyphonic audio. And of course, when running the alignment, be sure to pre-process the input with Spleeter as well (or assume the input is already isolated vocals from Spleeter).
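
To make the dataset side of this concrete, here is a minimal sketch of how the pre-processing pass could be planned. The folder layout, the list of extensions, and the function name are all assumptions on my part; I don't know how the actual training data is organized. It only works out which songs still need separating and where their vocals would end up, so the separation and training steps themselves are left out.

```python
from pathlib import Path

# Hypothetical set of audio formats found in the training data.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def plan_vocal_dataset(dataset_dir, output_dir):
    """List (source, vocals) pairs for songs that have no isolated vocals yet.

    Assumes Spleeter's default layout: <output_dir>/<song name>/vocals.wav.
    """
    dataset_dir = Path(dataset_dir)
    output_dir = Path(output_dir)
    plan = []
    for src in sorted(dataset_dir.rglob("*")):
        if src.is_file() and src.suffix.lower() in AUDIO_EXTENSIONS:
            vocals = output_dir / src.stem / "vocals.wav"
            # Skip songs already separated so the pass can be resumed.
            if not vocals.exists():
                plan.append((src, vocals))
    return plan
```

Each source file in the plan would be run through Spleeter, and training would then read the `vocals.wav` files instead of the original mixes. Note that two songs with the same file name in different sub-folders would collide in this layout.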

What do you think? Is this a crazy idea?
