Running the UMAP score method on the train split before the test split produces obviously wrong test embeddings. (For the paper this was hot-fixed by scoring the test split before the train split, so the reported results are valid, but the underlying bug should still be investigated.)
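A possible starting point for the investigation is a small order-dependence check: score the test split on a fresh copy of the fitted model, then score train followed by test on another copy, and compare the two test embeddings. The sketch below assumes the model is deep-copyable and exposes a `score(data)` method returning a list of embedding rows; `score_order_discrepancy`, `max_abs_diff`, and the `LeakyScorer` stand-in are hypothetical names introduced here, not part of the project or of umap-learn. `LeakyScorer` only illustrates what order-dependent state could look like; it is not a diagnosis of the actual bug.

```python
import copy


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two lists of rows."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def score_order_discrepancy(model, train, test):
    """Measure how much the test embeddings change when train is scored first.

    Assumes `model` has a `score(data)` method (hypothetical interface)
    and can be deep-copied so both runs start from the same state.
    """
    # Order used by the hot fix: test is scored on an untouched copy.
    test_first = copy.deepcopy(model)
    expected = test_first.score(test)

    # Order reported as buggy: train is scored before test.
    train_first = copy.deepcopy(model)
    train_first.score(train)
    observed = train_first.score(test)

    return max_abs_diff(expected, observed)


class LeakyScorer:
    """Toy stand-in (not UMAP): remembers the mean of the first batch it scores."""

    def __init__(self):
        self._offset = None

    def score(self, data):
        # State set on the first call leaks into every later call.
        if self._offset is None:
            self._offset = sum(row[0] for row in data) / len(data)
        return [[row[0] - self._offset] for row in data]


train = [[0.0], [2.0]]
test = [[10.0], [12.0]]
discrepancy = score_order_discrepancy(LeakyScorer(), train, test)
print(discrepancy)  # non-zero means the score order changes the test embeddings
```

If the real model's transform is stochastic, the comparison would presumably need a fixed random seed and a tolerance rather than an exact match.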