
[DRAFT] pytests for losses #3167

Open
wants to merge 6 commits into master

Conversation

JINO-ROHIT
Contributor

I'm writing this PR to help add test cases for all the losses.

I'm starting with ContrastiveLoss and I wanted to get some early feedback on this. Like you mentioned, the idea is to have a single parametrized function and pass all the loss cases into it.

What do you think?
@tomaarsen

@JINO-ROHIT JINO-ROHIT marked this pull request as draft January 12, 2025 12:59
@tomaarsen
Collaborator

Hello!

I think this is heading in the right direction, but I do have some suggestions:

In particular, I think there's a risk that our test implementation has a similar (or the same) bug as our real implementation. However, we still want to ensure that the loss "works".

So, one option is to create 2 batches. If we want to use the same model for both, then we have to create 1 "good" batch and 1 "bad" batch. The former is normal data, whereas the latter is the opposite of reality: the positive is the negative, or related texts are marked as 0.0 similarity, etc.

Then, the same trained model should give a low loss for the "good" batch and a high loss for the "bad" batch. In both cases we should still test that the output 1) is a torch Tensor, 2) is not NaN, 3) is not 0.0, 4) has requires_grad, etc.

I think this is safer than just calculating an "expected loss", because if our implementation is buggy, that expected loss is probably also wrong.

We'd then have to create the good and bad batches for each row in the Loss Overview, run each of the appropriate loss functions on those good/bad batches, and ensure that loss(good_batch) < loss(bad_batch).
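
For a pair-based loss like ContrastiveLoss, for example, the two batches could look roughly like this (a minimal sketch; the column names and sentences are only illustrative):

from datasets import Dataset

correct_pairs = Dataset.from_dict({
    "sentence1": ["It's very sunny outside", "I love playing soccer"],
    "sentence2": ["The sun is out today", "Cacti are beautiful"],
    "label": [1, 0],  # the related pair is labeled 1, the unrelated pair 0
})

incorrect_pairs = Dataset.from_dict({
    "sentence1": ["It's very sunny outside", "I love playing soccer"],
    "sentence2": ["The sun is out today", "Cacti are beautiful"],
    "label": [0, 1],  # labels flipped: the opposite of reality
})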

Does that make sense?

  • Tom Aarsen

@JINO-ROHIT
Contributor Author

Hmm, I must admit I don't completely understand. Do you mind giving an example?

@tomaarsen
Collaborator

Apologies, it is a bit confusing! Here is a detailed example:

from __future__ import annotations

import pytest
import torch
from torch import nn

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import (
    CachedGISTEmbedLoss,
    CachedMultipleNegativesRankingLoss,
    GISTEmbedLoss,
    MultipleNegativesRankingLoss,
    TripletLoss,
)
from sentence_transformers.util import batch_to_device

# TODO: Preferably initialize the guide model in a fixture
GUIDE_MODEL = SentenceTransformer("sentence-transformers-testing/stsb-bert-tiny-safetensors")

anchor_positive_negative_triplet = {
    "losses": [
        (MultipleNegativesRankingLoss, {}),
        (CachedMultipleNegativesRankingLoss, {}),
        (TripletLoss, {}),
        (CachedGISTEmbedLoss, {"guide": GUIDE_MODEL}),
        (GISTEmbedLoss, {"guide": GUIDE_MODEL}),
    ],
    "correct": Dataset.from_dict({
        "anchor": ["It's very sunny outside", "I love playing soccer", "I am a student"],
        "positive": ["The sun is out today", "I like playing soccer", "I am studying at university"],
        "negative": ["Data science is fun", "Cacti are beautiful", "Speakers are loud"],
    }),
    "incorrect": Dataset.from_dict({
        "anchor": ["It's very sunny outside", "I love playing soccer", "I am a student"],
        "positive": ["Data science is fun", "Cacti are beautiful", "Speakers are loud"],
        "negative": ["The sun is out today", "I like playing soccer", "I am studying at university"],
    }),
}

LOSS_TEST_CASES = [
    (loss_class, loss_args, anchor_positive_negative_triplet["correct"], anchor_positive_negative_triplet["incorrect"])
    for loss_class, loss_args in anchor_positive_negative_triplet["losses"]
]

def prepare_features_labels_from_dataset(model: SentenceTransformer, dataset: Dataset):
    device = model.device
    features = [
        batch_to_device(model.tokenize(dataset[column]), device)
        for column in dataset.column_names
        if column not in ["label", "score"]
    ]
    labels = None
    if "label" in dataset.column_names:
        labels = torch.tensor(dataset["label"]).to(device)
    elif "score" in dataset.column_names:
        labels = torch.tensor(dataset["score"]).to(device)
    return features, labels

def get_and_assert_loss_from_dataset(model: SentenceTransformer, loss_fn: nn.Module, dataset: Dataset):
    features, labels = prepare_features_labels_from_dataset(model, dataset)
    loss = loss_fn.forward(features, labels)
    assert isinstance(loss, torch.Tensor), f"Loss should be a torch.Tensor, but got {type(loss)}"
    assert loss.item() > 0, "Loss should be positive"
    assert loss.shape == (), "Loss should be a scalar"
    assert loss.requires_grad, "Loss should require gradients"
    return loss

@pytest.mark.parametrize("loss_class, loss_args, correct, incorrect", LOSS_TEST_CASES)
def test_loss_function(stsb_bert_tiny_model_reused: SentenceTransformer, loss_class, loss_args, correct, incorrect):
    model = stsb_bert_tiny_model_reused
    loss_fn = loss_class(model, **loss_args)
    correct_loss = get_and_assert_loss_from_dataset(model, loss_fn, correct)
    incorrect_loss = get_and_assert_loss_from_dataset(model, loss_fn, incorrect)

    assert correct_loss < incorrect_loss, "Loss should be lower for correct data than for incorrect data"

It can be changed up a bit, but the overall idea is that we have 1 batch of "correct" data and 1 batch of "incorrect" data. If we use a trained model, then the loss of the "correct" data will be lower than the loss of the "incorrect" data.
Beyond that, we do some simple sanity checks on the losses themselves (positive, single value, torch.Tensor, has gradient).

Does that make some more sense? How this file would be structured can be updated to whatever is convenient.

  • Tom Aarsen

@JINO-ROHIT
Contributor Author

Yeah, thanks so much Tom, working on it!

@JINO-ROHIT
Contributor Author

I couldn't find an elegant way to pass the guide model as a fixture. Would this work?

@tomaarsen
Collaborator

I think there is a way for pytest to load a fixture by name, but it's almost as hacky as this, so this is fine to keep for now! If I ever run into it again, I can update things.
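
For reference, one way pytest can resolve a fixture by its name at runtime is request.getfixturevalue. A rough sketch of that alternative, assuming the existing stsb_bert_tiny_model_reused fixture and storing the guide fixture's name as a string in the parametrization:

import pytest

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import GISTEmbedLoss

@pytest.fixture(scope="session")
def guide_model() -> SentenceTransformer:
    return SentenceTransformer("sentence-transformers-testing/stsb-bert-tiny-safetensors")

@pytest.mark.parametrize("loss_class, loss_args", [(GISTEmbedLoss, {"guide": "guide_model"})])
def test_loss_with_named_fixture(request, stsb_bert_tiny_model_reused, loss_class, loss_args):
    # Swap fixture names (strings) for the actual fixture objects at runtime
    loss_args = {
        key: request.getfixturevalue(value) if isinstance(value, str) else value
        for key, value in loss_args.items()
    }
    loss_fn = loss_class(stsb_bert_tiny_model_reused, **loss_args)
    assert loss_fn is not None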

@JINO-ROHIT
Contributor Author

Cool then, I'll add it for the other losses.

@JINO-ROHIT
Contributor Author

What about losses like SoftmaxLoss? This condition fails:

assert correct_loss < incorrect_loss, "Loss should be lower for correct data than for incorrect data"

I assume it's because the tiny test model hasn't been trained and the random initialization makes the losses inconsistent?

@tomaarsen
Collaborator

Oh, I hadn't considered that the tiny model might not be trained enough, hah.
I think I did train them a bit; perhaps you can check whether this one works? https://huggingface.co/sentence-transformers-testing/all-nli-bert-tiny-dense

And otherwise we'll have to use a bigger, normal model, like all-MiniLM-L6-v2.
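
A minimal sketch of what swapping in that model could look like (the fixture name is hypothetical; the model id is taken from the link above):

import pytest

from sentence_transformers import SentenceTransformer

@pytest.fixture(scope="session")
def trained_tiny_model() -> SentenceTransformer:
    # A tiny model that has seen some training, so the "correct" and
    # "incorrect" batches should produce clearly separated losses.
    return SentenceTransformer("sentence-transformers-testing/all-nli-bert-tiny-dense")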

  • Tom Aarsen

@JINO-ROHIT
Contributor Author

Yeah, this model seems to be working for now. Thanks!

@JINO-ROHIT
Contributor Author

I tried using a larger model and the smaller ones as well; the losses are still somewhat random.

Also, I think I've added all the losses that can be tested together. I think the other losses have to be handled differently, what do you think?
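
One possible way to handle those is a separate parametrization that runs only the sanity checks and skips the correct-vs-incorrect comparison. A sketch, assuming the get_and_assert_loss_from_dataset helper and model fixture from the earlier example are available (column names and labels are illustrative):

import pytest

from datasets import Dataset
from sentence_transformers.losses import SoftmaxLoss

nli_style_pairs = Dataset.from_dict({
    "premise": ["It's very sunny outside", "I love playing soccer"],
    "hypothesis": ["The sun is out today", "Cacti are beautiful"],
    "label": [0, 2],  # e.g. entailment / contradiction class ids
})

@pytest.mark.parametrize("dataset", [nli_style_pairs])
def test_softmax_loss_sanity_only(stsb_bert_tiny_model_reused, dataset):
    model = stsb_bert_tiny_model_reused
    loss_fn = SoftmaxLoss(
        model,
        sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
        num_labels=3,
    )
    # Only the basic checks; the randomly initialized classification head makes
    # the correct-vs-incorrect comparison unreliable for this loss.
    get_and_assert_loss_from_dataset(model, loss_fn, dataset)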

@JINO-ROHIT JINO-ROHIT marked this pull request as ready for review January 27, 2025 14:31