
[WIP] Blackwell compatibility changes #707

Open

trvachov wants to merge 1 commit into main from trvachov/blackwell-compatibility

Conversation

trvachov (Collaborator)

Description

Blackwell compatibility.

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels:

Note

By default, the notebooks validation tests are skipped unless explicitly enabled.

Usage

TODO: Add code snippet

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

codecov-commenter commented Feb 27, 2025

❌ 10 Tests Failed:

Tests completed: 916 · Failed: 10 · Passed: 906 · Skipped: 18
View the top 3 failed test(s) by shortest run time
sub-packages/bionemo-esm2/tests/bionemo/esm2/model/test_model.py::test_model_equivalence_with_huggingface_8m[bf16]
Stack Traces | 1.31s run time
precision = 'bf16'

    @pytest.mark.parametrize("precision", ["fp32", "bf16", "fp16", "bf16-mixed"])
    def test_model_equivalence_with_huggingface_8m(precision):
        model_tag = "facebook/esm2_t6_8M_UR50D"
        ckpt_path = load("esm2/8m:2.0")
        with megatron_parallel_state_utils.distributed_model_parallel_state():
>           assert_model_equivalence(ckpt_path, model_tag, precision=precision)

.../esm2/model/test_model.py:183: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ckpt_path = PosixPath('.../github/home/.cache/bionemo/2957b2c36d5978d0f595d6f1b72104b312621cf0329209086537b613c1c96d16-esm2_hf_converted_8m_checkpoint.tar.gz.untar')
model_tag = 'facebook/esm2_t6_8M_UR50D', precision = 'bf16', rtol = None
atol = None

    def assert_model_equivalence(
        ckpt_path: Path | str,
        model_tag: str,
        precision: PrecisionTypes = "fp32",
        rtol: float | None = None,
        atol: float | None = None,
    ) -> None:
        """Testing utility to compare the outputs of a NeMo2 checkpoint to the original HuggingFace model weights.
    
        Compares the cosine similarity of the logit and hidden state outputs of a NeMo2 model checkpoint to the outputs of
        the corresponding HuggingFace model.
    
        Args:
            ckpt_path: A path to a NeMo2 checkpoint for an ESM-2 model.
            model_tag: The HuggingFace model tag for the model to compare against.
            precision: The precision type to use for the comparison. Defaults to "fp32".
            rtol: The relative tolerance to use for the comparison. Defaults to None, which chooses the tolerance based on
                the precision.
            atol: The absolute tolerance to use for the comparison. Defaults to None, which chooses the tolerance based on
                the precision.
        """
        tokenizer = get_tokenizer()
    
        test_proteins = [
            "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLA",
            "MKTVRQERLKSI<mask>RILERSKEPVSGAQLAEELS<mask>SRQVIVQDIAYLRSLGYN<mask>VATPRGYVLAGG",
        ]
        tokens = tokenizer(test_proteins, return_tensors="pt", padding=True, truncation=True).to("cuda")
        input_ids = tokens["input_ids"]
        attention_mask = tokens["attention_mask"]
    
        dtype = get_autocast_dtype(precision)
        nemo_config = ESM2Config(
            initial_ckpt_path=str(ckpt_path),
            include_embeddings=True,
            include_hiddens=True,
            params_dtype=dtype,
            pipeline_dtype=dtype,
            autocast_dtype=dtype,
            bf16=dtype is torch.bfloat16,
            fp16=dtype is torch.float16,
        )
    
        nemo_model = nemo_config.configure_model(tokenizer).to("cuda").eval()
    
        if dtype is torch.float16 or dtype is torch.bfloat16:
            nemo_model = Float16Module(nemo_config, nemo_model)
    
        nemo_output = nemo_model(input_ids, attention_mask)
        nemo_logits = nemo_output["token_logits"].transpose(0, 1).contiguous()[..., : tokenizer.vocab_size]
        nemo_hidden_state = nemo_output["hidden_states"]
    
        del nemo_model
        gc.collect()
        torch.cuda.empty_cache()
    
        hf_model = AutoModelForMaskedLM.from_pretrained(model_tag, torch_dtype=get_autocast_dtype(precision)).cuda().eval()
        hf_output_all = hf_model(input_ids, attention_mask, output_hidden_states=True)
        hf_hidden_state = hf_output_all.hidden_states[-1]
    
        # Rather than directly comparing the logit or hidden state tensors, we compare their cosine similarity. These
        # should be essentially 1 if the outputs are equivalent, but is less sensitive to small numerical differences.
        # We don't care about the padding tokens, so we only compare the non-padding tokens.
        logit_similarity = torch.nn.functional.cosine_similarity(nemo_logits, hf_output_all.logits, dim=2)
        logit_similarity = logit_similarity[attention_mask == 1]
    
        hidden_state_similarity = torch.nn.functional.cosine_similarity(nemo_hidden_state, hf_hidden_state, dim=2)
        hidden_state_similarity = hidden_state_similarity[attention_mask == 1]
    
        torch.testing.assert_close(logit_similarity, torch.ones_like(logit_similarity), rtol=rtol, atol=atol)
>       torch.testing.assert_close(hidden_state_similarity, torch.ones_like(hidden_state_similarity), rtol=rtol, atol=atol)
E       AssertionError: Tensor-likes are not close!
E       
E       Mismatched elements: 125 / 132 (94.7%)
E       Greatest absolute difference: 0.07421875 at index (15,) (up to 1e-05 allowed)
E       Greatest relative difference: 0.07421875 at index (15,) (up to 0.016 allowed)

.../local/lib/python3.12.../esm2/testing/compare.py:99: AssertionError
sub-packages/bionemo-esm2/tests/bionemo/esm2/model/test_model.py::test_model_equivalence_with_huggingface_8m[bf16-mixed]
Stack Traces | 1.31s run time
precision = 'bf16-mixed'

    (stack trace identical to the [bf16] case above: AssertionError at compare.py:99, 125/132 elements mismatched, greatest absolute difference 0.07421875 against the default bf16 tolerances)
sub-packages/bionemo-esm2/tests/bionemo/esm2/model/test_model.py::test_model_equivalence_with_huggingface_8m[fp32]
Stack Traces | 1.38s run time
precision = 'fp32'

    @pytest.mark.parametrize("precision", ["fp32", "bf16", "fp16", "bf16-mixed"])
    def test_model_equivalence_with_huggingface_8m(precision):
        model_tag = "facebook/esm2_t6_8M_UR50D"
        ckpt_path = load("esm2/8m:2.0")
        with megatron_parallel_state_utils.distributed_model_parallel_state():
>           assert_model_equivalence(ckpt_path, model_tag, precision=precision)

.../esm2/model/test_model.py:183: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ckpt_path = PosixPath('.../github/home/.cache/bionemo/2957b2c36d5978d0f595d6f1b72104b312621cf0329209086537b613c1c96d16-esm2_hf_converted_8m_checkpoint.tar.gz.untar')
model_tag = 'facebook/esm2_t6_8M_UR50D', precision = 'fp32', rtol = None
atol = None

    (body of assert_model_equivalence identical to the traces above, up to the failing assertion)
>       torch.testing.assert_close(logit_similarity, torch.ones_like(logit_similarity), rtol=rtol, atol=atol)
E       AssertionError: Tensor-likes are not close!
E       
E       Mismatched elements: 132 / 132 (100.0%)
E       Greatest absolute difference: 0.003114163875579834 at index (124,) (up to 1e-05 allowed)
E       Greatest relative difference: 0.003114163875579834 at index (124,) (up to 1.3e-06 allowed)

.../local/lib/python3.12.../esm2/testing/compare.py:98: AssertionError

To view more test analytics, go to the Test Analytics Dashboard
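The failing assertions stem from torch.testing.assert_close falling back to its dtype-based default tolerances when rtol and atol are None (the "up to 1e-05 allowed" absolute and "up to 0.016 allowed" relative bounds in the bf16 logs, and the tighter 1.3e-06 relative bound for fp32). Below is a minimal sketch of the cosine-similarity comparison pattern the test uses, assuming only PyTorch; the helper name assert_cosine_similarity_close and the loosened tolerance values in the trailing comment are illustrative and not part of this PR:

import torch


def assert_cosine_similarity_close(
    a: torch.Tensor,
    b: torch.Tensor,
    attention_mask: torch.Tensor,
    rtol: float | None = None,
    atol: float | None = None,
) -> None:
    """Compare two (batch, seq, features) tensors via per-token cosine similarity.

    Mirrors the check in assert_model_equivalence: the cosine similarity of
    equivalent outputs should be ~1.0, and padding positions (attention_mask == 0)
    are dropped before the comparison. Leaving rtol/atol as None lets
    torch.testing.assert_close pick its dtype-based defaults (e.g. atol=1e-05,
    rtol=0.016 for bfloat16, as reported in the failures above).
    """
    sim = torch.nn.functional.cosine_similarity(a, b, dim=2)  # (batch, seq)
    sim = sim[attention_mask == 1]  # keep only non-padding tokens
    torch.testing.assert_close(sim, torch.ones_like(sim), rtol=rtol, atol=atol)


# Illustrative only: the bf16 traces report deviations up to ~0.074, so a run on the
# new hardware would pass only with explicitly loosened tolerances, for example:
# assert_cosine_similarity_close(nemo_hidden_state, hf_hidden_state, attention_mask,
#                                rtol=0.08, atol=0.08)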

trvachov force-pushed the trvachov/blackwell-compatibility branch from b7e684a to 9eae95f on February 27, 2025 at 21:41
trvachov force-pushed the trvachov/blackwell-compatibility branch from 9eae95f to e1be4e9 on February 27, 2025 at 22:28