
[WIP] Blackwell compatibility changes #707

Open

trvachov wants to merge 1 commit into main from trvachov/blackwell-compatibility

Conversation

trvachov (Collaborator)

Description

Blackwell compatibility.

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels:

Note

By default, the notebooks validation tests are skipped unless explicitly enabled.

Usage

TODO: Add code snippet

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

codecov-commenter commented Feb 27, 2025

❌ 10 Tests Failed:

Tests completed: 916 · Failed: 10 · Passed: 906 · Skipped: 18
View the top 3 failed test(s) by shortest run time
sub-packages/bionemo-esm2/tests/bionemo/esm2/model/test_model.py::test_model_equivalence_with_huggingface_8m[bf16]
Stack Traces | 1.31s run time
precision = 'bf16'

    @pytest.mark.parametrize("precision", ["fp32", "bf16", "fp16", "bf16-mixed"])
    def test_model_equivalence_with_huggingface_8m(precision):
        model_tag = "facebook/esm2_t6_8M_UR50D"
        ckpt_path = load("esm2/8m:2.0")
        with megatron_parallel_state_utils.distributed_model_parallel_state():
>           assert_model_equivalence(ckpt_path, model_tag, precision=precision)

.../esm2/model/test_model.py:183: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ckpt_path = PosixPath('.../github/home/.cache/bionemo/2957b2c36d5978d0f595d6f1b72104b312621cf0329209086537b613c1c96d16-esm2_hf_converted_8m_checkpoint.tar.gz.untar')
model_tag = 'facebook/esm2_t6_8M_UR50D', precision = 'bf16', rtol = None
atol = None

    def assert_model_equivalence(
        ckpt_path: Path | str,
        model_tag: str,
        precision: PrecisionTypes = "fp32",
        rtol: float | None = None,
        atol: float | None = None,
    ) -> None:
        """Testing utility to compare the outputs of a NeMo2 checkpoint to the original HuggingFace model weights.
    
        Compares the cosine similarity of the logit and hidden state outputs of a NeMo2 model checkpoint to the outputs of
        the corresponding HuggingFace model.
    
        Args:
            ckpt_path: A path to a NeMo2 checkpoint for an ESM-2 model.
            model_tag: The HuggingFace model tag for the model to compare against.
            precision: The precision type to use for the comparison. Defaults to "fp32".
            rtol: The relative tolerance to use for the comparison. Defaults to None, which chooses the tolerance based on
                the precision.
            atol: The absolute tolerance to use for the comparison. Defaults to None, which chooses the tolerance based on
                the precision.
        """
        tokenizer = get_tokenizer()
    
        test_proteins = [
            "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLA",
            "MKTVRQERLKSI<mask>RILERSKEPVSGAQLAEELS<mask>SRQVIVQDIAYLRSLGYN<mask>VATPRGYVLAGG",
        ]
        tokens = tokenizer(test_proteins, return_tensors="pt", padding=True, truncation=True).to("cuda")
        input_ids = tokens["input_ids"]
        attention_mask = tokens["attention_mask"]
    
        dtype = get_autocast_dtype(precision)
        nemo_config = ESM2Config(
            initial_ckpt_path=str(ckpt_path),
            include_embeddings=True,
            include_hiddens=True,
            params_dtype=dtype,
            pipeline_dtype=dtype,
            autocast_dtype=dtype,
            bf16=dtype is torch.bfloat16,
            fp16=dtype is torch.float16,
        )
    
        nemo_model = nemo_config.configure_model(tokenizer).to("cuda").eval()
    
        if dtype is torch.float16 or dtype is torch.bfloat16:
            nemo_model = Float16Module(nemo_config, nemo_model)
    
        nemo_output = nemo_model(input_ids, attention_mask)
        nemo_logits = nemo_output["token_logits"].transpose(0, 1).contiguous()[..., : tokenizer.vocab_size]
        nemo_hidden_state = nemo_output["hidden_states"]
    
        del nemo_model
        gc.collect()
        torch.cuda.empty_cache()
    
        hf_model = AutoModelForMaskedLM.from_pretrained(model_tag, torch_dtype=get_autocast_dtype(precision)).cuda().eval()
        hf_output_all = hf_model(input_ids, attention_mask, output_hidden_states=True)
        hf_hidden_state = hf_output_all.hidden_states[-1]
    
        # Rather than directly comparing the logit or hidden state tensors, we compare their cosine similarity. These
        # should be essentially 1 if the outputs are equivalent, but is less sensitive to small numerical differences.
        # We don't care about the padding tokens, so we only compare the non-padding tokens.
        logit_similarity = torch.nn.functional.cosine_similarity(nemo_logits, hf_output_all.logits, dim=2)
        logit_similarity = logit_similarity[attention_mask == 1]
    
        hidden_state_similarity = torch.nn.functional.cosine_similarity(nemo_hidden_state, hf_hidden_state, dim=2)
        hidden_state_similarity = hidden_state_similarity[attention_mask == 1]
    
        torch.testing.assert_close(logit_similarity, torch.ones_like(logit_similarity), rtol=rtol, atol=atol)
>       torch.testing.assert_close(hidden_state_similarity, torch.ones_like(hidden_state_similarity), rtol=rtol, atol=atol)
E       AssertionError: Tensor-likes are not close!
E       
E       Mismatched elements: 125 / 132 (94.7%)
E       Greatest absolute difference: 0.07421875 at index (15,) (up to 1e-05 allowed)
E       Greatest relative difference: 0.07421875 at index (15,) (up to 0.016 allowed)

.../local/lib/python3.12.../esm2/testing/compare.py:99: AssertionError
sub-packages/bionemo-esm2/tests/bionemo/esm2/model/test_model.py::test_model_equivalence_with_huggingface_8m[bf16-mixed]
Stack Traces | 1.31s run time
precision = 'bf16-mixed'

    (stack trace identical to the [bf16] case above: AssertionError at compare.py:99, 125/132 elements mismatched, greatest absolute difference 0.07421875 against the default bf16 tolerances)
sub-packages/bionemo-esm2/tests/bionemo/esm2/model/test_model.py::test_model_equivalence_with_huggingface_8m[fp32]
Stack Traces | 1.38s run time
precision = 'fp32'

    @pytest.mark.parametrize("precision", ["fp32", "bf16", "fp16", "bf16-mixed"])
    def test_model_equivalence_with_huggingface_8m(precision):
        model_tag = "facebook/esm2_t6_8M_UR50D"
        ckpt_path = load("esm2/8m:2.0")
        with megatron_parallel_state_utils.distributed_model_parallel_state():
>           assert_model_equivalence(ckpt_path, model_tag, precision=precision)

.../esm2/model/test_model.py:183: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

ckpt_path = PosixPath('.../github/home/.cache/bionemo/2957b2c36d5978d0f595d6f1b72104b312621cf0329209086537b613c1c96d16-esm2_hf_converted_8m_checkpoint.tar.gz.untar')
model_tag = 'facebook/esm2_t6_8M_UR50D', precision = 'fp32', rtol = None
atol = None

    (body of assert_model_equivalence identical to the traces above, up to the failing assertion)
>       torch.testing.assert_close(logit_similarity, torch.ones_like(logit_similarity), rtol=rtol, atol=atol)
E       AssertionError: Tensor-likes are not close!
E       
E       Mismatched elements: 132 / 132 (100.0%)
E       Greatest absolute difference: 0.003114163875579834 at index (124,) (up to 1e-05 allowed)
E       Greatest relative difference: 0.003114163875579834 at index (124,) (up to 1.3e-06 allowed)

.../local/lib/python3.12.../esm2/testing/compare.py:98: AssertionError

To view more test analytics, go to the Test Analytics Dashboard
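The failing assertions stem from torch.testing.assert_close falling back to its dtype-based default tolerances when rtol and atol are None (the "up to 1e-05 allowed" absolute and "up to 0.016 allowed" relative bounds in the bf16 logs, and the tighter 1.3e-06 relative bound for fp32). Below is a minimal sketch of the cosine-similarity comparison pattern the test uses, assuming only PyTorch; the helper name assert_cosine_similarity_close and the loosened tolerance values in the trailing comment are illustrative and not part of this PR:

import torch


def assert_cosine_similarity_close(
    a: torch.Tensor,
    b: torch.Tensor,
    attention_mask: torch.Tensor,
    rtol: float | None = None,
    atol: float | None = None,
) -> None:
    """Compare two (batch, seq, features) tensors via per-token cosine similarity.

    Mirrors the check in assert_model_equivalence: the cosine similarity of
    equivalent outputs should be ~1.0, and padding positions (attention_mask == 0)
    are dropped before the comparison. Leaving rtol/atol as None lets
    torch.testing.assert_close pick its dtype-based defaults (e.g. atol=1e-05,
    rtol=0.016 for bfloat16, as reported in the failures above).
    """
    sim = torch.nn.functional.cosine_similarity(a, b, dim=2)  # (batch, seq)
    sim = sim[attention_mask == 1]  # keep only non-padding tokens
    torch.testing.assert_close(sim, torch.ones_like(sim), rtol=rtol, atol=atol)


# Illustrative only: the bf16 traces report deviations up to ~0.074, so a run on the
# new hardware would pass only with explicitly loosened tolerances, for example:
# assert_cosine_similarity_close(nemo_hidden_state, hf_hidden_state, attention_mask,
#                                rtol=0.08, atol=0.08)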

trvachov force-pushed the trvachov/blackwell-compatibility branch from b7e684a to 9eae95f on February 27, 2025 at 21:41
trvachov force-pushed the trvachov/blackwell-compatibility branch from 9eae95f to e1be4e9 on February 27, 2025 at 22:28