
Cannot Load moonshotai/Moonlight-16B-A3B #36385

Open
Shomvel opened this issue Feb 25, 2025 · 1 comment

Shomvel commented Feb 25, 2025

System Info

  • transformers version: 4.47.1
  • Platform: Linux-5.15.0-124-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.26.1
  • Safetensors version: 0.4.5
  • Accelerate version: 1.4.0
  • Accelerate config: - compute_environment: LOCAL_MACHINE
    - distributed_type: DEEPSPEED
    - use_cpu: False
    - debug: True
    - num_processes: 8
    - machine_rank: 0
    - num_machines: 1
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - enable_cpu_affinity: False
    - deepspeed_config: {'deepspeed_config_file': './training/configs/deepspeed.json', 'zero3_init_flag': False}
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env: []
    - dynamo_config: {'dynamo_backend': 'INDUCTOR'}
  • PyTorch version (GPU?): 2.6.0+cu124 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "moonshotai/Moonlight-16B-A3B/"
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, trust_remote_code=True
)
The error is:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[36], line 10
      8 # Load the model and tokenizer
      9 model_name = "/gpfs/models/huggingface.co/moonshotai/Moonlight-16B-A3B/"
---> 10 model = AutoModelForCausalLM.from_pretrained(
     11     model_name, torch_dtype=torch.float16, trust_remote_code=True
     12 )

File ~/experiments/.venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:559, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    557     cls.register(config.__class__, model_class, exist_ok=True)
    558     model_class = add_generation_mixin_to_remote_model(model_class)
--> 559     return model_class.from_pretrained(
    560         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    561     )
    562 elif type(config) in cls._model_mapping.keys():
    563     model_class = _get_model_class(config, cls._model_mapping)

File ~/experiments/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py:4264, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, weights_only, *model_args, **kwargs)
   4254         load_contexts.append(tp_device)
   4256     with ContextManagers(load_contexts):
   4257         (
   4258             model,
   4259             missing_keys,
   4260             unexpected_keys,
   4261             mismatched_keys,
   4262             offload_index,
   4263             error_msgs,
-> 4264         ) = cls._load_pretrained_model(
   4265             model,
   4266             state_dict,
   4267             loaded_state_dict_keys,  # XXX: rename?
   4268             resolved_archive_file,
   4269             pretrained_model_name_or_path,
   4270             ignore_mismatched_sizes=ignore_mismatched_sizes,
   4271             sharded_metadata=sharded_metadata,
   4272             _fast_init=_fast_init,
   4273             low_cpu_mem_usage=low_cpu_mem_usage,
   4274             device_map=device_map,
   4275             offload_folder=offload_folder,
   4276             offload_state_dict=offload_state_dict,
   4277             dtype=torch_dtype,
   4278             hf_quantizer=hf_quantizer,
   4279             keep_in_fp32_modules=keep_in_fp32_modules,
   4280             gguf_path=gguf_path,
   4281             weights_only=weights_only,
   4282         )
   4284 # make sure token embedding weights are still tied if needed
   4285 model.tie_weights()

File ~/experiments/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py:4755, in PreTrainedModel._load_pretrained_model(cls, model, state_dict, loaded_keys, resolved_archive_file, pretrained_model_name_or_path, ignore_mismatched_sizes, sharded_metadata, _fast_init, low_cpu_mem_usage, device_map, offload_folder, offload_state_dict, dtype, hf_quantizer, keep_in_fp32_modules, gguf_path, weights_only)
   4748 if (
   4749     device_map is not None
   4750     and hf_quantizer is not None
   4751     and hf_quantizer.quantization_config.quant_method == QuantizationMethod.TORCHAO
   4752     and hf_quantizer.quantization_config.quant_type == "int4_weight_only"
   4753 ):
   4754     map_location = torch.device([d for d in device_map.values() if d not in ["cpu", "disk"]][0])
-> 4755 state_dict = load_state_dict(
   4756     shard_file, is_quantized=is_quantized, map_location=map_location, weights_only=weights_only
   4757 )
   4759 # Mistmatched keys contains tuples key/shape1/shape2 of weights in the checkpoint that have a shape not
   4760 # matching the weights in the model.
   4761 mismatched_keys += _find_mismatched_keys(
   4762     state_dict,
   4763     model_state_dict,
   (...)
   4767     ignore_mismatched_sizes,
   4768 )

File ~/experiments/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py:506, in load_state_dict(checkpoint_file, is_quantized, map_location, weights_only)
    504 with safe_open(checkpoint_file, framework="pt") as f:
    505     metadata = f.metadata()
--> 506 if metadata.get("format") not in ["pt", "tf", "flax", "mlx"]:
    507     raise OSError(
    508         f"The safetensors archive passed at {checkpoint_file} does not contain the valid metadata. Make sure "
    509         "you save your model with the `save_pretrained` method."
    510     )
    511 return safe_load_file(checkpoint_file)

AttributeError: 'NoneType' object has no attribute 'get'

I found that none of the safetensors files contain metadata, which you can verify with:

from safetensors import safe_open

for i in range(1, 28):
    with safe_open(f"/model/path/moonshotai/Moonlight-16B-A3B/model-{i}-of-27.safetensors", framework="pt") as f:
        print("Metadata:", f.metadata())

Expected behavior

The model loads.

Shomvel added the bug label Feb 25, 2025

Rocketknight1 (Member) commented:
hi @Shomvel, have you raised the issue with the repository owners? I'm not sure this is a bug in transformers, because it's a custom code model with its own weights!
