Description
Hello, thanks for your great work!
I encountered an error when loading the `state_dict` of the O1-nano model: the vocab size in your released checkpoint appears to be 39 rather than the 40 the current model code expects. Could you please check it out? Note that the model still doesn't work if I simply change `vocab_size` to 39.
```
RuntimeError: Error(s) in loading state_dict for O1Model:
    size mismatch for embed.weight: copying a param with shape torch.Size([39, 64]) from checkpoint, the shape in current model is torch.Size([40, 64]).
    size mismatch for completion_decoder.weight: copying a param with shape torch.Size([39, 64]) from checkpoint, the shape in current model is torch.Size([40, 64]).
    size mismatch for completion_decoder.bias: copying a param with shape torch.Size([39]) from checkpoint, the shape in current model is torch.Size([40]).
    size mismatch for reasoning_decoder.weight: copying a param with shape torch.Size([39, 64]) from checkpoint, the shape in current model is torch.Size([40, 64]).
    size mismatch for reasoning_decoder.bias: copying a param with shape torch.Size([39]) from checkpoint, the shape in current model is torch.Size([40]).
```
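For reference, here is how I confirmed that every mismatched parameter disagrees in exactly the vocab dimension. This is a minimal sketch: `find_shape_mismatches` is a hypothetical helper, and the shape dicts below are hard-coded from the traceback rather than read from the real checkpoint (with PyTorch you would build them via `{k: tuple(v.shape) for k, v in torch.load(path, map_location="cpu").items()}` and the same for `model.state_dict()`):

```python
def find_shape_mismatches(ckpt_shapes, model_shapes):
    """Return {param_name: (checkpoint_shape, model_shape)} for every
    parameter whose shapes disagree between checkpoint and model."""
    mismatches = {}
    for name, ckpt_shape in ckpt_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and model_shape != ckpt_shape:
            mismatches[name] = (ckpt_shape, model_shape)
    return mismatches

# Shapes copied from the traceback: the checkpoint was saved with a
# 39-entry vocabulary, while the current model code builds with 40.
checkpoint = {
    "embed.weight": (39, 64),
    "completion_decoder.weight": (39, 64),
    "completion_decoder.bias": (39,),
    "reasoning_decoder.weight": (39, 64),
    "reasoning_decoder.bias": (39,),
}
model = {
    "embed.weight": (40, 64),
    "completion_decoder.weight": (40, 64),
    "completion_decoder.bias": (40,),
    "reasoning_decoder.weight": (40, 64),
    "reasoning_decoder.bias": (40,),
}

for name, (ckpt, cur) in find_shape_mismatches(checkpoint, model).items():
    print(f"{name}: checkpoint {ckpt} vs model {cur}")
```

All five mismatches differ only in the first (vocab) dimension, 39 vs 40, which is why I suspect the released weights were trained with a different `vocab_size` than the one in the current code.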