-
Notifications
You must be signed in to change notification settings - Fork 564
[Refactor] Refactor model nparams calculation into single static function #1767
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
from torchtitan.tools.logging import logger | ||
|
||
|
||
def get_dense_model_nparams_and_flops( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel this function for dense can be entirely covered by the MoE one. Should we keep it for readability? I'm OK with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made separate functions because in moe model nparams function, there are several places accessing model_args.moe_args
. For our dense model definition, we don't have this field. So if we want to merge them, we need to add several checks like if hasattr(model_args, "xxx")
. Technically we can merge them, but for readability I make the separate for now
# If weight tying is enabled, subtract embedding parameters from total count | ||
if hasattr(model_args, "enable_weight_tying") and model_args.enable_weight_tying: | ||
nparams = nparams - nparams_embedding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May not be needed by real models in torchtitan, but I feel this should be added to the dense part as well, because later on new dense + weight tying model could be added so if they reuse the dense fn it's still correct. An example would be Llama 3.2 1B / 3B.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, one minor thing is that we seem to have too many utils.py, lol. In the future we may need to revisit some of the utils.py, otherwise those are quickly becoming a dump ground.
As titled, remove repeated copy