[Bugfix] when parameters has no grad or ScalingParameter has no is_meta property it will crash #135

Merged: 7 commits, Nov 30, 2023

@tocean tocean commented Nov 24, 2023

Description
Fix two bugs reported by 1P customers:

  • When some parameters have no gradient, _get_gradients_for_reduction creates a torch.cuda.ByteTensor as the gradient, which split_half_float_double_sparse does not support.
  • PyTorch 2.1 accesses param.is_meta, which ScalingParameter does not define, so the access crashes.
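
For readers without the diff at hand, the two failure modes above can be sketched without any dependencies. This is not the patch from this PR: FakeParam, FakeScalingParameter, grad_dtypes_for_reduction, is_meta, the float32 fallback, and the set of supported dtypes are all hypothetical stand-ins chosen for illustration. The sketch only shows the idea of never handing a uint8 placeholder gradient to the reducer, and of exposing (or defensively reading) an is_meta attribute.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: dtypes the bucketing step can handle; "uint8" (ByteTensor) is not among them.
SUPPORTED_GRAD_DTYPES = {"float16", "bfloat16", "float32", "float64"}


@dataclass
class FakeParam:
    """Hypothetical stand-in for a parameter; dtypes are plain strings."""
    name: str
    dtype: str                        # storage dtype of the parameter
    grad_dtype: Optional[str] = None  # None means backward produced no gradient


class FakeScalingParameter(FakeParam):
    """Hypothetical stand-in for a parameter wrapper that exposes `is_meta`."""

    @property
    def is_meta(self) -> bool:
        # Assumption: the wrapper never lives on the meta device.
        return False


def grad_dtypes_for_reduction(params: List[FakeParam], fallback: str = "float32") -> List[str]:
    """Return the dtype each gradient would have when handed to the reducer."""
    dtypes = []
    for p in params:
        if p.grad_dtype is not None:
            dtypes.append(p.grad_dtype)   # a real gradient exists, keep its dtype
        elif p.dtype in SUPPORTED_GRAD_DTYPES:
            dtypes.append(p.dtype)        # zero placeholder can reuse the parameter dtype
        else:
            dtypes.append(fallback)       # avoid creating a uint8 placeholder
    return dtypes


def is_meta(param: object) -> bool:
    """Read `is_meta` defensively so objects without it do not raise AttributeError."""
    return bool(getattr(param, "is_meta", False))


params = [
    FakeParam("w", dtype="float16", grad_dtype="float16"),
    FakeScalingParameter("frozen_fp8", dtype="uint8"),  # no gradient was produced
]
print(grad_dtypes_for_reduction(params))
print(is_meta(params[0]), is_meta(params[1]))
```

Whether a missing gradient should be replaced by a zero placeholder or skipped depends on how the reducer buckets tensors across ranks; the sketch only models the placeholder variant.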

@wkcn wkcn left a comment

It needs a unittest. Thanks!

@tocean tocean changed the title [Bugfix] fix bug when some parameters have no grad in DeepSpeed [Bugfix] fix bug in DeepSpeed when parameters has no grad or parameter group has no high precision parameters Nov 27, 2023
@tocean tocean changed the title [Bugfix] fix bug in DeepSpeed when parameters has no grad or parameter group has no high precision parameters [Bugfix] parameters has no grad or parameter group has no high precision parameters will crash Nov 27, 2023

tocean commented Nov 29, 2023

It needs a unittest. Thanks!

I have updated the related unit test (UT) to cover this corner case. Thanks.

@tocean tocean requested a review from guoshzhao November 29, 2023 12:13

@wkcn wkcn left a comment

LGTM. Thanks!

@tocean tocean changed the title [Bugfix] parameters has no grad or parameter group has no high precision parameters will crash [Bugfix] parameters has no grad or ScalingParameter has no is_metawill crash Nov 30, 2023
@tocean tocean changed the title [Bugfix] parameters has no grad or ScalingParameter has no is_metawill crash [Bugfix] when parameters has no grad or ScalingParameter has no is_meta property it will crash Nov 30, 2023
@tocean tocean enabled auto-merge (squash) November 30, 2023 03:58
@tocean tocean merged commit aef18eb into main Nov 30, 2023
9 checks passed
@tocean tocean deleted the yuxiang/ds_bugfix branch November 30, 2023 04:38
3 participants