Fix randomness for threading #7925
base: dev
/build
monai/transforms/compose.py
Outdated
if isinstance(_transform, ThreadUnsafe):
    if isinstance(_transform, Randomizable):
        # update the random state before deepcopy, otherwise there is no randomness
        _transform.randomize(data)
We can definitely update the random state here, but I guess the issue here is that if the transform is thread unsafe, we can't guarantee that the same transform will be performed on all keys, which may cause problems.
As far as I understand, the state is frozen for a single thread once the Transform has been deep-copied. Since all keys are processed by this copied Transform, a consistent state is guaranteed.
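To make the two points above concrete, here is a minimal sketch using only the standard library. `ToyRandomTransform` is a hypothetical stand-in, not MONAI code, and `random.Random` plays the role of `self.R` (an `np.random.RandomState` in MONAI). It shows that a deep copy keeps all keys consistent, and also that copying without first advancing the shared state gives every copy the same draw.

```python
import copy
import random


class ToyRandomTransform:
    """Hypothetical stand-in for a Randomizable transform; not MONAI code."""

    def __init__(self, seed=0):
        # Plays the role of self.R (np.random.RandomState in MONAI).
        self.R = random.Random(seed)

    def __call__(self, data):
        # Draw one offset, then apply it to every key.
        offset = self.R.random()
        return {key: value + offset for key, value in data.items()}


shared = ToyRandomTransform(seed=0)
sample = {"image": 0.0, "label": 0.0}

# Each "thread" deep-copies the shared transform without advancing it first.
results = [copy.deepcopy(shared)(sample) for _ in range(3)]

# Within one copy, all keys receive the same offset ...
keys_consistent = all(r["image"] == r["label"] for r in results)
# ... but every copy starts from the same state, so every call draws the same offset.
copies_identical = all(r == results[0] for r in results)
print(keys_consistent, copies_identical)
```

Both flags come out true in this toy setup, which matches the behavior reported in the linked issue: consistent across keys, but no randomness across calls.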
Actually, I realized that .randomize() does not necessarily update the random state self.R (cf. monai.transforms.transform.RandomizableTransform). Therefore the correct way here would be to call _transform.set_random_state(), which is implemented in the Randomizable base class and updates self.R.
MONAI/monai/transforms/transform.py
Line 188 in 59a7211
def set_random_state(self, seed: int | None = None, state: np.random.RandomState | None = None) -> Randomizable:
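The difference between the two calls can be sketched as follows. This is an illustration under assumptions, not the MONAI implementation: `ToyRandomizable` and `draw_from_copy` are hypothetical, `random.Random` stands in for `np.random.RandomState`, and the argument-free `set_random_state()` is assumed to install a fresh, entropy-seeded generator, as I understand the real method to do.

```python
import copy
import random


class ToyRandomizable:
    """Hypothetical stand-in for Randomizable; not the MONAI class."""

    def __init__(self):
        self.R = random.Random(0)
        self.flag = False

    def set_random_state(self, seed=None):
        # With seed=None, random.Random() seeds itself from OS entropy.
        self.R = random.Random(seed)
        return self

    def randomize(self, data=None):
        # A subclass may implement randomize() without drawing from self.R.
        self.flag = True


def draw_from_copy(transform):
    # Mimics a worker thread: copy the shared transform, then draw from the copy.
    return copy.deepcopy(transform).R.random()


t = ToyRandomizable()

t.randomize()
a = draw_from_copy(t)
t.randomize()
b = draw_from_copy(t)
same_after_randomize = a == b  # self.R was never advanced

t.set_random_state()
c = draw_from_copy(t)
t.set_random_state()
d = draw_from_copy(t)
differ_after_reseed = c != d  # self.R was replaced before each copy
```

In this sketch, only `set_random_state()` guarantees that successive copies start from different states; whether `randomize()` does so depends on how the subclass implements it.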
@KumoLiu are there more transforms which inherit directly from ThreadUnsafe? I can only find Randomizable in the monai codebase, which would be covered here.
No, only Randomizable, but all random transforms inherit from RandomizableTransform. I'm not sure whether this change also works well with invert. We may need to check that as well.
I'd like @atbenmurray to have a chance to review this before approving please.
Hi folks. Thanks @marcus-wirtz-snkeos for taking the time to raise the issue and PR. I need to take a careful look at this fix. From a design standpoint, we are very much focused on an "as if the PyTorch team wrote it" design philosophy and I need to destructively test the change from this standpoint.
Signed-off-by: marcus.wirtz <marcus.wirtz@snkeos.com>
Thanks everyone for the amazing work on MONAI. In my opinion this should be forbidden by default and throw an error that needs to be disabled with a flag, to prevent users from accidentally stumbling on this. The problem with the proposed solution is that there would be no reproducibility, since a new random state is used every time. That is fine in my opinion if users have to use a flag to manually enable this behavior and turn off threading when they need reproducibility. But in the future, the whole random generation of MONAI needs a refactor that solves the problem of multi-threading and randomness (see #7582).
Thanks for bringing this up @johnzielke. I'll take a look at these items also.
@johnzielke thanks for the feedback, I fully agree. This fix can only be a temporary one, since the earlier introduced deepcopy() is problematic per se. I verified with local batch generation that there is no randomness for the Originally I tried to use
Should work now, the issue was in some of my custom Transforms indeed not implementing
@atbenmurray @ericspod what is the state of this PR?
@lukas-folle-snkeos, I'm refamiliarizing myself with it. Ideally, we'd like to do more to improve the randomness for threading, but if this change isn't breaking any scenarios, then we can go ahead with it and think about that subsequently.
I don't think anyone relies on the current non-randomization behavior. I think there is an issue with the proposed approach, which I think are both part of randomize() not being the "correct" function in this case
This makes sure that each iteration uses a different random state. You do not have reproducibility though, since the individual threads might not be calling this in a reproducible order.
If I understand the rationale correctly, I think that calling randomize on the shared transform is the point of this modification. Now, this can absolutely cause race conditions. One source of race conditions is mitigated by the fact that This can be fixed by locking the section that calls randomize and deepcopies the transform. I made a suggested code change in the review.
I think it comes down to one of three choices:
I'm not a huge fan of 1, as it relies on side-effects of the way transforms are implemented. That said, I'm also not a huge fan of 2, because we are overwriting the random state that has been set by the user with new random state instances. This means that if the user provides a RandomState-like mock for testing purposes, for example, it is defeated by our replacing their RandomState-like object with an actual RandomState. I've had this problem myself before. It is definitely true that reproducibility is compromised by threading / multiprocessing of randomized augmentation pipelines. Given that the mutation of a shared RandomState across threads already makes every run non-reproducible from an augmentation perspective, maybe it doesn't matter whether we replace random states, but then, as I mentioned, there are other reasons not to do it this way. I think a gold standard solution would involve random states / seeds being assigned up front, at the point that the pipeline is instantiated and the number of threads is set. WDYT?
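The "seeds assigned up front" idea could look roughly like the sketch below. It is only an illustration with the standard library: `make_worker_seeds`, `run_worker` and `run_pipeline` are hypothetical names, and `random.Random` stands in for `np.random.RandomState`. Seeds are derived once from a base seed while execution is still single-threaded, and each worker then owns its own generator.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def make_worker_seeds(base_seed, num_workers):
    """Derive one seed per worker, once, in the single-threaded setup phase."""
    seeder = random.Random(base_seed)
    return [seeder.getrandbits(32) for _ in range(num_workers)]


def run_worker(seed, num_items=3):
    # Each worker owns its generator, so thread scheduling cannot change its draws.
    rng = random.Random(seed)
    return [rng.random() for _ in range(num_items)]


def run_pipeline(base_seed, num_workers=4):
    seeds = make_worker_seeds(base_seed, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() yields results in input order, regardless of completion order.
        return list(pool.map(run_worker, seeds))


first = run_pipeline(base_seed=123)
second = run_pipeline(base_seed=123)
print(first == second)
```

Two runs with the same base seed produce identical draws here because no generator is shared between threads. A real pipeline would additionally need a deterministic assignment of data items to workers, which this sketch does not address.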
monai/transforms/compose.py
Outdated
if isinstance(_transform, ThreadUnsafe):
    if isinstance(_transform, Randomizable):
        # update the random state before deepcopy, otherwise there is no randomness
        _transform.set_random_state()
    _transform = deepcopy(_transform)
I think that this needs a lock
Thanks for your analysis @atbenmurray, and I agree that a lock would be a good idea here. I think the source of race conditions when using randomize on the "main-thread" instance mainly results from the fact that transforms sometimes rely on instance attributes in the randomize() method. This also means that if expensive calculations are performed in this step (i.e. calculating some information from the input), they would be single-threaded here. Regardless of the option, I think we should add a single-time warning explaining whatever caveats the solution has.
Co-authored-by: Ben Murray <ben.murray@gmail.com> Signed-off-by: Marcus Wirtz <24655255+marcus-wirtz-snkeos@users.noreply.github.com>
@@ -106,8 +106,13 @@ def execute_compose(
        return data

    for _transform in transforms[start:end]:
        if threading:
            _transform = deepcopy(_transform) if isinstance(_transform, ThreadUnsafe) else _transform
            with lock:
We'll still need to create the lock object somewhere and get it to this function. Note that the lock must be created somewhere where only a single thread of execution is occurring
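One way this could be wired up is sketched below, under assumptions. `ToyRandomizable` and `apply_threadsafe` are hypothetical and only illustrate the pattern; how the lock would actually reach `execute_compose` is left open. The lock is created once, before any worker thread exists, and is passed explicitly to the function that reseeds and copies the shared transform.

```python
import copy
import random
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyRandomizable:
    """Hypothetical stand-in for a ThreadUnsafe, Randomizable transform."""

    def __init__(self):
        self.R = random.Random(0)

    def set_random_state(self):
        # Fresh, entropy-seeded state (stand-in for a new RandomState).
        self.R = random.Random()
        return self

    def __call__(self, x):
        return x + self.R.random()


def apply_threadsafe(transform, x, lock):
    # Reseed and copy as one atomic step, so no thread copies the shared
    # transform while another thread is replacing its random state.
    with lock:
        transform.set_random_state()
        local = copy.deepcopy(transform)
    # The actual work runs outside the lock, on the private copy.
    return local(x)


shared = ToyRandomizable()
lock = threading.Lock()  # created once, before any worker thread starts

with ThreadPoolExecutor(max_workers=4) as pool:
    outputs = list(pool.map(lambda x: apply_threadsafe(shared, x, lock), [0.0] * 8))

print(len(set(outputs)))
```

Only the reseed-and-copy step is serialized in this sketch; the transform itself still runs in parallel on each thread's private copy.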
Description
Fixes #7922 by updating the random state of the Randomizable transform BEFORE copying the transforms. In the current implementation, self.randomize() is only called within the __call__() function, so the random state is only updated inside the copy.
Types of changes
./runtests.sh -f -u --net --coverage
./runtests.sh --quick --unittests --disttests
make html command in the docs/ folder.