If we hit a failure during a thunder.jit invocation, it would be great to have a debug option where thunder.jit saves the fn to jit, the args, the kwargs, and the thunder.jit arguments it received, so that we can reproduce the failure in a smaller script for a nicer debugging experience.
Motivation
With the thunderFX path, we may have multiple invocations of thunder.jit on different sub-regions of the model and inputs, and any one of these thunder.jit invocations may fail. In that case, it would be great if we could save some debug information so that the failure can be reproduced independently with a smaller script.
NOTE - This may be helpful outside of the thunderFX path as well, where there may be a lot of boilerplate training code around thunder.jit; enabling this option would dump a smaller repro if the thunder.jit invocation fails.
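To make the request concrete, here is a minimal sketch of the behavior, assuming a hypothetical helper `jit_with_repro_dump` that wraps any JIT-like callable. The helper name, the `dump_dir` parameter, the `last_dump` attribute, and the pickle-based file format are all assumptions for illustration; none of them exist in thunder.

```python
import functools
import pickle
import tempfile
from pathlib import Path


def jit_with_repro_dump(jit, fn, dump_dir=None, **jit_kwargs):
    """Hypothetical helper: call `jit(fn, **jit_kwargs)` and dump inputs on failure."""
    dump_dir = Path(dump_dir or tempfile.mkdtemp(prefix="jit_repro_"))
    jitted = jit(fn, **jit_kwargs)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return jitted(*args, **kwargs)
        except Exception:
            # Record everything needed to replay the failing call in isolation.
            payload = {
                "fn_name": getattr(fn, "__qualname__", repr(fn)),
                "args": args,
                "kwargs": kwargs,
                "jit_kwargs": jit_kwargs,
            }
            dump_path = dump_dir / "repro.pkl"
            with open(dump_path, "wb") as f:
                pickle.dump(payload, f)
            wrapper.last_dump = dump_path
            raise  # the original error still propagates to the caller

    wrapper.last_dump = None
    return wrapper
```

With thunder this would presumably be called as `jit_with_repro_dump(thunder.jit, fn)`. A real implementation would also need a way to serialize `fn` itself (for example, the FX graph module in the thunderFX path), which plain pickle does not cover in general.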
Alternatives
Manually insert points where the required details are captured (but this requires some knowledge of the codebase).
Really like the idea of a better debug option.
Ideally, we would consolidate the various debug options, e.g. the record_history option, which does not seem to be the most used (useful?) debug option.
My idea would be to have a single debug argument that is very discoverable.
One option (but certainly not the only one):
maybe a DebugFlag class, similar to ProxyTags, where thunder components and extensions can register their own debug flags,
a single argument debug: bool | set[DebugFlags]=False to the jit, with True translating to all registered debug flags being turned on.
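The proposal above could look roughly like the following sketch. Everything here is hypothetical: the `DebugFlag` class, its registry, the `resolve_debug` helper, and the two example flag names are illustrations of the idea, not existing thunder APIs.

```python
from typing import FrozenSet, Set, Union


class DebugFlag:
    """Hypothetical registry of named debug flags (not an existing thunder class)."""

    _registry = {}

    def __init__(self, name, description=""):
        if name in DebugFlag._registry:
            raise ValueError(f"debug flag {name!r} is already registered")
        self.name = name
        self.description = description
        DebugFlag._registry[name] = self

    def __repr__(self):
        return f"DebugFlag({self.name!r})"

    @classmethod
    def all(cls) -> FrozenSet["DebugFlag"]:
        return frozenset(cls._registry.values())


def resolve_debug(debug: Union[bool, Set[DebugFlag]] = False) -> FrozenSet[DebugFlag]:
    """Translate the single `debug` argument into a set of active flags."""
    if debug is True:
        return DebugFlag.all()  # True turns on every registered flag
    if debug is False:
        return frozenset()
    return frozenset(debug)


# Components and extensions would register their own flags (names are made up).
DUMP_REPRO = DebugFlag("dump_repro", "save fn, args, kwargs on failure")
RECORD_HISTORY = DebugFlag("record_history", "keep a trace history")
```

A jit entry point taking `debug=...` would call `resolve_debug(debug)` once and then check membership, e.g. `if DUMP_REPRO in active_flags:`.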
Sounds good. It seems similar to gc.set_debug, which takes flags/constants. This is good, as users would likely be familiar with such an API.
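For reference, this is how the standard-library API mentioned above is used: `gc.set_debug` takes integer constants combined with bitwise OR, and `gc.get_debug` returns the current setting. The snippet only shows the calling convention; it says nothing about how thunder would implement its own flags.

```python
import gc

# Remember the current setting so it can be restored afterwards.
previous = gc.get_debug()

# Flags are plain integer constants combined with bitwise OR.
gc.set_debug(gc.DEBUG_COLLECTABLE | gc.DEBUG_UNCOLLECTABLE)
enabled = gc.get_debug()

# Individual flags are tested with bitwise AND.
collectable_on = bool(enabled & gc.DEBUG_COLLECTABLE)

# Restore the original setting.
gc.set_debug(previous)
```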
Additional context
Related - #270, #387
cc: @riccardofelluga (who introduced a similar feature for nvFuser region debugging in #387) for ideas and suggestions.
cc @carmocca @apaz-cli