Save debug information for a smaller reproducer if thunder.jit fails #1214

kshitij12345 · 2024-09-30T09:25:00Z

🚀 Feature

If a thunder.jit invocation fails, it would be great to have a debug option that makes thunder.jit save the fn to jit and the args and kwargs (plus the thunder.jit arguments) it received, so that we can reproduce the failure in a smaller script for a nicer debugging experience.
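A minimal sketch of what such an option could do, written as an external wrapper rather than as part of thunder itself. Everything here is an assumption for illustration: `jit_with_repro_dump`, `dump_dir`, and the pickle payload layout are hypothetical names, and `jit` is a stand-in parameter for thunder.jit so the sketch does not depend on any particular thunder API. It also assumes the inputs are picklable and records the function by name only.

```python
import pickle
from pathlib import Path


def jit_with_repro_dump(jit, fn, *, dump_dir, **jit_kwargs):
    """Wrap `jit(fn, **jit_kwargs)` so that a failing call saves its inputs.

    `jit` is a stand-in for thunder.jit (hypothetical wiring, not thunder's API).
    """
    jitted = jit(fn, **jit_kwargs)
    fn_name = getattr(fn, "__name__", "fn")

    def wrapper(*args, **kwargs):
        try:
            return jitted(*args, **kwargs)
        except Exception:
            # Record everything needed to replay this one invocation.
            payload = {
                "fn_name": fn_name,
                "args": args,
                "kwargs": kwargs,
                "jit_kwargs": jit_kwargs,
            }
            path = Path(dump_dir) / f"{fn_name}_repro.pkl"
            with open(path, "wb") as f:
                pickle.dump(payload, f)
            # Re-raise so the original failure is still visible to the caller.
            raise

    return wrapper
```

A reproducer script would then load the pickle and call the jitted function again with the saved `args`, `kwargs`, and `jit_kwargs`. Saving the function itself (for example an FX graph module from the thunderFX path) would need a separate serialization step that this sketch leaves out.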

Motivation

With the thunderFX path, we may have multiple invocations of thunder.jit on different sub-regions of the model and their inputs, and any one of these invocations may fail. In this case, it would be great if we could save some debug information so that the failure can be reproduced independently with a smaller script.

NOTE - This may be helpful outside of the thunderFX path as well, where there may be a lot of boilerplate training code around thunder.jit; enabling this option would dump a smaller repro if the thunder.jit invocation fails.

Alternatives

Manually insert points where the required details are captured (but this requires some knowledge of the codebase).

Additional context

Related - #270, #387

cc: @riccardofelluga (who introduced a similar feature for nvFuser region debugging in #387) for ideas and suggestions.

cc @carmocca @apaz-cli


t-vi · 2024-09-30T13:55:09Z

Really like the idea of a better debug option.
Ideally, we would consolidate the various debug options, e.g. the record_history option, which does not seem to be the most used (useful?) debug option.
My idea would be to have a single debug argument and a very discoverable way.
One (but certainly not the only):

maybe a DebugFlag class thing similar to the ProxyTags where thunder components and extensions can register their own debug flags,
a single argument debug: bool | set[DebugFlags]=False to the jit, with True translating to all registered debug flags being turned on.

WDYT?
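To make the proposal above concrete, here is a rough sketch of a flag registry plus the resolution of a `debug: bool | set[DebugFlag]` argument. It is not modeled on the actual ProxyTags implementation; `register_debug_flag`, `resolve_debug`, and the two example flags are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Union


@dataclass(frozen=True)
class DebugFlag:
    """A named debug flag; frozen so instances are hashable and can live in sets."""
    name: str
    description: str = ""


# Global registry that components and extensions add their flags to.
_REGISTRY: Dict[str, DebugFlag] = {}


def register_debug_flag(name: str, description: str = "") -> DebugFlag:
    if name in _REGISTRY:
        raise ValueError(f"debug flag {name!r} is already registered")
    flag = DebugFlag(name, description)
    _REGISTRY[name] = flag
    return flag


def resolve_debug(debug: Union[bool, Set[DebugFlag]] = False) -> FrozenSet[DebugFlag]:
    """Turn the single `debug` argument into the set of enabled flags."""
    if debug is True:
        # True means every registered flag is turned on.
        return frozenset(_REGISTRY.values())
    if debug is False:
        return frozenset()
    unknown = set(debug) - set(_REGISTRY.values())
    if unknown:
        raise ValueError(f"unregistered debug flags: {sorted(f.name for f in unknown)}")
    return frozenset(debug)


# Example flags (hypothetical): one per debug feature discussed in this issue.
SAVE_REPRO = register_debug_flag("save_repro", "dump inputs when jit fails")
RECORD_HISTORY = register_debug_flag("record_history", "keep trace history")
```

With this shape, `resolve_debug(True)` enables both example flags, while `resolve_debug({SAVE_REPRO})` enables only the reproducer dump. Listing the registry contents would be one way to make the available flags discoverable.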

mruberry · 2024-09-30T15:30:27Z

fyi @tfogal, who's been thinking about improved reproduction tooling, too

kshitij12345 · 2024-10-01T09:50:20Z

> My idea would be to have a single debug argument and a very discoverable way.
> One (but certainly not the only):
>
> maybe a DebugFlag class thing similar to the ProxyTags where thunder components and extensions can register their own debug flags,
> a single argument debug: bool | set[DebugFlags]=False to the jit, with True translating to all registered debug flags being turned on.

Sounds good. It seems similar to gc.set_debug, which takes flags/constants. This is good, as users would likely be familiar with such an API.
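For reference, this is roughly how the gc.set_debug API from the Python standard library is used. Note one difference from the proposal: gc combines integer constants with `|`, whereas the suggested `debug` argument would take a set of flag objects.

```python
import gc

# Remember the current setting so it can be restored afterwards.
previous = gc.get_debug()

# Flags are integer constants combined with bitwise OR.
gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_SAVEALL)
enabled = gc.get_debug()

# Individual flags can be tested with bitwise AND.
stats_on = bool(enabled & gc.DEBUG_STATS)

# Restore the previous debugging state.
gc.set_debug(previous)
```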

kshitij12345 added the enhancement and debugging labels Sep 30, 2024

tfogal self-assigned this Oct 4, 2024
