Conversation

@NirSonnenschein
Contributor

Add a pytest configuration to run existing tests
under torch.compile. This should increase
torch.compile test coverage. Added to existing
tests in-place to avoid code duplication.

@NirSonnenschein
Contributor Author

Hi @stas00,
Question: I see your commit was added to this PR, but I'm not sure it is related to it.
Is this intentional? If so, why add it to this PR?

@stas00
Collaborator

stas00 commented Oct 27, 2025

From the logs you can see yourself as a co-committer, which means you pushed it into this branch. If I had done it, you wouldn't have been a co-committer. Have you by chance cherry-picked a commit unintentionally, or perhaps started from a non-master branch?

@tohtana
Contributor

tohtana commented Oct 27, 2025

Hi @NirSonnenschein, everyone,

This is a great improvement, but I wonder if it significantly increases the time to run the tests. Most of the tests should be independent of whether we compile or not. At the very least, I want to keep an option not to run the compiled versions.

How about adding an option to globally enable the compile tests with pytest_generate_tests, for example?

We could have something like this in conftest.py.

def pytest_addoption(parser):
    parser.addoption(
        "--enable-compile-mode", action="store_true",
        help="Run both compiled/non-compiled versions"
    )

def pytest_generate_tests(metafunc):
    # Parametrize every test that declares a "compile_mode" argument:
    # [False] by default, [False, True] when --enable-compile-mode is passed.
    if "compile_mode" in metafunc.fixturenames:
        both = metafunc.config.getoption("--enable-compile-mode")
        params = [False, True] if both else [False]
        metafunc.parametrize("compile_mode", params)
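
A test opting in would then just declare the compile_mode argument, something like this (a stand-alone sketch for illustration; the real tests would compile the DeepSpeed engine rather than a bare module):

import torch

def test_linear_forward(compile_mode):
    # compile_mode comes from pytest_generate_tests above:
    # [False] by default, [False, True] with --enable-compile-mode
    model = torch.nn.Linear(8, 8)
    if compile_mode:
        model = torch.compile(model)
    out = model(torch.randn(2, 8))
    assert out.shape == (2, 8)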

@sfc-gh-truwase, @loadams, @stas00 What are your thoughts?

@stas00
Collaborator

stas00 commented Oct 27, 2025

I agree with Masahiro: the normal CI should be fast, both because of the $$ it costs to run and the waiting time to merge. torch.compile is likely to introduce a huge proportional overhead to each test's runtime, i.e. much more than doubling it for light tests.

That said, having automatic coverage where all tests can be run in both modes would be a fantastic addition.

We are discussing going back to having nightly tests, which can be slow and take 5x longer w/o incurring much extra $$, since nobody is waiting for those tests to finish ASAP. So this would be a great feature to use there.

For example, at HF Transformers there is a RUN_SLOW=1 env var that activates all tests decorated with @slow; these do not run on the normal CI, only on the nightly one. But if we want something global, then perhaps what Masahiro is proposing would work better.

While at it, you probably need a way to decorate some tests as @known_not_to_work_with_torch_compile, no? Surely not everything will always work with torch.compile.
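
For illustration, a minimal sketch of how such markers are typically wired up in pytest (the names are placeholders, not HF Transformers' or DeepSpeed's actual helpers):

import os
import pytest

# skipped unless RUN_SLOW=1 is set, e.g. only in the nightly CI job
slow = pytest.mark.skipif(os.getenv("RUN_SLOW", "0") != "1",
                          reason="slow test; set RUN_SLOW=1 to run")

# unconditionally skip tests known to break under torch.compile
known_not_to_work_with_torch_compile = pytest.mark.skip(
    reason="known not to work with torch.compile")

A test would then simply be decorated with @slow or @known_not_to_work_with_torch_compile.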

@NirSonnenschein
Contributor Author

NirSonnenschein commented Oct 28, 2025

Hi @tohtana and @stas00,
Yes, I understand that the increase in test running time may be a concern; it was my main worry about whether the PR could be accepted before raising it. Adding another binary flag doubles the total number of configurations for a test if all configurations are run (actual runtime may scale differently, depending on the runtime of the torch.compile configurations).
This PR isn't intended to add torch.compile support to all tests, just a relevant sub-set. I will check whether there is a way to make these additional test configurations switchable, based on the suggestions above.

@stas00 regarding the commit: thanks, there may have been an issue on my side. I rebased to fix a DCO issue, and it may have added your commit unintentionally. Fixed it.

Add a pytest configuration to run existing tests
under torch.compile. This should increase
torch.compile test coverage. Added to existing
tests in-place to avoid code duplication.

Signed-off-by: Nir Sonnenschein <nsonnenschein@habana.ai>
@NirSonnenschein force-pushed the add_torch_compile_test_configs branch from 08c8191 to afbdd6f on October 28, 2025 09:02
@sfc-gh-sbekman
Contributor

sfc-gh-sbekman commented Oct 28, 2025

this PR isn't intended to add torch.compile support to all tests, just a relevant sub-set.

Should we not eventually want to have all features run under torch.compile?

Adding another binary flag will double the total number of configurations for a test if all configurations are run

But that's what we want, no? Eventually we want to run each test with torch.compile and without. Perhaps once torch.compile works everywhere we could in theory just always enable it, though because of the significant initial overhead I predict it'll never be used on normal CI, other than perhaps for a few select tests (really 1-2). That's why we would have nightly CI jobs where multiple matrices can be run and it's not a problem if they take a few hours to finish.

To clarify: your PR is very welcome; we just want this feature to be enabled when desired and not on by default.

@stas00 regarding the commit thanks, there may have been an issue on my side.

all is good ;)

@sfc-gh-sbekman
Contributor

Could torch.compile somehow support caching, like Triton's autotune does? Perhaps if the result could be cached, the compilation could be skipped and there would be no additional overhead unless the test's code has changed? I suppose the caching aspect would be difficult because of the unknown code dependency path - you'd somehow need to cache across many dependent packages...

@NirSonnenschein
Contributor Author

Thanks @sfc-gh-sbekman and @tohtana for your input.
I've modified the PR to make the torch.compile tests switchable via a flag (it should not impact the running time if the flag is not used), as @tohtana suggested.

Collaborator

@stas00 left a comment


Two questions/suggestions:

  1. Is torch.compile still being tested by default in a few tests so that there is at least one test always running it?

  2. Is there a way to enable it in some other way and not via an argument to the test function? Otherwise all tests, including future ones, need to carry an argument that is quite irrelevant to the test. E.g. could we have some sort of deepspeed_test.use_torch_compile = True that could be set in conftest.py or a test util? Then you don't need to check whether it's enabled in every test, just write a few functions:

One PoC suggestion (I'm not attached to it):

# tests/unit/util.py
use_torch_compile = False

def enable_torch_compile():
    global use_torch_compile
    use_torch_compile = True

def maybe_apply_torch_compile(ds_model):
    if use_torch_compile:
        ds_model.compile()

then in tests:

from tests.unit.util import maybe_apply_torch_compile
...
def test_...():
    ...
    maybe_apply_torch_compile(ds_model)
    ...

and in conftest.py you call enable_torch_compile() if the flag has been set, e.g.:
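
A rough sketch only, assuming the --enable-compile-mode option is registered via pytest_addoption as in this PR:

# conftest.py
from tests.unit.util import enable_torch_compile

def pytest_configure(config):
    # flip the global default once, before tests are collected/run
    if config.getoption("--enable-compile-mode"):
        enable_torch_compile()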

On the other hand, if torch.compile is enabled, do we still want to test the non-enabled version? I guess we would just run the tests twice - once enabled and once not - which is what your PR does right now.


Actually, I have a better idea than the above: why not make deepspeed.initialize run compile if it was configured to do so (and return a model that has already been compiled)? Then we could turn it on/off for any test w/o changing any test at all, since it can now be configured via a config file and perhaps backed by an env var flag. Driven by an env var it'd be super-clean and out of the way of the test writer. But let's validate with @tohtana and @tjruwase first, so as not to waste your time should they disagree with my proposal.
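
A rough sketch of the behavior, written here as a test-side wrapper for illustration (the enable_torch_compile config key and the env var are hypothetical; only deepspeed.initialize and the engine's compile() are existing APIs):

import os
import deepspeed

def initialize_and_maybe_compile(model, config, **kwargs):
    # config is assumed to be a plain dict, as in the unit tests
    engine, optimizer, dataloader, lr_scheduler = deepspeed.initialize(
        model=model, config=config, **kwargs)
    # default comes from an env var the CI job would set; a test's config
    # could still opt out explicitly with enable_torch_compile: False
    default = os.getenv("DS_TEST_TORCH_COMPILE", "0") == "1"
    if config.get("enable_torch_compile", default):
        engine.compile()  # DeepSpeedEngine.compile() wraps torch.compile
    return engine, optimizer, dataloader, lr_scheduler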

def pytest_addoption(parser):
parser.addoption("--torch_ver", default=None, type=str)
parser.addoption("--cuda_ver", default=None, type=str)
parser.addoption("--enable-compile-mode", action="store_true", help="Run both compiled/non-compiled versions")
Collaborator

@stas00 Nov 3, 2025


Perhaps this would be less ambiguous about what it implies?

Suggested change:
- parser.addoption("--enable-compile-mode", action="store_true", help="Run both compiled/non-compiled versions")
+ parser.addoption("--enable-torch-compile", action="store_true", help="Run both compiled/non-compiled versions")

Contributor Author


@stas00 regarding your questions:

  1. There are some basic torch.compile tests which are always run (e.g. deepspeed-fork/tests/unit/runtime/compile/util.py). This PR is a way to add torch.compile coverage for a lot more scenarios, using the same logic as the original tests.
  2. I'm sure there are ways to do this other than passing a parameter, but this is the standard way to parametrize pytest tests, and it is printed clearly in the pytest logs which config is running. Choosing another approach may have some advantages, but it could make it less clear which scenario is actually running (e.g. in your example this wouldn't explicitly be printed in the test name).
    example:
    7.84s call unit/checkpoint/test_latest_checkpoint.py::TestLatestCheckpoint::test_existing_latest[False]

I'm not opposed to a different approach, but I'm not convinced it offers an actual advantage. I'll wait for feedback from @tohtana and @tjruwase.

Regarding the flag: I used the one suggested by @tohtana, and it makes some semantic sense, since the flag enables compile mode in the tests rather than enabling torch.compile in general (which might be more confusing). I don't mind changing it to whatever there is a consensus on.

Collaborator


Re: flag - whatever the team feels is more intuitive works for me.

The rest I took out of this thread, as it doesn't belong to this particular discussion.

Make torch.compile test scenarios switchable
using a pytest flag. If the flag --enable-compile-mode is not
used, the additional tests will not run (saving test cycle time).

Signed-off-by: Nir Sonnenschein <nsonnenschein@habana.ai>
@NirSonnenschein force-pushed the add_torch_compile_test_configs branch from 70f076e to 6b7b887 on November 3, 2025 14:18
@stas00
Collaborator

stas00 commented Nov 3, 2025

Taking the unrelated comments out of the specific suggestion reply, since they will disappear once resolved:

  1. There are some basic torch.compile tests which are always run (e.g. deepspeed-fork/tests/unit/runtime/compile/util.py ). this PR is a way to add more torch.compile coverage for a lot more scenarios using the same logic as the original tests.

Thank you for validating that.

  1. I'm sure there are ways to do this other than passing a parameter, but this is standard way to parametrize pytorch tests. it is printed clearly in the pytest logs which config is running in this way. choosing another approach may have some advantages, but could make it less clear which scenario is actually running (e.g. in your example this wouldn't explicitly be printed in the test name).
    example:
    7.84s call unit/checkpoint/test_latest_checkpoint.py::TestLatestCheckpoint::test_existing_latest[False]

I'm not opposed to a different approach, but I'm not convinced it offers an actual advantage. will wait for feedback from @tohtana and @tjruwase.

I don't see it this way. The flag you added would enable a global feature that impacts the whole test suite. So instead of forever being stuck with enforcing the manual addition of the flag and plastering the same associated code onto every test, polluting the test args and confusing the observer with test_latest_checkpoint.py::TestLatestCheckpoint::test_existing_latest[False] (what does False relate to?), why not flip the behavior at the deepspeed.initialize level and be done with it?

What is there to print per test if you're running the whole CI under torch compile mode? Name the job torch.compile unit tests CI and it's loud and clear.

@tohtana
Contributor

tohtana commented Nov 3, 2025

Hi @NirSonnenschein @stas00,
I agree that we should minimize the changes in each test (basically two points: 1. test argument, 2. ds_engine.compile()).

Can we clarify the benefit of the approaches discussed?

  • Using pytest_generate_tests: it becomes easier to switch compile/no-compile globally. However, we’d still need to add a new argument to each test.
  • Running model.compile() in deepspeed.initialize(): this would simplify tests by removing the need for ds_engine.compile(), but we’d still need to add a flag instead (perhaps enabled under the compile section?) to the ds_config in each test.
    (We can do both of the above)

Is my understanding correct?

One more point: I’m not entirely sure all tests will pass once we automatically enable compilation. For example, if a test compares values against PyTorch’s normal (non-compiled) behavior, the numerical differences might increase unless we also enable PyTorch’s compile mode.

@sfc-gh-sbekman
Contributor

sfc-gh-sbekman commented Nov 3, 2025

but we’d still need to add a flag instead (perhaps enabled under the compile section?) to the ds_config in each test.

No, we don't need to do that.

Instead you set up a deepspeed config env var that impacts the default setting - no need to change any test. The env var will be added along with the new enable_torch_compile config entry I propose. A test could still override it to False, but otherwise all tests will run with the default enable_torch_compile: True if --enable-torch-compile (or --enable-compile-mode) is passed to pytest.
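
Roughly like this (illustrative only: the env var name is made up, the --enable-torch-compile option is assumed to be registered via pytest_addoption, and nothing like this exists in DeepSpeed yet):

# conftest.py: translate the pytest flag into an env var
import os

def pytest_configure(config):
    if config.getoption("--enable-torch-compile"):
        os.environ["DS_TEST_TORCH_COMPILE"] = "1"

# wherever the shared ds_config dicts are built for the tests:
def apply_compile_default(ds_config):
    default = os.environ.get("DS_TEST_TORCH_COMPILE", "0") == "1"
    # an individual test can still pin enable_torch_compile: False
    ds_config.setdefault("enable_torch_compile", default)
    return ds_config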

One more point: I’m not entirely sure all tests will pass once we automatically enable compilation. For example, if a test compares values against PyTorch’s normal (non-compiled) behavior, the numerical differences might increase unless we also enable PyTorch’s compile mode.

If it's just a few tests, then those tests will disable this functionality via enable_torch_compile: False in the config file they create.

To better address the important point you have just raised, Masahiro, I suppose it'd be good to understand the scope of this feature: does everything we have that is tested support .compile? We could start with globally testing this feature by listing the test dirs that support it in the dedicated nightly CI job, i.e. pytest tests/foo tests/bar.

@NirSonnenschein
Contributor Author

@sfc-gh-sbekman: one thing that the parameter + flag model does, and the global var doesn't, is allow you to run both test variants one after the other in a single run (and all other combinations of params). This is what I was trying to illustrate with the printout: when running them together it is important to know which one failed (assuming only one failed).

With the global flag, since torch.compile may not be a relevant use case for all tests, running both the compile and non-compile versions of a test would require running the whole test suite twice (including all the tests which don't have a relevant compile test case).
So using the param allows two options:

  1. a shorter run with no torch.compile configurations (except those that always run) - same as today (with an additional 'if' in some tests)
  2. a longer run which runs both the torch.compile configurations and the non-compile configurations in a single run.

Using a global flag allows two options:

  1. a run with no torch.compile configurations
  2. a run with just the torch.compile configurations, for the tests that have them

If running in an upcoming "nightly" job, the global-flag approach would need to run the test suite twice (once with the flag and once without), with potentially more redundancy vs. a single, longer run.

@tohtana
I think your understanding is correct, with the addition of the point presented above.
I'm not suggesting that torch.compile can or even should be added to all tests. Once an agreed-upon way of doing this is established, support can be added to additional tests gradually, as needed.

@stas00
Collaborator

stas00 commented Nov 4, 2025

@NirSonnenschein, your logic is totally valid if we were to do that for a few tests, but we don't want to run the same test twice in the live CI (PRs). torch.compile is very expensive for CI, so there will be no double run of the whole test suite.

The nightly CI can just have a single job that will be dedicated to torch.compile - here we also don't need to re-run the non-torch.compile use case, since it has already been exercised in live CI (live is the CI running in PRs).

Besides, the non-torch.compile tests won't fail, so True/False doesn't contribute anything. All we want is to know that torch.compile works, with eventually full coverage. If torch.compile fails, we already know the non-torch.compile test is fine, so I'm not sure why you would even care to re-run the non-torch.compile version of the test when the torch.compile version fails. The test isn't the issue; the issue would be torch.compile support for whatever is being tested. I'm not sure if the logic I propose makes sense.

@NirSonnenschein
Contributor Author

@stas00 thanks, I think I understand your approach and it is probably workable.
The original suggestion was to have 2 configurations:

  1. one which runs the current test load without the added compile tests, for CI (when no flag is given)
  2. a second one which runs both compile and regular flavors as part of a nightly job (with the flag given, a single run covering both)

Both approaches can probably give a similar result. I'll wait for additional input on the desired approach from @tohtana and @tjruwase.

@sfc-gh-truwase
Collaborator

@NirSonnenschein and @stas00 thanks for a thorough discussion of the options.

My preference is for @stas00's proposal, for the cost benefits. If I understand correctly, this means:

  1. PR CI runs ==> non-torch compile tests.
  2. Nightly CI runs ==> torch compile tests.

@sfc-gh-sbekman
Contributor

sfc-gh-sbekman commented Nov 5, 2025

Yes, with a small addition to (1):

  1. PR CI runs ==> non-torch compile tests + a few dedicated torch compile tests (since we want to at least run 1-2 tests for that)
  2. Nightly CI runs ==> torch compile tests. (that would be one of the many CI jobs, e.g. another job would be to run slow tests with real models.)

@sfc-gh-sbekman
Contributor

sfc-gh-sbekman commented Nov 5, 2025

@NirSonnenschein,

one which runs the current test load without the added compile tests for CI (when no flag is given)

which is totally fine in terms of what is run, but the problem is the long-term cost of getting there, since from then on we will be stuck with requiring most tests to instrument themselves with the flag and the code that acts on it, which is boilerplate we can easily avoid. E.g. you will have to police contributors to add this flag to every new test being added.

@NirSonnenschein
Contributor Author

Sorry, I have been diverted to other activities and can't advance this.
