
Regression in Gatling Mill build clean compile performance due to worker closing #3920

Open
lihaoyi opened this issue Nov 7, 2024 · 10 comments

Comments

@lihaoyi
Member

lihaoyi commented Nov 7, 2024

In Mill 0.11.12, ./mill clean && time ./mill -j1 __.compile on the example/thirdparty/gatling/ build used to start at about 30s, and when run repeatedly it would drop to 10-15s as the JVM warmed up. In Mill 0.12.0 and after, it seems to stick around 20-30s, which is a significant slowdown. We should investigate and figure out why.

On a cursory investigation, the total number of source files reported in the logs seems identical between the two versions, and the Netty Mill build does not seem to suffer from this slowdown (when I re-ran the benchmarks in #3918). Passing --ticker false to disable the new PromptLogger doesn't help either.

@lihaoyi
Member Author

lihaoyi commented Nov 7, 2024

Seems the regression happens somewhere between 0.12.0-RC1 and 0.12.0-RC2

@lefou
Member

lefou commented Nov 7, 2024

It could be the mill clean command, which now also cleans workers, so any hot Zinc compiler instance is dropped.

@lefou
Member

lefou commented Nov 7, 2024

Maybe we should not clean workers by default. We could introduce an option or a dedicated cleanWorker command.

@lihaoyi
Member Author

lihaoyi commented Nov 7, 2024

@lefou good catch, yes that probably explains the behavior I'm seeing. The Java compiler is probably fast enough that it doesn't matter for Netty, and in day-to-day work we don't clean everything that often, so the impact falls mostly on my benchmarks. The other alternative is to clean but preserve workers. We might need to think about this to decide what the proper thing to do is; if it's just a benchmarking problem, we might introduce a cleanNonWorkers command instead.

@lefou
Member

lefou commented Nov 7, 2024

I think cleaning workers isn't what users want. It's most likely a seldom-used operation to free up memory, close ports, or deal with a poorly implemented worker. Most of the time, when using clean, we don't want to clean workers, especially since shutting down the Mill server already resets workers. I vote to disable worker cleanup in clean and add a dedicated cleanWorker command. Alternatively, we could add an option such as clean --worker=yes.

@lihaoyi
Member Author

lihaoyi commented Nov 7, 2024

One issue about not cleaning workers I thought about: what if the workers depend on upstream tasks' output files? Won't the references to the upstream task outputs be invalidated? I thought such a failure mode would cause issues with any workers that load upstream classpath files into a classloader, so I'm not sure why that never caused problems in the past, when clean still preserved the workers. Maybe when the task graph is evaluated, any missing files end up getting re-generated anyway before the worker is used?

@lefou
Member

lefou commented Nov 7, 2024

Yeah, workers were always rebuilt when their inputs changed. But a clean typically does not result in changed input hashes, since workers usually only depend on rather stable versions or tooling classpaths. clean is typically used to prune the Mill caches, and chances are high that workers won't be re-triggered after a clean. Instead, workers might re-trigger after a version bump, even without a clean.
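
To make that concrete, here is a minimal sketch of the two cases in a build file (Mill 0.12.x-style syntax; ToolWorker, toolVersion, and the module shown are made up for illustration and are not Mill internals):

```scala
package build
import mill._, scalalib._

// Placeholder for the long-lived, in-memory state a worker keeps warm
// between evaluations, e.g. a compiler instance. Made up for this sketch.
class ToolWorker(toolVersion: String, upstreamClasses: Seq[os.Path])

object foo extends ScalaModule {
  def scalaVersion = "2.13.15"

  // A stable input: `clean` wipes cached outputs but does not change this
  // value, so a worker keyed only on it keeps the same input hash and is
  // not re-instantiated.
  def toolVersion = Task { "1.2.3" }

  // This worker also captures an upstream task's output path. It is only
  // rebuilt when its inputs' hashes change, e.g. after `compile` re-runs
  // with a different result, not merely because `clean` deleted files.
  def toolWorker = Task.Worker {
    new ToolWorker(toolVersion(), Seq(compile().classes.path))
  }
}
```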

@lihaoyi lihaoyi changed the title Regression in Gatling Mill build clean compile performance Regression in Gatling Mill build clean compile performance due to worker closing Nov 8, 2024
@lihaoyi
Member Author

lihaoyi commented Nov 8, 2024

From #3276, it seems like the issue is that we were wiping the worker out dir, because the naive clean does not discriminate about what it is deleting.

A possible workaround that preserves the workers would be to make clean resolve each task's metadata JSON before deleting it, preserving the dest folder if the task is a worker. That adds considerable complexity to clean, which currently has a shortcut when run without args, but it might be worth it.
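
A rough sketch of that idea (illustrative only, not Mill's actual clean implementation; the isWorkerTask predicate is hypothetical, since working out from the on-disk metadata whether a task is a worker is exactly the added complexity mentioned above):

```scala
// Uses os-lib (com.lihaoyi::os-lib), which Mill already depends on.
// Walks out/, and for each task metadata file <task>.json deletes it and
// the matching <task>.dest/ folder, unless the (hypothetical) predicate
// reports that the task is a worker.
def cleanPreservingWorkers(outDir: os.Path, isWorkerTask: os.Path => Boolean): Unit = {
  val metaFiles = os
    .walk(outDir, skip = _.last.endsWith(".dest")) // don't descend into dest dirs
    .filter(p => p.ext == "json" && os.isFile(p))
  for (meta <- metaFiles if !isWorkerTask(meta)) {
    val dest = meta / os.up / s"${meta.baseName}.dest"
    os.remove.all(meta)                       // cached metadata
    if (os.exists(dest)) os.remove.all(dest)  // the task's scratch/output dir
  }
}
```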

Another approach would be to provide another workflow to run my benchmarks. I think most users out there don't mind workers being wiped as part of clean, and as discussed in the original ticket it does kind of make sense to delete workers and their in-memory state.

We probably need to think about this a bit; since it probably doesn't impact users, we can take our time.

@roman-mibex-2

Yes, the origin of the worker stopping was that the working directory of a worker also got wiped.
So if a worker relied on its working directory, then a ./mill clean brought the build into an unstable and inconsistent state.

@lefou
Member

lefou commented Nov 8, 2024

Another approach would be to provide another workflow to run my benchmarks.

For the benchmark, you could try to run clean with a narrower selector like gatling.__, which should exclude workers living in external modules.

Additionally, we could try to improve our type selector support, which currently only works for modules. If we could request something in the spirit of clean __:^Worker or clean __:^WorkerType (where WorkerType is the T in Worker[T] or Task[T]), that would be great.
