Simplify fine grain locking to lock before job execution #16657
```diff
@@ -237,19 +237,26 @@ fn compile<'gctx>(
         work.then(link_targets(build_runner, unit, true)?)
     });

-    // If -Zfine-grain-locking is enabled, we wrap the job with an upgrade to exclusive
+    // If -Zfine-grain-locking is enabled, we take an exclusive
     // lock before starting, then downgrade to a shared lock after the job is finished.
     if build_runner.bcx.gctx.cli_unstable().fine_grain_locking && job.freshness().is_dirty()
     {
         if let Some(lock) = lock {
-            // Here we unlock the current shared lock to avoid deadlocking with other cargo
-            // processes. Then we configure our compile job to take an exclusive lock
-            // before starting. Once we are done compiling (including both rmeta and rlib)
-            // we downgrade to a shared lock to allow other cargo's to read the build unit.
-            // We will hold this shared lock for the remainder of compilation to prevent
-            // other cargo from re-compiling while we are still using the unit.
+            // We take an exclusive lock before scheduling the job to keep our locking model
+            // simple. While this does mean we could potentially be waiting for another job
+            // when this could begin immediately, it reduces the risk of deadlock.
+            //
+            // We are also optimizing for avoiding rust-analyzer's `cargo check` preventing
+            // the user from running `cargo build` while developing. Generally, only a few
+            // workspace units will be changing in each build and the units will not be
+            // shared between `build` and `check` allowing them to run in parallel.
+            //
+            // Also note that we unlock before taking the exclusive lock as not all
+            // platforms support lock upgrading. This is safe as we hold the acquisition
+            // lock so we should be the only one operating on the locks so we can assume
+            // that the lock will not be stolen by another instance.
             build_runner.lock_manager.unlock(&lock)?;
-            job.before(prebuild_lock_exclusive(lock.clone()));
+            build_runner.lock_manager.lock(&lock)?;
```
Comment on lines +254 to +259

Member (Author):

Re: #16657 (comment). I found the issue on Windows. There is different behavior between Linux and Windows when taking an EX lock on a file that is already locked with a SH lock: Linux will allow the lock upgrade, while Windows will block. I re-added the
```diff
             job.after(downgrade_lock_to_shared(lock));
```
Comment on lines +245 to +260

Contributor:

The deadlock concern is if you have two

Member (Author):

What if we had another profile-level lock that we took (exclusively) during the fingerprint checking and while we took the unit-level locks, then dropped that lock before executing compiler jobs? That would ensure that only a single Cargo instance is in the critical section where deadlocking is possible. And like you mention, this part of cargo is generally quite fast, so I think this should be okay. If we were to do this, I suppose we could probably reuse the existing

What do you think?

Contributor:

I think that should be safe. The one downside I can think of is if we block when getting a unit lock, we now block all other

Member (Author):

I implemented this in 8d87975. I initially tried to use
```diff
         }
     }
```
|
|
```diff
@@ -615,13 +622,6 @@ fn verbose_if_simple_exit_code(err: Error) -> Error {
     }
 }

-fn prebuild_lock_exclusive(lock: LockKey) -> Work {
-    Work::new(move |state| {
-        state.lock_exclusive(&lock)?;
-        Ok(())
-    })
-}
-
 fn downgrade_lock_to_shared(lock: LockKey) -> Work {
     Work::new(move |state| {
         state.downgrade_to_shared(&lock)?;
```
I worry about the maintainability of having side-band communication going on.

Is it possible for us to manage this within the existing flow? For example, could we use `Poll` on `Job::run` to return early if we couldn't acquire the lock?
Hmmm, I think it might be tricky to use `Poll` with the generic structure of `Job` (like how to resume the job, since internally it's a closure). Though I am happy to look into that to see what is possible.
Wait, if we know what jobs will actually build a priori (see #16659 (comment)), then why can't we do all of the lock processing before the build, grabbing our exclusive locks and blocking then, rather than waiting?
Oh, that's an interesting idea.

So basically the `DrainState` could try to lock before calling `DrainState::run` (running the job). This is single-threaded, so it will change the way we take the locks. I am imagining spawning a thread to take each lock so as to avoid blocking the main job queue loop.

I think this could be encapsulated in the `LockManager` (and use `try_lock()` as an optimization to avoid spawning a new thread).
I'm saying that we acquire the locks in `compile` and downgrade after the build.
That's `super::compile`, which currently coordinates locks.

How much does grabbing locks as we go benefit in practice, when the build we block on would hold a shared lock until it's done, blocking our job and all dependents anyway? We can build some jobs that don't overlap, so a little?
The benefit we get is when there are build units that do not overlap. From my perspective, this is the main benefit of fine grain locking over what we currently have.

If we were to wait for everything, I feel like the benefits would be limited to a few more niche scenarios?
The goal we are working towards is non-blocking. If there is overlap, then we are blocking.

So the question then is: does the benefit from some overlap justify a more complex architecture (both for this and for blocking messages)?
I did some thinking on this and I think we might be able to get away with simply locking before kicking off jobs.

The primary use case for fine grain locking is to avoid rust-analyzer's `cargo check` blocking the user's `cargo build`, which slows down iteration during development. If the user is making small incremental changes to their project, full rebuilds should be uncommon. So we will likely take shared locks on all of the dependencies, and we don't need an exclusive lock for those; we just need an exclusive lock on the things that changed (dirty units). Build scripts are still a problem with this, but I think those will not be dirty unless the user is editing the build script itself.

I think the big question for whether this is worth the extra complexity of suspending jobs is whether we care about build script units (or any other units that are shared between `build` and `check`) blocking.
buildandcheck) being blocking.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the changes I mentioned above and I believe they are working correctly. I was able to run a
buildandcheckin parallel, as well as couldn't make it deadlock. (though much more testing is needed before stabilizing of course)Thanks for the reviews and helping guide me in the direction!
I think this approach is much better in term of simplicity while still accomplishing the goals for fine grain locking
I've updated the the PR description and title to better reflect the changes (and minimized the original PR description)