Skip to content

electron-builder: Fix race condition when preparing to copy binaries #1137

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 19, 2024

Conversation

DeeDeeG
Copy link
Member

@DeeDeeG DeeDeeG commented Nov 19, 2024

Problem

There was a race condition where two separate pieces of async code would A) create the 'binaries' folder, and B) copy binaries to that folder. The problem was the lack of a guarantee they would happen in that order! If the attempt to copy files happened before the directory got created, the whole electron-builder run would error and/or stall out from unsuccessful auto-retries failing.

Stakes

This issue blocks the x64 macOS CI "build Pulsar binaries" job from producing any binaries.

(Optional commentary: For unknown reasons, macOS CI and my local machine seem to be adversely affected by the race condition, whereas at least CI Linux and Windows x64, and possibly both Cirrus (ARM) CI tasks, do not appear affected, again for unknown reasons. But as race conditions are unpredictable and might depend on the minutiae of the system the code runs on, this arguably isn't all that surprising...)

Solution

To fix the situation:

Ensure the 'binaries' dir exists before copying binaries to it.

Do so by making the directory first with a synchronous function.

(I might usually try to eliminate race conditions while still using asynchronous code, but for simple scripts, I prefer to use synchronous code, for simplicity's sake and for reliability.)

In other words: Can't copy files to a destination dir that doesn't exist! This makes sure the directory exists first, then copy.

Also: Log any error we hit while making the dir, but try to continue. (For anyone who has successfully run electorn-builder once in the repo, they already have a 'binaries' folder, so we can ignore any error when trying to "make the dir" that already exists. If it doesn't exist, the copy operation will print another relevant error message. So we don't need to error early here. But logging it should help if this issue needs any troubleshooting on anyone's machines (including CI) any time in the future.)

Verification Process

Tested extensively on my local machine (macOS 10.15.7). I would like to see a passing "build Pulsar binaries" CI run as well.

Looked up some info to refresh my memory on promise/async execution and the Node fs and fs/promises APIs. Decided to lean into synchronous code for this, but the code seems logically like it should reliably work now, in the intended order. At least to me, review welcome.

Long writeup, simple and sweet diff!

Review would be much appreciated!

(Sidebar: This is another one of those PRs with a simple fix that I wanted to write up in detail so we can understand the issue quickly if it comes up again. But also because I like going a bit too in-depth fixing and writing up these things, makes me feel like a detective. It's therapeutic. Anyhow, sorry for the delay in getting this out. But I think it's rock solid now! Hopefully!)

There was a race condition where two separate pieces of async code
would A) create the 'binaries' folder, and B) copy binaries to that
folder. The problem was the lack of a guarantee they would happen
in that order! If the attempt to copy files happened before
the directory got created, the whole electron-builder run would
error and/or stall out from unsuccessful auto-retries failing.

To fix the situation:

Ensure the 'binaries' dir exists *before* copying binaries to it.

Do so by making the directory *first* with a *synchronous* function.

(I might usually try to eliminate race conditions while still using
asynchronous code, but for simple scripts, I prefer to use
synchronous code, for simplicity's sake and for reliability.)

In other words: Can't copy files to a destination dir that
doesn't exist! This makes sure the directory exists *first*, then copy.

Also: Log any error we hit while making the dir, but try to continue.
(For anyone who has successfully run electorn-builder once in the repo,
they already have a 'binaries' folder, so we can ignore any error when
trying to "make the dir" that already exists. If it doesn't exist, the
copy operation will print another relevant error message. So we don't
need to error early here. But logging it should help if this issue
needs any troubleshooting on anyone's machines (including CI) any time
in the future.)
Comment on lines +293 to +299
try {
mkdirSync('binaries', {recursive: true})
} catch (err) {
console.warn("Warning: error encountered when making the 'binaries' dir.")
console.warn("(HINT: If the 'binaries' folder already exists, then this error message is probably fine to ignore!)")
console.warn(err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the try/catch, could we use existsSync to help us decide whether the binaries directory needs to be created in the first place?

Copy link
Member Author

@DeeDeeG DeeDeeG Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mkdirSync is already set to recursive: true which doesn't complain if the dir already exists.

I'm anticipating some other exotic error condition might be possible, at which point I'd want to catch it, log it, and keep going.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha. In that case it strikes me as a little bit belt-and-suspenders, but I suppose there's nothing wrong with that.

Copy link
Member Author

@DeeDeeG DeeDeeG Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I wasn't sure if the try/catch is overkill. Optimistically, it shouldn't be needed. But pessimistically, maybe it might?

(And try/catch makes the code more verbose to read or skim, but IMO its "performance cost" is a non-issue for how infrequent a code path this is to hit. Compiling code is many orders of magnitude slower than running a try/catch.)

I think I was motivated to do it this way because the output can be extremely vague if not explicitly logging things. For example the output for this very PR:

[ . . . ]
  • building        target=DMG arch=x64 file=dist/Pulsar-1.122.2024111920.dmg
  • building block map  blockMapFile=dist/Pulsar-1.122.2024111920-mac.zip.blockmap
  • Above command failed, retrying 5 more times
  • Above command failed, retrying 4 more times
  • Above command failed, retrying 3 more times
Error: Timeout of 2700000ms hit

https://github.com/pulsar-edit/pulsar/actions/runs/11921198561/job/33224800808?pr=1137#step:18:43

Hard to guess what is going wrong here. (I did confirm in local testing that console.log(), console.warn(), and console.error() will print what you try and print.)

@DeeDeeG
Copy link
Member Author

DeeDeeG commented Nov 19, 2024

Dang... while this solves an issue for me locally, it's not solving the particular issue for macOS CI.

Still a bit of a mystery to me.

@DeeDeeG
Copy link
Member Author

DeeDeeG commented Nov 19, 2024

Thanks for the review. Will merge soon.

I'm investigating the CI failure still... I know the blockmaps are made with 7-zip, at least on macOS x64. (7za). Not sure what else to look into there. Maybe an electron-builder update addresses this?

@DeeDeeG DeeDeeG merged commit 39a4a4c into master Nov 19, 2024
101 of 103 checks passed
@savetheclocktower savetheclocktower deleted the electron-builder-fix-race-condition-copying-bins branch November 20, 2024 00:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants