Potential for Rstudio (and other Git-polling apps) to hang when overwriting model #252
I like this idea. We've had people mention this before too. Questions:
Ah, that's good to note. I didn't realize that there had been other reports of hanging aside from the latest one that triggered this.
That's the current state (lines 969 to 975 in 8826ea7).
Right, "that" is a fast operation. The main thing I was worried about is the performance degrading. Here are some things that would need to be done to go this direction:
great, thanks for doing all that, Kyle. Let's keep this on the back burner for the moment, but once we get the test refactor released and #220 cleaned up and merged, this might rise to the top of the list.
[ I'm submitting this to capture internal discussion for posterity and to see whether others think this kludge is worth adding. I'm leaning towards "no". ]
problem
@dpastoor figured out that a report of Rstudio server hanging was due to Rstudio's Git functionality trying to keep up with changes to model outputs. Normally this is prevented by `bbi` placing a temporary `.gitignore` with a catch-all `*`, which is later replaced by more tailored contents. When models are submitted with `overwrite = TRUE`, however, output files may already be tracked. In this case, the catch-all gitignore pattern has no effect on the already tracked files, and Rstudio ends up trying to keep up with the changes.
workaround
The main workaround I can think of is to remove the corresponding index entries before the working tree directory is cleared. Something like this:
patch
(note: a real patch should probably at least propagate the command failure as a warning)
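To make the idea above concrete, here's a minimal sketch in a throwaway repo (not the actual patch; all paths are illustrative). The key bits are `--cached`, which removes entries from the index without touching the working tree, and `--ignore-unmatch`, which makes the command a no-op for paths that were never tracked:

```shell
set -e
# Throwaway repo with a tracked "model output" directory.
tmp=$(mktemp -d); cd "$tmp"; git init -q .
mkdir -p model/100
echo output > model/100/100.lst
git add . && git -c user.name=t -c user.email=t@t.io commit -qm 'track output'

# The workaround: drop the output directory's entries from the index only.
git rm -r -q --cached --ignore-unmatch -- model/100

git ls-files model/100   # prints nothing: entries are gone from the index
ls model/100             # 100.lst is still on disk
```

Once the files are untracked, the catch-all `.gitignore` written on resubmission applies to them again, which is what keeps Git-polling apps quiet.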
That's safe in the sense that it doesn't touch the working tree. The main failure cases I can think of:

- an existing index lock, if this code path has potential to be executed concurrently across different models
- the `git` executable not being found on `PATH`
- the output directory not being a subdirectory of a Git repo

None of those are destructive, and on failure the behavior would be what it currently is.
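Guarding those failure cases could look something like this sketch (the function name and warning wording are my assumptions, not taken from an actual patch):

```shell
# Untrack a directory's files before it is cleared, treating every
# failure case above as non-fatal so behavior degrades to the status quo.
safe_untrack() {
  dir=$1
  command -v git >/dev/null 2>&1 || return 0         # no git on PATH
  git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1 \
    || return 0                                       # not inside a repo
  # An existing index.lock (e.g. concurrent models) makes this fail;
  # downgrade that to a warning rather than aborting the submission.
  git -C "$dir" rm -r -q --cached --ignore-unmatch -- . \
    || echo "warning: could not untrack files under $dir" >&2
}

# Outside any repository this is a silent no-op, i.e. current behavior.
safe_untrack "$(mktemp -d)"
```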
performance penalty
As mentioned by @dpastoor, one concern about the above approach is the performance hit. Here are some timings from a large tree.
At the extreme end, the cached ctimes in the index could be stale for all entries, leading to Git needing to re-hash all tracked objects in the working tree. This results in a very slow `git status`, but the `git rm --cached` time for the subdirectory doesn't take that big of a hit.

git-rm with stale index ctimes
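One way to reproduce the stale-ctime case in a throwaway repo (my own sketch, not the timings from the issue) is to rebuild the index without cached stat data, so the next `git status` must re-hash every tracked file:

```shell
set -e
# Throwaway repo; a real measurement would use a large tree.
tmp=$(mktemp -d); cd "$tmp"; git init -q .
mkdir sub && echo data > sub/out.txt
git add . && git -c user.name=t -c user.email=t@t.io commit -qm init

rm -f .git/index
git read-tree HEAD                 # index entries now lack stat info
time git status >/dev/null         # pays the full re-hash cost
time git rm -r -n --cached -- sub  # -n: dry run, scoped to the subdirectory
```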
Based on the above case, it seems like the performance of `git rm --cached` probably won't degrade horribly in a repository with a large tree. On the other hand, the `git rm --cached` overhead would be on top of the (equivalent of) `rm -r $model_dir` that's currently done, and `git rm --cached` takes a decent chunk of time relative to that.