Potential for Rstudio (and other Git-polling apps) to hang when overwriting model #252

kyleam · 2021-10-06T22:14:15Z

[ I'm submitting this to capture internal discussion for posterity and to see whether others think this kludge is worth adding. I'm leaning towards "no". ]

problem

@dpastoor figured out that a report of Rstudio server hanging was due to Rstudio Git-functionality trying to keep up with changes to model outputs. Normally this is prevented by bbi placing a temporary .gitignore with a catch-all *, which is later replaced by more tailored contents. When models are submitted with overwrite = TRUE, however, output files may already be tracked. In this case, the catch-all gitignore pattern has no effect on the already tracked files, and Rstudio ends up trying to keep up with the changes.

workaround

The main workaround I can think of is to remove the corresponding index entries before the working tree directory is cleared. Something like this:

patch

(note: a real patch should probably at least propagate the command failure as a warning)

diff --git a/cmd/nonmem.go b/cmd/nonmem.go
index aa045ab..051413a 100644
--- a/cmd/nonmem.go
+++ b/cmd/nonmem.go
@@ -20,6 +20,7 @@ import (
    "fmt"
    "io/ioutil"
    "os"
+	"os/exec"
    "path"
    "path/filepath"
    "regexp"
@@ -936,6 +937,17 @@ func createChildDirectories(l *NonMemModel, sge bool) error {
    if ok, _ := afero.DirExists(fs, l.OutputDir); ok {
        // If so are we configured to overwrite?
        if l.Configuration.Overwrite {
+			if viper.GetBool("git") {
+				log.Debugf("%s Removing any entries in Git index for %s",
+					l.LogIdentifier(), l.OutputDir)
+				command := exec.Command(
+					"git", "rm",
+					"--cached", "-r", "-q", "--ignore-unmatch", ".")
+				command.Dir = l.OutputDir
+				command.Env = os.Environ()
+				command.Run()
+			}
+
            log.Debugf("%s Removing directory %s", l.LogIdentifier(), l.OutputDir)
            err := fs.RemoveAll(l.OutputDir)
            if err != nil {

That's safe in the sense that it doesn't touch the working tree. The main failure cases I can think of:

existing index lock if this code path has potential to be executed concurrently across different models
"git" executable not found on PATH
the output directory isn't a subdirectory of a Git repo

None of those are destructive, and on failure the behavior would be what it currently is.

performance penalty

As mentioned by @dpastoor, one concern about the above approach is the performance hit. Here are some timings from a large tree.

$ git ls-files | wc -l
105906

At the extreme end, we could have the cached ctimes in the index be stale for all entries in the index, leading to Git needing to re-hash all tracked objects in the working tree. This results in a very slow git status, but the git rm --cached time for the subdirectory doesn't take that big of a hit.

git-rm with stale index ctimes

$ git ls-files | xargs touch

$ /usr/bin/time sh -c 'git rm --cached -rq model/pk/boot/111'
0.14user 0.00system 0:00.16elapsed 94%CPU (0avgtext+0avgdata 28584maxresident)k
8inputs+19880outputs (0major+4440minor)pagefaults 0swaps

$ /usr/bin/time sh -c 'git status'
On branch scenario-03
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        deleted:    model/pk/boot/111/.gitignore
        deleted:    model/pk/boot/111/111.cpu
        deleted:    model/pk/boot/111/111.ctl
        deleted:    model/pk/boot/111/111.ctl.out
        deleted:    model/pk/boot/111/111.ext
        deleted:    model/pk/boot/111/111.grd
        deleted:    model/pk/boot/111/111.lst
        deleted:    model/pk/boot/111/111.phi
        deleted:    model/pk/boot/111/111.sh
        deleted:    model/pk/boot/111/111.shk
        deleted:    model/pk/boot/111/111.shm
        deleted:    model/pk/boot/111/111.tab
        deleted:    model/pk/boot/111/111.xml
        deleted:    model/pk/boot/111/111par.tab
        deleted:    model/pk/boot/111/Run_111.o133
        deleted:    model/pk/boot/111/bbi.yaml
        deleted:    model/pk/boot/111/bbi_config.json
        deleted:    model/pk/boot/111/grid.sh

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        model/pk/boot/111/

42.33user 7.86system 2:42.29elapsed 30%CPU (0avgtext+0avgdata 31664maxresident)k
22201960inputs+19880outputs (20556major+239154minor)pagefaults 0swaps

Based on the above case, it seems like the performance of git rm --cached probably won't degrade horribly in a repository with a large tree.

On the other hand, the git rm --cached overhead would be on top of the (equivalent of) rm -r $model_dir that's currently done, and git rm --cached takes a decent chunk of time relative to that.

$ hyperfine --warmup 3 \
>           --prepare "git reset --hard && git status && sleep 0.1" \
>           'rm -r model/pk/boot/111' \
>           'git rm --cached -rq model/pk/boot/111'
Benchmark #1: rm -r model/pk/boot/111
  Time (mean ± σ):       1.4 ms ±   0.2 ms    [User: 1.0 ms, System: 0.4 ms]
  Range (min … max):     1.2 ms …   1.6 ms    10 runs

  Warning: Command took less than 5 ms to complete. Results might be inaccurate.

Benchmark #2: git rm --cached -rq model/pk/boot/111
  Time (mean ± σ):     514.8 ms ±  92.8 ms    [User: 96.9 ms, System: 18.0 ms]
  Range (min … max):   427.7 ms … 708.9 ms    10 runs

Summary
  'rm -r model/pk/boot/111' ran
  376.69 ± 84.50 times faster than 'git rm --cached -rq model/pk/boot/111'

The text was updated successfully, but these errors were encountered:

seth127 · 2021-10-07T14:45:05Z

I like this idea. We've had people mention this before too. Questions:

so, are we currently dropping the temp * .gitignore in there, so that it ignores new files in any model dir (until the run finishes), or were you saying we could do that (but it wouldn't solve this problem).
you say the git rm --cached takes "a decent chunk of time relative to that" but, if I'm reading this right, that was still less than a second, right? Even in a bootstrap or something with thousands of models, that's probably gonna save the user time vs having Rstudio potentially hang on them, right?

kyleam · 2021-10-07T18:13:03Z

We've had people mention this before too.

Ah, that's good to note. I didn't realize that there had been other reports of hanging aside from the latest one that triggered this.

are we currently dropping the temp * .gitignore in there

That's the current state.

bbi/cmd/nonmem.go

Lines 969 to 975 in 8826ea7

    
           // Now that the directory is created, let's create the gitignore file if specified 
        
           if viper.GetBool("git") { 
        
           	log.Debugf("%s Writing initial gitignore file", l.LogIdentifier()) 
        
           	if err = WriteGitIgnoreFile(l.OutputDir); err != nil { 
        
           		return err 
        
           	} 
        
           }

you say the git rm --cached takes "a decent chunk of time relative to that" but, if I'm reading this right, that was still less than a second, right?

Right, "that" is a fast operation that git rm --cached couldn't possibly compete with. I agree that the penalty hit will be swamped in the larger context (and should have worded my last post more clearly).

The main thing I was worried about is the performance degrading to git status levels with stale cache info, and the first timing snippet (where git status took close to 3 minutes) convinces me that that's not the case.

Here are some things that would need to be done to go this direction:

confirm that this code path is executed serially (see above)
update patch to give warning when git call fails and to add a comment explaining the reason for the kludge

come up with a setup to trigger hang and confirm that the above patch actually fixes it

As the script below shows, git rm --cached will make the .gitignore pattern be honored, but perhaps there's some reason this won't actually help the reported case (or it's even possible the original diagnosis was incorrect).

demo

#!/bin/sh

set -uex

cd "$(mktemp -d "${TMPDIR:-/tmp}"/bbi-XXXXXXX)"
git init
mkdir sub
echo a >sub/f
git add sub/f
git commit -mf

echo '*' >sub/.gitignore

echo modify >>sub/f
git status

git rm --cached sub/f
git status

echo modify even more >>sub/f
git status

rm sub/.gitignore
git status

[...]
+ echo '*'
+ echo modify
+ git status
On branch main
Changes not staged for commit:
  modified:   sub/f

no changes added to commit
+ git rm --cached sub/f
rm 'sub/f'
+ git status
On branch main
Changes to be committed:
  deleted:    sub/f

+ echo modify even more
+ git status
On branch main
Changes to be committed:
  deleted:    sub/f

+ rm sub/.gitignore
+ git status
On branch main
Changes to be committed:
  deleted:    sub/f

Untracked files:
  sub/

seth127 · 2021-10-08T13:29:15Z

great, thanks for doing all that Kyle. Let's keep this on the back burner for the moment, but once we get the test refactor released and #220 cleaned up and merged then this might rise to the top of the list.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential for Rstudio (and other Git-polling apps) to hang when overwriting model #252

Potential for Rstudio (and other Git-polling apps) to hang when overwriting model #252

kyleam commented Oct 6, 2021 •

edited

Loading

seth127 commented Oct 7, 2021

kyleam commented Oct 7, 2021

seth127 commented Oct 8, 2021

Potential for Rstudio (and other Git-polling apps) to hang when overwriting model #252

Potential for Rstudio (and other Git-polling apps) to hang when overwriting model #252

Comments

kyleam commented Oct 6, 2021 • edited Loading

problem

workaround

performance penalty

seth127 commented Oct 7, 2021

kyleam commented Oct 7, 2021

seth127 commented Oct 8, 2021

kyleam commented Oct 6, 2021 •

edited

Loading