
bytestream write can leak goroutines if disk.Put doesn't drain the io.Reader #473

Closed
BoyangTian-Robinhood opened this issue Sep 20, 2021 · 12 comments · Fixed by #474

@BoyangTian-Robinhood

BoyangTian-Robinhood commented Sep 20, 2021

We run bazel-remote cache as a gRPC server and have observed it leaks memory.

 
go_goroutines appears to increase without bound in proportion to the number of requests the cache serves. At our current request scale, our cache crashes with an OOM every 2 days.
 
Configuration
 
Bazel Remote Version: 2.1.1
Disk: 1 TB
Memory: 300 GB
 
Observed OOM Crash
 
[Aug18 20:41] bazel-remote invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null), order=0, oom_score_adj=0
[ +0.003323] oom_kill_process+0x223/0x410
[ +0.006224] Out of memory: Kill process 48578 (bazel-remote) score 963 or sacrifice child
[ +0.012113] Killed process 48578 (bazel-remote) total-vm:391175744kB, anon-rss:388574312kB, file-rss:0kB, shmem-rss:0kB
 
Reproduction Steps
 

  1. Start a local gRPC bazel-remote cache instance.
  2. Clean and issue gRPC calls to the cache to fetch artifacts:

     bazel clean
     bazel build //<same target>/... --remote_cache=grpc://localhost:9092 --remote_upload_local_results

  3. Check the goroutine count: curl localhost:8080/metrics | grep gorou
  4. Repeat steps 2-3 and observe that the number of goroutines grows without bound.

This issue does not reproduce when artifacts are fetched from the remote cache server over HTTP.

Goroutine Profile

Writes appear to be blocked in individual goroutines.
     
goroutine 152 [select, 7 minutes]:
io.(*pipe).Write(0xc00009c720, 0xc00084a000, 0x4000, 0x4000, 0x0, 0x0, 0x0)
GOROOT/src/io/pipe.go:94 +0x1c5
io.(*PipeWriter).Write(...)
GOROOT/src/io/pipe.go:163
github.com/buchgr/bazel-remote/server.(*grpcServer).Write.func2(0x1a7df90, 0xc0007a0980, 0xc0001141ba, 0xc0000a9e90, 0xc00009c7e0, 0xc0004ca120, 0xc00009c840, 0xc00009c780, 0xc0003080a8, 0xc0003080b0)
server/grpc_bytestream.go:474 +0x36b
created by github.com/buchgr/bazel-remote/server.(*grpcServer).Write
server/grpc_bytestream.go:384 +0x2ea
 
goroutine 250 [select, 42 minutes]:
io.(*pipe).Write(0xc000484180, 0xc000230000, 0x4000, 0x4000, 0x0, 0x0, 0x0)
GOROOT/src/io/pipe.go:94 +0x1c5
io.(*PipeWriter).Write(...)
GOROOT/src/io/pipe.go:163
github.com/buchgr/bazel-remote/server.(*grpcServer).Write.func2(0x1a7df90, 0xc00040c740, 0xc0002b843a, 0xc000529530, 0xc000484240, 0xc000218990, 0xc0004842a0, 0xc0004841e0, 0xc00000e298, 0xc00000e2a8)
server/grpc_bytestream.go:474 +0x36b
created by github.com/buchgr/bazel-remote/server.(*grpcServer).Write
server/grpc_bytestream.go:384 +0x2ea
 

 
One Potential Culprit
 
https://github.com/buchgr/bazel-remote/blob/master/server/grpc_bytestream.go#L448
 

go func() {
	err := s.cache.Put(cache.CAS, hash, size, rc)
	putResult <- err
}()

 
putResult is written from a nested goroutine that may never return. On subsequent iterations this code tries to push to recvResult, but no one is reading from it, which can leave writes blocked.
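
For illustration, here is a self-contained sketch (assuming recent Go; not bazel-remote's actual code, and the consume helper is a made-up stand-in for disk.Put) of how an io.Pipe writer goroutine gets stuck when its consumer returns early without reading or closing the pipe:

package main

import (
	"errors"
	"fmt"
	"io"
	"runtime"
	"time"
)

// consume stands in for a disk.Put-style consumer that fails early,
// without reading from the pipe and without closing it.
func consume(r io.Reader) error {
	return errors.New("early failure before reading anything")
}

func main() {
	pr, pw := io.Pipe()

	go func() {
		buf := make([]byte, 16*1024)
		// Blocks until a reader consumes this data or the read side is
		// closed; if neither happens, this goroutine is stuck forever.
		if _, err := pw.Write(buf); err != nil {
			fmt.Println("write unblocked with error:", err)
		}
	}()

	_ = consume(pr)

	time.Sleep(100 * time.Millisecond)
	// The writer goroutine is still blocked in pw.Write at this point.
	fmt.Println("goroutines still running:", runtime.NumGoroutine())

	// Draining or closing the read side is what lets the writer finish,
	// e.g. io.Copy(io.Discard, pr) or pr.CloseWithError(err).
}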

@mostynb
Collaborator

mostynb commented Sep 20, 2021

Thanks for the bug report, I will take a look.

mostynb added a commit to mostynb/bazel-remote that referenced this issue Sep 20, 2021
mostynb added a commit to mostynb/bazel-remote that referenced this issue Sep 20, 2021
io.PipeWriter Write calls block until all the data written in the
call is read from the corresponding io.PipeReader. If we don't read
all that data, then the writing goroutine will block forever.

The PipeWriter used in bytestream Write calls is intended to be
consumed by disk.Put(), but if that returns early then there will
be blocked writes. To un-block them, we should ensure that any
remaining data is drained before disk.Put returns.

This might fix buchgr#473
@mostynb
Collaborator

mostynb commented Sep 20, 2021

I think what's happening is:

  • each pw.Write call blocks until all of the data that it writes is read by one or more pr.Read calls
  • the s.cache.Put(cache.CAS, hash, size, rc) call reads from pr, but it might return early without reading all the data (eg in case of an error)
  • if that happens, then there must be a permanently blocked pw.Write call (like the ones blocked for 7 and 42 minutes in your stack trace), causing a goroutine leak

To fix this, we can ensure that disk.Put always drains the reader before returning. Could you test a build from this PR, and let me know if it worked or not? #474
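
As a hedged sketch of that idea (illustrative only, assuming Go 1.16+; putAndDrain and its put parameter are made-up names, not bazel-remote or #474 code):

package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// putAndDrain wraps a Put-style consumer and guarantees the reader is fully
// consumed even if put returns early, so any goroutine blocked in the paired
// io.PipeWriter.Write can make progress and exit.
func putAndDrain(put func(io.Reader) error, r io.Reader) error {
	err := put(r)
	// Drain whatever put did not read; io.Copy returns once the reader hits
	// EOF (or an error), which unblocks the pipe's writer.
	if _, drainErr := io.Copy(io.Discard, r); err == nil {
		err = drainErr
	}
	return err
}

func main() {
	// A consumer that fails before reading anything, like a Put that
	// rejects the blob early.
	failEarly := func(r io.Reader) error { return errors.New("put failed early") }

	r := strings.NewReader("blob data that would otherwise stay unread")
	fmt.Println(putAndDrain(failEarly, r)) // the reader is drained regardless
}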

@BoyangTian-Robinhood
Author

Thanks for the quick fix @mostynb! I verified that the goroutine count did decrease, so the fix seems to work.
But I also checked pprof and top, and they show me this:
top

Processes: 638 total, 4 running, 634 sleeping, 4297 threads                                                                                                       15:29:11
Load Avg: 2.50, 3.85, 4.40  CPU usage: 16.94% user, 3.84% sys, 79.21% idle  SharedLibs: 508M resident, 91M data, 91M linkedit.
MemRegions: 530533 total, 8565M resident, 124M private, 4303M shared. PhysMem: 32G used (3643M wired), 31M unused.
VM: 5989G vsize, 2322M framework vsize, 50774957(0) swapins, 54369618(0) swapouts.   Networks: packets: 30884122/32G in, 22342064/12G out.
Disks: 56172151/635G read, 19701599/480G written.

PID    COMMAND      %CPU TIME     #TH  #WQ  #POR MEM   PURG CMPR PGRP  PPID STATE    BOOSTS    %CPU_ME %CPU_OTHRS UID  FAULT COW  MSGS MSGR SYSBSD   SYSM CSW     PAGE
56582  bazel-remote 0.0  00:22.58 16   0    36   99M   0B   18M  56582 1558 sleeping *0[1]     0.00000 0.00000    501  70316 261  108  18   1141655  138  817310  0

pprof

Showing nodes accounting for 112.31MB, 97.37% of 115.35MB total
Dropped 39 nodes (cum <= 0.58MB)
      flat  flat%   sum%        cum   cum%
      96MB 83.23% 83.23%       96MB 83.23%  github.com/klauspost/compress/zstd.(*fastBase).ensureHist (inline)
    5.12MB  4.44% 87.66%     5.12MB  4.44%  github.com/klauspost/compress/zstd.encoderOptions.encoder
    4.96MB  4.30% 91.96%     4.96MB  4.30%  github.com/klauspost/compress/zstd.(*blockEnc).init
    3.50MB  3.04% 95.00%     3.50MB  3.04%  runtime.allocm
    1.73MB  1.50% 96.50%    97.73MB 84.73%  github.com/klauspost/compress/zstd.(*fastEncoder).Encode
       1MB  0.87% 97.37%        1MB  0.87%  github.com/klauspost/compress/zstd.(*blockDec).startDecoder
         0     0% 97.37%   109.34MB 94.79%  github.com/buchgr/bazel-remote/cache/disk.(*Cache).Put
         0     0% 97.37%   108.84MB 94.36%  github.com/buchgr/bazel-remote/cache/disk.(*Cache).writeAndCloseFile
         0     0% 97.37%   108.84MB 94.36%  github.com/buchgr/bazel-remote/cache/disk/casblob.WriteAndClose
         0     0% 97.37%   109.34MB 94.79%  github.com/buchgr/bazel-remote/server.(*grpcServer).Write.func2.1
         0     0% 97.37%   108.84MB 94.36%  github.com/klauspost/compress/zstd.(*Encoder).EncodeAll
         0     0% 97.37%     5.12MB  4.44%  github.com/klauspost/compress/zstd.(*Encoder).initialize
         0     0% 97.37%     1.03MB   0.9%  github.com/klauspost/compress/zstd.(*blockEnc).encode
         0     0% 97.37%       96MB 83.23%  github.com/klauspost/compress/zstd.(*fastBase).addBlock
         0     0% 97.37%     4.96MB  4.30%  github.com/klauspost/compress/zstd.(*fastBase).resetBase
         0     0% 97.37%     4.96MB  4.30%  github.com/klauspost/compress/zstd.(*fastEncoder).Reset
         0     0% 97.37%     1.50MB  1.30%  runtime.mcall

This seems too big; I checked the disk folder and it only holds 206.2 MB of data.
I also restarted the process while keeping the disk folder, and the memory usage then looks more reasonable:

Processes: 643 total, 2 running, 641 sleeping, 4153 threads                                                                                                       15:32:42
Load Avg: 2.84, 3.20, 3.99  CPU usage: 7.97% user, 4.51% sys, 87.50% idle   SharedLibs: 509M resident, 91M data, 91M linkedit.
MemRegions: 531467 total, 8933M resident, 129M private, 4319M shared. PhysMem: 32G used (3645M wired), 36M unused.
VM: 6012G vsize, 2322M framework vsize, 50978697(64) swapins, 54414972(0) swapouts.    Networks: packets: 30892411/32G in, 22351980/12G out.
Disks: 56289193/636G read, 19712077/480G written.

PID    COMMAND      %CPU TIME     #TH  #WQ  #POR MEM   PURG CMPR PGRP  PPID STATE    BOOSTS    %CPU_ME %CPU_OTHRS UID  FAUL COW  MSGS MSGR SYSBSD SYSM CSW  PAGE IDLE POWE
72108  bazel-remote 0.0  00:00.13 9    0    29   7388K 0B   0B   72108 1558 sleeping *0[1]     0.00000 0.00000    501  4285 254  126  18   12066  118  990  0    0    0.0

pprof

(pprof) top40
Showing nodes accounting for 3585.80kB, 100% of 3585.80kB total
      flat  flat%   sum%        cum   cum%
 1537.69kB 42.88% 42.88%  1537.69kB 42.88%  runtime.allocm
  512.05kB 14.28% 57.16%   512.05kB 14.28%  github.com/klauspost/compress/zstd.(*blockDec).startDecoder
  512.04kB 14.28% 71.44%   512.04kB 14.28%  runtime.bgscavenge
  512.02kB 14.28% 85.72%   512.02kB 14.28%  runtime.gcBgMarkWorker
  512.01kB 14.28%   100%   512.01kB 14.28%  sync.(*Map).Store
         0     0%   100%   512.01kB 14.28%  github.com/json-iterator/go.init

I think the disk only holds around 200 MB, so memory usage shouldn't be anywhere near that high.

@mostynb
Collaborator

mostynb commented Sep 20, 2021

I will need to look at your numbers and try to understand them some more, but my initial guess is that you're using --storage_mode zstd (the default), which increases memory usage both when filling the cache and when handling requests for uncompressed data from compressed storage (since bazel doesn't support compressed bytestream requests yet), although it still probably saves disk space.

If you clear the cache directory and try --storage_mode uncompressed you will probably see lower memory usage (but higher disk usage). That would also give us a good test of whether the goroutine fix worked.

On the other hand, if you are seeing a lot of errors when uploading to the cache, then maybe it's worth finding a way to stop writing this data that needs to be drained. Do you have any put/upload errors in your logs?

(Another thing we could try would be to use low-memory options for the zstd library bazel-remote uses.)
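
(For reference, the klauspost/compress/zstd package exposes low-memory options on both the encoder and decoder. A minimal illustration of those knobs, not the contents of any bazel-remote change:)

package main

import (
	"fmt"

	"github.com/klauspost/compress/zstd"
)

func main() {
	// Encoder tuned to trade some compression ratio/speed for a smaller
	// memory footprint.
	enc, err := zstd.NewWriter(nil,
		zstd.WithEncoderLevel(zstd.SpeedFastest),
		zstd.WithLowerEncoderMem(true),
	)
	if err != nil {
		panic(err)
	}
	defer enc.Close()

	// Decoder with its low-memory option enabled.
	dec, err := zstd.NewReader(nil, zstd.WithDecoderLowmem(true))
	if err != nil {
		panic(err)
	}
	defer dec.Close()

	compressed := enc.EncodeAll([]byte("some blob data"), nil)
	plain, err := dec.DecodeAll(compressed, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain))
}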

@BoyangTian-Robinhood
Author

BoyangTian-Robinhood commented Sep 21, 2021

Even with compression, shouldn't memory decrease after the data has been saved? I monitored for a while after my bazel test finished and memory usage stayed that high.
The only error I found is:

WARNING: Writing to Remote Cache:
BulkTransferException

I set an upload file size limit, so I guess that is what causes this error; there are no other errors.

One thing: after I set --storage_mode uncompressed I can see that memory usage drops and there are cache hits.

Processes: 641 total, 2 running, 639 sleeping, 4156 threads                                                                                                       16:14:41
Load Avg: 5.04, 5.81, 5.12  CPU usage: 6.39% user, 4.54% sys, 89.5% idle     SharedLibs: 509M resident, 91M data, 91M linkedit.
MemRegions: 531445 total, 9877M resident, 133M private, 5135M shared. PhysMem: 29G used (3637M wired), 2577M unused.
VM: 5951G vsize, 2322M framework vsize, 52288723(192) swapins, 56175494(0) swapouts. Networks: packets: 31074870/32G in, 22547045/12G out.
Disks: 57656345/655G read, 20353926/497G written.

PID    COMMAND      %CPU TIME     #TH  #WQ  #POR MEM    PURG CMPR PGRP  PPID STATE    BOOSTS    %CPU_ME %CPU_OTHRS UID  FAULT COW  MSGS MSGR SYSBSD  SYSM CSW     PAGE IDLE
77497  bazel-remote 0.2  00:03.57 14   0    34   9848K  0B   0B   77497 1558 sleeping *0[1]     0.00000 0.00000    501  5918  252  59   18   226573+ 140  155905+ 0    120
INFO: Elapsed time: 53.966s, Critical Path: 34.89s
INFO: 1713 processes: 1540 remote cache hit, 172 internal, 1 darwin-sandbox.
INFO: Build completed successfully, 1713 total actions

But the problem is that this time, when I check the disk folder, only around 1 MB is used after running the same test packages.
And when I set storage_mode back, the memory is also not as high as the 99 MB from before (the disk is still around 1 MB). I'm not sure how to reproduce that one, or why the disk now uses so little space.

@mostynb
Collaborator

mostynb commented Sep 22, 2021

Did you clear the cache dir before restarting bazel-remote with --storage_mode uncompressed? If not, the previously stored compressed blobs will need to be decoded for each (uncompressed) request. The zstd library we use is known to use a lot of memory.

Which version of top are you using? When I try locally (on Ubuntu) I don't get the same fields as in your logs.

I will land #474 (since that seems to fix the goroutine leak), and then try making a build that uses the low-memory options for the zstd library for you to try.

mostynb added a commit that referenced this issue Sep 22, 2021
io.PipeWriter Write calls block until all the data written in the
call is read from the corresponding io.PipeReader. If we don't read
all that data, then the writing goroutine will block forever.

The PipeWriter used in bytestream Write calls is intended to be
consumed by disk.Put(), but if that returns early then there will
be blocked writes. To un-block them, we should ensure that any
remaining data is drained before disk.Put returns.

This might fix #473
@mostynb mostynb reopened this Sep 22, 2021
mostynb added a commit to mostynb/bazel-remote that referenced this issue Sep 22, 2021
Let's see how this affects memory usage and performance.

Related to buchgr#473.
@mostynb
Collaborator

mostynb commented Sep 22, 2021

Here's a version with the zstd library's low-memory options enabled (not in all places, but most) to try out: #475

@BoyangTian-Robinhood
Author

Cool, thanks. When will you publish a new release with your fix? We will deploy it. The version I used for testing was just the branch with your fix, so I was effectively running whatever version your fix is based on.

@mostynb
Collaborator

mostynb commented Sep 22, 2021

The goroutine fix has landed on the master branch, and been pushed to the "latest" tag on docker hub, if that helps. I don't mind making a versioned release tomorrow with just the goroutine fix if it would be useful, but I can wait a few days if you want to test the "lowmem" changes first.

(The "lowmem" PR is based on the tip of master, with the goroutine fix.)

@BoyangTian-Robinhood
Author

We don't run the docker image; we run the binary directly, managed by systemd. I could build it myself, but a versioned release would be easier to track for future maintenance :)

@BoyangTian-Robinhood
Author

We think a version tag is easier to manage. Could you please create a version tag for this fix? 🙏

@mostynb mostynb changed the title memory leak cause by grpc goroutines bytestream write can leak goroutines if disk.Put doesn't drain the io.Reader Sep 23, 2021
@mostynb
Collaborator

mostynb commented Sep 23, 2021

v2.1.4 and v1.3.3 have been released with this fix.

Let's move discussion of the "lowmem" experiment over to #475.

@mostynb mostynb closed this as completed Sep 23, 2021