Memory leak #3221

Closed
erfan-khadem opened this issue Apr 2, 2024 · 20 comments

@erfan-khadem

There is (and has been) a memory leak since at least version 1.8.9. I have tested the latest release, and it leaks roughly 1.7 GB of memory per 100 GB of proxied traffic, measured over 24 hours with about 50 active users. I have not run controlled experiments to correlate these numbers, but the amount of leaked memory appears to be proportional to the served traffic. There is also no ceiling: usage just grows until systemd-oomd kills the process. I have tried VLESS only, VLESS+VMess, and VLESS+Trojan, and avoided the newly added transports, but to no avail. I would be more than happy to experiment more and provide the team with more information. I am running xray behind HAProxy, which in turn sits behind the Cloudflare CDN.

OS: Ubuntu 22.04, all packages up-to-date.
Architecture: ARMv8 (Neoverse-N1 at Hetzner); also generic x86 platforms
Memory: 4GB
Kernel: Generic Ubuntu 5.15.0-xxx
Logging: Disabled

@erfan-khadem
Author

heap.zip
Here is the result of heap profiling using pprof. Memory usage was about 800 MB when I took this profile.
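
(For readers unfamiliar with where such a profile comes from: it is Go's standard pprof heap profile. Any Go process can expose it through the net/http/pprof handlers, and xray offers the same endpoints through its metrics service, see the link in mmmray's comment further down. A minimal standalone sketch of the mechanism, with an arbitrary loopback port chosen here as an assumption:)

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve the profiling endpoints on loopback only, so they are not exposed
	// publicly. The heap profile is then available at
	// http://127.0.0.1:6060/debug/pprof/heap
	log.Fatal(http.ListenAndServe("127.0.0.1:6060", nil))
}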

@GFW-knocker

@er888kh
What kind of transport do you use?
I know that gRPC has had a memory leak since 1.8.0,
but WS has been rather fine in my experience.

@50infivedays

Same problem. In my experience, version 1.8.4 is fine; after that version, xray leaks.

@RPRX
Member

RPRX commented Apr 21, 2024

Same problem. In my experience, version 1.8.4 is fine; after that version, xray leaks.

Please test the commits between v1.8.4 and v1.8.6 one by one to see which commit introduced the problem.

@er888kh Is it the same on your side?

@yuhan6665
Member

yuhan6665 commented Apr 29, 2024

heap.zip Here is the result of heap profiling using pprof. Memory usage was about 800 MB when I took this profile.

From this pprof, the biggest consumer is the readv reader at 220 MB, although I'm not sure this is actually a leak. Can you profile a larger usage? (Or reduce the buffer size to isolate the issue.)

@GektorUA

GektorUA commented Apr 30, 2024

Same issue on an MT7981A (ARM64, Cortex-A53) with 512 MB RAM. I have tried 1.8.4 and 1.8.10; both consume all RAM on the device after the first speedtest run and the device hangs. I switched to sing-box 1.8.10 without changing anything on the server side (xray-core 1.8.10) and have no issues with memory consumption (~180 MB RAM after 10 or more tests).

P.S.
buffer-size is set to 4

@amirhosss

Same issue on an MT7981A (ARM64, Cortex-A53) with 512 MB RAM. I have tried 1.8.4 and 1.8.10; both consume all RAM on the device after the first speedtest run and the device hangs.

Same problem when running a speedtest with a chain-proxy config using Xray-core.

@taoabc

taoabc commented May 9, 2024

Same problem

@shakibamoshiri

shakibamoshiri commented May 16, 2024

My VM crashed yesterday because of OOM.

Here is a screenshot of the console after the crash:
[screenshot: xray-memory-leak]

I never profiled xray, but could this be due to a memory leak?
protocol: vless-grpc
number of users: around 30/40
number of concurrent users : 10/15
OS : Debian 11 x64
physical memory: 1G
swap : off
xray : Xray 1.8.7 (Xray, Penetrates Everything.) 3f0bc13 (go1.21.5 linux/amd64)

total-vm: 2558796KB
anon-rss: 561376KB
file-rss: 0KB

Which protocol leaks the least memory?

I did not have full console access and was not able to recover the OS; I had to rebuild it, so there are no further logs.

@taoabc

taoabc commented May 17, 2024

(quoting shakibamoshiri's comment above)

I updated to 1.8.11 and the problem seems to have been mitigated, whereas it was easy to reproduce on 1.8.10.

@EldestBard

Same problem on the latest version, 1.8.13.
[image]

@masbur

masbur commented Jun 25, 2024

I'm using version 1.8.16 and the problem is solved, except for Shadowsocks-2022, which still leaks.
[image]

@M03ED

M03ED commented Aug 13, 2024

There are similar reports in Marzban:
Gozargah/Marzban#1062
Gozargah/Marzban#992
Gozargah/Marzban#814

@majidsadr

@M03ED Just in case: what I was facing was not an OOM issue, and I couldn't find any cause for it. But with another configuration (WS+VMess) I have about 5k sockstat TCP metrics and there are no issues with connections.

@mmmray
Collaborator

mmmray commented Aug 15, 2024

I think this issue has become about too many things at once (socket/file-descriptor leak versus memory leak), and none of the reports is very specific. I suggest attempting these things:

  • Reduce the number of inbounds/outbounds per xray process to pin it down to a specific transport/protocol. For example, if you have a node with 6 inbounds, split it into two nodes with 3 each, then split whichever one still leaks again. Try to arrive at a server JSON config that reproduces the issue reliably and is free of unnecessary things.
  • In case of OOM, once you have a node with a minimal configuration, use pprof (like the OP) to produce a heap profile. You can follow https://xtls.github.io/en/config/metrics.html#pprof to configure it, although I don't know whether this is easy to do from panels; a sketch of pulling the profile from that endpoint follows this list.
  • RPRX and yuhan already asked specific questions (please test all commits in the range; please profile a bigger usage), but I don't see a follow-up to them.
  • If anything about the core developers' responses is unclear, please ask; don't just ignore it and post "same here".
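
A minimal sketch of grabbing the heap profile once that endpoint is enabled. The listen address 127.0.0.1:11111 and the standard /debug/pprof/heap path are assumptions here; adjust them to whatever your metrics config actually uses:

package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// Assumed address: wherever the metrics/pprof service is listening.
	resp, err := http.Get("http://127.0.0.1:11111/debug/pprof/heap")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("heap.out")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	// Save the raw profile; inspect it afterwards with: go tool pprof -top heap.out
	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatal(err)
	}
	log.Println("wrote heap.out")
}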

@boris768

boris768 commented Aug 20, 2024

I checked the original v2ray repository and found the same bug:
v2fly/v2ray-core#3086

I used the patch from that bug and it seems to have helped with the big leak; at least the process doesn't eat memory as aggressively. Tested on the 1.8.23 release code.

The place to patch:

func (p *pipe) ReadMultiBuffer() (buf.MultiBuffer, error) {

The applied patch (additions are marked; the time package must be imported if it is not already):

func (p *pipe) ReadMultiBuffer() (buf.MultiBuffer, error) {
	for {
		data, err := p.readMultiBufferInternal()
		if data != nil || err != nil {
			p.writeSignal.Signal()
			return data, err
		}

		timer := time.NewTimer(15 * time.Minute) // added: arm a per-iteration read timeout
		select {
		case <-p.readSignal.Wait():
		case <-p.done.Wait():
		case <-timer.C: // added: no data and no close signal arrived within 15 minutes
			return nil, buf.ErrReadTimeout // added: unblock the reader instead of waiting forever
		case err = <-p.errChan:
			return nil, err
		}
		timer.Stop() // added: release the timer when the pipe woke us up normally
	}
}

@mmmray
Collaborator

mmmray commented Aug 25, 2024

@boris768 Then it seems the real issue might be that some pipe is not correctly closed somewhere, or more generally that resources are not cleaned up in some inbound/outbound or transport. Unfortunately this patch can work around a variety of different issues, so it doesn't really explain what's going on, IMO.
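
A hypothetical, standalone sketch (not xray code) of that failure mode: if the writing side of a pipe disappears without closing it, the reading goroutine blocks forever and everything it references stays on the heap, which is exactly the kind of growth a blanket read timeout would paper over.

package main

import (
	"fmt"
	"runtime"
	"time"
)

// reader stands in for a per-connection pipe reader. It holds a buffer and
// blocks until the channel is closed; if nobody ever closes it, both the
// goroutine and the buffer are retained for the lifetime of the process.
func reader(ch <-chan []byte) {
	buf := make([]byte, 32*1024)
	for data := range ch {
		copy(buf, data)
	}
}

func main() {
	for i := 0; i < 1000; i++ {
		ch := make(chan []byte)
		go reader(ch)
		// Simulate a teardown path that forgets close(ch): the reader above
		// can never return, so memory grows with every "connection".
	}
	time.Sleep(time.Second)
	fmt.Println("goroutines still alive:", runtime.NumGoroutine()) // ~1001
}

A goroutine or heap profile of such a process shows the blocked readers piling up, which is why the profiles requested above are more useful than the timeout patch for finding the actual culprit.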

@boris768

@mmmray I tested this patch; it can be used as a hack to mitigate the memory leak. I am using a simple XTLS-Reality setup, and without the fix 8-10 clients push xray's memory usage up to 700-800 MB in one day (OOM kills the service). With the fix, peak memory usage reaches about 150 MB.
At least it gives some information about where the leak is; I hope it will help to fix it.

alryaz added a commit to alryaz/openwrt-ax3600-builds that referenced this issue Aug 31, 2024
alryaz added a commit to alryaz/openwrt-ax3600-builds that referenced this issue Sep 1, 2024
@M03ED

M03ED commented Sep 9, 2024

Is the problem solved?

@mmmray
Collaborator

mmmray commented Sep 9, 2024

Not completely, but the issue is missing too much information (for example, which version introduced it?), nobody has investigated it much, and it's not clear whether this is one issue or several. As a developer I also wouldn't know what to do with it right now. I guess, let's reopen when somebody has managed to make another heap profile or some other discovery. I think the patch, as it stands, cannot help, to be honest.
