Memory-Efficient Code Path is Not Automatically Triggered #331

Open
aszepieniec opened this issue Sep 30, 2024 · 4 comments
Labels
🪳 bug (Something is not working) · 🤖 code (Changes the implementation) · 🧑‍🤝‍🧑 help wanted (Need some help) · 🟡 prio: medium (Not super urgent)

Comments

@aszepieniec
Collaborator

No description provided.

@aszepieniec aszepieniec added 🪳 bug Something is not working 🤖 code Changes the implementation labels Sep 30, 2024
@aszepieniec aszepieniec added the 🟡 prio: medium Not super urgent label Dec 27, 2024
@jan-ferdinand
Member

I can confirm this bug on Linux. I'm reasonably sure I'm using the try_reserve_exact API correctly.

A possible explanation for this behavior is overcommitment. That is, the kernel lets the allocation attempt go through, even though the available memory is insufficient. Once Triton VM actually starts using the allocated memory, the kernel realizes that memory is running out, and invokes the oom-killer. Since Triton VM is the process with the worst oom score (usually true considering our RAM demands), it gets terminated.
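To make the hypothesis concrete, here is a minimal sketch of the kind of check that overcommitment would defeat. It is not Triton VM's actual code: `try_allocate` and `choose_code_path` are hypothetical names, and the only real API used is `Vec::try_reserve_exact`. Under overcommitment, a request larger than the free RAM can still return `Ok`, so the "fast" branch is chosen and the failure only shows up later, when the pages are written to.

```rust
use std::collections::TryReserveError;

/// Try to reserve `num_bytes` bytes without writing to them.
/// On Linux with overcommitment, this can succeed even if the
/// machine does not have that much memory available.
fn try_allocate(num_bytes: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buffer = Vec::<u8>::new();
    buffer.try_reserve_exact(num_bytes)?;
    Ok(buffer)
}

/// Hypothetical dispatcher: pick the fast path if the reservation
/// succeeds, otherwise fall back to the memory-efficient path.
fn choose_code_path(num_bytes: usize) -> &'static str {
    match try_allocate(num_bytes) {
        Ok(_) => "fast",
        Err(_) => "memory-efficient",
    }
}
```

A request that exceeds `isize::MAX` bytes, such as `choose_code_path(usize::MAX)`, is rejected by `Vec` itself and therefore always takes the fallback. The problematic case is a request between the available memory and that limit.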

Should this explanation be correct[1], I see little opportunity for improvement. It is not possible to disable overcommitment for a single process. Messing with the system configuration is a bad idea, and probably impossible. Lowering the oom score adjustment below 0 (which is the default) also sounds like a bad idea and requires administrator privileges.
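For reference, the oom score adjustment of the current process can be inspected without special privileges. The sketch below assumes a Linux system with `/proc` mounted; the function names are made up for illustration.

```rust
use std::fs;

/// Parse the contents of `/proc/<pid>/oom_score_adj`.
/// The kernel accepts values from -1000 to 1000; the default is 0.
fn parse_oom_score_adj(raw: &str) -> Option<i32> {
    let value: i32 = raw.trim().parse().ok()?;
    (-1000..=1000).contains(&value).then_some(value)
}

/// Read the adjustment of the calling process.
/// Returns `None` if the file is missing (for example, not on Linux)
/// or does not contain a valid value.
fn own_oom_score_adj() -> Option<i32> {
    let raw = fs::read_to_string("/proc/self/oom_score_adj").ok()?;
    parse_oom_score_adj(&raw)
}
```

Reading the value is harmless. Writing a value below the current one is the part that needs elevated privileges.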

I'm open to other explanations.

Footnotes

  1. I'm not sure what experiments to perform to try to disprove this hypothesis.

@jan-ferdinand
Member

On Windows, the memory-efficient code path is automatically triggered. As far as I know, the allocator on Windows does not perform overcommitting.

@aszepieniec
Collaborator Author

Based on this source, it is not too difficult to change the system-wide overcommit policy on Linux. If we have a reliable way of triggering this issue, let's rerun it with overcommitment disabled to verify or falsify this hypothesis.
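As a sanity check for such an experiment, the active policy can be read from `/proc/sys/vm/overcommit_memory` before each run. The sketch below is an assumption about how one might do that, not something from the linked source; the type and function names are illustrative. The kernel documents three values: 0 (heuristic, the usual default), 1 (always overcommit), and 2 (never overcommit).

```rust
use std::fs;

#[derive(Debug, PartialEq, Eq)]
enum OvercommitPolicy {
    Heuristic, // 0: kernel guesses whether an allocation is reasonable
    Always,    // 1: allocations are never refused for lack of memory
    Never,     // 2: strict accounting, allocations can fail up front
}

/// Parse the contents of `/proc/sys/vm/overcommit_memory`.
fn parse_overcommit_policy(raw: &str) -> Option<OvercommitPolicy> {
    match raw.trim() {
        "0" => Some(OvercommitPolicy::Heuristic),
        "1" => Some(OvercommitPolicy::Always),
        "2" => Some(OvercommitPolicy::Never),
        _ => None,
    }
}

/// Read the system-wide policy; `None` if the file is unavailable.
fn current_overcommit_policy() -> Option<OvercommitPolicy> {
    let raw = fs::read_to_string("/proc/sys/vm/overcommit_memory").ok()?;
    parse_overcommit_policy(&raw)
}
```

With the policy set to 2 (for example via `sysctl vm.overcommit_memory=2` as root), `current_overcommit_policy()` should report `Never`. If the memory-efficient path is then triggered, that supports the overcommitment hypothesis.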

@jan-ferdinand
Member

Yes, that's a good experiment to run. 🧑‍🔬

2 participants