feat(runtime): use prlimit to limit resource usage of command to avoid OOM Runtime Kill #6338
Merged
43 commits
7421aa1 log more mem info (xingyaoww)
501824a simplify remote stress test a little bit (xingyaoww)
867f672 reliable way to reproduce error (xingyaoww)
c6902da use a more reasonable tests (xingyaoww)
5c44726 Merge branch 'main' into xw/bash-perf (xingyaoww)
61b87ce feat(runtime): add memory monitoring to prevent k8s OOM kills (openhands-agent)
54ac167 update lock (xingyaoww)
fc18e5c update memory monitor for action execution server (xingyaoww)
feda348 monitor the entire pg (xingyaoww)
ed35d53 fix recursive call (xingyaoww)
6d6adba Merge commit 'f24fbec165de33749500dc06c9b6e753b588dbf9' into xw/bash-… (xingyaoww)
5f33ae1 update log (xingyaoww)
4699e91 use prlimit to restrict memory usage (xingyaoww)
7fda066 fix prlimit (xingyaoww)
33737cd also support running stress test locally (xingyaoww)
9da4550 log memory stuff in case of high system pressure (xingyaoww)
57afe20 tweak tests (xingyaoww)
6bc5ca3 combine docker stress test with remote runtime (xingyaoww)
d2d57fe remove save perf debug (xingyaoww)
e09ac90 makes it work for both remote and docke rtests (xingyaoww)
a1d200c allow override max memory gb in action execution server; try to get… (xingyaoww)
19f025b ok got this working with docker (xingyaoww)
6e678fd use pss instead of rss for process mem (xingyaoww)
74d048b Merge branch 'main' into xw/bash-perf (enyst)
507c0a9 update runtime startup command for remote runtime too (xingyaoww)
a6bbbfe Merge commit '74d048b62341b33e32961b861e6312ed70086ac6' into xw/bash-… (xingyaoww)
9b92118 update lock (xingyaoww)
f39181c update stresstest script (xingyaoww)
81634ea Merge commit '5fa2634d6070b84e912bb85017cf686cd7abecdf' into xw/bash-… (xingyaoww)
34e36d4 add stress test for file editing (xingyaoww)
a9dc6d4 add a memory test that can run in CI (xingyaoww)
239ee06 revert memory monitor (xingyaoww)
b039201 Merge commit 'b12b426e3ded6934b289e2efe2dd7ad0c7d181c1' into xw/bash-… (xingyaoww)
a51c9c5 simplify dep (xingyaoww)
995a3fd revert more changes (xingyaoww)
2a38c54 revert even more changes (xingyaoww)
0a157a3 use lock from main (xingyaoww)
bf5dcbf update comment (xingyaoww)
ac5ee21 add another test where we use higher limit and expect the stress test… (xingyaoww)
e64f477 Update openhands/runtime/action_execution_server.py (xingyaoww)
76744ea only enable prlimit when max_memory_mb is not None (xingyaoww)
cc5f738 shorten stress test to 30 sec (xingyaoww)
679d27a Merge branch 'main' into xw/bash-perf (xingyaoww)
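Commits 4699e91 and 76744ea describe restricting memory with `prlimit` and enabling it only when `max_memory_mb` is not None. The implementation itself is not shown on this page, so the following is only a minimal sketch of how such a wrapper might look. The helper name `wrap_with_prlimit` is hypothetical, and the choice of the address-space limit (`--as`, a real option of the util-linux `prlimit` tool) is an assumption about which resource gets capped.

```python
from typing import List, Optional


def wrap_with_prlimit(command: List[str], max_memory_mb: Optional[int]) -> List[str]:
    """Hypothetical helper: prefix a command with `prlimit` when a cap is set.

    Assumption: the cap is applied to the virtual address space (--as),
    expressed in bytes. The real PR may use a different limit or unit.
    """
    if max_memory_mb is None:
        # No cap configured: run the command unchanged.
        return list(command)
    if max_memory_mb <= 0:
        raise ValueError('max_memory_mb must be a positive integer')
    limit_bytes = max_memory_mb * 1024 * 1024
    return ['prlimit', f'--as={limit_bytes}', *command]


if __name__ == '__main__':
    # Illustrative command only; the actual runtime startup command differs.
    print(wrap_with_prlimit(['python', 'server.py'], 3072))
    print(wrap_with_prlimit(['python', 'server.py'], None))
```

Returning the command untouched when no limit is configured keeps the default behavior the same as before, which matches the intent stated in commit 76744ea.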
@@ -0,0 +1,113 @@
"""Stress tests for the DockerRuntime, which connects to the ActionExecutor running in the sandbox."""

from conftest import _close_test_runtime, _load_runtime

from openhands.core.logger import openhands_logger as logger
from openhands.events.action import CmdRunAction


def test_stress_docker_runtime(temp_dir, runtime_cls, repeat=1):
    runtime, config = _load_runtime(
        temp_dir,
        runtime_cls,
        docker_runtime_kwargs={
            'cpu_period': 100000,  # 100ms
            'cpu_quota': 100000,  # Can use 100ms out of each 100ms period (1 CPU)
            'mem_limit': '4G',  # 4 GB of memory
        },
    )

    action = CmdRunAction(
        command='sudo apt-get update && sudo apt-get install -y stress-ng'
    )
    logger.info(action, extra={'msg_type': 'ACTION'})
    obs = runtime.run_action(action)
    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
    assert obs.exit_code == 0

    for _ in range(repeat):
        # Run one instance of every stress-ng stressor for 30 seconds
        action = CmdRunAction(command='stress-ng --all 1 -t 30s')
        action.set_hard_timeout(120)
        logger.info(action, extra={'msg_type': 'ACTION'})
        obs = runtime.run_action(action)
        logger.info(obs, extra={'msg_type': 'OBSERVATION'})

    _close_test_runtime(runtime)


def test_stress_docker_runtime_hit_memory_limits(temp_dir, runtime_cls):
    """Test that a command exceeding RUNTIME_MAX_MEMORY_GB aborts with an error."""
    runtime, config = _load_runtime(
        temp_dir,
        runtime_cls,
        docker_runtime_kwargs={
            'cpu_period': 100000,  # 100ms
            'cpu_quota': 100000,  # Can use 100ms out of each 100ms period (1 CPU)
            'mem_limit': '4G',  # 4 GB of memory
            'memswap_limit': '0',  # No swap
            'mem_swappiness': 0,  # Disable swapping
            'oom_kill_disable': False,  # Enable OOM killer
        },
        runtime_startup_env_vars={
            'RUNTIME_MAX_MEMORY_GB': '3',
        },
    )

    action = CmdRunAction(
        command='sudo apt-get update && sudo apt-get install -y stress-ng'
    )
    logger.info(action, extra={'msg_type': 'ACTION'})
    obs = runtime.run_action(action)
    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
    assert obs.exit_code == 0

    action = CmdRunAction(
        command='stress-ng --vm 1 --vm-bytes 6G --timeout 30s --metrics'
    )
    action.set_hard_timeout(120)
    logger.info(action, extra={'msg_type': 'ACTION'})
    obs = runtime.run_action(action)
    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
    assert 'aborted early, out of system resources' in obs.content
    assert obs.exit_code == 3  # stress-ng exit status for running out of resources

    _close_test_runtime(runtime)


def test_stress_docker_runtime_within_memory_limits(temp_dir, runtime_cls):
    """Test that the same workload succeeds when RUNTIME_MAX_MEMORY_GB exceeds its allocation."""
    runtime, config = _load_runtime(
        temp_dir,
        runtime_cls,
        docker_runtime_kwargs={
            'cpu_period': 100000,  # 100ms
            'cpu_quota': 100000,  # Can use 100ms out of each 100ms period (1 CPU)
            'mem_limit': '4G',  # 4 GB of memory
            'memswap_limit': '0',  # No swap
            'mem_swappiness': 0,  # Disable swapping
            'oom_kill_disable': False,  # Enable OOM killer
        },
        runtime_startup_env_vars={
            'RUNTIME_MAX_MEMORY_GB': '7',
        },
    )

    action = CmdRunAction(
        command='sudo apt-get update && sudo apt-get install -y stress-ng'
    )
    logger.info(action, extra={'msg_type': 'ACTION'})
    obs = runtime.run_action(action)
    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
    assert obs.exit_code == 0

    action = CmdRunAction(
        command='stress-ng --vm 1 --vm-bytes 6G --timeout 30s --metrics'
    )
    action.set_hard_timeout(120)
    logger.info(action, extra={'msg_type': 'ACTION'})
    obs = runtime.run_action(action)
    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
    assert obs.exit_code == 0

    _close_test_runtime(runtime)
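The `test_stress_docker_runtime_hit_memory_limits` test expects the oversized workload to fail with an error instead of taking down the whole runtime. The same effect can be reproduced without Docker using Python's standard `resource` module, which sets the same kind of per-process limit that `prlimit` sets from the command line. This is a minimal sketch for Linux only; the helper name `run_with_address_space_limit` is hypothetical, and capping `RLIMIT_AS` is an assumption about which limit the runtime applies.

```python
import resource
import subprocess
import sys


def run_with_address_space_limit(argv, max_bytes):
    """Run argv in a child process whose virtual address space is capped.

    Hypothetical helper for illustration; it is not part of OpenHands.
    """

    def _apply_limit():
        # Runs in the child just before exec. Never try to raise the hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        limit = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

    return subprocess.run(
        argv, preexec_fn=_apply_limit, capture_output=True, text=True
    )


if __name__ == '__main__':
    one_gib = 1024**3
    # A 16 MiB allocation fits under the 1 GiB cap.
    small = run_with_address_space_limit(
        [sys.executable, '-c', 'bytearray(16 * 1024**2)'], one_gib
    )
    # A 2 GiB allocation cannot be satisfied, so the child fails on its own.
    large = run_with_address_space_limit(
        [sys.executable, '-c', 'bytearray(2 * 1024**3)'], one_gib
    )
    print('small allocation exit code:', small.returncode)
    print('large allocation exit code:', large.returncode)
```

Because the limit is enforced at allocation time, the oversized request fails inside the child process and the parent keeps running, which is the behavior the test asserts through the stress-ng exit code.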
This file was deleted.