Add alive check for syscall server #287

Closed

Conversation

Officeyutong
Contributor

@Officeyutong Officeyutong commented Apr 28, 2024

Closes #178

Adds a watchdog for agents. If the syscall server dies, agents will exit automatically.

On startup, the agent and the server each start a separate thread. The server-side thread updates a time stamp stored in shared memory every 50 ms, indicating that the server was still alive at that time. The agent-side thread keeps reading this time stamp from shared memory and checks whether it has gone without an update for more than 150 ms. If so, the agent regards the server as dead and starts to detach.
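The heartbeat scheme described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `heartbeat_slot`, `now_ms`, `server_heartbeat` and `server_seems_dead` are hypothetical names, and a process-local atomic stands in for the time stamp that the PR keeps in shared memory.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for the time stamp slot. In the PR this lives in
// shared memory; a plain atomic keeps the sketch self-contained.
struct heartbeat_slot {
    std::atomic<int64_t> last_beat_ms{0};
};

// Current time in milliseconds on a monotonic clock (assumption: both sides
// read the same clock, which is what makes the comparison meaningful).
inline int64_t now_ms() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(
               steady_clock::now().time_since_epoch())
        .count();
}

// Server side: refresh the time stamp every 50 ms until asked to stop.
inline void server_heartbeat(heartbeat_slot &slot,
                             const std::atomic<bool> &stop) {
    while (!stop.load()) {
        slot.last_beat_ms.store(now_ms());
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }
}

// Agent side: the server is considered dead once the time stamp has not
// been updated for more than timeout_ms (150 ms in the description above).
inline bool server_seems_dead(const heartbeat_slot &slot, int64_t now,
                              int64_t timeout_ms = 150) {
    assert(timeout_ms > 0);
    return now - slot.last_beat_ms.load() > timeout_ms;
}
```

With a 50 ms update interval and a 150 ms timeout, the server can miss up to two consecutive updates before the agent gives up, which leaves some slack for scheduling delays.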

Demo

Term1

root@mnfe-pve:~/bpftime/example/malloc# bpftime load ./malloc
[2024-05-01 18:53:44.152] [info] [syscall_context.hpp:86] manager constructed
libbpf: loading object 'malloc_bpf' from buffer
libbpf: elf: section(2) .symtab, size 192, link 1, flags 0, type=2
libbpf: elf: section(3) uprobe/libc.so.6:malloc, size 440, link 0, flags 6, type=1
libbpf: sec 'uprobe/libc.so.6:malloc': found program 'do_count' at insn offset 0 (0 bytes), code size 55 insns (440 bytes)
libbpf: elf: section(4) .rodata.str1.1, size 27, link 0, flags 32, type=1
libbpf: elf: section(5) .maps, size 32, link 0, flags 3, type=1
libbpf: elf: section(6) license, size 4, link 0, flags 3, type=1
libbpf: license of malloc_bpf is GPL
libbpf: elf: section(7) .reluprobe/libc.so.6:malloc, size 64, link 2, flags 40, type=9
libbpf: elf: section(8) .BTF, size 1434, link 0, flags 0, type=1
libbpf: elf: section(9) .BTF.ext, size 384, link 0, flags 0, type=1
libbpf: looking for externs among 8 symbols...
libbpf: collected 0 externs total
libbpf: map 'libc_malloc_calls_total': at sec_idx 5, offset 0.
libbpf: map 'libc_malloc_calls_total': found type = 1.
libbpf: map 'libc_malloc_calls_total': found key [8], sz = 4.
libbpf: map 'libc_malloc_calls_total': found value [12], sz = 8.
libbpf: map 'libc_malloc_calls_total': found max_entries = 1024.
libbpf: map '.rodata.str1.1' (global data): at sec_idx 4, offset 0, flags 80.
[2024-05-01 18:53:44.156] [info] [syscall_server_utils.cpp:24] Initialize syscall server
[2024-05-01 18:53:44][error][1637850] pkey_alloc failed
[2024-05-01 18:53:44][info][1637850] Global shm constructed. shm_open_type 0 for bpftime_maps_shm
[2024-05-01 18:53:44][info][1637850] Global shm initialized
[2024-05-01 18:53:44][info][1637850] Enabling helper groups ufunc, kernel, shm_map by default
[2024-05-01 18:53:44][info][1637850] bpftime-syscall-server started
[2024-05-01 18:53:44][info][1637851] Server side watchdog started
libbpf: map 1 is ".rodata.str1.1"
libbpf: sec '.reluprobe/libc.so.6:malloc': collecting relocation for section(3) 'uprobe/libc.so.6:malloc'
libbpf: sec '.reluprobe/libc.so.6:malloc': relo #0: insn #24 against 'libc_malloc_calls_total'
libbpf: prog 'do_count': found map 0 (libc_malloc_calls_total, sec 5, off 0) for insn #24
libbpf: sec '.reluprobe/libc.so.6:malloc': relo #1: insn #32 against 'libc_malloc_calls_total'
libbpf: prog 'do_count': found map 0 (libc_malloc_calls_total, sec 5, off 0) for insn #32
libbpf: sec '.reluprobe/libc.so.6:malloc': relo #2: insn #37 against 'libc_malloc_calls_total'
libbpf: prog 'do_count': found map 0 (libc_malloc_calls_total, sec 5, off 0) for insn #37
libbpf: sec '.reluprobe/libc.so.6:malloc': relo #3: insn #49 against 'libc_malloc_calls_total'
libbpf: prog 'do_count': found map 0 (libc_malloc_calls_total, sec 5, off 0) for insn #49
libbpf: map 'libc_malloc_calls_total': created successfully, fd=4
libbpf: map '.rodata.str1.1': created successfully, fd=5
libbpf: resolved 'libc.so.6' to '/lib/x86_64-linux-gnu/libc.so.6'
libbpf: elf: symbol address match for 'malloc' in '/lib/x86_64-linux-gnu/libc.so.6': 0x98860
[2024-05-01 18:53:44][info][1637850] Created uprobe/uretprobe perf event handler, module name /lib/x86_64-linux-gnu/libc.so.6, offset 98860
18:53:45 
18:53:46 

Term2

root@mnfe-pve:~/bpftime/example/malloc# bpftime start ./victim
[2024-05-01 18:54:37.575] [info] [agent.cpp:75] Entering bpftime agent
[2024-05-01 18:54:37.576] [error] [bpftime_shm_internal.cpp:669] pkey_alloc failed
[2024-05-01 18:54:37.576] [info] [bpftime_shm_internal.cpp:687] Global shm constructed. shm_open_type 1 for bpftime_maps_shm
[2024-05-01 18:54:37.576] [info] [bpftime_shm_internal.cpp:38] Global shm initialized
[2024-05-01 18:54:37.577] [info] [bpftime_shm_internal.cpp:833] Agent side watchdog started
[2024-05-01 18:54:37.578] [info] [bpf_attach_ctx.cpp:171] Register attach-impl defined helper bpf_get_func_arg, index 183
[2024-05-01 18:54:37.578] [info] [bpf_attach_ctx.cpp:171] Register attach-impl defined helper bpf_get_func_ret_id, index 184
[2024-05-01 18:54:37.578] [info] [bpf_attach_ctx.cpp:171] Register attach-impl defined helper bpf_get_retval, index 186
[2024-05-01 18:54:37.578] [info] [agent.cpp:162] Initializing agent..
[2024-05-01 18:54:37][info][1638300] Initializing llvm
[2024-05-01 18:54:37][warning][1638300] Not implemented yet: toggle_bounds_check
[2024-05-01 18:54:37][info][1638300] Executable path: /root/bpftime/example/malloc/victim
malloc called from pid 1638300
malloc called from pid 1638300
[2024-05-01 18:54:37][info][1638300] Attach successfully
malloc called from pid 1638300
continue malloc...
malloc called from pid 1638300

Then stop the syscall server (Ctrl+C) and look at the agent's output:

continue malloc...
malloc called from pid 1638300
[2024-05-01 18:54:39][error][1638300] Expected fd 4 to be a map fd (map_ptr_by_fd)
[2024-05-01 18:54:39][error][1638300] Expected fd 4 to be a map fd (map_ptr_by_fd)
[2024-05-01 18:54:39][error][1638300] Expected fd 4 to be a map fd (map_ptr_by_fd)
continue malloc...
[2024-05-01 18:54:39][warning][1638301] Syscall server seems to be dead, agent will exit now
malloc called from pid 1638300
[2024-05-01 18:54:39][error][1638301] Expected fd 4 to be a map fd (map_ptr_by_fd)
[2024-05-01 18:54:39][error][1638301] Expected fd 4 to be a map fd (map_ptr_by_fd)
[2024-05-01 18:54:39][error][1638301] Expected fd 4 to be a map fd (map_ptr_by_fd)
[2024-05-01 18:54:39][info][1638301] Agent side watchdog exited
continue malloc...
continue malloc...
continue malloc

@Officeyutong Officeyutong requested a review from yunwei37 May 1, 2024 05:23
@Officeyutong Officeyutong marked this pull request as ready for review May 1, 2024 05:23
@yunwei37
Member

yunwei37 commented May 1, 2024

I think we might need some further discussion on this design.

For example, what happens if the agent or server is running a fork()?

@Officeyutong
Contributor Author

I think we might need some further discussion on this design.

For example, what happens if the agent or server is running a fork()?

The new process will only contain the thread that called fork(), so the watchdog thread will not exist in the child.

@Officeyutong
Contributor Author

I think we might need some further discussion on this design.

For example, what happens if the agent or server is running a fork()?

We may need to discuss how we should handle forked processes.
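As input for that discussion, here is a rough sketch of one possible direction, not something this PR implements: a `pthread_atfork` child handler that records that the watchdog thread is gone in the forked child, so the child can decide whether to start a new one. All names (`watchdog_running`, `start_watchdog`, `on_fork_child`, `child_sees_watchdog_reset`) are hypothetical, and the watchdog body is a placeholder loop.

```cpp
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical bookkeeping flag: true while this process has a watchdog thread.
static std::atomic<bool> watchdog_running{false};

// Hypothetical watchdog starter; the loop stands in for the real alive check.
static void start_watchdog() {
    watchdog_running.store(true);
    std::thread([] {
        for (;;)
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }).detach();
}

// Runs in the child right after fork(). Only the forking thread survives,
// so the inherited flag is stale and gets cleared here. Actually restarting
// the thread is left to later code in the child.
static void on_fork_child() { watchdog_running.store(false); }

// Forks once and reports whether the child observed the cleared flag.
bool child_sees_watchdog_reset() {
    static bool registered = false;
    if (!registered) {
        pthread_atfork(nullptr, nullptr, on_fork_child);
        registered = true;
    }
    if (!watchdog_running.load())
        start_watchdog();

    pid_t pid = fork();
    if (pid < 0)
        return false;
    if (pid == 0) {
        // Child: exit status 0 means the handler cleared the flag.
        _exit(watchdog_running.load() ? 1 : 0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The handler only clears a flag because the child of a multithreaded process is limited in what it can safely do until it has re-established its own state; whether the child should start its own watchdog at all is exactly the open design question here.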

@yunwei37
Member

Can we reopen this?

Successfully merging this pull request may close these issues.

[BUG] When bpftime load is interrupted or killed, the target process's injected code should be removed