[RFC] refactor how tool is written #45

wangbj · 2019-05-21T21:50:35Z

systrace allows loading a tool shared library (the tool) with the --tool
switch. A tool essentially implements the captured_syscall C API: after
systrace successfully patches a syscall site, it generates a trampoline that
jumps to captured_syscall, so that we can intercept the original syscall.

The tool is loaded by systrace using LD_PRELOAD, hence it is not usable until
the preload has finished. There are already 20+ syscalls made by ld-linux.so
before that point, and they are not catchable by the tool. For now this is a
hard limitation; however, we can still catch them with SECCOMP.

Once the tool is (LD_PRE)loaded, systrace tries to patch every syscall site
using predefined rules (in src/bpf.c). Please note that we only apply patching
when the syscall and the instructions following it match one of our predefined
patterns; if there is no pattern match, patching does not occur. This makes
writing interception code cumbersome, because not every syscall can be routed
into the captured_syscall function call in the tracee's memory space. The plan
is that when this happens, we use the ptrace SECCOMP stop to inject a call to
captured_syscall, forcing the tracee to make this very function call. It is
relatively easy to inject real syscalls, and we've done that many times in the
past. However, captured_syscall is a regular C-ABI function (written in Rust)
and may use MMX/SSE registers, so injecting it from the tracer is more
difficult; nonetheless, it should be possible with proper xsave/xrstor
instructions.

In the future, we might install a second seccomp rule in the tool's init
function, so that we can either patch the syscall in the tracee's memory space
or intercept it in a SIGSYS signal handler. This also has risks: decoding the
ucontext in the signal handler seems complicated, and redirecting control flow
within the same task seems more difficult than with ptrace.

The tool library runs in the tracee's memory space; however, because we
intercept raw syscalls, we must be very careful to avoid deadlocks. For
example, doing allocations could be dangerous, and drop (inserted implicitly by
Rust) could be dangerous as well, because it may call pthread_xxx, which may in
turn make a futex syscall. Even when there is no deadlock, the extra syscalls
can degrade performance. Thus the tool must be written under very strong
constraints. We also have a choice between std and no_std. Using no_std allows
the tool to have no dependencies on any external library (including libc);
because of that, we can rewrite the seccomp filters to allow all syscalls made
from inside the tool's memory range (found by checking procfs). However, the
no_std variant is a lot more difficult to write, less documented, and has
fewer libraries and features.
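To make the implicit-drop hazard concrete, here is a minimal Rust sketch. The `Guard` type and the `DROPS` counter are hypothetical stand-ins for a value whose destructor would end up calling into libc (and possibly futex). The destructor runs at the end of scope without any visible call, and wrapping the value in `std::mem::ManuallyDrop` is one way to stop the compiler from inserting it.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicU32, Ordering};

// Counts destructor runs; stands in for a hidden pthread/futex call.
static DROPS: AtomicU32 = AtomicU32::new(0);

// Hypothetical type whose destructor would call into libc in a real tool.
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::Relaxed);
    }
}

fn implicit_drop() {
    let _g = Guard;
    // No call is written here, yet Guard::drop runs when `_g` leaves scope.
}

fn suppressed_drop() {
    let _g = ManuallyDrop::new(Guard);
    // ManuallyDrop stops the compiler from inserting the drop call.
}

fn drop_count() -> u32 {
    DROPS.load(Ordering::Relaxed)
}
```

Suppressing drop this way leaks whatever the value owns, so it only fits values whose cleanup can be skipped or done explicitly at a known-safe point.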

After several discussions, our captured_syscall could look like:

pub extern "C" fn captured_syscall(
    p: &mut ProcessState,
    t: &mut ThreadState,
    a: &Args);

ProcessState holds resources shared among threads, such as file descriptors,
signal handlers, etc., while ThreadState holds resources local to each thread.
The hard part is that our trampoline, like a regular syscall, doesn't know
anything except the syscall number and its six arguments. We could allocate
ProcessState during the ptrace exec event, and allocate ThreadState both in the
exec event and in fork/vfork/clone events. However, because the heap belongs to
the tracee only, it could be quite difficult to prepare those data structures
in the tracer, even with the help of Serialize/Deserialize. It might be
possible to abuse function-call injection once again, or we could rewrite all
tracees' global allocators, forcing them to use the same heap preallocated by
the tracer. This isn't any easier by any means, e.g.: the tracer would need to
expose some APIs to claim/reclaim memory to the tracees, so that tracees could
use the exposed API to implement their own GlobalAlloc. It also seems very
unsafe, because every tracee would have access to the global heap shared among
the tracer and all tracees.


gatoWololo · 2019-05-21T22:31:14Z

please note we only apply patching when the syscall and following instructions match our predefined pattern, hence, if there's no pattern match, patching would not occur

To clarify, by pattern you mean instruction patterns that can be easily patched, right?

This makes write interception code cumbersome, because not all syscalls are catchable into captured_syscall function call in tracee's memory space

Because they were not patched, instead they were caught by SECCOMP which traps on a ptrace tracer?

however captured_syscall is a regular C function (written in rust), and it could use mmx/sse registers

You're worried about these registers being clobbered here, since classically we only save/restore the more common CPU registers.

allocations could be dangrous, drop (inserted by rust) could be dangerous as well, because it may call pthread_xxx, which then may call futex syscall.

So we're worried about Rust standard library doing system calls as part of the work.

however, no_std variant is a lot more difficult to write, less documented, and have less libraries and features

We would basically have to roll our own data structures and make system calls ourselves. Granted, this would be no different had we done it in C, right? Assuming we don't need anything too fancy, we could insert our own mini-libc with just the functionality we need. Write it once and use it everywhere? While technically unsafe, we could wrap our functions and data structures in safe interfaces.
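As a rough illustration of making system calls ourselves, a raw syscall wrapper needs nothing beyond `core::arch::asm!`, so it also works under no_std. This sketch assumes x86_64 Linux; `raw_syscall3` and `raw_write` are hypothetical helper names, not an existing systrace API.

```rust
use core::arch::asm;

/// Raw 3-argument syscall (x86_64 Linux ABI), no libc involved.
/// Returns the kernel's result: >= 0 on success, -errno on failure.
///
/// # Safety
/// The caller must pass arguments that are valid for syscall `nr`.
#[inline(always)]
pub unsafe fn raw_syscall3(nr: i64, a1: u64, a2: u64, a3: u64) -> i64 {
    let ret: i64;
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") nr => ret,
            in("rdi") a1,
            in("rsi") a2,
            in("rdx") a3,
            // The `syscall` instruction clobbers rcx (return rip) and r11 (rflags).
            lateout("rcx") _,
            lateout("r11") _,
            options(nostack),
        );
    }
    ret
}

/// write(2) built on the raw wrapper; 1 is the x86_64 syscall number for write.
pub fn raw_write(fd: u64, buf: &[u8]) -> i64 {
    const SYS_WRITE: i64 = 1;
    // SAFETY: buf.as_ptr() is valid for buf.len() bytes.
    unsafe { raw_syscall3(SYS_WRITE, fd, buf.as_ptr() as u64, buf.len() as u64) }
}
```

Safe wrappers like `raw_write` are the kind of interface the mini-libc idea would expose, with the `unsafe` confined to one place.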

or we could rewrite all tracees' global allocator, forcing them use the same heap preallocated by the tracer

I prefer the approach of avoiding the Rust stdlib altogether and managing data structures and memory by hand.
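One way to hand-manage memory without any syscalls is a bump arena over a fixed buffer. The sketch below (`Arena` is a hypothetical name) uses only `core` types. It is not thread-safe (`Cell` makes it `!Sync`), so a real tool would need one arena per thread or an atomic cursor, and the arena must not be moved once pointers have been handed out.

```rust
use core::cell::{Cell, UnsafeCell};

/// Hypothetical bump arena over a fixed buffer: no syscalls, no libc, no free().
pub struct Arena<const N: usize> {
    buf: UnsafeCell<[u8; N]>,
    next: Cell<usize>, // offset of the first free byte
}

impl<const N: usize> Arena<N> {
    pub const fn new() -> Self {
        Arena { buf: UnsafeCell::new([0; N]), next: Cell::new(0) }
    }

    /// Returns a pointer to `size` bytes aligned to `align`, or None when the buffer is full.
    pub fn alloc(&self, size: usize, align: usize) -> Option<*mut u8> {
        assert!(align.is_power_of_two());
        let base = self.buf.get() as *mut u8 as usize;
        let start = (base + self.next.get() + align - 1) & !(align - 1);
        let end = start.checked_add(size)?;
        if end > base + N {
            return None;
        }
        self.next.set(end - base);
        // Derive the result from the buffer pointer; start - base is within bounds.
        Some(unsafe { (self.buf.get() as *mut u8).add(start - base) })
    }

    /// Releases everything at once.
    pub fn reset(&self) {
        self.next.set(0);
    }

    pub fn used(&self) -> usize {
        self.next.get()
    }
}
```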

wangbj · 2019-05-21T22:50:03Z

To clarify, by pattern you mean instruction patterns that can be easily patched right?
Yes, most syscalls have similar patterns, such as:

0f 05                   syscall
48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
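A hypothetical matcher for the two-instruction pattern above (the 8 bytes `0f 05 48 3d 00 f0 ff ff`) could look like the Rust sketch below. It is only an illustration; systrace's actual predefined rules may cover several such patterns.

```rust
/// `syscall` (0f 05) followed by `cmp $0xfffffffffffff000,%rax` (48 3d 00 f0 ff ff).
const SYSCALL_CMP_PATTERN: [u8; 8] = [0x0f, 0x05, 0x48, 0x3d, 0x00, 0xf0, 0xff, 0xff];

/// Returns the offsets in `text` where the pattern starts, i.e. candidate patch sites.
pub fn find_patchable_sites(text: &[u8]) -> Vec<usize> {
    text.windows(SYSCALL_CMP_PATTERN.len())
        .enumerate()
        .filter(|(_, w)| **w == SYSCALL_CMP_PATTERN[..])
        .map(|(i, _)| i)
        .collect()
}
```

A `syscall` whose following bytes differ is simply not reported, which corresponds to the "no pattern match, no patching" case that falls back to SECCOMP.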

Because they were not patched, instead they were caught by SECCOMP which traps on a ptrace tracer?
Right

You're worried about these registers being clovered here. Since classically we only save/restore the more common CPU registers.
Yes, for syscalls basically we only have to:

push the parameters and return address onto the tracee's stack
save the caller-saved registers
set the syscall registers (rax + 6 args)
syscall
restore the caller-saved registers
adjust sp and do a retq
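The register-level part of these steps can be sketched as below, assuming a hypothetical `Regs` struct that mirrors the relevant fields of x86_64 `user_regs_struct`. The actual PTRACE_GETREGS/PTRACE_SETREGS calls, the stack pushes and the stop handling are left out.

```rust
/// Hypothetical subset of x86_64 `user_regs_struct` (field names as in <sys/user.h>),
/// i.e. what PTRACE_GETREGS/PTRACE_SETREGS would read and write.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Regs {
    pub rax: u64,
    pub rdi: u64,
    pub rsi: u64,
    pub rdx: u64,
    pub r10: u64,
    pub r8: u64,
    pub r9: u64,
    pub rcx: u64,
    pub r11: u64,
    pub rip: u64,
    pub rsp: u64,
}

/// Builds the register set for an injected syscall and keeps a copy to restore later.
pub fn prepare_syscall(cur: &Regs, nr: u64, args: [u64; 6]) -> (Regs, Regs) {
    let saved = *cur; // "save the caller-saved registers"
    let mut r = *cur;
    r.rax = nr; // syscall number
    // x86_64 Linux syscall ABI: arguments in rdi, rsi, rdx, r10, r8, r9.
    // The 4th goes in r10 (not rcx) because `syscall` overwrites rcx and r11.
    r.rdi = args[0];
    r.rsi = args[1];
    r.rdx = args[2];
    r.r10 = args[3];
    r.r8 = args[4];
    r.r9 = args[5];
    (r, saved)
}

/// After the injected syscall completes: the result is in rax (negative errno on failure),
/// and the tracee's original registers are put back.
pub fn finish_syscall(after: &Regs, saved: &Regs) -> (i64, Regs) {
    (after.rax as i64, *saved)
}
```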

Of course, if we have ptrace stops or can use a breakpoint instruction, it would be even easier. For regular function calls, in addition to saving the caller-saved registers (rax/rdi/rsi/rdx/rcx/r8/r9/r10/r11), we also have to save the FP registers and the xmm/ymm registers; there are instructions like xsave/xrstor for this, so it should be possible.
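Before injecting such a call, the tracer has to reserve room on the tracee's stack for the extended state. This sketch (the function names are hypothetical) queries the required XSAVE area size through CPUID leaf 0xD, sub-leaf 0, and rounds the reservation to the 64-byte alignment that XSAVE/XRSTOR require. Emitting the actual xsave/xrstor instructions into the injected stub is not shown.

```rust
/// Bytes needed by XSAVE for the state components currently enabled in XCR0,
/// read from CPUID leaf 0xD, sub-leaf 0, register EBX. None if XSAVE is unsupported.
#[cfg(target_arch = "x86_64")]
pub fn xsave_area_size() -> Option<u32> {
    if !is_x86_feature_detected!("xsave") {
        return None;
    }
    // SAFETY: CPUID exists on every x86_64 CPU, and leaf 0xD is valid when XSAVE is supported.
    #[allow(unused_unsafe)]
    let r = unsafe { core::arch::x86_64::__cpuid_count(0xD, 0) };
    Some(r.ebx)
}

#[cfg(not(target_arch = "x86_64"))]
pub fn xsave_area_size() -> Option<u32> {
    None
}

/// XSAVE/XRSTOR require a 64-byte aligned save area, so round a stack reservation up to 64.
pub fn align64(n: usize) -> usize {
    (n + 63) & !63
}
```

The minimum meaningful size is 576 bytes (512-byte legacy region plus 64-byte header); AVX and later components make it larger.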

So we're worried about Rust standard library doing system calls as part of the work.
Yes, Rust makes that quite implicit (even more so than C++), so we need to be careful.

We would basically have to roll out our own data structures and call system calls ourselves.
Granted this would be no different had we done it in C right? Assuming we don't need anything
too fancy, we could insert our own mini-libc or functionality that we need. Write it once and
use it everywhere? While technically unsafe, we could wrap our functions and data structures
in safe interfaces.

Right. With C we actually have more direct control over how the tool is linked; with Rust it is harder. For instance, with C we can build libc.a from musl-libc, link our tool statically against libc.a, and then use objcopy -G<symbol_a> -G<symbol_b> ... to control symbol visibility. With Rust, I've found no_std is the only way to achieve that so far. Rust does have a musl target, but it doesn't work well with cdylib, at least not with +crt-static.

I think using no_std is the better choice too, although, as mentioned, it has its own downsides.

rrnewton · 2019-05-29T20:16:28Z

forcing them use the same heap preallocated by the trace

Are you referring here to the "shared global memory" option (rather than the message-passing/RPC approach to globalState)? We have a complicated decision tree of possible futures we're considering, so good to clarify which branch we're on ;-).

rrnewton · 2019-05-29T20:32:01Z

because of that, we can rewrite the seccomp filters, allowing all syscalls inside tool memory range (by checking procfs)

Why is this additional "whitelisting" approach specific to no_std only? Even if you have a tool/plugin that uses full featured libc + Rust stdlib, as long as everything is statically linked, couldn't you in principle whitelist all code inside that tool?

wangbj · 2019-05-30T03:56:55Z

The prerequisite is making sure the tool shared library is a standalone library that doesn't link against any other library, so that everything is self-contained. If that guarantee holds, then we know all of its syscall instructions are self-contained as well, so we can create a filter that whitelists all syscalls made from within the tool.

It would not work if the tool linked against an external library such as glibc, because when the tool calls read@glibc, the syscall would escape the whitelist, and we're not whitelisting glibc's syscalls.
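A sketch of such a whitelisting filter, written as classic BPF built from Rust: allow any syscall whose `instruction_pointer` lies inside the tool's mapping and return `SECCOMP_RET_TRACE` for everything else. Opcode and seccomp constants follow the Linux UAPI headers, and `SockFilter` mirrors `struct sock_filter`. It assumes, hypothetically, that the tool's text range (e.g. read from procfs) does not straddle a 4 GiB boundary, so the 64-bit range check reduces to 32-bit BPF compares. Installing the filter is left out; a tiny interpreter is included so the logic can be checked.

```rust
// Classic BPF opcode parts, values from <linux/bpf_common.h>.
const BPF_LD: u16 = 0x00;
const BPF_W: u16 = 0x00;
const BPF_ABS: u16 = 0x20;
const BPF_JMP: u16 = 0x05;
const BPF_JEQ: u16 = 0x10;
const BPF_JGE: u16 = 0x30;
const BPF_K: u16 = 0x00;
const BPF_RET: u16 = 0x06;
const LD_W_ABS: u16 = BPF_LD | BPF_W | BPF_ABS;
const JEQ_K: u16 = BPF_JMP | BPF_JEQ | BPF_K;
const JGE_K: u16 = BPF_JMP | BPF_JGE | BPF_K;
const RET_K: u16 = BPF_RET | BPF_K;

// From <linux/seccomp.h> and <linux/audit.h>.
pub const SECCOMP_RET_ALLOW: u32 = 0x7fff_0000;
pub const SECCOMP_RET_TRACE: u32 = 0x7ff0_0000;
pub const AUDIT_ARCH_X86_64: u32 = 0xc000_003e;

// Offsets in struct seccomp_data { int nr; __u32 arch; __u64 instruction_pointer; __u64 args[6]; }
const OFF_ARCH: u32 = 4;
const OFF_IP_LO: u32 = 8; // little-endian: low 32 bits come first
const OFF_IP_HI: u32 = 12;

/// Same layout as `struct sock_filter` in <linux/filter.h>.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SockFilter {
    pub code: u16,
    pub jt: u8,
    pub jf: u8,
    pub k: u32,
}

const fn stmt(code: u16, k: u32) -> SockFilter {
    SockFilter { code, jt: 0, jf: 0, k }
}

const fn jump(code: u16, k: u32, jt: u8, jf: u8) -> SockFilter {
    SockFilter { code, jt, jf, k }
}

/// ALLOW syscalls issued from [start, end), TRACE everything else.
/// Jump offsets are relative to the next instruction; [n] marks instruction indices.
pub fn tool_whitelist_filter(start: u64, end: u64) -> Option<Vec<SockFilter>> {
    if start >= end || start >> 32 != end >> 32 {
        return None; // range crosses a 4 GiB boundary: not handled by this sketch
    }
    Some(vec![
        stmt(LD_W_ABS, OFF_ARCH),                // [0] A = arch
        jump(JEQ_K, AUDIT_ARCH_X86_64, 0, 6),    // [1] other arch -> [8]
        stmt(LD_W_ABS, OFF_IP_HI),               // [2] A = ip >> 32
        jump(JEQ_K, (start >> 32) as u32, 0, 4), // [3] other 4 GiB region -> [8]
        stmt(LD_W_ABS, OFF_IP_LO),               // [4] A = low 32 bits of ip
        jump(JGE_K, start as u32, 0, 2),         // [5] ip < start -> [8]
        jump(JGE_K, end as u32, 1, 0),           // [6] ip >= end -> [8]
        stmt(RET_K, SECCOMP_RET_ALLOW),          // [7] inside the tool
        stmt(RET_K, SECCOMP_RET_TRACE),          // [8] hand it to the tracer
    ])
}

/// Tiny interpreter for the opcodes above, to check the filter without installing it.
pub fn run_filter(prog: &[SockFilter], data: &[u8; 64]) -> u32 {
    let (mut acc, mut pc) = (0u32, 0usize);
    loop {
        let ins = prog[pc];
        pc += 1;
        match ins.code {
            LD_W_ABS => {
                let k = ins.k as usize;
                acc = u32::from_le_bytes([data[k], data[k + 1], data[k + 2], data[k + 3]]);
            }
            JEQ_K | JGE_K => {
                let taken = if ins.code == JEQ_K { acc == ins.k } else { acc >= ins.k };
                let off = if taken { ins.jt } else { ins.jf };
                pc += off as usize;
            }
            RET_K => return ins.k,
            other => panic!("unsupported opcode {:#x}", other),
        }
    }
}

/// Packs a seccomp_data as the kernel lays it out on x86_64.
pub fn seccomp_data(nr: i32, arch: u32, ip: u64) -> [u8; 64] {
    let mut d = [0u8; 64];
    d[0..4].copy_from_slice(&nr.to_le_bytes());
    d[4..8].copy_from_slice(&arch.to_le_bytes());
    d[8..16].copy_from_slice(&ip.to_le_bytes());
    d
}
```

Because the kernel evaluates every installed seccomp filter and uses the most restrictive result, a filter like this would have to replace systrace's existing filter rather than be stacked on top of it, which matches the "rewrite the seccomp filters" wording in the original post.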

rrnewton mentioned this issue May 29, 2019

Link-method / format for Rust plugins that statically link libc #48

Closed
