Commit
seccomp-bpf: Render syscall rules after binary search tree traversal code.

Structure before:

```
if sysno < current node value: goto left_node
if sysno > current node value: goto right_node
// Render rules for current sysno here...
left_node:
  // Recursively render left node code here...
right_node:
  // Recursively render right node code here...
```

This is fine, but if the "render rules for current sysno" part is large enough for any syscall in the BST, the jumps to `left_node` and `right_node` of that node (and of all its ancestor nodes) no longer fit in a conditional jump offset and have to go through unconditional jumps (i.e. extra instructions) during BST traversal.

Since BST traversal must be fast, it is better to keep all the rules for each syscall separate from the BST traversal code. This ensures that the BST traversal code all fits within the maximum conditional jump offset (255 instructions), and then we do just one possibly-unconditional jump to the set of rules for that syscall. Compare that to the previous structure, where multiple jumps during BST traversal could have been unconditional jumps.

Structure after:

```
if sysno < current node value: goto left_node
if sysno > current node value: goto right_node
goto sysno_rules
left_node:
  // Recursively render left node traversal code here...
right_node:
  // Recursively render right node traversal code here...
sysno_rules:
  // Render rules for current sysno here...
  // Recursively render left node syscall rules code here...
  // Recursively render right node syscall rules code here...
```

With all the optimizations done in previous CLs, BSTs in practice are small enough that the traversal and the syscall rules together fit under 255 instructions, so this only rarely comes into play. However, as we add more syscalls and syscall rules, the effect of this optimization should increase.
The new layout actually results in slightly larger bytecode (because most syscall filter rules do fit in just a few instructions, but now they have to be grouped at the end, which takes a bigger jump to reach), but *execution* is still faster, because only one unconditional jump is ever done per program execution.

As expected, benchmarks don't show much change, except KVM, which is quite happy for some reason:

```
                          │    before    │                after                │
                          │    sec/op    │    sec/op      vs base              │
SentrySystrap/Postgres-48   51.43n ± 20%   50.50n ± 14%        ~ (p=0.394 n=93+92)
SentryKVM/Postgres-48       48.15n ± 10%   40.19n ± 14%  -16.53% (p=0.012 n=96+99)
NVProxyIoctl/nvproxy-48     65.75n ±  2%   65.43n ±  2%        ~ (p=0.703 n=99+98)
geomean                     57.84n         57.69n         -0.49%
```

(Most other benchmarks, including `futex` and such, show no change, which makes sense because they're part of the "hot" syscalls which aren't in a BST at all.)

PiperOrigin-RevId: 595816325