Add network activity tracking to nono learn#136
Add network activity tracking to nono learn#136EItanya wants to merge 1 commit intoalways-further:mainfrom
Conversation
Extend the learn command to capture outbound connections and listening ports alongside filesystem accesses. Network activity is discovered by tracing connect, bind, and sendto syscalls via strace. Hostname resolution uses a three-tier strategy: 1. Timing correlation — attaches the hostname from the preceding DNS query directly to each connect() call, handling DNS round-robin 2. Forward DNS — resolves captured hostnames to build IP→hostname map 3. Reverse DNS — fallback for IPs with no other mapping Supports both direct DNS (sendto to port 53) and systemd-resolved (Varlink JSON protocol over Unix socket) for hostname capture. Adds --no-rdns flag to skip all DNS resolution. Output includes network sections in both summary and JSON formats, with a hint suggesting --net-block when network activity is detected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
Summary of ChangesHello @EItanya, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Putting this up as a draft because I still want to validate the changes, and review the code more, but so far looking great on my end in terms of functionality. |
There was a problem hiding this comment.
Code Review
The pull request successfully extends the learn command to track network activity, providing a more complete picture of a process's requirements for sandbox profile generation. The multi-tier hostname resolution strategy is well-conceived. However, there are a few areas where the implementation could be more robust, particularly regarding multi-threaded trace accuracy and the fragility of string-based JSON parsing.
| // When a DNS query precedes a connect(), we attach the hostname directly | ||
| // to the NetworkAccess. This handles DNS round-robin (where forward DNS | ||
| // after the fact may resolve to different IPs than the traced program got). | ||
| let mut last_queried_hostname: Option<String> = None; |
There was a problem hiding this comment.
The last_queried_hostname state is shared across all traced processes and threads. Since strace -f interleaves output from multiple PIDs, a DNS query from one thread can incorrectly be associated with a connect call from another. Hostnames should be tracked per PID using a HashMap<u32, String> to ensure accurate attribution in multi-threaded or multi-process applications.
| let unescaped = unescape_strace_string(&buf_str); | ||
|
|
||
| // Extract hostname from "name":"HOSTNAME" in the unescaped JSON | ||
| let name_str = extract_between(&unescaped, "\"name\":\"", "\"")?; |
There was a problem hiding this comment.
Using extract_between to parse JSON from the systemd-resolved Varlink protocol is fragile. It may fail if the JSON structure has different whitespace or if values contain escaped characters. Since serde_json is already a dependency, it should be used to parse the unescaped buffer properly.
| let name_str = extract_between(&unescaped, "\"name\":\"", "\"")?; | |
| let name_str = serde_json::from_str::<serde_json::Value>(&unescaped).ok()?.pointer("/parameters/name")?.as_str()?; |
| "openat,open,access,stat,lstat,readlink,execve,creat,mkdir,rename,unlink,connect,bind,sendto" | ||
| .to_string(), |
There was a problem hiding this comment.
|
I like where this is going, and can see it's got valuable utility - this will fit nicely with the supervisor based proxy , for testing if filtering works! |
Extend the learn command to capture outbound connections and listening ports alongside filesystem accesses. Network activity is discovered by tracing connect, bind, and sendto syscalls via strace.
Hostname resolution uses a three-tier strategy:
Supports both direct DNS (sendto to port 53) and systemd-resolved (Varlink JSON protocol over Unix socket) for hostname capture.
Adds --no-rdns flag to skip all DNS resolution. Output includes network sections in both summary and JSON formats, with a hint suggesting --net-block when network activity is detected.