Add network activity tracking to nono learn #136

Draft
EItanya wants to merge 1 commit into always-further:main from EItanya:learn-network

Conversation

@EItanya
@EItanya EItanya commented Feb 20, 2026

Extend the learn command to capture outbound connections and listening ports alongside filesystem accesses. Network activity is discovered by tracing connect, bind, and sendto syscalls via strace.
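
As a rough illustration of the parsing step, the sketch below pulls the address and port out of an IPv4 `connect`/`bind` line in the format strace prints. The type `TracedSocketCall` and the functions `between` and `parse_inet_call` are hypothetical names for this example, not the PR's actual code, and only `AF_INET` is handled.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// One parsed IPv4 `connect`/`bind` line (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub struct TracedSocketCall {
    pub syscall: String,
    pub addr: IpAddr,
    pub port: u16,
}

/// Return the text between `start` and the next `end`, if both are present.
fn between<'a>(s: &'a str, start: &str, end: &str) -> Option<&'a str> {
    let from = s.find(start)? + start.len();
    let len = s[from..].find(end)?;
    Some(&s[from..from + len])
}

/// Parse lines such as:
/// `[pid  4242] connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("104.18.26.120")}, 16) = 0`
pub fn parse_inet_call(line: &str) -> Option<TracedSocketCall> {
    // Drop the optional "[pid N] " prefix that `strace -f` adds.
    let rest = match line.strip_prefix("[pid ") {
        Some(r) => r.split_once("] ")?.1,
        None => line,
    };
    let (syscall, args) = rest.split_once('(')?;
    if syscall != "connect" && syscall != "bind" {
        return None;
    }
    // The trailing comma keeps AF_INET6 from matching; other families are out of scope here.
    if !args.contains("sa_family=AF_INET,") {
        return None;
    }
    let port: u16 = between(args, "sin_port=htons(", ")")?.parse().ok()?;
    let addr: Ipv4Addr = between(args, "inet_addr(\"", "\")")?.parse().ok()?;
    Some(TracedSocketCall {
        syscall: syscall.to_string(),
        addr: IpAddr::V4(addr),
        port,
    })
}
```

Lines for other syscalls or address families return `None`, so a caller can try this parser alongside the existing file-access parsing.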

Hostname resolution uses a three-tier strategy:

  1. Timing correlation — attaches the hostname from the preceding DNS query directly to each connect() call, handling DNS round-robin
  2. Forward DNS — resolves captured hostnames to build IP→hostname map
  3. Reverse DNS — fallback for IPs with no other mapping
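
The three tiers above can be read as a simple fallback chain. The following is a minimal sketch under that assumption; `resolve_hostname` and its `reverse_lookup` hook are hypothetical names, and the real code may structure this differently.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Pick a display hostname for one connection, trying the three tiers in order.
/// `reverse_lookup` stands in for a real reverse-DNS call (hypothetical hook).
pub fn resolve_hostname<F>(
    correlated: Option<&str>,
    forward_map: &HashMap<IpAddr, String>,
    ip: IpAddr,
    reverse_lookup: F,
) -> Option<String>
where
    F: Fn(IpAddr) -> Option<String>,
{
    // Tier 1: hostname from the DNS query seen just before connect().
    if let Some(name) = correlated {
        return Some(name.to_string());
    }
    // Tier 2: IP -> hostname map built by re-resolving captured hostnames.
    if let Some(name) = forward_map.get(&ip) {
        return Some(name.clone());
    }
    // Tier 3: reverse DNS as a last resort.
    reverse_lookup(ip)
}
```

Passing the reverse lookup as a closure keeps the ordering testable without touching the network.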

Supports both direct DNS (sendto to port 53) and systemd-resolved (Varlink JSON protocol over Unix socket) for hostname capture.
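
For the direct-DNS case, the queried name sits in the question section of the packet passed to sendto. Below is a sketch of reading it from already-unescaped bytes, following the RFC 1035 wire format. `dns_query_name` is a hypothetical name for this example, and it rejects compression pointers instead of following them.

```rust
/// Extract the queried name from a raw DNS query packet (RFC 1035 wire format).
pub fn dns_query_name(packet: &[u8]) -> Option<String> {
    const HEADER_LEN: usize = 12;
    // QDCOUNT lives in header bytes 4..6; require at least one question.
    let qdcount = u16::from_be_bytes([*packet.get(4)?, *packet.get(5)?]);
    if qdcount == 0 {
        return None;
    }
    let mut pos = HEADER_LEN;
    let mut labels: Vec<String> = Vec::new();
    loop {
        let len = *packet.get(pos)? as usize;
        pos += 1;
        if len == 0 {
            break; // the zero-length root label terminates the name
        }
        if len > 63 {
            return None; // compression pointer or invalid label length
        }
        // `get` returns None on truncated packets instead of panicking.
        let label = packet.get(pos..pos + len)?;
        labels.push(String::from_utf8_lossy(label).into_owned());
        pos += len;
    }
    if labels.is_empty() {
        None
    } else {
        Some(labels.join("."))
    }
}
```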

Adds --no-rdns flag to skip all DNS resolution. Output includes network sections in both summary and JSON formats, with a hint suggesting --net-block when network activity is detected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
@gemini-code-assist
Summary of Changes

Hello @EItanya, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the nono learn command by integrating robust network activity monitoring. It allows users to observe and understand the network interactions of a traced program, providing insights into outbound connections and listening services. The sophisticated hostname resolution ensures that network endpoints are presented in a human-readable format, making the learned profiles more comprehensive and actionable for security and policy enforcement.

Highlights

  • Network Activity Tracking: The learn command now captures outbound network connections and listening ports by tracing connect, bind, and sendto syscalls using strace.
  • Hostname Resolution Strategies: Implemented a three-tier strategy for hostname resolution: timing correlation with preceding DNS queries, forward DNS lookups for captured hostnames, and reverse DNS as a fallback for unresolved IPs.
  • DNS Protocol Support: Supports hostname capture from both direct DNS queries (sendto to port 53) and systemd-resolved (Varlink JSON protocol over Unix sockets).
  • New CLI Flag: Added a --no-rdns flag to skip all DNS resolution, providing more control over the learning process.
  • Enhanced Output: Network activity is now included in both the summary and JSON output formats, with a hint suggesting the use of --net-block when network activity is detected.

Changelog

  • Cargo.lock
    • Added dns-lookup crate as a new dependency.
    • Added socket2 crate as a new dependency.
    • Updated windows-sys and windows-targets dependencies to newer versions.
  • crates/nono-cli/Cargo.toml
    • Added dns-lookup dependency specifically for Linux targets.
  • crates/nono-cli/src/cli.rs
    • Introduced a new --no-rdns CLI argument to skip all DNS resolution.
  • crates/nono-cli/src/learn.rs
    • Updated LearnResult to include outbound_connections and listening_ports fields.
    • Added has_network_activity method to LearnResult to check for network events.
    • Modified to_json and to_summary methods to incorporate network activity data.
    • Introduced NetworkAccessKind, NetworkAccess, NetworkEndpoint, NetworkConnectionSummary, and TracedAccess structs/enums for network data representation.
    • Refactored run_strace to capture file accesses, network accesses, and DNS queries, and to use timing-based DNS correlation.
    • Extended parse_strace_line to handle connect, bind, and sendto syscalls, including parsing DNS queries from sendto buffers.
    • Added helper functions (extract_between, parse_network_syscall, parse_dns_sendto, parse_resolved_sendto, extract_sendto_buffer, unescape_strace_bytes, parse_dns_query_hostname) for detailed strace output parsing.
    • Implemented process_network_accesses to aggregate and resolve hostnames for network connections using a multi-tiered approach.
    • Added resolve_forward_dns and resolve_reverse_dns functions for hostname resolution.
    • Included comprehensive unit tests for network parsing, DNS parsing, and output formatting.
  • crates/nono-cli/src/main.rs
    • Updated the learn command's initial message to reflect network activity tracing.
    • Added a post-execution message to suggest --net-block if network activity is detected.
@EItanya
Author

EItanya commented Feb 20, 2026

Putting this up as a draft because I still want to validate the changes and review the code more, but so far the functionality looks great on my end.

./target/debug/nono learn -- curl example.com
WARNING: nono learn runs the command WITHOUT any sandbox restrictions.
The command will have full access to your system to discover required paths.

Continue? [y/N] y

nono learn - Tracing file accesses and network activity...

<!doctype html><html lang="en"><head><title>Example Domain</title><meta name="viewport" content="width=device-width, initial-scale=1"><style>body{background:#eee;width:60vw;margin:15vh auto;font-family:system-ui,sans-serif}h1{font-size:1.5em}div{opacity:0.8}a:link,a:visited{color:#348}</style></head><body><div><h1>Example Domain</h1><p>This domain is for use in documentation examples without needing permission. Avoid use in operations.</p><p><a href="https://iana.org/domains/example">Learn more</a></p></div></body></html>
Read access needed:
  /home/<redacted>
  /home/<redacted>/.config

Outbound connections:
  example.com (104.18.26.120):80 (2x)
  example.com (104.18.27.120):80
  example.com (2606:4700::6812:1a78):80 (2x)
  example.com (2606:4700::6812:1b78):80 (2x)

To use these paths, add them to your profile or use --read/--write/--allow flags.
Network activity detected. Use --net-block to restrict network access.
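
The `(2x)` suffixes in the output above suggest that repeated endpoints are collapsed with a count. One way to do that is sketched here; `summarize_connections` and its tuple input are assumptions made for illustration, not the PR's `NetworkConnectionSummary` type.

```rust
use std::collections::BTreeMap;
use std::net::{IpAddr, SocketAddr};

/// Collapse repeated (host, ip, port) endpoints into summary lines with a repeat count.
pub fn summarize_connections(conns: &[(Option<String>, IpAddr, u16)]) -> Vec<String> {
    // A BTreeMap keeps the output order stable across runs.
    let mut counts: BTreeMap<(Option<String>, IpAddr, u16), usize> = BTreeMap::new();
    for key in conns {
        *counts.entry(key.clone()).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .map(|((host, ip, port), n)| {
            let endpoint = match host {
                Some(h) => format!("{h} ({ip}):{port}"),
                // SocketAddr brackets IPv6 addresses, so the port stays unambiguous.
                None => SocketAddr::new(ip, port).to_string(),
            };
            if n > 1 {
                format!("{endpoint} ({n}x)")
            } else {
                endpoint
            }
        })
        .collect()
}
```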

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request successfully extends the learn command to track network activity, providing a more complete picture of a process's requirements for sandbox profile generation. The multi-tier hostname resolution strategy is well-conceived. However, there are a few areas where the implementation could be more robust, particularly regarding multi-threaded trace accuracy and the fragility of string-based JSON parsing.

// When a DNS query precedes a connect(), we attach the hostname directly
// to the NetworkAccess. This handles DNS round-robin (where forward DNS
// after the fact may resolve to different IPs than the traced program got).
let mut last_queried_hostname: Option<String> = None;

high

The last_queried_hostname state is shared across all traced processes and threads. Since strace -f interleaves output from multiple PIDs, a DNS query from one thread can incorrectly be associated with a connect call from another. Hostnames should be tracked per PID using a HashMap<u32, String> to ensure accurate attribution in multi-threaded or multi-process applications.
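
A minimal sketch of the per-PID tracking described in this comment, assuming the PID has already been parsed from the `[pid N]` prefix. `DnsCorrelator` and its methods are hypothetical names.

```rust
use std::collections::HashMap;

/// Tracks the most recent DNS query per traced PID (hypothetical helper).
#[derive(Default)]
pub struct DnsCorrelator {
    last_query: HashMap<u32, String>,
}

impl DnsCorrelator {
    /// Record that `pid` just asked the resolver for `hostname`.
    pub fn record_query(&mut self, pid: u32, hostname: &str) {
        self.last_query.insert(pid, hostname.to_string());
    }

    /// Hostname to attach to a connect() issued by `pid`, if that PID queried one.
    pub fn hostname_for_connect(&self, pid: u32) -> Option<&str> {
        self.last_query.get(&pid).map(String::as_str)
    }
}
```

One trade-off to keep in mind: strace -f reports each thread under its own ID, so a program that resolves on one thread and connects on another would get no correlated hostname here and would rely on the forward or reverse DNS tiers instead.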

let unescaped = unescape_strace_string(&buf_str);

// Extract hostname from "name":"HOSTNAME" in the unescaped JSON
let name_str = extract_between(&unescaped, "\"name\":\"", "\"")?;

medium

Using extract_between to parse JSON from the systemd-resolved Varlink protocol is fragile. It may fail if the JSON structure has different whitespace or if values contain escaped characters. Since serde_json is already a dependency, it should be used to parse the unescaped buffer properly.

Suggested change
- let name_str = extract_between(&unescaped, "\"name\":\"", "\"")?;
+ let name_str = serde_json::from_str::<serde_json::Value>(&unescaped).ok()?.pointer("/parameters/name")?.as_str()?;
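
To make the fragility concrete, the sketch below assumes `extract_between` is a plain substring search (its real implementation is not shown in this thread) and runs it on illustrative payloads. `name_via_substring` is a hypothetical wrapper that mirrors the quoted call.

```rust
/// Assumed shape of a substring-based helper: text between `start` and the next `end`.
fn extract_between<'a>(s: &'a str, start: &str, end: &str) -> Option<&'a str> {
    let from = s.find(start)? + start.len();
    let len = s[from..].find(end)?;
    Some(&s[from..from + len])
}

/// Look up the hostname the way the quoted code does.
pub fn name_via_substring(json: &str) -> Option<&str> {
    extract_between(json, "\"name\":\"", "\"")
}
```

Under that assumption, compact JSON such as `{"parameters":{"name":"example.com"}}` yields `example.com`, but `{"parameters": {"name": "example.com"}}` yields `None` because of the space after the colon, and a value containing an escaped quote is cut off at the backslash. If the serde_json route is taken, the parsed `Value` likely needs to be bound to a variable, or the string copied with `to_owned()`, because the one-line form borrows a `&str` from a temporary.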

Comment on lines +340 to +341
"openat,open,access,stat,lstat,readlink,execve,creat,mkdir,rename,unlink,connect,bind,sendto"
.to_string(),

medium

The current implementation only traces sendto for DNS queries. Modern runtimes and libraries (e.g., Go or async frameworks) often use sendmsg or sendmmsg for network I/O. Adding these to the traced syscall list and updating the parsing logic would improve the coverage of network activity detection.
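
A small sketch of how the trace list could be extended as suggested, keeping file and network syscalls in separate groups. The constant and function names are hypothetical, and the parsing for `sendmsg`/`sendmmsg` arguments is not covered here.

```rust
/// File-related syscalls from the quoted trace expression.
const FILE_SYSCALLS: &[&str] = &[
    "openat", "open", "access", "stat", "lstat", "readlink", "execve", "creat", "mkdir",
    "rename", "unlink",
];

/// Network syscalls: the three already traced, plus the two the review proposes.
const NET_SYSCALLS: &[&str] = &["connect", "bind", "sendto", "sendmsg", "sendmmsg"];

/// Build the comma-separated syscall list passed to strace.
pub fn trace_expression() -> String {
    FILE_SYSCALLS
        .iter()
        .chain(NET_SYSCALLS)
        .copied()
        .collect::<Vec<_>>()
        .join(",")
}
```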

@lukehinds
Collaborator

I like where this is going, and can see it has valuable utility. This will fit nicely with the supervisor-based proxy, for testing whether filtering works!
