chf3198/crostini-mem-watchdog

🛡️ Mem Watchdog

VS Code OOM protection for ChromeOS Crostini

earlyoom hard-crashes on Crostini (exit 104, every 3 seconds, zero protection). This replaces it with a VS Code-aware watchdog that kills Chrome before the kernel OOM-kills VS Code.


⚡ Quick Install

Via VS Code Marketplace (recommended — auto-installs and manages the daemon):

ext install CurtisFranks.mem-watchdog-status

Or search "Mem Watchdog" in the VS Code Extensions panel. The extension installs the systemd daemon automatically on first activation.

Shell-only (no VS Code required):

git clone https://github.com/chf3198/crostini-mem-watchdog.git
cd crostini-mem-watchdog && bash install.sh

📌 Project Management & Contribution Traceability

This repository uses a public issue-first workflow so visitors can audit all work history end-to-end.

Every pull request must include Closes #N, milestone assignment, and label coverage so contribution lineage is always visible. Release/publish gate: client UAT PASS is mandatory before release. Client involvement is limited to design consultation and UAT.


The Problem

ChromeOS Crostini runs VS Code inside a Debian LXC container. Three independent OOM pathways can kill it:

| # | Pathway | What kills VS Code |
| --- | --- | --- |
| 1 | Container RAM exhausted | Linux kernel OOM killer: SIGKILL on the highest-RSS process (VS Code) |
| 2 | ChromeOS balloon driver shrinks the VM | ChromeOS host memory pressure |
| 3 | V8 heap exhausted at a hard cap | V8 OOM handler: allocation failure in the extension host |

ChromeOS zram swap only covers Pathway #2. The container kernel's OOM killer has no knowledge of host-level swap. free -h showing Swap: 0B inside the container is not cosmetic — it is the kernel's actual memory budget for OOM scoring.

Why earlyoom fails here

The Crostini kernel reports a uint64 overflow sentinel for SwapFree when no swap is configured:

/proc/meminfo → SwapFree: 18446744073709551360 kB   ← 2^64 − 256

earlyoom passes this to C's strtol() → signed overflow → exit code 104 → crash-loop every 3 seconds → zero protection. This is the silent default state on every Crostini machine where earlyoom appears "running."

This watchdog reads only MemAvailable and MemTotal — both correct on this kernel — and never touches SwapFree.
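As a minimal sketch of that approach (not the daemon's actual code), the free-memory percentage can be derived from those two fields alone. The function name `mem_available_pct` and its optional file argument are assumptions added here so the logic can be exercised against a sample file instead of the live `/proc/meminfo`.

```bash
#!/usr/bin/env bash
# Sketch: compute the free-memory percentage from MemAvailable and MemTotal.
# The optional argument (a meminfo-format file) is hypothetical and exists
# only so the function can be tested without the real /proc/meminfo.
mem_available_pct() {
    local meminfo="${1:-/proc/meminfo}"
    # Only two keys are matched, so the SwapFree sentinel is never parsed.
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total <= 0) exit 1     # unreadable or malformed input
            printf "%d\n", avail * 100 / total
        }
    ' "$meminfo"
}
```

A non-zero exit status signals that the file could not be parsed, which lets a caller tell "low memory" apart from "no reading at all".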


What It Does

| Trigger | Action |
| --- | --- |
| MemAvailable ≤ 15% | SIGKILL Chrome / Playwright |
| MemAvailable ≤ 25% | SIGTERM Chrome / Playwright |
| PSI full avg10 > 25% | SIGTERM Chrome (sustained memory stall) |
| VS Code RSS > 3.4 GB | RSS warn path: kill disposable targets or lowest-value VS Code helper if RSS is accelerating |
| VS Code RSS > 3.8 GB | RSS emergency circuit-breaker: critical helper kill, then controlled VS Code main restart if needed |
| RSS velocity spike | Kill lowest-value VS Code helper; never a language server or Extension Host at normal severity |
| Every loop | Set oom_score_adj=0 on VS Code PIDs (counters Electron's 200–300 default) |
| Every loop | Set oom_score_adj=1000 on Chrome PIDs (kernel kills it first) |
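The two MemAvailable rows and the two "every loop" rows can be sketched as follows. This is an illustration under assumptions, not the daemon's implementation: the function names `pick_signal` and `set_oom_score_adj`, the output strings, and the `proc_root` parameter are all hypothetical.

```bash
#!/usr/bin/env bash
# Defaults mirror the thresholds listed in the table.
SIGTERM_THRESHOLD=25
SIGKILL_THRESHOLD=15

# Map an integer free-memory percentage to a signal name.
# Prints KILL, TERM, or NONE (hypothetical output convention).
pick_signal() {
    local avail_pct="$1"
    if (( avail_pct <= SIGKILL_THRESHOLD )); then
        echo KILL
    elif (( avail_pct <= SIGTERM_THRESHOLD )); then
        echo TERM
    else
        echo NONE
    fi
}

# Write an OOM score adjustment for one PID. proc_root is a hypothetical
# parameter for testing; it defaults to the real /proc.
set_oom_score_adj() {
    local pid="$1" value="$2" proc_root="${3:-/proc}"
    local target="$proc_root/$pid/oom_score_adj"
    # A process may exit between discovery and this write, so a missing
    # or unwritable file is reported through the return status only.
    [[ -w "$target" ]] && echo "$value" > "$target" 2>/dev/null
}
```

The SIGKILL check comes first because any reading at or below 15% also satisfies the 25% condition. A caller could pass the result to `kill -s` when it is not `NONE`.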

Language-server protection: Built-in language servers (HTML, JSON, CSS, Markdown, ESLint) and the Extension Host process are excluded from the helper-kill candidate pool at normal severity. These 80–120 MB processes are subject to VS Code's 5-crash-in-3-minutes permanent death threshold — killing them saves <2% of memory for 2 seconds (they respawn immediately) while permanently disabling language intelligence until a full VS Code restart. Only at true emergency severity (RSS ≥ 3.8 GB) are they considered as last-resort targets.

Automation awareness: When an automation session is active (detected via TIER_DISPOSABLE_PATTERN_AUX, default node.*(playwright|puppeteer|cypress|selenium-webdriver)), the daemon defers non-critical disposable kills and skips the DISPOSABLE_COUNT_MAX cap. At true emergency severity (≤ 15% free), disposable processes are always killed regardless (safety net).
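To illustrate how the default pattern behaves, here is a small sketch. The pattern value is taken from the text above; the helper names and the use of `pgrep -f` for detection are assumptions, since the daemon's actual matching code is not shown here.

```bash
#!/usr/bin/env bash
# Default pattern as documented above.
TIER_DISPOSABLE_PATTERN_AUX='node.*(playwright|puppeteer|cypress|selenium-webdriver)'

# Succeed when a single command line looks like an automation driver.
# Bash's =~ operator uses extended regular expressions.
is_automation_cmdline() {
    [[ "$1" =~ $TIER_DISPOSABLE_PATTERN_AUX ]]
}

# Succeed when any running process matches. pgrep -f tests the pattern
# against each process's full command line. (Assumed mechanism.)
automation_active() {
    pgrep -f "$TIER_DISPOSABLE_PATTERN_AUX" > /dev/null
}
```

For example, `is_automation_cmdline "node ./node_modules/.bin/playwright test"` succeeds, while a plain `node server.js` does not match.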

  • Checks every 2 seconds (4s was confirmed too slow — missed a 4 GB spike in < 4s on 2026-03-05)
  • Startup mode: 0.5 s polling for 90 s after new VS Code PIDs detected — catches extension-host spikes during startup
  • Reads only MemAvailable, MemTotal, and /proc/pressure/memory (PSI) — all safe on Crostini
  • Logs via logger -t mem-watchdog → journald (no /tmp writes)
  • Desktop notifications via notify-send, throttled to 1 per 5 minutes per severity level
  • Chat Continuity Guard in the VS Code extension can archive oversized Copilot chat sessions, generate a continuity pack, open a resumable prompt, and reload VS Code before the extension host re-parses a multi-hundred-MB chat JSON. It also preemptively auto-archives oversized sessions in inactive workspaces so workspace switches do not trigger session-load OOM spikes.
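The PSI reading mentioned in the list can be sketched like this. The kernel exposes `/proc/pressure/memory` as two lines (`some` and `full`), each with `avg10=`, `avg60=`, `avg300=`, and `total=` fields. The function names and the optional file argument are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Print the "full avg10" value from a PSI memory file, for example 12.34.
# The optional argument is hypothetical and allows testing with a sample file.
psi_full_avg10() {
    local psi_file="${1:-/proc/pressure/memory}"
    awk '$1 == "full" { sub(/^avg10=/, "", $2); print $2; found = 1 }
         END { exit !found }' "$psi_file"
}

# Succeed when full avg10 exceeds the threshold (default 25).
# Bash arithmetic is integer-only, so the comparison is done in awk.
psi_over_threshold() {
    local threshold="${1:-25}" psi_file="${2:-/proc/pressure/memory}"
    local value
    value=$(psi_full_avg10 "$psi_file") || return 1
    awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v > t) }'
}
```

Kernels built without PSI support have no `/proc/pressure` directory, in which case both functions return a non-zero status instead of a reading.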

Architecture

crostini-mem-watchdog/
├── mem-watchdog.sh              ← core daemon (bash, coreutils only)
├── mem-watchdog.service         ← systemd user service unit
├── install.sh                   ← shell-only installer (no VS Code required)
├── tests/
│   ├── test-watchdog.sh         ← 20-test validation suite
│   ├── test-pressure.sh         ← live memory pressure tests
│   ├── stress-harness.sh        ← Playwright stress test harness
│   └── stress-playwright.js     ← Playwright automation driver
├── scripts/
│   ├── mem-watchdog-mode.sh     ← repo-agnostic SLEEP/NORMAL mode helper
│   ├── watchdog-tray.sh         ← optional: yad system tray icon
│   ├── docs-integrity-check.sh  ← documentation drift check
│   ├── extension-footprint-advisor.sh ← extension RAM advisor (prototype)
│   └── psi-calibration.sh       ← PSI threshold calibration
└── vscode-extension/            ← VS Code extension (primary install path)
    ├── extension.js             ← activate(): install → skill → config → commands → status bar → update check → chat
    ├── installer.js             ← SHA-256 hash-based auto-install/upgrade + downgrade protection + MIN_SAFE guard
    ├── configWriter.js          ← VS Code Settings → ~/.config/mem-watchdog/config.sh
    ├── commands.js              ← dashboard, preflight, killDisposable, restartService, optimizeMemory, createLowMemProfile, chatRescue
    ├── chatContinuity.js        ← oversized chat archival + continuity pack + resumable prompt generation
    ├── updateChecker.js         ← GitHub Releases API self-update check (24h throttled)
    ├── skillInstaller.js        ← installs/updates ~/.copilot/skills/mem-watchdog-ops/
    ├── chatParticipant.js       ← @memwatchdog chat participant: /status, /logs, /tune, /act, /optimize, /lowmem
    ├── utils.js                 ← readMeminfo(), readPsi(), sh(), checkServiceStatus() — shared helpers
    ├── lifecycle.js             ← vscode:uninstall: stop + disable service
    └── scripts/prepare.js       ← vscode:prepublish: bundles daemon files into resources/

The daemon must remain a separate systemd process. VS Code's extension host can freeze under OOM pressure — the daemon's independence is the protection. The VS Code extension manages the daemon; it does not replace it.

Config sourcing: mem-watchdog.sh sources ~/.config/mem-watchdog/config.sh (if present) after its own defaults. VS Code Settings writes this file. The daemon script itself is never modified at runtime, which makes SHA-256 hash-based upgrade detection exact.
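A minimal sketch of the defaults-then-override pattern and of an exact hash comparison follows. It is an illustration only: `load_config`, `needs_upgrade`, and the `MEM_WATCHDOG_CONFIG` override variable are hypothetical names, and the real installer (which also handles downgrade protection) lives in `installer.js`.

```bash
#!/usr/bin/env bash
# Built-in defaults come first (values from the Configuration table).
SIGTERM_THRESHOLD=25
SIGKILL_THRESHOLD=15
INTERVAL=2

# Source user overrides if the file exists. MEM_WATCHDOG_CONFIG is a
# hypothetical hook that lets a test point at a different file.
load_config() {
    local config="${MEM_WATCHDOG_CONFIG:-$HOME/.config/mem-watchdog/config.sh}"
    if [[ -r "$config" ]]; then
        # shellcheck source=/dev/null
        source "$config"
    fi
}

# Succeed when the installed script is missing or differs from the bundled
# copy. Because the script is never edited in place, any hash mismatch
# means a different version.
needs_upgrade() {
    local installed="$1" bundled="$2"
    if [[ ! -f "$installed" ]]; then
        return 0
    fi
    [[ "$(sha256sum < "$installed")" != "$(sha256sum < "$bundled")" ]]
}
```

Keeping user settings in a separate sourced file is what makes the comparison reliable: a customized threshold never changes the daemon script's hash.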


Configuration

Preferred: VS Code Settings → Mem Watchdog — changes apply on next daemon restart.

Manual fallback: top-of-file variables in mem-watchdog.sh:

| Variable | Default | Description |
| --- | --- | --- |
| SIGTERM_THRESHOLD | 25 | % free RAM → SIGTERM Chrome |
| SIGKILL_THRESHOLD | 15 | % free RAM → SIGKILL Chrome |
| PSI_THRESHOLD | 25 | PSI full avg10 % → SIGTERM Chrome |
| INTERVAL | 2 | Seconds between normal checks |
| STARTUP_INTERVAL | 0.5 | Seconds between checks in startup mode |
| STARTUP_DURATION | 90 | Seconds to stay in startup mode after new VS Code PIDs |
| VSCODE_RSS_WARN_KB | 3400000 | VS Code RSS warning level (~3.4 GB) |
| VSCODE_RSS_EMERG_KB | 3800000 | VS Code RSS emergency level (~3.8 GB) |
| NOTIFY_INTERVAL | 300 | Seconds between desktop notifications per severity |
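As one way to picture the per-severity throttle that NOTIFY_INTERVAL controls, consider the sketch below. The function name `should_notify`, the `last_notified` array, and the injectable `now` argument are assumptions for illustration, not the daemon's code.

```bash
#!/usr/bin/env bash
NOTIFY_INTERVAL=300
declare -A last_notified=()   # severity -> epoch seconds of the last alert

# Succeed (and record the time) when an alert for this severity is allowed.
# The optional second argument injects "now" for testing; it is hypothetical.
should_notify() {
    local severity="$1" now="${2:-$(date +%s)}"
    local last="${last_notified[$severity]:-0}"
    if (( now - last >= NOTIFY_INTERVAL )); then
        last_notified[$severity]=$now
        return 0
    fi
    return 1
}
```

A caller might write `should_notify warn && notify-send "Mem Watchdog" "Memory is low"`. Tracking each severity separately means a recent warning does not suppress a later emergency alert.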

Chat Continuity Guard

When Copilot chat history becomes dangerously large, use the extension command Mem Watchdog: Rescue Oversized Chat Session. It:

  1. Moves giant session JSON out of VS Code's live chatSessions/ store.
  2. Writes a continuity bundle to ~/.config/mem-watchdog/chat-archives/<timestamp>-<workspaceId>/.
  3. Generates:
     • session-original.json: the full archived session
     • continuity-pack.md: compact forensic summary
     • resume.prompt.md: paste-ready fresh-chat prompt
  4. Optionally reloads VS Code immediately so the extension host does not re-parse the oversized chat on restart.
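The move-to-archive step can be sketched in shell as follows. The extension implements this in JavaScript (`chatContinuity.js`), so this is only an illustration: the function name, all three parameters, and the plain timestamp directory name are hypothetical, and the real archive layout also includes the workspace ID and generated summary files.

```bash
#!/usr/bin/env bash
# Move session files larger than max_bytes into a timestamped archive
# directory and print how many were moved. All parameters are hypothetical;
# the real store path and size limit are not specified here.
archive_oversized_sessions() {
    local sessions_dir="$1" archive_root="$2" max_bytes="$3"
    local dest file size moved=0
    dest="$archive_root/$(date +%Y%m%dT%H%M%S)"
    for file in "$sessions_dir"/*.json; do
        [[ -f "$file" ]] || continue            # glob matched nothing
        size=$(wc -c < "$file") || continue
        if (( size > max_bytes )); then
            mkdir -p "$dest"
            mv -- "$file" "$dest/"
            moved=$((moved + 1))
        fi
    done
    echo "$moved"
}
```

Moving rather than copying is the point of the exercise: once the file is out of the live store, the extension host has nothing oversized to parse on the next load.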

Background guard behavior:

  • Active workspace oversized session: prompt/auto behavior based on chatGuard.autoRescue.
  • Inactive workspace oversized session: preemptive auto-archive (no reload) to prevent cross-workspace startup OOM.

Managed Protection Window (Repo-Agnostic)

Use the installed helper script to safely run any automation command in a watchdog-managed SLEEP window (no repo-specific integration required):

~/.local/bin/mem-watchdog-mode.sh run -- node scripts/mcp/local_capture_test.js

Other commands:

~/.local/bin/mem-watchdog-mode.sh status
~/.local/bin/mem-watchdog-mode.sh sleep
~/.local/bin/mem-watchdog-mode.sh normal

Tuning for Your RAM

| Total RAM | VSCODE_RSS_WARN_KB | VSCODE_RSS_EMERG_KB |
| --- | --- | --- |
| 4 GB | 1500000 | 2000000 |
| 6 GB (default) | 3400000 | 3800000 |
| 8 GB | 3500000 | 5000000 |
| 16 GB | 6000000 | 10000000 |
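For example, the 8 GB row could be applied through the sourced config file described under Architecture. This fragment is a sketch that assumes the file is written by hand instead of through VS Code Settings.

```bash
# ~/.config/mem-watchdog/config.sh
# Overrides for an 8 GB machine, using the values from the table above.
VSCODE_RSS_WARN_KB=3500000
VSCODE_RSS_EMERG_KB=5000000
```

Changes take effect on the next daemon restart, for instance with `systemctl --user restart mem-watchdog` if the unit is installed under the name shown in the Architecture tree.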

Validation

All 4 gates must pass before any change is published:

bash tests/test-watchdog.sh              # 20 bash tests (~3 s) — service, OOM scores, PSI, RSS circuit-breaker policy, SwapFree safety, startup chat-footprint warning scan, SIGTERM
cd vscode-extension && npm test    # 184 JS unit tests (~1 s) — continuity rescue flow (including cross-workspace preemptive archival), extension state machine, activation singleton, low-memory profile guidance, utils
bash -n mem-watchdog.sh            # bash syntax check
shellcheck --shell=bash -e SC1091,SC2317 mem-watchdog.sh scripts/watchdog-tray.sh install.sh
bash tests/test-pressure.sh    # live: allocates memory, verifies watchdog fires (requires < 40% RAM free)
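One way to run such gates in sequence and stop at the first failure is sketched below. The `run_gates` helper is hypothetical and is not part of the repository; it simply runs each command string in its own shell.

```bash
#!/usr/bin/env bash
# Run each gate command in order; stop and return non-zero on the first
# failure. Each argument is one command string. (Hypothetical helper.)
run_gates() {
    local gate
    for gate in "$@"; do
        echo "gate: $gate"
        if ! bash -c "$gate"; then
            echo "FAILED: $gate" >&2
            return 1
        fi
    done
    echo "all gates passed"
}
```

Invoked from the repository root, this might look like `run_gates "bash tests/test-watchdog.sh" "cd vscode-extension && npm test" "bash -n mem-watchdog.sh"`. Because each gate runs in a separate shell, the `cd` in one gate does not affect the next.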

Future Features (Planned)

  • Extension Footprint Advisor (diagnostic mode):
    • Sample code --status + extension-host child processes over a short window.
    • Rank extension helpers by RSS contribution and restart churn.
    • Emit a recommendation report: keep, workspace-disable, or remove.
    • Provide safe, staged guidance (disable in workspace first; uninstall only after validation).
    • Integrate with watchdog STATUS counters to correlate extension mix vs emergency/warn frequency.

Prototype script available now:

bash scripts/extension-footprint-advisor.sh /path/to/workspace


Stress Test Sessions

This section records observed crash events from live development sessions where mem-watchdog was running as protection.


Session 2026-03-10 — Crostini VM Termination via VS Code Crash-Restart Loop

Duration: 17:58 – 21:22 (3h 24m)
Workload: FPW website development (HTML/CSS editing in VS Code) + MCP Playwright browser (visual QA)
Outcome: Full Crostini container terminated by ChromeOS host at 21:22:07

Crash fingerprint (journalctl):

Mar 10 21:22:07  awk: cannot open /proc/meminfo (Transport endpoint is not connected)
Mar 10 21:22:07  systemd[144]: Activating special unit exit.target...
Mar 10 21:22:07  [12+ app-com.google.Chrome-*.scope stopped]
Mar 10 21:22:07  cros-garcon.service: Main process exited, code=dumped, status=6/ABRT

What happened:

  1. MCP Playwright server continuously spawned Chrome processes. Watchdog set oom_score_adj=1000 on each and sent SIGTERM on each VS Code startup event.
  2. VS Code extension-host crashed and restarted 30+ times over the roughly 3.5-hour session. Each restart triggered a new startup-mode SIGTERM to Chrome, but the MCP server kept respawning Chrome.
  3. 12+ Chrome scopes accumulated throughout the session, each adding to baseline RSS.
  4. The sustained restart loop exhausted the ChromeOS VM. At 21:22:07, the virtio socket was severed (Transport endpoint is not connected), and ChromeOS terminated the Crostini container.

Watchdog gaps exposed:

| Gap | Issue | Severity |
| --- | --- | --- |
| No restart-loop detection: each VS Code crash triggered fresh SIGTERM instead of escalating | #25 | High |
| No Chrome process count cap: 12+ Chrome scopes accumulated across session | #26 | High |
| No VM health canary: /proc/meminfo unreadable swallowed silently | #27 | Medium |
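To make the first gap concrete, restart-loop detection can be thought of as counting restarts inside a sliding time window. The sketch below is purely illustrative and is not the fix tracked in #25; the window length, the limit, and every name in it are hypothetical.

```bash
#!/usr/bin/env bash
RESTART_WINDOW=180   # seconds to look back (hypothetical value)
RESTART_LIMIT=5      # restarts within the window that count as a loop (hypothetical)
restart_times=()

# Record a restart at time "now" (epoch seconds, injectable for testing)
# and succeed when the window holds RESTART_LIMIT or more restarts.
record_restart() {
    local now="${1:-$(date +%s)}" t
    local kept=()
    for t in "${restart_times[@]}"; do
        # Keep only restarts that are still inside the window.
        if (( now - t < RESTART_WINDOW )); then
            kept+=("$t")
        fi
    done
    kept+=("$now")
    restart_times=("${kept[@]}")
    (( ${#restart_times[@]} >= RESTART_LIMIT ))
}
```

A caller could escalate (for example, from SIGTERM to a stronger response) only when `record_restart` succeeds, instead of repeating the same action on every crash.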

What the watchdog did right:

  • ✅ Protected VS Code (no kernel OOM kill of VS Code itself in this session — contrast with 2026-03-05)
  • ✅ Correctly set oom_score_adj=1000 on Chrome processes
  • ✅ SIGTERM'd Chrome on each VS Code startup to free memory

Key distinction from 2026-03-05 crash: This crash was a VM-level termination, not a kernel OOM kill of VS Code. The watchdog prevented kernel OOM kills of VS Code but could not stop the sustained restart loop, and the ChromeOS host eventually terminated the container.


Crash That Started This

On 2026-03-05 at 13:02:25, VS Code process 778 was OOM-killed:

[ 2610.439831] code invoked oom-killer
[ 2610.522581] Out of memory: Killed process 778 (code)
               total-vm:1463369736kB, anon-rss:4087536kB, oom_score_adj:0

The extension host spiked from normal RSS to ~4 GB in under 4 seconds during startup. The watchdog's interval was 4 s — it fired at 13:02:32, 7 seconds after the crash. No Chrome was running, so there was nothing to kill anyway.

Three fixes:

  1. Interval: 4 s → 2 s normal, 0.5 s during startup mode
  2. RSS threshold: lowered to 3.2 GB emergency (earlier intervention)
  3. Last resort: if no Chrome to kill, SIGTERM the highest-RSS code process to save the VS Code window
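The first fix, a faster interval during startup, can be sketched as a small selection function. The values come from the Configuration table; the function name and the convention that `0` means "no startup detected" are assumptions.

```bash
#!/usr/bin/env bash
INTERVAL=2
STARTUP_INTERVAL=0.5
STARTUP_DURATION=90

# Print the sleep interval for this loop iteration.
#   now            current epoch seconds
#   startup_began  epoch seconds when new VS Code PIDs were last seen,
#                  or 0 if none have been seen (hypothetical convention)
pick_interval() {
    local now="$1" startup_began="$2"
    if (( startup_began > 0 && now - startup_began < STARTUP_DURATION )); then
        echo "$STARTUP_INTERVAL"
    else
        echo "$INTERVAL"
    fi
}
```

A main loop could then call `sleep "$(pick_interval "$(date +%s)" "$startup_began")"`; GNU `sleep` accepts fractional seconds such as `0.5`.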

Requirements

  • ChromeOS Crostini (Debian 11+) or any Linux with systemd user services
  • bash ≥ 5.0
  • notify-send optional (desktop alerts — sudo apt install libnotify-bin)
  • yad optional (system tray icon — sudo apt install yad)
  • VS Code 1.74+ optional (for the extension)

License

PolyForm Noncommercial 1.0.0 — free for personal, educational, and non-commercial use.

Commercial use (including internal company tooling) requires a paid license. See COMMERCIAL-LICENSE.md or contact curtisfranks@gmail.com.
