Rivendell Forensic Platform - Key Functionality

This document describes the critical architectural features and operational characteristics of the Rivendell forensic analysis platform.

Contents:

Integration Summary
Progress Tracking
Deferred Memory Processing
File Descriptor Management
OrbStack File Handle Accumulation

Integration Summary

✅ File Descriptor Retry Logic Integration

The retry-on-file-descriptor-limit logic has been integrated into the critical file processing paths of Rivendell.

Files Modified

1. File Collection (`rivendell/collect/files/compare.py`)

Changes:

Added imports: from analysis.utils import retry_on_fd_limit, safe_open
Wrapped compare_include_exclude() with @retry_on_fd_limit(max_retries=10, initial_wait=1.0, max_wait=30.0)
Replaced open() with safe_open() for include/exclude filter files (2 instances)

Impact:

All file collection operations now automatically retry on FD exhaustion
10 retries with 1-30 second waits (exponential backoff)
Critical for tests like win_collect_files_* that process thousands of files

2. IOC Extraction (`rivendell/analysis/iocs.py`)

Changes:

Added imports: from analysis.utils import retry_on_fd_limit, safe_open
Wrapped load_ioc_watchlist() with @retry_on_fd_limit(max_retries=5)
Wrapped compare_iocs() with @retry_on_fd_limit(max_retries=10, initial_wait=1.0, max_wait=30.0)
Replaced open() with safe_open() for:
- Watchlist file loading
- IOC file reading during extraction

Impact:

IOC extraction from collected files now resilient to FD limits
Critical for tests like win_extract_iocs that scan many files

How It Works

Before Integration:

def collect_files(...):
    with open(file_path, 'r') as f:  # ❌ Fails immediately at limit
        data = f.read()

Result: OSError: [Errno 24] Too many open files → Test FAILS

After Integration:

@retry_on_fd_limit(max_retries=10, initial_wait=1.0, max_wait=30.0)
def collect_files(...):
    with safe_open(file_path, 'r') as f:  # ✅ Retries with backoff
        data = f.read()

Result:

Hit FD limit → Trigger garbage collection
Wait 1.0s → Retry
If still failing → Wait 2.0s → Retry
Continue up to 10 times
If all retries exhausted → Raise FileDescriptorLimitError with helpful message

Retry Configuration

Function	Max Retries	Initial Wait	Max Wait	Use Case
`compare_include_exclude()`	10	1.0s	30.0s	File collection - many files
`compare_iocs()`	10	1.0s	30.0s	IOC extraction - scan many files
`load_ioc_watchlist()`	5	0.5s	10.0s	Load single watchlist file

Benefits

Graceful Degradation: Tests don't fail immediately - they retry intelligently
Automatic Cleanup: Garbage collection frees unreferenced file handles
Exponential Backoff: Gives system time to recover
Logging: Clear warnings when retries happen (visible in logs)
Ultimate Failure: Still fails after all retries with helpful error message

Example Log Output

Successful Retry:

WARNING:analysis.utils.file_limits:File descriptor limit reached in compare_include_exclude (attempt 1/10): [Errno 24] Too many open files: '/path/to/file'. Waiting 1.0s and retrying...
DEBUG:analysis.utils.file_limits:Garbage collection freed 127 objects, attempting to close file handles
✓ Test PASSED

Exhausted Retries (rare):

ERROR:analysis.utils.file_limits:File descriptor limit reached after 10 retries in compare_include_exclude: [Errno 24] Too many open files
FileDescriptorLimitError: Too many open files after 10 retries. Consider increasing ulimit or reducing concurrent file operations.
✗ Test FAILED

Three-Layer Defense

Prevention (Docker limits: 65,536)
- Prevents most FD exhaustion issues
Recovery (Retry logic with GC)
- Automatically recovers from FD exhaustion
- Triggers garbage collection to free handles
- Waits for system cleanup
Isolation (30s delays between tests)
- Allows full cleanup between batch tests

Architecture

┌─────────────────────────────────────────┐
│  Test: win_collect_files_mail          │
│  Opening 10,000+ mail files             │
└────────────┬────────────────────────────┘
             │
             ↓
┌─────────────────────────────────────────┐
│  compare_include_exclude()              │
│  @retry_on_fd_limit(max_retries=10)     │
└────────────┬────────────────────────────┘
             │
             ↓ Hit FD limit (24 files open)
┌─────────────────────────────────────────┐
│  Retry Logic                            │
│  1. Catch OSError(errno.EMFILE)         │
│  2. Trigger gc.collect()                │
│  3. Wait with exponential backoff       │
│  4. Retry operation                     │
└────────────┬────────────────────────────┘
             │
             ↓ Retry successful
┌─────────────────────────────────────────┐
│  File collection continues              │
│  ✓ Test PASSES                          │
└─────────────────────────────────────────┘

Summary

✅ File collection now resilient to FD limits
✅ IOC extraction now resilient to FD limits
✅ Automatic retry with garbage collection
✅ Exponential backoff prevents thrashing
✅ Clear logging shows retry behavior
✅ Graceful failure after exhausting retries

Progress Tracking

This section describes how progress percentage is calculated during forensic analysis jobs.

Overview

Progress is tracked per-image and scaled to overall job progress:

Each image gets a proportional share of 0-95% progress
Final 5% (95-100%) is reserved for job completion
Progress never goes backwards

Phase Transitions

Trigger Message	Internal Progress
Identification phase	5%
Scanning for forensic images	7%
Found forensic image	8%
Commencing Collection Phase	10%
Identifying operating system	12%
Identified platform	15%
Completed Collection Phase	35%
Scanning for artefacts	40%
Commencing Processing Phase	45%
Completed Processing Phase	70%
Commencing Analysis Phase	75%
Completed Analysis Phase	95%
Splunk/Elastic/Navigator phases	97%
Rivendell completed	95% (overall)

Collection Sub-Steps (Windows)

Trigger Message	Progress Range
Collecting artefacts for...	12% → 25%
Copying full user profile	25% → 33%
Completed copying profile	26% → 34%
Collecting registry/WMI/WBEM	30% → 34%
Collected event logs	34%
Recovered ($I30 records)	35%

Collection Sub-Steps (Linux)

Trigger Message	Progress Range
/etc/passwd, shadow, group, hosts	18% → 24%
Configuration files	20% → 28%
Crontab	22% → 28%
Bash history/files	24% → 30%
Service files (systemd)	26% → 32%
Systemd journal	28% → 34%
Keyrings	30% → 34%

Collection Sub-Steps (macOS)

Trigger Message	Progress Range
/etc/passwd, shadow, group, hosts	18% → 24%
Crontab	22% → 28%
Plist files	24% → 33%
LaunchAgents/LaunchDaemons	26% → 32%
StartupItems	28% → 33%
Trash files	30% → 34%

Shared (Linux/macOS)

Trigger Message	Progress Range
Log files (/var/log, /Library/Logs)	22% → 30%
Temp files (/tmp)	32% → 34%
Memory files (sleepimage, swapfile)	33% → 35%

Processing & Analysis

Phase	Progress Range
Processing/parsing	45% → 65%
Analyzing	75% → 93%

Generic Fallback

Trigger Message	Progress Range
General "collecting/mounting"	15% → 32%

Multi-Image Scaling

Progress is dynamically scaled based on the total number of images being analyzed.

How it works

Initial estimate: Progress starts with an estimate from source_paths count
Dynamic adjustment: When elrond reports "Processing image X of Y", the system uses Y as the actual image count
Progress range: 95% is divided equally among all images (5% reserved for completion)

Example: 3 images

Image	Progress Range
Image 1 of 3	0% - 31.6%
Image 2 of 3	31.6% - 63.3%
Image 3 of 3	63.3% - 95%

Example: 2 images

Image	Progress Range
Image 1 of 2	0% - 47.5%
Image 2 of 2	47.5% - 95%

Example: 1 image

Image	Progress Range
Image 1 of 1	0% - 95%

Each image's internal progress (0-100% scale) is mapped to its allocated range. For example, if Image 2 of 3 is at 50% internal progress, the overall job progress would be: 31.6% + (31.6% × 0.5) = 47.4%

Implementation

Progress tracking is implemented in src/web/backend/tasks_docker.py in the start_analysis task.

Key variables:

total_images: Total number of images being analyzed (dynamically updated from elrond output)
image_progress_range: Progress range allocated to each image (95 / total_images)
image_base_progress: Starting progress for current image
image_internal_progress: Progress within current image (0-100 scale)
actual_image_count_detected: Flag indicating if elrond has reported the actual image count
progress_count: Number of log lines processed

Progress is saved to the job storage every 10 log lines.

DEBUG Messages

Debug messages (MITRE enrichment, verbose command output) are filtered out by default. They only appear in the job log if the "Debug" option is enabled in the Web UI.

Deferred Memory Processing

Overview

Rivendell/Elrond implements a deferred memory processing architecture that separates disk image collection from memory image processing. This ensures proper phase separation and prevents processing conflicts when both disk and memory images are provided together.

Key Concept

Memory images are "parked" during the Collection Phase and processed after disk image collection completes.

This is critical because:

Disk and memory images require different processing pipelines
Memory images should not interfere with disk artifact collection
Volatility processing for memory images can be resource-intensive
Phase separation provides clearer progress tracking and error handling

Image Format

Images are identified using a three-part format:

filename::mount_point::type

Examples:

win7-64-disk.E01::/mnt/elrond_mount00::disk
Linux: win7-64-nfury-memory-raw.001::/mnt/external/rivendell_imgs/::memory
macOS: win7-64-nfury-memory-raw.001::/Volumes/ExternalDrive/rivendell_imgs/::memory
Windows: win7-64-nfury-memory-raw.001::D:/rivendell_imgs/::memory

Field Breakdown:

[0] filename: The base image filename
[1] mount_point: Where the image is mounted (for disk) or source directory (for memory)
[2] type: Either disk or memory

Processing Phases

Phase 1: Identification (in elrond.py)

When images are first identified:

Disk images: Added to imgs dict for immediate processing
Memory images:
- Profile identified via identify_memory_image()
- Profile saved to .memory_profiles.json via save_memory_profile()
- NOT added to imgs dict - they are "parked"

Location: src/analysis/rivendell/core/identify.py:186

# The actual memory processing (artefact extraction) is deferred to the
# processing phase via process_deferred_memory().

Phase 2: Collection (collect.py)

Only disk images are processed during collection.

Memory images should be filtered out using image type detection:

# CORRECT way to check image type:
img_type = img.split("::")[2] if len(img.split("::")) > 2 else ""
if img_type != "memory":
    # Process disk artifacts

Location: src/analysis/rivendell/collect/collect.py:161-405

Phase 3: Processing (select.py)

Only disk artifacts are processed during the main processing phase.

Memory images must be excluded from artifact processing:

# CORRECT way to filter out memory images:
img_type = img.split("::")[2] if len(img.split("::")) > 2 else ""
if img_basename not in str(processed_imgs) and img_type != "memory":
    # Process disk artifacts

Location: src/analysis/rivendell/process/select.py:113-117

Phase 3b: Deferred Memory Processing (elrond.py)

After all disk images are processed, memory images are loaded from the saved profiles:

# Load saved memory profiles
has_deferred_memory = bool(load_memory_profiles(output_directory))
if volatility or has_deferred_memory:
    process_deferred_memory(
        verbosity,
        output_directory,
        flags,
        mount_point,
    )

Location: src/analysis/elrond.py:730-737

Implementation: src/analysis/rivendell/core/identify.py:275-359

The process_deferred_memory() function:

Loads profiles from .memory_profiles.json
For each saved memory image:
- Reconstructs the image info
- Calls process_memory() with Volatility 3
- Extracts memory artifacts to output directory

Phase 4: Analysis

All images (disk and memory) are analyzed for IOCs, keywords, YARA rules, etc.

Memory Profile Storage

Profiles are saved to: {output_directory}/.memory_profiles.json

Example profile (macOS):

{
  "win7-64-nfury-memory-raw.001": {
    "profile": "Windows",
    "platform": "Windows memory",
    "volchoice": "3",
    "path": "/Volumes/ExternalDrive/rivendell_imgs/win7-64-nfury-memory-raw.001",
    "mount_point": "/Volumes/ExternalDrive/rivendell_imgs/",
    "vss": false,
    "vssmem": "",
    "memtimeline": ""
  }
}

Critical Bug: Image Type Detection (FIXED)

The Bug

Three locations were checking the WRONG index when determining image type:

# WRONG - checks mount_point (index 1):
if img.split("::")[1].endswith("memory"):
    # ...

# CORRECT - checks type (index 2):
img_type = img.split("::")[2] if len(img.split("::")) > 2 else ""
if img_type == "memory":
    # ...

Affected Files (FIXED)

select.py:113-117 - Processing phase artifact filtering
collect.py:183-184 - Collection phase memory detection
core.py:232-234 - Timeline phase memory filtering

RecursionError Fix: `hiberfil.sys` Processing

The Problem

Tests were failing with RecursionError: maximum recursion depth exceeded at 38% of Processing Phase when processing hiberfil.sys.

Root Cause

hiberfil.sys (hibernation file) found on disk images was being processed synchronously during the disk Processing Phase using the full memory processing pipeline. Volatility 3's plugin system has deep internal call stacks that exceeded Python's default recursion limit (1000).

The Fix

Increased Python recursion limit during the Processing Phase to handle Volatility's deep call stacks.

Implementation: src/analysis/rivendell/process/select.py

def select_pre_process_artefacts(...):
    # Increase recursion limit for Volatility processing
    original_recursion_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(5000)

    try:
        # ... all processing logic ...
    finally:
        # Restore original recursion limit
        sys.setrecursionlimit(original_recursion_limit)

Why This is Correct

Standalone memory images (.raw, .mem): Should be deferred to Phase 3b
hiberfil.sys (disk artifact): Should be collected and processed during disk phases

The recursion issue is not architectural - it's simply that Volatility's deep processing exceeds Python's default recursion limit.

Workflow Diagram

┌─────────────────────────────────────────────────┐
│ Phase 1: Identification                         │
├─────────────────────────────────────────────────┤
│ ┌─────────────┐          ┌──────────────┐      │
│ │ Disk Image  │──────────>│ Add to imgs  │      │
│ └─────────────┘          └──────────────┘      │
│                                                  │
│ ┌──────────────┐         ┌────────────────────┐│
│ │ Memory Image │────────>│ Save to            ││
│ └──────────────┘         │ .memory_profiles   ││
│                          │ (PARKED)           ││
│                          └────────────────────┘│
└─────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────┐
│ Phase 2: Collection                             │
├─────────────────────────────────────────────────┤
│ Process ONLY disk images from imgs dict         │
│ - Filter: img_type != "memory"                  │
│ - Collect system artifacts                      │
│ - Extract files                                 │
└─────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────┐
│ Phase 3: Processing                             │
├─────────────────────────────────────────────────┤
│ Process ONLY disk artifacts                     │
│ - Filter: img_type != "memory"                  │
│ - Parse registry, logs, browsers, etc.          │
└─────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────┐
│ Phase 3b: Deferred Memory Processing            │
├─────────────────────────────────────────────────┤
│ Load .memory_profiles.json                      │
│ For each memory image:                          │
│   - Reconstruct image info                      │
│   - Run Volatility 3 plugins                    │
│   - Extract memory artifacts                    │
└─────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────┐
│ Phase 4: Analysis                               │
├─────────────────────────────────────────────────┤
│ Analyze ALL artifacts (disk + memory)           │
│ - IOC extraction                                │
│ - Keyword searching                             │
│ - YARA scanning                                 │
│ - Timeline generation                           │
└─────────────────────────────────────────────────┘

File Locations

File	Purpose	Key Functions/Lines
`src/analysis/elrond.py`	Main orchestration	Lines 730-737: Deferred memory processing
`src/analysis/rivendell/core/identify.py`	Memory identification & deferred processing	Lines 17-42: save/load profiles Lines 186: Deferral comment Lines 275-359: process_deferred_memory()
`src/analysis/rivendell/core/core.py`	Phase orchestration	Lines 54-121: Phase control logic
`src/analysis/rivendell/collect/collect.py`	Artifact collection	Lines 161-405: Collection loop Lines 183-184: Memory filtering
`src/analysis/rivendell/process/select.py`	Artifact processing	Lines 113-117: Memory filtering
`src/analysis/rivendell/memory/memory.py`	Memory processing	Lines 131-322: process_memory()

File Descriptor Management

Problem

When processing large forensic images, Rivendell can hit the operating system's file descriptor limit, causing this error:

OSError: [Errno 24] Too many open files: '/path/to/file'

This happens when:

Processing many small files simultaneously
Collecting files from large disk images
Running multiple tests back-to-back without cleanup

Solution: Automatic Retry Logic

We've implemented automatic retry logic that gracefully handles file descriptor exhaustion:

Catches the error instead of failing immediately
Triggers garbage collection to close unreferenced file handles
Waits with exponential backoff to allow system cleanup
Retries the operation up to 5 times
Only fails if all retries are exhausted

Usage

Option 1: Decorator (Recommended for Functions)

Wrap any function that opens many files:

from analysis.utils import retry_on_fd_limit

@retry_on_fd_limit(max_retries=5, initial_wait=0.5)
def process_collected_files(file_paths):
    """Process a large collection of files."""
    results = []
    for path in file_paths:
        with open(path, 'rb') as f:
            data = f.read()
            results.append(analyze(data))
    return results

# Function will automatically retry if it hits file descriptor limits
results = process_collected_files(many_files)

Option 2: Safe Open (Drop-in Replacement)

Replace open() with safe_open():

from analysis.utils import safe_open

# Before:
with open('/path/to/file', 'r') as f:
    data = f.read()

# After:
with safe_open('/path/to/file', 'r') as f:
    data = f.read()

Option 3: Monitor File Descriptor Usage

Check usage and get warnings:

from analysis.utils import check_fd_usage, get_fd_limit

# Check current usage
stats = check_fd_usage(warn_threshold=0.8)
print(f"FD usage: {stats['used']}/{stats['soft_limit']} ({stats['percent']}%)")

# Get limits
soft, hard = get_fd_limit()
print(f"Limits: {soft} (soft), {hard} (hard)")

Configuration

Decorator Options

@retry_on_fd_limit(
    max_retries=5,          # Number of retry attempts
    initial_wait=0.5,       # Initial wait time (seconds)
    backoff_multiplier=2.0, # Exponential backoff multiplier
    max_wait=10.0,          # Maximum wait between retries
    trigger_gc=True         # Trigger garbage collection on retry
)
def my_function():
    pass

Benefits

Before (Hard Failure)

Processing files...
OSError: [Errno 24] Too many open files
❌ Test FAILED

After (Automatic Recovery)

Processing files...
⚠️  File descriptor limit reached (attempt 1/5): Too many open files.
    Waiting 0.5s and retrying...
✓ Garbage collection freed 127 objects
✓ Retry successful
✓ Test PASSED

Docker Configuration

We've already increased Docker container limits:

services:
  celery-worker:
    ulimits:
      nofile:
        soft: 65536
        hard: 65536

Combined with the retry logic, this provides:

High limits (65,536) to prevent most errors
Automatic retry if limits are still hit
Graceful degradation instead of hard failures

Testing

Test the retry logic:

# Run the utility test
python3 src/analysis/utils/file_limits.py

Expected output:

File Descriptor Limit Management Utility
==================================================
Current FD limits: 65536 (soft), 65536 (hard)
Currently open FDs: 42
Function executed successfully
Test result: success

OrbStack File Handle Accumulation

Problem

OrbStack on macOS accumulates ~8,000 file handles per file collection test from Docker volume mounts (/Volumes/Media5TB/rivendell_imgs). These handles are not released even after:

Tests complete
Containers stop
Containers restart

Impact

macOS system limit: ~122,880 total open files
After 1-2 tests: Approaching system limit
Subsequent tests fail: [Errno 23] Too many open files in system

Root Cause

This is a known limitation with Docker on macOS (both Docker Desktop and OrbStack):

File handles from volume mounts persist at the host level
macOS doesn't aggressively release handles from container filesystem operations
Large forensic file collection operations (processing thousands of files) accumulate handles rapidly

Evidence

# Before any tests
$ lsof | grep "/Volumes/Media5TB/rivendell_imgs" | wc -l
0

# After win_collect_files_lnk (13m, collected 97 LNK files + 1.7GB artifacts)
$ lsof | grep "/Volumes/Media5TB/rivendell_imgs" | wc -l
4243

# After win_collect_files_docs (also successful)
$ lsof | grep "/Volumes/Media5TB/rivendell_imgs" | wc -l
8008

# After win_collect_files_mail (12m, successful)
$ lsof | grep "/Volumes/Media5TB/rivendell_imgs" | wc -l
7998

Which Tests Are Affected

High accumulation (require cleanup):

win_collect_files_lnk - LNK shortcut files
win_collect_files_docs - Document files
win_collect_files_bin - Binary files
win_collect_files_mail - Email files (PST/OST)
win_collect_files_scripts - Script files
win_collect_files_hidden - Hidden files
win_collect_files_virtual - Virtual machine files
win_collect_files_web - Web browser artifacts

Lower accumulation (usually OK):

Processing-only tests (no large file collection)
Memory analysis tests
Timeline generation
IOC extraction

Solution Implemented

1. Automatic Cleanup in Batch Scripts

ALL batch scripts now automatically clean up file handles in two ways:

A. Pre-Batch Cleanup (ALL scripts)

Every batch script now checks for and cleans up file handles from previous runs BEFORE starting any tests:

# Pre-batch cleanup: Clear any existing file handles from previous runs
HANDLE_COUNT=$(lsof 2>/dev/null | grep "/Volumes/Media5TB/rivendell_imgs" | wc -l | tr -d " \n" || echo "0")
if [ "$HANDLE_COUNT" -gt 100 ]; then
    echo "⚠️  Found $HANDLE_COUNT file handles - cleaning up before starting batch..."
    # Stop containers, restart OrbStack, restart Rivendell
    ...
    echo "✓ Pre-batch cleanup complete"
fi

Updated scripts: All 20 batch scripts (1a through 5d)

B. Between-Test Cleanup (File collection scripts only)

File collection batch scripts also clean up between tests when handle count exceeds 1,000:

Scripts with between-test cleanup:

tests/scripts/batch/1b_collect--lnk-docs.sh
tests/scripts/batch/1c_collect--bin-hidden-scripts.sh
tests/scripts/batch/1d_collect--mail-virtual-web.sh

Cleanup logic:

# Check file handle count and cleanup if needed
HANDLE_COUNT=$(lsof 2>/dev/null | grep "/Volumes/Media5TB/rivendell_imgs" | wc -l | tr -d " \n" || echo "0")
if [ "$HANDLE_COUNT" -gt 1000 ]; then
    echo "⚠️  High file handle count ($HANDLE_COUNT) - cleaning up OrbStack..."
    # Stop containers, restart OrbStack, restart Rivendell
    ...
    echo "✓ Cleanup complete - continuing tests..."
fi

Threshold lowered: From 5,000 to 1,000 handles for more proactive cleanup

2. Manual Cleanup Script

Created helper script for manual cleanup between test batches:

Script: scripts/cleanup-orbstack-handles.sh

Usage:

cd /Users/ben/Library/CloudStorage/OneDrive-Personal/Projects/GitHub/rivendell
./scripts/cleanup-orbstack-handles.sh

# Or force cleanup even below threshold
./scripts/cleanup-orbstack-handles.sh --force

3. Fixed Batch Script Exit Code Bug

Problem: Batch scripts were reporting "PASSED" for failed tests because they checked the exit code of tee (which always succeeds) instead of the python script.

Fix: Changed from:

if python3 scripts/run_test.py --run "$test" -y --wait 2>&1 | tee -a "$LOG_FILE"; then
    echo "✓ $test PASSED"

To:

python3 scripts/run_test.py --run "$test" -y --wait 2>&1 | tee -a "$LOG_FILE"

# Check the exit code of the python script (PIPESTATUS[0]), not tee
if [ "${PIPESTATUS[0]}" -eq 0 ]; then
    echo "✓ $test PASSED"

Applied to all 20 batch scripts.

Monitoring File Handles

Check current file handle usage:

# Total rivendell-related handles
lsof | grep "rivendell" | wc -l

# Handles on test volume
lsof | grep "/Volumes/Media5TB/rivendell_imgs" | wc -l

# Detailed view
lsof -p $(pgrep OrbStack | head -1) | grep "/Volumes/Media5TB/rivendell_imgs" | head -30

When to Run Cleanup

Automatic Cleanup (Built-in)

All batch scripts now automatically handle cleanup:

Pre-batch cleanup: Every batch script checks and cleans up file handles BEFORE starting tests (threshold: >100 handles)
Between-test cleanup: File collection scripts (1b, 1c, 1d) also clean up between tests (threshold: >1,000 handles)

This means you usually don't need to manually run cleanup before batch tests anymore! The scripts will handle it automatically.

Manual Cleanup (Optional)

You may still want to manually run cleanup in these scenarios:

# From project root
./scripts/cleanup-orbstack-handles.sh

When to use:

Before running individual tests (non-batch) if you've accumulated handles
To force cleanup even if below threshold (use --force flag)
When debugging file handle issues
Between very large test batches if you want to ensure a clean slate

Alternative Solutions Considered

❌ Increase macOS system limits

Requires SIP disabling (security risk)
Temporary - doesn't address root cause
May hit other system constraints

❌ Use Docker volumes instead of host mounts

Would avoid this issue entirely
BUT: Forensic images (10-30GB each) must stay on external drive
No practical way to move images into Docker volumes

❌ Restart containers only (not OrbStack)

Doesn't work - handles are held by OrbStack itself
Tested: restarting containers leaves handles open

✅ Restart OrbStack between tests

Most reliable solution
Guarantees handle release
~20 second overhead per cleanup
Acceptable tradeoff for batch testing

Integration: Retry Logic + OrbStack Cleanup

The retry logic implemented in utils/file_limits.py is still valuable:

Handles transient FD exhaustion during test execution
Provides exponential backoff and garbage collection
Adds resilience to individual tests
However: Does NOT solve the OrbStack handle accumulation issue

Both solutions work together:

Retry logic: Handles in-test FD issues (container-level)
OrbStack restart: Cleans up between-test accumulation (host-level)

Summary

Three-Layer File Descriptor Strategy

Prevention (Docker limits)
- Increased Docker container limits to 65,536
- Prevents most FD exhaustion errors
Recovery (Retry logic)
- Automatic retry with exponential backoff
- Garbage collection on retry
- Graceful handling of transient FD issues
Cleanup (OrbStack management)
- Automatic pre-batch cleanup (threshold: 100 handles)
- Automatic between-test cleanup for file collection (threshold: 1,000 handles)
- Manual cleanup script for edge cases

Best Practices

Always close files explicitly - Don't rely on garbage collection
Use context managers - with open() ensures cleanup
Process in batches - Don't open 10,000 files at once
Monitor usage - Log FD usage at operation boundaries
Use safe_open - For code that must be bulletproof
Trust automatic cleanup - Batch scripts handle cleanup automatically

Key Takeaways

Memory processing is deferred - Never processed during disk Collection/Processing phases
Image type detection is critical - Always check index [2] for image type
File descriptors have three solutions - Prevention, recovery, and cleanup
Batch tests are automatic - No manual cleanup required before running
OrbStack needs special handling - Host-level handles require OrbStack restart
Progress tracking is multi-image aware - Dynamic scaling based on actual image count
Integration is complete - Retry logic applied to file collection and IOC extraction

Last Updated: 2026-01-16 Contributors: Investigation and implementation by Claude Code (Sonnet 4.5)

FilesExpand file tree

FUNCTIONALITY.md

Latest commit

History

FUNCTIONALITY.md

File metadata and controls

Rivendell Forensic Platform - Key Functionality

Integration Summary

✅ File Descriptor Retry Logic Integration

Files Modified

1. File Collection (rivendell/collect/files/compare.py)

2. IOC Extraction (rivendell/analysis/iocs.py)

How It Works

Before Integration:

After Integration:

Retry Configuration

Benefits

Example Log Output

Successful Retry:

Exhausted Retries (rare):

Three-Layer Defense

Architecture

Summary

Progress Tracking

Overview

Phase Transitions

Collection Sub-Steps (Windows)

Collection Sub-Steps (Linux)

Collection Sub-Steps (macOS)

Shared (Linux/macOS)

Processing & Analysis

Generic Fallback

Multi-Image Scaling

How it works

Example: 3 images

Example: 2 images

Example: 1 image

Implementation

DEBUG Messages

Deferred Memory Processing

Overview

Key Concept

Image Format

Processing Phases

Phase 1: Identification (in elrond.py)

Phase 2: Collection (collect.py)

Phase 3: Processing (select.py)

Phase 3b: Deferred Memory Processing (elrond.py)

Phase 4: Analysis

Memory Profile Storage

Critical Bug: Image Type Detection (FIXED)

The Bug

Affected Files (FIXED)

RecursionError Fix: hiberfil.sys Processing

The Problem

Root Cause

The Fix

Why This is Correct

Workflow Diagram

File Locations

File Descriptor Management

Problem

Solution: Automatic Retry Logic

Usage

Option 1: Decorator (Recommended for Functions)

Option 2: Safe Open (Drop-in Replacement)

Option 3: Monitor File Descriptor Usage

Configuration

Decorator Options

Benefits

Before (Hard Failure)

After (Automatic Recovery)

Docker Configuration

Testing

OrbStack File Handle Accumulation

Problem

Impact

Root Cause

Evidence

Which Tests Are Affected

1. File Collection (`rivendell/collect/files/compare.py`)

2. IOC Extraction (`rivendell/analysis/iocs.py`)

RecursionError Fix: `hiberfil.sys` Processing