
LLM Script

LLM Script is Probe's programmable orchestration engine for running complex, multi-step code analysis tasks. Instead of relying on unpredictable multi-turn AI conversations, LLM Script lets you (or the AI) write short, deterministic programs that orchestrate search, extraction, and LLM calls in a sandboxed environment.

Think of it as stored procedures for code intelligence — predictable, reproducible, and capable of processing entire codebases in a single execution.

Why LLM Script?

Traditional AI agent workflows have a fundamental problem: each step is a separate LLM call that can drift, hallucinate, or lose context. When you ask "find all API endpoints and classify them by auth method," a typical agent might:

  1. Search once, get partial results
  2. Lose track of what it already found
  3. Produce inconsistent classifications across calls
  4. Take dozens of expensive LLM round-trips

LLM Script solves this by letting the AI write a complete program upfront that:

  • Searches systematically across the entire codebase
  • Processes results in parallel with controlled concurrency
  • Uses LLM calls only where needed (classification, summarization)
  • Accumulates structured data in a persistent store
  • Computes statistics with pure JavaScript — no LLM needed
  • Returns formatted, predictable results

How It Works

LLM Script programs look like simple JavaScript but run in a secure sandbox with special capabilities:

// Find all API endpoints and count by HTTP method
const results = search("API endpoint route handler")
const chunks = chunk(results)

const classified = map(chunks, c => LLM(
  "Extract endpoints as JSON: [{method, path}]. ONLY JSON.", c
))

var endpoints = []
for (const batch of classified) {
  const parsed = parseJSON(batch)
  if (parsed) { for (const ep of parsed) { endpoints.push(ep) } }
}

const byMethod = groupBy(endpoints, "method")
var table = "| Method | Count |\n|--------|-------|\n"
for (const method of Object.keys(byMethod)) {
  table = table + "| " + method + " | " + byMethod[method].length + " |\n"
}

return table

The execution pipeline:

  1. Validate — AST-level whitelist ensures only safe constructs are used (no eval, require, import, class, new, etc.)
  2. Transform — Automatically injects await before async tool calls and adds loop guards to prevent infinite loops
  3. Execute — Runs in a SandboxJS environment with a configurable timeout (default 2 minutes)
  4. Self-heal — If execution fails, the AI automatically gets the error and fixes the script (up to 2 retries)
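The transform step can be pictured with a small mocked sketch. Here search is a stand-in async function, not the real Probe tool, and the code shows the effective result of the transform rather than the engine's literal output:

```javascript
// Mocked async "tool" standing in for the sandbox's search().
// (Assumption: illustrative only; the real tool queries Probe's search engine.)
async function search(q) { return "results for: " + q }

// What the user writes:   const r = search("auth")
// What effectively runs after the transform injects await:
async function transformed() {
  const r = await search("auth")   // await injected automatically
  return r
}
```

Nothing in the user's source needs async or await; the transformer makes the call site awaited behind the scenes.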

Two Ways to Use LLM Script

1. Through Prompting (AI-Generated Scripts)

The most common way — you describe what you want in natural language, and the AI writes the script for you:

You: "Find all API endpoints in this codebase, classify each by HTTP method,
      and produce a markdown table with counts per method."

The AI generates and executes a script like:

// Discover repo structure first
const files = listFiles("**/*.{js,ts,py,go,rs}")
const sample = search("API endpoint route handler")

// Let LLM determine the best search strategy
// Build the file list without .join() (a SandboxJS limitation)
var fileList = ""
for (const f of files) { fileList = fileList + f + "\n" }

const strategy = LLM(
  "Based on this codebase structure, what search queries would find ALL API endpoints? Return as JSON array of strings.",
  fileList + "\nSample results:\n" + sample
)

const queries = parseJSON(strategy) || []
var allResults = ""
for (const q of queries) {
  allResults = allResults + "\n" + search(q)
}

// Process in chunks with LLM classification
const chunks = chunk(allResults)
const classified = map(chunks, (c) => LLM(
  "Extract API endpoints as JSON: [{method, path, handler, file}]. ONLY JSON.", c
))

var endpoints = []
for (const batch of classified) {
  const parsed = parseJSON(batch)
  if (parsed) { for (const ep of parsed) { endpoints.push(ep) } }
}

// Pure JS statistics — no LLM needed
endpoints = unique(endpoints)
const byMethod = groupBy(endpoints, "method")
var table = "| Method | Count | Example |\n|--------|-------|---------|\n"
for (const method of Object.keys(byMethod)) {
  const examples = byMethod[method]
  table = table + "| " + method + " | " + examples.length + " | " + examples[0].path + " |\n"
}

return table + "\nTotal: " + endpoints.length + " endpoints"

2. User-Provided Scripts

You can also write scripts directly — useful for repeatable analysis tasks, CI pipelines, or when you want precise control over the execution:

// Audit: find all TODO/FIXME comments with their context
const todos = search("TODO OR FIXME")
const chunks = chunk(todos)
const items = map(chunks, (c) => LLM(
  "Extract TODO/FIXME items as JSON: [{text, file, priority, category}]. " +
  "Priority: high/medium/low. Category: bug/feature/refactor/debt. ONLY JSON.", c
))

var all = []
for (const batch of items) {
  const parsed = parseJSON(batch)
  if (parsed) { for (const item of parsed) { all.push(item) } }
}

const byPriority = groupBy(all, "priority")
var report = "# TODO Audit Report\n\n"
for (const priority of ["high", "medium", "low"]) {
  const group = byPriority[priority] || []
  report = report + "## " + priority.toUpperCase() + " (" + group.length + ")\n\n"
  for (const item of group) {
    report = report + "- **" + item.file + "**: " + item.text + " [" + item.category + "]\n"
  }
  report = report + "\n"
}

return report

Available Functions

Search & Extraction (async, auto-awaited)

| Function | Description | Returns |
|----------|-------------|---------|
| search(query) | Semantic code search with Elasticsearch-like syntax. Returns up to 20K tokens by default. | string — code snippets with file paths |
| search(query, path, {maxTokens}) | Search with a custom token limit. Use {maxTokens: null} for unlimited results. | string — code snippets |
| searchAll(query) | Exhaustive search — auto-paginates to retrieve ALL matching results. Use for bulk analysis when you need complete coverage. | string — all matching code snippets |
| query(pattern) | AST-based structural code search (tree-sitter) | string — matching code elements |
| extract(targets) | Extract code by file path + line number | string — extracted code content |
| listFiles(pattern) | List files matching a glob pattern | array — file path strings |
| bash(command) | Execute a shell command | string — command output |

search vs searchAll:

  • search(query) — Returns first 20K tokens. Fast, good for targeted queries.
  • search(query, ".", {maxTokens: null}) — Returns all results in one call (may be large).
  • searchAll(query) — Auto-paginates, concatenating all pages. Best for comprehensive analysis.

Session-Based Pagination:

Each execute_plan invocation gets its own isolated session ID. This means:

  • Multiple search() calls with the same query return successive pages (automatic pagination)
  • Different execute_plan calls don't interfere with each other's pagination state

Manual Pagination Loop:

// Each search() call with same query returns the next page
let allResults = ""
let page = search("authentication")

while (page && !page.includes("All results retrieved")) {
  allResults = allResults + "\n" + page
  page = search("authentication")  // Same query = next page automatically
}

return allResults

When to use manual loops vs searchAll():

| Use Case | Approach |
|----------|----------|
| Get everything, simple case | searchAll(query) |
| Stop early when found | Manual loop with break |
| Process each page with LLM | Manual loop with LLM() per page |
| Different logic per page | Manual loop |
| Memory-sensitive (large repos) | Manual loop, process & discard |

AI (async, auto-awaited)

| Function | Description | Returns |
|----------|-------------|---------|
| LLM(instruction, data, options?) | Make a focused LLM call to process/classify/summarize data. When options.schema is provided, the JSON response is parsed automatically. | string, or object when a schema is provided |
| map(array, fn) | Process items in parallel with concurrency control (default 3) | array — results |
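The concurrency limiting that map() performs can be sketched in plain JavaScript. This is an illustrative reimplementation of the semantics, not the sandbox builtin:

```javascript
// Illustrative sketch (not the sandbox builtin): run fn over items with at
// most `limit` calls in flight, preserving input order in the results.
async function mapLimited(items, fn, limit) {
  const results = new Array(items.length)
  let next = 0
  async function worker() {
    while (next < items.length) {
      const i = next          // claim an index synchronously (single-threaded JS)
      next = next + 1
      results[i] = await fn(items[i])
    }
  }
  const workers = []
  const n = Math.min(limit, items.length)
  for (let w = 0; w < n; w++) { workers.push(worker()) }
  await Promise.all(workers)
  return results
}
```

With a limit of 3, at most three LLM calls are in flight at once, and results come back in input order regardless of which call finishes first.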

LLM with Schema Example:

// Without schema - returns string, need manual parsing
const raw = LLM("Extract functions as JSON", data)
const functions = parseJSON(raw)

// With schema - returns parsed object directly
const result = LLM("Extract functions", data, {
  schema: '{"functions": [{"name": "string", "file": "string", "lines": "number"}]}'
})
// result.functions[0].name works directly - no parseJSON needed!

Data Utilities (sync)

| Function | Description | Returns |
|----------|-------------|---------|
| chunk(data, tokens?) | Split a large string into token-sized chunks (default 20,000 tokens) | array of strings |
| chunkByKey(data, keyFn, tokens?) | Split data ensuring same-key items stay together. Keys are extracted from File: headers. | array of strings |
| extractPaths(searchResults) | Extract unique file paths from search results (parses File: headers) | array of strings |
| batch(array, size) | Split an array into sub-arrays of the given size | array of arrays |
| groupBy(array, key) | Group array items by a key or function | object |
| unique(array) | Deduplicate array items | array |
| flatten(array) | Flatten one level of nesting | array |
| range(start, end) | Generate an array of integers [start, end) | array |
| parseJSON(text) | Parse JSON from LLM output (strips markdown fences). Returns null on parse failure. | any or null |
| log(message) | Log a message for debugging | void |
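The semantics of a few of these utilities can be sketched in plain JavaScript (illustrative reimplementations, not the sandbox builtins):

```javascript
// Split an array into sub-arrays of the given size; the last batch may be short.
function batch(arr, size) {
  const out = []
  for (let i = 0; i < arr.length; i += size) { out.push(arr.slice(i, i + size)) }
  return out
}

// Group items by a property name or by a key-extracting function.
function groupBy(arr, key) {
  const out = {}
  for (const item of arr) {
    const k = typeof key === "function" ? key(item) : item[key]
    if (!out[k]) { out[k] = [] }
    out[k].push(item)
  }
  return out
}

// Parse JSON from LLM output, tolerating a surrounding markdown code fence;
// returns null instead of throwing on malformed input.
function parseJSON(text) {
  const FENCE = String.fromCharCode(96).repeat(3)  // three backticks
  let t = String(text).trim()
  if (t.startsWith(FENCE)) {
    t = t.slice(t.indexOf("\n") + 1)               // drop opening fence line
    const end = t.lastIndexOf(FENCE)
    if (end !== -1) { t = t.slice(0, end) }        // drop closing fence
  }
  try { return JSON.parse(t) } catch (e) { return null }
}

const byMethod = groupBy([{ method: "GET" }, { method: "POST" }, { method: "GET" }], "method")
// byMethod.GET has 2 entries, byMethod.POST has 1
```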

chunkByKey Example:

// Search returns snippets with "File: path/to/file" headers
const results = search("error handling", "./src")

// Group by module - ensures all code from same module stays in same chunk
const chunks = chunkByKey(results, file => {
  const match = file.match(/src\/([^/]+)/)  // Extract first directory under src/
  return match ? match[1] : 'other'
})

// Each chunk contains ALL code for one or more modules (never split mid-module)
// Useful when analyzing code that spans multiple files within a module

extractPaths Example:

// Get full file content instead of snippets
const results = search("authentication")
const chunks = chunkByKey(results, file => file)  // key by full file path

const fullContent = map(chunks, c => {
  const paths = extractPaths(c)      // e.g. ['src/auth.js', 'src/login.js']
  var targets = ""
  for (const p of paths) { targets = targets + p + " " }  // avoid .join() (SandboxJS limitation)
  return extract(targets)            // Get full file content
})

Direct Output (sync)

| Function | Description | Returns |
|----------|-------------|---------|
| output(content) | Write content directly to the user's response, bypassing LLM rewriting. Use for large tables, JSON, or CSV that should be delivered verbatim. | void |

When you use output(), the content is appended directly to the final response after the AI's summary — the AI never sees or rewrites it. This preserves data fidelity for large structured outputs like tables with 50+ rows.

Session Store (sync, persists across executions)

The session store allows data to persist across multiple script executions within the same conversation. This enables multi-phase workflows where one script collects data and a later script processes it.

| Function | Description | Returns |
|----------|-------------|---------|
| storeSet(key, value) | Store a value | void |
| storeGet(key) | Retrieve a value (returns undefined if missing) | any |
| storeAppend(key, item) | Append to an array (auto-creates if the key doesn't exist) | void |
| storeKeys() | List all stored keys | array of strings |
| storeGetAll() | Return the entire store as a plain object | object |
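The store's semantics can be sketched as a plain in-memory object (illustrative, not Probe's implementation; the real store persists across executions within a conversation):

```javascript
// Minimal in-memory sketch of the session store semantics.
const __store = {}
function storeSet(key, value) { __store[key] = value }
function storeGet(key) { return __store[key] }   // undefined if missing
function storeAppend(key, item) {
  if (!Array.isArray(__store[key])) { __store[key] = [] }  // auto-create
  __store[key].push(item)
}
function storeKeys() { return Object.keys(__store) }
function storeGetAll() { return Object.assign({}, __store) }

storeAppend("todos", { text: "fix login", priority: "high" })  // auto-creates the array
storeSet("phase", "collect")
```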

Patterns

Pattern 1: Discovery-First (Recommended)

Start by exploring the repository structure, then let the LLM determine the optimal search strategy:

// Phase 1: Discover
const files = listFiles("**/*.{js,ts,py}")
const sample = search("authentication")

// Phase 2: Let LLM plan the strategy
// Build the file list without .join() (a SandboxJS limitation)
var fileList = ""
for (const f of files) { fileList = fileList + f + "\n" }

const strategy = LLM(
  "Based on this repo structure, what are the best search queries to find ALL authentication code? Return as JSON array.",
  fileList + "\nSample:\n" + sample
)

// Phase 3: Execute with discovered strategy
const queries = parseJSON(strategy) || []
var allCode = ""
for (const q of queries) {
  allCode = allCode + "\n" + search(q)
}

// Phase 4: Analyze
const analysis = LLM("Provide a comprehensive analysis of the authentication system.", allCode)
return analysis

Pattern 2: Data Pipeline with Session Store

Process large datasets in phases — extract, accumulate, compute, format:

// Phase 1: Collect and classify
const results = search("API endpoints")
const chunks = chunk(results)
const extracted = map(chunks, (c) => LLM(
  "Extract endpoints as JSON array: [{method, path, handler}]. ONLY JSON.", c
))
for (const batch of extracted) {
  const parsed = parseJSON(batch)
  if (parsed) { for (const item of parsed) { storeAppend("endpoints", item) } }
}

// Phase 2: Pure JS statistics (no LLM needed!)
const all = storeGet("endpoints")
const byMethod = groupBy(all, "method")
var table = "| Method | Count |\n|--------|-------|\n"
for (const method of Object.keys(byMethod)) {
  table = table + "| " + method + " | " + byMethod[method].length + " |\n"
}

// Phase 3: Small LLM summary
const summary = LLM("Write a brief summary of this API surface.", table)
return table + "\n" + summary

Pattern 3: Batch Processing with Parallel Execution

Process many items efficiently using map() for controlled concurrency:

const files = listFiles("src/**/*.ts")
const batches = batch(files, 5)

var allIssues = []
for (const b of batches) {
  const results = map(b, (file) => {
    const code = extract(file)
    return LLM("Find potential bugs. Return JSON: [{file, line, issue, severity}]. ONLY JSON.", code)
  })
  for (const r of results) {
    const parsed = parseJSON(r)
    if (parsed) { for (const issue of parsed) { allIssues.push(issue) } }
  }
}

const bySeverity = groupBy(allIssues, "severity")
var report = "# Bug Report\n\n"
for (const sev of ["high", "medium", "low"]) {
  const items = bySeverity[sev] || []
  report = report + "## " + sev.toUpperCase() + " (" + items.length + ")\n"
  for (const item of items) {
    report = report + "- " + item.file + ":" + item.line + " — " + item.issue + "\n"
  }
  report = report + "\n"
}

return report

Pattern 4: Multi-Search Synthesis

Combine results from multiple targeted searches:

const topics = ["authentication", "authorization", "session management", "CSRF", "XSS"]
var allFindings = ""

for (const topic of topics) {
  try {
    const results = search(topic + " security")
    allFindings = allFindings + "\n## " + topic + "\n" + results
  } catch (e) {
    log("Search failed for: " + topic)
  }
}

const chunks = chunk(allFindings)
const analyses = map(chunks, (c) => LLM(
  "Analyze this code for security issues. Be specific about file and line.", c
))

var report = "# Security Audit\n\n"
for (const a of analyses) {
  report = report + a + "\n\n"
}

return report

Pattern 5: Iterative Deepening

Start broad, then drill into the most interesting results:

// Broad search
const overview = search("database connection pool")
const summary = LLM(
  "Which files are most important for understanding connection pooling? Return as JSON array of file paths.",
  overview
)

// Deep dive
const importantFiles = parseJSON(summary) || []
var details = ""
for (const file of importantFiles) {
  try {
    details = details + "\n\n" + extract(file)
  } catch (e) { log("Could not extract: " + file) }
}

// Final analysis
const analysis = LLM(
  "Provide a detailed analysis of the connection pooling implementation. Include architecture decisions and potential improvements.",
  details
)
return analysis

Pattern 6: Grouped Analysis with chunkByKey

When analyzing code organized by module, package, or namespace, use chunkByKey() to ensure all files from the same group stay together. This gives the LLM complete context for each logical unit:

// Search for error handling across the codebase
const results = search("error handling try catch", "./src")

// Group by top-level module - each chunk has complete module context
const chunks = chunkByKey(results, file => {
  const match = file.match(/src\/([^/]+)/)
  return match ? match[1] : 'other'
})

// Analyze each module's error handling patterns
const analyses = map(chunks, c => LLM(
  "Analyze error handling patterns in this module. Identify: module name, patterns used, inconsistencies, and recommendations.",
  c,
  { schema: '{"modules": [{"name": "string", "patterns": "string", "issues": "string", "recommendation": "string"}]}' }
))

// Collect all module analyses
var all = []
for (const batch of analyses) {
  if (batch.modules) { for (const m of batch.modules) { all.push(m) } }
}

// Output as markdown table
output("| Module | Patterns | Issues | Recommendation |\n")
output("|--------|----------|--------|----------------|\n")
for (const m of all) {
  output("| " + m.name + " | " + m.patterns + " | " + m.issues + " | " + m.recommendation + " |\n")
}

return "Analyzed error handling in " + all.length + " modules"

Other use cases for chunkByKey():

  • Group test files by the module they test
  • Analyze dependencies per package
  • Review code changes by author or PR
  • Process documentation by section
  • Extract insights from markdown notes grouped by project or client
  • Analyze log files grouped by service or date

Pattern 7: Direct Output for Large Data

When your script produces large structured data (tables, JSON, CSV), use output() to deliver it directly to the user without the AI rewriting or summarizing it:

const results = search("function export public")
const chunks = chunk(results)

const classified = map(chunks, (c) => LLM(
  "Extract exported functions as JSON: [{name, file, description}]. ONLY JSON.", c
))

var functions = []
for (const batch of classified) {
  const parsed = parseJSON(batch)
  if (parsed) { for (const item of parsed) { functions.push(item) } }
}

// Build a markdown table
var table = "| Function | File | Description |\n|----------|------|-------------|\n"
for (const f of functions) {
  table = table + "| " + (f.name || "Unknown") + " | " + (f.file || "Unknown") + " | " + (f.description || "-") + " |\n"
}

// output() sends the full table directly to the user — no summarization
output(table)

// return value is what the AI sees — keep it short
return "Generated table with " + functions.length + " exported functions"

The AI will respond with something like "Here's the API surface analysis..." and the full table will be appended verbatim below its response.

Pattern 8: Exhaustive Search with Pagination

When you need ALL matching results from a large codebase, you have two options:

Option A: Use searchAll() for simple cases

// Automatically paginates and concatenates all results
const allAuth = searchAll("authentication OR authorization")
const chunks = chunk(allAuth)
// Process all results...

Option B: Manual pagination loop for custom logic

// Each execute_plan gets an isolated session, so pagination works correctly
const query = "JWT OR OAuth OR HMAC OR authentication"
var allResults = ""
var pageNum = 0

// Each search() call with the same query returns the next page
var page = search(query)
while (page && !page.includes("All results retrieved")) {
  pageNum = pageNum + 1
  log("Processing page " + pageNum)

  // Process this page immediately (memory efficient)
  const insights = LLM(
    "Extract authentication patterns as JSON",
    page,
    { schema: '{"patterns": [{"type": "string", "file": "string"}]}' }
  )

  // Accumulate just the extracted data, not raw results
  for (const p of insights.patterns || []) {
    storeAppend("auth_patterns", p)
  }

  // Get next page (same query = automatic pagination)
  page = search(query)
}

// Generate final report from accumulated data
const patterns = storeGet("auth_patterns") || []
const byType = groupBy(patterns, "type")

output("# Authentication Patterns Found\n\n")
for (const type of Object.keys(byType)) {
  output("## " + type + " (" + byType[type].length + " occurrences)\n")
  for (const p of byType[type]) {
    output("- " + p.file + "\n")
  }
  output("\n")
}

return "Found " + patterns.length + " authentication patterns across " + pageNum + " pages"

Why manual pagination?

  • Process each page with LLM before fetching next (lower memory usage)
  • Stop early if you find what you need
  • Different processing logic per page
  • Track progress with log()

Session isolation guarantee: Each execute_plan invocation gets a unique session ID, so:

  • Multiple scripts running in parallel don't interfere
  • Previous execute_plan calls don't affect pagination state
  • You always start from page 1 within each script

Writing Rules

LLM Script uses a safe subset of JavaScript. Keep these rules in mind:

Do:

  • Use var for variables (or const/let)
  • Use for...of loops for iteration
  • Use plain objects and arrays
  • Check for errors with if (result.indexOf("ERROR:") === 0) — tool functions never throw, they return "ERROR: ..." strings
  • Use string concatenation with + (not template literals with ${})
  • Use parseJSON() instead of JSON.parse() when parsing LLM output (handles markdown fences)
  • Use output() for large structured data that should reach the user verbatim

Don't:

  • Use async/await (auto-injected by the transformer)
  • Use class, new, this
  • Use eval, require, import
  • Use process, globalThis, __proto__
  • Define your own named helper functions that call tools (the transformer can't inject await inside user-defined function bodies; inline callbacks passed to map(), as in the examples above, are the exception)
  • Use regex literals (/pattern/) — use indexOf(), includes(), startsWith() instead
  • Use .matchAll() or .join() (SandboxJS limitations — use for...of loops instead)
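Putting the Do rules together, the ERROR-string check looks like this in practice (the result string here is a hypothetical failure value, shown in place of a live search() call):

```javascript
// Tool functions return "ERROR: ..." strings instead of throwing.
// This result is a hypothetical example of a failed call.
const result = "ERROR: path not found: ./nonexistent"

var message = ""
if (result.indexOf("ERROR:") === 0) {
  message = "search failed, falling back: " + result
} else {
  message = "got results: " + result
}
```

Because tools never throw, try/catch is only needed around your own logic; tool failures are handled with this string check.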

Safety Model

LLM Script runs in a multi-layer security sandbox:

  1. AST Validation — Before execution, the script's Abstract Syntax Tree is checked against a whitelist. Only safe constructs are allowed. No eval, require, import, class, new, this, __proto__, constructor, or prototype access.

  2. SandboxJS Isolation — Scripts execute in SandboxJS, a JavaScript sandbox that prevents access to Node.js globals, the filesystem, and the network. Only the explicitly provided tool functions are available.

  3. Loop Guards — Automatic loop iteration limits (default 5,000) prevent infinite loops. The transformer injects a __checkLoop() call into every loop body.

  4. Execution Timeout — A configurable timeout (default 2 minutes) kills scripts that take too long.

  5. Self-Healing — If a script fails, the error is sent to the LLM which generates a fixed version. Up to 2 retries are attempted before returning an error.
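A minimal sketch of what an injected loop guard amounts to (illustrative; not the transformer's literal output):

```javascript
// Each loop body gets a guard call injected; exceeding the iteration
// limit aborts the script deterministically instead of hanging.
const LOOP_LIMIT = 5000
let iterations = 0
function __checkLoop() {
  iterations = iterations + 1
  if (iterations > LOOP_LIMIT) {
    throw new Error("Loop limit exceeded after " + LOOP_LIMIT + " iterations")
  }
}

// A runaway user loop after transformation:
let guardTripped = false
try {
  while (true) {
    __checkLoop()   // injected by the transformer
  }
} catch (e) {
  guardTripped = true
}
```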

Enabling LLM Script

When enabled, Probe Agent gets access to the execute_plan tool. The AI generates LLM Script code and calls execute_plan to run it. The script has automatic access to all built-in tools (search, query, extract, LLM, etc.) plus any MCP tools you've connected.

ProbeAgent SDK

import { ProbeAgent } from '@probelabs/probe';

const agent = new ProbeAgent({
  path: '/path/to/your/codebase',
  provider: 'anthropic',
  enableExecutePlan: true  // Enable LLM Script
});

// The agent will now use LLM Script for complex analysis tasks
const report = await agent.answer(
  'Find all API endpoints and classify them by HTTP method'
);

CLI

probe agent "Find all API endpoints" \
  --path /path/to/project \
  --provider google \
  --enable-execute-plan

When Does LLM Script Trigger?

The AI automatically chooses LLM Script (over simple search) for questions that require:

  • Comprehensive coverage: "Find all error handling patterns"
  • Complete inventories: "Give me a complete inventory of API routes"
  • Multi-topic analysis: "Compare authentication, authorization, and session handling"
  • Batch processing: "Classify every TODO comment by priority"
  • Quantitative answers: "How many functions in each module?"

For simple, focused questions like "How does the login function work?", the AI uses direct search instead.

Real-World Examples

Codebase Health Report

You: "Generate a comprehensive health report for this codebase —
      code complexity, test coverage gaps, dependency analysis,
      and security concerns."

API Documentation Generator

You: "Find every API endpoint, extract its parameters, authentication
      requirements, and response types, then generate OpenAPI-style
      documentation as markdown."

Migration Planning

You: "We're migrating from Express to Fastify. Find all Express-specific
      patterns (middleware, route handlers, error handlers) and produce
      a migration checklist with effort estimates."

Dependency Impact Analysis

You: "We need to upgrade the 'auth' library. Find every file that imports
      from it, classify each usage pattern, and identify which ones will
      break with the new API."

Related Resources