LLM Script is Probe's programmable orchestration engine for running complex, multi-step code analysis tasks. Instead of relying on unpredictable multi-turn AI conversations, LLM Script lets you (or the AI) write short, deterministic programs that orchestrate search, extraction, and LLM calls in a sandboxed environment.
Think of it as stored procedures for code intelligence — predictable, reproducible, and capable of processing entire codebases in a single execution.
Traditional AI agent workflows have a fundamental problem: each step is a separate LLM call that can drift, hallucinate, or lose context. When you ask "find all API endpoints and classify them by auth method," a typical agent might:
- Search once, get partial results
- Lose track of what it already found
- Produce inconsistent classifications across calls
- Take dozens of expensive LLM round-trips
LLM Script solves this by letting the AI write a complete program upfront that:
- Searches systematically across the entire codebase
- Processes results in parallel with controlled concurrency
- Uses LLM calls only where needed (classification, summarization)
- Accumulates structured data in a persistent store
- Computes statistics with pure JavaScript — no LLM needed
- Returns formatted, predictable results
LLM Script programs look like simple JavaScript but run in a secure sandbox with special capabilities:
```javascript
// Find all API endpoints and count by HTTP method
const results = search("API endpoint route handler")
const chunks = chunk(results)
const classified = map(chunks, c => LLM(
  "Extract endpoints as JSON: [{method, path}]. ONLY JSON.", c
))
var endpoints = []
for (const batch of classified) {
  const parsed = parseJSON(batch)
  if (parsed) { for (const ep of parsed) { endpoints.push(ep) } }
}
const byMethod = groupBy(endpoints, "method")
var table = "| Method | Count |\n|--------|-------|\n"
for (const method of Object.keys(byMethod)) {
  table = table + "| " + method + " | " + byMethod[method].length + " |\n"
}
return table
```

The execution pipeline:
- **Validate** — AST-level whitelist ensures only safe constructs are used (no `eval`, `require`, `import`, `class`, `new`, etc.)
- **Transform** — Automatically injects `await` before async tool calls and adds loop guards to prevent infinite loops
- **Execute** — Runs in a SandboxJS environment with a configurable timeout (default 2 minutes)
- **Self-heal** — If execution fails, the AI automatically gets the error and fixes the script (up to 2 retries)
The most common way — you describe what you want in natural language, and the AI writes the script for you:
You: "Find all API endpoints in this codebase, classify each by HTTP method,
and produce a markdown table with counts per method."
The AI generates and executes a script like:
```javascript
// Discover repo structure first
const files = listFiles("**/*.{js,ts,py,go,rs}")
const sample = search("API endpoint route handler")

// Let the LLM determine the best search strategy
const strategy = LLM(
  "Based on this codebase structure, what search queries would find ALL API endpoints? Return as JSON array of strings.",
  files.join("\n") + "\n\nSample results:\n" + sample
)
const queries = JSON.parse(String(strategy))
var allResults = ""
for (const q of queries) {
  allResults = allResults + "\n" + search(q)
}

// Process in chunks with LLM classification
const chunks = chunk(allResults)
const classified = map(chunks, (c) => LLM(
  "Extract API endpoints as JSON: [{method, path, handler, file}]. ONLY JSON.", c
))
var endpoints = []
for (const batch of classified) {
  const parsed = parseJSON(batch)
  if (parsed) { for (const ep of parsed) { endpoints.push(ep) } }
}

// Pure JS statistics — no LLM needed
endpoints = unique(endpoints)
const byMethod = groupBy(endpoints, "method")
var table = "| Method | Count | Example |\n|--------|-------|---------|\n"
for (const method of Object.keys(byMethod)) {
  const examples = byMethod[method]
  table = table + "| " + method + " | " + examples.length + " | " + examples[0].path + " |\n"
}
return table + "\nTotal: " + endpoints.length + " endpoints"
```

You can also write scripts directly — useful for repeatable analysis tasks, CI pipelines, or when you want precise control over the execution:
```javascript
// Audit: find all TODO/FIXME comments with their context
const todos = search("TODO OR FIXME")
const chunks = chunk(todos)
const items = map(chunks, (c) => LLM(
  "Extract TODO/FIXME items as JSON: [{text, file, priority, category}]. " +
  "Priority: high/medium/low. Category: bug/feature/refactor/debt. ONLY JSON.", c
))
var all = []
for (const batch of items) {
  const parsed = parseJSON(batch)
  if (parsed) { for (const item of parsed) { all.push(item) } }
}
const byPriority = groupBy(all, "priority")
var report = "# TODO Audit Report\n\n"
for (const priority of ["high", "medium", "low"]) {
  const group = byPriority[priority] || []
  report = report + "## " + priority.toUpperCase() + " (" + group.length + ")\n\n"
  for (const item of group) {
    report = report + "- **" + item.file + "**: " + item.text + " [" + item.category + "]\n"
  }
  report = report + "\n"
}
return report
```

| Function | Description | Returns |
|---|---|---|
| `search(query)` | Semantic code search with Elasticsearch-like syntax. Returns up to 20K tokens by default. | `string` — code snippets with file paths |
| `search(query, path, {maxTokens})` | Search with a custom token limit. Use `{maxTokens: null}` for unlimited results. | `string` — code snippets |
| `searchAll(query)` | Exhaustive search — auto-paginates to retrieve ALL matching results. Use for bulk analysis when you need complete coverage. | `string` — all matching code snippets |
| `query(pattern)` | AST-based structural code search (tree-sitter) | `string` — matching code elements |
| `extract(targets)` | Extract code by file path + line number | `array` — file path strings becomes `string` — extracted code content |
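A quick sketch combining a few of these tools. The globs, file paths, and the `file:line` form passed to `extract()` are illustrative assumptions based on the table descriptions, not verbatim API guarantees:

```javascript
// Hypothetical discovery pass over a TypeScript project
const tsFiles = listFiles("src/**/*.ts")
log("Found " + tsFiles.length + " TypeScript files")

// Pull a specific region by file path + line (format assumed from the table above)
const snippet = extract("src/server.ts:10")

// Shell out for repository metadata
const recent = bash("git log --oneline -5")
return snippet + "\n\nRecent commits:\n" + recent
```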
search vs searchAll:

- `search(query)` — Returns the first 20K tokens. Fast, good for targeted queries.
- `search(query, ".", {maxTokens: null})` — Returns all results in one call (may be large).
- `searchAll(query)` — Auto-paginates, concatenating all pages. Best for comprehensive analysis.
Session-Based Pagination:
Each `execute_plan` invocation gets its own isolated session ID. This means:

- Multiple `search()` calls with the same query return successive pages (automatic pagination)
- Different `execute_plan` calls don't interfere with each other's pagination state
Manual Pagination Loop:
```javascript
// Each search() call with the same query returns the next page
let allResults = ""
let page = search("authentication")
while (page && !page.includes("All results retrieved")) {
  allResults = allResults + "\n" + page
  page = search("authentication") // Same query = next page automatically
}
return allResults
```

When to use manual loops vs `searchAll()`:
| Use Case | Approach |
|---|---|
| Get everything, simple case | searchAll(query) |
| Stop early when found | Manual loop with break |
| Process each page with LLM | Manual loop with LLM() per page |
| Different logic per page | Manual loop |
| Memory-sensitive (large repos) | Manual loop, process & discard |
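For instance, the "stop early" case from the table — a sketch that breaks out of the pagination loop as soon as a marker string appears (the query and the `createPool` marker are hypothetical):

```javascript
// Early exit: stop paginating once we see what we came for
var page = search("database connection")
var found = ""
while (page && !page.includes("All results retrieved")) {
  if (page.includes("createPool")) { // hypothetical marker
    found = page
    break
  }
  page = search("database connection") // next page
}
return found ? "Found it:\n" + found : "Not found in any page"
```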
| Function | Description | Returns |
|---|---|---|
| `LLM(instruction, data, options?)` | Make a focused LLM call to process/classify/summarize data. When `options.schema` is provided, the JSON response is parsed automatically. | `string`, or `object` if a schema is provided |
| `map(array, fn)` | Process items in parallel with concurrency control (default 3) | `array` — results |
LLM with Schema Example:
```javascript
// Without schema — returns a string; parse it manually
const raw = LLM("Extract functions as JSON", data)
const functions = parseJSON(raw)

// With schema — returns a parsed object directly
const result = LLM("Extract functions", data, {
  schema: '{"functions": [{"name": "string", "file": "string", "lines": "number"}]}'
})
// result.functions[0].name works directly — no parseJSON needed!
```

| Function | Description | Returns |
|---|---|---|
| `chunk(data, tokens?)` | Split a large string into token-sized chunks (default 20,000 tokens) | `array` of strings |
| `chunkByKey(data, keyFn, tokens?)` | Split data ensuring same-key items stay together. Keys are extracted from `File:` headers. | `array` of strings |
| `extractPaths(searchResults)` | Extract unique file paths from search results (parses `File:` headers) | `array` of strings |
| `batch(array, size)` | Split an array into sub-arrays of the given size | `array` of arrays |
| `groupBy(array, key)` | Group array items by a key or function | `object` |
| `unique(array)` | Deduplicate array items | `array` |
| `flatten(array)` | Flatten one level of nesting | `array` |
| `range(start, end)` | Generate an array of integers `[start, end)` | `array` |
| `parseJSON(text)` | Parse JSON from LLM output (strips markdown fences). Returns `null` on parse failure. | `any` or `null` |
| `log(message)` | Log a message for debugging | `void` |
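The pure-JS helpers compose without any LLM round-trips — a small sketch of their behavior (the inline result comments follow the descriptions in the table above):

```javascript
// Pure-JS data shaping — deterministic, no LLM calls
const nums = range(0, 10)                     // [0, 1, ..., 9]
const groups = batch(nums, 4)                 // [[0,1,2,3], [4,5,6,7], [8,9]]
const flat = flatten(groups)                  // back to a single array of 10
const deduped = unique(flatten([nums, nums])) // still 10 items
const byParity = groupBy(nums, n => n % 2 === 0 ? "even" : "odd")
return "even: " + byParity.even.length + ", odd: " + byParity.odd.length
```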
chunkByKey Example:
```javascript
// Search returns snippets with "File: path/to/file" headers
const results = search("error handling", { path: "./src" })

// Group by module — ensures all code from the same module stays in the same chunk
const chunks = chunkByKey(results, file => {
  const match = file.match(/src\/([^/]+)/) // Extract first directory under src/
  return match ? match[1] : 'other'
})
// Each chunk contains ALL code for one or more modules (never split mid-module)
// Useful when analyzing code that spans multiple files within a module
```

extractPaths Example:
```javascript
// Get full file content instead of snippets
const results = search("authentication")
const chunks = chunkByKey(results, keyFn)
const fullContent = map(chunks, chunk => {
  const paths = extractPaths(chunk) // ['src/auth.js', 'src/login.js']
  return extract(paths.join(' ')) // Get full file content
})
```

| Function | Description | Returns |
|---|---|---|
| `output(content)` | Write content directly to the user's response, bypassing LLM rewriting. Use for large tables, JSON, or CSV that should be delivered verbatim. | `void` |
When you use `output()`, the content is appended directly to the final response after the AI's summary — the AI never sees or rewrites it. This preserves data fidelity for large structured outputs like tables with 50+ rows.
The session store allows data to persist across multiple script executions within the same conversation. This enables multi-phase workflows where one script collects data and a later script processes it.
| Function | Description | Returns |
|---|---|---|
| `storeSet(key, value)` | Store a value | `void` |
| `storeGet(key)` | Retrieve a value (returns `undefined` if missing) | `any` |
| `storeAppend(key, item)` | Append to an array (auto-creates the array if the key doesn't exist) | `void` |
| `storeKeys()` | List all stored keys | `array` of strings |
| `storeGetAll()` | Return the entire store as a plain object | `object` |
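A minimal sketch of the store API across a two-phase workflow — the first snippet would run in one `execute_plan` call, the second in a later call within the same conversation (the keys and values here are illustrative):

```javascript
// Phase 1 script: collect findings into the session store
storeAppend("findings", { file: "src/auth.js", note: "uses JWT" }) // hypothetical item
storeSet("phase", "collected")
return "Stored " + storeGet("findings").length + " findings"
```

```javascript
// Phase 2 script, executed later in the same conversation
const findings = storeGet("findings") || []
log("Stored keys: " + storeKeys())
return "Processing " + findings.length + " findings (phase: " + storeGet("phase") + ")"
```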
Start by exploring the repository structure, then let the LLM determine the optimal search strategy:
```javascript
// Phase 1: Discover
const files = listFiles("**/*.{js,ts,py}")
const sample = search("authentication")

// Phase 2: Let the LLM plan the strategy
const strategy = LLM(
  "Based on this repo structure, what are the best search queries to find ALL authentication code? Return as JSON array.",
  files.join("\n") + "\n\nSample:\n" + sample
)

// Phase 3: Execute with the discovered strategy
const queries = JSON.parse(String(strategy))
var allCode = ""
for (const q of queries) {
  allCode = allCode + "\n" + search(q)
}

// Phase 4: Analyze
const analysis = LLM("Provide a comprehensive analysis of the authentication system.", allCode)
return analysis
```

Process large datasets in phases — extract, accumulate, compute, format:
```javascript
// Phase 1: Collect and classify
const results = search("API endpoints")
const chunks = chunk(results)
const extracted = map(chunks, (c) => LLM(
  "Extract endpoints as JSON array: [{method, path, handler}]. ONLY JSON.", c
))
for (const batch of extracted) {
  const parsed = parseJSON(batch)
  if (parsed) { for (const item of parsed) { storeAppend("endpoints", item) } }
}

// Phase 2: Pure JS statistics (no LLM needed!)
const all = storeGet("endpoints")
const byMethod = groupBy(all, "method")
var table = "| Method | Count |\n|--------|-------|\n"
for (const method of Object.keys(byMethod)) {
  table = table + "| " + method + " | " + byMethod[method].length + " |\n"
}

// Phase 3: Small LLM summary
const summary = LLM("Write a brief summary of this API surface.", table)
return table + "\n" + summary
```

Process many items efficiently using `map()` for controlled concurrency:
```javascript
const files = listFiles("src/**/*.ts")
const batches = batch(files, 5)
var allIssues = []
for (const b of batches) {
  const results = map(b, (file) => {
    const code = extract(file)
    return LLM("Find potential bugs. Return JSON: [{file, line, issue, severity}]. ONLY JSON.", code)
  })
  for (const r of results) {
    const parsed = parseJSON(r)
    if (parsed) { for (const issue of parsed) { allIssues.push(issue) } }
  }
}
const bySeverity = groupBy(allIssues, "severity")
var report = "# Bug Report\n\n"
for (const sev of ["high", "medium", "low"]) {
  const items = bySeverity[sev] || []
  report = report + "## " + sev.toUpperCase() + " (" + items.length + ")\n"
  for (const item of items) {
    report = report + "- " + item.file + ":" + item.line + " — " + item.issue + "\n"
  }
  report = report + "\n"
}
return report
```

Combine results from multiple targeted searches:
```javascript
const topics = ["authentication", "authorization", "session management", "CSRF", "XSS"]
var allFindings = ""
for (const topic of topics) {
  try {
    const results = search(topic + " security")
    allFindings = allFindings + "\n## " + topic + "\n" + results
  } catch (e) {
    log("Search failed for: " + topic)
  }
}
const chunks = chunk(allFindings)
const analyses = map(chunks, (c) => LLM(
  "Analyze this code for security issues. Be specific about file and line.", c
))
var report = "# Security Audit\n\n"
for (const a of analyses) {
  report = report + a + "\n\n"
}
return report
```

Start broad, then drill into the most interesting results:
```javascript
// Broad search
const overview = search("database connection pool")
const summary = LLM(
  "Which files are most important for understanding connection pooling? Return as JSON array of file paths.",
  overview
)

// Deep dive
const importantFiles = JSON.parse(String(summary))
var details = ""
for (const file of importantFiles) {
  try {
    details = details + "\n\n" + extract(file)
  } catch (e) { log("Could not extract: " + file) }
}

// Final analysis
const analysis = LLM(
  "Provide a detailed analysis of the connection pooling implementation. Include architecture decisions and potential improvements.",
  details
)
return analysis
```

When analyzing code organized by module, package, or namespace, use `chunkByKey()` to ensure all files from the same group stay together. This gives the LLM complete context for each logical unit:
```javascript
// Search for error handling across the codebase
const results = search("error handling try catch", { path: "./src" })

// Group by top-level module — each chunk has complete module context
const chunks = chunkByKey(results, file => {
  const match = file.match(/src\/([^/]+)/)
  return match ? match[1] : 'other'
})

// Analyze each module's error handling patterns
const analyses = map(chunks, c => LLM(
  "Analyze error handling patterns in this module. Identify: module name, patterns used, inconsistencies, and recommendations.",
  c,
  { schema: '{"modules": [{"name": "string", "patterns": "string", "issues": "string", "recommendation": "string"}]}' }
))

// Collect all module analyses
var all = []
for (const batch of analyses) {
  if (batch.modules) { for (const m of batch.modules) { all.push(m) } }
}

// Output as markdown table
output("| Module | Patterns | Issues | Recommendation |\n")
output("|--------|----------|--------|----------------|\n")
for (const m of all) {
  output("| " + m.name + " | " + m.patterns + " | " + m.issues + " | " + m.recommendation + " |\n")
}
return "Analyzed error handling in " + all.length + " modules"
```

Other use cases for `chunkByKey()`:
- Group test files by the module they test
- Analyze dependencies per package
- Review code changes by author or PR
- Process documentation by section
- Extract insights from markdown notes grouped by project or client
- Analyze log files grouped by service or date
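For example, grouping test files by the module they test — a sketch that assumes a `tests/<module>.test.js` naming convention:

```javascript
// Search test code, then keep each module's tests together in one chunk
const results = search("describe it expect", { path: "./tests" })
const chunks = chunkByKey(results, file => {
  // tests/auth.test.js -> "auth" (naming convention assumed)
  return file.replace("tests/", "").replace(".test.js", "")
})
return "Created " + chunks.length + " per-module test chunks"
```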
When your script produces large structured data (tables, JSON, CSV), use output() to deliver it directly to the user without the AI rewriting or summarizing it:
```javascript
const results = search("function export public")
const chunks = chunk(results)
const classified = map(chunks, (c) => LLM(
  "Extract exported functions as JSON: [{name, file, description}]. ONLY JSON.", c
))
var functions = []
for (const batch of classified) {
  const parsed = parseJSON(batch)
  if (parsed) { for (const item of parsed) { functions.push(item) } }
}

// Build a markdown table
var table = "| Function | File | Description |\n|----------|------|-------------|\n"
for (const f of functions) {
  table = table + "| " + (f.name || "Unknown") + " | " + (f.file || "Unknown") + " | " + (f.description || "-") + " |\n"
}

// output() sends the full table directly to the user — no summarization
output(table)

// The return value is what the AI sees — keep it short
return "Generated table with " + functions.length + " exported functions"
```

The AI will respond with something like "Here's the API surface analysis..." and the full table will be appended verbatim below its response.
When you need ALL matching results from a large codebase, you have two options:
Option A: Use searchAll() for simple cases
```javascript
// Automatically paginates and concatenates all results
const allAuth = searchAll("authentication OR authorization")
const chunks = chunk(allAuth)
// Process all results...
```

Option B: Manual pagination loop for custom logic
```javascript
// Each execute_plan gets an isolated session, so pagination works correctly
const query = "JWT OR OAuth OR HMAC OR authentication"
var pageNum = 0

// Each search() call with the same query returns the next page
var page = search(query)
while (page && !page.includes("All results retrieved")) {
  pageNum = pageNum + 1
  log("Processing page " + pageNum)

  // Process this page immediately (memory efficient)
  const insights = LLM(
    "Extract authentication patterns as JSON",
    page,
    { schema: '{"patterns": [{"type": "string", "file": "string"}]}' }
  )

  // Accumulate just the extracted data, not raw results
  for (const p of insights.patterns || []) {
    storeAppend("auth_patterns", p)
  }

  // Get next page (same query = automatic pagination)
  page = search(query)
}

// Generate the final report from accumulated data
const patterns = storeGet("auth_patterns") || []
const byType = groupBy(patterns, "type")
output("# Authentication Patterns Found\n\n")
for (const type of Object.keys(byType)) {
  output("## " + type + " (" + byType[type].length + " occurrences)\n")
  for (const p of byType[type]) {
    output("- " + p.file + "\n")
  }
  output("\n")
}
return "Found " + patterns.length + " authentication patterns across " + pageNum + " pages"
```

Why manual pagination?
- Process each page with LLM before fetching next (lower memory usage)
- Stop early if you find what you need
- Different processing logic per page
- Track progress with `log()`
Session isolation guarantee: Each `execute_plan` invocation gets a unique session ID, so:

- Multiple scripts running in parallel don't interfere
- Previous `execute_plan` calls don't affect pagination state
- You always start from page 1 within each script
LLM Script uses a safe subset of JavaScript. Keep these rules in mind:
Do:

- Use `var` for variables (or `const`/`let`)
- Use `for...of` loops for iteration
- Use plain objects and arrays
- Check for errors with `if (result.indexOf("ERROR:") === 0)` — tool functions never throw; they return `"ERROR: ..."` strings
- Use string concatenation with `+` (not template literals with `${}`)
- Use `parseJSON()` instead of `JSON.parse()` when parsing LLM output (it handles markdown fences)
- Use `output()` for large structured data that should reach the user verbatim
Don't:

- Use `async`/`await` (auto-injected by the transformer)
- Use `class`, `new`, `this`
- Use `eval`, `require`, `import`
- Use `process`, `globalThis`, `__proto__`
- Define helper functions that call tools (the transformer can't inject `await` inside user-defined functions)
- Use regex literals (`/pattern/`) — use `indexOf()`, `includes()`, `startsWith()` instead
- Use `.matchAll()` or `.join()` (SandboxJS limitations — use `for...of` loops instead)
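Putting the error-handling rule into practice — a short sketch (the query text and JSON shape are illustrative):

```javascript
const results = search("payment processing")
// Tool functions never throw; failures come back as "ERROR: ..." strings
if (results.indexOf("ERROR:") === 0) {
  log("Search failed: " + results)
  return "Search unavailable — try a narrower query"
}
const parsed = parseJSON(LLM("Summarize as JSON: {summary}. ONLY JSON.", results))
return parsed ? parsed.summary : "Could not parse LLM output"
```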
LLM Script runs in a multi-layer security sandbox:
- **AST Validation** — Before execution, the script's abstract syntax tree is checked against a whitelist. Only safe constructs are allowed: no `eval`, `require`, `import`, `class`, `new`, `this`, `__proto__`, `constructor`, or `prototype` access.
- **SandboxJS Isolation** — Scripts execute in SandboxJS, a JavaScript sandbox that prevents access to Node.js globals, the filesystem, and the network. Only the explicitly provided tool functions are available.
- **Loop Guards** — Automatic loop iteration limits (default 5,000) prevent infinite loops. The transformer injects a `__checkLoop()` call into every loop body.
- **Execution Timeout** — A configurable timeout (default 2 minutes) kills scripts that take too long.
- **Self-Healing** — If a script fails, the error is sent to the LLM, which generates a fixed version. Up to 2 retries are attempted before returning an error.
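Conceptually, the transform step rewrites your script before execution — roughly like this (an illustrative sketch; the actual generated code may differ):

```javascript
// What you write:
const results = search("auth")
for (const r of chunk(results)) { log(r) }
```

```javascript
// Roughly what executes after the transform (await injected, loop guarded):
const results = await search("auth")
for (const r of chunk(results)) { __checkLoop(); log(r) }
```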
When enabled, Probe Agent gets access to the execute_plan tool. The AI generates LLM Script code and calls execute_plan to run it. The script has automatic access to all built-in tools (search, query, extract, LLM, etc.) plus any MCP tools you've connected.
```javascript
import { ProbeAgent } from '@probelabs/probe';

const agent = new ProbeAgent({
  path: '/path/to/your/codebase',
  provider: 'anthropic',
  enableExecutePlan: true // Enable LLM Script
});

// The agent will now use LLM Script for complex analysis tasks
const report = await agent.answer(
  'Find all API endpoints and classify them by HTTP method'
);
```

```bash
probe agent "Find all API endpoints" \
  --path /path/to/project \
  --provider google \
  --enable-execute-plan
```

The AI automatically chooses LLM Script (over simple search) for questions that require:
- Comprehensive coverage: "Find all error handling patterns"
- Complete inventories: "Give me a complete inventory of API routes"
- Multi-topic analysis: "Compare authentication, authorization, and session handling"
- Batch processing: "Classify every TODO comment by priority"
- Quantitative answers: "How many functions in each module?"
For simple, focused questions like "How does the login function work?", the AI uses direct search instead.
You: "Generate a comprehensive health report for this codebase —
code complexity, test coverage gaps, dependency analysis,
and security concerns."
You: "Find every API endpoint, extract its parameters, authentication
requirements, and response types, then generate OpenAPI-style
documentation as markdown."
You: "We're migrating from Express to Fastify. Find all Express-specific
patterns (middleware, route handlers, error handlers) and produce
a migration checklist with effort estimates."
You: "We need to upgrade the 'auth' library. Find every file that imports
from it, classify each usage pattern, and identify which ones will
break with the new API."
- Agent Overview — What is Probe Agent and when to use it
- Node.js SDK — Programmatic access to Probe
- API Reference — ProbeAgent class documentation
- Tools Reference — All available agent tools