Skip to content

Comments

Strip XML tags, runtime model metadata, run detail UI#90

Open
penso wants to merge 8 commits intomainfrom
xml-stripping
Open

Strip XML tags, runtime model metadata, run detail UI#90
penso wants to merge 8 commits intomainfrom
xml-stripping

Conversation

@penso
Copy link
Collaborator

@penso penso commented Feb 11, 2026

Summary

  • XML tag stripping: New response_sanitizer module strips internal XML tags (thinking, reasoning, scratchpad, etc.), pipe tokens (<|eot_id|>, <|im_end|>, etc.), and reasoning pattern blocks from LLM responses. Also recovers structured tool calls from XML blocks as a fallback. Integrated in both streaming and non-streaming paths in the agent runner and gateway chat handler.
  • Runtime model metadata: ModelMetadata struct and model_metadata() trait method on LlmProvider with a default implementation. OpenAI provider overrides it to fetch context window from /models/{model} API with OnceCell caching and static fallback on error. Auto-compaction now uses runtime metadata for accurate context window sizing.
  • Run detail UI: Backend read_by_run_id store method, sessions.run_detail RPC endpoint, and a Preact RunDetail component with Overview/Actions/Messages tabs. Expandable button appears on assistant messages that have a run_id. Tool results now propagate run_id for proper grouping.

Validation

Completed

  • just format-check passes
  • cargo clippy -p moltis-agents -p moltis-sessions -p moltis-gateway -- -D warnings clean
  • cargo test — all tests pass (including 20 new sanitizer tests, 5 new metadata tests, 2 new store tests)
  • biome check --write applied to JS files

Remaining

  • ./scripts/local-validate.sh <PR> — to be run after PR creation
  • E2E tests: cd crates/gateway/ui && npx playwright test
  • Manual QA: send a message and verify clean response, check logs for metadata fetch, expand run detail in chat

Manual QA

Pending — will be performed after local validation completes.

Add response_sanitizer module that strips internal reasoning tags
(thinking, reflection, scratchpad, etc.), special control tokens
(eot_id, im_end, etc.), and recovers structured tool calls from
XML blocks in LLM output. Integrated at both the agent runner
level and the gateway streaming path.
Add ModelMetadata struct and model_metadata() trait method to
LlmProvider with a default implementation that returns the static
context_window() value. Override in OpenAiProvider to fetch context
length from the /models API endpoint with OnceCell caching. Use
runtime metadata for auto-compaction threshold in the gateway.
Add sessions.run_detail RPC method that returns messages for a
specific run_id, plus a RunDetail Preact component with Overview,
Actions, and Messages tabs. The component is mounted on assistant
messages that have a run_id. Tool results now carry run_id for
linking. Includes backend tests and E2E specs.
Apply rustfmt, biome, and clippy fixes across the three feature commits.
Add changelog entries for XML tag stripping, runtime model metadata, and
run detail UI features.
@codspeed-hq
Copy link
Contributor

codspeed-hq bot commented Feb 11, 2026

Merging this PR will improve performance by ×2.6

⚡ 1 improved benchmark
✅ 33 untouched benchmarks
⏩ 1 skipped benchmark1

Performance Changes

Benchmark BASE HEAD Efficiency
session_store_list[10] 39 µs 15.1 µs ×2.6

Comparing xml-stripping (d4d15bc) with main (45cbf7c)

Open in CodSpeed

Footnotes

  1. 1 benchmark was skipped, so the baseline result was used instead. If it was deleted from the codebase, click here and archive it to remove it from the performance reports.

@codecov
Copy link

codecov bot commented Feb 11, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant