feat: vision/image support via stream-json + CLI isolation#14

Open
rjabalosiii wants to merge 1 commit into atalovesyou:main from rjabalosiii:feat/vision-multimodal-support

Conversation


Summary

Adds multimodal (text + image) support for OpenAI-compatible clients that send base64 screenshots, such as Browser Use. Also fixes several compatibility issues discovered during real-world testing.

What changed

  • Vision/image support: Auto-detects image_url content parts in OpenAI requests and switches to --input-format stream-json mode, piping NDJSON with base64 image blocks via stdin. Text-only requests still use the fast CLI argument path (zero overhead when no images).
  • Multimodal content arrays: Handles OpenAI's content field as both string and Array<{type, text?, image_url?}> — required for any client sending screenshots or structured content.
  • Code fence stripping: Claude wraps JSON responses in markdown code fences (```json … ```). Added `stripCodeFences()` to extract clean JSON for clients expecting raw JSON (e.g., Browser Use's structured output).
  • CLI isolation: Prevents Claude Code's agentic system prompt, tools, skills, and settings files from leaking into proxy responses. Flags: --tools "", --disable-slash-commands, --setting-sources "", custom --system-prompt, cwd: /tmp.
  • Body limit bump: 10mb → 50mb to accommodate base64-encoded screenshots.
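
The fence-stripping helper described above can be sketched as follows. The name `stripCodeFences` comes from the PR; the regex-based implementation here is an illustrative assumption, not the PR's actual code:

```typescript
// Sketch: if the whole response is a single fenced block
// (```json ... ```), return the inner content; otherwise return
// the text unchanged. The PR's real helper may handle more cases.
function stripCodeFences(text: string): string {
  const match = text.trim().match(/^```[\w-]*\r?\n([\s\S]*?)\r?\n?```$/);
  return match ? match[1] : text;
}

// '```json\n{"ok":true}\n```'  →  '{"ok":true}'
// 'plain text'                 →  'plain text' (unchanged)
```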

How it works

Client sends image_url (base64 data URI)
  → Proxy detects hasImages=true
  → Converts to Claude CLI stream-json format:
    {"type":"user","message":{"role":"user","content":[
      {"type":"text","text":"..."},
      {"type":"image","source":{"type":"base64","media_type":"image/png","data":"..."}}
    ]}}
  → Pipes via stdin with --input-format stream-json
  → Claude CLI processes multimodal input
  → Response streamed back as SSE chunks
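
The conversion step in the flow above can be sketched like this. The type and function names are illustrative (the proxy's real types live in `src/types/openai.ts` and may differ); only the output shape matches the stream-json line shown above:

```typescript
// Turn one OpenAI-style content array (text / image_url parts) into
// the stream-json line piped to the Claude CLI via stdin.
type OpenAIPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

function toStreamJson(parts: OpenAIPart[]): string {
  const content = parts.map((p) => {
    if (p.type === "text") return { type: "text", text: p.text };
    // Split "data:image/png;base64,AAAA..." into media type and payload
    const m = p.image_url.url.match(/^data:(image\/\w+);base64,(.+)$/s);
    if (!m) throw new Error("expected a base64 image data URI");
    return {
      type: "image",
      source: { type: "base64", media_type: m[1], data: m[2] },
    };
  });
  return JSON.stringify({ type: "user", message: { role: "user", content } });
}
```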

Backward compatibility

  • No breaking changes — text-only requests follow the same code path as before
  • Stream-json mode only activates when images are detected
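
The gating described above amounts to a single detection check, sketched here with a hypothetical message shape (the proxy's actual types may differ):

```typescript
// Only switch to --input-format stream-json when some message
// carries an image_url content part; plain-string content (the
// text-only path) never triggers it.
type Message = { role: string; content: string | Array<{ type: string }> };

function hasImages(messages: Message[]): boolean {
  return messages.some(
    (m) =>
      Array.isArray(m.content) &&
      m.content.some((part) => part.type === "image_url")
  );
}
```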

Test plan

  • E2E tested with Browser Use v0.11.9 (vision-enabled browser automation)
  • Stress tested with 3 task types: data extraction, multi-page navigation, structured output
  • Verified text-only requests still work via CLI argument path
  • Verified base64 PNG screenshots pass through to Claude and get vision responses

🤖 Generated with Claude Code

Adds multimodal (text + image) support for OpenAI-compatible clients
like Browser Use that send base64 screenshots via the chat completions
endpoint.

Changes:
- Support OpenAI content arrays with text and image_url parts
- Auto-detect images and switch to --input-format stream-json mode
  (text-only requests still use the fast CLI argument path)
- Convert data URI images to Claude CLI base64 format via stdin piping
- Strip code fences from model responses (Claude wraps JSON in fences)
- Isolate CLI subprocess: --tools "", --disable-slash-commands,
  --setting-sources "", --system-prompt override, cwd /tmp
- Bump body limit to 50mb for base64 image payloads

Tested with Browser Use v0.11.9 running vision-enabled browser
automation tasks (screenshots piped as base64 PNG).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bwiedmann added a commit to bwiedmann/claude-max-api-proxy that referenced this pull request Feb 15, 2026
…am-json + CLI isolation

# Conflicts:
#	src/adapter/cli-to-openai.ts
#	src/adapter/openai-to-cli.ts
#	src/server/routes.ts
#	src/subprocess/manager.ts
#	src/types/openai.ts
