Hi maintainers,
While running the repository with various `image_model` providers (including proxy vendors), I found that `base/engine/async_llm.py` assumes a single, specific response schema. In practice, different LLM/VLM providers return images in varying formats, which currently leads to parsing errors or requires provider-specific hacks in downstream code.
The current implementation fails when encountering formats such as:
- Direct URLs: `{"url": "https://..."}`
- Base64 strings: `{"b64_json": "..."}`
- Data URIs: `{"image": "data:image/png;base64,..."}`
- Nested/custom structures: some SDKs wrap outputs, e.g. `{"output": {"images": [...]}}`, or return raw binary blobs.
I propose adding a normalization layer in `base/engine/async_llm.py` that detects the incoming format automatically and unifies these diverse responses into a consistent internal representation (e.g., always converting to a standard `PIL.Image` object or a consistent Base64 format).
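To make the idea concrete, here is a minimal sketch of what such a layer could look like. The function name `normalize_image_response`, the use of `httpx` for fetching URLs, and the exact key checks are illustrative assumptions based on the formats listed above, not the engine's actual API:

```python
import base64
import io
from typing import Any

import httpx  # assumption: any HTTP client the engine already uses would work
from PIL import Image


def normalize_image_response(payload: Any) -> Image.Image:
    """Best-effort conversion of a provider image payload into a PIL Image.

    Hypothetical helper covering the formats listed in this issue:
    direct URLs, base64 strings, data URIs, nested wrappers, raw bytes.
    """
    if isinstance(payload, dict):
        # Nested wrapper: {"output": {"images": [...]}} -> recurse on first image.
        if isinstance(payload.get("output"), dict):
            images = payload["output"].get("images") or []
            if images:
                return normalize_image_response(images[0])
        # Direct URL: {"url": "https://..."} -> fetch and decode.
        if "url" in payload:
            resp = httpx.get(payload["url"], timeout=30)
            resp.raise_for_status()
            return Image.open(io.BytesIO(resp.content))
        # Raw base64: {"b64_json": "..."}
        if "b64_json" in payload:
            return Image.open(io.BytesIO(base64.b64decode(payload["b64_json"])))
        # Data URI under an "image" key -> handled by the string branch below.
        if isinstance(payload.get("image"), str):
            return normalize_image_response(payload["image"])
    if isinstance(payload, str) and payload.startswith("data:"):
        # Data URI: strip the "data:image/...;base64," prefix, decode the rest.
        _, _, encoded = payload.partition("base64,")
        return Image.open(io.BytesIO(base64.b64decode(encoded)))
    if isinstance(payload, (bytes, bytearray)):
        # Raw binary blob.
        return Image.open(io.BytesIO(bytes(payload)))
    raise ValueError(f"Unrecognized image payload format: {type(payload)!r}")
```

Downstream code would then call this once on whatever the provider returns and always receive a `PIL.Image`, keeping provider quirks out of the rest of the engine.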
This enhancement would significantly improve the flexibility of the engine and make it easier to integrate with different inference backends.
Thanks!