
Support multiple image response schemas in async_llm.py for better provider compatibility #7

@XuShengXianggg


Hi maintainers,

While running this repository with various image_model providers (including proxy vendors), I found that base/engine/async_llm.py assumes a single response schema. In practice, different LLM/VLM providers return images in varying formats, which leads to parsing errors or forces provider-specific hacks in downstream code.

The current implementation fails when it encounters:

  • Direct URLs: {"url": "https://..."}
  • Base64 Strings: {"b64_json": "..."}
  • Data URIs: {"image": "data:image/png;base64,..."}
  • Nested/Custom Structures: Some SDKs wrap outputs in output.images or return raw binary blobs like {"output": {"images": [...]}}.

I propose adding a normalization layer to base/engine/async_llm.py that detects the incoming format automatically and unifies these diverse responses into a consistent internal representation (e.g., always converting to a standard PIL Image or a canonical Base64 string).
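A minimal sketch of what such a layer could look like, covering the four cases listed above. The function name `normalize_image_response` and the tagged-tuple return type `("url" | "b64", value)` are illustrative choices, not existing code in this repository; the field names mirror the schemas from the list:

```python
import base64


def normalize_image_response(payload):
    """Normalize diverse provider image payloads to ("url" | "b64", value).

    Hypothetical helper for the proposed layer; the actual engine might
    instead decode to a PIL Image at this point.
    """
    if isinstance(payload, (bytes, bytearray)):
        # Raw binary blob -> encode to base64 directly.
        return ("b64", base64.b64encode(bytes(payload)).decode("ascii"))
    if isinstance(payload, str):
        if payload.startswith("data:image"):
            # Data URI: strip the "data:image/...;base64," prefix.
            return ("b64", payload.split(",", 1)[1])
        if payload.startswith(("http://", "https://")):
            return ("url", payload)
        return ("b64", payload)  # assume a bare base64 string
    if isinstance(payload, dict):
        if "b64_json" in payload:                       # {"b64_json": "..."}
            return ("b64", payload["b64_json"])
        if "url" in payload:                            # {"url": "https://..."}
            return ("url", payload["url"])
        if "image" in payload:                          # {"image": "data:..."}
            return normalize_image_response(payload["image"])
        images = payload.get("output", {}).get("images")
        if images:                                      # nested SDK wrapper
            return normalize_image_response(images[0])
    raise ValueError(f"Unrecognized image response schema: {payload!r}")
```

Downstream code would then branch on the tag once (fetching URLs, decoding base64) instead of special-casing each provider.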

This enhancement would significantly improve the flexibility of the engine and make it easier to integrate with different inference backends.

Thanks!
