Issues with High Latency (8-12s) & Artifacts on L40 (TensorRT) - Direct Connection #49

@bastionloui-lgtm

Hi Team PeronaLive,

Following up on my previous tests, I have completely upgraded my setup to match your recommended architecture.

My Current Setup:

Hardware: NVIDIA L40 (Ada Lovelace architecture, similar to RTX 4090).

Software: Standard installation from your repo + TensorRT acceleration enabled.

Network: Direct HTTP connection to the GPU server's IP and port. To rule out network bottlenecks, I am NOT using any tunneling service such as ngrok.

Despite this optimized setup, I am facing two major issues:

  1. Latency is still extremely high (8-12 seconds)

     Since I am using an L40 with TensorRT and a direct connection, the inference should be near-instant. However, the delay remains consistently around 8 to 12 seconds. Could you explain exactly how you achieve the low latency shown in your demos? Is there a specific buffer setting or streaming configuration in the code that needs to be adjusted for cloud environments?

  2. Visual Artifacts (Glasses appear on movement)

     When I nod or turn my head, the model suddenly generates weird glasses on the character's face, even though the reference image does not have any. This happens consistently during head movements.

Could you please provide insights on why the latency is so high despite the correct hardware/TensorRT setup, and what causes these specific artifacts?
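
For the latency question, a minimal sketch of how the delay could be broken down per stage, in case it helps isolate the bottleneck. The stage names (`capture`, `inference`, `encode_send`) and the `time.sleep` placeholders are hypothetical and not taken from the repository; the real calls would go in their place. The `backlog_delay_seconds` helper only illustrates the arithmetic of a frame buffer (queued frames divided by frame rate), and the frame rate used in the demo is an assumed value, not a measured one.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import mean


class StageTimer:
    """Collects wall-clock durations for named pipeline stages."""

    def __init__(self):
        self.samples = defaultdict(list)

    @contextmanager
    def stage(self, name):
        # Record the elapsed time even if the wrapped code raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples[name].append(time.perf_counter() - start)

    def summary(self):
        # Per-stage sample count, mean duration, and worst-case duration.
        return {
            name: {"count": len(v), "mean_s": mean(v), "max_s": max(v)}
            for name, v in self.samples.items()
        }


def backlog_delay_seconds(queued_frames, fps):
    """Delay contributed by frames waiting in a buffer: queued_frames / fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return queued_frames / fps


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(3):
        # Hypothetical stages; replace each sleep with the real call.
        with timer.stage("capture"):
            time.sleep(0.001)
        with timer.stage("inference"):
            time.sleep(0.005)
        with timer.stage("encode_send"):
            time.sleep(0.001)

    for name, stats in timer.summary().items():
        print(f"{name}: mean={stats['mean_s']:.4f}s max={stats['max_s']:.4f}s")

    # Illustrative numbers only: 250 queued frames at an assumed 25 fps.
    print("backlog delay:", backlog_delay_seconds(250, 25), "s")
```

If the per-stage times turn out to be small (milliseconds) while the end-to-end delay stays at 8 to 12 seconds, the delay is more likely coming from frames queuing in a buffer than from inference itself. As a purely illustrative figure, 250 queued frames at an assumed 25 fps would account for 10 seconds on their own.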

Thanks for your help.
