Hi Team PeronaLive,
Following up on my previous tests, I have completely upgraded my setup to match your recommended architecture.
My Current Setup:
Hardware: NVIDIA L40 (Ada Lovelace architecture, similar to RTX 4090).
Software: Standard installation from your repo + TensorRT acceleration enabled.
Network: Direct HTTP connection via GPU IP + Port (I am NOT using any tunneling service like ngrok to rule out network bottlenecks).
Despite this optimized setup, I am facing two major issues:
- **Latency is still extremely high (8-12 seconds).** Since I am using an L40 with TensorRT and a direct connection, inference should be near-instant. However, the delay remains consistently around 8 to 12 seconds. Could you explain exactly how you achieve the low latency shown in your demos? Is there a specific buffer setting or streaming configuration in the code that needs to be adjusted for cloud environments?
- **Visual artifacts (glasses appear on movement).** When I nod or turn my head, the model suddenly generates spurious glasses on the character's face, even though the reference image does not contain any. This happens consistently during head movements.
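For what it's worth, here is the instrumentation I could use to narrow down where the 8-12 seconds is going. This is only a generic sketch (the stage names are hypothetical, not from your codebase): the idea is to time each pipeline stage separately, so that if per-frame inference is fast but end-to-end delay is large, the time must be accumulating in queues, buffers, or transport rather than in the model itself.

```python
import time

def timed_stage(label, fn, *args, **kwargs):
    """Run one pipeline stage and print its wall-clock duration.

    Wrapping each stage (decode, preprocess, inference, encode, send)
    in this helper makes it easy to see which one dominates the
    end-to-end delay. The stage callables here are placeholders for
    whatever functions the actual pipeline uses.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{label}: {elapsed_ms:.1f} ms")
    return result

# Example usage with a dummy stage standing in for model inference:
frame = timed_stage("inference", lambda x: x, "dummy_frame")
```

If "inference" reports only tens of milliseconds per frame while the user-visible delay is 8-12 s, the gap almost certainly comes from buffering (e.g. a deep frame queue or a player-side jitter buffer) rather than from the GPU.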
Could you please provide insights on why the latency is so high despite the correct hardware/TensorRT setup, and what causes these specific artifacts?
Thanks for your help.