Hi Team PeronaLive,
Following up on my previous tests, I have completely upgraded my setup to match your recommended architecture.
My Current Setup:
Hardware: NVIDIA L40 (Ada Lovelace architecture, similar to RTX 4090).
Software: Standard installation from your repo + TensorRT acceleration enabled.
Network: Direct HTTP connection via GPU IP + Port (I am NOT using any tunneling service like ngrok to rule out network bottlenecks).
Despite this optimized setup, I am facing two major issues:
- **Latency is still extremely high (8-12 seconds).** Since I am using an L40 with TensorRT and a direct connection, inference should be near-instant. However, the delay remains consistently around 8 to 12 seconds. Could you explain exactly how you achieve the low latency shown in your demos? Is there a specific buffer setting or streaming configuration in the code that needs to be adjusted for cloud environments?
- **Visual artifacts (glasses appear on movement).** When I nod or turn my head, the model suddenly generates spurious glasses on the character's face, even though the reference image does not contain any. This happens consistently during head movements.
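For what it's worth, here is the instrumentation I could use to narrow down where the 8-12 seconds is going. This is only a generic sketch (the stage names are hypothetical, not from your codebase): the idea is to time each pipeline stage separately, so that if per-frame inference is fast but end-to-end delay is large, the time must be accumulating in queues, buffers, or transport rather than in the model itself.

```python
import time

def timed_stage(label, fn, *args, **kwargs):
    """Run one pipeline stage and print its wall-clock duration.

    Wrapping each stage (decode, preprocess, inference, encode, send)
    in this helper makes it easy to see which one dominates the
    end-to-end delay. The stage callables here are placeholders for
    whatever functions the actual pipeline uses.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{label}: {elapsed_ms:.1f} ms")
    return result

# Example usage with a dummy stage standing in for model inference:
frame = timed_stage("inference", lambda x: x, "dummy_frame")
```

If "inference" reports only tens of milliseconds per frame while the user-visible delay is 8-12 s, the gap almost certainly comes from buffering (e.g. a deep frame queue or a player-side jitter buffer) rather than from the GPU.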
Could you please provide insights on why the latency is so high despite the correct hardware/TensorRT setup, and what causes these specific artifacts?
Thanks for your help.