Description
Hi, thanks for PersonaLive! I’ve now got the server running reliably on my Windows machine (RTX 5080, PyTorch nightly 2.10.0+cu130 / CUDA 13.1). I switched back to the non-accelerated PyTorch path, which keeps the server stable, but I’m still hitting a few blocking problems in the real-time experience:
Latency never below 3.5–5 seconds
Even after trimming queues (prefer_latest=True, a rolling window, queue size capped at 32), live latency sits around 3.5–5 s. I’m trying to get it under 1 s so the system feels live. The GPU is barely stressed (< 40% utilization), so I suspect either the model pipeline is still waiting on full windows or parts of the queue stack aren’t being flushed aggressively enough.
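For reference, the frame-dropping I’m doing is essentially the sketch below. The class and names are from my own wrapper around the webcam feed, not part of PersonaLive’s API:

```python
import queue


class LatestFrameQueue:
    """Bounded queue that drops the oldest frame when full,
    so the model always pulls the freshest webcam input."""

    def __init__(self, maxsize=32):
        self.q = queue.Queue(maxsize=maxsize)

    def put(self, frame):
        while True:
            try:
                self.q.put_nowait(frame)
                return
            except queue.Full:
                # Flush on overflow: drop the oldest frame to make room.
                try:
                    self.q.get_nowait()
                except queue.Empty:
                    pass

    def get_latest(self):
        """Block for one frame, then drain to the newest available."""
        frame = self.q.get()
        while True:
            try:
                frame = self.q.get_nowait()
            except queue.Empty:
                return frame
```

Even with this in place the pipeline still sits at 3.5–5 s, which is why I suspect the bottleneck is inside the model windowing rather than my queues.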
Generated output “jitters” or turns crispy when I move my head
The synthesized face flickers heavily if I make rapid head turns or quick facial motions. It looks like the renderer can’t keep up with head pose changes and oscillates between keyframes. (I’m running 15 FPS from the webcam, model runs ~5 FPS.)
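One thing I have not tried yet, but suspect might mask the oscillation, is exponentially smoothing the driving pose/keypoints before they reach the motion encoder. This is purely my own sketch under the assumption that the pose can be intercepted as a flat array; PersonaLive does not expose a `PoseSmoother` as far as I can tell:

```python
import numpy as np


class PoseSmoother:
    """Exponential moving average over driving pose parameters.
    Lower alpha = smoother but laggier; higher alpha = more responsive."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.state = None

    def update(self, pose):
        pose = np.asarray(pose, dtype=np.float32)
        if self.state is None:
            self.state = pose  # first frame passes through unchanged
        else:
            self.state = self.alpha * pose + (1.0 - self.alpha) * self.state
        return self.state
```

If there is already a temporal-window or scheduler parameter that does this properly inside the model, that would obviously be preferable to smoothing the input.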
Glasses are always added, even if target avatar has none
I wear glasses; whenever I drive a portrait that doesn’t have glasses, the output re-draws thick glasses on top of the avatar even though the reference image is clearly without frames. I can’t find a switch to disable that behaviour.
Mouth corners always droop
Regardless of how much I smile, grin, or laugh, the generated character keeps its mouth corners glued far downward. It’s especially noticeable when driving a cheerful portrait—it looks sad even when I’m smiling.
Environment
Windows 11
GPU: RTX 5080 (Blackwell)
CPU: AMD Ryzen 9 9950X 16-Core Processor (4.30 GHz)
RAM: 64 GB
PyTorch nightly (2.10.0+cu130 / CUDA 13.1)
PersonaLive config: personalive_online.yaml (batch_size 1, fp16)
Running python inference_online.py --acceleration none
What I’ve tried
Limited the webcam input queue to 32 frames and flush on overflow.
Enabled prefer_latest=True and implemented a rolling window so we only wait for one new frame before pushing to the model.
Tested lowering both codec quality and input FPS (camera down to 10 FPS).
Disabled TensorRT/XFormers temporarily, because that path still crashes on FP16 copy mismatch (see earlier issue).
None of these reduce latency below ~3.5 s or stop the flicker/glasses/mouth distortions.
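To make sure I’m measuring true end-to-end latency rather than just queue depth, I timestamp each frame at capture and measure at display. Roughly this (my own instrumentation, not part of the repo):

```python
import time
from collections import deque


class LatencyMeter:
    """Tracks wall-clock capture->display latency over a rolling window."""

    def __init__(self, window=100):
        self.samples = deque(maxlen=window)

    def stamp(self, frame):
        # Attach a capture timestamp; frames travel as (ts, data) tuples.
        return (time.perf_counter(), frame)

    def measure(self, stamped_frame):
        # Called right before the frame is shown; records the delta.
        ts, frame = stamped_frame
        self.samples.append(time.perf_counter() - ts)
        return frame

    @property
    def avg_ms(self):
        return 1000.0 * sum(self.samples) / max(len(self.samples), 1)
```

Measured this way, the 3.5–5 s figure above is the capture-to-display average, not a single outlier.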
Questions / Requested help
Is there an officially recommended way to keep latency under 1 s on RTX 50-series hardware with the PyTorch pipeline (no TensorRT yet)?
Can we disable the “auto glasses” augmentation or make it optional when the reference portrait doesn’t have glasses?
Are there parameters (scheduler, motion encoder, temporal window) we should tune to reduce flicker when the driver moves quickly?
Can the expression model be nudged so the mouth follows my actual smile rather than drooping?
Once we have these pieces sorted, I’ll happily retest the TensorRT path again; for now I just need a stable live experience with low latency and accurate expressions.
Thanks for any guidance!