Recent advances in neural rendering have improved both training and rendering times by orders of magnitude. While these methods demonstrate state-of-the-art quality and speed, they are designed for photogrammetry of static scenes and do not generalize well to freely moving humans in the environment. In this work, we introduce Human Gaussian Splats (HUGS), which represents an animatable human together with the scene using 3D Gaussian Splatting (3DGS). Our method takes only a monocular video with a small number (50-100) of frames, and it automatically learns to disentangle the static scene and a fully animatable human avatar within 30 minutes. We utilize the SMPL body model to initialize the human Gaussians. To capture details that are not modeled by SMPL (e.g., clothing, hair), we allow the 3D Gaussians to deviate from the human body model. Utilizing 3D Gaussians for animated humans brings new challenges, including the artifacts created when articulating the Gaussians. We propose to jointly optimize the linear blend skinning weights to coordinate the movements of individual Gaussians during animation. Our approach enables novel-pose synthesis of humans and novel-view synthesis of both the human and the scene. We achieve state-of-the-art rendering quality with a rendering speed of 60 FPS while being ~100x faster to train than previous work.
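For context, linear blend skinning (LBS) deforms a point by a weighted combination of per-joint rigid transformations; a minimal sketch of how this could be applied to the Gaussian centers (notation ours, not necessarily the paper's) is

$$
\mu_i' \;=\; \sum_{k=1}^{K} w_{ik}\, \mathbf{T}_k(\theta)\, \tilde{\mu}_i ,
$$

where $\tilde{\mu}_i$ denotes the homogeneous coordinates of the $i$-th Gaussian center in the canonical pose, $\mathbf{T}_k(\theta)$ is the rigid transformation of joint $k$ under body pose $\theta$, and $w_{ik}$ are the skinning weights that, per the abstract, are jointly optimized rather than fixed to the SMPL defaults.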