Abstract: The creation of high-fidelity digital versions of human heads is an important stepping stone in the process of further integrating virtual components into our everyday lives. Constructing such avatars is a challenging research problem, due to a high demand for photo-realism and real-time rendering performance. In this work, we propose Neural Parametric Gaussian Avatars (NPGA), a data-driven approach to create high-fidelity, controllable avatars from multi-view video recordings. We build our method around 3D Gaussian splatting for its highly efficient rendering and for the topological flexibility it inherits from point clouds. In contrast to previous work, we condition our avatars' dynamics on the rich expression space of neural parametric head models (NPHM), instead of mesh-based 3DMMs. To this end, we distill the backward deformation field of our underlying NPHM into forward deformations which are compatible with rasterization-based rendering. All remaining fine-scale, expression-dependent details are learned from the multi-view videos. To increase the representational capacity of our avatars, we augment the canonical Gaussian point cloud with per-primitive latent features which govern its dynamic behavior. To regularize this increased dynamic expressivity, we propose Laplacian terms on the latent features and predicted dynamics. We evaluate our method on the public NeRSemble dataset, demonstrating that NPGA significantly outperforms the previous state-of-the-art avatars on the self-reenactment task by ~2.6 dB PSNR. Furthermore, we demonstrate accurate animation capabilities from real-world monocular videos.
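As a rough illustration of what a Laplacian term on per-primitive quantities can look like, the sketch below penalizes the difference between each primitive's value (a latent feature vector or a predicted offset) and the mean of its nearest neighbours in canonical space. This is an assumption about the general form of such a regularizer, not the formulation used in NPGA: the k-nearest-neighbour graph, the choice of `k`, and the function names `knn_indices` and `laplacian_penalty` are all hypothetical.

```python
import numpy as np


def knn_indices(points, k):
    """Return, for each point, the indices of its k nearest neighbours (excluding itself)."""
    # Dense pairwise squared distances; only suitable for a small illustrative point cloud.
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff * diff, axis=-1)
    np.fill_diagonal(dist2, np.inf)  # a point is never its own neighbour
    return np.argsort(dist2, axis=1)[:, :k]


def laplacian_penalty(values, neighbours):
    """Mean squared distance between each value and the mean of its neighbours' values."""
    neighbour_mean = values[neighbours].mean(axis=1)  # shape (N, D)
    return float(np.mean(np.sum((values - neighbour_mean) ** 2, axis=-1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    canonical_points = rng.normal(size=(200, 3))  # stand-in for canonical Gaussian centres
    latent_features = rng.normal(size=(200, 8))   # stand-in for per-primitive latent features
    neighbours = knn_indices(canonical_points, k=5)
    print("penalty on random features:", laplacian_penalty(latent_features, neighbours))
```

A term of this kind is zero when neighbouring primitives carry identical values and grows as they diverge, which is the sense in which it discourages spatially incoherent features or dynamics.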
Lay summary (by Llama 3 70B): Imagine you're playing a video game or chatting with someone online, and you want to see a virtual version of yourself that looks super realistic and can make all the same facial expressions as you. Creating digital heads that look and act like real people is a big challenge, but it's an important step in making virtual reality feel more real.
We've come up with a new way to create these digital heads, called Neural Parametric Gaussian Avatars (NPGA). It's like taking a bunch of videos of someone's face from different angles and then using that information to create a virtual version of their head that can move and express emotions just like they do.
Our method is special because it uses a technique called "Gaussian splatting", which can draw the virtual head very quickly and can represent fine, irregular shapes. It also uses "neural parametric head models", which are learned models of head shape and expression, to drive the head's movements so that its expressions look natural.
To make the virtual head even more realistic, we added some extra details that are learned from the videos. We also added some special rules to make sure the head's movements look realistic and not too crazy.
We tested our method on a public dataset called NeRSemble, where it reproduced people's recorded expressions more accurately than the best previous methods (about 2.6 dB better on PSNR, a standard image-quality score). We can also take an ordinary single-camera video of someone's face and use it to animate the virtual head, making it look like the person is really talking and moving!
Overall, our new method is a big step forward in creating realistic digital heads that can be used in all sorts of cool applications, like video games, virtual reality, and even video conferencing.