HSR: Holistic 3D Human-Scene Reconstruction from Monocular Videos

Lixin Xue*, Chen Guo, Chengwei Zheng, Fangjinhua Wang, Tianjian Jiang, Hsuan-I Ho, Manuel Kaufmann, Jie Song, Otmar Hilliges ;

Abstract


"An overarching goal for computer-aided perception systems is the holistic understanding of the human-centric 3D world, including faithful reconstructions of humans, scenes, and their global spatial relationships. While recent progress in monocular 3D reconstruction has been made for footage of either humans or scenes alone, the joint reconstruction of both humans and scenes, along with their global spatial information, remains an unsolved challenge. To address this, we introduce a novel and unified framework that simultaneously achieves temporally and spatially coherent 3D reconstruction of static scenes with dynamic humans from monocular RGB videos. Specifically, we parameterize temporally consistent canonical human models and static scene representations using two neural fields in a shared 3D space. Additionally, we develop a global optimization framework that considers physical constraints imposed by potential human-scene interpenetration and occlusion. Compared to separate reconstructions, our framework enables detailed and holistic geometry reconstructions of both humans and scenes. Furthermore, we introduce a synthetic dataset for quantitative evaluations. Extensive experiments and ablation studies on both real-world and synthetic videos demonstrate the efficacy of our framework in monocular human-scene reconstruction. Code and data are publicly available on our project page."

Related Material


[pdf] [supplementary material] [DOI]