Neural Volumetric World Models for Autonomous Driving

Zanming Huang*, Jimuyang Zhang*, Eshed Ohn-Bar*

Abstract


"Effectively navigating a dynamic 3D world requires a comprehensive understanding of the 3D geometry and motion of surrounding objects and layouts. However, existing methods for perception and planning in autonomous driving primarily rely on a 2D spatial representation, based on a bird’s eye perspective of the scene, which is insufficient for modeling motion characteristics and decision-making in real-world 3D settings with occlusion, partial observability, subtle motions, and varying terrains. Motivated by this key insight, we present a novel framework for learning end-to-end autonomous driving based on volumetric representations. Our proposed neural volumetric world modeling approach, NeMo, can be trained in a self-supervised manner for image reconstruction and occupancy prediction tasks, benefiting scalable training and deployment paradigms such as imitation learning. Specifically, we demonstrate how the higher-fidelity modeling of 3D volumetric representations benefits vision-based motion planning. We further propose a motion flow module to model complex dynamic scenes, enabling additional robust spatiotemporal consistency supervision. Moreover, a temporal attention module is introduced to effectively integrate predicted future volumetric features for the planning task. Our proposed sensorimotor agent achieves state-of-the-art driving performance on nuScenes and CARLA, outperforming prior baseline methods by over 18%."
