nuCraft: Crafting High Resolution 3D Semantic Occupancy for Unified 3D Scene Understanding

Benjin Zhu*, Zhe Wang, Hongsheng Li*

Abstract


"Existing benchmarks for 3D semantic occupancy prediction in autonomous driving are limited by low resolution (up to [512×512×40] with 0.2m voxel size) and inaccurate annotations, hindering the unification of 3D scene understanding through the occupancy representation. Moreover, previous methods can only generate occupancy predictions at 0.4m resolution or lower, requiring post-upsampling to reach their full resolution (0.2m). The root of these limitations lies in the sparsity, noise, and even errors present in the raw data. In this paper, we overcome these challenges by introducing nuCraft, a high-resolution and accurate semantic occupancy dataset derived from nuScenes. nuCraft offers an 8× increase in resolution ([1024 × 1024 × 80] with voxel size of 0.1m) and more precise semantic annotations compared to previous benchmarks. To address the high memory cost of high-resolution occupancy prediction, we propose VQ-Occ, a novel method that encodes occupancy data into a compact latent feature space using a VQ-VAE. This approach simplifies semantic occupancy prediction into feature simulation in the VQ latent space, making it easier and more memory-efficient. Our method enables direct generation of semantic occupancy fields at high resolution without post-upsampling, facilitating a more unified approach to 3D scene understanding. We validate the superior quality of nuCraft and the effectiveness of VQ-Occ through extensive experiments, demonstrating significant advancements over existing benchmarks and methods."
