ECVA | European Computer Vision Association

Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels

Rui Huang, Songyou Peng, Ayca Takmaz, Federico Tombari, Marc Pollefeys, Shiji Song, Gao Huang*, Francis Engelmann ;

Abstract

"Current 3D scene segmentation methods are heavily dependent on manually annotated 3D training datasets. Such manual annotations are labor-intensive, and often lack fine-grained details. Furthermore, models trained on this data typically struggle to recognize object classes beyond the annotated training classes, , they do not generalize well to unseen domains and require additional domain-specific annotations. In contrast, recent 2D foundation models have demonstrated strong generalization and impressive zero-shot abilities, inspiring us to incorporate these characteristics from 2D models into 3D models. Therefore, we explore the use of image segmentation foundation models to automatically generate high-quality training labels for 3D segmentation models. The resulting model, , generalizes significantly better than the models trained on costly manual 3D labels and enables easily adding new training data to further boost the segmentation performance."

Related Material

[pdf] [supplementary material] [DOI]