Segment and Recognize Anything at Any Granularity

Feng Li*, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianwei Yang, Lei Zhang*, Jianfeng Gao*

Abstract


"In this work, we introduce , an augmented image segmentation foundation for segmenting and recognizing anything at desired granularities. Compared to the foundational segmentation model SAM [?], our model has two unique advantages: (i) granularity-controllability in that the model can produce segmentation masks at any desired granularities, from objects to parts to both; (ii) semantic-awareness in that the model simultaneously predicts semantic labels for masks at different granularities. To enable multi-granularity capabilities, we propose a multi-choice learning scheme, where each click point generates a set of masks at multiple levels of granularity, corresponding to a set of ground-truth masks. To achieve semantic awareness, we consolidate multiple datasets of different levels of granularity and train our model using decoupled object- and part-based tasks to facilitate knowledge sharing and transfer among different tasks. To the best of our knowledge, this work is the first attempt to jointly train a model on SA-1B, instance-level, and part-level segmentation datasets. Experimental results and visualizations demonstrate that our model successfully achieves the desired goals. Furthermore, we show that multi-task training using the segmentation task defined on SA-1B and other segmentation tasks (e.g., panoptic and part segmentation) leads to performance gains on all segmentation tasks. In particular, we achieve a new state-of-the-art in COCO panoptic segmentation 60.2 PQ by adding SAM data."
