ECVA | European Computer Vision Association

Crowd-SAM: SAM as a smart annotator for object detection in crowded scenes

Zhi Cai, Yingjie Gao, Yaoyan Zheng, Nan Zhou, Di Huang*

Abstract

Object detection is an important task with applications in a wide range of scenarios. It generally requires extensive labels for training, and producing them is time-consuming, especially in crowded scenes. Recently, the Segment Anything Model (SAM) has emerged as a powerful zero-shot segmenter, offering a novel approach to instance segmentation. However, the accuracy and efficiency of SAM and its variants are often compromised in crowded scenes, where occlusions are frequent. In this paper, we propose Crowd-SAM, a SAM-based framework designed to enhance the performance of SAM in crowded scenes at the cost of only a few learnable parameters and minimal labeled images. We introduce an efficient prompt sampler (EPS) and a part-whole discrimination network (PWD-Net), which facilitate mask selection and improve accuracy in crowded scenes. Despite its simplicity, Crowd-SAM rivals state-of-the-art fully-supervised object detection methods on several benchmarks, including CrowdHuman and CityPersons. Our code is available at https://github.com/FelixCaae/CrowdSAM.
