Weakly Supervised Object Localization through Inter-class Feature Similarity and Intra-Class Appearance Consistency

Jun Wei, Sheng Wang, S. Kevin Zhou, Shuguang Cui, Zhen Li ;

Abstract


"Weakly supervised object localization (WSOL) aims at detecting objects through only image-level labels. Class activation maps (CAMs) are the commonly used features for WSOL. However, existing CAM-based methods tend to excessively pursue discriminative features for object recognition and hence ignore the feature similarities among different categories, thereby leading to CAMs incomplete for object localization. In addition, CAMs are sensitive to background noise due to over-dependence on the holistic classification. In this paper, we propose a simple but effective WSOL model (named ISIC) through Inter-class feature Similarity and Intra-class appearance Consistency. In practice, our ISIC model first proposes the inter-class feature similarity (ICFS) loss against the original cross entropy loss. Such an ICFS loss sufficiently leverages the shared features together with the discriminative features between different categories, which significantly reduces the model over-fitting risk to background noise and brings more complete object masks. Besides, instead of CAMs, a non-negative matrix factorization mask module is applied to extract object masks from multiple intra-class images. Thanks to intra-class appearance consistency, the achieved pseudo masks are more complete and robust. As a result, extensive experiments confirm that our ISIC model achieves state-of-the-art on both CUB-200 and ImageNet-1K benchmarks i.e., 97.3% and 70.0% GT-Known localization accuracy, respectively."

Related Material


[pdf] [DOI]