Adversarial Erasing Framework via Triplet with Gated Pyramid Pooling Layer for Weakly Supervised Semantic Segmentation

Sung-Hoon Yoon, Hyeokjun Kweon, Jegyeong Cho, Shinjeong Kim, Kuk-Jin Yoon

Abstract


"Weakly supervised semantic segmentation (WSSS) has employed Class Activation Maps (CAMs) to localize objects. However, CAMs typically do not align with object boundaries and highlight only the most discriminative regions. To resolve these problems, we propose a Gated Pyramid Pooling (GPP) layer, which substitutes for the Global Average Pooling (GAP) layer, and an Adversarial Erasing Framework via Triplet (AEFT). In the GPP layer, a feature pyramid is obtained by pooling the CAMs at multiple spatial resolutions and is then aggregated by gated convolution into an attention for class prediction. Through this process, CAMs are trained not only to capture the global context but also to preserve fine details from the image. Meanwhile, AEFT targets over-expansion, a chronic problem of Adversarial Erasing (AE). Although AE methods expand CAMs by erasing the discriminative regions, they usually suffer from over-expansion because there is no guideline on when to stop erasing. We experimentally verify that over-expansion is due to rigid classification and that metric learning can be a flexible remedy for it. AEFT is devised to learn the concept of erasing with a triplet loss among the input image, the erased image, and a negatively sampled image. With GPP and AEFT, we achieve new state-of-the-art results on both the PASCAL VOC 2012 val/test sets and the MS-COCO 2014 val set, with 70.9%/71.7% and 44.8% mIoU, respectively."
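
For readers unfamiliar with the metric-learning term used in the abstract, the following is a minimal sketch of a generic triplet margin loss, not the authors' implementation. The function names, the Euclidean distance, and the margin value are illustrative assumptions. The abstract does not say which of the three images (input, erased, negatively sampled) plays the anchor, positive, or negative role, so the sketch leaves that assignment open and works on plain embedding vectors.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss (hypothetical helper, not from the paper).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; otherwise it grows linearly with the
    violation. `margin=1.0` is an assumed default.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings, chosen only for illustration.
satisfied = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
violated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

Unlike a classification loss, which pushes toward a fixed target, this loss saturates at zero once the relative distance constraint is met. That saturation is the kind of flexibility the abstract attributes to metric learning.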
