Multi-scale Cross Distillation for Object Detection in Aerial Images
Kun Wang, Zi Wang, Zhang Li*, Xichao Teng, Yang Li
Abstract
"Object detection in aerial images is a longstanding yet challenging task. Despite the significant advancements in recent years, most works still show unsatisfactory performance due to the scale variation of objects. A standard strategy to address this problem is multi-scale training, aiming to learn scale-invariant feature representations. Albeit achieving inspiring improvements, such a multi-scale strategy is impractical for real application as inference time increases considerably. Besides, the original images are resized to different scales and subsequently trained separately, lacking information interaction across different scales. This paper presents a novel method called multi-scale cross distillation (MSCD) to address the issues mentioned above. MSCD combines the merits of multi-scale training and knowledge distillation, enabling single-scale inference to achieve comparable or superior performance than multi-scale inference. Specifically, we first construct a parallel multi-branch architecture, in which each branch shares the same parameters yet takes images with different scales as input. Furthermore, we design an adaptive cross-scale distillation module that adaptively integrates the knowledge of different branches into one. Thus, the detectors trained with MSCD only require single-scale inference. Extensive experiments demonstrate the effectiveness of MSCD. Without bells and whistles, MSCD can facilitate prevalent two-stage detectors to outperform corresponding single-scale models by ∼5 and ∼7 mAP improvement on DOTA and DIOR-R datasets, respectively."