ECVA | European Computer Vision Association

Global-to-Pixel Regression for Human Mesh Recovery

Yabo Xiao, Mingshu HE*, Dongdong Yu

Abstract

"Existing human mesh recovery (HMR) methods commonly rely on global features, or on local features derived from dense annotations, to produce a single prediction from the input image. However, these compressed global and local features disrupt the spatial geometry of the human body and make it hard to capture local dynamics, resulting in visual-mesh misalignment. Moreover, dense annotations are labor-intensive and expensive. To address these issues, we propose a global-to-local prediction framework that preserves spatial information and obtains precise visual-mesh alignment for top-down HMR. First, we present an adaptive 2D Keypoint-Guided Local Encoding Module that enables per-pixel features to capture fine-grained body part information while maintaining structure and local context. The acquisition of local features relies exclusively on sparse 2D keypoint guidance, without dense annotations or heuristic keypoint-based ROI (Region of Interest) pooling. The enhanced pixel features are used to predict residuals that rectify the initial estimate produced by the global features. Second, we introduce a Dynamic Matching Strategy that determines positive and negative pixels by calculating only the classification and 2D keypoint costs, further improving visual-mesh alignment. Comprehensive experiments demonstrate the effectiveness of the network design. Our framework outperforms previous local regression methods by a large margin and achieves state-of-the-art performance on the Human3.6M and 3DPW datasets."
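
The two mechanisms named in the abstract, residual rectification of a global estimate and cost-based selection of positive pixels, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the cost formulas, so everything concrete below is an assumption made for illustration: the classification cost is taken as one minus the per-pixel score, the 2D keypoint cost as a mean L1 distance, positives are chosen as the top_k lowest-cost pixels, and the names keypoint_cost, dynamic_match, rectify, top_k, and kpt_weight are hypothetical.

```python
from typing import List, Sequence, Tuple

# A set of 2D keypoints, one (x, y) pair per joint.
Keypoints = Sequence[Tuple[float, float]]


def keypoint_cost(pred: Keypoints, gt: Keypoints) -> float:
    """Mean L1 distance between predicted and ground-truth 2D keypoints.

    Assumption: the abstract only says a "2D keypoint cost" is used;
    mean L1 is a stand-in chosen for this sketch.
    """
    total = sum(abs(px - gx) + abs(py - gy)
                for (px, py), (gx, gy) in zip(pred, gt))
    return total / len(gt)


def dynamic_match(scores: Sequence[float],
                  pred_kpts: Sequence[Keypoints],
                  gt_kpts: Keypoints,
                  top_k: int = 2,
                  kpt_weight: float = 1.0) -> List[int]:
    """Return the indices of pixels treated as positives.

    Each pixel's cost is (1 - score) + kpt_weight * keypoint_cost.
    The top_k pixels with the lowest cost are positives; all other
    pixels are negatives. Both the cost form and the top-k rule are
    assumptions, not details taken from the source.
    """
    costs = [(1.0 - s) + kpt_weight * keypoint_cost(k, gt_kpts)
             for s, k in zip(scores, pred_kpts)]
    order = sorted(range(len(costs)), key=costs.__getitem__)
    return sorted(order[:top_k])


def rectify(global_params: Sequence[float],
            residual: Sequence[float]) -> List[float]:
    """Add a per-pixel residual to the initial global estimate."""
    return [g + r for g, r in zip(global_params, residual)]


# Toy example: three candidate pixels, two ground-truth keypoints.
gt = [(0.0, 0.0), (1.0, 1.0)]
scores = [0.9, 0.2, 0.8]
pred = [
    [(0.0, 0.0), (1.0, 1.0)],  # accurate keypoints, high score
    [(0.0, 0.0), (1.0, 1.0)],  # accurate keypoints, low score
    [(2.0, 2.0), (3.0, 3.0)],  # inaccurate keypoints, high score
]
positives = dynamic_match(scores, pred, gt, top_k=2)
refined = rectify([1.0, 2.0], [0.5, -0.5])
```

In this toy case the third pixel has a high classification score but is rejected because its keypoints are far from the ground truth, which is the kind of behavior a matching rule driven by both costs is meant to produce.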
