Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation

Yuanchen Ju, Kaizhe Hu, Guowei Zhang, Gu Zhang, Mingrun Jiang, Huazhe Xu*

Abstract


"Enabling robotic manipulation that generalizes to out-of-distribution scenes is a crucial step toward the open-world embodied intelligence. For human beings, this ability is rooted in the understanding of semantic correspondence among different objects, which helps to naturally transfer the interaction experience of familiar objects to novel ones. Although robots lack such a reservoir of interaction experience, the vast availability of human videos on the Internet may serve as a resource, from which we extract an affordance memory of contact points. Inspired by the natural way humans think, we propose : when confronted with unfamiliar objects that require generalization, the robot can acquire affordance by retrieving objects that share visual and semantic similarities from the memory, then mapping the contact points of the retrieved objects to the new object. While such correspondence may present formidable challenges at first glance, recent research finds it naturally arises from pre-trained diffusion models, enabling affordance mapping even across disparate categories. Through the framework, robots can generalize to manipulate out-of-category objects in a zero-shot manner without any manual annotation, additional training, part segmentation, pre-coded knowledge, or viewpoint restrictions. Quantitatively, significantly enhances the accuracy of visual affordance inference by a large margin of 28.7% compared to state-of-the-art (SOTA) end-to-end affordance models. We also conduct real-world experiments of cross-category object-grasping and achieve a success rate of 85.7%, proving ’s capacity for real-world tasks."
