ECVA | European Computer Vision Association

AddMe: Zero-shot Group-photo Synthesis by Inserting People into Scenes

Dongxu Yue, Maomao Li, Yunfei Liu, Ailing Zeng, Tianyu Yang, Qin Guo, Yu Li*

Abstract

"While large text-to-image diffusion models have made significant progress in high-quality image generation, challenges persist when users insert their portraits into existing photos, especially group photos. Concretely, existing customization methods struggle to insert facial identities at desired locations in existing images, and it is difficult for existing local image editing methods to deal with facial details. To address these limitations, we propose AddMe, a powerful diffusion-based portrait generator that can insert a given portrait into a desired location in an existing scene image in a zero-shot manner. Specifically, we propose a novel identity adapter to learn a facial representation decoupled from existing characters in the scene. Meanwhile, to ensure that the generated portrait can interact properly with others in the existing scene, we design an enhanced portrait attention module to capture contextual information during the generation process. Our method is compatible with both text and various spatial conditions, enabling precise control over the generated portraits. Extensive experiments demonstrate significant improvements in both performance and efficiency."
