ComFusion: Enhancing Personalized Generation by Instance-Scene Compositing and Fusion

Yan Hong*, Yuxuan Duan, Bo Zhang, Haoxing Chen, Jun Lan, Huijia Zhu, Weiqiang Wang, Jianfu Zhang*

Abstract


"Recent progress in personalizing text-to-image (T2I) diffusion models has demonstrated their capability to generate images based on personalized visual concepts using only a few user-provided examples. However, these models often struggle with maintaining high visual fidelity, particularly when modifying scenes according to textual descriptions. To address this challenge, we introduce ComFusion, an innovative approach that leverages pretrained models to create compositions of user-supplied subject images and predefined text scenes. ComFusion incorporates a class-scene prior preservation regularization, which combines subject class and scene-specific knowledge from pretrained models to enhance generation fidelity. Additionally, ComFusion uses coarse-generated images to ensure alignment with both the instance images and scene texts, thereby achieving a delicate balance between capturing the subject’s essence and maintaining scene fidelity. Extensive evaluations of ComFusion against various baselines in T2I personalization have demonstrated its qualitative and quantitative superiority."
