Idea2Img: Iterative Self-Refinement with GPT-4V for Automatic Image Design and Generation

Zhengyuan Yang*, Jianfeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang ;

Abstract


"We introduce “Idea to Image,”1 an agent system that enables multimodal iterative self-refinement with for automatic image design and generation. Humans can quickly identify the characteristics of different text-to-image (T2I) models via iterative explorations. This enables them to efficiently convert their high-level generation ideas into effective T2I prompts that can produce good images. We investigate if systems based on large multimodal models (LMMs) can develop analogous multimodal self-refinement abilities that enable exploring unknown models or environments via self-refining tries. cyclically generates revised T2I prompts to synthesize draft images, and provides directional feedback for prompt revision, both conditioned on its memory of the probed T2I model’s characteristics. The iterative self-refinement brings various advantages over vanilla T2I models. Notably, can process input ideas with interleaved image-text sequences, follow ideas with design instructions, and generate images of better semantic and visual qualities. The user preference study validates the efficacy of on automatic image design and generation via multimodal iterative self-refinement. 1 Short for “.” System logo design [height=15pt]figure/logo1.png assisted by ."

Related Material


[pdf] [supplementary material] [DOI]