ECVA | European Computer Vision Association

ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model

Fu-Yun Wang*, Zhaoyang Huang*, Qiang Ma, Guanglu Song, Xudong Lu, Weikang Bian, Yijin Li, Yu Liu, Hongsheng Li*

Abstract

"Although video generation has made great progress in capacity and controllability and is gaining increasing attention, currently available video generation models still make minimal progress in the video length they can generate. Due to the lack of well-annotated long video data, high training/inference cost, and flaws in the model designs, current video generation models can only generate videos of 2 ∼ 4 seconds, greatly limiting their applications and the creativity of users. We present , a zero-shot method for creative long animation generation with short video diffusion models and even with short video consistency models (a new family of generative models known for the fast generation with high quality). In addition to the extension for long animation generation (dozens of seconds), as a zero-shot method, can be easily combined with existing community adapters (developed only for image or short video models) for more innovative generation results, including control-guided animation generation/editing, motion customization/alternation, and multi-prompt conditioned animation generation, . And, importantly, all of these can be done with commonly affordable GPU (12 GB for 32-second animations) and inference time (90 seconds for denoising 32-second animations with consistency models). Experiments validate the effectiveness of , bringing great potential for creative long animation generation. More details are available at https://gen-l-2.github.io/."
