AnyHome: Open-Vocabulary Large-Scale Indoor Scene Generation with First-Person View Exploration

Rao Fu*, Zehao Wen, Zichen Liu, Srinath Sridhar

Abstract


"Inspired by cognitive theories, we introduce AnyHome, a framework that translates any text into well-structured and textured indoor scenes at house scale. By prompting Large Language Models (LLMs) with designed templates, our approach converts provided textual narratives into amodal structured representations. These representations guarantee consistent and realistic spatial layouts by directing the synthesis of a geometry mesh within defined constraints. A Score Distillation Sampling process is then employed to refine the geometry, followed by an egocentric inpainting process that adds lifelike textures to it. AnyHome stands out with its editability, customizability, diversity, and realism. The structured representations for scenes allow for extensive editing at varying levels of granularity. Capable of interpreting texts ranging from simple labels to detailed narratives, AnyHome generates detailed geometries and textures that outperform existing methods in both quantitative and qualitative measures."

Related Material


[pdf] [supplementary material] [DOI]