DreamHome-Pano: Design-Aware and Conflict-Free Panoramic Interior Generation

Notice: This research summary and analysis were automatically generated using AI technology. For accuracy, please refer to the original arXiv paper.

In modern interior design, the generation of personalized spaces frequently necessitates a delicate balance between rigid architectural structural constraints and specific stylistic preferences. However, existing multi-condition generative frameworks often struggle to harmonize these inputs, leading to “condition conflicts” where stylistic attributes inadvertently compromise the geometric precision of the layout. To address this challenge, we present DreamHome-Pano, a controllable panoramic generation framework designed for high-fidelity interior synthesis. Our approach introduces a Prompt-LLM that serves as a semantic bridge, translating layout constraints and style references into professional descriptive prompts to achieve precise cross-modal alignment. To safeguard architectural integrity during the generative process, we develop a Conflict-Free Control architecture that incorporates structural-aware geometric priors and a multi-condition decoupling strategy, preventing stylistic interference from eroding the spatial layout. Furthermore, we establish a comprehensive panoramic interior benchmark alongside a multi-stage training pipeline, encompassing progressive Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). Experimental results demonstrate that DreamHome-Pano achieves a superior balance between aesthetic quality and structural consistency, offering a robust and professional-grade solution for panoramic interior visualization.


💡 Research Summary

This paper introduces “DreamHome-Pano,” a novel AI framework designed for generating high-fidelity, 360-degree panoramic interior images that simultaneously satisfy user-provided structural layout constraints and stylistic preferences. The core challenge addressed is the “condition conflict” prevalent in existing multi-condition generative models, where enforcing rigid geometric constraints from a layout (often provided as a rendered “Place Image”) clashes with the flexible aesthetic guidance from a style reference image, leading to compromised structural fidelity or bland visual output.

DreamHome-Pano’s solution is built upon two key innovations: a Prompt-LLM and a Conflict-Free Control Architecture. The Prompt-LLM acts as a semantic bridge and design interpreter. It takes two heterogeneous inputs (a layout image and a style reference image) and synthesizes them into a coherent, professional-grade textual design description. Crucially, it infers missing attributes, such as the material or color of furniture that is defined in the layout but not visible in the style reference, to ensure global stylistic consistency while strictly adhering to the spatial blueprint.

The Conflict-Free Control Architecture safeguards architectural integrity during generation. It employs a Dual-Geometric Prior with two parts: an empty-room normal map enforces rigid architectural boundaries (walls, ceilings, floors) while freeing up model capacity for decorative hallucination, and a coarse instance segmentation provides flexible guidance for furniture placement. Separating rigid boundary constraints from flexible furniture guidance in this way addresses the issue of “over-rigid conditioning.” Furthermore, to prevent “multi-condition interference,” where latent spatial cues in the style reference image conflict with the fixed layout, the framework standardizes style references onto neutral spatial templates. This process isolates pure aesthetic attributes (color, texture, ambiance) from any inherent structural noise in the reference image.

The framework is compatible with modern MM-DiT (Multi-Modal Diffusion Transformer) backbones like Qwen-Image-Edit and FLUX.2. To train such a model effectively, the authors constructed a three-stage hierarchical data pipeline. Stage 1 uses 2.55 million low-resolution (1024x512) raw panoramic images for foundational geometric alignment. Stage 2 applies multi-dimensional quality filtering (resolution, brightness, Q-Align score) and diversity-preserving clustering to curate a set of 100k high-resolution (2048x1024) images. Stage 3 involves expert interior designers manually selecting 12k ultra-high-aesthetic-quality images for refined training. Each image is paired with a detailed, hierarchical attribute-grounded caption. A Vision-Language Model extracts elements (furniture, hard furnishings) and maps them to a taxonomy of 1,379 specific attributes (material, color, shape) before synthesizing a fluent, comprehensive description, providing fine-grained semantic supervision.
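The Stage 2 curation step can be illustrated with a small sketch. This is not the authors' pipeline: the record fields, the thresholds (brightness range, minimum quality score), and the round-robin selection across precomputed clusters are all assumptions chosen only to show how quality filtering and diversity-preserving selection might fit together.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PanoRecord:
    """Hypothetical metadata for one panorama (all fields are assumptions)."""
    name: str
    width: int
    height: int
    brightness: float  # mean luminance, assumed to lie in [0, 1]
    quality: float     # stand-in for a Q-Align-style score
    cluster: int       # cluster id, assumed to be precomputed


def passes_quality(rec, min_width=2048, min_height=1024,
                   brightness_range=(0.2, 0.9), min_quality=3.5):
    """Multi-dimensional filter; every threshold here is illustrative."""
    low, high = brightness_range
    return (rec.width >= min_width
            and rec.height >= min_height
            and low <= rec.brightness <= high
            and rec.quality >= min_quality)


def curate(records, budget):
    """Keep up to `budget` records, cycling over clusters to preserve diversity."""
    by_cluster = defaultdict(list)
    for rec in records:
        if passes_quality(rec):
            by_cluster[rec.cluster].append(rec)
    # Within each cluster, prefer the highest-scoring images.
    for members in by_cluster.values():
        members.sort(key=lambda r: r.quality, reverse=True)

    selected = []
    rank = 0
    while len(selected) < budget:
        added = False
        # Take the rank-th best image from each cluster in turn.
        for cluster_id in sorted(by_cluster):
            members = by_cluster[cluster_id]
            if rank < len(members):
                selected.append(members[rank])
                added = True
                if len(selected) == budget:
                    break
        if not added:  # every cluster is exhausted
            break
        rank += 1
    return selected
```

Calling `curate(records, budget=100_000)` on filtered metadata would return at most 100k records, drawn evenly across clusters before any single cluster is sampled deeply. A real pipeline would compute the cluster ids from image embeddings, which this sketch leaves out.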

For evaluation, the authors established a comprehensive panoramic interior benchmark comprising 50 representative layouts and 10 diverse design styles. Performance is measured across four automated metrics: Spatial Consistency (via pixel-wise mIoU for layout alignment), Aesthetic Quality (HPSv3), Realism (OmniAID), and Similarity (CLIP score). This is complemented by a human expert evaluation where interior design professionals rate outputs on spatial plausibility, aesthetic appeal, and overall coherence. Experimental results demonstrate that DreamHome-Pano achieves a superior balance between aesthetic quality and structural consistency compared to strong baselines like Seedream 4.5, Gemini 3 Pro Image, and the base Qwen-Image-Edit and FLUX.2 models it builds upon.
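For readers unfamiliar with the Spatial Consistency metric, the following is a minimal sketch of pixel-wise mIoU between two label maps. It assumes both the generated panorama and the target layout have already been converted to integer class maps of the same size; how the authors obtain those maps, and which classes they use, is not stated in this summary. The function name `mean_iou` is hypothetical.

```python
from typing import Sequence


def mean_iou(pred: Sequence[Sequence[int]],
             target: Sequence[Sequence[int]],
             num_classes: int) -> float:
    """Mean intersection-over-union across classes present in either map."""
    intersections = [0] * num_classes
    unions = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel agrees: it is in both the intersection and the union.
                intersections[p] += 1
                unions[p] += 1
            else:
                # Pixel disagrees: it enlarges the union of both classes.
                unions[p] += 1
                unions[t] += 1
    # Skip classes that appear in neither map to avoid dividing by zero.
    ious = [i / u for i, u in zip(intersections, unions) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with `pred = [[0, 0], [1, 1]]` and `target = [[0, 1], [1, 1]]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so `mean_iou(pred, target, 2)` is 7/12.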

In summary, DreamHome-Pano presents a systematic framework that effectively decouples geometric constraints from stylistic rendering in panoramic interior generation. By introducing a design-aware Prompt-LLM and a conflict-free control strategy, coupled with a rigorous, hierarchical data and training pipeline, it overcomes a fundamental limitation in conditional generation. The work provides a robust, professional-grade solution for immersive interior visualization, marking a significant step towards AI tools that can reliably interpret and execute complex human design intent within strict spatial constraints.

