This study investigates how generative AI systems interpret the architectural intelligence embedded in vernacular form. Using the Iranian pigeon tower as a case study, the research tests three diffusion models (Midjourney v6, DALL-E 3, and DreamStudio, based on Stable Diffusion XL) across three prompt stages: referential, adaptive, and speculative. A five-criteria evaluation framework assesses how each system reconstructs typology, materiality, environment, realism, and cultural specificity. Results show that AI reliably reproduces geometric patterns but misreads material and climatic reasoning. Reference imagery improves realism yet limits creativity, while freedom from reference generates inventive but culturally ambiguous outcomes. The findings define a boundary between visual resemblance and architectural reasoning, positioning computational vernacular reasoning as a framework for analyzing how AI perceives, distorts, and reimagines traditional design intelligence.
Vernacular architecture carries environmental intelligence embedded in its material, spatial, and climatic adaptations. Iranian pigeon towers exemplify this knowledge system: cylindrical mudbrick structures designed to harvest fertilizer through controlled nesting, ventilation, and shading. Their form arises not from aesthetic choice but from iterative dialogue between craft, ecology, and material performance (Beazley, 1966; Bourgeois, 1983; Rapoport, 1969).
Generative AI offers new ways to visualize and reinterpret such traditions, yet its understanding of architecture remains largely visual. Diffusion-based models can reproduce recognizable geometry but rarely grasp the environmental logic or cultural context that produced it (Chen et al., 2025; Croce et al., 2023). When applied to heritage typologies, these systems generate compelling images that risk flattening localized intelligence into global aesthetic patterns (Tiribelli et al., 2024).
This study investigates how AI interprets vernacular material reasoning through the case of the Iranian pigeon tower. It examines how textual and visual inputs influence the fidelity and inventiveness of AI outputs, asking whether computational systems can engage not only with what architecture looks like, but with how it works. Through a structured experimental workflow, the research evaluates three generative models (Midjourney v6, DALL-E 3, and DreamStudio, based on Stable Diffusion XL) across referential, adaptive, and speculative prompts to test the boundaries between imitation and interpretation (Dahy, 2019).
Theoretical Background
Vernacular architecture emerges from continuous negotiation between material, climate, and use. Rather than following abstract design rules, it evolves through observation, adaptation, and repair, each generation refining what already works. In arid regions, where temperature, light, and dust impose strict limits, material knowledge becomes ecological strategy. Earthen construction stores and releases heat, filters air, and weathers into breathable surfaces that stabilize interior comfort (Bourgeois, 1983). The Iranian pigeon tower (kaboutarkhaneh) exemplifies this environmental reasoning. Built from sun-dried mudbrick, its cylindrical body, rhythmic perforations, and ventilated turrets provide thousands of nesting cells while regulating airflow and daylight for guano production in adjacent fields (Beazley, 1966; Momeni & Shiri, 2022). Geometry and performance coincide: the wall's porosity is both structure and ventilation, and the tower's mass anchors thermal stability. What might appear decorative is in fact climatic instrumentation.
Across cultures, such structures reveal architecture as a feedback system in which spatial form emerges from the interaction of matter, climate, and human adaptation. In this view, design operates as a responsive process rather than a fixed intention, an idea echoed in performance-oriented and material-driven approaches that treat building as a negotiation between natural behavior and human agency (Dahy, 2019; Hensel et al., 2012). Vernacular builders achieved this through intuition and craft, while contemporary designers pursue it through computational experimentation. Both approaches depend on cycles of adjustment between constraint and opportunity, bridging traditional ecological intelligence and digital fabrication.
Generative artificial intelligence enters this continuum as a visual interpreter. Diffusion models such as Midjourney, DALL-E 3, and DreamStudio (based on Stable Diffusion XL) synthesize images by translating textual and visual data into probabilistic associations. Their output can be visually convincing yet conceptually shallow: they reconstruct what a form looks like without understanding why it exists (Leach, 2022). In heritage and architectural research, these models already assist in reconstructing damaged artifacts or visualizing historical typologies, producing high-fidelity images with minimal human input (Croce et al., 2023). Yet their success in resemblance often conceals interpretive limits.
When applied to architectural heritage, AI consistently reproduces recognizable geometry but overlooks the environmental reasoning that originally shaped it. Diffusion models can extrapolate missing surfaces or details with remarkable precision, yet they rarely register material behavior or climatic adaptation (Chen et al., 2025). This imbalance between visual fidelity and contextual understanding is reinforced by training datasets that merge imagery from diverse regions and eras, generating an aesthetic average that smooths cultural distinctions into a global visual norm (Croce et al., 2023). Without transparency in data provenance and ethical oversight, such systems risk perpetuating homogenizing and colonial patterns of representation, where local architectural intelligence is reduced to stylized exoticism (Tiribelli et al., 2024). This homogenizing tendency is evident when prompting an AI with "Iranian