HouseTS: A Large-Scale, Multimodal Spatiotemporal U.S. Housing Dataset and Benchmark
Accurate long-horizon house-price forecasting requires benchmarks that capture temporal dynamics together with time-varying local context. However, existing public resources remain fragmented: many datasets have limited spatial coverage, temporal depth, or multimodal alignment; the robustness of modern deep forecasters and time-series foundation models on housing data is not well characterized; and aerial imagery is rarely leveraged in a time-aware and interpretable manner at scale. To bridge these gaps, we present HouseTS (House Time Series), a multimodal spatiotemporal dataset for ZIP-code-level housing-market analysis, covering monthly signals from March 2012 to December 2023 across over 6,000 ZIP codes in 30 major U.S. metropolitan areas. HouseTS aligns monthly housing-market indicators, monthly POI dynamics, and annual census-based socioeconomic variables under a unified schema, and includes time-stamped annual aerial imagery. Building on HouseTS, we define standardized long-horizon forecasting tasks for univariate and multivariate prediction and benchmark 16 model families spanning statistical methods, classical machine learning, deep neural networks, and time-series foundation models in both zero-shot and fine-tuned modes. We also provide image-derived textual change annotations from multi-year aerial image sequences via a vision–language pipeline with LLM-as-judge and human verification to support scalable interpretability analyses. HouseTS is available on Kaggle, with code and documentation on GitHub.
💡 Research Summary
This paper introduces HouseTS, a comprehensive multimodal spatiotemporal dataset and benchmark designed to address critical gaps in long-term housing price forecasting research. The authors identify key limitations in existing public resources: fragmented data with limited spatial coverage, temporal depth, or alignment across modalities; a lack of rigorous evaluation of modern deep learning and time-series foundation models on housing data; and the underutilization of aerial imagery in a temporally aware and interpretable manner.
To bridge these gaps, HouseTS is constructed as a ZIP-code-level panel dataset covering over 6,000 ZIP codes within 30 major U.S. metropolitan areas from March 2012 to December 2023. Its core innovation is the temporal alignment of diverse monthly and annual signals under a unified schema. The dataset integrates:
1. Housing market indicators from Zillow (e.g., the Zillow Home Value Index as the primary target) and Redfin (e.g., median sale price, inventory, days on market);
2. Dynamic Points of Interest (POI) data extracted from historical OpenStreetMap, providing monthly counts of amenities such as restaurants and schools;
3. Annual socioeconomic variables from the American Community Survey (ACS), carefully integrated to prevent temporal leakage;
4. Time-stamped annual aerial imagery from the National Agriculture Imagery Program (NAIP) for each ZIP code from 2012 to 2022.
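The leakage-safe alignment of annual ACS variables to a monthly panel can be sketched as follows. This is a minimal illustration with synthetic values and made-up column names (`zipcode`, `acs_year`, `median_income`), not the dataset's actual schema; it assumes, for illustration, that year-Y survey estimates only become usable from year Y+1 onward.

```python
import pandas as pd

# Hypothetical monthly panel for one ZIP code (illustrative schema,
# not HouseTS's actual column names).
monthly = pd.DataFrame({
    "zipcode": "22030",
    "date": pd.date_range("2019-01-01", "2021-12-01", freq="MS"),
})

# Annual ACS-style variables, keyed by survey year (synthetic values).
acs = pd.DataFrame({
    "zipcode": ["22030"] * 3,
    "acs_year": [2018, 2019, 2020],
    "median_income": [95_000, 98_000, 101_000],
})

# To avoid temporal leakage, a month in calendar year Y may only see
# survey estimates that were already published, i.e. ACS year Y-1.
monthly["acs_year"] = monthly["date"].dt.year - 1
panel = monthly.merge(acs, on=["zipcode", "acs_year"], how="left")
```

With this join, every row of June 2019 carries the 2018 ACS estimate, so no feature peeks into data released after the month being modeled.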
Building upon this dataset, the paper establishes standardized long-horizon forecasting tasks for both univariate (predicting only the home value index) and multivariate (predicting multiple time series) settings. An extensive benchmark evaluation is conducted, comparing 16 model families across four categories: statistical models (AR, ARDL), classical machine learning (Random Forest, XGBoost), deep neural networks (including LSTMs, Transformers such as PatchTST, and spatiotemporal GNNs), and pretrained time-series foundation models (Chronos, TimesFM) in both zero-shot and fine-tuned modes. A key finding is that while advanced models show promise, robust preprocessing (such as a log transformation) is crucial for neural-network stability, and well-regularized linear baselines remain highly competitive across forecast horizons, challenging the assumption that greater complexity yields superior performance in this domain.
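The log-transform preprocessing highlighted above can be sketched as below. The prices are synthetic and the "forecast" is a naive last-value stand-in for an actual model; the point is only the train-in-log-space, invert-before-scoring pattern.

```python
import numpy as np

# Synthetic ZIP-level price series (illustrative, not HouseTS data).
prices = np.array([450_000.0, 462_000.0, 471_000.0, 485_000.0])

# Train in log space: compresses the heavy-tailed dollar scale so
# optimization is better conditioned across cheap and expensive ZIPs.
y = np.log1p(prices)

# A real model would be fit on `y`; here a naive last-value forecast
# stands in for the model's prediction in log space.
y_pred = y[-1]

# Invert the transform before computing errors in original dollar units.
price_pred = np.expm1(y_pred)
```

Because `expm1` exactly inverts `log1p`, errors reported after inversion are comparable across models trained with or without the transform.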
Beyond forecasting accuracy, the paper contributes novel resources for interpretability. It provides image-derived textual change annotations generated through a scalable vision-language pipeline. This process uses a Vision-Language Model (VLM) to describe changes between annual aerial images, an LLM-as-a-judge step to refine and filter descriptions, and finally human verification. These natural language descriptions of neighborhood evolution (e.g., “new construction of residential buildings between 2017 and 2019”) offer a human-readable, time-aware context that can help explain housing market dynamics.
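The three-stage annotation pipeline can be summarized structurally as below. This is a hedged sketch: `describe_pair` and `judge_description` are hypothetical placeholders for the paper's VLM and LLM-as-judge calls (a fixed string and a trivial keyword filter stand in for real model outputs), and the human-verification stage is only noted as a downstream step.

```python
def describe_pair(year_a, year_b):
    # Stage 1 (VLM, stubbed): caption the change between two annual
    # aerial images of the same ZIP code.
    return (f"new construction of residential buildings "
            f"between {year_a} and {year_b}")

def judge_description(text):
    # Stage 2 (LLM-as-judge, stubbed): filter vague or unsupported
    # descriptions; a keyword/length check stands in for the model.
    return "between" in text and len(text) > 20

def annotate(zip_code, years):
    # Stage 3 (human verification) happens downstream; this yields the
    # machine-generated candidates for each consecutive year pair.
    candidates = []
    for a, b in zip(years, years[1:]):
        desc = describe_pair(a, b)
        if judge_description(desc):
            candidates.append((zip_code, a, b, desc))
    return candidates
```

The design point is that generation and judging are decoupled, so the judge can be rerun with stricter criteria without re-querying the vision model.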
In conclusion, HouseTS provides a much-needed, large-scale, and aligned data infrastructure for housing market analysis. It not only facilitates rigorous benchmarking of a wide array of forecasting models but also enables new research directions in multimodal spatiotemporal learning and interpretable analysis. The dataset, benchmark code, and interpretability resources are publicly released to foster further research in both the housing domain and the broader spatiotemporal machine learning community.