MORPH: PDE Foundation Models with Arbitrary Data Modality
We introduce MORPH, a modality-agnostic, autoregressive foundation model for partial differential equations (PDEs). MORPH is built on a convolutional vision transformer backbone that seamlessly handles heterogeneous spatiotemporal datasets of varying data modality (1D–3D) at different resolutions, with multiple fields of mixed scalar and vector components. The architecture combines (i) component-wise convolution, which jointly processes scalar and vector channels to capture local interactions, (ii) inter-field cross-attention, which models and selectively propagates information between different physical fields, and (iii) axial attention, which factorizes full spatiotemporal self-attention along individual spatial and temporal axes to reduce computational burden while retaining expressivity. We pretrain multiple model variants on a diverse collection of heterogeneous PDE datasets and evaluate transfer to a range of downstream prediction tasks. Using both full-model fine-tuning and parameter-efficient low-rank adapters, MORPH outperforms models trained from scratch. Across extensive evaluations, MORPH matches or surpasses strong baselines and recent state-of-the-art models. Together, these capabilities make MORPH a flexible and powerful backbone for learning from the heterogeneous and multimodal nature of scientific observations, charting a path toward scalable and data-efficient scientific machine learning. The source code, datasets, and models are publicly available at https://github.com/lanl/MORPH.
💡 Research Summary
MORPH (Modality‑agnostic PDE Foundation Model) is introduced as a universal, autoregressive foundation model for partial differential equations (PDEs) that can ingest heterogeneous spatiotemporal data ranging from 1‑D to 3‑D, at arbitrary resolutions, and with mixed scalar‑vector field components. The architecture builds on a convolutional vision‑transformer backbone and integrates three complementary mechanisms: (i) component‑wise convolutions that jointly process scalar and vector channels to capture local interactions; (ii) inter‑field multi‑head cross‑attention that fuses multiple physical fields into a single representation while selectively propagating the most relevant inter‑field information; and (iii) 4‑D axial attention that factorizes full spatiotemporal self‑attention along the time, depth, height, and width axes, reducing the computational cost from O((T·D·H·W)²) to O(T² + D² + H² + W²).
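The paper's exact attention implementation is not reproduced in this summary, but the factorization idea can be sketched in plain NumPy: instead of one attention over all T·D·H·W positions, attention is applied independently along each axis of the token grid, so each score matrix is only L×L for that axis's length L. The sketch below is a single head with no learned projections or residuals; all function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_along_axis(tokens, axis):
    """Single-head self-attention restricted to one axis of the token grid.

    tokens: array of shape (T, D, H, W, E). Positions that differ only along
    `axis` attend to each other, so the score matrix is L x L with
    L = tokens.shape[axis] instead of (T*D*H*W) x (T*D*H*W).
    """
    x = np.moveaxis(tokens, axis, -2)                      # (..., L, E)
    scores = x @ x.swapaxes(-1, -2) / np.sqrt(x.shape[-1])  # (..., L, L)
    out = softmax(scores, axis=-1) @ x                      # (..., L, E)
    return np.moveaxis(out, -2, axis)

def axial_attention(tokens):
    # Factorized spatiotemporal attention: time, depth, height, width in turn,
    # giving the O(T^2 + D^2 + H^2 + W^2) cost quoted above.
    for axis in range(4):
        tokens = attend_along_axis(tokens, axis)
    return tokens

x = np.random.default_rng(0).normal(size=(3, 4, 5, 6, 8))  # (T, D, H, W, E)
y = axial_attention(x)
assert y.shape == x.shape
```

In practice each axial pass would carry its own query/key/value projections, multiple heads, and residual connections; the sketch only illustrates why the quadratic cost collapses to a sum of per-axis terms.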
To handle the diversity of existing PDE benchmarks, the authors propose a Unified Physics Tensor Format (UPTF‑7) with shape (B, T, F, C, D, H, W). This format maps any dataset—whether a 1‑D diffusion trajectory, a 2‑D incompressible flow field, or a 3‑D magnetohydrodynamic simulation—into a common 7‑dimensional tensor without costly padding, enabling on‑the‑fly conversion during data loading.
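The conversion itself is essentially a reshape: spatial axes that a dataset lacks become singleton dimensions, so no padding is needed. A minimal sketch (the helper name `to_uptf7` is hypothetical; it assumes the field and component axes are already present):

```python
import numpy as np

def to_uptf7(u):
    """Lift a trajectory into the unified (B, T, F, C, D, H, W) layout.

    u: array of shape (B, T, F, C, *spatial) with 1-3 trailing spatial axes.
    Missing spatial axes become singleton dimensions, so a 1-D trajectory
    lands in (B, T, F, C, 1, 1, W) and a 3-D one in (B, T, F, C, D, H, W).
    """
    b, t, f, c, *spatial = u.shape
    assert 1 <= len(spatial) <= 3, "expected 1-3 spatial axes"
    d, h, w = [1] * (3 - len(spatial)) + spatial  # left-pad with singletons
    return u.reshape(b, t, f, c, d, h, w)

u1d = np.zeros((8, 16, 1, 1, 64))         # 1-D scalar field: (B, T, F, C, X)
u3d = np.zeros((2, 10, 1, 3, 32, 32, 32)) # 3-D vector field: (B, T, F, C, D, H, W)
print(to_uptf7(u1d).shape)  # (8, 16, 1, 1, 1, 1, 64)
print(to_uptf7(u3d).shape)  # (2, 10, 1, 3, 32, 32, 32)
```

Because singleton axes are free under NumPy's (and PyTorch's) strided memory model, this on-the-fly lifting costs essentially nothing during data loading.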
Four model sizes are released: MORPH‑TI (7 M parameters), MORPH‑S (30 M), MORPH‑M (126 M), and MORPH‑L (480 M). All models are pretrained on six large, heterogeneous PDE datasets covering diffusion‑reaction, incompressible and compressible Navier–Stokes, turbulent flows, magnetohydrodynamics, and self‑gravitating fluids. Pretraining uses an autoregressive next‑step prediction loss augmented with physics‑aware regularization.
Transferability is evaluated in two ways. First, a zero‑shot experiment pretrains MORPH on the 2‑D incompressible Navier–Stokes dataset and directly tests it on six unseen targets spanning 1‑D, 2‑D, and 3‑D modalities, including turbulent and multiphysics cases. Performance is quantified by the Gap‑Closure Ratio (GCR), which measures the fraction of the gap between a naïve random baseline and a model trained from scratch that the pretrained model closes without any fine‑tuning. Positive GCR values are observed across all targets, with the strongest transfer to near‑domain tasks (e.g., 1‑D and 2‑D compressible flows) and meaningful transfer to more distant domains such as 3‑D MHD and self‑gravitating flows.
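The exact GCR formula is not reproduced in this summary; a natural reading of "fraction of the gap closed", written in terms of each model's prediction error, is the following (function and argument names are illustrative):

```python
def gap_closure_ratio(err_naive, err_pretrained, err_scratch):
    """Fraction of the naive-to-scratch error gap closed by zero-shot transfer.

    1.0  -> pretrained model matches the trained-from-scratch model,
    0.0  -> no better than the naive baseline,
    <0.0 -> worse than the naive baseline.
    """
    return (err_naive - err_pretrained) / (err_naive - err_scratch)

# e.g. naive error 1.0, scratch-trained error 0.2, zero-shot error 0.4:
print(gap_closure_ratio(1.0, 0.4, 0.2))  # 0.75
```

Under this definition, "positive GCR across all targets" means zero-shot MORPH beats the naive baseline on every unseen dataset, even before any adaptation.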
Second, downstream fine‑tuning is performed on seven additional heterogeneous datasets. Two adaptation strategies are compared: (a) full‑model fine‑tuning, and (b) parameter‑efficient low‑rank adapters (LoRA). LoRA enables the largest MORPH‑L model to achieve nearly the same accuracy as full fine‑tuning while updating only a tiny fraction of the parameters, reducing training cost by an order of magnitude.
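LoRA's parameter economy comes from freezing each pretrained weight matrix W and learning only a rank-r update, so the effective weight is W + (α/r)·B·A with A and B small. A minimal NumPy sketch of a single adapted linear layer (dimensions chosen for illustration; MORPH's actual adapter placement is not detailed in this summary):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 256, 256, 8, 16

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable rank-r factor (Gaussian init)
B = np.zeros((d_out, r))                # trainable; zero init, so the adapter
                                        # starts as an exact no-op

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A, never materialized:
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(4, d_in))
# With B = 0, the adapted layer reproduces the frozen layer exactly.
assert np.allclose(lora_forward(x), x @ W.T)

full, adapter = W.size, A.size + B.size
print(f"trainable fraction: {adapter / full:.3%}")  # 6.250% here; far smaller
                                                    # at foundation-model scale
```

Only A and B receive gradients; at the 480 M-parameter scale of MORPH-L, the trainable fraction 2r/d shrinks well below the toy figure above, which is the source of the order-of-magnitude cost reduction.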
Across all benchmarks, MORPH consistently outperforms strong baselines, including DeepONet, Fourier Neural Operators, and prior vision‑transformer‑based PDE models. The axial‑attention design allows processing of sequences of up to 4,096 patches (≈32k‑token spatiotemporal sequences) without exceeding GPU memory, and the component‑wise convolution adds a locality bias that improves sample efficiency.
The paper’s contributions are threefold: (1) a truly modality‑agnostic architecture that eliminates the need for task‑specific data reshaping; (2) a scalable spatiotemporal attention scheme that makes high‑resolution 3‑D PDE learning tractable; and (3) demonstration that a single pretrained model can serve as a universal PDE surrogate, with effective zero‑shot and low‑rank‑adapter transfer across disparate physics and data modalities.
Limitations are acknowledged: the autoregressive objective may accumulate error over long horizons, and explicit physics constraints (e.g., conservation laws) are not directly encoded, which could affect long‑term physical fidelity. Future work may integrate diffusion‑based generative modeling, physics‑informed regularizers, or multi‑scale attention mechanisms to address these issues.
Overall, MORPH represents a significant step toward universal, data‑efficient scientific machine learning, offering a flexible backbone for learning from the heterogeneous and multimodal nature of real‑world scientific observations.