In-Context Multi-Operator Learning with DeepOSets


An important application of neural networks in scientific computing is the learning of nonlinear operators. In this framework, a neural network is trained to fit a nonlinear map between two infinite-dimensional spaces, for example, the solution operator of an ordinary or partial differential equation. Recently, inspired by the discovery of in-context learning in large language models, an even more ambitious paradigm has been explored, called multi-operator learning, in which a single neural network is trained to learn many different operators at once. To evaluate one of the learned operators, the network is passed example inputs and outputs that disambiguate the desired operator. In this work, we provide a precise mathematical formulation of the multi-operator learning problem. In addition, we adapt a simple, efficient architecture, called DeepOSets, to multi-operator learning and prove its universality in this setting. Finally, we provide a comprehensive set of experiments demonstrating that DeepOSets can learn multiple operators corresponding to different initial-value and boundary-value differential equations, and can use in-context examples to accurately predict the solutions for queries and differential equations not seen during training. The main advantage of DeepOSets is its architectural simplicity, which permits the derivation of theoretical guarantees and training times on the order of minutes, in contrast to comparable transformer-based alternatives, which are justified only empirically and require hours of training.


💡 Research Summary

This paper addresses the emerging challenge of learning multiple solution operators for differential equations within a single neural network, a task often referred to as multi-operator learning. While single-operator learning (e.g., DeepONet, Fourier Neural Operator) has been well-studied and enjoys universal approximation guarantees, extending these ideas to a setting where a model must infer which operator to apply from a set of candidates has remained largely informal. The authors first formalize the problem mathematically. They define a compact set $K$ of continuous operators acting between two Banach spaces $X = C(K_1)$ and $Y = C(K_2)$. To disambiguate a particular operator $G \in K$, the model receives an in-context prompt consisting of a variable number $m$ of input-output pairs $(u_i, G(u_i))$. The prompt points must form a $(\delta, C)$-discretization of the input function space $V$, guaranteeing sufficiently uniform coverage. With this definition, universal approximation for in-context multi-operator learning means that for any $\epsilon > 0$ there exists a neural architecture that, given such a prompt, can predict $G(u)(y)$ within $\epsilon$ for any query function $u$ and evaluation point $y$.
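To make the setup concrete, the following sketch constructs an in-context prompt for a toy operator. The operator (an antiderivative map), the sine-series input distribution, and the 50-point sensor grid are illustrative choices, not details from the paper; they only show what the pairs $(u_i, G(u_i))$ look like in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)  # fixed sensor grid shared by inputs and outputs

def antiderivative(u):
    """Toy operator G(u)(y) = integral of u from 0 to y, via the trapezoidal rule."""
    dx = grid[1] - grid[0]
    return np.concatenate(([0.0], np.cumsum((u[1:] + u[:-1]) * dx / 2.0)))

def random_function():
    """Sample a smooth random input function as a short sine series."""
    coeffs = rng.normal(size=3)
    return sum(c * np.sin((k + 1) * np.pi * grid) for k, c in enumerate(coeffs))

# Build an in-context prompt: m example pairs (u_i, G(u_i)) that identify G
m = 5
prompt = []
for _ in range(m):
    u = random_function()
    prompt.append((u, antiderivative(u)))

# A fresh query function; the learner must predict G(u_query) from the prompt alone
u_query = random_function()
target = antiderivative(u_query)
```

In the paper's formulation the prompt functions must additionally form a $(\delta, C)$-discretization of $V$; i.i.d. sampling as above does not guarantee that property by itself.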

To realize a practical architecture satisfying this definition, the authors propose DeepOSets, a hybrid of DeepSets (a permutation-invariant set encoder) and DeepONet (a branch-trunk operator learner). Each prompt pair is first embedded by a shared multilayer perceptron $\Phi$; the resulting vectors are aggregated by a permutation-invariant pooling operation (mean, max, or min) to produce a fixed-size set representation $h$. This representation is concatenated with the discretized query function and fed into the DeepONet branch, while the query coordinates are processed by the DeepONet trunk. Because the set encoder processes the prompt in linear time with respect to the number of examples, the overall computational complexity is $O(m)$, a substantial improvement over transformer-based approaches that require quadratic self-attention.
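A minimal numpy sketch of this data flow, with untrained random weights, is given below. The layer widths, mean pooling, and MLP depths are illustrative assumptions; the point is only the structure: a shared encoder over prompt pairs, pooling into $h$, and a DeepONet-style branch-trunk inner product.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def mlp(x, layers):
    """Apply a small ReLU MLP given a list of (W, b) pairs; final layer is linear."""
    for W, b in layers[:-1]:
        x = relu(x @ W + b)
    W, b = layers[-1]
    return x @ W + b

def init(sizes):
    """Random (untrained) weights, just to demonstrate shapes and data flow."""
    return [(rng.normal(scale=0.1, size=(din, dout)), np.zeros(dout))
            for din, dout in zip(sizes[:-1], sizes[1:])]

n_sensors, d_embed, d_latent = 50, 32, 16
phi    = init([2 * n_sensors, 64, d_embed])          # shared encoder Phi for each prompt pair
branch = init([d_embed + n_sensors, 64, d_latent])   # branch: set summary + discretized query
trunk  = init([1, 64, d_latent])                     # trunk: evaluation point y

def deeposets(prompt_u, prompt_Gu, u_query, y):
    """prompt_u, prompt_Gu: (m, n_sensors) arrays; u_query: (n_sensors,); y: scalar."""
    pairs = np.concatenate([prompt_u, prompt_Gu], axis=1)  # (m, 2*n_sensors)
    h = mlp(pairs, phi).mean(axis=0)   # permutation-invariant pooling: O(m) in prompt length
    b = mlp(np.concatenate([h, u_query]), branch)
    t = mlp(np.array([y]), trunk)
    return float(b @ t)                # DeepONet-style inner product gives the prediction
```

Because the pooling is a mean over prompt embeddings, reordering the in-context examples leaves the prediction unchanged, and adding examples costs one encoder pass each, which is the source of the $O(m)$ complexity claim.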

The theoretical contribution is encapsulated in Theorem 5.1, which proves that DeepOSets is a universal approximator for the in-context multi-operator learning problem as defined. The proof leverages two known results: (i) DeepSets can approximate any continuous set-to-vector mapping arbitrarily well, and (ii) DeepONet can approximate any continuous operator on compact domains. By composing these approximations, the authors show that a single DeepOSets instance can simultaneously approximate every operator in the compact class $K$ to any desired accuracy, provided the prompt satisfies the $(\delta, C)$-discretization condition.
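In the notation introduced above, the guarantee can be paraphrased roughly as follows (the exact hypotheses, constants, and quantifier order are in the paper):

```latex
% For every tolerance \epsilon > 0 there exists a single DeepOSets network N such that,
% whenever the prompt (u_i, G(u_i))_{i=1}^{m} is a (\delta, C)-discretization of V,
\sup_{G \in K}\; \sup_{u \in V}\; \sup_{y \in K_2}
  \Bigl|\, N\bigl( (u_i, G(u_i))_{i=1}^{m},\, u,\, y \bigr) - G(u)(y) \,\Bigr| < \epsilon .
```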

Empirically, the paper evaluates DeepOSets on two families of PDEs: (a) one-dimensional Poisson equations with varying source terms and boundary conditions, and (b) two-dimensional reaction-diffusion equations in both forward and inverse configurations. For each family, a diverse set of operators (different coefficients, geometries, and boundary data) is generated, and the model is trained on a modest number of in-context examples. Test scenarios include operators and parameter configurations never seen during training. Results show mean absolute errors below $10^{-3}$ across all tasks, and the model remains robust when the number of in-context examples at inference time exceeds the number used during training. Training on a consumer-grade GPU completes in 5–8 minutes, whereas comparable transformer-based baselines (ICON, PROSE) require several hours on similar hardware.

The authors discuss limitations: the need for a well-distributed prompt (the $(\delta, C)$-discretization assumption) may be hard to guarantee with real experimental data; the current implementation relies on fixed discretization grids for both input and output spaces, so extensions to unstructured meshes or higher-dimensional domains are left for future work; and the simple pooling mechanism might be insufficient for highly nonlinear operators such as turbulent flow solvers. Nonetheless, DeepOSets demonstrates that a conceptually simple architecture can achieve both theoretical universality and practical efficiency for in-context multi-operator learning, offering a compelling alternative to heavyweight transformer models for scientific computing tasks.

