Multi-Agent Inverted Transformer for Flight Trajectory Prediction
Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Flight trajectory prediction for multiple aircraft is essential and provides critical insights into how aircraft navigate within current air traffic flows. However, predicting multi-agent flight trajectories is inherently challenging: the model must capture both the behavior of each individual aircraft over time and the complex interactions between flights, while also producing explainable predictions. To address this, we propose the Multi-Agent Inverted Transformer (MAIFormer), a novel neural architecture for predicting multi-agent flight trajectories. The proposed framework features two key attention modules: (i) masked multivariate attention, which captures spatio-temporal patterns of individual aircraft, and (ii) agent attention, which models the social patterns among multiple agents in complex air traffic scenes. We evaluated MAIFormer using a real-world automatic dependent surveillance-broadcast (ADS-B) flight trajectory dataset from the terminal airspace of Incheon International Airport in South Korea. The experimental results show that MAIFormer outperforms competing methods across multiple metrics. In addition, MAIFormer produces prediction outcomes that are interpretable from a human perspective, which improves both the transparency of the model and its practical utility in air traffic control.


💡 Research Summary

The paper introduces MAIFormer, a novel neural architecture designed for multi‑agent flight trajectory prediction in dense terminal airspace. Traditional physics‑based models struggle when aircraft deviate from standard procedures due to air‑traffic‑controller (ATC) interventions, while recent data‑driven approaches either ignore inter‑aircraft interactions or model them with a single, monolithic attention layer that mixes temporal and social dynamics, leading to high complexity and poor interpretability. MAIFormer addresses these issues by explicitly separating intra‑aircraft (spatio‑temporal) and inter‑aircraft (social) modeling into two dedicated attention stages, reflecting the hierarchical reasoning process used by human ATCs.

Input tokenization and inverted embedding
Given N aircraft, each with a past trajectory of length T and F variates (latitude, longitude, altitude, etc.), the raw tensor X∈ℝ^{N×T×F} is first reshaped to (N·F)×T, treating each variate of each aircraft as an independent time series. An inverted linear embedding projects each series into a D‑dimensional token without adding positional encodings, because there is no intrinsic ordering among agents. The result is a set of (N·F) tokens, each representing a specific variate of a specific aircraft over the whole past horizon.
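The reshaping and projection above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation; the dimensions `N`, `T`, `F`, `D` are hypothetical, and the learned embedding is stood in for by a random linear map.

```python
import numpy as np

# Hypothetical dimensions for illustration (not taken from the paper's config):
N, T, F, D = 3, 20, 4, 16   # aircraft, past steps, variates, embedding dim

rng = np.random.default_rng(0)
X = rng.standard_normal((N, T, F))               # raw trajectory tensor

# Inverted embedding: treat each (aircraft, variate) series as one token.
series = X.transpose(0, 2, 1).reshape(N * F, T)  # (N*F, T)

# One linear map T -> D shared by all series; no positional encoding is added,
# since there is no intrinsic ordering among agents.
W = rng.standard_normal((T, D)) / np.sqrt(T)
tokens = series @ W                              # (N*F, D) token set
```

Each row of `tokens` then represents one variate of one aircraft over the whole past horizon, which is what the subsequent attention stages operate on.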

Masked Multivariate Attention (MMA)
The first transformer layer applies a mask matrix M that blocks attention between tokens belonging to different aircraft. Consequently, attention scores are computed only among the F tokens of the same aircraft, allowing the model to capture fine‑grained intra‑aircraft dynamics (e.g., coordinated changes in latitude, longitude, and altitude) while preventing leakage of information across agents. This stage yields a transformed token set C^{ST}∈ℝ^{(N·F)×D}.
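A single-head sketch of this masked attention, under the same assumptions as above (random weights standing in for learned parameters), looks like this. The mask is block-diagonal: token i may attend to token j only when both belong to the same aircraft.

```python
import numpy as np

N, F, D = 3, 4, 16                               # hypothetical dimensions
rng = np.random.default_rng(1)
tokens = rng.standard_normal((N * F, D))         # embedded token set

# Boolean mask M: True iff both tokens belong to the same aircraft.
agent_id = np.repeat(np.arange(N), F)            # (N*F,)
mask = agent_id[:, None] == agent_id[None, :]    # block-diagonal (N*F, N*F)

# Single-head scaled dot-product attention with the mask applied.
Wq = rng.standard_normal((D, D)) / np.sqrt(D)
Wk = rng.standard_normal((D, D)) / np.sqrt(D)
Wv = rng.standard_normal((D, D)) / np.sqrt(D)
Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv

scores = Q @ K.T / np.sqrt(D)
scores = np.where(mask, scores, -np.inf)         # block cross-aircraft attention
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
C_st = weights @ V                               # (N*F, D), the C^{ST} tokens
```

Setting blocked scores to negative infinity before the softmax makes the cross-aircraft attention weights exactly zero, which is what prevents information leakage across agents at this stage.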

Agent Attention (AA)
C^{ST} is reshaped by concatenating the F tokens of each aircraft, forming N agent‑level tokens of size (F·D). A standard self‑attention mechanism is then applied across these N tokens, enabling each aircraft to attend directly to every other aircraft’s aggregated state. This high‑level attention captures social patterns such as conflict avoidance, sequencing for runway usage, and coordinated speed adjustments. The output C^{SC} is reshaped back to the original token layout for further processing.
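The agent-level reshaping and attention can be sketched as below, again with hypothetical dimensions and random stand-in weights. Note that the attention matrix here is only N×N, one score per pair of aircraft.

```python
import numpy as np

N, F, D = 3, 4, 16                               # hypothetical dimensions
rng = np.random.default_rng(2)
C_st = rng.standard_normal((N * F, D))           # output of the MMA stage

# Concatenate each aircraft's F variate tokens into one agent-level token.
agents = C_st.reshape(N, F * D)                  # (N, F*D)

Dk = F * D
Wq = rng.standard_normal((Dk, Dk)) / np.sqrt(Dk)
Wk = rng.standard_normal((Dk, Dk)) / np.sqrt(Dk)
Wv = rng.standard_normal((Dk, Dk)) / np.sqrt(Dk)
Q, K, V = agents @ Wq, agents @ Wk, agents @ Wv

scores = Q @ K.T / np.sqrt(Dk)                   # (N, N): aircraft-to-aircraft
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

C_sc = (weights @ V).reshape(N * F, D)           # back to the token layout
```

The small N×N attention matrix is also what the interpretability analysis later visualizes: each entry is directly readable as how strongly one aircraft attends to another.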

Layer composition and decoder
Each MAIFormer layer consists of (i) MMA, (ii) AA, and (iii) a feed‑forward network (FFN) with GELU activation and residual connections, all preceded by layer normalization. Stacking L identical layers yields deep hierarchical representations. The final token matrix C^{L} is fed into a non‑autoregressive MLP decoder that projects the D‑dimensional latent vectors to the prediction horizon S, producing the future trajectory tensor Ŷ∈ℝ^{N×S×F} in a single forward pass. This design eliminates error accumulation typical of autoregressive decoders.
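The non-autoregressive decoding step amounts to one linear projection from the latent dimension D to the horizon S, applied to every token at once. A minimal sketch, with hypothetical dimensions and a random stand-in for the learned MLP:

```python
import numpy as np

N, F, D, S = 3, 4, 16, 10   # S: prediction horizon (hypothetical)
rng = np.random.default_rng(3)
C_L = rng.standard_normal((N * F, D))            # tokens from the last layer

# Non-autoregressive decoder: project each (aircraft, variate) token's
# D-dim latent vector to all S future steps in a single forward pass.
W = rng.standard_normal((D, S)) / np.sqrt(D)
Y_hat = (C_L @ W).reshape(N, F, S).transpose(0, 2, 1)   # (N, S, F)
```

Because all S future steps are emitted at once rather than one step at a time, no predicted value is ever fed back as an input, which is why autoregressive error accumulation cannot occur.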

Experimental evaluation
The authors evaluate MAIFormer on a real‑world ADS‑B dataset collected from FlightRadar24 covering arrivals within a 70‑nautical‑mile radius of Incheon International Airport (ICN) over five months (Jan–May 2023). The dataset includes timestamps, aircraft type, and 3‑D positions. Only arrival flights are used because they exhibit richer inter‑aircraft interactions. Baselines include LSTM, CNN‑LSTM, vanilla Transformer, Temporal Fusion Transformer, and AgentFormer. Metrics such as MAE, RMSE, ADE, and FDE show that MAIFormer consistently outperforms all baselines, achieving up to a 12 % reduction in final‑point error.
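For reference, ADE (average displacement error) and FDE (final displacement error) are standard trajectory metrics: the mean Euclidean distance between prediction and ground truth over all future steps, and that distance at the final step only. A small self-contained sketch with synthetic coordinates (not the paper's evaluation code):

```python
import numpy as np

def ade_fde(y_true, y_pred):
    """ADE and FDE for trajectories shaped (N aircraft, S steps, coords)."""
    d = np.linalg.norm(y_true - y_pred, axis=-1)   # (N, S) per-step distances
    return d.mean(), d[:, -1].mean()

# Synthetic example: a constant unit offset in each of 3 coordinates
# gives a displacement of sqrt(3) at every step.
y_true = np.zeros((2, 5, 3))
y_pred = np.ones((2, 5, 3))
ade, fde = ade_fde(y_true, y_pred)
```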

Interpretability analysis
To assess transparency, the authors compute attention entropy across agents. MAIFormer yields significantly lower entropy than AgentFormer, indicating that attention is more concentrated on a few relevant aircraft rather than being diffusely spread. Visualizations of the AA matrix align with human ATC intuition—for example, an aircraft on final approach strongly attends to a nearby aircraft climbing to a holding altitude. This property is crucial for safety‑critical domains where operators must understand model decisions.
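Attention entropy of the kind described above can be computed per row of the agent attention matrix: a row concentrated on one aircraft has low Shannon entropy, while uniformly spread attention has the maximum, log N. An illustrative sketch (not the paper's code):

```python
import numpy as np

def attention_entropy(weights, eps=1e-12):
    """Shannon entropy of each row of an attention matrix."""
    return -(weights * np.log(weights + eps)).sum(axis=-1)

focused = np.array([[0.97, 0.01, 0.01, 0.01]])   # attends to one aircraft
diffuse = np.full((1, 4), 0.25)                   # spreads attention evenly
```

Here `attention_entropy(diffuse)` equals log 4 ≈ 1.386, the maximum for four agents, while the focused row scores far lower, matching the paper's claim that concentrated attention signals interpretable, targeted interactions.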

Contributions and limitations
The paper’s contributions are: (1) a hierarchical transformer that decouples intra‑ and inter‑agent modeling, (2) a masked multivariate attention mechanism that preserves fine‑grained aircraft dynamics, (3) an agent‑level attention that provides interpretable social interaction modeling, and (4) empirical validation on a large, realistic terminal‑airspace dataset showing state‑of‑the‑art performance. Limitations include the focus on arrival traffic only, omission of weather or runway‑state variables, and the need for further work on online inference and model compression for real‑time ATC deployment.

In summary, MAIFormer demonstrates that a carefully structured attention hierarchy—first “inside” each aircraft, then “between” aircraft—can deliver both higher predictive accuracy and human‑readable insights, making it a promising candidate for next‑generation trajectory‑based air traffic management systems.
