METANOIA: A Lifelong Intrusion Detection and Investigation System for Mitigating Concept Drift
As Advanced Persistent Threat (APT) complexity increases, provenance data is increasingly used for detection. Anomaly-based systems are gaining attention due to their attack-knowledge-agnostic nature and ability to counter zero-day vulnerabilities. However, traditional detection paradigms, which train on offline, limited-size data, often overlook concept drift, i.e., unpredictable changes in the distribution of streaming data over time. This leads to high false positive rates. We propose incremental learning as a new paradigm to mitigate this issue, and we identify four challenges in adopting it. First, the long-running incremental system must combat catastrophic forgetting (C1) and avoid learning malicious behaviors (C2). Then, the system needs to achieve precise alerts (C3) and reconstruct attack scenarios (C4). We present METANOIA, the first lifelong detection system that mitigates the high false positives due to concept drift. It connects pseudo edges to combat catastrophic forgetting, transfers suspicious states to avoid learning malicious behaviors, filters nodes at the path level to achieve precise alerts, and constructs mini-graphs to reconstruct attack scenarios. Using state-of-the-art benchmarks, we demonstrate that METANOIA improves precision at the window level, graph level, and node level by 30%, 54%, and 29%, respectively, compared to previous approaches.
💡 Research Summary
The paper addresses a critical shortcoming of current provenance‑based intrusion detection systems (PIDS): their reliance on offline, static training data makes them vulnerable to concept drift, i.e., the gradual shift in the distribution of benign host behavior over time. Concept drift arises naturally as users change work environments, install new software, or modify system configurations, leading to a surge in false positives for traditional anomaly‑based detectors that keep a fixed decision boundary.
To mitigate this, the authors propose a new detection paradigm based on incremental learning, which continuously adapts the model to streaming provenance graphs while preserving knowledge of past normal behavior. They identify four intertwined challenges that any lifelong PIDS must solve: (C1) catastrophic forgetting, (C2) learning malicious patterns (the “discrimination paradox”), (C3) delivering precise alerts rather than noisy anomalies, and (C4) reconstructing attack scenarios for forensic analysis.
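The effect of drift on a frozen decision boundary can be illustrated with a toy detector. The sketch below is not METANOIA's model: it assumes a single scalar feature, a 3-sigma rule, and a made-up benign stream that ramps upward slowly. `RunningStats` and `count_alerts` are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running mean/variance of one scalar feature (Welford's algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def is_anomalous(self, x: float, k: float = 3.0) -> bool:
        std = (self.m2 / (self.n - 1)) ** 0.5 if self.n > 1 else 0.0
        return abs(x - self.mean) > k * std


def count_alerts(train, stream, incremental):
    """Fit on `train`, then score `stream`; optionally keep learning."""
    stats = RunningStats()
    for x in train:
        stats.update(x)
    alerts = 0
    for x in stream:
        if stats.is_anomalous(x):
            alerts += 1
        elif incremental:
            stats.update(x)  # learn only from samples judged benign
    return alerts


# Made-up data: every stream sample is benign, but the baseline drifts upward.
train = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
stream = [10 + 0.25 * i for i in range(1, 41)]

static_alerts = count_alerts(train, stream, incremental=False)
incremental_alerts = count_alerts(train, stream, incremental=True)
print(static_alerts, incremental_alerts)
```

Because every sample in the stream is benign, each alert here is a false positive. The frozen detector keeps flagging samples once the ramp crosses its fixed threshold, while the incremental one tracks the drift. The same naive update rule would also absorb a slow attack and gradually overwrite older behavior, which is the tension that challenges C1 and C2 describe.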
METANOIA, the system introduced in the paper, tackles these challenges with four novel mechanisms:
- Pseudo‑edge connections – virtual edges are added between historic nodes to keep their relational memory alive during incremental updates, directly combating catastrophic forgetting (C1).
- Suspicious‑state transfer – suspicious states are transferred between related nodes so that behavior tied to suspicious nodes is not learned as normal, which keeps malicious activity from contaminating the model and addresses the discrimination paradox (C2).
- Path‑level filtering – instead of flagging isolated anomalous nodes, METANOIA evaluates the entire execution path and filters out benign anomalies, dramatically improving alert precision (C3).
- Mini‑graph construction – for each alert, a compact sub‑graph that captures the essential dependency chain is automatically generated, enabling rapid, automated attack‑scenario reconstruction (C4).
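Path‑level filtering and mini‑graph construction can be sketched on a toy provenance graph. This is a simplified reading of the two bullets, not the paper's algorithm: the graph, the set of anomalous nodes, and the rule "keep a path only if it contains at least `min_hits` anomalous nodes" are all assumptions made for illustration, and the graph is assumed to be acyclic.

```python
from collections import defaultdict

# Toy provenance graph as (source, target) edges. All names are made up.
EDGES = [
    ("firefox", "dropper.sh"),
    ("dropper.sh", "bash"),
    ("bash", "/etc/passwd"),
    ("bash", "10.0.0.5:443"),
    ("cron", "backup.sh"),
    ("backup.sh", "/var/backup"),
]
# Nodes an upstream anomaly detector is assumed to have flagged.
ANOMALOUS = {"dropper.sh", "bash", "10.0.0.5:443", "/var/backup"}


def root_to_leaf_paths(edges):
    """Enumerate all root-to-leaf paths of an acyclic edge list."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    targets = {dst for _, dst in edges}
    roots = [node for node in list(adj) if node not in targets]
    paths = []

    def walk(node, prefix):
        path = prefix + [node]
        children = adj.get(node, [])
        if not children:
            paths.append(path)
        for child in children:
            walk(child, path)

    for root in roots:
        walk(root, [])
    return paths


def filter_paths(paths, anomalous, min_hits=2):
    """Keep paths with at least `min_hits` anomalous nodes (assumed rule)."""
    return [p for p in paths if sum(n in anomalous for n in p) >= min_hits]


kept = filter_paths(root_to_leaf_paths(EDGES), ANOMALOUS)
on_kept_path = set().union(*kept)

# Alerts: anomalous nodes that lie on a surviving path.
alert_nodes = on_kept_path & ANOMALOUS
# Mini-graph: edges whose endpoints both lie on a surviving path.
mini_graph = [(s, d) for s, d in EDGES if s in on_kept_path and d in on_kept_path]

print(sorted(alert_nodes))
print(mini_graph)
```

In this toy run the isolated anomaly `/var/backup` is dropped because its path contains only one flagged node, and the mini‑graph retains only the `firefox` chain, which an analyst could read as a compact attack narrative.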
The architecture consists of four critical node types (Anomalous Nodes, Suspicious Nodes, Rehearsal Nodes, and Normal Nodes) and a pipeline that ingests streaming audit logs, updates graph neural‑network embeddings incrementally, applies the above mechanisms, and outputs both alerts and investigative mini‑graphs.
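A minimal skeleton of such a streaming loop, under stated assumptions, might look as follows. The graph‑neural‑network embedding update is replaced by a plain edge counter, the anomaly test is a hypothetical stand‑in (`looks_hidden_tmp`), and the way suspicious state spreads along edges is one possible reading of "suspicious‑state transfer", not the paper's definition.

```python
from collections import Counter


def process_window(events, suspicious, is_anomalous):
    """Return the events of one window that are safe to learn from.

    `suspicious` is mutated: state spreads from a suspicious source
    (or an anomalous event) to the event's target node.
    """
    trainable = []
    for src, dst in events:
        if is_anomalous(src, dst) or src in suspicious:
            suspicious.add(dst)  # state is transferred along the edge
            continue             # event is withheld from the model update
        trainable.append((src, dst))
    return trainable


def looks_hidden_tmp(src, dst):
    """Hypothetical anomaly test standing in for a learned detector."""
    return dst.startswith("/tmp/.")


# Two made-up windows of (source, target) audit events.
windows = [
    [("sshd", "bash"), ("bash", "vim")],
    [
        ("bash", "/tmp/.x"),
        ("/tmp/.x", "nc"),
        ("nc", "203.0.113.7:4444"),
        ("bash", "code"),  # new but benign behavior (drift)
    ],
]

model = Counter()   # stand-in for incrementally updated embeddings
suspicious = set()
for window in windows:
    model.update(process_window(window, suspicious, looks_hidden_tmp))

print(sorted(suspicious))
print(sorted(model))
```

The point of the design is that new benign behavior (`bash` launching `code`) still reaches the model, while events downstream of a suspicious node are held back, so the model does not drift toward the attack.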
Evaluation is performed on the DARPA APT dataset and other public benchmarks, with Unicorn, KAIROS, ProGrapher, and ThreatTrace as baseline systems. Compared with these state‑of‑the‑art systems, METANOIA improves precision by 30% at the window level, 54% at the graph level, and 29% at the node level. Notably, in scenarios with pronounced concept drift, false‑positive rates drop sharply, and the time required for manual forensic investigation is reduced from hours to minutes thanks to the automatically generated mini‑graphs.
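Since the reported gains are all in precision, it helps to recall how precision responds to false positives. The counts below are invented for illustration and are not results from the paper; the summary also does not state whether the reported percentages are absolute points or relative improvements.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of raised alerts that correspond to real attacks."""
    raised = true_positives + false_positives
    return true_positives / raised if raised else 0.0


# Invented counts: the same 8 true alerts, with fewer false alarms.
baseline = precision(true_positives=8, false_positives=12)
improved = precision(true_positives=8, false_positives=2)
print(baseline, improved)
```

The same formula applies at each granularity; only the unit being counted changes (time windows, whole graphs, or individual nodes).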
The authors acknowledge limitations: pseudo‑edge generation currently relies on heuristic rules, and the system’s memory and computational overhead have not yet been fully optimized for large‑scale production deployments. Future work will explore dynamic edge‑weight learning, lightweight graph compression, and multi‑tenant scalability.
In summary, METANOIA demonstrates that incremental learning, when carefully engineered to address catastrophic forgetting, malicious‑sample contamination, alert precision, and forensic reconstruction, can produce a lifelong intrusion detection system that remains robust under concept drift, markedly reduces false alarms, and provides actionable attack narratives for security operators.