StatePlane: A Cognitive State Plane for Long-Horizon AI Systems Under Bounded Context
Large language models (LLMs) and small language models (SLMs) operate under strict context window and key-value (KV) cache constraints, fundamentally limiting their ability to reason coherently over long interaction horizons. Existing approaches – extended context windows, retrieval-augmented generation, summarization, or static documentation – treat memory as static storage and fail to preserve decision-relevant state under long-running, multi-session tasks. We introduce StatePlane, a model-agnostic cognitive state plane that governs the formation, evolution, retrieval, and decay of episodic, semantic, and procedural state for AI systems operating under bounded context. Grounded in cognitive psychology and systems design, StatePlane formalizes episodic segmentation, selective encoding via information-theoretic constraints, goal-conditioned retrieval with intent routing, reconstructive state synthesis, and adaptive forgetting. We present a formal state model, KV-aware algorithms, security and governance mechanisms including write-path anti-poisoning, enterprise integration pathways, and an evaluation framework with six domain-specific benchmarks. StatePlane demonstrates that long-horizon intelligence can be achieved without expanding context windows or retraining models.
💡 Research Summary
StatePlane tackles the fundamental limitation of current large and small language models: a bounded context window and KV‑cache that truncate or compress earlier interaction history, causing lost commitments, contradictions, and degraded long‑horizon reasoning. The authors argue that the problem is not one of storage capacity but of state continuity. Inspired by cognitive psychology's division of memory into episodic, semantic, and procedural subsystems, they propose a model‑agnostic "cognitive state plane" that externalizes these three state types, managing them outside the model itself.
The architecture consists of five core components. An Event Boundary Detector monitors the interaction history and triggers a new episode when the KL‑divergence between the predictive distributions before and after a token exceeds a threshold θ, mirroring human surprise‑driven segmentation. The Episodic Encoder compresses each detected event into a structured tuple (goal, action, outcome, rationale, timestamp, salience). Salience is computed from utility, surprise, and novelty functions, and the system optimizes an information‑bottleneck objective min I(S;H) − β·I(S;Y) to retain only the most future‑relevant information under a hard token budget.
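The surprise‑driven segmentation and salience scoring above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the KL computation, the threshold θ, and the salience weights are all assumptions chosen for readability.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete predictive distributions
    over the same support (eps guards against log of zero)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def detect_boundary(pred_before, pred_after, theta=0.5):
    """Trigger a new episode when the shift in the model's predictive
    distribution across a token exceeds the surprise threshold theta."""
    return kl_divergence(pred_after, pred_before) > theta

def salience(utility, surprise, novelty, w=(0.5, 0.3, 0.2)):
    """Weighted combination of the three salience signals.
    The weights are illustrative, not values from the paper."""
    return w[0] * utility + w[1] * surprise + w[2] * novelty
```

In practice the predictive distributions would come from the model's own next‑token logits; here any pair of probability vectors over the same vocabulary slice works.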
State is stored in a typed State Store that maintains episodic, semantic, and procedural objects together with provenance, confidence, and scope metadata. Hierarchical namespaces (tenant/org, user, case/project, session) enable fine‑grained access control and the right‑to‑be‑forgotten. A Goal‑Conditioned Retriever first classifies the user’s intent, routes the query to the appropriate store, and then applies goal‑conditioned ranking plus policy filtering (π) to produce a minimal evidence set Rₜ. The State Reconstructor then assembles a bounded context Cₜ (|Cₜ| ≤ Lₘₐₓ) from Rₜ and the current user input, ensuring that only evidence—not instructions—is fed to the language model. This “evidence‑oriented” reconstruction protects against prompt‑injection attacks.
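The retrieval‑and‑reconstruction pipeline can be made concrete with a small sketch. The record schema, the salience‑based ranking stand‑in, and the word‑count token estimator are simplifying assumptions; the paper's goal‑conditioned ranker and policy filter π would be far richer.

```python
from dataclasses import dataclass

@dataclass
class StateRecord:
    kind: str          # "episodic" | "semantic" | "procedural"
    namespace: str     # hierarchical scope, e.g. "tenant/user/project"
    content: str
    salience: float
    provenance: str
    confidence: float = 1.0

def retrieve(store, intent_kind, namespace_prefix, policy, k=5):
    """Intent-routed, namespace-scoped, policy-filtered retrieval,
    ranked here by salience as a stand-in for goal-conditioned ranking."""
    candidates = [r for r in store
                  if r.kind == intent_kind
                  and r.namespace.startswith(namespace_prefix)
                  and policy(r)]
    return sorted(candidates, key=lambda r: r.salience, reverse=True)[:k]

def reconstruct(evidence, user_input, l_max, tokens=lambda s: len(s.split())):
    """Assemble a bounded context C_t (|C_t| <= l_max): evidence is framed
    as facts, never as instructions, then the user input is appended."""
    parts, budget = [], l_max - tokens(user_input)
    for r in evidence:
        cost = tokens(r.content)
        if cost <= budget:
            parts.append(f"[evidence] {r.content}")
            budget -= cost
    parts.append(user_input)
    return "\n".join(parts)
```

Prefixing retrieved state with an `[evidence]` tag (rather than splicing it into the instruction stream) mirrors the evidence‑oriented reconstruction that the authors credit with blunting prompt injection.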
For forgetting, each episode’s strength decays exponentially unless reinforced by reuse, with a reinforcement bonus α per retrieval. Episodes whose strength falls below ε are pruned, providing regularization and reducing interference. The authors also formalize write‑gate logic, provenance requirements, and optional human‑in‑the‑loop approvals to prevent state poisoning.
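The decay‑and‑reinforcement rule is simple enough to state directly. This sketch assumes exponential decay with rate λ and an additive bonus α on retrieval; λ, α, and ε are illustrative values, not the paper's.

```python
import math

def update_strength(strength, dt, lam=0.1, retrieved=False, alpha=0.5):
    """Decay an episode's strength over elapsed time dt; add the
    reinforcement bonus alpha if the episode was retrieved this step."""
    s = strength * math.exp(-lam * dt)
    if retrieved:
        s += alpha
    return s

def prune(episodes, eps=0.05):
    """Drop episodes whose strength has fallen below the floor eps."""
    return [(eid, s) for eid, s in episodes if s >= eps]
```

Unused episodes thus fade out on their own, while anything the retriever keeps touching stays effectively permanent.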
Security and governance are addressed across confidentiality, integrity, availability, and isolation. Write‑path protections include salience thresholds, typed schemas that reject arbitrary code, and automatic PII redaction. Read‑path protections enforce that retrieved state cannot override system policy. Comprehensive audit logging, per‑tenant encryption, and role‑based access control meet enterprise compliance needs.
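A write gate combining these checks might look like the following. The salience floor, the single SSN‑style PII pattern, and the crude executable‑content check are all placeholders for the typed schemas and redaction rules the paper describes.

```python
import re

# Illustrative PII pattern only (US-SSN-shaped); a real deployment
# would use a proper redaction service with many detectors.
PII_PATTERNS = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]

def redact_pii(text):
    for pat in PII_PATTERNS:
        text = pat.sub("[REDACTED]", text)
    return text

def write_gate(candidate, salience_min=0.3):
    """Return a sanitized record to commit, or None if the write is rejected."""
    if candidate.get("salience", 0.0) < salience_min:
        return None                          # below salience threshold
    if not candidate.get("provenance"):
        return None                          # provenance is mandatory
    content = candidate.get("content", "")
    if "<script" in content.lower():         # reject embedded executable content
        return None
    return {**candidate, "content": redact_pii(content)}
```

Rejecting writes outright, rather than storing and filtering later, is what makes this an anti‑poisoning control: tainted state never enters the store.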
Evaluation comprises six domain‑specific benchmarks: Long‑Horizon Policy (LH‑PCT), Exception Ledger (ELR), Rationale Stability (RCDS), Tool‑Heavy Case (TH‑CBC), Privacy Enforcement (PRSE), and Memory Poisoning (MP‑RI). Baselines (stateless LLM, sliding window, summarize‑and‑append, RAG top‑k, hybrid) are compared under equal token budgets and retrieval limits across horizons up to eight times the context size. Metrics include Commitment Compliance Rate, Contradiction Rate, Exception Handling Accuracy, Provenance Completeness, Policy Violation Rate, Sensitive Recall Leakage, Tokens per Correct Decision, and latency curves. StatePlane consistently outperforms baselines, achieving 30–45% higher compliance and 2–3× lower contradiction rates while staying within the same token budget.
Deployment is realized via a two‑call contract (PrepareContext, CommitOutcome) that keeps the language model stateless. Two reference implementations are provided: an on‑premises stack (NGINX, FastAPI, PostgreSQL, Redis, MinIO) and an Azure cloud‑native stack (Container Apps, Azure Database for PostgreSQL, Azure Cache for Redis, Key Vault, Entra ID). This design aligns with MLOps best practices, allowing post‑deployment behavioral evolution without retraining immutable model artifacts.
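The two‑call contract keeps all state on the plane side, so each model call sees only a freshly reconstructed context. The sketch below is a toy client, not the reference implementation: the last‑three‑episodes retrieval and the flat string store are stand‑ins for the real goal‑conditioned retriever and typed State Store, with `PrepareContext`/`CommitOutcome` rendered as snake_case methods.

```python
class StatePlaneClient:
    """Toy two-call contract: the language model itself stays stateless."""

    def __init__(self):
        self.store = []

    def prepare_context(self, user_input):
        # Stand-in for goal-conditioned retrieval + bounded reconstruction.
        evidence = self.store[-3:]
        return "\n".join(f"[evidence] {e}" for e in evidence) + "\n" + user_input

    def commit_outcome(self, goal, action, outcome, rationale):
        # Stand-in for the write gate + episodic encoding.
        self.store.append(f"{goal}: {action} -> {outcome} ({rationale})")

def turn(client, llm, user_input):
    """One interaction turn: PrepareContext -> model call -> CommitOutcome."""
    ctx = client.prepare_context(user_input)
    reply = llm(ctx)                 # any stateless model call fits here
    client.commit_outcome(goal=user_input, action="reply",
                          outcome=reply, rationale="model output")
    return reply
```

Because the model only ever receives the output of `prepare_context`, swapping in a different LLM, or a fine‑tuned SLM, requires no change to the state plane.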
Limitations include reliance on accurate salience estimation, latency introduced by write‑gate checks, the need for cooperation between the agent and StatePlane, and benchmark dependence on domain definitions. Ethical considerations are addressed through transparent audit trails, user‑scoped deletion rights, and default PII minimization, satisfying GDPR‑type regulations.
In conclusion, StatePlane demonstrates that long‑horizon intelligence can be achieved under strict context constraints by externalizing cognitive state rather than expanding the context window. By governing episodic, semantic, and procedural state through selective encoding, goal‑conditioned retrieval, bounded reconstruction, and robust security, the system delivers consistent reasoning, policy compliance, and enterprise‑grade governance while remaining model‑agnostic. This work offers a practical roadmap for building scalable, secure, and evolvable AI agents that operate over unbounded interaction horizons.