Polaris: A Gödel Agent Framework for Small Language Models through Experience-Abstracted Policy Repair
Gödel agents realize recursive self-improvement: an agent inspects its own policy and execution traces, then modifies that policy in a tested loop. We introduce Polaris, a Gödel agent for compact models that performs policy repair via experience abstraction, turning failures into policy updates through a structured cycle of analysis, strategy formation, abstraction, and minimal code-patch repair with conservative checks. Unlike response-level self-correction or parameter tuning, Polaris makes policy-level changes through small, auditable patches that persist in the policy and are reused on unseen instances within each benchmark. As part of the loop, the agent engages in meta-reasoning: it explains its errors, proposes concrete revisions to its own policy, and then applies them. To enable cumulative policy refinement, we introduce experience abstraction, which distills failures into compact, reusable strategies that transfer to unseen instances. On MGSM, DROP, GPQA, and LitBench (covering arithmetic reasoning, compositional inference, graduate-level problem solving, and creative-writing evaluation), a 7-billion-parameter model equipped with Polaris achieves consistent gains over the base policy and competitive baselines.
💡 Research Summary
Polaris introduces a practical recursive self‑improvement framework tailored for small language models (≈7 B parameters). Building on the Gödel Agent concept, which enables agents to inspect and mutate their own policy code, Polaris addresses the prohibitive memory and compute demands that arise when scaling this approach to large contexts. The key innovation is “experience abstraction”: after each evaluation round the agent collects a small set of failed validation examples (N = 3–5), runs a structured failure‑analysis prompt to produce a diagnosis‑revision‑prevention tuple for each case, and then abstracts these reflections into a compact set of reusable repair strategies (e.g., decomposition, normalization, control‑flow adjustments). Each strategy is instantiated as a minimal code patch that modifies only the necessary lines of the policy. A lightweight validator checks syntax and executability before the patch is merged; failed merges are retried up to three times and permanently recorded for future analysis.
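The reflect-abstract-patch-validate cycle described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `Reflection` tuple, the `patch_fn` callback, and the retry bookkeeping are hypothetical names standing in for Polaris's structured prompts, and the validator here checks only that the patched policy parses and compiles, as a stand-in for the paper's syntax-and-executability check.

```python
import ast
from dataclasses import dataclass


@dataclass
class Reflection:
    """Diagnosis-revision-prevention tuple for one failed example (illustrative names)."""
    diagnosis: str   # what went wrong
    revision: str    # concrete change to the policy
    prevention: str  # how to avoid the failure class in future


def validate_patch(source: str) -> bool:
    """Lightweight validator: the patched policy must parse and compile."""
    try:
        ast.parse(source)
        compile(source, "<policy>", "exec")
        return True
    except SyntaxError:
        return False


def try_merge(policy_source: str, patch_fn, max_retries: int = 3):
    """Apply a candidate patch, retrying up to three times on validation failure.

    Returns (merged_source, attempt_log). On exhaustion the original policy
    is kept unchanged and the failed attempts remain logged for later analysis.
    """
    log = []
    for attempt in range(1, max_retries + 1):
        candidate = patch_fn(policy_source, attempt)
        ok = validate_patch(candidate)
        log.append((attempt, ok))
        if ok:
            return candidate, log
    return policy_source, log
```

The conservative default matters: a patch that fails validation never reaches the live policy, so the worst case of a bad repair round is a no-op plus a recorded failure, not a broken agent.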
Polaris keeps the policy as executable Python code, allowing runtime mutation without full retraining. By limiting context growth, discarding unnecessary tool‑call histories, and using small validation batches, the system runs on two V100 GPUs for a 10‑hour autonomous evolution window. Experiments on four diverse benchmarks—MGSM (math word problems), DROP (reading comprehension), GPQA (graduate‑level factual QA), and LitBench (creative writing)—show consistent improvements over the base Qwen‑2.5‑7B‑INSTRUCT model, typically 2–5 percentage points in accuracy or macro‑F1. Qualitative analysis of the generated patches demonstrates that the same abstracted strategies are reapplied across multiple tasks, preventing repeat failures and providing clear audit trails.
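Keeping the policy as executable Python is what makes runtime mutation cheap: swapping in a repaired policy is just re-executing source code, with no retraining. A minimal sketch of that mechanism, assuming a single `solve` entry point (the function name and the toy "normalization" strategy are illustrative, not taken from the paper):

```python
ORIGINAL_POLICY = """
def solve(question):
    # base policy: return the question verbatim (stand-in for the model call)
    return question
"""

PATCHED_POLICY = """
def solve(question):
    # repaired policy: collapse whitespace first (an abstracted normalization strategy)
    return " ".join(question.split())
"""


def load_policy(source: str):
    """Execute policy source in a fresh namespace and return its `solve` entry point."""
    namespace = {}
    exec(source, namespace)
    return namespace["solve"]


solve = load_policy(ORIGINAL_POLICY)
solve = load_policy(PATCHED_POLICY)  # runtime mutation: hot-swap the repaired policy
```

Because each patch is plain source text, the diff between `ORIGINAL_POLICY` and `PATCHED_POLICY` doubles as the audit trail the summary mentions.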
The paper’s contributions are threefold: (1) a concrete method for recursive policy repair that works with resource‑constrained models, (2) an empirical analysis of the bottlenecks in the original Gödel Agent and a solution that controls context size while preserving traceability, and (3) extensive evaluation across arithmetic, compositional inference, advanced factual reasoning, and open‑ended creative tasks, confirming that policy‑level changes are both effective and interpretable. Limitations include reliance on static validation sets (no open‑ended continual learning) and manually crafted prompts for strategy synthesis. Future directions suggested are building a growing library of abstracted strategies, extending the framework to multimodal tools, and exploring longer‑term self‑evolution dynamics.