LongCat-Flash-Thinking-2601 Technical Report


We introduce LongCat-Flash-Thinking-2601, a 560-billion-parameter open-source Mixture-of-Experts (MoE) reasoning model with superior agentic reasoning capability. LongCat-Flash-Thinking-2601 achieves state-of-the-art performance among open-source models on a wide range of agentic benchmarks, including agentic search, agentic tool use, and tool-integrated reasoning. Beyond benchmark performance, the model demonstrates strong generalization to complex tool interactions and robust behavior under noisy real-world environments. Its advanced capability stems from a unified training framework that combines domain-parallel expert training with subsequent fusion, together with an end-to-end co-design of data construction, environments, algorithms, and infrastructure spanning from pre-training to post-training. In particular, the model’s strong generalization capability in complex tool use is driven by our in-depth exploration of environment scaling and principled task construction. To optimize long-tailed, skewed generation and multi-turn agentic interactions, and to enable stable training across over 10,000 environments spanning more than 20 domains, we systematically extend our asynchronous reinforcement learning framework, DORA, for stable and efficient large-scale multi-environment training. Furthermore, recognizing that real-world tasks are inherently noisy, we conduct a systematic analysis and decomposition of real-world noise patterns, and design targeted training procedures to explicitly incorporate such imperfections into the training process, resulting in improved robustness for real-world applications. To further enhance performance on complex reasoning tasks, we introduce a Heavy Thinking mode that enables effective test-time scaling by jointly expanding reasoning depth and width through intensive parallel thinking.


💡 Research Summary

LongCat‑Flash‑Thinking‑2601 is a 560‑billion‑parameter open‑source Mixture‑of‑Experts (MoE) model designed specifically for agentic reasoning tasks such as autonomous search, tool use, and tool‑integrated reasoning. Building on the pre‑training recipe of LongCat‑Flash‑Chat, the authors preserve strong general‑purpose language and reasoning abilities while adding two novel stages: a mid‑training phase that injects synthetic agentic trajectories and a post‑training reinforcement‑learning (RL) phase that scales across thousands of heterogeneous environments.

In the mid‑training stage, the authors address the scarcity of real‑world long‑horizon agentic data by combining text‑driven synthesis and environment‑grounded synthesis. Text‑driven synthesis mines procedural knowledge from large text corpora, extracts implicit tool calls, and restructures them into multi‑turn user‑assistant interactions. Two augmentation techniques—tool decomposition (hiding parameters in the environment) and reasoning decomposition (generating alternative action candidates)—increase trajectory complexity and encourage decision‑making. Environment‑grounded synthesis constructs lightweight, verifiable Python environments for each tool set, models dependencies as a directed graph, samples valid tool chains, and validates every generated trajectory by executing the code. A planning‑oriented augmentation further transforms trajectories into decision‑making processes, providing supervision for coarse‑to‑fine planning.
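The environment-grounded synthesis described above can be sketched in a few lines. The following is a hypothetical illustration, not the paper's actual pipeline: the tool registry, tool names, and goal check are invented stand-ins. It shows the core idea of modeling tool dependencies as a directed graph, sampling a chain in dependency order, and validating the trajectory by actually executing stub implementations.

```python
import random

# Toy tool registry: name -> (dependencies, stub implementation).
# The tools and their semantics are illustrative assumptions.
TOOLS = {
    "search_flights": ([], lambda state: state.update(flights=["MU123", "CA456"])),
    "select_flight": (["search_flights"], lambda state: state.update(flight=state["flights"][0])),
    "book_ticket": (["select_flight"], lambda state: state.update(ticket=f"TKT-{state['flight']}")),
}

def sample_tool_chain(rng: random.Random) -> list[str]:
    """Sample a tool chain that respects the dependency graph."""
    chain, satisfied = [], set()
    remaining = set(TOOLS)
    while remaining:
        # A tool is "ready" once all of its dependencies have been called.
        ready = [t for t in sorted(remaining) if set(TOOLS[t][0]) <= satisfied]
        if not ready:
            break  # no executable tool left
        tool = rng.choice(ready)
        chain.append(tool)
        satisfied.add(tool)
        remaining.remove(tool)
    return chain

def validate_by_execution(chain: list[str]) -> bool:
    """Replay the chain in a fresh environment; reject on any failure."""
    state: dict = {}
    try:
        for tool in chain:
            TOOLS[tool][1](state)
        return "ticket" in state  # did the trajectory reach its goal?
    except Exception:
        return False

chain = sample_tool_chain(random.Random(0))
print(chain, validate_by_execution(chain))
```

Only trajectories that survive this execute-and-check step would be kept as training data, which is what makes the synthesized environments "verifiable."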

To enable large‑scale agentic RL, the authors develop an automated environment scaling pipeline that produces over 10,000 executable environments spanning more than 20 domains. They extend their asynchronous rollout system, Dynamic ORchestration for Asynchronous Rollout (DORA), to support up to 32,000 concurrent environments, ensuring high‑throughput, long‑horizon rollouts. Recognizing that real‑world deployments are noisy, the paper presents a systematic analysis of environmental noise (parameter drift, API latency, error responses, etc.) and injects multi‑type, multi‑level noise into training environments via a curriculum‑based RL schedule. This improves robustness when the model encounters imperfect feedback.
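The curriculum-based noise injection can be sketched as a wrapper around a clean tool environment. This is a minimal sketch under assumptions: the noise types mirror those named above (error responses, latency, parameter drift), but the linear schedule, probabilities, and response formats are invented for illustration rather than taken from the paper.

```python
import random

NOISE_TYPES = ("error_response", "latency", "parameter_drift")

def noise_level(step: int, total_steps: int, max_level: float = 0.3) -> float:
    """Linear curriculum (assumed schedule): noise ramps from 0 to max_level."""
    return max_level * min(1.0, step / total_steps)

class NoisyToolEnv:
    """Wrap a clean tool and perturb a fraction of its calls."""

    def __init__(self, base_tool, step: int, total_steps: int, seed: int = 0):
        self.base_tool = base_tool
        self.p = noise_level(step, total_steps)
        self.rng = random.Random(seed)

    def call(self, **kwargs):
        if self.rng.random() < self.p:
            kind = self.rng.choice(NOISE_TYPES)
            if kind == "error_response":
                return {"status": 500, "error": "internal tool error"}
            if kind == "latency":
                return {"status": 408, "error": "tool call timed out"}
            # parameter_drift: silently perturb one argument before the call.
            if kwargs:
                key = self.rng.choice(sorted(kwargs))
                kwargs[key] = f"{kwargs[key]}_drifted"
        return {"status": 200, "result": self.base_tool(**kwargs)}

# Early in training the wrapper is nearly transparent; late in training
# a growing fraction of calls return imperfect feedback.
env = NoisyToolEnv(lambda city: f"weather in {city}", step=0, total_steps=1000)
print(env.call(city="Beijing"))
```

Training against such wrappers forces the policy to retry, re-plan, or sanity-check tool outputs rather than assume every call succeeds.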

A distinctive “Heavy Thinking” test‑time mode is introduced to scale reasoning depth and width simultaneously. The model generates multiple parallel reasoning paths, aggregates intermediate results, and iteratively refines its answer, effectively expanding both the breadth of exploration and the depth of deliberation. An additional RL stage reinforces the ability to combine and refine intermediate outcomes.
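The depth-and-width scaling loop can be sketched as follows. This is a hypothetical skeleton: `sample_paths` and `refine` stand in for model calls, and majority voting is one simple aggregation choice, not necessarily the aggregation the model learns.

```python
from collections import Counter

def heavy_thinking(sample_paths, refine, question, width=8, depth=3):
    """Width: sample parallel reasoning paths; depth: iterate on the aggregate."""
    context = None
    for _ in range(depth):
        # Width: several independent reasoning paths per round.
        answers = sample_paths(question, context, n=width)
        # Aggregate intermediate results (majority vote for simplicity).
        best, _ = Counter(answers).most_common(1)[0]
        # Depth: condition the next round on the aggregated answer.
        context = refine(question, best)
    return best

# Toy stand-ins: a "model" whose answers stabilize once given context.
def sample_paths(q, ctx, n):
    return ["42"] * n if ctx else ["41", "42", "42", "40"] * (n // 4)

def refine(q, answer):
    return f"previous best answer: {answer}"

print(heavy_thinking(sample_paths, refine, "ultimate question"))
```

The additional RL stage mentioned above would then train the model itself to perform the aggregate-and-refine step, rather than relying on a fixed rule like majority voting.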

Empirical results show state‑of‑the‑art performance among open‑source models on several agentic benchmarks: 73.1% on BrowseComp, 77.7% on RWSearch, 88.2% on τ²‑Bench, and 29.3% on VITA‑Bench (a noisy real‑world benchmark). These numbers surpass prior open‑source baselines and approach proprietary systems. All model checkpoints, code, and environment specifications are released publicly, facilitating further research and real‑world applications.

In summary, LongCat‑Flash‑Thinking‑2601 advances the field by (1) integrating domain‑parallel expert training with a fusion stage to keep MoE efficiency, (2) constructing a massive, diverse set of executable environments and scaling asynchronous RL infrastructure, (3) explicitly modeling and training on realistic environmental noise, and (4) providing a test‑time Heavy Thinking mode for scalable reasoning. The combination of data synthesis, environment engineering, robust RL, and test‑time scaling yields a model that not only excels on benchmark tasks but also demonstrates strong generalization to complex, noisy, real‑world agentic scenarios.

