EffGen: Enabling Small Language Models as Capable Autonomous Agents


Most existing agentic systems for language models are built and optimized for large language models (e.g., GPT, Claude, Gemini) accessed via API calls. While powerful, this approach faces several limitations, including high token costs and privacy concerns for sensitive applications. We introduce effGen, an open-source agentic framework optimized for small language models (SLMs) that enables effective, efficient, and secure local deployment (pip install effgen). effGen makes four major contributions: (1) Enhanced tool-calling with prompt optimization that compresses contexts by 70-80% while preserving task semantics, (2) Intelligent task decomposition that breaks complex queries into parallel or sequential subtasks based on dependencies, (3) Complexity-based routing using five factors to make smart pre-execution decisions, and (4) Unified memory system combining short-term, long-term, and vector-based storage. Additionally, effGen unifies multiple agent protocols (MCP, A2A, ACP) for cross-protocol communication. Results on 13 benchmarks show effGen outperforms LangChain, AutoGen, and Smolagents with higher success rates, faster execution, and lower memory usage. Our results reveal that prompt optimization and complexity routing have complementary scaling behavior: optimization benefits SLMs more (11.2% gain at 1.5B vs 2.4% at 32B), while routing benefits large models more (3.6% at 1.5B vs 7.9% at 32B), providing consistent gains across all scales when combined. effGen (https://effgen.org/) is released under the MIT License, ensuring broad accessibility for research and commercial use. Our framework code is publicly available at https://github.com/ctrl-gaurav/effGen.


💡 Research Summary

The paper introduces effGen, an open‑source agentic framework specifically engineered for small language models (SLMs) to address the high token costs, latency, and privacy concerns of large language model (LLM)‑centric systems such as GPT, Claude, or Gemini. effGen installs locally via a single pip command and provides four core innovations.

First, a prompt‑optimization pipeline compresses input prompts by 70‑80% while preserving semantic content. The optimizer applies a series of rule‑based transformations (pattern replacement, sentence splitting, redundancy removal, bullet‑point formatting, and context truncation), adjusted dynamically according to the target model’s size category (Tiny, Small, Medium, Large).

Second, a pre‑execution complexity analyzer computes a score C(q) from five weighted factors: task length, number of requirements, domain breadth, tool requirements, and reasoning depth. Scores range from 0 to 10; a threshold τ≈7 determines whether a query is handled by a single agent or routed to a multi‑agent workflow.

Third, when a query is routed to multiple agents, an intelligent task‑decomposition module builds a dependency graph and chooses between parallel and sequential execution. Independent subtasks are dispatched in parallel, while dependent subtasks are executed sequentially with context propagation, dramatically reducing overall latency.

Fourth, a three‑tier memory architecture combines short‑term history, long‑term episodic storage, and vector‑based semantic retrieval, allowing SLMs with limited context windows to retrieve relevant information on demand.

effGen also unifies three major communication protocols, the Model Context Protocol (MCP), Agent‑to‑Agent (A2A), and the Agent Communication Protocol (ACP), into a single interface, enabling seamless interoperability with heterogeneous agent ecosystems.
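The rule‑based compression pipeline described above can be sketched in a few lines of Python. The specific rules, size categories, and word budgets below are illustrative assumptions for exposition, not effGen’s actual implementation:

```python
import re

# Hypothetical word budgets per model size category (illustrative values).
SIZE_BUDGETS = {"tiny": 256, "small": 512, "medium": 1024, "large": 2048}

# Example pattern-replacement rules; a real rule set would be far larger.
REPLACEMENTS = [
    (r"\bin order to\b", "to"),
    (r"\bplease note that\b", ""),
    (r"\s{2,}", " "),
]

def compress_prompt(prompt: str, size_category: str = "small") -> str:
    """Apply pattern replacement, redundancy removal, and context truncation."""
    text = prompt
    # 1. Pattern replacement: rewrite verbose phrasings.
    for pattern, repl in REPLACEMENTS:
        text = re.sub(pattern, repl, text, flags=re.IGNORECASE)
    # 2. Redundancy removal: drop exact duplicate sentences.
    seen, kept = set(), []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        key = sentence.lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence)
    text = " ".join(kept)
    # 3. Context truncation: cap the prompt at the category's word budget.
    budget = SIZE_BUDGETS[size_category]
    return " ".join(text.split()[:budget])
```

A smaller target model gets a tighter budget, so the same prompt is compressed more aggressively for a Tiny model than for a Large one.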
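The five‑factor complexity score and its routing threshold can likewise be sketched. The weights and per‑factor 0‑10 scale below are assumptions; the summary specifies only that C(q) combines five weighted factors, ranges from 0 to 10, and is compared against τ≈7:

```python
# Assumed weights for the five factors (must sum to 1 so C(q) stays in 0-10
# when each factor is scored on a 0-10 scale); not effGen's actual values.
WEIGHTS = {
    "task_length": 0.15,
    "requirements": 0.25,
    "domain_breadth": 0.20,
    "tool_requirements": 0.20,
    "reasoning_depth": 0.20,
}

def complexity_score(factors: dict) -> float:
    """C(q): weighted sum of five factors, each scored on a 0-10 scale."""
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)

def route(factors: dict, tau: float = 7.0) -> str:
    """Single agent below the threshold, multi-agent workflow at or above it."""
    return "multi_agent" if complexity_score(factors) >= tau else "single_agent"
```

Because the decision is made before execution, a simple query never pays the coordination overhead of a multi‑agent workflow.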
Empirical evaluation on 13 benchmarks covering tool‑calling, reasoning, and memory‑intensive tasks was performed using Qwen2.5 models ranging from 1.5B to 32B parameters. Compared with LangChain, AutoGen, and Smolagents, effGen achieved a 13.1% average improvement in success rate for 1.5B models and a 6.0% gain for 32B models, up to 18× faster execution, and substantially lower memory consumption. Notably, prompt optimization contributed an 11.2% boost for the smallest models, while complexity‑based routing offered a 7.9% boost for the largest models, demonstrating complementary scaling behavior. The framework’s code and data are released under the MIT license, facilitating both academic research and commercial deployment. In sum, effGen demonstrates that with careful architectural design (prompt compression, pre‑execution routing, intelligent decomposition, and unified memory) small language models can match or exceed the performance of larger counterparts on agentic tasks while retaining their inherent efficiency and privacy advantages.

