AsynDBT: Asynchronous Distributed Bilevel Tuning for Efficient In-Context Learning with Large Language Models
With the rapid development of large language models (LLMs), an increasing number of applications leverage cloud-based LLM APIs to reduce usage costs. However, since the parameters and gradients of cloud-based models are inaccessible, users must adjust prompts manually or with heuristic algorithms to steer LLM outputs, which requires costly optimization procedures. In-context learning (ICL) has recently emerged as a promising paradigm that enables LLMs to adapt to new tasks using examples provided within the input, eliminating the need for parameter updates. Nevertheless, the advancement of ICL is often hindered by the lack of high-quality data, which is often sensitive and difficult to share. Federated learning (FL) offers a potential solution by enabling collaborative training of distributed LLMs while preserving data privacy. Despite this potential, previous FL approaches that incorporate ICL have struggled with severe straggler problems and challenges associated with heterogeneous, non-identically distributed data. To address these problems, we propose an asynchronous distributed bilevel tuning (AsynDBT) algorithm that optimizes both in-context learning samples and prompt fragments based on feedback from the LLM, thereby enhancing downstream task performance. Benefiting from its distributed architecture, AsynDBT provides privacy protection and adaptability to heterogeneous computing environments. Furthermore, we present a theoretical analysis establishing the convergence guarantees of the proposed algorithm. Extensive experiments conducted on multiple benchmark datasets demonstrate the effectiveness and efficiency of AsynDBT.
💡 Research Summary
The paper addresses a pressing challenge in the era of cloud‑based large language model (LLM) APIs: how to improve downstream task performance without access to model parameters or gradients. While in‑context learning (ICL) offers a parameter‑free way to adapt LLMs by providing demonstrations and prompts, its effectiveness hinges on the quality of those demonstrations, which are often private, costly to obtain, or heterogeneous across users. Existing attempts to combine ICL with federated learning (FL) suffer from severe straggler problems, device heterogeneity, and vulnerability to malicious participants who can inject low‑quality or adversarial examples.
To overcome these limitations, the authors propose AsynDBT (Asynchronous Distributed Bilevel Tuning), a novel framework that treats ICL as a bilevel optimization problem. The upper level optimizes the selection probabilities of in‑context demonstration samples (denoted by q), while the lower level optimizes the token‑wise categorical distribution of a prompt fragment (denoted by p). Both levels rely solely on black‑box feedback from the LLM (e.g., cross‑entropy loss on the model’s predictions), eliminating the need for gradient access.
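Since the summary gives no pseudocode, the following is a minimal sketch of what tuning the two categorical distributions from black-box feedback alone could look like. It uses a REINFORCE-style score-function estimator in place of true gradients; the loss function, learning rate, and all dimensions are hypothetical stand-ins for the LLM's cross-entropy feedback, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def blackbox_loss(prompt_token, demo_idx):
    """Hypothetical stand-in for the LLM API's cross-entropy feedback.

    Here we simply pretend token 2 and demonstration 1 suit the task best.
    """
    return 0.1 * (prompt_token - 2) ** 2 + 0.1 * (demo_idx - 1) ** 2

# q: selection probabilities over 4 candidate demonstrations (upper level)
# p: categorical distribution over 5 tokens for one prompt slot (lower level)
q_logits = np.zeros(4)
p_logits = np.zeros(5)

lr = 0.5
for step in range(300):
    q, p = softmax(q_logits), softmax(p_logits)
    # Sample one configuration and query the black box for its loss.
    d = rng.choice(4, p=q)
    t = rng.choice(5, p=p)
    loss = blackbox_loss(t, d)
    # Score-function (REINFORCE-style) gradient estimate of E[loss]
    # w.r.t. the logits: no access to model gradients is required.
    q_logits -= lr * loss * (np.eye(4)[d] - q)
    p_logits -= lr * loss * (np.eye(5)[t] - p)
```

Because high-loss samples push their own logits down, both distributions gradually concentrate probability mass on low-loss demonstrations and tokens using only scalar feedback.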
The system is built on a classic FL architecture with a parameter server and R workers, of which Nₙ are benign and B are potentially malicious. Each worker holds private data and maintains its own local copies of p and q. Workers asynchronously push updates of their lower‑level variables p(v) to the server, which aggregates them into consensus variables z using a robust averaging scheme that includes L1 regularization and clipping to mitigate poisoned updates. The upper‑level variables q(v) are kept local because they are inherently data‑dependent and do not require global consensus.
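A toy illustration of the kind of server-side robust averaging described above, combining per-worker clipping with an L1 proximal (soft-threshold) step so a poisoned push has bounded influence. The function name, step sizes, and thresholds are illustrative assumptions, not the paper's exact aggregation rule.

```python
import numpy as np

def robust_aggregate(z, updates, clip_norm=1.0, l1_lambda=0.01, lr=0.1):
    """Sketch of robust consensus aggregation (hypothetical parameters).

    Each worker's pushed variable is turned into a delta from the current
    consensus z, clipped in norm, averaged, and then shrunk by the
    soft-thresholding operator (the prox of an L1 penalty).
    """
    clipped = []
    for u in updates:
        delta = u - z
        norm = np.linalg.norm(delta)
        if norm > clip_norm:
            delta = delta * (clip_norm / norm)  # bound each worker's influence
        clipped.append(delta)
    z_new = z + lr * np.mean(clipped, axis=0)
    # Soft-thresholding damps small spurious components of the consensus.
    return np.sign(z_new) * np.maximum(np.abs(z_new) - l1_lambda, 0.0)

z = np.zeros(3)
benign = [np.array([0.5, 0.5, 0.5])] * 9
malicious = [np.array([100.0, -100.0, 100.0])]  # one poisoned push
z = robust_aggregate(z, benign + malicious)
```

Without clipping, the single malicious worker would drag the mean by roughly ±10 per coordinate; with clipping its contribution is bounded by `clip_norm / R`, matching the robustness goal stated above.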
A key technical contribution is the transformation of the bilevel problem into a single‑level one via polyhedral approximation of the lower‑level feasible set. The lower‑level solution is approximated by K‑step gradient descent, which is feasible even when the LLM is a black box. The authors provide a rigorous convergence analysis for the asynchronous updates, showing that under bounded staleness and reasonable step‑size choices, the algorithm converges to a stationary point. The convergence rate depends on the number of workers, communication delay, and the number of inner‑loop steps K.
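The single-level reduction can be sketched numerically as follows, with hypothetical quadratic stand-ins for the black-box losses, two-point zeroth-order estimates replacing the unavailable model gradients, and K-step inner gradient descent approximating the lower-level solution; the polyhedral constraint handling is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

def lower_loss(p, q):   # hypothetical stand-in for black-box LLM feedback
    return np.sum((p - q) ** 2)

def upper_loss(p, q):   # hypothetical upper-level objective
    return np.sum((q - 1.0) ** 2) + 0.1 * np.sum(p ** 2)

def zo_grad(f, x, mu=1e-3, samples=8):
    """Two-point zeroth-order gradient estimate from function values only."""
    g = np.zeros_like(x)
    for _ in range(samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / samples

def approx_lower_solution(q, K=10, eta=0.1):
    """K-step gradient descent standing in for the exact lower-level solve."""
    p = np.zeros_like(q)
    for _ in range(K):
        p -= eta * zo_grad(lambda v: lower_loss(v, q), p)
    return p

# Single-level outer loop: plug the K-step approximation into the upper level.
q = np.full(3, 5.0)
for _ in range(100):
    p = approx_lower_solution(q)
    q -= 0.1 * zo_grad(lambda v: upper_loss(p, v), q)
```

Larger K tightens the approximation of the true lower-level minimizer at the cost of more black-box queries per outer step, which is the trade-off the convergence rate's dependence on K reflects.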
Empirical evaluation spans five benchmark NLP tasks (sentiment analysis, news classification, question answering, etc.) and a real‑world 5G AIOps scenario. AsynDBT is compared against state‑of‑the‑art prompt‑tuning methods (OPR, APO, RLPrompt) and demonstration‑selection baselines (nearest‑neighbor, Cover‑LS, AutoCoT). Results show that AsynDBT consistently outperforms baselines by 6–10 % absolute accuracy and reduces training time by over 30 % in heterogeneous environments. Moreover, when up to 20 % of workers are malicious, performance degradation remains marginal, confirming the robustness of the proposed aggregation scheme.
In summary, the paper makes four major contributions: (1) formalizing ICL as a bilevel black‑box optimization problem that jointly tunes prompts and demonstrations; (2) designing an asynchronous distributed algorithm that alleviates stragglers and accommodates device heterogeneity; (3) introducing a regularization‑based robust aggregation to defend against data‑poisoning attacks; and (4) providing theoretical convergence guarantees together with extensive empirical validation. AsynDBT opens a practical pathway for cost‑effective, privacy‑preserving, and resilient in‑context learning in real‑world LLM‑as‑a‑service deployments, and its design principles are readily extensible to multimodal models and larger federated ecosystems.