FedLoDrop: Federated LoRA with Dropout for Generalized LLM Fine-tuning
Fine-tuning (FT) large language models (LLMs) is crucial for adapting general-purpose models to specific tasks, enhancing accuracy and relevance with minimal resources. To further enhance generalization ability while reducing training costs, this paper proposes Federated LoRA with Dropout (FedLoDrop), a new framework that applies dropout to the rows and columns of the trainable matrix in Federated LoRA. A generalization error bound and a convergence analysis under sparsity regularization are derived, which elucidate the fundamental trade-off between underfitting and overfitting. The error bound reveals that a higher dropout rate increases model sparsity, thereby lowering the upper bound of pointwise hypothesis stability (PHS). While this reduces the gap between the empirical and generalization errors, it also incurs a higher empirical error, which, together with the gap, determines the overall generalization error. On the other hand, although dropout reduces communication costs, deploying FedLoDrop at the network edge still faces challenges due to limited network resources. To address this issue, an optimization problem is formulated that minimizes the upper bound of the generalization error by jointly optimizing the dropout rate and resource allocation subject to latency and per-device energy-consumption constraints. To solve this problem, a branch-and-bound (B&B)-based method is proposed to obtain the globally optimal solution. Moreover, to reduce the high computational complexity of the B&B-based method, a penalized successive convex approximation (P-SCA)-based algorithm is proposed to efficiently obtain a high-quality suboptimal solution. Finally, numerical results demonstrate the effectiveness of the proposed approach in mitigating overfitting and improving generalization capability.
💡 Research Summary
The paper introduces FedLoDrop, a novel framework that integrates Low‑Rank Adaptation (LoRA) with dropout in a federated learning (FL) setting to improve the generalization of fine‑tuned large language models (LLMs) while reducing communication overhead. Traditional LoRA fine‑tunes only the low‑rank matrices A and B added to a frozen pre‑trained model, dramatically cutting the number of trainable parameters, as sketched below. However, when LoRA is deployed across many edge devices with heterogeneous data, overfitting can still occur despite the reduced parameter count.
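To make the LoRA setup concrete, here is a minimal PyTorch sketch of a linear layer augmented with a trainable low‑rank update; the class name `LoRALinear`, the rank `r`, and the scaling factor `alpha` are illustrative choices, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # the pre-trained weights stay frozen
        d_out, d_in = base.weight.shape
        # Only A (r x d_in) and B (d_out x r) are trained and communicated.
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))  # B = 0 so the update starts at zero
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output = frozen base layer + scaled low-rank correction.
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)
```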
FedLoDrop tackles this by applying dropout not to the activations of the network but directly to the rows and columns of the trainable LoRA matrices on each client. For client k in round t, a dropout rate γ_{k,t} ∈ [0, 1) specifies the probability with which each row and column of the trainable LoRA matrices is zeroed out during that round's local update.
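A minimal sketch of how this row/column dropout could be realized is given below. It assumes a single mask over the rank dimension shared by A and B (so that dropping row i of A together with column i of B removes the i‑th rank‑one component of B @ A) and inverted‑dropout rescaling of the kept entries; the function name, the shared mask, and the rescaling are assumptions for illustration, not the paper's exact procedure.

```python
import torch

def lora_row_col_dropout(A: torch.Tensor, B: torch.Tensor, gamma: float):
    """Randomly zero rows of A and columns of B with probability gamma.

    A: (r, d_in) LoRA down-projection; B: (d_out, r) LoRA up-projection.
    Rows of A and columns of B both index the rank dimension, so one
    shared mask drops whole rank-one components of the update B @ A.
    """
    keep = 1.0 - gamma
    # Bernoulli keep-mask over the rank dimension, resampled each round.
    mask = (torch.rand(A.shape[0], device=A.device) < keep).float()
    scale = mask / keep                      # inverted-dropout rescaling
    A_dropped = A * scale.unsqueeze(1)       # zero entire rows of A
    B_dropped = B * scale.unsqueeze(0)       # zero entire columns of B
    return A_dropped, B_dropped
```

A higher γ_{k,t} sparsifies the update a client trains and transmits, which is the mechanism behind both the stability gain and the communication savings described in the abstract.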