FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment


Parameter-efficient fine-tuning techniques such as low-rank adaptation (LoRA) enable large language models (LLMs) to adapt to downstream tasks efficiently. Federated learning (FL) further facilitates this process by enabling collaborative fine-tuning across distributed clients without sharing private data. However, the use of two separate low-rank matrices in LoRA for federated fine-tuning introduces two types of challenges. First, aggregation error can arise from separately aggregating the two low-rank matrices. Second, even if the server aggregates the product of two low-rank matrices, it needs to decompose the aggregated matrix back into low-rank matrices. Since the decomposition is not unique, it can lead to decomposition drift. To tackle the aforementioned challenges, we propose federated low-rank Gram-matrix aggregation (FLoRG), a federated fine-tuning framework which employs a single low-rank matrix for fine-tuning and aggregates its Gram matrix (i.e., the matrix of inner products of its column vectors). FLoRG can eliminate the aggregation error and reduce the communication overhead. It also minimizes the decomposition drift by introducing a Procrustes alignment approach which aligns the decomposed matrix between consecutive fine-tuning rounds for consistent updates. We theoretically analyze the convergence of FLoRG and prove that adopting the Procrustes alignment results in a tighter convergence bound. Experimental results across multiple LLM fine-tuning benchmarks demonstrate that FLoRG outperforms five state-of-the-art baseline schemes by providing higher downstream task accuracy and can reduce the communication overhead by up to 2041×.


💡 Research Summary

The paper “FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment” introduces a novel framework designed to address critical challenges in federated fine-tuning of Large Language Models (LLMs) using parameter-efficient methods like Low-Rank Adaptation (LoRA).

Core Problem: Integrating LoRA with Federated Learning (FL) is promising for privacy-preserving, collaborative adaptation of LLMs. However, standard approaches face two major issues:

  1. Aggregation Error: When the central server separately averages the two low-rank matrices (B and A) from clients, the product of the averaged matrices is not equal to the average of the local products (B_n A_n). This mismatch introduces a systematic bias that accumulates over training rounds and degrades performance.
  2. Decomposition Drift: An alternative is to aggregate the product matrices (B_n A_n) directly and then decompose the result back into low-rank factors. However, matrix decomposition (e.g., via SVD) is not unique, especially for rank-deficient matrices or those with repeated eigenvalues. Different valid decompositions point subsequent rounds in different parameter directions, causing "decomposition drift" that destabilizes the training trajectory.
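The aggregation-error issue can be checked numerically. The sketch below (ours, not from the paper; dimensions and client count are arbitrary) uses random LoRA factors to show that the product of separately averaged factors differs from the average of the local products:

```python
# Illustrative sketch (not from the paper): averaging LoRA factors B and A
# separately does not equal averaging the local products B_n @ A_n.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, n_clients = 8, 6, 2, 3

Bs = [rng.standard_normal((d_out, r)) for _ in range(n_clients)]
As = [rng.standard_normal((r, d_in)) for _ in range(n_clients)]

# Server-side "separate averaging" of the two factors.
B_avg = np.mean(Bs, axis=0)
A_avg = np.mean(As, axis=0)
product_of_averages = B_avg @ A_avg

# The update clients actually intend: the average of the local products.
average_of_products = np.mean([B @ A for B, A in zip(Bs, As)], axis=0)

# The two disagree; this gap is the aggregation error described above.
error = np.linalg.norm(product_of_averages - average_of_products)
print(f"aggregation error (Frobenius norm): {error:.4f}")  # nonzero in general
```

Because matrix multiplication is bilinear rather than linear in the pair (B, A), the mismatch is structural and does not vanish as training proceeds.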

Proposed Solution - FLoRG: The authors propose FLoRG, a framework built on three key innovations to solve these problems.

  1. Single Low-Rank Matrix & Gram Matrix Aggregation: FLoRG re-parameterizes the fine-tuning update with a single low-rank matrix A. Clients update A locally and send it to the server, which forms each client's Gram matrix Q_n = A_n^T A_n and averages them. Because this average is linear in the Q_n, it recovers the exact mean of the local updates and eliminates the aggregation error entirely. Sending one matrix instead of two also reduces uplink communication overhead.
  2. Shared Semi-Orthogonal Basis (L, R): To make the single-matrix design compatible with weight matrices W of arbitrary dimensions (d_out x d_in), FLoRG uses fixed, shared matrices L (d_out x k) and R (k x d_in), where k = min(d_in, d_out). The actual model update is applied as ΔW = L Q R = L (A^T A) R. L and R are semi-orthogonal and remain frozen, providing a consistent subspace for the low-rank operations.
  3. Procrustes Alignment to Minimize Decomposition Drift: After aggregating Q, the server decomposes it (e.g., via eigendecomposition) to obtain a candidate matrix Â. Because this decomposition is not unique, FLoRG applies a Procrustes alignment: it finds an orthogonal matrix S that minimizes the Frobenius distance between SÂ and A_t, the factor from the previous round. This optimally rotates/reflects Â to lie as close as possible to the previous round's factor before it is used as A_{t+1}. The result stabilizes the update direction across rounds while preserving the information in the aggregated Gram matrix.

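The three steps above can be sketched as a single server round. This is our reconstruction under the summary's notation, not the paper's exact algorithm: the function name, the r × k shape convention for A, and the rank-r truncation of the aggregated Gram matrix are all assumptions.

```python
# Hedged sketch of one FLoRG-style server round, assuming each client factor
# A_n is r x k with Gram matrix Q_n = A_n^T A_n. Names are ours, not the paper's.
import numpy as np

def server_round(client_As, A_prev):
    """Aggregate Gram matrices, decompose, and Procrustes-align the result."""
    r, k = A_prev.shape

    # 1) Gram-matrix aggregation: a linear average, so no aggregation error.
    Q = np.mean([A.T @ A for A in client_As], axis=0)  # k x k, symmetric PSD

    # 2) Decompose Q back into a rank-r candidate factor A_hat.
    eigvals, eigvecs = np.linalg.eigh(Q)           # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:r]            # keep the r largest
    A_hat = np.sqrt(np.maximum(eigvals[top], 0))[:, None] * eigvecs[:, top].T

    # 3) Orthogonal Procrustes alignment: any rotation S @ A_hat has the same
    #    Gram matrix, so pick the S that lands closest to last round's factor.
    U, _, Vt = np.linalg.svd(A_prev @ A_hat.T)
    S = U @ Vt                                     # orthogonal r x r matrix
    return S @ A_hat                               # aligned A_{t+1}

rng = np.random.default_rng(1)
r, k, n_clients = 2, 5, 4
A_prev = rng.standard_normal((r, k))
client_As = [A_prev + 0.1 * rng.standard_normal((r, k)) for _ in range(n_clients)]

A_next = server_round(client_As, A_prev)
print(np.linalg.norm(A_next - A_prev))  # small: the aligned factor stays near A_prev
```

Note that S^T S = I, so A_next^T A_next = Â^T Â regardless of the alignment: Procrustes changes only the (arbitrary) orientation of the factor, never the aggregated Gram information it encodes.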
Theoretical and Empirical Validation: The authors provide a theoretical convergence analysis for FLoRG under non-convex loss settings, proving that the Procrustes alignment leads to a tighter convergence bound by controlling the decomposition drift. Extensive experiments are conducted on GLUE benchmarks (MRPC, QQP, MNLI, QNLI, WNLI, RTE). FLoRG is compared against five state-of-the-art federated fine-tuning baselines: FedIT, FeDeRA, FFA-LoRA, FedSA-LoRA, and FedEx-LoRA. The results demonstrate that FLoRG consistently achieves higher downstream task accuracy across different datasets and settings. Furthermore, it drastically reduces communication overhead—by up to 2041 times compared to some baseline schemes—due to its single-matrix transmission and Gram-matrix aggregation strategy.

In conclusion, FLoRG successfully tackles the fundamental limitations of federated LoRA by eliminating aggregation error and minimizing decomposition drift. It offers a more accurate, stable, and communication-efficient framework for fine-tuning LLMs in distributed, privacy-sensitive environments.

