RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Fine-tuning pre-trained models with custom data leads to numerous expert models on specific tasks. Merging these models into one universal model with multi-task ability, while avoiding data leakage, has gained popularity. With the expansion in data and model size, parameter-efficient tuning has become the common practice for obtaining task-specific models efficiently. However, few methods are dedicated to efficient merging, and existing methods designed for full fine-tuning merging fail under efficient merging. To address this issue, we analyze from low-rank decomposition and reveal that direction robustness during merging is crucial for merging efficient modules. We furthermore uncover that compensating for the gap between stark singular values contributes to direction robustness. Therefore, we propose RobustMerge, a training-free parameter-efficient merging method with complementary parameter adaptation to maintain direction robustness. Specifically, we (1) prune parameters and scale coefficients from inter-parameter relations for singular values to maintain direction stability away from task interference, and (2) perform cross-task normalization to enhance generalization to unseen tasks. We establish a benchmark consisting of diverse multimodal tasks, on which we conduct experiments to certify the outstanding performance and generalizability of our method. Additional studies and extensive analyses further showcase the effectiveness. Code is available at https://github.com/AuroraZengfh/RobustMerge.


💡 Research Summary

RobustMerge addresses the emerging need to combine multiple task‑specific expert models, fine‑tuned via parameter‑efficient tuning (PEFT) such as LoRA, into a single universal multimodal large language model (MLLM) without accessing any training data or extra storage. Existing model‑merging techniques were designed for full‑parameter fine‑tuning (FFT) and assume that the merged parameters share a relatively narrow distribution, making them vulnerable to sign conflict when multiple tasks are combined. In contrast, PEFT modules are low‑rank adapters trained on top of a frozen backbone, and their values exhibit a much wider distribution; the dominant source of interference is not sign conflict but direction instability of singular vectors, especially those associated with small singular values.

The authors first decompose each LoRA adapter (B·A) using singular value decomposition (SVD) to expose singular values (σ) and their corresponding left/right singular vectors. They observe “stark” singular values within a single task: a few large σ dominate, while many tail σ are tiny. When merging adapters from different tasks, large‑σ directions tend to stay stable, but the directions of small‑σ components are easily perturbed, leading to performance degradation on both seen and unseen tasks. This phenomenon motivates the notion of direction robustness – preserving the orientation of each singular vector during merging.
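The "stark" spectrum described above is easy to inspect numerically. Below is a minimal NumPy sketch (with illustrative shapes and random factors, not the paper's actual adapters) that forms a LoRA update B·A and examines its singular values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical LoRA adapter: low-rank factors B (d x r) and A (r x k),
# so the weight update is delta_W = B @ A. Shapes are illustrative.
d, k, r = 64, 64, 8
B = rng.normal(size=(d, r))
A = rng.normal(size=(r, k))
delta_W = B @ A

# SVD exposes the singular values (sigma) and the left/right singular
# vectors whose directions the merge should preserve.
U, sigma, Vt = np.linalg.svd(delta_W, full_matrices=False)

# A rank-r product has at most r nonzero singular values; within those,
# a few large head values dominate while the tail values are small, and
# the tail directions are the fragile ones under merging.
head = sigma[0]
tail = sigma[r - 1]
print(head / tail)  # gap between head and tail of the spectrum
```

Merging another task's adapter into `delta_W` perturbs the directions in `U` and `Vt` attached to the small tail values far more than the head directions, which is the instability the method targets.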

RobustMerge introduces two complementary, training‑free operations:

  1. Pruning and Scaling – Identify parameters whose singular values are excessively large relative to the rest of the matrix. Those parameters are either pruned (set to zero) or down‑scaled, while the remaining tail singular values are up‑scaled by a factor that is inversely proportional to their magnitude (e.g., scaling factor ∝ (σ_max/σ_i)^α). This reduces the gap between head and tail singular values, thereby strengthening the contribution of small‑σ directions without harming the dominant knowledge encoded in large‑σ components.

  2. Cross‑Task Normalization – After pruning/scaling, each task’s adapter is normalized to a common mean and variance across all adapters. This affine normalization aligns the overall scale of different adapters, mitigating the risk that a new, unseen task will be dominated by a subset of adapters with disproportionate magnitude. Crucially, this step requires no validation data, test‑time adaptation, or external storage, distinguishing RobustMerge from methods such as LoraHub or AdaMerging.
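The two operations above can be sketched as follows. This is a hedged illustration under simplifying assumptions, not the authors' exact formulation: the pruning criterion (smallest-magnitude entries), the scaling factor (σ_max/σ_i)^α, and the normalization target (a shared Frobenius norm) are plausible stand-ins for the quantities the paper computes:

```python
import numpy as np

def complementary_scale(delta_W, alpha=0.5, prune_ratio=0.1):
    """Sketch of per-task pruning and scaling: prune the
    smallest-magnitude entries, then rescale singular values with a
    factor proportional to (sigma_max / sigma_i) ** alpha to shrink
    the head-tail gap."""
    W = delta_W.copy()
    # Prune: zero out the smallest-magnitude fraction of entries.
    thresh = np.quantile(np.abs(W), prune_ratio)
    W[np.abs(W) < thresh] = 0.0
    # Scale: flatten the spectrum without reordering it
    # (s_i -> s_i^(1-alpha) * s_max^alpha), keeping directions fixed.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    factors = (s.max() / (s + 1e-8)) ** alpha
    return U @ np.diag(s * factors) @ Vt

def cross_task_normalize(adapters):
    """Sketch of cross-task normalization: rescale every task's
    adapter to a common Frobenius norm so no task dominates the merge."""
    norms = [np.linalg.norm(W) for W in adapters]
    target = float(np.mean(norms))
    return [W * (target / (n + 1e-8)) for W, n in zip(adapters, norms)]

# Toy adapters for three tasks with deliberately different magnitudes.
rng = np.random.default_rng(0)
adapters = [rng.normal(size=(32, 32)) * s for s in (0.5, 1.0, 3.0)]
adapters = [complementary_scale(W) for W in adapters]
adapters = cross_task_normalize(adapters)
merged = sum(adapters) / len(adapters)  # simple averaging merge
```

Note that the scaling step keeps the singular vectors of each adapter untouched and only adjusts magnitudes, which is the sense in which direction robustness is maintained; everything runs without training data or validation sets.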

The method is evaluated on a newly constructed benchmark comprising eight “seen” multimodal tasks (image captioning, video QA, math reasoning, etc.) and four “unseen” tasks from distinct domains. Compared with state‑of‑the‑art merging baselines—Task Arithmetic, Ties‑Merging, DARE, PCB‑Merging, and LoraHub—RobustMerge consistently improves performance: an average gain of 3.4 percentage points on seen tasks and 4.5 percentage points on unseen tasks. Detailed analysis shows that after scaling, the singular‑value spectrum becomes flatter, confirming that tail directions receive amplified influence. Distribution plots reveal that FFT adapters have a tight, low‑variance value range (making sign conflict prominent), whereas PEFT adapters spread over a wide range (making direction instability the primary bottleneck).

Ablation studies demonstrate that pruning alone or scaling alone yields modest improvements, but their combination delivers the largest boost, validating the authors' hypothesis about the importance of mitigating singular‑value gaps. Sensitivity experiments on the scaling exponent α, pruning threshold, and normalization strength β indicate that RobustMerge is robust to hyper‑parameter choices. Additional experiments on pure vision tasks (VQA, image classification) and across model sizes (7B to 13B parameters) confirm the method's generality.

In summary, RobustMerge provides a practical, data‑free solution for merging PEFT‑based expert models in large multimodal systems. By focusing on singular‑value direction robustness and compensating for head‑tail value disparities, it eliminates task interference while preserving the specialized knowledge of each adapter. This enables cost‑effective multi‑task deployment, respects data privacy, and opens avenues for future work on other PEFT paradigms (prompt tuning, adapters) and large‑scale federated merging scenarios.

