Demystifying Mergeability: Interpretable Properties to Predict Model Merging Success
Model merging combines knowledge from separately fine-tuned models, yet success factors remain poorly understood. While recent work treats mergeability as an intrinsic property, we show with an architecture-agnostic framework that it fundamentally depends on both the merging method and the partner tasks. Using linear optimization over a set of interpretable pairwise metrics (e.g., gradient L2 distance), we uncover properties correlating with post-merge performance across four merging methods. We find substantial variation in success drivers (46.7% metric overlap; 55.3% sign agreement), revealing method-specific “fingerprints”. Crucially, however, subspace overlap and gradient alignment metrics consistently emerge as foundational, method-agnostic prerequisites for compatibility. These findings provide a diagnostic foundation for understanding mergeability and motivate future fine-tuning strategies that explicitly encourage these properties.
💡 Research Summary
The paper challenges the prevailing view that “mergeability” is an intrinsic, model‑specific property independent of the merging partner or algorithm. Instead, the authors argue that mergeability is fundamentally a relationship that depends jointly on the merging method and the pair of tasks involved. To investigate this hypothesis, they propose an architecture‑agnostic diagnostic framework that evaluates a large set of interpretable pairwise similarity metrics without actually performing a merge.
Four representative merging strategies are examined: Task Arithmetic (TA), simple Weight Averaging (WA), Task Singular Vector merging (TSV), and Isotropic merging (ISO). For each pair of fine‑tuned models, the authors compute 28 metrics spanning five categories: (1) task‑vector geometry (cosine similarity, L2 distance, angle, magnitude ratio), (2) effective‑rank measures (global and layer‑wise effective rank, stable rank, spectral gap, singular‑value ratio), (3) subspace overlap (singular‑value overlap, left/right subspace overlap, interaction‑matrix overlap), (4) activation‑based similarity (L2 distance, cosine, magnitude ratio, dot product), and (5) gradient‑based similarity (encoder‑gradient cosine/L2/dot, input‑gradient cosine/L2/dot). All metrics are normalized to the interval
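To make a few of the metric families above concrete, here is a minimal NumPy sketch of some pairwise quantities computed directly from weights, without performing a merge. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a task vector is taken to be the fine-tuned weights minus the shared pretrained weights, the magnitude ratio is assumed to be the smaller norm over the larger, the effective rank is assumed to be the entropy-based definition, and the subspace overlap is assumed to be the normalized squared Frobenius norm of the product of the top-k left singular bases. The paper's exact definitions and normalization may differ.

```python
import numpy as np


def task_vector(finetuned: np.ndarray, pretrained: np.ndarray) -> np.ndarray:
    """Task vector: fine-tuned weights minus the shared pretrained weights."""
    return finetuned - pretrained


def geometry_metrics(tau_a: np.ndarray, tau_b: np.ndarray) -> dict:
    """Pairwise task-vector geometry, computed on flattened vectors."""
    a, b = tau_a.ravel(), tau_b.ravel()
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    cosine = float(a @ b / (norm_a * norm_b))
    return {
        "cosine": cosine,
        "l2_distance": float(np.linalg.norm(a - b)),
        "angle_deg": float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))),
        # Assumption: symmetric ratio (smaller norm / larger norm).
        "magnitude_ratio": float(min(norm_a, norm_b) / max(norm_a, norm_b)),
    }


def stable_rank(matrix: np.ndarray) -> float:
    """Squared Frobenius norm divided by squared spectral norm."""
    s = np.linalg.svd(matrix, compute_uv=False)
    return float(np.sum(s**2) / s[0] ** 2)


def effective_rank(matrix: np.ndarray) -> float:
    """Assumed definition: exp of the entropy of normalized singular values."""
    s = np.linalg.svd(matrix, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop zeros so that log(p) is defined
    return float(np.exp(-np.sum(p * np.log(p))))


def left_subspace_overlap(mat_a: np.ndarray, mat_b: np.ndarray, k: int) -> float:
    """Assumed definition: ||U_a^T U_b||_F^2 / k for top-k left singular vectors."""
    u_a = np.linalg.svd(mat_a, full_matrices=False)[0][:, :k]
    u_b = np.linalg.svd(mat_b, full_matrices=False)[0][:, :k]
    return float(np.linalg.norm(u_a.T @ u_b, "fro") ** 2 / k)


if __name__ == "__main__":
    # Toy stand-ins for one weight matrix of a pretrained model and two fine-tunes.
    rng = np.random.default_rng(0)
    pretrained = rng.normal(size=(8, 8))
    tau_a = task_vector(pretrained + 0.1 * rng.normal(size=(8, 8)), pretrained)
    tau_b = task_vector(pretrained + 0.1 * rng.normal(size=(8, 8)), pretrained)
    print(geometry_metrics(tau_a, tau_b))
    print(stable_rank(tau_a), effective_rank(tau_a))
    print(left_subspace_overlap(tau_a, tau_b, k=2))
```

Under these assumed definitions, the cosine lies in [-1, 1] and the magnitude ratio and subspace overlap lie in [0, 1], while the L2 distance and rank measures are unbounded above, which is one reason a shared normalization step is needed before the metrics can be compared or combined.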