FedDES: Graph-Based Dynamic Ensemble Selection for Personalized Federated Learning

Statistical heterogeneity in Federated Learning (FL) often leads to negative transfer, where a single global model fails to serve diverse client distributions. Personalized federated learning (pFL) aims to address this by tailoring models to individu…

Authors: Brianna Mueller, W. Nick Street

Brianna Mueller, University of Iowa, Iowa City, Iowa, brianna-mueller@uiowa.edu
W. Nick Street, University of Iowa, Iowa City, Iowa, nick-street@uiowa.edu

Abstract
Statistical heterogeneity in Federated Learning (FL) often leads to negative transfer, where a single global model fails to serve diverse client distributions. Personalized federated learning (pFL) aims to address this by tailoring models to individual clients. However, under most existing pFL approaches, clients integrate peer client contributions uniformly, which ignores the reality that not all peers are likely to be equally beneficial. Additionally, the potential for personalization at the instance level remains largely unexplored, even though the reliability of different peer models often varies across individual samples within the same client. We introduce FedDES (Federated Dynamic Ensemble Selection), a decentralized pFL framework that achieves instance-level personalization through dynamic ensemble selection. Central to our approach is a Graph Neural Network (GNN) meta-learner trained on a heterogeneous graph that models interactions between data samples and candidate classifiers. For each test query, the GNN dynamically selects and weights peer client models, forming an ensemble of the most competent classifiers while suppressing contributions from those that are irrelevant or potentially harmful to performance. Experiments on CIFAR-10 and real-world ICU healthcare data demonstrate that FedDES outperforms state-of-the-art pFL baselines in non-IID settings, offering robust protection against negative transfer.

1 Introduction
Federated Learning (FL) enables collaborative model training across distributed data sources without requiring data centralization.
By keeping sensitive data localized, FL addresses privacy and regulatory constraints in applications such as healthcare diagnostics, financial fraud detection, and autonomous systems. Traditional federated learning approaches, exemplified by Federated Averaging (FedAvg) and its variants, train a single global model by iteratively aggregating local model updates from participating clients. However, the assumption that a single shared model can adequately serve all clients often fails in practice when clients possess significantly different data distributions. This statistical heterogeneity creates a critical challenge: forcing diverse clients to converge toward a shared representation can result in negative transfer, degrading performance for individual participants.

In response to the limitations of global model approaches, personalized federated learning has emerged as a promising research direction that aims to provide each client with a model tailored to its local data distribution while still benefiting from collaborative learning. Existing pFL approaches use various strategies to balance global and local information. Methods like pFedMe [27] and Ditto [16] use bi-level optimization to separate shared and personalized parameters. Meta-learning approaches such as Per-FedAvg [9] learn initialization points that can be quickly adapted to local data. Other works explore mixtures of global and local models [7], clustered federated learning [23], or personalized aggregation strategies that weight peer models based on client similarity or local utility [12, 37].

While these methods show improvements over purely global approaches, they share several fundamental limitations. First, many rely on centralized coordination through synchronous communication rounds, creating bottlenecks and single points of failure.
Recent decentralized approaches have addressed this by enabling direct peer-to-peer collaboration without a central server [4, 20, 26, 31]. However, most methods, whether centralized or decentralized, still require homogeneous model architectures across all clients. This constraint limits flexibility for a federated network of clients with varying computational capabilities. Work on model-heterogeneous FL has begun to relax this constraint through methods such as knowledge distillation [24, 33] or sharing prototypes rather than gradients or model parameters. Despite these advances, the majority of personalized federated learning approaches balance only "global" and "local" knowledge, treating all peer contributions uniformly without the ability to identify when specific peer models help and when they harm. While some personalized methods learn client-specific weights that can selectively emphasize individual peers, these approaches still require homogeneous model architectures to perform weighted averaging in parameter space.

Moving beyond selective peer contribution at the client level, we identify an opportunity for sample-level personalization. Even methods that successfully learn which peer models to weight for each client apply those weights uniformly across all test samples. Within a single client's data distribution, however, the optimal set of contributing peer clients may vary significantly across individual test samples. For example, a hospital's patients who match its predominant demographic profile might be best predicted by local models, while patients from underrepresented populations could benefit from models trained at institutions where those populations are better represented.

In this work, we introduce FedDES (Federated Dynamic Ensemble Selection), a novel pFL approach that performs dynamic ensemble selection to achieve personalization at the instance level.
FedDES inherits the decentralized framework of the FedPAE algorithm [21], where clients independently train heterogeneous base classifiers and construct local model benches through asynchronous peer-to-peer model sharing. The key advancement is the replacement of FedPAE's static ensemble selection (one ensemble per client) with a Graph Neural Network (GNN) that learns to dynamically select models from the bench for each individual test sample. The foundation of this approach is a heterogeneous graph that models relationships between data samples and classifiers, with edges encoding their interactions based on classifier prediction patterns and sample similarity. The GNN processes this graph to produce sample embeddings, which are then mapped to output weights that determine both which classifiers are selected and how strongly each contributes to the ensemble prediction. Through this process, the network learns to identify which classifiers are competent in specific regions of the sample space. By assigning zero weight to peer classifiers that are not estimated to contribute to a correct ensemble decision, the GNN provides instance-level protection against negative transfer, so that external knowledge is integrated only when it is estimated to be beneficial.

Our main contributions are summarized as follows:
• A decentralized, model-heterogeneous framework: We propose FedDES, a fully decentralized pFL framework that relies on peer-to-peer communication in place of synchronous communication rounds coordinated by a central server. By aggregating client contributions in the output space (via ensembles) rather than in the parameter space, FedDES supports complete model heterogeneity.
• GNN-based dynamic ensemble selection for instance-level personalization: We introduce a novel dynamic ensemble selection approach leveraging GNNs.
By modeling the interactions between local data samples and candidate classifiers, the GNN learns to estimate classifier competence based on sample-specific characteristics. This enables FedDES to move beyond the standard client-level weighting used in pFL to personalization at the instance level.
• Precise protection against negative transfer: By tailoring solutions to individual samples rather than to clients, FedDES suppresses contributions from non-beneficial peer clients, even when a peer client's model is generally useful for the client's local data distribution but fails on specific cases.
• Empirical validation: We demonstrate the effectiveness of FedDES on benchmarks including image classification (CIFAR-10) and real-world distributed healthcare data (eICU). Our experiments show that FedDES outperforms state-of-the-art personalized federated learning baselines in non-IID settings.

In Section 2, we review personalized federated learning, dynamic ensemble selection, and graph representation learning. We describe the proposed method in Section 3. Experiments and results are presented in Sections 4 and 5, followed by discussion in Section 6 and conclusions in Section 7.

2 Related Works
2.1 Personalized Federated Learning under Heterogeneity
Standard federated learning methods such as FedAvg assume that a single global model can serve all clients, but statistical heterogeneity across client data distributions often causes this assumption to fail [17]. Personalized federated learning (pFL) addresses this by tailoring models to individual clients while retaining the benefits of collaboration. A range of personalization strategies has been proposed, including meta-learning approaches that learn shared initializations for rapid local adaptation [9], regularization methods that penalize deviation from a global reference [16, 27], and aggregation-based approaches that learn client-specific weighting of peer models [12, 37].
However, most pFL methods assume homogeneous model architectures across clients, which is restrictive when clients have varying computational resources or proprietary model designs.

Model-heterogeneous federated learning (MHFL) relaxes this constraint through several strategies. Model-splitting methods such as LG-FedAvg [19] and FedGH [35] partition the architecture into shared and personalized components, enabling heterogeneity in the personalized portion while maintaining a common component for collaboration. Knowledge distillation methods such as FML [24] and FedKD [33] allow full architectural flexibility in the personalized models by exchanging knowledge through model outputs rather than parameters. Each client maintains a personalized model alongside a local copy of a shared auxiliary model, with bidirectional distillation providing the learning signal. Prototype-based methods such as FedProto [28] and FedTGP [36] communicate class-level feature representations rather than parameters or predictions; FedTGP improves on this approach through adaptive-margin contrastive learning to address poor prototype separability under strong heterogeneity. While these methods vary in the degree of architectural flexibility they support, most still require some form of dimensionality alignment across clients. Model-splitting methods require matching output dimensions at the shared component boundary, prototype-based methods assume embeddings are comparable across architectures, and distillation methods introduce a shared auxiliary model whose architecture must be fixed across all clients. These constraints can degrade performance when heterogeneous backbones produce representations with different scales or dimensionalities. Beyond these structural constraints, existing MHFL methods personalize at the client level, learning a single model or set of aggregation weights that is applied uniformly across all test samples.
FedDES addresses both limitations: by aggregating classifier contributions in the output space, it requires only that classifiers produce predictions over a common label set, and by selecting classifiers per sample through a GNN meta-learner, it achieves instance-level personalization.

2.2 Dynamic Ensemble Selection
Dynamic ensemble selection (DES) aims to improve classification performance by selecting, for each test sample, a subset of classifiers from a larger pool to contribute to the prediction. The core intuition underlying DES is that classifiers vary in competence across different regions of the sample space [14], so selecting those estimated to be most competent in the local region of the query can outperform static ensemble selection or full-pool aggregation [5]. The standard DES pipeline proceeds in three phases: (1) defining a region of competence (RoC) around the query, (2) estimating classifier competence within that region, and (3) selecting the subset of classifiers for the final prediction.

Region of competence construction. The RoC determines the local context for competence estimation. Most DES methods define this region using $k$-nearest neighbors in the feature space, drawn from a dedicated dynamic selection dataset (DSEL) [2, 5]. Standard $k$-NN, however, is sensitive to noise, class overlap, and class imbalance, prompting refinements such as adaptive neighborhoods, alternative distance metrics, and class-balanced neighbor rules [8]. Decision-space approaches take a different perspective, representing each sample by its vector of classifier outputs (hard labels or posterior scores) and determining neighbors based on similarity in this space rather than in the original feature space [10, 13].

Competence estimation criteria.
Individual-based competence measures evaluate each classifier independently, most commonly through accuracy within the RoC [32]. In contrast, meta-learning frameworks estimate competence from meta-features encoding classifier behavior. META-DES, for instance, trains a meta-classifier on features such as local accuracy, confidence, and output entropy to predict whether each base classifier should be included in the ensemble for a given query [6]. Group-based measures account for classifier interactions by seeking complementary error patterns [25], though some work suggests that the benefits of ensemble diversity are more relevant for static selection and that actively promoting instance-level diversity can be counterproductive [34].

Selection and aggregation. Selection strategies range from strict oracle-inspired rules that require (near-)perfect local accuracy to softer threshold-based criteria that retain any classifier outperforming random chance [14]. Rather than making binary inclusion decisions, dynamic weighting approaches use competence scores directly to modulate each classifier's influence during aggregation.

2.3 Graph Representation Learning
Graph neural networks (GNNs) learn node representations by iteratively aggregating information from each node's local neighborhood. At each layer $l$, a node $v$ updates its embedding $h_v^{(l)}$ by combining its current representation with messages from its neighbors $\mathcal{N}(v)$:

$$h_v^{(l)} = \mathrm{UPDATE}^{(l)}\Big(h_v^{(l-1)},\ \mathrm{AGGREGATE}^{(l)}\big(\{h_u^{(l-1)} : u \in \mathcal{N}(v)\}\big)\Big)$$

By stacking multiple layers, each node's embedding captures increasingly broad structural context. Graph Attention Networks (GATs) [30] replace fixed aggregation with a learned attention mechanism that assigns different importance weights to different neighbors. However, Brody et al.
[3] showed that standard GAT computes a global ranking of neighbor importance that is independent of the query node, a limitation they term static attention. Their proposed GATv2 remedies this by changing the order of operations in the attention computation (applying the learned attention vector after the nonlinearity rather than before it), yielding dynamic attention in which the ranking of neighbors is conditioned on the specific query node. This property makes GATv2 a natural fit for FedDES, where the relevance of neighboring samples and classifiers varies with the specific query instance.

3 Methodology
FedDES operates in a fully decentralized, peer-to-peer setting in which each client performs three main operations locally: (1) training a set of heterogeneous base classifiers on its local data and exchanging the models with peers to obtain a shared pool of $M$ classifiers; (2) evaluating all models in the shared pool on its local data to construct a decision-space representation of samples and building a heterogeneous graph whose nodes represent data samples and candidate classifiers; and (3) training a GNN meta-learner that produces sample-specific ensemble weights. Figure 1 provides an overview of the complete pipeline.

3.1 Stage 1: Training Base Classifiers and Decentralized Model Exchange
Each client $k$ partitions its private dataset $\mathcal{D}_k$ into three subsets: (i) a training set used to fit the local base classifiers and to supervise the GNN meta-learner, (ii) a validation set for early stopping during base model training and for tuning GNN hyperparameters, and (iii) a test set reserved for final performance evaluation. Clients train one or more models on their local data, producing model sets $\mathcal{F}_k$ with $M_k = |\mathcal{F}_k|$ classifiers. These models may vary in architecture and are not limited to gradient-based methods. After training, clients exchange their classifiers via peer-to-peer (P2P) communication, so that each client locally retains a copy of the full pool of $M = \sum_k M_k$ base classifiers.

3.2 Stage 2: Decision-Space Representation and Graph Construction
After exchanging models, each client constructs a decision-space representation of its training data. Since the base classifiers are trained on non-IID data using heterogeneous architectures, their confidence scores may not be directly comparable. Each client therefore calibrates the received classifiers using temperature scaling, aligning confidence estimates with the local data distribution and preventing poorly calibrated classifiers from distorting the decision-space representation.

For each sample $x_i$, every classifier $f_m$ produces a probability vector $p_m(x_i) \in \mathbb{R}^C$, where $C$ is the number of classes. We define the projection function $\phi: \mathcal{X} \to \mathbb{R}^{M \cdot C}$, which maps a sample into the decision space, i.e., the concatenated outputs of all $M$ classifiers:

$$\phi(x_i) = \big[p_1(x_i), p_2(x_i), \ldots, p_M(x_i)\big] \in \mathbb{R}^{M \cdot C}.$$

Let $\mathbf{x}_i := \phi(x_i)$ denote the decision-space embedding of sample $x_i$. Stacking these embeddings yields the decision-space matrix $\mathbf{P}_k \in \mathbb{R}^{N_k \times (M \cdot C)}$, where $N_k$ is the number of training samples at client $k$. To supervise the meta-learner, each client constructs a binary meta-label matrix $\mathbf{Z}_k \in \{0,1\}^{N_k \times M}$, where $Z_{i,m} = 1$ indicates that classifier $f_m$ correctly predicts the label of $x_i$. These meta-labels serve as targets for learning sample-specific ensemble weights.

Each client then constructs a heterogeneous graph over data samples and classifiers, linked by two edge types: sample-sample edges encoding decision-space similarity, and classifier-sample edges encoding local competence.

Sample-sample edges. These edges encode similarity in the decision space. Instead of using standard global $k$-NN, which is known to be sensitive to class imbalance, we perform class-aware selection. For a target sample $x_j$, we identify the $k$ nearest neighbors from each class based on $L_1$ distance in the decision space, forming a class-balanced neighborhood:

$$\mathcal{N}(x_j) = \bigcup_{c=0}^{C-1} \{x_{n_{c,1}}, x_{n_{c,2}}, \ldots, x_{n_{c,k}}\},$$

where $n_{c,1}, \ldots, n_{c,k}$ index the selected neighbors from class $c$. While class-stratified sampling ensures that minority classes are represented for downstream GNN training, it introduces a tradeoff: to meet per-class quotas, we may include neighbors that are relatively distant from the target sample and therefore provide less reliable signals.

Figure 1: Overview of FedDES. Stage 1: Clients independently train heterogeneous base classifiers on their local data and exchange models via peer-to-peer communication, forming a shared classifier pool. Stage 2: Each client evaluates the full classifier pool on its local data to obtain decision-space representations $\mathbf{P}_k$ and meta-labels $\mathbf{Z}_k$, and then constructs a heterogeneous graph in which sample nodes (circles) are linked by decision-space similarity and classifier nodes (triangles) are connected to samples based on local competence. Stage 3: A heterogeneous GATv2 meta-learner is trained to map sample embeddings, refined through message passing over the graph, to per-classifier competence scores. Inference: A new query $x_q$ is projected into the decision space, inserted into the graph, and processed by the trained GNN. The resulting competence scores $\sigma(\mathbf{s}_q)$ determine which classifiers are selected (scores above 0.5) and their relative voting strength in the final weighted ensemble prediction $\hat{y}_q$.

More broadly, dynamic ensemble selection is inherently sensitive to noise and class overlap near decision boundaries, which can distort the local context that guides ensemble selection. We address this by assessing the stability of each class's local structure around $x_j$ using Cumulative Mean Distance Weighting [1] and adjusting the influence of neighbors accordingly. Classes whose neighbors form compact, coherent clusters receive higher aggregate weight, while scattered or distant neighborhoods are downweighted.
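Before formalizing the stability weighting, it may help to see the preceding steps concretely. The NumPy sketch below is our own illustration under stated assumptions, not the authors' code. It assumes each client holds raw logits from the $M$ pooled classifiers on local held-out data and fits one temperature per classifier by a simple grid search over the negative log-likelihood (one common way to fit temperature scaling; the paper does not specify the optimizer). It then builds $\mathbf{P}_k$ and $\mathbf{Z}_k$ and returns the class-balanced $L_1$ neighborhood of a target sample. All function names are hypothetical.

```python
import numpy as np


def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def fit_temperature(logits, labels, grid=None):
    """Temperature scaling (assumed grid search): pick T minimizing local NLL."""
    grid = np.linspace(0.25, 5.0, 96) if grid is None else grid
    rows = np.arange(len(labels))
    nll = [-np.log(softmax(logits, T)[rows, labels] + 1e-12).mean() for T in grid]
    return float(grid[int(np.argmin(nll))])


def decision_space(probs):
    """probs: (M, N, C) calibrated outputs -> P_k of shape (N, M*C),
    where row i is the concatenation [p_1(x_i), ..., p_M(x_i)]."""
    M, N, C = probs.shape
    return probs.transpose(1, 0, 2).reshape(N, M * C)


def meta_labels(probs, y):
    """Z_k[i, m] = 1 if classifier m predicts the label of sample i correctly."""
    return (probs.argmax(axis=-1) == y[None, :]).T.astype(int)  # shape (N, M)


def class_aware_neighbors(P, y, j, k):
    """k nearest samples of each class to x_j under L1 distance in decision space.
    Returns {class: indices sorted by increasing distance} and all distances."""
    dist = np.abs(P - P[j]).sum(axis=1)
    neighbors = {}
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        idx = idx[idx != j]  # the target is not its own neighbor
        neighbors[int(c)] = idx[np.argsort(dist[idx], kind="stable")[:k]]
    return neighbors, dist
```

With $k = 5$, the value used in the experiments, each target sample receives up to $5C$ sample-sample edges. In this sketch, a class with fewer than $k$ other samples simply contributes fewer neighbors.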
Formally, neighbors are ordered by increasing distance, and class stability is quantified by the average drift of the cumulative neighborhood mean from the target:

$$\bar{d}_c = \frac{1}{k} \sum_{r=1}^{k} \big\lVert \boldsymbol{\mu}_{c,r} - \mathbf{x}_j \big\rVert_1, \qquad (1)$$

where $\boldsymbol{\mu}_{c,r}$ is the cumulative mean of the $r$ closest class-$c$ neighbors. Smaller values of $\bar{d}_c$ therefore indicate that $x_j$ lies within a reliable region of class $c$, while larger values suggest that the nearest class-$c$ neighbors may be unrepresentative of the broader class distribution. Finally, neighborhood influence is distributed through a hierarchical weighting scheme. Class-level influence is first allocated as $\pi_c \propto 1/(\bar{d}_c + \varepsilon)$ and then distributed among individual neighbors within each class through a softmax over negative distances. This yields final edge weights $w_{ij}$ for each neighbor $x_i \in \mathcal{N}(x_j)$ that sum to one across the entire neighborhood, so the edge weights reflect both class stability and sample-level proximity.

Classifier-sample edges. These edges encode classifier competence within the decision-space region local to the target sample. Given the weighted neighborhood $\mathcal{N}(x_j)$, we evaluate each classifier $f_m$ by computing a gain score $G(f_m, x_j)$, which quantifies its performance relative to the pool average:

$$G(f_m, x_j) = \sum_{x_i \in \mathcal{N}(x_j)} w_{ij} \big( \mathbb{I}(f_m(x_i) = y_i) - \bar{c}_i \big), \qquad (2)$$

where $w_{ij}$ are the sample-sample edge weights, $\mathbb{I}(\cdot)$ is the indicator of correctness, and $\bar{c}_i = \frac{1}{M} \sum_{m'} \mathbb{I}(f_{m'}(x_i) = y_i)$ is the mean accuracy of the pool on neighbor $x_i$. By prioritizing marginal contribution over absolute accuracy, this metric identifies local experts that succeed on difficult samples where the majority fail. To resolve ties where classifiers achieve identical gain, we use the weighted log-loss over $\mathcal{N}(x_j)$ as a secondary criterion, favoring higher confidence on correct predictions. We select the top-$k$ classifiers ($k = 5$) and assign edge weights by normalizing their gain scores.
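The hierarchical edge weights and the gain score in Eqs. (1) and (2) can be sketched as follows. This is a hypothetical NumPy illustration, not the authors' implementation. It assumes the neighborhood is given as a dictionary that maps each class to its neighbor indices sorted by increasing $L_1$ distance, and it uses the meta-label matrix $\mathbf{Z}_k$ as the correctness indicator. It omits the log-loss tie-breaker. Because the text does not say how negative gains are normalized, the sketch clips them to zero before normalizing and falls back to uniform weights if every selected gain is non-positive.

```python
import numpy as np


def class_stability(P, j, idx_sorted):
    """Eq. (1): mean L1 distance between x_j and the cumulative mean of its
    r closest class-c neighbors, averaged over r = 1..k."""
    counts = np.arange(1, len(idx_sorted) + 1)[:, None]
    cum_means = np.cumsum(P[idx_sorted], axis=0) / counts  # mu_{c,r}
    return np.abs(cum_means - P[j]).sum(axis=1).mean()


def edge_weights(P, j, neighbors, dist, eps=1e-6):
    """Hierarchical sample-sample weights: pi_c proportional to 1/(dbar_c + eps)
    across classes, then a softmax over negative distances within each class."""
    classes = [c for c, idx in neighbors.items() if len(idx) > 0]
    pi = np.array([1.0 / (class_stability(P, j, neighbors[c]) + eps) for c in classes])
    pi = pi / pi.sum()
    weights = {}
    for c, pi_c in zip(classes, pi):
        d = dist[neighbors[c]]
        s = np.exp(-(d - d.min()))  # shifting by the min leaves the softmax unchanged
        for i, w in zip(neighbors[c], pi_c * s / s.sum()):
            weights[int(i)] = float(w)
    return weights  # sums to one over the whole neighborhood


def gain_scores(Z, weights):
    """Eq. (2): weighted correctness of each classifier minus the pool's mean
    correctness on each neighbor. Z is the (N, M) meta-label matrix."""
    idx = np.array(list(weights.keys()))
    w = np.array(list(weights.values()))
    correct = Z[idx].astype(float)               # I(f_m(x_i) = y_i), shape (n, M)
    c_bar = correct.mean(axis=1, keepdims=True)  # pool accuracy per neighbor
    return (w[:, None] * (correct - c_bar)).sum(axis=0)  # shape (M,)


def classifier_edges(gains, top_k):
    """Keep the top-k classifiers by gain; normalize (clipped) gains to sum to one."""
    order = np.argsort(-gains, kind="stable")[:top_k]
    g = np.clip(gains[order], 0.0, None)
    w = g / g.sum() if g.sum() > 0 else np.full(len(order), 1.0 / len(order))
    return dict(zip(order.tolist(), w.tolist()))
```

One consequence of subtracting the per-neighbor pool mean $\bar{c}_i$ is that the gain scores of all $M$ classifiers sum to zero for each target sample. A positive gain therefore marks a classifier that beats the pool locally, rather than one that is merely accurate.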
Sample node features. Depending on the data modality and input dimensionality, sample node features may be the raw input features or a fixed embedding of them. These features provide a complementary signal to the decision-space representation, which captures only how classifiers respond to a sample and discards input-level structure.

Classifier node features. While classifier-sample edges capture local, sample-specific performance, the classifier node attributes characterize each model's global behavior. Classifier features include per-class recall, the standard error of per-class recall, per-class confidence, overall accuracy, and balanced accuracy.

3.3 Stage 3: Training the Meta-Learner
The objective of the meta-learner is to learn a sample representation that captures both local data context and classifier behavior. Operating on the graph constructed in Stage 2, a heterogeneous GNN aggregates information from neighboring nodes to refine sample embeddings. Through iterative message passing over sample-sample and classifier-sample edges, each sample node incorporates information about nearby samples and locally competent classifiers. A final linear projection maps each embedding to an $M$-dimensional vector of logits that quantify the predicted competence of each classifier for the given sample.

Specifically, for sample $x_i$, the meta-learner takes the heterogeneous graph as input and outputs a vector of logits $\mathbf{s}_i = (s_{i,1}, \ldots, s_{i,M}) \in \mathbb{R}^M$, where $s_{i,m}$ is a score quantifying the predicted competence of classifier $f_m$ for sample $x_i$. Training is supervised using the meta-label matrix $\mathbf{Z}$, where each entry $Z_{i,m} \in \{0,1\}$ indicates whether classifier $f_m$ correctly predicts the label of sample $x_i$. Training minimizes

$$\mathcal{L}_{\text{meta}} = \frac{1}{N_{\text{train}}} \sum_{i \in \mathcal{D}_{\text{train}}} \left( \frac{1}{M} \sum_{m=1}^{M} \ell_{\text{BCE\_logits}}(s_{i,m}, Z_{i,m}) \right), \qquad (3)$$

where $\ell_{\text{BCE\_logits}}$ denotes the binary cross-entropy loss with logits. This training objective can be viewed as a multi-label learning problem at the sample level, where each sample may have multiple positive targets corresponding to the classifiers that correctly predict its label. Notably, the loss is defined exclusively over sample nodes. Classifier nodes are not prediction targets and thus do not contribute directly to the loss; instead, they provide contextual information through classifier-sample edges.

At inference time, we apply the sigmoid function to map the raw logits to normalized competence scores in $[0, 1]$, consistent with the binary cross-entropy training objective. Classifiers with competence scores exceeding 0.5 are selected for the ensemble:

$$q_{i,m} = \sigma(s_{i,m}), \qquad w_{i,m} = \begin{cases} q_{i,m} & \text{if } q_{i,m} > 0.5, \\ 0 & \text{otherwise.} \end{cases} \qquad (4)$$

The competence scores of the selected classifiers are then normalized to sum to one across the selected subset, giving $\tilde{w}_{i,m} = w_{i,m} / \sum_{m'} w_{i,m'}$. If all classifiers have competence scores at or below 0.5 (i.e., no classifier is deemed competent), the system falls back to uniform weighting across all classifiers to ensure a valid prediction.

The learned weights thus serve a dual role: they determine which classifiers are selected (those with competence scores above 0.5), and their normalized values set each selected classifier's voting strength. The final prediction for $x_i$ is obtained by aggregating the hard predictions $\hat{y}_{m,i}$ of the selected classifiers according to their normalized weights $\tilde{w}_{i,m}$:

$$\hat{p}(y \mid x_i) = \sum_{m=1}^{M} \tilde{w}_{i,m} \, \mathbb{1}\{\hat{y}_{m,i} = y\}, \qquad \hat{y}_i = \arg\max_{y} \hat{p}(y \mid x_i). \qquad (5)$$

This competence-weighted voting mechanism enables FedDES to dynamically determine whether, and how much, each peer client contributes knowledge for each sample.

4 Experiments
Datasets. Experiments are performed on CIFAR-10 [15], a widely used image dataset, and on real-world distributed healthcare datasets from the eICU Collaborative Research Database [22].

CIFAR-10.
We simulate a federation of 20 clients using the CIFAR-10 dataset, which consists of 60,000 images across 10 classes. To generate heterogeneous client data distributions, we use the Extended Dirichlet (ExDir) sampling strategy [18], which extends Dirichlet-based data partitioning [11] by first randomly assigning a subset of class labels to each client and then allocating samples to clients via a Dirichlet distribution. This strategy is denoted ExDir($C$, $\alpha$), where $C$ is the number of classes assigned to each client and $\alpha$ is the Dirichlet concentration parameter. We vary heterogeneity along two axes, the number of classes per client $C \in \{3, 5, 7\}$ and the Dirichlet concentration $\alpha \in \{1, 10\}$, producing six experimental settings. Smaller values of $C$ restrict each client to fewer classes, creating label-distribution skew, while smaller values of $\alpha$ produce more uneven sample allocations within the assigned classes, creating quantity skew. Together, these two axes span a range from mild heterogeneity ($C = 7$, $\alpha = 10$) to severe heterogeneity ($C = 3$, $\alpha = 1$).

eICU. The eICU Collaborative Research Database is a multi-center critical care dataset containing de-identified health records from over 200 hospitals. Following the preprocessing pipeline and cohort definitions of Tang et al. [29], we use their extracted cohorts for two prediction tasks: circulatory shock and in-hospital mortality. For both tasks, we use static features (age, demographics, past medical history) and time-series features from the first hours of ICU data (vital signs, laboratory values, medications, fluid intake/output) to predict whether the patient will develop the outcome. The observation window is 4 hours for shock and 24 hours for in-hospital mortality. Shock is defined as the need for vasopressor therapy during the remainder of the hospital stay, while in-hospital mortality corresponds to a hospital discharge status of expired.
Clients are naturally defined by hospital. We select the 50 hospitals with the highest positive class prevalence among those with at least 150 ICU encounters.

Baselines. We include two reference baselines: (i) Local, where each client trains its own model(s) independently (a single classifier for CIFAR-10 and a uniform ensemble of locally trained classifiers for eICU), and (ii) Global Ensemble, where all classifiers in the shared pool are weighted equally. We also compare FedDES against six state-of-the-art pFL methods that support model heterogeneity, spanning three categories: knowledge distillation-based (FedKD, FML), model-splitting (LG-FedAvg, FedGH), and representation-based (FedProto, FedTGP). All methods use identical data partitions and architecture assignments, and we adopt the hyperparameters reported in the original papers.

Model Heterogeneity. We create model heterogeneity by defining a pool of $K$ model architectures per dataset. For CIFAR-10, we use $K = 4$ convolutional architectures: a custom 3-layer CNN, MobileNetV2, ResNet-18, and ResNet-34. For eICU, we use $K = 3$ sequence modeling architectures: a Temporal Convolutional Network (TCN), a custom 1D CNN, and an LSTM-based recurrent model. Baseline methods assign each client a single architecture from the pool, cycling through the $K$ options in a repeating sequence. Since FedDES aggregates in the output space, it is not constrained to a single architecture per client. We leverage this flexibility on eICU, where each client trains all $K = 3$ architectures on its local data to increase pool diversity by capturing different inductive biases over the same data, producing three base classifiers per hospital (150 classifiers across the federation). For CIFAR-10, FedDES follows the same single-architecture assignment as the baselines to ensure a controlled comparison, with the shared pool containing 20 classifiers (one per client).
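The ExDir($C$, $\alpha$) partitioning used for CIFAR-10 can be sketched as follows. This is a simplified interpretation for illustration only, not the reference implementation of [18], and the function name is hypothetical. In the sketch, each client draws $C$ distinct classes uniformly at random. Each class's samples are then divided among the clients holding that class, with proportions drawn from $\mathrm{Dir}(\alpha)$, so smaller $\alpha$ yields more uneven allocations. Classes that no client draws are left unused.

```python
import numpy as np


def exdir_partition(labels, n_clients, n_classes_per_client, alpha, seed=0):
    """Simplified ExDir(C, alpha)-style split (assumed interpretation).
    1) each client is assigned C distinct class labels uniformly at random;
    2) each class's samples are split among the clients holding that class
       with proportions drawn from Dirichlet(alpha)."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    owned = [rng.choice(classes, size=n_classes_per_client, replace=False)
             for _ in range(n_clients)]
    parts = [[] for _ in range(n_clients)]
    for c in classes:
        holders = [k for k in range(n_clients) if c in owned[k]]
        if not holders:  # class drawn by no client: its samples stay unused
            continue
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(alpha * np.ones(len(holders)))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for k, chunk in zip(holders, np.split(idx, cuts)):
            parts[k].extend(chunk.tolist())
    return [np.array(sorted(p), dtype=int) for p in parts], owned
```

Under these assumptions, `exdir_partition(cifar_labels, 20, 3, 1.0)` (with `cifar_labels` a hypothetical array of CIFAR-10 training labels) would correspond to the most heterogeneous setting studied.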
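Returning briefly to Section 3.3, once the GNN has produced logits, the meta-learning objective in Eq. (3) and the selection-and-voting rule in Eqs. (4) and (5) reduce to a few array operations. The NumPy sketch below is an illustrative assumption rather than the authors' code. The GNN itself is omitted, and `S` stands in for the $N \times M$ matrix of logits $s_{i,m}$. All function names are hypothetical.

```python
import numpy as np


def bce_with_logits(s, z):
    """Elementwise binary cross-entropy on logits, in a numerically stable form."""
    return np.maximum(s, 0.0) - s * z + np.log1p(np.exp(-np.abs(s)))


def meta_loss(S, Z):
    """Eq. (3): per-sample mean BCE over the M classifiers, averaged over samples."""
    return bce_with_logits(S, Z).mean(axis=1).mean()


def ensemble_predict(s_i, preds_i, n_classes, threshold=0.5):
    """Eqs. (4)-(5) for one sample.
    s_i: (M,) logits; preds_i: (M,) integer hard predictions of the M classifiers."""
    q = 1.0 / (1.0 + np.exp(-s_i))          # competence scores sigma(s_im)
    w = np.where(q > threshold, q, 0.0)     # keep classifiers scoring above 0.5
    if w.sum() > 0:
        w = w / w.sum()                     # normalize over the selected subset
    else:
        w = np.full(len(q), 1.0 / len(q))   # fallback: uniform over all classifiers
    votes = np.bincount(preds_i, weights=w, minlength=n_classes)  # p_hat(y | x_i)
    return int(np.argmax(votes)), votes
```

Because Eq. (4) uses a strict inequality, a classifier with a score of exactly 0.5 is excluded. This matches the fallback condition, which triggers when every score is at or below 0.5.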
To minimize communication overhead during parameter transmission, distillation-based methods require compact auxiliary models. We therefore select the architecture with the fewest parameters from each dataset's model group to serve as the auxiliary model for FedKD and FML. For the model-splitting methods FedGH and LG-FedAvg, we introduce heterogeneity only in the feature extractor (backbone) while keeping the classifier head homogeneous across clients, as these methods rely on a shared component for collaboration. Since heterogeneous backbones may output embeddings with different dimensionalities, representation-based methods that compare or aggregate embeddings (FedProto and FedTGP) are not directly applicable. We therefore follow FedTGP and insert an adaptive pooling layer after each backbone to standardize the embedding dimension across clients.

General implementation details. Each client's local data is divided into training and test sets following an 80/20 split, with 25% of the training data reserved for validation. We report standard accuracy for CIFAR-10 and balanced accuracy (the unweighted mean of per-class recall) for eICU, where substantial class imbalance causes standard accuracy to be dominated by majority-class performance and can obscure meaningful differences between methods. All methods use a client participation ratio of 1, learning rate 0.01, and train for up to 300 communication rounds. For each client, we report test performance from the round achieving the best validation accuracy (CIFAR-10) or validation balanced accuracy (eICU). Each round consists of 1 epoch of local training with batch size 32. For eICU tasks, all methods use class-weighted cross-entropy loss to account for label imbalance, with weights inversely proportional to class frequency.

Method-specific hyperparameters are set according to their original publications. The knowledge distillation baselines are configured as follows: FML uses 𝛼 = 0.5 and 𝛽 = 0.5, while FedKD configures its auxiliary model with learning rate 0.01 (matching the client models) and temperature parameters 𝑇_start = 𝑇_end = 0.95. The representation-based methods are configured as follows: FedProto with 𝜆 = 0.1, and FedTGP with 𝜆 = 0.1, margin threshold 𝜏 = 100, and 𝑆 = 100 server epochs.

FedDES implementation details. To use all local data for both base classifier training and graph construction without the optimistic bias that comes from evaluating classifiers on their own training data, we employ 5-fold cross-validation to generate out-of-fold predictions for graph construction. Base classifiers are then retrained on the complete training set using Adam (LR = 5e-4) for up to 300 epochs, with early stopping based on validation balanced accuracy (eICU) or validation accuracy (CIFAR-10). Since classifiers differ in both architecture and the data distributions they were trained on, their confidence scores may not be directly comparable. Each client therefore calibrates all received models using temperature scaling to align confidence estimates across the pool, preventing miscalibrated confidence scores from distorting the decision-space representation. For building the heterogeneous graphs, we use 𝑘 = 5 neighbors per class for sample-sample edges and the top-𝑘 = 3 classifiers for classifier-sample edges. The GNN meta-learner uses a two-layer heterogeneous GATv2 architecture with hidden dimension 128 and four attention heads. The network is trained for up to 300 epochs using Adam (LR = 1e-3), with dropout 0.2, batch size 32, and early stopping based on validation loss with patience 20.

5 Results

Overall Performance. Table 1 presents results on CIFAR-10 across six heterogeneity settings spanning two axes: the number of classes per client (𝐶 ∈ {3, 5, 7}) and the Dirichlet concentration (𝛼 ∈ {1, 10}). FedDES achieves the highest mean accuracy in every setting, ranging from 85.7% (𝛼 = 1, 𝐶 = 3) to 60.2% (𝛼 = 10, 𝐶 = 7).
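The two summary statistics used throughout this section, the margin over the next-best method and the per-client win rate, are simple to compute. The sketch below hard-codes the Table 1 means so that the margins can be checked against the numbers in the text; the per-client scores passed to `win_rate` are hypothetical, and counting ties as non-wins is an assumption.

```python
# Mean accuracies (%) from Table 1, columns in the order of SETTINGS.
SETTINGS = ["α=1, C=3", "α=1, C=5", "α=1, C=7", "α=10, C=3", "α=10, C=5", "α=10, C=7"]
TABLE1_MEANS = {
    "FedDES":    [85.7, 74.2, 68.4, 81.9, 68.3, 60.2],
    "Local":     [83.8, 70.9, 63.7, 79.3, 64.0, 54.3],
    "Global":    [44.5, 46.7, 51.1, 49.1, 49.5, 57.0],
    "FML":       [80.4, 67.8, 60.0, 75.5, 61.1, 48.4],
    "LG-FedAvg": [81.1, 68.7, 60.9, 76.4, 61.8, 49.5],
    "FedTGP":    [76.3, 65.9, 58.0, 74.8, 59.8, 50.2],
    "FedKD":     [81.0, 69.2, 61.8, 76.0, 62.7, 51.0],
    "FedGH":     [81.3, 68.3, 60.1, 76.2, 61.3, 49.1],
    "FedProto":  [69.8, 57.4, 54.3, 68.1, 57.4, 45.9],
}


def gap_to_next_best(means, setting, method="FedDES"):
    """Return (runner-up name, margin in accuracy points) for one setting index."""
    others = {name: vals[setting] for name, vals in means.items() if name != method}
    runner_up = max(others, key=others.get)
    return runner_up, round(means[method][setting] - others[runner_up], 1)


def win_rate(method_scores, local_scores):
    """Percentage of clients on which a method strictly beats the local baseline.

    Counting ties as non-wins is an assumption; the caption only says "outperforms".
    """
    wins = sum(m > l for m, l in zip(method_scores, local_scores))
    return 100.0 * wins / len(local_scores)


if __name__ == "__main__":
    for i, name in enumerate(SETTINGS):
        print(name, gap_to_next_best(TABLE1_MEANS, i))
    # Hypothetical per-client accuracies for four clients (not data from the paper).
    # Wins on the first and third clients only; the fourth client is a tie.
    print(win_rate([0.90, 0.70, 0.80, 0.60], [0.80, 0.75, 0.70, 0.60]))
```

Run on Table 1, the runner-up is Local in five settings and the global ensemble at 𝛼 = 10, 𝐶 = 7, with margins between 1.9 and 4.7 points.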
The performance gap over the next-best method ranges from 1.9 points at 𝛼 = 1, 𝐶 = 3 (vs. Local at 83.8%) to 4.7 points at 𝛼 = 1, 𝐶 = 7 (vs. Local at 63.7%). FedDES also achieves the highest win rates across all settings (80–95%), meaning it improves over local training for the vast majority of clients. In contrast, the six pFL baselines achieve win rates of only 5–45%, indicating that while they may help some clients, they hurt others. The global ensemble performs poorly under high heterogeneity (44.5% at 𝛼 = 1, 𝐶 = 3; 49.1% at 𝛼 = 10, 𝐶 = 3), confirming that uniform aggregation without personalization causes severe negative transfer when client distributions diverge substantially. The one exception is 𝛼 = 10, 𝐶 = 7, where the global ensemble achieves 57.0% with a 60% win rate, outperforming the local baseline and all federated baselines except FedDES.

Table 1: Performance comparison on CIFAR-10 across heterogeneity levels. Win rate indicates the percentage of clients where a method outperforms the local baseline. Mean accuracy (%) ± standard deviation computed across 20 clients. Each cell reports Acc. (Win%).

Method    | 𝛼 = 1, 𝐶 = 3     | 𝛼 = 1, 𝐶 = 5     | 𝛼 = 1, 𝐶 = 7     | 𝛼 = 10, 𝐶 = 3    | 𝛼 = 10, 𝐶 = 5    | 𝛼 = 10, 𝐶 = 7
FedDES    | 85.7 ± 6.0 (90)  | 74.2 ± 8.9 (95)  | 68.4 ± 7.3 (95)  | 81.9 ± 6.8 (80)  | 68.3 ± 10.5 (90) | 60.2 ± 5.1 (90)
Local     | 83.8 ± 7.0 (–)   | 70.9 ± 10.5 (–)  | 63.7 ± 9.7 (–)   | 79.3 ± 7.2 (–)   | 64.0 ± 12.8 (–)  | 54.3 ± 7.7 (–)
Global    | 44.5 ± 16.4 (0)  | 46.7 ± 8.5 (0)   | 51.1 ± 5.8 (10)  | 49.1 ± 18.5 (0)  | 49.5 ± 7.4 (20)  | 57.0 ± 3.8 (60)
FML       | 80.4 ± 7.0 (30)  | 67.8 ± 11.8 (25) | 60.0 ± 10.0 (25) | 75.5 ± 8.2 (20)  | 61.1 ± 12.4 (40) | 48.4 ± 7.6 (15)
LG-FedAvg | 81.1 ± 6.5 (20)  | 68.7 ± 10.7 (20) | 60.9 ± 9.9 (10)  | 76.4 ± 7.0 (20)  | 61.8 ± 11.4 (25) | 49.5 ± 6.1 (20)
FedTGP    | 76.3 ± 13.0 (5)  | 65.9 ± 12.2 (5)  | 58.0 ± 12.2 (10) | 74.8 ± 11.6 (20) | 59.8 ± 12.4 (30) | 50.2 ± 8.2 (25)
FedKD     | 81.0 ± 7.5 (25)  | 69.2 ± 11.2 (30) | 61.8 ± 9.8 (35)  | 76.0 ± 7.9 (30)  | 62.7 ± 11.9 (45) | 51.0 ± 8.6 (35)
FedGH     | 81.3 ± 7.1 (10)  | 68.3 ± 11.3 (15) | 60.1 ± 9.5 (20)  | 76.2 ± 7.1 (15)  | 61.3 ± 11.6 (30) | 49.1 ± 6.9 (10)
FedProto  | 69.8 ± 19.9 (5)  | 57.4 ± 18.2 (5)  | 54.3 ± 12.9 (10) | 68.1 ± 14.1 (10) | 57.4 ± 15.1 (20) | 45.9 ± 7.8 (10)

The two heterogeneity axes produce distinct patterns. Across both 𝛼 values, the local baseline alone outperforms most federated methods: in all six settings it outperforms every pFL baseline, and only FedDES (and, at 𝛼 = 10, 𝐶 = 7, the global ensemble) exceeds it. This observation is consistent with the finding from FedPAE [21] that local baselines in the pFL literature may be underestimated when default training configurations are adopted without tuning for the local setting.

Table 2: Performance comparison on eICU prediction tasks. Win rate indicates the percentage of hospitals where a method outperforms the local baseline. Mean balanced accuracy (%) ± standard deviation computed across 50 hospitals.

Method    | Shock: Mean ± Std | Shock: Win (%) | Mortality: Mean ± Std | Mortality: Win (%)
FedDES    | 65.7 ± 8.6        | 70             | 71.9 ± 7.2            | 86
Local     | 61.1 ± 7.6        | –              | 64.4 ± 8.3            | –
Global    | 60.1 ± 7.1        | 40             | 62.8 ± 7.2            | 36
LG-FedAvg | 62.9 ± 9.2        | 64             | 68.1 ± 8.2            | 74
FedTGP    | 57.3 ± 9.8        | 30             | 61.1 ± 8.3            | 44
FedKD     | 63.2 ± 9.2        | 64             | 70.0 ± 8.5            | 82
FedProto  | 51.9 ± 5.4        | 12             | 54.5 ± 7.3            | 18
FML       | 63.4 ± 9.3        | 62             | 66.2 ± 9.1            | 60
FedGH     | 64.4 ± 9.0        | 76             | 66.8 ± 8.4            | 66

Table 2 reports balanced accuracy on the eICU prediction tasks. On mortality prediction, FedDES achieves 71.9% balanced accuracy with an 86% win rate, improving over the local baseline (64.4%) by 7.5 points and outperforming all competing methods. The next-best method is FedKD (70.0%, 82% win rate), followed by LG-FedAvg (68.1%, 74%). On shock prediction, FedDES achieves 65.7% balanced accuracy with a 70% win rate, improving over the local baseline (61.1%) by 4.6 points.
FedDES achieves the highest mean balanced accuracy of any method, with FedGH (64.4%, 76% win rate) and FML (63.4%, 62%) as the next-closest methods, although FedGH improves over local training at slightly more hospitals (76% vs. 70%). The global ensemble achieves 60.1% on shock and 62.8% on mortality, underperforming FedDES by 5.6 and 9.1 points respectively, confirming that uniform aggregation across hospitals fails to account for distributional differences between institutions.

Representation-based methods (FedProto, FedTGP) show the weakest and most variable performance, with FedProto achieving only 51.9% balanced accuracy on eICU shock and high variance across CIFAR-10 settings. This instability likely reflects the challenge of aligning heterogeneous backbone embeddings through adaptive pooling, which can produce poorly separated representations when architectures differ substantially.

Meta-Learner Selection Behavior. To visualize the selection behavior of the GNN meta-learner, we group test samples by their true class label and compute, for each classifier, the mean selection score across all test samples of that class. We then plot this average selection score against the proportion of the target class in the training data of the classifier's home client (Figure 2). Each point in the figure represents one (classifier, target class) pair, aggregated across all clients, with local classifiers (those trained at the same client as the test samples) highlighted in red.

Across all heterogeneity settings, the meta-learner assigns higher selection scores to classifiers whose home clients had more training data for the target class. The strength of this relationship varies with both heterogeneity axes. Fixing the number of classes per client, 𝛼 = 10 yields stronger correlations than 𝛼 = 1: at 𝐶 = 3, 𝜌 = 0.78 for 𝛼 = 10 versus 𝜌 = 0.69 for 𝛼 = 1; at 𝐶 = 5, 𝜌 = 0.72 versus 0.59; at 𝐶 = 7, 𝜌 = 0.64 versus 0.55. Fixing 𝛼, the correlation decreases as 𝐶 increases (more classes per client).
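The 𝜌 values above are Spearman rank correlations between a classifier's mean selection score on a target class and that class's proportion at the classifier's home client. A minimal sketch of the computation for a single client follows; the toy scores and proportions are hypothetical, and the function names are illustrative rather than taken from the authors' code. Tied values receive averaged ranks, the usual Spearman convention.

```python
def mean_selection_by_class(scores, labels, num_classes):
    """scores[i][j]: selection score of classifier j on test sample i; labels[i]: true class.

    Returns m[j][c]: mean score of classifier j over test samples of class c (nan if absent).
    """
    n_clf = len(scores[0])
    sums = [[0.0] * num_classes for _ in range(n_clf)]
    counts = [0] * num_classes
    for row, c in zip(scores, labels):
        counts[c] += 1
        for j, s in enumerate(row):
            sums[j][c] += s
    return [[sums[j][c] / counts[c] if counts[c] else float("nan")
             for c in range(num_classes)] for j in range(n_clf)]


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical toy data: 2 classifiers, 2 classes, 4 test samples.
    scores = [[0.9, 0.2], [0.8, 0.3], [0.1, 0.7], [0.2, 0.9]]
    labels = [0, 0, 1, 1]
    home_props = [[0.9, 0.1], [0.2, 0.8]]  # class proportions at each classifier's home client
    m = mean_selection_by_class(scores, labels, num_classes=2)
    # One (classifier, target class) pair per point, as in Figure 2.
    xs = [p for row in home_props for p in row]
    ys = [s for row in m for s in row]
    print(spearman(xs, ys))
```

Because Spearman's 𝜌 depends only on ranks, it rewards a monotone relationship between home support and selection score without assuming the linear trend drawn in the figure.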
Under high label skew (𝐶 = 3), class frequencies are highly skewed and the meta-learner exploits these distributional differences more aggressively. In settings with lower label skew (𝐶 = 7), each client trains on a broader subset of classes, compressing the range of class frequencies across clients and narrowing the resulting differences in selection scores.

Classifiers with near-zero home support for a given class generally receive near-zero average selection scores across all settings, as the meta-learner learns to exclude classifiers that lack relevant training experience. Conversely, classifiers trained at clients with high class prevalence are preferentially selected, even when they originate from remote clients. Local classifiers (red points in Figure 2) tend to appear above the trendline, and this local preference is most pronounced under high heterogeneity. At 𝐶 = 3, local classifiers are clearly elevated above non-local classifiers with comparable home support in both 𝛼 settings, while at 𝐶 = 7 the distinction is less pronounced. Under high heterogeneity, each client's training distribution is more distinctive, so locally trained classifiers are especially well matched to local test samples. Nevertheless, non-local classifiers with high home support still receive substantial selection scores across all settings, confirming that the meta-learner actively leverages peer classifiers based on their relevant training experience rather than defaulting to local models.

This increasing selectivity is also reflected in the ensemble size statistics (Table 3). The mean number of classifiers selected per prediction decreases from 9.73 (𝛼 = 10, 𝐶 = 7) to 4.36 (𝛼 = 1, 𝐶 = 3), confirming that the meta-learner identifies fewer experts as heterogeneity increases.
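Table 3 reports two per-prediction quantities: the number of classifiers whose competence score exceeds 0.5, and an effective ensemble size (ESS). The text does not give the ESS formula, so the sketch below assumes Kish's effective sample size, (Σw)² / Σw², computed over the selected classifiers' competence scores normalized into weights. Under that assumption, ESS equals the ensemble size when weights are uniform and shrinks toward 1 as weight concentrates on one member. The function names and the ratio-of-means summary are illustrative choices.

```python
def ensemble_stats(competence, threshold=0.5):
    """competence[j]: meta-learner competence score of classifier j for ONE test sample.

    Returns (ensemble_size, ess). ensemble_size counts scores above the threshold, as in
    the Table 3 caption. ESS uses Kish's formula on the selected scores normalized to sum
    to 1; this definition is an assumption, not taken from the paper.
    """
    selected = [s for s in competence if s > threshold]
    if not selected:
        return 0, 0.0
    total = sum(selected)
    weights = [s / total for s in selected]
    ess = 1.0 / sum(w * w for w in weights)  # equals len(selected) when weights are equal
    return len(selected), ess


def summarize(per_sample_competence, threshold=0.5):
    """Mean ensemble size, mean ESS, and their ratio (ratio of means, an assumed summary)."""
    stats = [ensemble_stats(c, threshold) for c in per_sample_competence]
    mean_size = sum(size for size, _ in stats) / len(stats)
    mean_ess = sum(ess for _, ess in stats) / len(stats)
    ratio = mean_ess / mean_size if mean_size > 0 else 0.0
    return mean_size, mean_ess, ratio


if __name__ == "__main__":
    # Hypothetical competence scores for two test samples over a small pool.
    samples = [
        [0.9, 0.9, 0.9, 0.9, 0.1],  # four equally competent experts: ESS is about 4
        [0.99, 0.6, 0.55, 0.2],     # three selected, weight skewed to the first: ESS < 3
    ]
    for comp in samples:
        print(ensemble_stats(comp))
    print(summarize(samples))
```

Under this definition, a falling ESS/size ratio means that weight is concentrating on fewer of the selected classifiers, which is how the ratio in Table 3 is interpreted in the text.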
The eective ensemble size (ESS), which accounts for weight concentration among selected classiers, remains in a nar- rower range (3.71–4.71), indicating that ev en when more classiers are selected under mild heterogeneity , the meta-learner concen- trates weight on a small set. The ESS-to-size ratio decreases from 0.85 ( 𝛼 = 1 , 𝐶 = 3 ) to 0.48 ( 𝛼 = 10 , 𝐶 = 7 ): under high heterogeneity , se- lected classiers contribute more e qually , while under mild hetero- geneity the ensemble is larger but weight is concentrate d on few er members. 6 Discussion The consistent gap between FedDES and competing metho ds on CIF AR-10 (T able 1) illustrates a fundamental limitation of client- level p ersonalization. Methods such as FedKD, FML, and LG-FedA vg learn a single personalized model per client that applies the same Brianna Mueller and W . Nick Street Figure 2: Meta-learner selection scores vs. home client class frequency on CIF AR-10 across six heterogeneity settings. Each point represents one (classier , target class) pair averaged across all clients. Re d p oints indicate local classiers; gray points indicate non-local classiers. Dashed line shows linear trend. 𝜌 denotes Spearman correlation. T op row: 𝛼 = 1 ; bottom row: 𝛼 = 10 . Columns vary 𝐶 ∈ { 3 , 5 , 7 } . learned parameters to all test samples. When a client’s data spans multiple classes with varying representation, no single model can optimally handle all samples, a shortcoming that FedDES addresses by assembling a dierent ensemble for each test sample . The con- sistently high win rates across settings suggest that this per-sample adaptability provides broad benets across the federation rather than improvements concentrated on a few clients. Notably , the local baseline alone outperformed most federated methods on CIF AR-10. Fe dDES’s ensemble-base d design aords exibility not only in model architecture but also in training proce- dure. 
Because aggregation occurs in the output space, each client is free to select the optimizer and training configuration best suited to its local setting. In our experiments, this meant training base classifiers with Adam and early stopping, which produced surprisingly strong local models. This raises the possibility that local baselines in the pFL literature may be underestimated when default training configurations are adopted without tuning for the local setting.

Table 3: FedDES ensemble characteristics across heterogeneity levels on CIFAR-10. Mean ensemble size is the average number of classifiers selected (competence score > 0.5). Effective ensemble size (ESS) measures the number of classifiers meaningfully contributing to each prediction after accounting for weight concentration. The ESS/Size ratio indicates how evenly weight is distributed among selected classifiers.

Metric              | 𝛼 = 1, 𝐶 = 3 | 𝛼 = 1, 𝐶 = 5 | 𝛼 = 1, 𝐶 = 7 | 𝛼 = 10, 𝐶 = 3 | 𝛼 = 10, 𝐶 = 5 | 𝛼 = 10, 𝐶 = 7
Mean Ensemble Size  | 4.36         | 5.44         | 7.07         | 4.96          | 7.58          | 9.73
Mean ESS            | 3.71         | 3.73         | 4.12         | 3.96          | 4.38          | 4.71
ESS / Ensemble Size | 0.85         | 0.68         | 0.58         | 0.80          | 0.58          | 0.48

The eICU results (Table 2) demonstrate that FedDES's instance-level selection is effective beyond synthetically partitioned benchmarks. The improvements over local training on both mortality and shock prediction are notable given that heterogeneity across hospitals arises naturally from differences in patient populations, clinical protocols, and documentation practices rather than being synthetically induced.

Beyond raw performance, FedDES's selection mechanism offers insight into how collaboration emerges across clients. The strong correlation between selection scores and home client class frequency (Figure 2) supports our hypothesis that heterogeneity produces classifiers with distinct areas of expertise, and that a learned selection mechanism can route test samples to classifiers with relevant training experience rather than defaulting to local models or treating all peers uniformly.
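To make the contrast between per-sample routing and uniform aggregation concrete, the sketch below combines per-sample competence scores with temperature-scaled classifier outputs. It assumes that selected classifiers (competence > 0.5, the threshold used in Table 3) are weighted in proportion to their competence scores and that each classifier's temperature has already been fitted on validation data. The weighting rule, the fallback for an empty selection, and all names are illustrative assumptions, not the exact FedDES aggregation rule.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a temperature above 1 softens over-confident outputs."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def des_predict(logits_per_clf, temperatures, competence, threshold=0.5):
    """Competence-weighted ensemble of calibrated outputs for ONE test sample.

    logits_per_clf[j]: logits of classifier j; temperatures[j]: its fitted temperature;
    competence[j]: meta-learner score of classifier j for this sample.
    """
    selected = [j for j, s in enumerate(competence) if s > threshold]
    if not selected:
        # Hypothetical fallback: keep only the single most competent classifier.
        selected = [max(range(len(competence)), key=lambda j: competence[j])]
    weights = {j: competence[j] for j in selected}
    total = sum(weights.values())
    if total <= 0:  # guard against all-zero scores: fall back to uniform weights
        weights = {j: 1.0 for j in selected}
        total = float(len(selected))
    num_classes = len(logits_per_clf[0])
    probs = [0.0] * num_classes
    for j in selected:
        p = softmax(logits_per_clf[j], temperatures[j])
        for c in range(num_classes):
            probs[c] += (weights[j] / total) * p[c]
    label = max(range(num_classes), key=lambda c: probs[c])
    return label, probs


# Hypothetical outputs of three pool classifiers on one 3-class test sample.
LOGITS = [[0.0, 1.0, 3.0], [5.0, 0.0, 0.0], [0.5, 0.0, 1.5]]
TEMPS = [1.5, 1.0, 2.0]

if __name__ == "__main__":
    print(des_predict(LOGITS, TEMPS, [0.9, 0.1, 0.7]))  # per-sample competence scores
    print(des_predict(LOGITS, TEMPS, [1.0, 1.0, 1.0]))  # uniform weights over the whole pool
```

With these toy inputs, uniform weighting of all three classifiers, the analogue of the Global Ensemble, is swayed by the confident but low-competence second classifier and predicts class 0, while the competence-weighted ensemble predicts class 2.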
The decreasing correlation strength as label skew decreases (larger 𝐶) highlights when federated collaboration is most valuable. Under high heterogeneity, clients produce highly specialized classifiers with concentrated expertise in a small number of classes, meaning that for any given target class, fewer classifiers in the global pool possess relevant knowledge and selective ensemble construction becomes critical. Conversely, under low heterogeneity, more classifiers develop general knowledge, reducing the gap between strong and weak classifiers for a given target class, which weakens the correlation. This suggests that FedDES is particularly well suited to settings where data heterogeneity is a primary challenge, which is also where conventional federated methods struggle most.

The mild selection preference for local classifiers is also noteworthy. This preference is most pronounced under high heterogeneity (𝐶 = 3 in both 𝛼 settings), where distribution-specific knowledge is most valuable. A local classifier that has seen even a modest number of examples from the same distribution offers an advantage that its class frequency alone does not capture.

7 Conclusion

We introduced FedDES, a personalized federated learning framework that achieves instance-level personalization through dynamic ensemble selection. By constructing heterogeneous graphs that model interactions between data samples and candidate classifiers, and training a GNN meta-learner to predict per-sample classifier competence, FedDES moves beyond the client-level personalization offered by existing methods. The framework operates in a fully decentralized setting and supports complete model heterogeneity by aggregating in the output space.
On CIFAR-10, FedDES consistently outperforms all baselines across heterogeneity levels with 80–95% win rates, demonstrating that instance-level selection provides broad benefits across the federation. On real-world eICU healthcare data, FedDES achieves the highest mean balanced accuracy on both mortality and shock prediction, improving over local training on both tasks. Our results suggest that routing samples to classifiers with relevant training experience, rather than applying uniform client-level weights, is a particularly effective strategy when client data distributions are highly heterogeneous.

References

[1] Hassan I Abdalla, Ali A Amer, and Mohammad Nassef. 2025. New fuzzy K-nearest neighbor algorithms for classification performance improvement. Future Generation Computer Systems (2025), 108139.
[2] Alceu S Britto Jr, Robert Sabourin, and Luiz ES Oliveira. 2014. Dynamic selection of classifiers—a comprehensive review. Pattern Recognition 47, 11 (2014), 3665–3680.
[3] Shaked Brody, Uri Alon, and Eran Yahav. 2021. How attentive are graph attention networks? arXiv preprint arXiv:2105.14491 (2021).
[4] Wu Chen, Peilin Liu, Yida Bai, Jiamou Liu, Nianyu Li, Enhong Mu, and Mingyue Zhang. 2025. Decentralized Federated Learning with Dynamic Neighbor Selection and Confidence-Based Aggregation. Available at SSRN 5560620 (2025).
[5] Rafael MO Cruz, Robert Sabourin, and George DC Cavalcanti. 2018. Dynamic classifier selection: Recent advances and perspectives. Information Fusion 41 (2018), 195–216.
[6] Rafael MO Cruz, Robert Sabourin, George DC Cavalcanti, and Tsang Ing Ren. 2015. META-DES: A dynamic ensemble selection framework using meta-learning. Pattern Recognition 48, 5 (2015), 1925–1935.
[7] Yuyang Deng, Mohammad Mahdi Kamani, and Mehrdad Mahdavi. 2020. Adaptive personalized federated learning. arXiv preprint arXiv:2003.13461 (2020).
[8] Luca Didaci and Giorgio Giacinto. 2004. Dynamic classifier selection by adaptive k-nearest-neighbourhood rule. In International Workshop on Multiple Classifier Systems. Springer, 174–183.
[9] Alireza Fallah, Aryan Mokhtari, and Asuman Ozdaglar. 2020. Personalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach. Advances in Neural Information Processing Systems 33 (2020), 3557–3568.
[10] Giorgio Giacinto and Fabio Roli. 2001. Dynamic classifier selection based on multiple classifier behaviour. Pattern Recognition 34, 9 (2001), 1879–1881.
[11] Tzu-Ming Harry Hsu, Hang Qi, and Matthew Brown. 2019. Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335 (2019).
[12] Yutao Huang, Lingyang Chu, Zirui Zhou, Lanjun Wang, Jiangchuan Liu, Jian Pei, and Yong Zhang. 2021. Personalized cross-silo federated learning on non-IID data. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 7865–7873.
[13] Yea S Huang and Ching Y Suen. 1993. The behavior-knowledge space method for combination of multiple classifiers. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Institute of Electrical Engineers Inc (IEEE), 347–347.
[14] Albert HR Ko, Robert Sabourin, and Alceu Souza Britto Jr. 2008. From dynamic classifier selection to dynamic ensemble selection. Pattern Recognition 41, 5 (2008), 1718–1731.
[15] Alex Krizhevsky, Geoffrey Hinton, et al. 2009. Learning multiple layers of features from tiny images. Technical Report. University of Toronto.
[16] Tian Li, Shengyuan Hu, Ahmad Beirami, and Virginia Smith. 2021. Ditto: Fair and robust federated learning through personalization. In International Conference on Machine Learning. 6357–6368.
[17] Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, and Virginia Smith. 2020. Federated optimization in heterogeneous networks. Proceedings of Machine Learning and Systems 2 (2020), 429–450.
[18] Yipeng Li and Xinchen Lyu. 2023. Convergence analysis of sequential federated learning on heterogeneous data. Advances in Neural Information Processing Systems 36 (2023), 56700–56755.
[19] Paul Pu Liang, Terrance Liu, Liu Ziyin, Nicholas B Allen, Randy P Auerbach, David Brent, Ruslan Salakhutdinov, and Louis-Philippe Morency. 2020. Think locally, act globally: Federated learning with local and global representations. arXiv preprint arXiv:2001.01523 (2020).
[20] I Lin, Osman Yagan, Carlee Joe-Wong, et al. 2024. FedSPD: A soft-clustering approach for personalized decentralized federated learning. arXiv preprint arXiv:2410.18862 (2024).
[21] Brianna Mueller, W Nick Street, Stephen Baek, Qihang Lin, Jingyi Yang, and Yankun Huang. 2024. FedPAE: Peer-Adaptive Ensemble Learning for Asynchronous and Model-Heterogeneous Federated Learning. In 2024 IEEE International Conference on Big Data (BigData). IEEE, 7961–7970.
[22] Tom J Pollard, Alistair EW Johnson, Jesse D Raffa, Leo A Celi, Roger G Mark, and Omar Badawi. 2018. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Scientific Data 5, 1 (2018), 1–13.
[23] Felix Sattler, Klaus-Robert Müller, and Wojciech Samek. 2020. Clustered federated learning: Model-agnostic distributed multitask optimization under privacy constraints. IEEE Transactions on Neural Networks and Learning Systems 32, 8 (2020), 3710–3722.
[24] Tao Shen, Jie Zhang, Xinkang Jia, Fengda Zhang, Gang Huang, Pan Zhou, Kun Kuang, Fei Wu, and Chao Wu. 2020. Federated mutual learning. arXiv preprint arXiv:2006.16765 (2020).
[25] Rodrigo GF Soares, Alixandre Santana, Anne MP Canuto, and Marcílio Carlos Pereira de Souto. 2006. Using accuracy and diversity to select classifiers to build ensembles. In The 2006 IEEE International Joint Conference on Neural Network Proceedings. IEEE, 1310–1316.
[26] Pedro Miguel Sánchez Sánchez, Enrique Tomás Martínez Beltrán, Chao Feng, Gérôme Bovet, Gregorio Martínez Pérez, and Alberto Huertas Celdrán. 2025. S-VOTE: Similarity-based Voting for Client Selection in Decentralized Federated Learning. In International Joint Conference on Neural Networks. International Neural Network Society, 1–9.
[27] Canh T Dinh, Nguyen Tran, and Josh Nguyen. 2020. Personalized federated learning with Moreau envelopes. Advances in Neural Information Processing Systems 33 (2020), 21394–21405.
[28] Yue Tan, Guodong Long, Lu Liu, Tianyi Zhou, Qinghua Lu, Jing Jiang, and Chengqi Zhang. 2022. FedProto: Federated prototype learning across heterogeneous clients. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 8432–8440.
[29] Shengpu Tang, Parmida Davarmanesh, Yanmeng Song, Danai Koutra, Michael W Sjoding, and Jenna Wiens. 2020. Democratizing EHR analyses with FIDDLE: a flexible data-driven preprocessing pipeline for structured clinical data. Journal of the American Medical Informatics Association 27, 12 (2020), 1921–1934.
[30] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).
[31] Lin Wang, Yang Chen, Yongxin Guo, and Xiaoying Tang. 2024. Smart sampling: Helping from friendly neighbors for decentralized federated learning. arXiv preprint arXiv:2407.04460 (2024).
[32] Kevin Woods, W. Philip Kegelmeyer, and Kevin Bowyer. 1997. Combination of multiple classifiers using local accuracy estimates. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 4 (1997), 405–410.
[33] Chuhan Wu, Fangzhao Wu, Lingjuan Lyu, Yongfeng Huang, and Xing Xie. 2022. Communication-efficient federated learning via knowledge distillation. Nature Communications 13, 1 (2022), 2032.
[34] Şenay Yaşar Sağlam and W Nick Street. 2018. Distant diversity in dynamic class prediction. Annals of Operations Research 263, 1 (2018), 5–19.
[35] Liping Yi, Gang Wang, Xiaoguang Liu, Zhuan Shi, and Han Yu. 2023. FedGH: Heterogeneous federated learning with generalized global header. In Proceedings of the 31st ACM International Conference on Multimedia. 8686–8696.
[36] Jianqing Zhang, Yang Liu, Yang Hua, and Jian Cao. 2024. FedTGP: Trainable global prototypes with adaptive-margin-enhanced contrastive learning for data and model heterogeneity in federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 16768–16776.
[37] Michael Zhang, Karan Sapra, Sanja Fidler, Serena Yeung, and Jose M Alvarez. 2020. Personalized federated learning with first order model optimization. arXiv preprint arXiv:2012.08565 (2020).
