PRISM-FCP: Byzantine-Resilient Federated Conformal Prediction via Partial Sharing

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. XX, 2026

Ehsan Lari, Reza Arablouei, and Stefan Werner, Fellow, IEEE

Abstract: We propose PRISM-FCP (Partial shaRing and robust calIbration with Statistical Margins for Federated Conformal Prediction), a Byzantine-resilient federated conformal prediction framework that uses partial model sharing to improve robustness against Byzantine attacks during both model training and conformal calibration. Existing approaches address adversarial behavior only in the calibration stage, leaving the learned model susceptible to poisoned updates. In contrast, PRISM-FCP mitigates attacks end-to-end. During training, clients partially share updates by transmitting only $M$ of $D$ parameters per round. This attenuates the expected energy of an adversary's perturbation in the aggregated update by a factor of $M/D$, yielding lower mean-square error (MSE) and tighter prediction intervals. During calibration, clients convert nonconformity scores into characterization vectors, from which distance-based maliciousness scores are computed, and suspected Byzantine contributions are downweighted or filtered before the conformal quantile is estimated. Extensive experiments on both synthetic data and the UCI Superconductivity dataset demonstrate that PRISM-FCP maintains nominal coverage guarantees under Byzantine attacks while avoiding the interval inflation observed in standard FCP and reducing communication, providing a robust and communication-efficient approach to federated uncertainty quantification.

Index Terms: federated learning, conformal prediction, uncertainty quantification, Byzantine attacks, partial sharing

This work was partially supported by the Research Council of Norway and the Research Council of Finland (Grant 354523). A conference version of this work is submitted to the 2026 European Signal Processing Conference (EUSIPCO), Bruges, Belgium, Aug. 2026. (Corresponding author: Ehsan Lari.)
Ehsan Lari is with the Department of Electronic Systems, Norwegian University of Science and Technology, 7491 Trondheim, Norway (e-mail: ehsanl@alumni.ntnu.no).
Reza Arablouei is with the Commonwealth Scientific and Industrial Research Organisation, Pullenvale, QLD 4069, Australia (e-mail: reza.arablouei@csiro.au).
Stefan Werner is with the Department of Electronic Systems, Norwegian University of Science and Technology, 7491 Trondheim, Norway. He is also with the Department of Information and Communications Engineering, Aalto University, 00076, Finland (e-mail: stefan.werner@ntnu.no).
Digital Object Identifier XXXXXX

I. INTRODUCTION

Federated learning (FL) [1]–[5] is a distributed learning paradigm in which a population of clients, such as smartphones, hospitals, or financial institutions, collaboratively train a shared model without exchanging raw data. Over the past few years, FL has progressed from early formulations rooted in distributed optimization [6]–[8] to increasingly large-scale and production-oriented implementations [9]–[11]. This paradigm is particularly attractive when data is inherently decentralized and centralized training is infeasible or impractical due to privacy, regulatory, or logistical constraints [12]–[14]. However, FL also introduces new attack surfaces: the server must aggregate client-provided updates that are often unverifiable, making the learning process vulnerable to corrupted or malicious participants [15], [16]. As FL moves into high-stakes domains, reliable deployment requires not only accurate point predictions, but also calibrated uncertainty with rigorous guarantees.
For instance, in a network of hospitals collaboratively training a diagnostic model, a single compromised institution could poison its updates to induce systematically overconfident predictions, leading to unsafe clinical decisions that should have been flagged as uncertain.

Conformal prediction (CP) [17]–[21] is a principled framework for uncertainty quantification that constructs prediction sets (or intervals) with user-specified coverage, assuming only that samples are exchangeable. In regression, the distribution-free, finite-sample guarantees established in [22] provide a canonical foundation for prediction interval construction. In addition, extensions to robust conformal inference can maintain validity under distribution shift, including covariate shift [23], a property that is especially relevant in federated environments where client distributions may differ. To bring these benefits to decentralized learning, federated conformal prediction (FCP) has emerged [24]–[28], enabling clients to compute prediction intervals without sharing calibration data and addressing practical constraints such as communication efficiency and heterogeneity across clients (e.g., label shift).

In networked environments, Byzantine clients pose a critical challenge due to arbitrary and potentially malicious behavior that can derail collaborative training. The work in [29] formalizes this threat model and demonstrates that even a single Byzantine participant can drive gradient-based learning toward arbitrarily poor solutions. In practice, adversaries may inject misleading gradients or manipulate model updates [30], [31], degrading performance and compromising the integrity of the global model. More recently, self-driven entropy aggregation has been proposed for Byzantine-robust FL under client heterogeneity [32], reflecting the realistic regime in which benign clients themselves hold non-IID data.
For FCP, the threat is amplified because Byzantine behavior can target not only training but also calibration. Training-phase attacks reduce predictive accuracy, whereas calibration-phase attacks can silently violate coverage guarantees by distorting the estimated conformal quantile, producing overly narrow intervals and systematic undercoverage, a particularly dangerous failure mode in safety-critical applications such as clinical decision support or financial risk assessment.

To mitigate Byzantine behavior during training, a large body of work has proposed robust aggregation rules [29], [33], [34]. Coordinate-wise median and trimmed-mean estimators, for example, can suppress outlying (potentially adversarial) updates [33]. Recent advances further include Byzantine client identification with false discovery rate control [35] and privacy-preserving Byzantine-robust mechanisms [36]. However, these approaches are typically most effective when many clients participate in each round (often under assumptions akin to full participation) and can suffer noticeable degradation under strong client heterogeneity and partial participation, conditions that are common in practical FL deployments.

In the context of FCP, the Rob-FCP algorithm [37] addresses calibration-stage attacks by encoding local nonconformity-score distributions into characterization vectors and excluding clients whose vectors deviate significantly from the population, thereby providing certifiable coverage guarantees. Nevertheless, Rob-FCP assumes that training is secure and focuses exclusively on calibration. Conversely, robust training aggregators protect the training phase but remain oblivious to calibration-phase manipulation. As a result, existing defenses are stage-specific and do not provide an end-to-end solution for Byzantine-resilient FCP.
This gap motivates methods that can jointly defend both training and calibration while remaining communication-efficient.

In this paper, we propose PRISM-FCP, which integrates partial-sharing online federated learning (PSO-Fed) [16], [38] with FCP to achieve Byzantine resilience across both training and calibration phases while reducing communication. In each round, every selected client exchanges only a randomly chosen subset of $M$ out of $D$ model parameters with the server. Our key observation is that partial sharing (originally introduced for communication efficiency) provides a principled Byzantine-robustness effect at no additional computational cost to clients. Specifically, the random parameter mask acts as a stochastic filter: when a Byzantine client injects an adversarial perturbation, the perturbation energy entering the aggregated update is attenuated by a factor of $M/D$ in expectation. This reduced Byzantine influence during training improves model accuracy, which in turn tightens the residual distributions at benign clients. Tighter residuals yield more concentrated nonconformity-score distributions. This increases separability in the characterization-vector space, improving distance-based maliciousness scoring and mitigating coverage loss and interval distortions under attack.

In summary, our main contributions are:
• Training robustness via partial sharing: We demonstrate that partial model sharing attenuates Byzantine perturbations by a factor proportional to $M/D$, reducing the steady-state mean-square error (MSE) and yielding tighter prediction intervals (Section IV).
• Calibration robustness via improved separability: We show that this training-phase attenuation concentrates benign clients' nonconformity-score distributions, improving the separability of characterization vectors and enhancing distance-based detection of Byzantine outliers during calibration (Section IV-D).
• End-to-end empirical validation: Through experiments on synthetic benchmarks and the UCI Superconductivity dataset, we demonstrate that PRISM-FCP preserves nominal coverage under Byzantine attacks that cause standard FCP to fail, while reducing communication relative to full-model-sharing baselines (Section V).

We organize the remainder of the paper as follows. In Section II, we introduce the system model, review the PSO-Fed algorithm, and summarize CP in federated settings. In Section III, we present PRISM-FCP. In Section IV, we theoretically analyze Byzantine perturbation attenuation under partial sharing and derive the resulting interval-width scaling. In Section IV-D, we examine how training-phase attenuation improves calibration-phase Byzantine detection. In Section V, we verify our theoretical findings through numerical experiments. Finally, in Section VI, we present some concluding remarks.

Mathematical Notations: We denote scalars by italic letters, column vectors by bold lowercase letters, and matrices by bold uppercase letters. The superscripts $(\cdot)^{\intercal}$ and $(\cdot)^{-1}$ denote the transpose and inverse operations, respectively, and $\|\cdot\|$ denotes the Euclidean norm. We use $\mathbb{1}\{\cdot\}$ as the indicator function of its event argument and $\mathbf{I}_D$ for the $D \times D$ identity matrix. Lastly, $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, and calligraphic letters such as $\mathcal{S}$ and $\mathcal{B}$ denote sets.

II. PRELIMINARIES

In this section, we introduce the considered system model, review the partial sharing mechanism used in PSO-Fed [15], [16], [38], [39], and outline CP in a federated setting.

A. System Model

We consider a federated network consisting of $K$ clients communicating with a central server. At iteration $n$, client $k$ observes a data (feature-label) pair $\mathbf{x}_{k,n} \in \mathbb{R}^D$ and $y_{k,n} \in \mathbb{R}$, generated according to

$$y_{k,n} = \mathbf{w}^{\star\intercal} \mathbf{x}_{k,n} + \nu_{k,n}, \tag{1}$$

where $\mathbf{w}^{\star} \in \mathbb{R}^D$ is the unknown parameter vector to be estimated collaboratively and $\nu_{k,n}$ denotes observation noise.
The global learning objective is to minimize the MSE aggregated across clients:

$$J(\mathbf{w}) = \frac{1}{K} \sum_{k=1}^{K} J_k(\mathbf{w}), \tag{2}$$

where the local risk at client $k$ is

$$J_k(\mathbf{w}) = \mathbb{E}\left[ |y_{k,n} - \mathbf{w}^{\intercal} \mathbf{x}_{k,n}|^2 \right] \tag{3}$$

and the expectation is taken with respect to the (possibly client-dependent) data distribution at client $k$. In general, these client distributions may differ substantially (i.e., the data can be non-IID across clients). The FL goal is to obtain the minimizer $\arg\min_{\mathbf{w}} J(\mathbf{w})$ via decentralized collaboration, without exchanging raw data.

Fig. 1. Illustration of how partial sharing attenuates Byzantine perturbations at the aggregating server (two benign clients and one Byzantine client): (a) full sharing ($M = D$), where the Byzantine perturbation is passed in full; (b) partial sharing ($M < D$), where the Byzantine perturbation is attenuated by $M/D$.

Remark 1 (Scope of the linear model): The linear regression model in (1) is adopted for analytical tractability, enabling a closed-form characterization of Byzantine perturbation attenuation under partial sharing. While a complete theory for more general models is beyond the scope of this work, the underlying mechanism (attenuating adversarial energy by a factor of $M/D$) extends conceptually beyond linear regression, and developing such analyses is an interesting direction for future work.

B. Partial Model Sharing

In our prior work [15], [16], we studied partial model sharing through PSO-Fed in detail (see Fig. 1). Here, we summarize the mechanism most relevant to this paper. Unlike conventional FL, where the full parameter vector is exchanged between the server and clients at every iteration, PSO-Fed communicates only a subset of parameters. Specifically, at iteration $n$, client $k$ receives the masked global model estimate and uploads only a masked version of its local model estimate.
This masking is represented by a diagonal selection matrix $\mathbf{S}_{k,n} \in \mathbb{R}^{D \times D}$ with exactly $M$ ones on the diagonal (and $D - M$ zeros), which specifies the model parameters communicated between client $k$ and the server at iteration $n$. Therefore, the PSO-Fed recursions for minimizing (2) are given by

$$\begin{aligned}
\epsilon_{k,n} &= y_{k,n} - \left[ \mathbf{S}_{k,n-1} \mathbf{w}_{n-1} + (\mathbf{I}_D - \mathbf{S}_{k,n-1}) \mathbf{w}_{k,n-1} \right]^{\intercal} \mathbf{x}_{k,n}, \\
\mathbf{w}_{k,n} &= \mathbf{S}_{k,n-1} \mathbf{w}_{n-1} + (\mathbf{I}_D - \mathbf{S}_{k,n-1}) \mathbf{w}_{k,n-1} + \mu\, \mathbf{x}_{k,n} \epsilon_{k,n}, \\
\mathbf{w}_{n} &= \frac{1}{|\mathcal{S}_n|} \sum_{k \in \mathcal{S}_n} \left[ \mathbf{S}_{k,n} \mathbf{w}_{k,n} + (\mathbf{I}_D - \mathbf{S}_{k,n}) \mathbf{w}_{n-1} \right],
\end{aligned}$$

where $\mathbf{w}_{k,n}$ is the local model estimate at client $k$ and iteration $n$, $\mathbf{w}_n$ is the global model estimate at iteration $n$, $\mu$ is the stepsize parameter, $\mathcal{S}_n$ is the set of participating clients at iteration $n$, and $|\mathcal{S}_n|$ is the cardinality of $\mathcal{S}_n$.

C. Byzantine Attack during Training Phase

Let $\mathcal{S}_B$ denote the set of Byzantine clients, and let $\beta_k \in \{0, 1\}$ indicate whether client $k$ is Byzantine ($\beta_k = 1$ if $k \in \mathcal{S}_B$ and $\beta_k = 0$ otherwise). The number of Byzantine clients is $|\mathcal{S}_B|$, which is assumed to be known by the server¹. We consider an uplink attack model in which, at each iteration, every Byzantine client perturbs its transmitted (partial) model update with probability $p_a$. Specifically, we model the corruption as $\mathbf{w}_{k,n} + \tau_{k,n} \boldsymbol{\delta}_{k,n}$, where $\tau_{k,n}$ is a Bernoulli random variable with $\Pr(\tau_{k,n} = 1) = p_a$, and $\boldsymbol{\delta}_{k,n} \sim \mathcal{N}(\mathbf{0}, \sigma_B^2 \mathbf{I}_D)$ is zero-mean white Gaussian noise [40]. Accordingly, the server receives $\mathbf{S}_{k,n} \mathbf{w}_{k,n} + \beta_k \tau_{k,n} \mathbf{S}_{k,n} \boldsymbol{\delta}_{k,n}$ instead of the benign partial update $\mathbf{S}_{k,n} \mathbf{w}_{k,n}$. This model captures stochastic Byzantine behavior that injects additive noise into the aggregation process.

Remark 2 (Stochastic vs. adversarial attack model): The Gaussian perturbation model is standard in the Byzantine-resilient distributed learning literature [29], [40] and enables closed-form analysis. More powerful adversaries may be adaptive and concentrate perturbations on the shared coordinates if they observe the selection masks $\mathbf{S}_{k,n}$.
In our setting, the masks are selected randomly and independently (Assumption A3 in Section IV-A), implying that partial sharing attenuates the injected perturbation energy by a factor proportional to $M/D$ in expectation. Extending the guarantees to fully adaptive adversaries is an interesting direction for future work.

D. Federated Conformal Prediction

CP for regression constructs prediction intervals with a user-specified confidence level (e.g., 90%) under the assumption of data exchangeability. Given a trained predictor, CP calculates nonconformity scores (typically absolute residuals) on a calibration set and uses an empirical quantile of these scores to form prediction intervals with distribution-free marginal coverage. Importantly, these guarantees do not rely on the correctness of the underlying model and hold for arbitrary model classes.

In the standard CP pipeline, we train a model on a training set and compute conformal calibration scores on a held-out calibration dataset $\{(\mathbf{x}_j, y_j)\}_{j=1}^{N}$ using the learned model parameter vector $\hat{\mathbf{w}}$. The nonconformity score for each calibration sample is

$$r_j = |y_j - \hat{\mathbf{w}}^{\intercal} \mathbf{x}_j|, \quad j = 1, \ldots, N. \tag{4}$$

Let $q_{1-\alpha}$ be the $\lceil (N+1)(1-\alpha) \rceil$-th order statistic (i.e., the value of that rank in ascending order) of $\{r_j\}_{j=1}^{N}$. Then, the two-sided $(1-\alpha)$ prediction interval for a new input $\mathbf{x}$ is

$$C(\mathbf{x}) = \left[ \hat{\mathbf{w}}^{\intercal} \mathbf{x} - q_{1-\alpha},\; \hat{\mathbf{w}}^{\intercal} \mathbf{x} + q_{1-\alpha} \right]. \tag{5}$$

The parameter $\alpha \in (0, 1)$ sets the target miscoverage level and controls interval width: smaller $\alpha$ yields wider intervals. Under exchangeability, CP guarantees marginal coverage

$$\mathbb{P}\left( y \in C(\mathbf{x}) \right) \geq 1 - \alpha. \tag{6}$$

¹ This assumption can be relaxed to an upper bound or an estimate.

Fig. 2. Histograms of normalized nonconformity scores (horizontal axis) versus probability (vertical axis) for benign and Byzantine clients, illustrating the effect of different Byzantine attacks during the calibration phase: (a) efficiency attack (adversaries report all-zero normalized scores), (b) coverage attack (adversaries report all-one normalized scores), and (c) random attack (adversaries add Gaussian noise to their scores).

In FL, after training a global model $\hat{\mathbf{w}}$, each client $k$ holds a local calibration set $\{(\mathbf{x}_{k,j}, y_{k,j})\}_{j=1}^{N_k}$ (under a partial exchangeability assumption [24], [41]) and computes local nonconformity scores

$$r_{k,j} = |y_{k,j} - \hat{\mathbf{w}}^{\intercal} \mathbf{x}_{k,j}|, \quad j = 1, \ldots, N_k. \tag{7}$$

Privacy constraints prevent clients from sharing raw calibration data (or even individual scores) with the server. The FCP algorithm [24] addresses this by employing secure aggregation or sketching mechanisms (e.g., T-Digest [42]) to enable the server to approximate the global conformal quantile from the pooled score population as

$$\hat{q}_{1-\alpha} \approx \mathrm{Quantile}_{1-\alpha}\left( \bigcup_{k=1}^{K} \{r_{k,j}\}_{j=1}^{N_k} \right). \tag{8}$$

The resulting interval is constructed by replacing $q_{1-\alpha}$ with $\hat{q}_{1-\alpha}$. In practice, when the sketching/aggregation error is small, this approximation can preserve coverage closely while protecting client-level calibration privacy.

E. Byzantine Attack during Calibration Phase

During the calibration phase, Byzantine clients in $\mathcal{S}_B$ may submit arbitrary or adversarially crafted nonconformity scores (or score summaries) instead of their true values, with the aim of biasing the global quantile estimate used to construct prediction intervals.
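Since this quantile is exactly what a calibration-phase adversary targets, it may help to see the construction in (4)-(8) spelled out. The following is a minimal sketch, not the FCP implementation of [24]: it pools the client scores exactly instead of using secure aggregation or a T-Digest sketch, and the function names (`conformal_quantile`, `pooled_quantile`, `prediction_interval`) are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((N + 1)(1 - alpha))-th smallest score, as used in (5)."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank
    if rank > n:
        # Too few calibration samples for a finite interval at this alpha.
        return math.inf
    return sorted(scores)[rank - 1]


def pooled_quantile(client_scores, alpha):
    """Exact stand-in for (8): pool every client's scores, then take the quantile.

    Assumption: the server sees all scores; the paper's setting would use
    secure aggregation or sketching instead.
    """
    pooled = [r for scores in client_scores for r in scores]
    return conformal_quantile(pooled, alpha)


def prediction_interval(w_hat, x, q):
    """Two-sided interval (5) centered at the linear prediction w_hat^T x."""
    center = sum(wi * xi for wi, xi in zip(w_hat, x))
    return center - q, center + q
```

Because every pooled score enters the order statistic with equal weight, a client that reports all-zero or all-one scores shifts the selected rank directly, which is the mechanism the attacks in this subsection exploit.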
By manipulating the estimated quantile $q_{1-\alpha}$, adversaries can induce either overly narrow intervals (causing systematic undercoverage) or overly wide intervals (yielding vacuous uncertainty), thereby compromising the practical reliability of CP.

In this work, we consider three representative calibration-phase Byzantine attacks, whose effects are illustrated in Fig. 2:
• Efficiency attack: Byzantine clients report near-zero scores (e.g., all zeros after normalization), deflating $q_{1-\alpha}$ and leading to undercoverage.
• Coverage attack: Byzantine clients report maximal scores (e.g., all ones after normalization), inflating $q_{1-\alpha}$ and producing excessively wide prediction intervals.
• Random attack: Byzantine clients perturb their scores by additive noise, injecting randomness into the quantile estimation (e.g., $\tilde{r}_{k,j} = r_{k,j} + \eta_{k,j}$ with $\eta_{k,j} \sim \mathcal{N}(0, \sigma_C^2)$, optionally clipped to the score range).

III. BYZANTINE-RESILIENT FCP VIA PARTIAL SHARING

In this section, we present PRISM-FCP, an efficient FCP framework that is resilient to Byzantine attacks at both the training and calibration phases. The key insight is that the partial model sharing mechanism [16], which was originally designed for communication efficiency, also attenuates Byzantine perturbations during training. This, in turn, improves downstream calibration robustness by making benign score distributions more concentrated and hence easier to separate from outliers.

A. Mitigating Byzantine Attack during Training

Partial model sharing reduces communication overhead by allowing each participating client to exchange only a subset of model parameters with the server [15], [16]. Beyond communication savings, partial sharing also improves robustness to stochastic model-poisoning attacks. In particular, prior results establish mean and mean-square convergence of PSO-Fed under noisy Byzantine updates [16].
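To make the training-phase mechanism concrete, the PSO-Fed recursions of Section II-B under the uplink attack model of Section II-C can be simulated in a few lines. This is a rough sketch under assumptions that are not taken from the paper: every client participates in every round, features are standard Gaussian, the first `n_byz` clients are the Byzantine ones, and all default parameter values and names (`pso_fed`, `random_mask`) are illustrative only.

```python
import random


def random_mask(D, M, rng):
    """Diagonal of a selection matrix S: exactly M ones among D entries."""
    chosen = set(rng.sample(range(D), M))
    return [1.0 if d in chosen else 0.0 for d in range(D)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def pso_fed(D=8, M=2, K=10, n_byz=2, iters=2000, mu=0.02,
            sigma_nu=0.1, sigma_b=0.5, p_a=0.5, seed=0):
    """Return ||w_n - w_star||^2 averaged over the second half of the run."""
    rng = random.Random(seed)
    w_star = [rng.gauss(0.0, 1.0) for _ in range(D)]
    w_global = [0.0] * D
    w_local = [[0.0] * D for _ in range(K)]
    masks = [random_mask(D, M, rng) for _ in range(K)]  # plays S_{k,n-1}
    errors = []
    for _ in range(iters):
        uploads = []
        for k in range(K):  # assumption: full client participation
            x = [rng.gauss(0.0, 1.0) for _ in range(D)]
            y = dot(w_star, x) + rng.gauss(0.0, sigma_nu)  # data model (1)
            s = masks[k]
            # Downlink: masked entries are overwritten by the global model.
            merged = [s[d] * w_global[d] + (1 - s[d]) * w_local[k][d]
                      for d in range(D)]
            eps = y - dot(merged, x)
            w_local[k] = [merged[d] + mu * x[d] * eps for d in range(D)]
            # Uplink: fresh mask; a Byzantine client adds Gaussian noise to
            # the shared entries with probability p_a.
            s_new = random_mask(D, M, rng)
            attacked = k < n_byz and rng.random() < p_a
            upload = []
            for d in range(D):
                noise = rng.gauss(0.0, sigma_b) if attacked and s_new[d] else 0.0
                upload.append(s_new[d] * (w_local[k][d] + noise)
                              + (1 - s_new[d]) * w_global[d])
            uploads.append(upload)
            masks[k] = s_new
        # Server aggregation: average of the (possibly corrupted) uploads.
        w_global = [sum(u[d] for u in uploads) / K for d in range(D)]
        errors.append(sum((g - t) ** 2 for g, t in zip(w_global, w_star)))
    tail = errors[iters // 2:]
    return sum(tail) / len(tail)
```

Comparing `pso_fed(M=2)` with `pso_fed(M=8)` (full sharing for `D=8`) gives a quick feel for how the steady-state error changes with the number of shared parameters; the exact numbers depend on the seed and on the assumed parameters.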
In this work, we exploit this built-in property to mitigate Byzantine influence during training, thereby reducing the prediction error that propagates into conformal calibration.

B. Mitigating Byzantine Attacks during Calibration

To mitigate Byzantine attacks in the calibration phase, each benign client computes nonconformity scores $\{r_{k,j}\}_{j=1}^{N_k}$ on its local calibration set and summarizes them via a histogram-based characterization vector $\mathbf{v}_k \in \Delta_H$ defined as

$$v_{k,h} = \frac{1}{N_k} \sum_{j=1}^{N_k} \mathbb{1}\{a_{h-1} \leq \tilde{r}_{k,j} < a_h\}, \quad h = 1, \ldots, H, \tag{9}$$

where $\tilde{r}_{k,j}$ denotes a normalized (or clipped) score mapped to $[0, 1]$, and $0 = a_0 < a_1 < \cdots < a_H = 1$ are the boundaries of the $H$ bins. Each entry $v_{k,h}$ is the empirical probability that the local scores fall into bin $h$. The set

$$\Delta_H \triangleq \left\{ \mathbf{v}_k \in \mathbb{R}^H \;\middle|\; v_{k,h} \geq 0,\; \sum_{h=1}^{H} v_{k,h} = 1 \right\} \tag{10}$$

denotes the $H$-dimensional probability simplex. Clients transmit $\mathbf{v}_k$ to the server, while Byzantine clients may submit arbitrary or adversarial vectors (cf. Fig. 2 with $H = 30$). In practice, the bin boundaries $\{a_h\}$ can be fixed a priori based on expected residual ranges or estimated from a small benign pilot round.

This histogram summary preserves coarse distributional structure without transmitting raw scores, reducing communication and offering an additional privacy layer. Moreover, benign clients tend to produce similar characterization vectors, whereas Byzantine submissions typically deviate, enabling robust detection.

We compute pairwise $\ell_2$ distances between characterization vectors as

$$d_{k,k'} = \|\mathbf{v}_k - \mathbf{v}_{k'}\|_2, \quad \forall k, k' \in \{1, \cdots, K\}. \tag{11}$$

Let $K_b := K - |\mathcal{S}_B|$ denote the number of benign clients.
Following Rob-FCP [37], we assign each client $k$ a maliciousness score, $m_k$, as the sum of distances to its $K_b - 1$ farthest neighbors:

$$m_k = \sum_{k' \in \mathrm{FN}(k, K_b - 1)} d_{k,k'}, \tag{12}$$

where $\mathrm{FN}(k, K_b - 1)$ is the set of the $K_b - 1$ clients with the largest distances from client $k$. Intuitively, Byzantine clients lie farther from the benign population and hence attain larger $m_k$. We declare the $|\mathcal{S}_B|$ clients with the largest maliciousness scores as Byzantine and form the benign set $\mathcal{B}$ from the remaining clients.

Remark 3 (Handling unknown $|\mathcal{S}_B|$): When the number of Byzantine clients is unknown, we replace the top-$|\mathcal{S}_B|$ filtering by a robust outlier rule such as the median absolute deviation (MAD) criterion [43]. Specifically, we compute each client's distance to a robust population center (e.g., the coordinate-wise median characterization vector) and flag clients whose distance exceeds a MAD-based threshold. Our experiments in Section V demonstrate that both known-$|\mathcal{S}_B|$ filtering and MAD-based filtering can preserve nominal coverage under attack.

Finally, the server estimates the conformal quantile $\hat{q}_{1-\alpha}$ using only the scores (or sketches) contributed by clients in $\mathcal{B}$, and constructs prediction intervals accordingly. This calibration-stage filtering follows Rob-FCP's robust detection procedure [37], while PRISM-FCP additionally reduces training-stage Byzantine influence and communication via partial sharing, yielding reliable prediction intervals under Byzantine attacks.

IV. THEORETICAL ANALYSIS

In this section, we provide theoretical justification for the Byzantine resilience of PRISM-FCP.
Our main results are: (i) partial sharing attenuates the energy of injected Byzantine perturbations by a factor proportional to $M/D$ per iteration, (ii) the corresponding steady-state MSE under Byzantine attacks is reduced, and (iii) this reduction translates into tighter residual distributions and hence narrower FCP prediction intervals at the same nominal coverage. Collectively, these results formalize the intuition that partial sharing acts as a stochastic filter that dilutes adversarial influence.

A. Assumptions and Definitions

To make the analysis tractable, we adopt the following standard assumptions.
A1: The input vectors $\mathbf{x}_{k,n}$ at each client $k$ and iteration $n$ are drawn from a wide-sense stationary multivariate random process with covariance $\mathbf{R}_k = \mathbb{E}\left[ \mathbf{x}_{k,n} \mathbf{x}_{k,n}^{\intercal} \right]$.
A2: The observation noise $\nu_{k,n}$ and the Byzantine perturbations $\boldsymbol{\delta}_{k,n}$ are IID across clients and iterations. Moreover, they are mutually independent and independent of all other stochastic variables, including $\mathbf{x}_{k,n}$ and $\mathbf{S}_{k,n}$.
A3: The selection matrices $\mathbf{S}_{k,n}$ are independent across clients and iterations.

B. Mean and Mean-Square Convergence and Steady-State MSE

We summarize key results from [16] for PSO-Fed, which we will use to analyze PRISM-FCP under Byzantine attacks.

Lemma 1 (Mean convergence): PSO-Fed converges in the mean sense under Byzantine attacks with a suitable stepsize [16, Sec. III-A].

Lemma 2 (Mean-square convergence): PSO-Fed converges in the mean-square sense under Byzantine attacks with a suitable stepsize [16, Sec. III-B].

Lemma 3 (Steady-state MSE decomposition): Under mean-square convergence, the steady-state MSE of PSO-Fed admits the decomposition [16, Sec. III-C]

$$\mathcal{E} := \mathcal{E}_{\phi} + \mathcal{E}_{\omega} + \mathcal{E}_{\Theta}. \tag{13}$$

Remark 4 (Interpretation of the MSE decomposition): The steady-state MSE (13) decomposes into three additive terms:
1) $\mathcal{E}_{\phi}$: error due to observation noise as it propagates through the learning dynamics (affected by partial sharing, client scheduling, and stepsize),
2) $\mathcal{E}_{\omega}$: error induced exclusively by Byzantine perturbations (the key quantity for attack resilience),
3) $\mathcal{E}_{\Theta}$: irreducible error due to observation noise that is independent of the learning dynamics.

Lemma 4 (Partial sharing attenuates Byzantine contribution): Partial sharing and client scheduling reduce the Byzantine-induced term $\mathcal{E}_{\omega}$, thereby enhancing resilience to model-poisoning attacks [16, Sec. IV]. In particular, when $\mathbf{S}_{k,n}$ selects $M$ parameters uniformly at random, the expected energy (squared $\ell_2$-norm) of the injected perturbation in the aggregated update is attenuated by a factor $M/D$.

C. Consequences of Partial Sharing for FCP Quantiles

We connect the training-phase partial sharing to the prediction-interval width. In particular, we show that attenuation of the Byzantine-induced MSE term (Lemma 4) leads to smaller parameter error, which tightens residual (nonconformity) distributions and thus reduces the conformal quantile used to form prediction intervals.

A central objective of CP is to produce intervals that are both valid (achieving target marginal coverage $1 - \alpha$) and efficient (as narrow as possible). Validity is ensured by conformal quantile calibration under exchangeability, whereas efficiency depends on the distribution of nonconformity scores, which is directly influenced by the trained model's prediction error.

Lemma 5 (Lipschitz continuity of residuals): Let $r_{k,n}(\mathbf{w}) = |y_{k,n} - \mathbf{w}^{\intercal} \mathbf{x}_{k,n}|$ denote the residual at client $k$ and iteration $n$. For any $\mathbf{w}$ and $\mathbf{w}^{\star}$, we have

$$|r_{k,n}(\mathbf{w}) - r_{k,n}(\mathbf{w}^{\star})| \leq \|\mathbf{x}_{k,n}\|_2 \, \|\mathbf{w} - \mathbf{w}^{\star}\|_2. \tag{14}$$

Proof: Let $a = y_{k,n} - \mathbf{w}^{\intercal} \mathbf{x}_{k,n}$ and $b = y_{k,n} - \mathbf{w}^{\star\intercal} \mathbf{x}_{k,n}$.
By the reverse triangle inequality, $\big| |a| - |b| \big| \leq |a - b|$. Moreover, $a - b = (\mathbf{w}^{\star} - \mathbf{w})^{\intercal} \mathbf{x}_{k,n}$, hence $|a - b| = |(\mathbf{w} - \mathbf{w}^{\star})^{\intercal} \mathbf{x}_{k,n}| \leq \|\mathbf{w} - \mathbf{w}^{\star}\|_2 \|\mathbf{x}_{k,n}\|_2$, where the last inequality follows from Cauchy-Schwarz.

Theorem 1 (Quantile stability under local density bounds): Let $q^{\star} = F^{\star -1}(1 - \alpha)$ and $q_{k,n} = F_{k,n}^{-1}(1 - \alpha)$ denote the $(1 - \alpha)$ quantiles of the benign residual distribution $F^{\star}$ (under $\mathbf{w}^{\star}$) and a perturbed residual distribution $F_{k,n}$ (under $\mathbf{w}_{k,n}$), respectively. Assume there exists a neighborhood $\mathcal{N}$ of $q^{\star}$ such that: (i) $F^{\star}$ admits a density $f^{\star}$ with $\inf_{t \in \mathcal{N}} f^{\star}(t) \geq f_{\min} > 0$, and (ii) $F^{\star}$ is Lipschitz on $\mathcal{N}$ with constant $L$, i.e., $|F^{\star}(t) - F^{\star}(s)| \leq L |t - s|$ for all $s, t \in \mathcal{N}$. Then, with $X := r_{k,n}(\mathbf{w}_{k,n})$ and $Y := r_{k,n}(\mathbf{w}^{\star})$, we have

$$|q_{k,n} - q^{\star}| \leq \frac{2\sqrt{2L}}{f_{\min}} \sqrt{\mathbb{E}|X - Y|}. \tag{15}$$

Proof: Since $\inf_{t \in \mathcal{N}} f^{\star}(t) \geq f_{\min}$, the inverse CDF $F^{\star -1}$ is locally Lipschitz on $\mathcal{N}$ with constant $1/f_{\min}$, hence

$$|q_{k,n} - q^{\star}| \leq \frac{1}{f_{\min}} \sup_{t \in \mathcal{N}} |F_{k,n}(t) - F^{\star}(t)|.$$

Fix $\varepsilon > 0$. For any $t$, we have

$$\big| \mathbb{1}\{X \leq t\} - \mathbb{1}\{Y \leq t\} \big| \leq \mathbb{1}\{|X - Y| \geq \varepsilon\} + \mathbb{1}\{t - \varepsilon < Y \leq t + \varepsilon\}. \tag{16}$$

Taking expectations gives

$$|F_{k,n}(t) - F^{\star}(t)| \leq \mathbb{P}(|X - Y| \geq \varepsilon) + \left[ F^{\star}(t + \varepsilon) - F^{\star}(t - \varepsilon) \right] \leq \frac{\mathbb{E}|X - Y|}{\varepsilon} + 2L\varepsilon, \tag{17}$$

where we use Markov's inequality and the Lipschitz property of $F^{\star}$ on $\mathcal{N}$. Optimizing over $\varepsilon$, i.e., choosing $\varepsilon = \sqrt{\mathbb{E}|X - Y| / (2L)}$ so that the two terms are equal, yields

$$\sup_{t \in \mathcal{N}} |F_{k,n}(t) - F^{\star}(t)| \leq 2\sqrt{2L\, \mathbb{E}|X - Y|}$$

and the claim follows.

Remark 5 (Interpreting the constant): The prefactor admits a natural decomposition. First, $1/f_{\min}$ arises from the local Lipschitz continuity of the inverse CDF around $q^{\star}$, which requires the benign density to be bounded away from zero in a neighborhood of $q^{\star}$.
Second, the $\sqrt{L}$ dependence comes from a smoothing argument that converts an expected residual perturbation, $\mathbb{E}|X - Y|$, into a uniform bound on the local CDF gap $\sup_{t \in \mathcal{N}} |F_{k,n}(t) - F^{\star}(t)|$, with $L$ capturing the local steepness of $F^{\star}$ around $q^{\star}$. In particular, when scores are normalized so that $L \leq 1$, the bound simplifies to $|q_{k,n} - q^{\star}| \leq \frac{2\sqrt{2}}{f_{\min}} \sqrt{\mathbb{E}|X - Y|}$.

Corollary 1 (Impact of partial sharing on FCP quantiles): Let $L_x := \sqrt{\mathbb{E}\|\mathbf{x}_{k,n}\|_2^2}$ and $\tilde{\mathbf{w}}_{k,n} := \mathbf{w}_{k,n} - \mathbf{w}^{\star}$. Under the assumptions of Theorem 1, we have

$$|q_{k,n} - q^{\star}| \leq \frac{2\sqrt{2L}}{f_{\min}} L_x^{1/2} \left( \mathbb{E}\|\tilde{\mathbf{w}}_{k,n}\|_2^2 \right)^{1/4}. \tag{18}$$

Proof: By Lemma 5, we have

$$|r_{k,n}(\mathbf{w}_{k,n}) - r_{k,n}(\mathbf{w}^{\star})| \leq \|\mathbf{x}_{k,n}\|_2 \, \|\tilde{\mathbf{w}}_{k,n}\|_2. \tag{19}$$

Taking expectations and applying Cauchy-Schwarz yields

$$\mathbb{E}|r_{k,n}(\mathbf{w}_{k,n}) - r_{k,n}(\mathbf{w}^{\star})| \leq \sqrt{\mathbb{E}\|\mathbf{x}_{k,n}\|_2^2} \sqrt{\mathbb{E}\|\tilde{\mathbf{w}}_{k,n}\|_2^2} = L_x \sqrt{\mathbb{E}\|\tilde{\mathbf{w}}_{k,n}\|_2^2}. \tag{20}$$

Substituting into Theorem 1, which bounds $|q_{k,n} - q^{\star}|$ by a constant times $\sqrt{\mathbb{E}|X - Y|}$, gives the stated fourth-root dependence on $\mathbb{E}\|\tilde{\mathbf{w}}_{k,n}\|_2^2$. Finally, Lemma 4 implies that partial sharing attenuates the injected Byzantine perturbation energy by a factor proportional to $M/D$, thereby reducing the steady-state $\mathbb{E}\|\tilde{\mathbf{w}}_{k,n}\|_2^2$ and consequently tightening the quantile deviation bound.

Corollary 2 (Quantile deviation via steady-state MSE decomposition): Assume the conditions of Theorem 1 and let $L_x := \sqrt{\mathbb{E}\|\mathbf{x}_{k,n}\|_2^2}$. If the steady-state parameter error satisfies $\lim_{n \to \infty} \mathbb{E}\|\mathbf{w}_{k,n} - \mathbf{w}^{\star}\|_2^2 = \mathcal{E}$ with the decomposition $\mathcal{E} = \mathcal{E}_{\phi} + \mathcal{E}_{\omega} + \mathcal{E}_{\Theta}$ in (13), then

$$\limsup_{n \to \infty} |q_{k,n} - q^{\star}| \leq \frac{2\sqrt{2L}}{f_{\min}} L_x^{1/2} \left( \mathcal{E}_{\phi} + \mathcal{E}_{\omega} + \mathcal{E}_{\Theta} \right)^{1/4}. \tag{21}$$

Moreover, by Lemma 4, partial sharing reduces the Byzantine-induced term $\mathcal{E}_{\omega}$ (and hence tightens the above bound).

Remark 6 (Coverage vs. efficiency): Conformal validity is enforced by the calibration quantile construction (under exchangeability), whereas efficiency is governed by the interval width.
Byzantine perturbations degrade the trained model and increase the steady-state MSE through the Byzantine-induced term $\mathcal{E}_{\omega}$ in (13). This increases the typical magnitude of residuals and hence the $(1 - \alpha)$ quantile used for calibration, leading to wider prediction intervals. Partial sharing mitigates this effect: by reducing $\mathcal{E}_{\omega}$ (Lemma 4), it decreases the resulting quantile shift and the associated interval inflation. This prediction is consistent with the empirical trends in Section V.

Proposition 1 (Width perturbation bound): Let $C(\mathbf{x}) = [\mathbf{w}^{\intercal} \mathbf{x} - q,\; \mathbf{w}^{\intercal} \mathbf{x} + q]$ with width $\omega = 2q$. Let $\omega^{\star} := 2q^{\star}$ (under $\mathbf{w}^{\star}$) and $\omega_{k,n} := 2q_{k,n}$ (under $\mathbf{w}_{k,n}$). Assume the conditions of Theorem 1 hold on a neighborhood $\mathcal{N}$ of $q^{\star}$ with $\inf_{t \in \mathcal{N}} f^{\star}(t) \geq f_{\min} > 0$ and local Lipschitz constant $L$ for $F^{\star}$. Let $\mathbf{R}_k := \mathbb{E}[\mathbf{x}_{k,n} \mathbf{x}_{k,n}^{\intercal}]$ and $\tilde{\mathbf{w}}_{k,n} := \mathbf{w}_{k,n} - \mathbf{w}^{\star}$. Then,

$$|\omega_{k,n} - \omega^{\star}| \leq \underbrace{\frac{4\sqrt{2L}}{f_{\min}} \left( \mathrm{tr}(\mathbf{R}_k) \right)^{1/4}}_{=:\, K_{\omega}} \left( \mathbb{E}\|\tilde{\mathbf{w}}_{k,n}\|_2^2 \right)^{1/4}. \tag{22}$$

Theorem 2 (Per-iteration Byzantine attenuation and implications for width inflation): Consider the training-phase Byzantine model in Section II-C, where client $k$ uploads $\mathbf{w}_{k,n} + \beta_k \tau_{k,n} \boldsymbol{\delta}_{k,n}$, with $\tau_{k,n} \sim \mathrm{Bernoulli}(p_a)$ and $\boldsymbol{\delta}_{k,n}$ having IID entries of variance $\sigma_B^2$. Under partial sharing, the transmitted perturbation is masked by $\mathbf{S}_{k,n}$ (a diagonal matrix with exactly $M$ ones). Then the injected perturbation energy satisfies

$$\mathbb{E}\left[ \left\| \beta_k \tau_{k,n} \mathbf{S}_{k,n} \boldsymbol{\delta}_{k,n} \right\|_2^2 \right] = \beta_k\, p_a\, M \sigma_B^2. \tag{23}$$

In particular, conditional on an attack occurring (i.e., $\tau_{k,n} = 1$) at a Byzantine client ($\beta_k = 1$), we have

$$\mathbb{E}\left[ \|\mathbf{S}_{k,n} \boldsymbol{\delta}_{k,n}\|_2^2 \;\middle|\; \tau_{k,n} = 1 \right] = M \sigma_B^2, \tag{24}$$

whereas under full sharing ($M = D$) the corresponding energy is $D \sigma_B^2$. Hence, partial sharing reduces the instantaneous injected energy by the exact factor $M/D$ relative to full sharing.
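The energy identities (23) and (24) are easy to check numerically. The snippet below is a small Monte Carlo sketch under assumed values ($D = 20$, $M = 5$, $\sigma_B = 1$, all chosen only for illustration); the function name `injected_energy` is hypothetical, and the code draws only the $M$ coordinates that survive the mask, which is equivalent in distribution to masking a full $D$-dimensional Gaussian vector.

```python
import random


def injected_energy(D, M, sigma_b, p_a=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of E||tau * S * delta||^2 for a Byzantine client."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        if rng.random() >= p_a:
            continue  # tau = 0: no perturbation is injected in this round
        kept = rng.sample(range(D), M)  # coordinates selected by the mask
        # Only the M selected entries of delta reach the server.
        total += sum(rng.gauss(0.0, sigma_b) ** 2 for _ in kept)
    return total / trials


partial = injected_energy(D=20, M=5, sigma_b=1.0)   # close to M * sigma_b^2
full = injected_energy(D=20, M=20, sigma_b=1.0)     # close to D * sigma_b^2
ratio = partial / full                              # close to M / D
```

Setting `p_a` below one scales the estimate by roughly `p_a`, in line with (23).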
Let $\omega_{k,n} = 2q_{k,n}$ and $\omega^\star = 2q^\star$ denote the conformal interval widths under $\mathbf{w}_{k,n}$ and $\mathbf{w}^\star$, respectively, and define the steady-state width inflation $\Phi(M) := \limsup_{n \to \infty} |\omega_{k,n} - \omega^\star|$. If Proposition 1 bounds $|\omega_{k,n} - \omega^\star|$ by a constant times $\left(\mathbb{E}\|\mathbf{w}_{k,n} - \mathbf{w}^\star\|_2^2\right)^{1/4}$ and Lemma 3 provides the steady-state decomposition $\mathcal{E} = \mathcal{E}_\phi + \mathcal{E}_\omega + \mathcal{E}_\Theta$, then
$$\Phi(M) \le \tilde{K} \left(\mathcal{E}_\phi(M) + \mathcal{E}_\omega(M) + \mathcal{E}_\Theta\right)^{1/4}, \tag{25}$$
for an explicit constant $\tilde{K}$ inherited from Proposition 1. While (23) gives an exact $M/D$ reduction in the instantaneous injected energy, the resulting dependence of $\mathcal{E}_\omega(M)$ (and also $\mathcal{E}_\phi(M)$) on $M$ is determined by how the attenuated perturbations propagate through the system dynamics via the selection matrices $\mathbf{S}_{k,n}$.

Proof: Since $\mathbf{S}_{k,n}$ is diagonal with exactly $M$ ones and $\boldsymbol{\delta}_{k,n}$ has IID entries with variance $\mathbb{E}[\delta_{k,n,d}^2] = \sigma_B^2$, we have
$$\mathbb{E}\left[\|\mathbf{S}_{k,n} \boldsymbol{\delta}_{k,n}\|_2^2\right] = \sum_{d=1}^{D} [\mathbf{S}_{k,n}]_{dd}^2\, \mathbb{E}[\delta_{k,n,d}^2] = M \sigma_B^2.$$
Multiplying by $\beta_k$ and using $\mathbb{E}[\tau_{k,n}^2] = \mathbb{E}[\tau_{k,n}] = p_a$ yields (23). The $M/D$ reduction in (24) follows by comparing $M \sigma_B^2$ to the full-sharing case $D \sigma_B^2$. For the steady-state width bound in (25), Proposition 1 links the width deviation to the fourth root of the steady-state parameter error (MSE). By Lemma 3, this error decomposes as $\mathcal{E} = \mathcal{E}_\phi(M) + \mathcal{E}_\omega(M) + \mathcal{E}_\Theta$. The Byzantine component $\mathcal{E}_\omega(M)$ reflects the attenuated injected energy in (23), while its steady-state impact is determined by how this perturbation propagates through the system dynamics via the involved matrices, whose spectral properties also depend on $M$.

D. How Training-Phase Partial Sharing Improves Calibration

In Section IV-C, we showed that partial sharing yields tighter conformal intervals by reducing the steady-state MSE and hence the residual quantiles. Here, we show an additional benefit: it also improves Byzantine detection during calibration.
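As a quick numerical check of the attenuation result recapped above (Theorem 2, Eqs. (23) and (24)), the following minimal Python sketch estimates the injected perturbation energy by Monte Carlo simulation for a Byzantine client ($\beta_k = 1$). The uniformly random choice of the $M$ shared coordinates, the Gaussian perturbation entries, the function name `injected_energy`, and the use of NumPy are our assumptions for illustration only; the analysis itself requires only that $\mathbf{S}_{k,n}$ has exactly $M$ ones and that the entries of $\boldsymbol{\delta}_{k,n}$ are IID with variance $\sigma_B^2$.

```python
import numpy as np

def injected_energy(D, M, sigma2_B, p_a, trials=20000, seed=0):
    """Monte Carlo estimate of E||tau * S * delta||_2^2 for beta_k = 1."""
    rng = np.random.default_rng(seed)
    # Each row of `order` is a random permutation of 0..D-1, so exactly M
    # entries per row are < M: this gives a random mask with M ones
    # (the diagonal of a hypothetical selection matrix S_{k,n}).
    order = np.argsort(rng.random((trials, D)), axis=1)
    mask = (order < M).astype(float)
    tau = rng.random(trials) < p_a  # Bernoulli(p_a) attack indicator
    delta = rng.normal(0.0, np.sqrt(sigma2_B), size=(trials, D))
    energy = tau * np.sum((mask * delta) ** 2, axis=1)
    return float(energy.mean())

# Settings of the synthetic experiments: D = 50, M = 15, sigma_B^2 = 0.1, p_a = 0.25.
partial = injected_energy(D=50, M=15, sigma2_B=0.1, p_a=0.25)
full = injected_energy(D=50, M=50, sigma2_B=0.1, p_a=0.25, seed=1)
print(f"partial sharing: {partial:.3f}  (theory p_a*M*sigma^2 = {0.25 * 15 * 0.1:.3f})")
print(f"full sharing:    {full:.3f}  (theory p_a*D*sigma^2 = {0.25 * 50 * 0.1:.3f})")
```

The ratio of the two estimates should be close to $M/D = 0.3$, up to Monte Carlo error. The sketch checks only the instantaneous energy; it says nothing about how the perturbation propagates to the steady-state term $\mathcal{E}_\omega(M)$.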
The key insight is that smaller training error concentrates benign clients' nonconformity-score histograms, increasing their separation from Byzantine outliers and enabling more reliable filtering. We adopt the histogram-based characterization approach of Rob-FCP [37], which summarizes each client's local score distribution by a finite-dimensional vector and applies distance-based outlier detection. We show that partial sharing strengthens this procedure: by reducing training error, it makes benign histograms more tightly clustered, thereby increasing the detection margin and further tightening the post-filtering quantiles.

Remark 7 (Homogeneity assumption and non-IID extension): For analytical tractability, we assume benign clients' calibration residuals are drawn IID from distributions $F_k$ that are small perturbations of a common reference $F^\star$. In practice, federated clients often hold heterogeneous (non-IID) data, hence population histograms may differ across benign clients even under the optimal model. Our experiments in Section V include non-IID data and still yield effective filtering, suggesting robustness to moderate heterogeneity in practice.

Let $F^\star$ denote the benign residual CDF under the optimal parameter vector $\mathbf{w}^\star$. After training, all clients calibrate using a trained model $\hat{\mathbf{w}}$ (e.g., the steady-state global model), and we define the parameter error $\mathbf{e} := \hat{\mathbf{w}} - \mathbf{w}^\star$. Client $k$ holds $N_k$ calibration residuals $r_{k,1}, \ldots, r_{k,N_k}$ drawn IID from a residual distribution $F_k$ that is a perturbation of $F^\star$ induced by $\mathbf{e}$.² We partition the (normalized/clipped) score range $[0, 1]$ into $H$ bins $[a_{h-1}, a_h)$ and form the empirical histogram vector $\mathbf{v}_k \in \Delta^H$. Define the corresponding benign population histogram $\mathbf{p}^\star := \left(F^\star(a_h) - F^\star(a_{h-1})\right)_{h=1}^{H}$, and let $\mathbf{q} \in \Delta^H$ denote an attack-dependent adversarial mean histogram.
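The histogram construction above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: equal-width bins on $[0, 1]$, a non-empty score array, and the hypothetical helper names `characterization_vector` and `population_histogram`; the actual binning used by a deployment may differ.

```python
import numpy as np

def characterization_vector(scores, H=100):
    """Empirical histogram v_k of scores clipped to [0, 1], using H equal-width bins."""
    s = np.clip(np.asarray(scores, dtype=float), 0.0, 1.0)
    counts, _ = np.histogram(s, bins=H, range=(0.0, 1.0))
    return counts / counts.sum()  # nonnegative entries summing to one

def population_histogram(cdf, H=100):
    """Population histogram (F(a_h) - F(a_{h-1}))_{h=1..H} for a CDF on [0, 1]."""
    edges = np.linspace(0.0, 1.0, H + 1)
    return np.diff(cdf(edges))

# Toy usage with uniform scores, whose CDF on [0, 1] is F(t) = t.
rng = np.random.default_rng(0)
v = characterization_vector(rng.random(5000), H=20)
p = population_histogram(lambda t: t, H=20)
print("l2 distance between empirical and population histograms:", np.linalg.norm(v - p))
```

Note that `numpy.histogram` treats the last bin as closed on the right, so a score equal to 1 after clipping is counted in the final bin.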
Intuitively, partial sharing reduces $\|\mathbf{e}\|$ and hence makes benign $\mathbf{v}_k$ concentrate more tightly around $\mathbf{p}^\star$, whereas Byzantine submissions tend to deviate toward $\mathbf{q}$ (cf. Fig. 2). Following Rob-FCP [37], the server computes pairwise distances and assigns each client a maliciousness score based on its farthest neighbors:
$$d_{k,k'} := \|\mathbf{v}_k - \mathbf{v}_{k'}\|_2, \qquad m_k := \sum_{k' \in \mathrm{FN}(k, K_b - 1)} d_{k,k'},$$
where $\mathrm{FN}(k, K_b - 1)$ is the set of the $K_b - 1$ farthest clients from $k$ and $K_b := K - |\mathcal{S}_B|$ is the expected number of benign clients. The server then retains the $K_b$ clients with the smallest maliciousness scores as the benign set used for quantile estimation.

² For linear models with square loss, Lemma 5 implies that residual perturbations scale as $O(\|\mathbf{e}\|)$, which in turn induces a small local perturbation of the residual CDF.

Lemma 6 (Histogram concentration for benign clients): Let a benign client $k$ have $N_k$ IID calibration residuals, binned into $H$ fixed intervals $[a_{h-1}, a_h)$ to form the empirical histogram $\mathbf{v}_k \in \Delta^H$, with population vector $\mathbf{p}_k = \mathbb{E}[\mathbf{v}_k]$. Then, for any $\delta \in (0, 1)$, with probability at least $1 - \delta$, we have
$$\|\mathbf{v}_k - \mathbf{p}_k\|_2 \le C_H \sqrt{\frac{\log(2H/\delta)}{N_k}}, \tag{26}$$
where one can take $C_H = \sqrt{3}$. The constant depends only on the boundedness of the one-hot bin indicators and not on $N_k$.

Proof: Let $\mathbf{z}_j \in \{\mathbf{e}_1, \ldots, \mathbf{e}_H\}$ be the one-hot vector indicating the bin of the $j$-th residual. Then, $\mathbf{v}_k = \frac{1}{N_k} \sum_{j=1}^{N_k} \mathbf{z}_j$ and $\mathbf{p}_k = \mathbb{E}[\mathbf{z}_1]$. Define $\boldsymbol{\xi}_j = \mathbf{z}_j - \mathbf{p}_k$, which are independent and mean-zero. If $\mathbf{z}_j = \mathbf{e}_h$, then $\|\boldsymbol{\xi}_j\|_2^2 = 1 - 2[\mathbf{p}_k]_h + \|\mathbf{p}_k\|_2^2 \le 2$, since $[\mathbf{p}_k]_h \ge 0$ and $\|\mathbf{p}_k\|_2 \le 1$; hence $\|\boldsymbol{\xi}_j\|_2 \le \sqrt{2}$. Applying a vector Bernstein (or Hilbert-space Hoeffding) inequality [44] to $\boldsymbol{\xi} = \sum_{j=1}^{N_k} \boldsymbol{\xi}_j$ yields
$$\mathbb{P}\left(\|\mathbf{v}_k - \mathbf{p}_k\|_2 \ge \varepsilon\right) = \mathbb{P}\left(\|\boldsymbol{\xi}\|_2 \ge N_k \varepsilon\right) \le 2H \exp\left(-\frac{N_k \varepsilon^2}{3}\right). \tag{27}$$
Setting $\varepsilon = \sqrt{\frac{3}{N_k} \log(2H/\delta)}$ completes the proof.
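To make the farthest-neighbor score and the retention rule defined above concrete, the following Python sketch computes $m_k$ for a stack of characterization vectors and keeps the $K_b$ clients with the smallest scores. The function names, the NumPy implementation, and the toy data (benign histograms sampled around a uniform $\mathbf{p}^\star$, Byzantine histograms concentrated in the first bin) are hypothetical choices made for illustration; the sketch assumes $K_b \ge 2$.

```python
import numpy as np

def maliciousness_scores(V, K_b):
    """m_k = sum of the K_b - 1 largest l2 distances from client k to the others."""
    V = np.asarray(V, dtype=float)
    dist = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=2)  # K x K matrix
    dist_sorted = np.sort(dist, axis=1)  # ascending; the zero self-distance comes first
    return dist_sorted[:, -(K_b - 1):].sum(axis=1)

def filter_benign(V, K_b):
    """Return the indices of the K_b clients with the smallest scores, and all scores."""
    m = maliciousness_scores(V, K_b)
    keep = np.sort(np.argsort(m)[:K_b])
    return keep, m

# Toy example: 8 benign clients near a uniform histogram, 2 Byzantine clients
# placing all mass in the first bin (as in an all-zero-score submission).
rng = np.random.default_rng(0)
H, N = 10, 1000
benign = rng.multinomial(N, np.full(H, 1.0 / H), size=8) / N
byzantine = np.zeros((2, H))
byzantine[:, 0] = 1.0
keep, m = filter_benign(np.vstack([benign, byzantine]), K_b=8)
print("retained clients:", keep)
print("scores:", np.round(m, 3))
```

In this toy setting the two Byzantine clients receive clearly larger scores than every benign client, so the retained set coincides with the benign set.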
Lemma 7 (Training error causes benign histogram drift): Let $r(\mathbf{w}) := |y - \mathbf{x}^\top \mathbf{w}|$ be the residual of linear regression and $L_x := \sqrt{\mathbb{E}\|\mathbf{x}_{k,n}\|_2^2}$. Fix $H$ bins with boundaries $\{a_h\}_{h=0}^{H}$ and let $\mathbf{p}^\star$ be the benign population histogram under $\mathbf{w}^\star$, and $\mathbf{p}_k$ the population histogram under $\mathbf{w} = \mathbf{w}^\star + \mathbf{e}_k$. Assume the benign residual distribution under $\mathbf{w}^\star$ admits a density $f^\star$ such that $\sup_{h = 0, \ldots, H} f^\star(a_h) \le f_{\max} < \infty$. Then, we have
$$\|\mathbf{p}_k - \mathbf{p}^\star\|_2 \le 2\sqrt{2(H+1)}\, f_{\max} L_x \|\mathbf{e}_k\|_2. \tag{28}$$
Equivalently, $\|\mathbf{p}_k - \mathbf{p}^\star\|_2 \le C_{\mathrm{bin}} f_{\max} L_x \|\mathbf{e}_k\|_2$, where $C_{\mathrm{bin}} := 2\sqrt{2(H+1)}$ depends only on the binning.

Proof: Let $r^\star := r(\mathbf{w}^\star) = |y - \mathbf{x}^\top \mathbf{w}^\star|$ and $r_k := r(\mathbf{w}^\star + \mathbf{e}_k) = |y - \mathbf{x}^\top(\mathbf{w}^\star + \mathbf{e}_k)|$, with CDFs $F^\star$ and $F_k$, respectively. By Lemma 5, we have $|r_k - r^\star| \le \|\mathbf{x}\|_2 \|\mathbf{e}_k\|_2 =: \Delta(\mathbf{x})$. Hence, for any $t \ge 0$,
$$\{r^\star \le t - \Delta(\mathbf{x})\} \subseteq \{r_k \le t\} \subseteq \{r^\star \le t + \Delta(\mathbf{x})\}.$$
Taking probabilities over $(\mathbf{x}, y)$ yields $F^\star(t - \Delta(\mathbf{x})) \le F_k(t) \le F^\star(t + \Delta(\mathbf{x}))$. Therefore, $|F_k(t) - F^\star(t)| \le F^\star(t + \Delta(\mathbf{x})) - F^\star(t - \Delta(\mathbf{x}))$. Assuming $F^\star$ admits a density $f^\star$ with $\sup_{u \in \mathbb{R}} f^\star(u) \le f_{\max}$ (in particular at the bin edges), the mean-value bound gives $F^\star(t + \Delta(\mathbf{x})) - F^\star(t - \Delta(\mathbf{x})) \le 2 f_{\max} \Delta(\mathbf{x})$. Taking expectations over $\mathbf{x}$ and using Cauchy-Schwarz,
$$|F_k(t) - F^\star(t)| \le 2 f_{\max} \mathbb{E}[\Delta(\mathbf{x})] = 2 f_{\max} \|\mathbf{e}_k\|_2\, \mathbb{E}\|\mathbf{x}\|_2 \le 2 f_{\max} \|\mathbf{e}_k\|_2 \sqrt{\mathbb{E}\|\mathbf{x}\|_2^2} = 2 f_{\max} L_x \|\mathbf{e}_k\|_2. \tag{29}$$
Now define the edge deviations $g_h := F_k(a_h) - F^\star(a_h)$ for $h = 0, \ldots, H$, so that each bin-mass difference is
$$(\mathbf{p}_k - \mathbf{p}^\star)_h = \left(F_k(a_h) - F_k(a_{h-1})\right) - \left(F^\star(a_h) - F^\star(a_{h-1})\right) = g_h - g_{h-1}, \qquad h = 1, \ldots, H. \tag{30}$$
Let $\mathbf{g} := (g_0, \ldots, g_H)^\top$ and let $\mathbf{B} \in \mathbb{R}^{H \times (H+1)}$ be the first-difference matrix such that $\mathbf{p}_k - \mathbf{p}^\star = \mathbf{B}\mathbf{g}$.
Since each row of $\mathbf{B}$ has exactly one $+1$ and one $-1$, we have $\|\mathbf{B}\|_2 \le \sqrt{2}$, and hence
$$\|\mathbf{p}_k - \mathbf{p}^\star\|_2 \le \|\mathbf{B}\|_2 \|\mathbf{g}\|_2 \le \sqrt{2}\, \|\mathbf{g}\|_2 \le \sqrt{2(H+1)}\, \|\mathbf{g}\|_\infty.$$
Using the CDF bound at $t = a_h$ gives $\|\mathbf{g}\|_\infty = \max_h |g_h| \le 2 f_{\max} L_x \|\mathbf{e}_k\|_2$, and therefore
$$\|\mathbf{p}_k - \mathbf{p}^\star\|_2 \le 2\sqrt{2(H+1)}\, f_{\max} L_x \|\mathbf{e}_k\|_2, \tag{31}$$
which proves the claim.

Proposition 2 (Separation margin improves under partial sharing): Let $\Delta := \|\mathbf{q} - \mathbf{p}^\star\|_2$ denote the (attack-dependent) separation between the adversarial mean histogram $\mathbf{q}$ and the benign population histogram $\mathbf{p}^\star$. Let $K_b$ be the number of benign clients and $N_{\min} := \min_{k \in \mathcal{B}} N_k$. Assume Lemma 7 gives $\|\mathbf{p}_k - \mathbf{p}^\star\|_2 \le C_{\mathrm{drift}} L_x \|\mathbf{e}_k\|_2$ for benign $k$ (with $C_{\mathrm{drift}}$ depending only on the binning and $f_{\max}$), and define $e_{\mathrm{rms}} := \max_{k \in \mathcal{B}} \sqrt{\mathbb{E}\|\mathbf{e}_k\|_2^2}$. Then, with probability at least $1 - \delta$ over benign sampling,
$$\min_{\text{Byz } j}\ \min_{\text{benign } k} \|\mathbf{v}_j - \mathbf{v}_k\|_2 \ge \Delta - r_b - r_a, \tag{32}$$
where $r_b := C_H \sqrt{\frac{\log(2HK_b/\delta)}{N_{\min}}} + C_{\mathrm{drift}} L_x e_{\mathrm{rms}}$ and $r_a$ is a radius controlling adversarial concentration around $\mathbf{q}$ (e.g., analogous to the first term if the attack is stochastic). Moreover, partial sharing reduces the Byzantine-induced steady-state error component $\mathcal{E}_\omega(M)$ (Lemma 4), which decreases $e_{\mathrm{rms}}$ in attack-dominated regimes, thereby shrinking $r_b$ and increasing the separation margin $\Delta - r_b - r_a$.

Proof (sketch): For any Byzantine $j$ and benign $k$, by the triangle inequality, $\|\mathbf{v}_j - \mathbf{v}_k\|_2 \ge \|\mathbf{q} - \mathbf{p}^\star\|_2 - \|\mathbf{v}_j - \mathbf{q}\|_2 - \|\mathbf{v}_k - \mathbf{p}^\star\|_2$. The benign term satisfies $\|\mathbf{v}_k - \mathbf{p}^\star\|_2 \le \|\mathbf{v}_k - \mathbf{p}_k\|_2 + \|\mathbf{p}_k - \mathbf{p}^\star\|_2$. Apply Lemma 6 with a union bound over the $K_b$ benign clients to bound $\max_{k \in \mathcal{B}} \|\mathbf{v}_k - \mathbf{p}_k\|_2$ by the first term in $r_b$, and apply Lemma 7 plus Jensen's inequality to bound $\max_{k \in \mathcal{B}} \|\mathbf{p}_k - \mathbf{p}^\star\|_2$ by $C_{\mathrm{drift}} L_x e_{\mathrm{rms}}$. The adversarial term is controlled by $r_a$. Taking minima yields the claim.
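For intuition on the magnitudes in Proposition 2, the benign radius $r_b$ and the resulting separation margin can be evaluated numerically. The sketch below is only illustrative: the values chosen for $\Delta$, $r_a$, $C_{\mathrm{drift}}$, $L_x$, and $e_{\mathrm{rms}}$ are hypothetical placeholders (they are not taken from our experiments), and the function names are ours.

```python
import math

def benign_radius(H, K_b, N_min, delta, C_drift, L_x, e_rms, C_H=math.sqrt(3)):
    """r_b = C_H * sqrt(log(2 H K_b / delta) / N_min) + C_drift * L_x * e_rms."""
    sampling_term = C_H * math.sqrt(math.log(2 * H * K_b / delta) / N_min)
    drift_term = C_drift * L_x * e_rms
    return sampling_term + drift_term

def separation_margin(Delta, r_a, r_b):
    """Lower bound Delta - r_b - r_a on Byzantine-to-benign histogram distances."""
    return Delta - r_b - r_a

# Hypothetical numbers: a smaller training error e_rms shrinks r_b and widens the margin.
for e_rms in (0.10, 0.05):
    r_b = benign_radius(H=100, K_b=80, N_min=1000, delta=0.05,
                        C_drift=1.0, L_x=1.0, e_rms=e_rms)
    print(f"e_rms={e_rms:.2f}  r_b={r_b:.3f}  margin={separation_margin(1.0, 0.05, r_b):.3f}")
```

The first term of $r_b$ depends only on the calibration sample size and the binning, so reducing the training error affects the margin solely through the drift term.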
Theorem 3 (Distance-based filtering under partial sharing): Consider the calibration-stage filtering in Section III-B, where maliciousness scores $\{m_k\}_{k=1}^{K}$ are computed according to (12) and the $B := |\mathcal{S}_B|$ clients with the largest scores are removed. Let $K_b := K - B$ and define $\Delta := \|\mathbf{q} - \mathbf{p}^\star\|_2$, where $\mathbf{p}^\star$ is the benign population histogram and $\mathbf{q}$ is the adversarial mean histogram. Let $N_{\min} := \min_{k \in \mathcal{B}} N_k$ and define
$$r_b := C_H \sqrt{\frac{\log(2HK_b/\delta)}{N_{\min}}} + C_{\mathrm{drift}} L_x e_{\mathrm{rms}}, \tag{33}$$
where $e_{\mathrm{rms}}$ upper bounds the benign regression error (as defined in Proposition 2) and $C_{\mathrm{drift}}$ is the drift constant from Lemma 7. Assume an adversarial concentration radius $r_a$ such that $\max_{j \in \mathcal{S}_B} \|\mathbf{v}_j - \mathbf{q}\|_2 \le r_a$ holds with probability at least $1 - \delta_a$, and assume $B < K_b - 1$. Define $\gamma := \Delta - r_a - r_b$ and $\Gamma := \Delta + r_a + r_b$. If the score-separation condition
$$(K_b - 1)\gamma > B\Gamma + (K_b - 1 - B)\, 2 r_b \tag{34}$$
holds, then with probability at least $1 - (\delta + \delta_a)$ the filtering is exact (no benign client is removed and no Byzantine client is retained). In particular, $\mathbb{P}(\text{misfiltering}) \le \delta + \delta_a$. Moreover, under partial sharing, the Byzantine-induced steady-state training error decreases (Lemma 4), which reduces $e_{\mathrm{rms}}$ (in attack-dominated regimes), shrinks $r_b$ in (33), and makes (34) easier to satisfy.

Proof: See Appendix A.

Building on the filtering guarantee of Theorem 3, we obtain the following corollaries.

Corollary 3 (Post-filtering quantile bias): Let $\hat{\mathcal{B}}$ denote the set of clients retained by the calibration-stage filter (Section III-B). Let $\hat{q}_{1-\alpha}$ be the $(1-\alpha)$ conformal quantile computed from the scores (or sketches) contributed by clients in $\hat{\mathcal{B}}$, and let $q^\star_{1-\alpha}$ denote the corresponding benign quantile under $\mathbf{w}^\star$. Define the filtering failure probability $\varepsilon := \mathbb{P}(\hat{\mathcal{B}} \ne \mathcal{B})$. Under the hypotheses of Theorem 3, $\varepsilon \le \delta + \delta_a$ (and, for the standard parameter choices in Lemma 6, it decays exponentially in $N_{\min}$).
Assume the local density conditions of Theorem 1 hold around $q^\star_{1-\alpha}$. Then
$$|\hat{q}_{1-\alpha} - q^\star_{1-\alpha}| \le \underbrace{C_q \left(\mathbb{E}\|\mathbf{e}_k\|_2^2\right)^{1/4}}_{\text{training-phase effect}} + \underbrace{C_{\mathrm{adv}}\, \varepsilon}_{\text{residual adversaries}}, \tag{35}$$
where $C_q$ is the constant from Corollary 1 (or Corollary 2), and $C_{\mathrm{adv}}$ depends only on the score range (e.g., $C_{\mathrm{adv}} = 1$ if scores are normalized to $[0, 1]$). Consequently, partial sharing reduces the first term via training MSE attenuation and, by increasing the separation margin $\gamma$, suppresses $\varepsilon$, tightening the overall calibration quantile.

Corollary 4 (Coverage bounds under Byzantine attacks): Let $K$ be the total number of clients, $K_b = K - |\mathcal{S}_B|$ the number of benign clients, and $n_b$ the number of calibration samples per benign client. Let $\varepsilon$ denote the filtering failure probability (a Byzantine client passes the filter). For the post-filtering empirical coverage $\widehat{\mathrm{Cov}}$, following the certification framework of [37], with probability at least $1 - \beta$, we have $1 - \alpha - \delta_L \le \widehat{\mathrm{Cov}} \le 1 - \alpha + \delta_U$, where
$$\delta_L = \frac{\varepsilon n_b + 1}{n_b + K_b} + \frac{H \cdot \Phi^{-1}\left(1 - \beta/(2HK_b)\right)}{2\sqrt{n_b}} \cdot \frac{1 + \tau}{1 - \tau}, \qquad \delta_U = \varepsilon + \frac{K_b}{n_b + K_b} + \frac{H \cdot \Phi^{-1}\left(1 - \beta/(2HK_b)\right)}{2\sqrt{n_b}} \cdot \frac{1 + \tau}{1 - \tau}, \tag{36}$$
with $\tau = (K - K_b)/K_b$ and $\Phi^{-1}$ the standard normal quantile function.

Corollary 5 (Tighter bounds under partial sharing): Under PRISM-FCP with partial sharing ratio $M/D < 1$, the coverage bounds in Corollary 4 are tightened through a reduction in the filtering failure probability $\varepsilon$.

Mechanism. By Theorem 3, filtering succeeds with probability at least $1 - (\delta + \delta_a)$ whenever the score-separation condition holds, with benign radius $r_b$ and error level $e_{\mathrm{rms}}$ as defined therein. Partial sharing attenuates Byzantine influence during training (Lemma 4), reducing $e_{\mathrm{rms}}$ (in attack-dominated regimes) and hence shrinking $r_b$, which enlarges the separation margin $\gamma = \Delta - r_a - r_b$. This decreases the failure probability $\varepsilon$ and tightens both $\delta_L$ and $\delta_U$, since they are monotone increasing in $\varepsilon$.
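The mechanism above can be illustrated by evaluating the score-separation condition (34) of Theorem 3 for two values of the benign radius $r_b$. The sketch below is a direct transcription of (34) under stated assumptions: the numerical values of $\Delta$, $r_a$, and $r_b$ are hypothetical, and `exact_filtering_condition` is a name introduced here for illustration.

```python
def exact_filtering_condition(Delta, r_a, r_b, K_b, B):
    """Check (K_b - 1) * gamma > B * Gamma + (K_b - 1 - B) * 2 * r_b from (34)."""
    if not B < K_b - 1:
        raise ValueError("Theorem 3 assumes B < K_b - 1")
    gamma = Delta - r_a - r_b  # lower bound on Byzantine-to-benign distances
    Gamma = Delta + r_a + r_b  # upper bound on Byzantine-to-benign distances
    lhs = (K_b - 1) * gamma
    rhs = B * Gamma + (K_b - 1 - B) * 2 * r_b
    return lhs > rhs, lhs, rhs

# K = 100 clients with B = 20 Byzantine, as in the synthetic experiments;
# Delta, r_a, and the two r_b values are hypothetical.
for r_b in (0.3, 0.1):
    ok, lhs, rhs = exact_filtering_condition(Delta=1.0, r_a=0.0, r_b=r_b, K_b=80, B=20)
    print(f"r_b={r_b}: lhs={lhs:.1f}, rhs={rhs:.1f}, condition holds: {ok}")
```

With these placeholder numbers the condition fails for the larger radius and holds for the smaller one, which is the qualitative effect attributed to partial sharing: a smaller $e_{\mathrm{rms}}$ shrinks $r_b$ and makes (34) easier to satisfy.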
V. SIMULATION RESULTS

A. Synthetic-Data Experiments

Experimental setup. We consider a federated network of $K = 100$ clients. At each training iteration, the server uniformly samples $|\mathcal{S}_n| = 10$ clients to participate. Each client $k$ observes a non-IID stream $(\mathbf{x}_{k,n}, y_{k,n})$ generated by the linear model (1), with the ground-truth parameter vector normalized as $\|\mathbf{w}^\star\|_2 = 1$. We set the model dimension to $D = 50$ and apply partial sharing with $M = 15$ parameters per iteration (sharing ratio $M/D = 0.3$), which corresponds to a 70% reduction in the number of exchanged parameters. The entries of $\mathbf{x}_{k,n}$ are drawn from zero-mean Gaussian distributions with client-specific variances $\varsigma_k^2 \sim \mathcal{U}(0.2, 1.2)$, while the observation noise is Gaussian with variance $\sigma_{\nu_k}^2 \sim \mathcal{U}(0.005, 0.025)$. All results are averaged over 100 independent Monte Carlo trials.

Fig. 3. Distribution of maliciousness scores $m_k$ (cf. (12)) under different calibration-phase Byzantine attacks (horizontal axis: attack type, i.e., efficiency, coverage, and random; vertical axis: maliciousness score $m_k$). Byzantine clients (red) attain markedly larger scores than benign clients, enabling reliable outlier filtering.

Byzantine model. We consider $|\mathcal{S}_B| = 20$ Byzantine clients (20% of the network). During training, these clients inject additive Gaussian perturbations into their transmitted model updates with probability $p_a = 0.25$, using variance $\sigma_B^2 = 0.1$ (cf. Section II-C). During calibration, the same Byzantine clients submit adversarial nonconformity scores according to three attack scenarios (Fig. 2): (i) efficiency attack (deflating scores to shrink intervals by sending all-zero scores), (ii) coverage attack (inflating scores to widen intervals by sending scores $10\times$ the benign mean), and (iii) random attack (adding zero-mean Gaussian noise with variance $\sigma_C^2 = 0.5$ to the original scores while keeping them nonnegative).
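The three calibration-phase attacks listed above can be sketched as follows. Several details are our assumptions rather than part of the specification: the coverage attack is modeled as sending a constant score equal to $10\times$ the mean of the client's honest scores (used here as a proxy for the benign mean), and nonnegativity under the random attack is enforced by clipping at zero. The function name and signature are hypothetical.

```python
import numpy as np

def apply_calibration_attack(scores, attack, benign_mean=None, sigma2_C=0.5, rng=None):
    """Return adversarial nonconformity scores for one Byzantine client."""
    scores = np.asarray(scores, dtype=float)
    if attack == "efficiency":
        # Deflate scores to shrink intervals: send all-zero scores.
        return np.zeros_like(scores)
    if attack == "coverage":
        # Inflate scores to widen intervals: 10x the benign mean (assumed constant).
        mean = scores.mean() if benign_mean is None else benign_mean
        return np.full_like(scores, 10.0 * mean)
    if attack == "random":
        # Add zero-mean Gaussian noise, then clip at zero (assumed) to stay nonnegative.
        rng = np.random.default_rng() if rng is None else rng
        noisy = scores + rng.normal(0.0, np.sqrt(sigma2_C), size=scores.shape)
        return np.maximum(noisy, 0.0)
    raise ValueError(f"unknown attack type: {attack}")

honest = np.abs(np.random.default_rng(0).normal(0.0, 0.1, size=5))
for name in ("efficiency", "coverage", "random"):
    print(name, np.round(apply_calibration_attack(honest, name, rng=np.random.default_rng(1)), 3))
```

Other ways of keeping the perturbed scores nonnegative (e.g., taking absolute values) would fit the same description and would change the shape of the resulting adversarial histogram.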
We set the size of the characterization vectors to $H = 100$ to balance histogram resolution and communication cost, and target 90% coverage (i.e., $\alpha = 0.1$ in (8)).

Maliciousness-score separation. In Fig. 3, we show the maliciousness scores of all clients under the three calibration attacks. In each case, Byzantine clients concentrate on substantially larger scores than benign clients, making them clear outliers and enabling reliable filtering prior to quantile estimation. Random attacks are the hardest of the three to mitigate. As the variance of the added Gaussian noise decreases, specifically when $\sigma_C^2 \le 0.2$, the corrupted scores become harder to distinguish from benign ones, so detection and mitigation become more difficult. Handling such low-variance random attacks calls for more sophisticated detection approaches, which we leave to future work.

In Fig. 4 and Table I, we present the marginal coverage and average prediction-interval width for FCP, Rob-FCP, and PRISM-FCP under the same Byzantine attack scenarios. We train for 1,000 iterations, then perform conformal calibration using 1,000 samples and evaluate on 1,000 test samples per client. For clarity, in Fig. 4, we only visualize the first 30 test samples.

In Table I, we compare FCP, Rob-FCP, and PRISM-FCP under Byzantine attacks that affect both the training and calibration phases. Rob-FCP uses full model sharing ($M/D = 1$) with Byzantine filtering, while FCP uses full sharing without filtering. Under the coverage attacks, FCP exhibits inflated coverage (100.0%) together with interval widths that are $8.4\times$ larger than those of PRISM-FCP, while both PRISM-FCP and Rob-FCP remain close to the nominal 90% target. Under the efficiency and random attacks, the three methods achieve similar coverage, but PRISM-FCP consistently produces tighter intervals, reflecting its lower training error. Overall, these results show that calibration-phase filtering is necessary for reliable uncertainty quantification under adversarial manipulation.

TABLE I: Marginal coverage and average interval width under different Byzantine attacks ($K = 100$ clients, $|\mathcal{S}_B| = 20$ Byzantine, $D = 50$, $\alpha = 0.1$).

Attack      Method     M/D   Coverage (%)   Interval Width
Efficiency  PRISM-FCP  0.3   90.0 ± 0.1     0.91 ± 0.13
Efficiency  Rob-FCP    1.0   90.0 ± 0.1     1.68 ± 0.28
Efficiency  FCP        1.0   87.5 ± 0.2     1.58 ± 0.25
Coverage    PRISM-FCP  0.3   90.0 ± 0.1     0.91 ± 0.13
Coverage    Rob-FCP    1.0   90.0 ± 0.1     1.68 ± 0.28
Coverage    FCP        1.0   100.0 ± 0.0    7.67 ± 1.23
Random      PRISM-FCP  0.3   90.0 ± 0.1     0.91 ± 0.13
Random      Rob-FCP    1.0   90.0 ± 0.1     1.68 ± 0.28
Random      FCP        1.0   91.9 ± 0.6     1.87 ± 0.27

PRISM-FCP further improves over Rob-FCP by substantially tightening the prediction intervals. Although both methods filter Byzantine clients during calibration, PRISM-FCP yields intervals that are $1.8\times$ narrower than Rob-FCP by additionally mitigating Byzantine influence during training via partial sharing. This agrees with Theorem 2, which predicts an $M/D$ attenuation of injected perturbation energy. The training MSE corroborates this effect: PRISM-FCP achieves an MSE of $-26.4$ dB compared to Rob-FCP's MSE of $-21.4$ dB (a 5 dB improvement), consistent with the MSE decomposition in Lemma 3.

Fig. 5 corroborates Corollary 2 by reporting the calibration-quantile deviation across sharing ratios. The deviation decreases monotonically as $M/D$ decreases, and at $M/D = 0.1$ it is roughly 70% smaller than under full sharing ($M/D = 1$).
This trend aligns with the MSE decomposition in Lemma 3: partial sharing attenuates the Byzantine-induced term $\mathcal{E}_\omega$ (Lemma 4), which in turn reduces the perturbation of residuals and the resulting conformal quantile.

Robustness to unknown $|\mathcal{S}_B|$. PRISM-FCP can be deployed either with a known Byzantine count $|\mathcal{S}_B|$, where score-based filtering excludes exactly $|\mathcal{S}_B|$ clients, or without this knowledge, by flagging outliers with a robust MAD-based rule (Remark 3) that uses a scale factor of 1.4826 and a threshold of 2.5. Across all three attack types, the known-$|\mathcal{S}_B|$ setting achieves perfect detection (20/20 true positives and no false positives). The MAD rule also attains 100% recall, while incurring roughly one false positive on average. Importantly, both settings preserve nominal coverage (90%), indicating that PRISM-FCP remains robust even when $|\mathcal{S}_B|$ is not known a priori.

B. Real-Data Experiments (UCI Superconductivity Dataset)

To demonstrate practical applicability, we evaluate PRISM-FCP on the UCI Superconductivity dataset [45], which contains 21,263 samples with $D = 81$ features describing material properties, with the target being the critical temperature, in Kelvin (K), at which superconductivity occurs.

Fig. 4. Illustrative prediction intervals under (a) efficiency, (b) coverage, and (c) random attacks (horizontal axis: time index $n$; vertical axis: target value $y$). The true target values are shown as dashed lines, while prediction intervals from FCP and PRISM-FCP ($M/D = 0.3$) are shown in red and green, respectively.

Fig. 5. Quantile deviation $|\hat{q}_{1-\alpha} - q^\star_{1-\alpha}|$ of PRISM-FCP versus sharing ratio $M/D$ under different types of Byzantine attacks.

Experimental setup. We distribute the dataset across $K = 100$ clients, including $|\mathcal{S}_B| = 20$ Byzantine clients. Each client is assigned 3,000 samples, split into 1,000 training, 1,000 calibration, and 1,000 test samples.³ At each training iteration, the server uniformly samples $|\mathcal{S}_n| = 20$ clients to participate. Features are standardized to zero mean and unit variance, and targets are normalized similarly. As before, we compare: (i) PRISM-FCP with partial sharing ($M < D$) and Byzantine filtering, (ii) Rob-FCP with full sharing ($M = D$) and Byzantine filtering, and (iii) FCP with full sharing and no filtering.

³ To simulate non-IID data heterogeneity, we partition the target variable (critical temperature) into 10 equal-frequency quantile bins and draw each client's mixing proportions over these bins from a $\mathrm{Dir}(0.5)$ distribution [46]. Each client receives 3,000 samples accordingly, with overlap permitted across clients.

TABLE II: Evaluation results on the UCI Superconductivity dataset with $D = 81$.

Attack      Method     M/D    Coverage (%)   Interval Width (K)
Efficiency  PRISM-FCP  0.25   90.0 ± 0.1     64.09 ± 1.25
Efficiency  PRISM-FCP  0.49   90.0 ± 0.1     66.43 ± 1.99
Efficiency  PRISM-FCP  0.74   90.0 ± 0.2     69.87 ± 2.81
Efficiency  PRISM-FCP  1.0    90.0 ± 0.2     74.33 ± 4.37
Efficiency  Rob-FCP    1.0    90.0 ± 0.2     74.33 ± 4.37
Efficiency  FCP        1.0    87.5 ± 0.2     69.36 ± 4.10
Coverage    PRISM-FCP  0.25   90.0 ± 0.1     64.09 ± 1.25
Coverage    PRISM-FCP  0.49   90.0 ± 0.1     66.43 ± 1.99
Coverage    PRISM-FCP  0.74   90.0 ± 0.2     69.87 ± 2.81
Coverage    PRISM-FCP  1.0    90.0 ± 0.2     74.33 ± 4.37
Coverage    Rob-FCP    1.0    90.0 ± 0.2     74.33 ± 4.37
Coverage    FCP        1.0    100.0 ± 0.0    364.98 ± 22.33
Random      PRISM-FCP  0.25   90.0 ± 0.1     64.33 ± 1.28
Random      PRISM-FCP  0.49   90.0 ± 0.1     66.66 ± 2.04
Random      PRISM-FCP  0.74   90.0 ± 0.2     70.09 ± 2.86
Random      PRISM-FCP  1.0    90.0 ± 0.2     74.52 ± 4.39
Random      Rob-FCP    1.0    90.0 ± 0.2     74.52 ± 4.39
Random      FCP        1.0    91.7 ± 0.3     78.43 ± 4.17

We sweep sharing ratios $M/D \in \{0.25, 0.49, 0.74, 1.0\}$ (i.e., $M \in \{20, 40, 60, 81\}$) and average results over 100 independent trials. The target coverage is $1 - \alpha = 90\%$. In Table II, we present marginal coverage and average interval width under the three calibration-phase attack scenarios. The results corroborate the synthetic experiments and validate PRISM-FCP on real-world data:

Coverage validity. Across all three attacks and all sharing ratios, PRISM-FCP stays close to the 90% coverage target, indicating that partial sharing does not compromise conformal validity in practice. In contrast, FCP without Byzantine filtering fails under adversarial calibration: it substantially inflates intervals under coverage attacks (reaching 100.0% coverage with widths more than $5\times$ those of PRISM-FCP at $M/D = 0.25$, i.e., 364.98 K versus 64.09 K) and undercovers under efficiency attacks (87.5% coverage).

Interval efficiency. PRISM-FCP yields tighter intervals than Rob-FCP. For example, at $M/D = 0.25$ ($M = 20$), PRISM-FCP produces intervals that are about 14% narrower than Rob-FCP across all three attacks (e.g., 64.09 K versus 74.33 K under the efficiency attack). This improvement is attributable to reduced training error from partial sharing, which attenuates Byzantine influence during training.

Communication-accuracy trade-off. Smaller sharing ratios generally tighten intervals due to stronger Byzantine attenuation, but the benefit saturates around $M/D \approx 0.5$. In particular, reducing $M/D$ from 0.49 to 0.25 provides only marginal additional width reduction while slightly slowing convergence, suggesting $0.3 \le M/D \le 0.5$ as a practical operating range that balances efficiency and communication cost.

Attack-type invariance. The relative advantage of PRISM-FCP over Rob-FCP is stable across efficiency, coverage, and random attacks, consistent with the interpretation that training-stage mitigation via partial sharing complements the calibration-phase filtering.
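For reproducibility, the non-IID partition used in the real-data experiments (ten equal-frequency quantile bins of the target, with each client's mixing proportions over these bins drawn from $\mathrm{Dir}(0.5)$) can be sketched as below. This is one plausible reading rather than our exact pipeline: in particular, sampling with replacement inside a bin (needed when a client requests more samples from a bin than the bin contains), the handling of empty bins, and the function name `dirichlet_partition` are assumptions made for illustration.

```python
import numpy as np

def dirichlet_partition(y, n_clients=100, n_bins=10, alpha=0.5, n_per_client=3000, seed=0):
    """Assign sample indices to clients with Dirichlet-distributed mixing over target bins."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    # Equal-frequency bin edges from the empirical quantiles of the target.
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_bins + 1))
    bin_id = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, n_bins - 1)
    bins = [np.flatnonzero(bin_id == b) for b in range(n_bins)]
    nonempty = np.array([len(b) > 0 for b in bins], dtype=float)
    assignments = []
    for _ in range(n_clients):
        # Client-specific mixing proportions; bins without samples get zero mass.
        props = rng.dirichlet(alpha * np.ones(n_bins)) * nonempty
        props = props / props.sum()
        counts = rng.multinomial(n_per_client, props)
        # Sampling with replacement (assumed), so clients may overlap.
        idx = np.concatenate([rng.choice(bins[b], size=c, replace=True)
                              for b, c in enumerate(counts) if c > 0])
        assignments.append(idx)
    return assignments

# Toy usage on synthetic targets (the real dataset is not loaded here).
y_toy = np.random.default_rng(1).normal(size=5000)
parts = dirichlet_partition(y_toy, n_clients=5, n_per_client=300)
print([len(p) for p in parts])
```

A small concentration parameter such as 0.5 tends to place most of a client's samples in a few bins, which is what produces the label-distribution heterogeneity across clients.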
Overall, PRISM-FCP provides two complementary benefits: (i) resilience to Byzantine behavior in both training and calibration, and (ii) communication efficiency through partial model parameter sharing and client subsampling, making it a practical approach for scalable FCP.

VI. CONCLUSION AND FUTURE WORK

We proposed PRISM-FCP to mitigate Byzantine attacks in both training and calibration phases of federated conformal prediction. We showed that partial sharing attenuates adversarial perturbation energy during training, reducing steady-state error and tightening prediction intervals, while histogram-based filtering strengthens robustness during calibration. Comprehensive experiments on both synthetic and real data corroborated our theoretical analysis. Several directions merit future investigation. Extending the theory beyond linear regression to nonlinear models and deep networks is an important next step. In addition, analyzing robustness against fully adaptive adversaries that exploit partial-sharing patterns will further strengthen the guarantees.

APPENDIX A
PROOF OF THEOREM 3

We prove that on a high-probability event, every Byzantine client has a strictly larger maliciousness score than every benign client, hence removing the $B$ largest scores recovers exactly the Byzantine set.

Step 1: Benign histogram concentration. For each benign client $k \in \mathcal{B}$, Lemma 6 implies that for any $\delta' \in (0, 1)$,
$$\mathbb{P}\left(\|\mathbf{v}_k - \mathbf{p}_k\|_2 > C_H \sqrt{\frac{\log(2H/\delta')}{N_k}}\right) \le \delta'. \tag{37}$$
Choose $\delta' = \delta/K_b$ and use $N_k \ge N_{\min}$ to get
$$\mathbb{P}\left(\|\mathbf{v}_k - \mathbf{p}_k\|_2 > C_H \sqrt{\frac{\log(2HK_b/\delta)}{N_{\min}}}\right) \le \frac{\delta}{K_b}. \tag{38}$$
A union bound over all benign clients shows that the event
$$\mathcal{E}_b := \left\{\max_{k \in \mathcal{B}} \|\mathbf{v}_k - \mathbf{p}_k\|_2 \le C_H \sqrt{\frac{\log(2HK_b/\delta)}{N_{\min}}}\right\} \tag{39}$$
satisfies $\mathbb{P}(\mathcal{E}_b) \ge 1 - \delta$.

Step 2: Drift due to training error. Lemma 7 gives, for each benign $k$, $\|\mathbf{p}_k - \mathbf{p}^\star\|_2 \le C_{\mathrm{drift}} L_x \|\mathbf{e}_k\|_2$.
By the definition of $e_{\mathrm{rms}}$ used in the manuscript (as an upper bound on benign regression error), we have $\|\mathbf{e}_k\|_2 \le e_{\mathrm{rms}}$ for benign $k$, hence $\max_{k \in \mathcal{B}} \|\mathbf{p}_k - \mathbf{p}^\star\|_2 \le C_{\mathrm{drift}} L_x e_{\mathrm{rms}}$. Combining this with $\mathcal{E}_b$ and the triangle inequality,
$$\max_{k \in \mathcal{B}} \|\mathbf{v}_k - \mathbf{p}^\star\|_2 \le \max_{k \in \mathcal{B}} \|\mathbf{v}_k - \mathbf{p}_k\|_2 + \max_{k \in \mathcal{B}} \|\mathbf{p}_k - \mathbf{p}^\star\|_2 \le r_b \quad \text{on } \mathcal{E}_b,$$
where $r_b$ is defined in (33).

Step 3: Adversarial concentration and distance bounds. Define the adversarial concentration event $\mathcal{E}_a := \{\max_{j \in \mathcal{S}_B} \|\mathbf{v}_j - \mathbf{q}\|_2 \le r_a\}$, which holds with probability at least $1 - \delta_a$ by the assumption of Theorem 3. Let $\mathcal{E} := \mathcal{E}_b \cap \mathcal{E}_a$, thus $\mathbb{P}(\mathcal{E}) \ge 1 - (\delta + \delta_a)$. Conditioning on $\mathcal{E}$, for any two benign clients $k, k'$, we have
$$\|\mathbf{v}_k - \mathbf{v}_{k'}\|_2 \le \|\mathbf{v}_k - \mathbf{p}^\star\|_2 + \|\mathbf{v}_{k'} - \mathbf{p}^\star\|_2 \le 2 r_b.$$
For any Byzantine client $j$ and benign client $k$, the triangle inequality yields the lower bound
$$\|\mathbf{v}_j - \mathbf{v}_k\|_2 \ge \|\mathbf{q} - \mathbf{p}^\star\|_2 - \|\mathbf{v}_j - \mathbf{q}\|_2 - \|\mathbf{v}_k - \mathbf{p}^\star\|_2 \ge \Delta - r_a - r_b =: \gamma, \tag{40}$$
and similarly, the upper bound
$$\|\mathbf{v}_j - \mathbf{v}_k\|_2 \le \|\mathbf{v}_j - \mathbf{q}\|_2 + \|\mathbf{q} - \mathbf{p}^\star\|_2 + \|\mathbf{v}_k - \mathbf{p}^\star\|_2 \le \Delta + r_a + r_b =: \Gamma. \tag{41}$$

Step 4: Lower bound on Byzantine scores. Fix any Byzantine client $j \in \mathcal{S}_B$. There are $K_b$ benign clients, and by the bound above, each benign $k$ satisfies $\|\mathbf{v}_j - \mathbf{v}_k\|_2 \ge \gamma$. Consider any subset $T \subseteq \mathcal{B}$ with $|T| = K_b - 1$. Then,
$$\sum_{k \in T} \|\mathbf{v}_j - \mathbf{v}_k\|_2 \ge (K_b - 1)\gamma.$$
Since $\mathrm{FN}(j, K_b - 1)$ contains the $K_b - 1$ largest distances from $j$, its sum is at least the sum over any $K_b - 1$ clients, hence
$$m_j = \sum_{k' \in \mathrm{FN}(j, K_b - 1)} \|\mathbf{v}_j - \mathbf{v}_{k'}\|_2 \ge (K_b - 1)\gamma.$$

Step 5: Upper bound on benign scores. Fix any benign client $k \in \mathcal{B}$. Among its $K_b - 1$ farthest neighbors, at most $B$ can be Byzantine (there are only $B$ Byzantine clients in total). For any benign neighbor $k'$ we have $\|\mathbf{v}_k - \mathbf{v}_{k'}\|_2 \le 2 r_b$, and for any Byzantine neighbor $j$ we have $\|\mathbf{v}_k - \mathbf{v}_j\|_2 \le \Gamma$. Therefore,
$$m_k = \sum_{k' \in \mathrm{FN}(k, K_b - 1)} \|\mathbf{v}_k - \mathbf{v}_{k'}\|_2 \le B\Gamma + (K_b - 1 - B)\, 2 r_b.$$

Step 6: Score separation implies exact filtering.
By Steps 4–5, on $\mathcal{E}$ we have, for every Byzantine $j$ and benign $k$,
$$m_j \ge (K_b - 1)\gamma, \qquad m_k \le B\Gamma + (K_b - 1 - B)\, 2 r_b.$$
Under condition (34), this yields $m_j > m_k$ for all Byzantine $j$ and benign $k$. Hence the $B$ largest scores $\{m_k\}$ belong exactly to the Byzantine clients, so filtering by (12) is exact on $\mathcal{E}$.

Step 7: Misfiltering probability. Since misfiltering can occur only if $\mathcal{E}$ fails,
$$\mathbb{P}(\text{misfiltering}) \le \mathbb{P}(\mathcal{E}^c) \le \mathbb{P}(\mathcal{E}_b^c) + \mathbb{P}(\mathcal{E}_a^c) \le \delta + \delta_a,$$
which completes the proof.

REFERENCES

[1] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. Y. Arcas, "Communication-efficient learning of deep networks from decentralized data," in Proc. Int. Conf. Artif. Intell. Stat., 2017, pp. 1273–1282.
[2] Q. Yang, Y. Liu, T. Chen, and Y. Tong, "Federated machine learning: Concept and applications," ACM Trans. Intell. Syst. Technol., vol. 10, no. 2, pp. 1–19, 2019.
[3] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, "Federated learning: Challenges, methods, and future directions," IEEE Signal Process. Mag., vol. 37, no. 3, pp. 50–60, 2020.
[4] P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings et al., "Advances and open problems in federated learning," Found. Trends Mach. Learn., vol. 14, no. 1–2, pp. 1–210, 2021.
[5] E. Lari, V. C. Gogineni, R. Arablouei, and S. Werner, "Noise-robust and resource-efficient ADMM-based federated learning," Signal Process., vol. 233, p. 109988, 2025.
[6] V. Smith, C. Chiang, M. Sanjabi, and A. S. Talwalkar, "Federated multi-task learning," in Proc. Adv. Neural Inf. Process. Syst., 2017.
[7] J. Wang, Q. Liu, H. Liang, G. Joshi, and H. V. Poor, "Tackling the objective inconsistency problem in heterogeneous federated optimization," in Proc. Adv. Neural Inf. Process. Syst., vol. 33, 2020, pp. 7611–7623.
[8] S. Reddi, Z. Charles, M. Zaheer, Z. Garrett, K. Rush, J. Konečný, S. Kumar, and H. B. McMahan, "Adaptive federated optimization," in Proc. Int. Conf. Learn. Represent., 2021.
[9] K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, C. Kiddon, J. Konečný, S. Mazzocchi, B. McMahan et al., "Towards federated learning at scale: System design," in Proc. MLSys, 2019, pp. 374–388.
[10] J. Wang, Z. Charles, Z. Xu, G. Joshi, H. B. McMahan, M. Al-Shedivat, G. Andrew, S. Avestimehr, K. Deng, J. Duchi et al., "A field guide to federated optimization," in Proc. Int. Conf. Learn. Represent., 2024.
[11] Z. Charles, N. Garrett, Z. Xu, and G. Joshi, "Towards federated foundation models: Scalable dataset pipelines for group-structured learning," in Proc. Adv. Neural Inf. Process. Syst., 2024.
[12] Q. Yang, Y. Liu, Y. Cheng, Y. Kang, T. Chen, and H. Yu, "Federated learning," IEEE Trans. Big Data, vol. 6, no. 4, pp. 673–688, 2020.
[13] W. Y. B. Lim, N. C. Luong, D. T. Hoang, Y. Jiao, Y.-C. Liang, Q. Yang, D. Niyato, and C. Miao, "Federated learning in mobile edge networks: A comprehensive survey," IEEE Commun. Surv. Tutor., vol. 22, no. 3, pp. 2031–2063, 2020.
[14] T. Gafni, N. Shlezinger, K. Cohen, Y. C. Eldar, and H. V. Poor, "Federated learning: A signal processing perspective," IEEE Signal Process. Mag., vol. 39, no. 3, pp. 14–41, 2022.
[15] E. Lari, V. C. Gogineni, R. Arablouei, and S. Werner, "On the resilience of online federated learning to model poisoning attacks through partial sharing," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2024, pp. 9201–9205.
[16] E. Lari, R. Arablouei, V. C. Gogineni, and S. Werner, "Resilience in online federated learning: Mitigating model-poisoning attacks via partial sharing," IEEE Trans. Signal Inf. Process. Netw., vol. 11, pp. 388–400, 2025.
[17] V. Vovk, A. Gammerman, and G. Shafer, Algorithmic learning in a random world. Springer, 2005.
[18] V. Plassier, N. Kotelevskii, A. Rubashevskii, F. Noskov, M. Velikanov, A. Fishkov, S. Horvath, M. Takac, E. Moulines, and M. Panov, "Efficient conformal prediction under data heterogeneity," in Proc. Int. Conf. Artif. Intell. Stat. PMLR, 2024, pp. 4879–4887.
[19] M. Zhu, M. Zecchin, S. Park et al., "Federated inference with reliable uncertainty quantification over wireless channels via conformal prediction," IEEE Trans. Signal Process., vol. 72, pp. 1235–1250, 2024.
[20] F. Ye, M. Yang, J. Pang, L. Wang, D. Wong, E. Yilmaz, S. Shi, and Z. Tu, "Benchmarking LLMs via uncertainty quantification," Proc. Adv. Neural Inf. Process. Syst., vol. 37, pp. 15 356–15 385, 2024.
[21] X. Zhou, B. Chen, Y. Gui, and L. Cheng, "Conformal prediction: A data perspective," ACM Comput. Surv., vol. 58, no. 2, pp. 1–37, 2025.
[22] J. Lei, M. G'Sell, A. Rinaldo, R. J. Tibshirani, and L. Wasserman, "Distribution-free predictive inference for regression," J. Am. Stat. Assoc., vol. 113, no. 523, pp. 1094–1111, 2018.
[23] J. Ai and Z. Ren, "Not all distributional shifts are equal: Fine-grained robust conformal inference," in Proc. Int. Conf. Mach. Learn., 2024, pp. 641–665.
[24] C. Lu, Y. Yu, S. P. Karimireddy, M. Jordan, and R. Raskar, "Federated conformal predictors for distributed uncertainty quantification," in Proc. Int. Conf. Mach. Learn. PMLR, 2023, pp. 22 942–22 964.
[25] P. Humbert, B. Le Bars, A. Bellet, and S. Arlot, "One-shot federated conformal prediction," in Proc. Int. Conf. Mach. Learn. PMLR, 2023, pp. 14 153–14 177.
[26] V. Plassier, M. Makni, A. Rubashevskii, E. Moulines, and M. Panov, "Conformal prediction for federated uncertainty quantification under label shift," in Proc. Int. Conf. Mach. Learn., 2023, pp. 28 101–28 139.
[27] Y. Min, C. Zhang, L. Peng, and C. Zou, "Personalized federated conformal prediction with localization," in Proc. Adv. Neural Inf. Process. Syst., 2025.
[28] N. Koutsoubis, A. Waqas, Y. Yilmaz et al., "Privacy-preserving federated learning and uncertainty quantification in medical imaging," Radiol.: Artif. Intell., vol. 7, no. 4, 2025.
[29] P. Blanchard, E. M. El Mhamdi, R. Guerraoui, and J. Stainer, "Machine learning with adversaries: Byzantine tolerant gradient descent," Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017.
[30] E. M. E. Mhamdi, R. Guerraoui, and S. Rouault, "The hidden vulnerability of distributed learning in byzantium," in Proc. Int. Conf. Mach. Learn., 2018, pp. 3521–3530.
[31] M. Fang, X. Cao, J. Jia, and N. Gong, "Local model poisoning attacks to Byzantine-Robust federated learning," in USENIX Security Symp., Aug. 2020, pp. 1605–1622.
[32] W. Huang, Z. Shi, M. Ye, H. Li, and B. Du, "Self-driven entropy aggregation for Byzantine-robust heterogeneous federated learning," in Proc. Int. Conf. Mach. Learn., 2024, pp. 20 096–20 110.
[33] D. Yin, Y. Chen, R. Kannan, and P. Bartlett, "Byzantine-robust distributed learning: Towards optimal statistical rates," in Proc. Int. Conf. Mach. Learn., 2018, pp. 5650–5659.
[34] K. Pillutla, S. M. Kakade, and Z. Harchaoui, "Robust aggregation for federated learning," IEEE Trans. Signal Process., vol. 70, pp. 1142–1154, 2022.
[35] C. Qian, M. Wang, H. Ren, and C. Zou, "ByMI: Byzantine machine identification with false discovery rate control," in Proc. Int. Conf. Mach. Learn., 2024, pp. 41 357–41 382.
[36] B. Zhang, M. Fang, Z. Liu, B. Yi, P. Zhou, Y. Wang, T. Li, and Z. Liu, "Practical framework for privacy-preserving and Byzantine-robust federated learning," IEEE Trans. Inf. Forensics Security, 2025.
[37] M. Kang, Z. Lin, J. Sun, C. Xiao, and B. Li, "Certifiably byzantine-robust federated conformal prediction," in Proc. Int. Conf. Mach. Learn. PMLR, 2024, pp. 23 022–23 057.
[38] V. C. Gogineni, S. Werner, Y.-F. Huang, and A. Kuh, "Communication-efficient online federated learning framework for nonlinear regression," in Proc. IEEE Int. Conf. Acoust.
Speech Signal Pr ocess. , 2022, pp. 5228– 5232. [39] ——, “Communication-ef ﬁcient online federated learning strategies for kernel regression, ” IEEE Internet Things J . , vol. 10, pp. 4531–4544, 2023. [40] B. Kailkhura, S. Brahma, and P . K. V arshney , “Data falsiﬁcation attacks on consensus-based detection systems, ” IEEE T rans. Signal Inf. Pr ocess. Netw . , vol. 3, no. 1, pp. 145–158, 2017. [41] B. De Finetti, “On the condition of partial exchangeability , ” Stud. Inductive Logic Pr obab. , vol. 2, pp. 193–205, 1980. [42] T . Dunning, “The T -Digest: Efﬁcient estimates of distributions, ” Software Impacts , vol. 7, p. 100049, 2021. [43] C. Leys, C. Ley , O. Klein, P . Bernard, and L. Licata, “Detecting outliers: Do not use standard deviation around the mean, use absolute de viation around the median, ” J . Exp. Soc. Psychol. , v ol. 49, pp. 764–766, 2013. [44] J. A. Tropp, “User-friendly tail bounds for sums of random matrices, ” F ound. Comput. Math. , vol. 12, no. 4, pp. 389–434, 2012. [45] K. Hamidieh, “ A data-driven statistical model for predicting the critical temperature of a superconductor, ” Comput. Mater . Sci. , vol. 154, pp. 346–354, 2018. [46] T .-M. H. Hsu, H. Qi, and M. Brown, “Measuring the effects of non- identical data distribution in federated optimization, ” arXiv preprint arXiv:1909.06335 , 2019.
