Failure-Aware Access Point Selection for Resilient Cell-Free Massive MIMO Networks

Mostafa Rahmani Ghourtani, Junbo Zhao, Yi Chu, Hamed Ahmadi, David Grace, Alister G. Burr
School of Physics, Engineering and Technology, University of York
{rahmani.mostafa, Junbo.zhao, Yi.chu, hamed.ahmadi, david.grace, alister.burr}@york.ac.uk

Abstract: This paper presents a Failure-Aware Access Point Selection (FAAS) method aimed at improving hardware resilience in cell-free massive MIMO (CF-mMIMO) networks. FAAS selects APs for each user by jointly considering channel strength and the failure probability of each AP. A tunable parameter $\alpha \in [0, 1]$ scales these failure probabilities to model different levels of network stress. We evaluate resilience using two key metrics: the minimum-user spectral efficiency, which captures worst-case user performance, and the outage probability, defined as the fraction of users left without any active APs. Simulation results show that FAAS maintains significantly better performance under failure conditions compared to failure-agnostic clustering. At high failure levels, FAAS reduces outage by over 85% and improves worst-case user rates. These results confirm that FAAS is a practical and efficient solution for building more reliable CF-mMIMO networks.

Index Terms: Access point selection, Cell-free massive MIMO, Resilience, Spectral efficiency

I. Introduction

Cell-free massive multiple-input multiple-output (CF-mMIMO) has emerged as a leading architecture for beyond-5G/6G wireless networks, generalizing classical massive MIMO into a distributed, cell-less paradigm [1], [2], [3]. By deploying a large number of distributed access points (APs) that jointly serve all users, CF-mMIMO leverages macro-diversity, mitigates inter-cell interference, and ensures uniformly high data rates [4], [5].
Unlike traditional cellular systems, CF-mMIMO significantly reduces cell boundaries and associated edge effects, enabling consistent quality of service (QoS) and ultra-reliable links [6]. At the same time, the large number of distributed APs supports efficient MU-MIMO transmission, allowing the system to exploit spatial multiplexing gains that boost spectral efficiency while maintaining uniform coverage and link reliability [7], [8], [9].

Despite these touted reliability benefits, the resilience of CF-mMIMO networks in the face of hardware failures has received surprisingly limited attention, in contrast to recent works highlighting resilience-by-design as a crucial paradigm for ensuring robust 6G communication networks [10]. In practice, however, AP hardware can malfunction or fail (e.g., due to power outages, equipment faults, or maintenance issues), which poses a serious challenge to any distributed antenna system. Conventional cellular networks suffer outages when a base station fails, but a distributed CF network could be more fault-tolerant by design. For example, recent architectural proposals like the "radio stripes" concept suggest that node failures can be tolerated via internal routing mechanisms, thereby improving network robustness. This assumption holds particularly in high-density deployments, where overlapping AP coverage ensures that the failure of a few nodes has only a marginal impact on system-wide performance due to the inherent spatial redundancy of CF-mMIMO [11].

The work presented in this paper was funded by the UK Department for Science, Innovation and Technology under project YO-RAN.

While resilience in CF-mMIMO is often discussed qualitatively [12], only a few works offer detailed analysis. Sadreddini et al. [13] use Markov models to show how limited fronthaul capacity and long routing paths can disconnect UEs or degrade SINR. Weinberger et al. [14] demonstrate that reconfigurable intelligent surfaces (RIS) can passively enhance resilience by providing alternative paths, even without optimized phase settings. Elkeshawy et al. [15] propose a data-driven activity detector at the central processing unit (CPU) that remains accurate under practical impairments. In addition, [16] addresses hardware nonlinearity by modeling power amplifier (PA) distortions and optimizing user association and power control to mitigate their effects. Overall, there remains a significant theoretical gap in understanding how probabilistic AP failures influence CF-mMIMO performance and what can be done to design resilient cell-free networks.

The theoretical novelty of this work lies in integrating hardware failure resilience into CF-mMIMO for the first time in a systematic way. Rather than deriving closed-form analytical expressions for performance under failures, which remain highly complex due to combinatorial failure patterns, we propose a tractable modeling framework that incorporates probabilistic AP failures into AP selection and evaluation. By explicitly defining failure-aware user–AP associations and resilience metrics, our work bridges the gap between purely qualitative discussions of resilience and quantitative system-level analysis. We show that even under moderate AP failure rates, intelligent AP selection can preserve much of the system's spectral efficiency, whereas traditional failure-agnostic approaches suffer more pronounced degradation. The proposed Failure-Aware AP Selection (FAAS) methodology offers a blueprint for making CF architectures failure-aware: network controllers can use failure probabilities (obtained from hardware health monitoring or historical data) to optimize user–AP associations proactively.
To validate these claims, we evaluate FAAS under various probabilistic failure scenarios and compare the resulting user rates and fairness against baseline schemes without failure awareness. The introduction of resilience into CF-mMIMO, as pursued in this work, opens a new research direction to ensure that the next generation of CF-mMIMO networks can deliver on their promise of ubiquitous, reliable connectivity even in the presence of inevitable hardware failures. In the following, we detail the system model and assumptions, then present the FAAS strategy and its theoretical performance analysis under AP failure conditions.

II. Problem Statement

As illustrated in Fig. 1, APs in a CF-mMIMO network are susceptible to hardware failures caused by power loss, component degradation, or synchronization issues. If each AP $m$ fails independently with probability $p^f_{m,0}$, the set of active APs becomes a random subset of the total $M$ deployed units. In practice, $p^f_{m,0}$ can be estimated from hardware reliability statistics, field measurements, or predictive health monitoring of AP components. Typical values lie in the range 0.01–0.1, corresponding to 1–10% failure likelihoods as reported in radio access and power systems. In this work, we adopt such representative ranges to model realistic stress levels rather than targeting a specific failure mechanism, making the framework broadly applicable across different deployment scenarios. The assumption of independent failures provides a tractable and widely used baseline in reliability analysis; extending the framework to correlated failures (e.g., site-level or fronthaul outages) is an important open problem left for future work. In such conditions, static user-to-AP associations can lead to service outages or severe performance degradation when assigned APs are unavailable.
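The independent-failure model described above can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `sample_active_aps` and `expected_active` and the example probabilities are hypothetical, and each AP is drawn as an independent Bernoulli trial.

```python
import random


def sample_active_aps(p_fail, rng):
    """Draw one failure realization under independent AP failures.

    p_fail[m] is the failure probability of AP m (illustrative values,
    not taken from the paper). Returns the indices of APs that stay active.
    """
    # rng.random() is uniform on [0, 1), so AP m survives with probability 1 - p_fail[m].
    return [m for m, p in enumerate(p_fail) if rng.random() >= p]


def expected_active(p_fail):
    """Expected number of active APs: the sum of the survival probabilities."""
    return sum(1.0 - p for p in p_fail)


if __name__ == "__main__":
    rng = random.Random(0)      # fixed seed so the run is repeatable
    p_fail = [0.05] * 400       # hypothetical: 400 APs, 5% failure probability each
    active = sample_active_aps(p_fail, rng)
    print(len(active), "of", len(p_fail), "APs active in this realization")
    print("expected number of active APs:", expected_active(p_fail))
```

Each call to `sample_active_aps` yields one random subset of the deployed APs, which is the object a static user-to-AP association has to cope with.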

To address this challenge, we propose the FAAS scheme, which proactively integrates failure awareness into CF-mMIMO system design to enhance resilience. The key idea behind FAAS is to dynamically select a subset of APs for each user by accounting for the probability of AP failures. Unlike conventional schemes that assume a fixed, always-on set of APs, FAAS adapts the AP–user association strategy based on potential outages, leveraging the redundancy inherent in distributed AP deployments. Even if some APs randomly fail, the remaining active APs can maintain user service with minimal degradation.

Fig. 1. Failure-aware AP selection framework: each AP is associated with a failure probability $p^f_m$, and the CPU selects APs based on both channel strength and failure probability.

III. System Model

A. Pilot Transmission and Channel Estimation

We consider a CF-mMIMO system with $M$ APs, each equipped with $N$ antennas, and $K$ users uniformly distributed in the network. The signals from all APs are transmitted to the CPU via fronthaul links and processed there. The flat-fading channel between the $m$-th AP and the $k$-th user is defined as $\mathbf{g}_{mk} = \beta_{mk}^{1/2}\mathbf{h}_{mk}$, where $\beta_{mk}$ is the large-scale fading coefficient and $\mathbf{h}_{mk} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_{mk})$ denotes small-scale Rayleigh fading with spatial correlation matrix $\mathbf{R}_{mk} \in \mathbb{C}^{N \times N}$. To obtain the channel state information (CSI), we assume that $\tau_p$ mutually orthogonal pilot sequences, each of length $\tau_p$, are used. Let $\boldsymbol{\phi}_k \in \mathbb{C}^{\tau_p \times 1}$ with $\|\boldsymbol{\phi}_k\|^2 = \tau_p$ denote the pilot sequence assigned to the $k$-th user. The received pilot signal at the $m$-th AP is:

$$\mathbf{y}^{\text{pilot}}_m = \sum_{k=1}^{K} \sqrt{p_k}\,\mathbf{g}_{mk}\boldsymbol{\phi}_k^{T} + \mathbf{z}^{\text{pilot}}_m, \tag{1}$$

where $p_k$ represents the transmit power of the $k$-th user, and $\mathbf{z}^{\text{pilot}}_m \sim \mathcal{CN}(\mathbf{0}, \sigma_z^2\mathbf{I}_N)$ denotes the noise at the $m$-th AP for the received pilot signal. Following the approach in [17], the coarse estimate is computed as $\check{\mathbf{g}}_{mk} = \frac{1}{\sqrt{\tau_p}}\,\mathbf{y}^{\text{pilot}}_m \boldsymbol{\phi}_k^{*}$. Then, the MMSE estimate of $\mathbf{g}_{mk}$ is:

$$\hat{\mathbf{g}}_{mk} = \sqrt{p_k \tau_p}\,\beta_{mk}\mathbf{R}_{mk}\boldsymbol{\Psi}_{mk}^{-1}\check{\mathbf{g}}_{mk}, \tag{2}$$

where

$$\boldsymbol{\Psi}_{mk} = \sum_{k' \in \mathcal{S}_k} \tau_p\, p_{k'} \beta_{mk'} \mathbf{R}_{mk'} + \sigma_z^2 \mathbf{I}_N, \tag{3}$$

and $\mathcal{S}_k$ denotes the subset of users assigned the same pilot sequence as user $k$.

B. Uplink Data Transmission

For the uplink data transmission, the received signal at the $m$-th AP is:

$$\mathbf{y}_m = \sum_{k=1}^{K} \mathbf{g}_{mk} x_k + \mathbf{z}_m, \tag{4}$$

where $x_k$ is the data transmitted by the $k$-th user with power $\sigma_x^2$, and $\mathbf{z}_m \in \mathbb{C}^{N \times 1}$ denotes the noise vector for the data transmission. At the CPU, signals from the selected APs are combined using the weights $\mathbf{w}_{mk} \in \mathbb{C}^{1 \times N}$ to detect the data $x_k$ as follows:

$$\hat{x}_k = \sum_{m=1}^{M} \mathbf{w}_{mk}\mathbf{D}_{mk}\mathbf{y}_m = \mathbf{w}_k\mathbf{D}_k\mathbf{g}_k x_k + \sum_{k'=1,\,k' \neq k}^{K} \mathbf{w}_k\mathbf{D}_k\mathbf{g}_{k'} x_{k'} + \mathbf{w}_k\mathbf{D}_k\mathbf{z}, \tag{5}$$

where $\mathbf{w}_k = [\mathbf{w}_{1k}, \ldots, \mathbf{w}_{Mk}]$, $\mathbf{g}_k = [\mathbf{g}_{1k}^{T}, \ldots, \mathbf{g}_{Mk}^{T}]^{T}$, and the noise is $\mathbf{z} = [\mathbf{z}_1^{T}, \ldots, \mathbf{z}_M^{T}]^{T}$. The APs selected to serve user $k$ form a subset $\mathcal{M}_k \subset \{1, \ldots, M\}$, which is encoded by the block-diagonal matrix $\mathbf{D}_k = \mathrm{diag}(\mathbf{D}_{1k}, \ldots, \mathbf{D}_{Mk}) \in \mathbb{C}^{MN \times MN}$, where

$$\mathbf{D}_{mk} = \begin{cases} \mathbf{I}_N, & \text{if } m \in \mathcal{M}_k \\ \mathbf{0}_N, & \text{if } m \notin \mathcal{M}_k \end{cases} \tag{6}$$

Then, the uplink spectral efficiency (SE) for user $k$ can be expressed as:

$$\mathrm{SE}_k = \frac{\tau_u}{\tau_u + \tau_p}\,\mathbb{E}\left[\log_2\left(1 + \mathrm{SINR}_k\right)\right], \tag{7}$$

where $\tau_u$ is the number of symbols used for uplink data transmission within one coherence interval, and $\mathrm{SINR}_k$ denotes the signal-to-interference-plus-noise ratio (SINR) for user $k$, which is given by (8) as stated in [18]:

$$\mathrm{SINR}_k = \frac{p_k \left|\mathbf{w}_k\mathbf{D}_k\hat{\mathbf{g}}_k\right|^2}{\sum_{k'=1,\,k' \neq k}^{K} p_{k'} \left|\mathbf{w}_k\mathbf{D}_k\hat{\mathbf{g}}_{k'}\right|^2 + \mathbf{w}_k\boldsymbol{\zeta}_k\mathbf{w}_k^{H}}, \tag{8}$$

where $\boldsymbol{\zeta}_k = \mathbf{D}_k\left(\sum_{k'=1}^{K} p_{k'}\mathbf{C}_{k'} + \sigma_z^2\mathbf{I}_{MN}\right)\mathbf{D}_k$ and $\mathbf{C}_k = \mathrm{diag}(\mathbf{C}_{1k}, \ldots, \mathbf{C}_{Mk})$, with $\mathbf{C}_{mk}$ denoting the correlation matrix of the estimation error of $\mathbf{g}_{mk}$, as in [18].

C. User-centric Network with Dynamic Cooperative Cluster

To illustrate failure-unaware AP selection, we use the scalable user-centric CF-mMIMO scheme described in [18] as a benchmark. In [18], the authors adopted dynamic cooperative clustering (DCC) to select the AP cluster that serves a specific user.
This approach can be summarized as follows: the user first selects its Master AP based on the strongest large-scale fading coefficient; the pilot with the least pilot contamination (dominated by (3)) observed by the Master AP is then assigned to the user; finally, given a threshold, neighboring APs whose channel gain is only slightly lower than that of the Master AP are added to the AP cluster. We assume that the main interference for user $k$ comes from a small subset of users, namely those that are served by some of the same APs as user $k$. Therefore, partial MMSE (P-MMSE) combining is used to maximize the SINR in (8) for user $k$, as demonstrated in [18], and is given by:

$$\mathbf{w}_k = p_k\,\hat{\mathbf{g}}_k^{H}\mathbf{D}_k \left[ \sum_{k' \in \mathcal{P}_k} p_{k'}\,\mathbf{D}_k\hat{\mathbf{g}}_{k'}\hat{\mathbf{g}}_{k'}^{H}\mathbf{D}_k + \mathbf{D}_k\left( \sum_{k' \in \mathcal{P}_k} p_{k'}\mathbf{C}_{k'} + \sigma_z^2\mathbf{I}_{MN} \right)\mathbf{D}_k \right]^{\dagger}, \tag{9}$$

where $\dagger$ denotes the Moore–Penrose pseudo-inverse, and the set $\mathcal{P}_k = \{k' : \mathbf{D}_k\mathbf{D}_{k'} \neq \mathbf{0}_{MN}\}$ contains every user $k'$ that is at least partially served by the same APs that serve user $k$.

IV. Failure-Aware Access Point Selection

This section presents the proposed FAAS mechanism. We equip the system with a tunable reliability model, define a combined utility metric for AP selection, and formulate both average and worst-case spectral efficiency under probabilistic failure conditions.

A. FAAS Algorithm with Stress-Level Parameter

We introduce a failure intensity parameter $\alpha \in [0, 1]$ to scale each AP's baseline failure probability $p^f_{m,0}$, reflecting conditions such as hardware faults, power outages, or fronthaul disruptions:

$$p^f_m = \alpha\, p^f_{m,0}. \tag{10}$$

Here, $\alpha = 0$ corresponds to a failure-free network, while $\alpha = 1$ represents networks operating in challenging environments under maximum stress conditions. In the failure-aware scheme, the CPU is assumed to know the failure probabilities $p^f_m$ from monitoring or predictive data.
It constructs the serving AP set $\mathcal{M}_k$ for each UE $k$ to ensure reliable service by prioritizing APs with strong and reliable channels. The selection rule is given by:

$$\mathcal{M}_k = \left\{ m \;\middle|\; \frac{\sum_{m=1}^{\tilde{N}} \tilde{\beta}_{mk}\,(1 - p^f_m)}{\sum_{m'=1}^{M} \beta_{m'k}\,(1 - p^f_{m'})} \geq \varepsilon \right\}, \tag{11}$$

where $\tilde{\beta}_{mk}$ are the large-scale fading coefficients sorted in descending order of their reliability-weighted gain (with $p^f_m$ in the numerator indexed in the same sorted order), the denominator sums over all $M$ APs, and $\tilde{N}$ is the smallest number of APs whose cumulative reliability-weighted gain meets the predefined threshold $\varepsilon$. In other words, $\mathcal{M}_k$ consists of the first $\tilde{N}$ APs in the sorted order. The resulting set $\mathcal{M}_k$ defines the structure of the combining matrix $\mathbf{D}_k$, activating only the selected APs that contribute most significantly to reliable communication. By construction, this reliability-weighted selection directly reduces the probability of user outage: APs with higher survival probability $(1 - p^f_m)$ are favored, which lowers the likelihood that all assigned APs fail simultaneously.

To avoid fragile assignments where only a single AP is selected (i.e., $\tilde{N} = 1$), which would undermine resilience if that AP fails, we impose a minimum cooperative cluster size of at least two APs per user. This guarantees that each user retains connectivity under a single-point failure. More generally, this floor on cluster size can be tuned depending on service requirements; for example, mission-critical deployments may enforce $\tilde{N} \geq 3$ for stronger redundancy.

The threshold $\varepsilon \in (0, 1)$ plays a key role in balancing reliability and cooperation size: it governs how many of the strongest and most reliable APs are selected to serve each user. In our simulations, $\varepsilon$ is fixed to a representative value (e.g., 0.9) to ensure sufficient robustness while maintaining sparsity in the user–AP association.
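The selection rule in (10) and (11), together with the minimum cluster size, can be sketched in a few lines. This is an illustrative reading of the rule rather than the authors' code: the function names are hypothetical, APs are assumed to be ranked by the reliability-weighted gain $\beta_{mk}(1 - p^f_m)$, and the example failure probabilities are deliberately exaggerated (well above the 0.01–0.1 range) so that the effect is visible in a four-AP toy case.

```python
def scale_failure_probs(p_fail_base, alpha):
    """Eq. (10): scale each baseline failure probability by the stress level alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * p for p in p_fail_base]


def select_cluster(beta_k, p_fail, eps=0.9, min_size=2):
    """Sketch of Eq. (11) for one user.

    beta_k[m]: large-scale fading coefficient between AP m and the user.
    p_fail[m]: (scaled) failure probability of AP m.
    Returns AP indices ordered by decreasing reliability-weighted gain.
    """
    weights = [b * (1.0 - p) for b, p in zip(beta_k, p_fail)]
    total = sum(weights)
    # Assumption: rank APs by beta * (1 - p_fail), largest first.
    order = sorted(range(len(weights)), key=lambda m: weights[m], reverse=True)
    cluster, cumulative = [], 0.0
    for m in order:
        cluster.append(m)
        cumulative += weights[m]
        # Compare against eps * total instead of dividing, which avoids 0/0.
        if len(cluster) >= min_size and cumulative >= eps * total:
            break
    return cluster


if __name__ == "__main__":
    beta_k = [1.0, 0.8, 0.6, 0.1]        # hypothetical channel gains
    p_base = [0.6, 0.05, 0.05, 0.05]     # hypothetical; AP 0 is strong but unreliable
    p_fail = scale_failure_probs(p_base, alpha=1.0)
    print("failure-aware   :", select_cluster(beta_k, p_fail, eps=0.7))
    print("failure-agnostic:", select_cluster(beta_k, [0.0] * 4, eps=0.7))
```

Setting all failure probabilities to zero recovers a purely channel-based ranking, which is how the failure-agnostic baseline is emulated in this sketch. In the toy case, the failure-aware call skips the strong but unreliable AP 0, while the failure-agnostic call keeps it.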
To ensure a fair comparison, this thresholding and minimum cluster-size rule are applied consistently across both the proposed FAAS and the failure-agnostic clustering method, so that differences in performance stem solely from failure awareness rather than cluster size.

B. Analytical Modeling of Failure Impact

In our FAAS framework, AP failures are modeled as independent Bernoulli events. Specifically, each AP $m \in \mathcal{M}_k$ remains active with probability $q_m = 1 - p^f_m$ and fails with probability $p^f_m$. The set of active APs serving user $k$ is therefore a random subset of $\mathcal{M}_k$, which is reflected in the realization of $\mathbf{D}_k$. The effective SINR is then given by (8). Although AP failures introduce combinatorial randomness in which subsets of APs are operational, this can be abstracted using approximate techniques. Treating each AP as a node that is independently "thinned" (retained) with probability $q_m$ leads to a useful statistical approximation: the expected number of active APs in the cluster is $\sum_{m \in \mathcal{M}_k} q_m$, which reduces to $q\,|\mathcal{M}_k|$ when all APs share a common survival probability $q$. While we omit detailed formulas here, analytic tools like binomial thinning [19], commonly used in point-process analysis of wireless networks, can yield tractable estimates for the expectation and variance of the SINR under failures.

C. Resilience Metric

To evaluate the fault tolerance of our FAAS scheme, we consider two complementary resilience metrics that jointly capture service reliability and quality under AP failure. First, we use the minimum-user spectral efficiency:

$$\mathrm{SE}_{\min} = \min_{k \in \{1, \ldots, K\}} \mathbb{E}_{\mathcal{F}}\left[\mathrm{SE}_k\right], \tag{12}$$

where the expectation is taken over all failure events $\mathcal{F}$. This metric directly measures the worst-case user throughput, ensuring that our design does not marginalize cell-edge users or those served by less reliable APs.
Alongside this, we monitor the average spectral efficiency:

$$\overline{\mathrm{SE}} = \frac{1}{K}\sum_{k=1}^{K} \mathbb{E}_{\mathcal{F}}\left[\mathrm{SE}_k\right], \tag{13}$$

but we emphasize $\mathrm{SE}_{\min}$ to ensure FAAS genuinely enhances resilience rather than merely elevating aggregate performance. While our framework evaluates spectral efficiency in expectation over failure events, alternative reliability metrics based on the distribution of user rates (e.g., the 1% outage rate or 99% quantile SE) are also relevant. We adopt expectation-based metrics for analytical tractability and comparability with existing CF-mMIMO works [18], but acknowledge that quantile-based metrics could provide a finer-grained view of resilience. Extending FAAS to explicitly optimize such distributional guarantees is an important direction for future work.

We define the user outage probability as the proportion of UEs that experience complete service disruption following AP failures, specifically when no active access points remain within their allocated cooperative cluster. In the simplest case, if AP failures are independent and a user is served by $|\mathcal{M}_k|$ APs that all have the same failure probability $p^f$, the outage probability reduces to $(p^f)^{|\mathcal{M}_k|}$; with AP-specific failure probabilities it becomes $\prod_{m \in \mathcal{M}_k} p^f_m$. While this expression captures the intuition that outage requires the joint failure of all assigned APs, in our analysis we evaluate outage more generally by averaging over random failure events across the network. This provides a tractable metric even when failure probabilities differ among APs or when reliability-weighted clustering is used. Importantly, the outage probability is not only an evaluation metric but also implicitly incorporated into the FAAS design: by weighting AP selection with the survival probability $(1 - p^f_m)$ in (11) and by enforcing a minimum cooperative cluster size, FAAS proactively reduces the likelihood of user outage.
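The per-user outage expression above can be checked numerically. The sketch below is only an illustration under the independence assumption: it compares the product of the failure probabilities over a cluster with a Monte Carlo estimate obtained by averaging over random failure events. The function names and the example probabilities are hypothetical.

```python
import math
import random


def outage_closed_form(cluster, p_fail):
    """Probability that every AP in the cluster fails (independent failures)."""
    return math.prod(p_fail[m] for m in cluster)


def outage_monte_carlo(cluster, p_fail, trials, rng):
    """Fraction of random failure realizations in which no cluster AP survives."""
    outages = 0
    for _ in range(trials):
        # Draw every AP's state explicitly, then check whether all of them failed.
        failed = [rng.random() < p_fail[m] for m in cluster]
        if all(failed):
            outages += 1
    return outages / trials


if __name__ == "__main__":
    p_fail = [0.10, 0.10, 0.02]          # hypothetical per-AP failure probabilities
    rng = random.Random(0)               # fixed seed so the run is repeatable
    for cluster in ([0, 1], [1, 2]):
        exact = outage_closed_form(cluster, p_fail)
        estimate = outage_monte_carlo(cluster, p_fail, 100_000, rng)
        print(cluster, "closed form:", exact, "Monte Carlo:", estimate)
```

Swapping one AP for a more reliable one lowers the product, which is the mechanism by which reliability-weighted selection reduces outage for a cluster of the same size.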
Thus, FAAS both reduces outage risk in the design phase and verifies the improvement through the defined metric.

Connection to Failure-Agnostic Baselines: In traditional schemes where AP selection ignores failure risks, actual AP failures yield an unpredictable and often sharp drop in minimum-user SE, as active APs are removed arbitrarily. In contrast, FAAS anticipates this thinning by selecting more reliable APs, those with higher survival probability, even if their nominal channel is slightly weaker. This preemptive resilience is absent in failure-agnostic designs. The combined utility metric makes FAAS stress-aware, dynamically adapting selections as failure conditions worsen. The minimum-user SE metric enforces user fairness and robustness.

Network Topology Considerations: The proposed algorithms in this work have been described under the assumption of a star-topology architecture, in which all APs are connected to a CPU. This abstraction simplifies analysis and reflects a commonly adopted model in the literature. The FAAS framework, however, is largely agnostic to the physical topology because the AP selection depends only on large-scale fading coefficients and failure probabilities, both of which can be obtained locally and aggregated by either a centralized CPU or distributed controllers. The user-centric nature of clustering ensures that only a small subset of neighboring APs cooperate for each user, which does not rely on full network centralization. Therefore, the same selection rule can be applied in star, hierarchical, or O-RAN style disaggregated architectures, as long as basic channel and reliability information is available.

Fig. 2. CDF of the uplink minimum user rate for cell-free massive MIMO with MMSE combining under AP failure probabilities, for (a) $M = 400$ APs with $N = 1$ antenna and (b) $M = 100$ APs with $N = 4$ antennas. Curves: all available APs, failure-aware AP selection, and failure-agnostic AP selection, each for $\alpha = 0$ (failure-free network) and $\alpha = 1$ (maximum stress). Horizontal axis: minimum spectral efficiency [bit/s/Hz]; vertical axis: CDF.

V. Numerical Analysis

To evaluate the performance of the proposed FAAS scheme, we consider a simulation setup similar to that in [18]. Specifically, we simulate a CF-mMIMO network in which $K = 100$ UEs and $M$ APs are independently and uniformly distributed over a 2 × 2 km area. Two configurations are examined: (i) $M = 400$ single-antenna APs ($N = 1$), and (ii) $M = 100$ APs each equipped with $N = 4$ antennas. Both configurations yield the same spatial antenna density of 100 antennas/km², while maintaining a consistent user density of 25 UEs/km². To mitigate edge effects and emulate an unbounded network, the wrap-around technique is applied, allowing accurate modeling of both path loss and interference. The simulation adopts the propagation model described in [18], which incorporates spatially correlated Rayleigh fading. All APs are positioned 10 meters above the UEs to enforce a realistic minimum path loss, reflecting practical urban deployment conditions.
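The text names the wrap-around technique without spelling it out. One common way to realize it, shown here as a hedged sketch rather than the authors' simulator, is to measure horizontal offsets on a torus so that an AP near one edge is also close to a UE near the opposite edge. The function names are hypothetical, and coordinates are assumed to lie in $[0, \text{side}]$ meters.

```python
import math


def wraparound_distance(ap_xy, ue_xy, side=2000.0, height=10.0):
    """3-D AP-UE distance on a square area with wrap-around.

    side: side length of the square area in meters (2 km in the setup above).
    height: vertical AP-UE offset in meters (10 m in the setup above).
    """
    dx = abs(ap_xy[0] - ue_xy[0])
    dy = abs(ap_xy[1] - ue_xy[1])
    # On a torus, the shortest horizontal offset never exceeds side / 2.
    dx = min(dx, side - dx)
    dy = min(dy, side - dy)
    return math.sqrt(dx * dx + dy * dy + height * height)


def density_per_km2(count, side_km=2.0):
    """Number of elements per square kilometer on a side_km x side_km area."""
    return count / (side_km * side_km)


if __name__ == "__main__":
    # An AP at one edge and a UE near the opposite edge are only 100 m apart horizontally.
    print(wraparound_distance((0.0, 0.0), (1900.0, 0.0)))
    print(density_per_km2(400 * 1), density_per_km2(100 * 4), density_per_km2(100))
```

The height term keeps the distance from dropping below 10 m, which is what bounds the minimum path loss in the setup described above.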

We assume a coherence block length of $\tau_c = 200$ symbols, with $\tau_p = 10$ symbols allocated to orthogonal pilot transmission. The remaining $\tau_u = 190$ symbols are reserved for uplink data transmission. Each UE transmits with a fixed power of $p_k = 100$ mW, and the total system bandwidth is set to 20 MHz. Unless otherwise stated, all performance results correspond to the uplink phase, where both spectral efficiency and user outage probability are evaluated under varying levels of AP failure intensity $\alpha$.

For the baseline failure probability $p^f_{m,0}$, we consider values in the range 0.01–0.1, consistent with reported reliability levels of commercial radio units and power systems. The stress parameter $\alpha \in [0, 1]$ then scales these baseline values to emulate different network conditions, from nominal operation to highly stressed environments.

To prevent degenerate cases where a user would be associated with only a single AP, we enforce a minimum cooperative cluster size of two APs per user in all simulations. This practical safeguard reflects realistic deployment considerations and ensures that the resilience evaluation is not biased by fragile single-AP assignments.

We compare the proposed FAAS scheme against a baseline failure-agnostic clustering approach. Fig. 2 shows the CDF of the minimum uplink spectral efficiency for the two CF-mMIMO setups and three AP selection schemes: All APs, Failure-Agnostic, and FAAS. When $\alpha = 0$, the Failure-Agnostic and FAAS schemes perform identically since all APs are functional. When $\alpha = 1$, failures occur according to the baseline AP failure probabilities. The All APs case performs best in every setting, but it is an idealized and unscalable benchmark.
The failure-agnostic scheme suffers a noticeable performance degradation, particularly in the lower tail of the CDF, because its AP selection does not account for failure likelihoods. In contrast, the proposed FAAS approach significantly outperforms the failure-agnostic strategy by selecting APs based on their reliability. This confirms the benefit of incorporating failure awareness into AP selection for improving user fairness and network robustness under realistic failure conditions.

Fig. 3. Impact of failure intensity $\alpha$ on mean spectral efficiency and user outage probability for CF-mMIMO with MMSE combining under different AP selection strategies (all available APs, failure-aware AP selection, failure-agnostic AP selection), for (a) $M = 400$ APs with $N = 1$ antenna and (b) $M = 100$ APs with $N = 4$ antennas. Horizontal axis: failure intensity $\alpha$ from 0 to 1; vertical axes: mean spectral efficiency [bit/s/Hz] and user outage probability (logarithmic scale). Note: the outage curve for "All available APs" is omitted because it remains zero for all $\alpha$ and is therefore not visible on a logarithmic scale.

Fig. 3 presents the mean spectral efficiency and user outage probability as a function of the failure intensity $\alpha$ for the two CF-mMIMO configurations. As failures become more frequent, all schemes experience performance degradation. The All APs strategy maintains the highest spectral efficiency and zero outage by assuming ideal full connectivity, though it is not scalable in practice. In contrast, the proposed FAAS approach consistently outperforms the failure-agnostic method by selecting APs based on their reliability, thereby enhancing both spectral efficiency and resilience.

FAAS significantly reduces the user outage probability, particularly under high $\alpha$, where failure-agnostic clustering suffers from increased service disruption because its AP assignment ignores failure probabilities. By proactively incorporating failure awareness into AP selection, FAAS ensures stronger user connectivity even under stressed network conditions.

When comparing the two setups, many single-antenna APs ($M = 400$, $N = 1$) versus fewer multi-antenna APs ($M = 100$, $N = 4$), the single-antenna configuration demonstrates superior resilience. It achieves lower outage probabilities and higher average spectral efficiency across the range of $\alpha$. This improvement stems from enhanced macro-diversity: UEs with weak links benefit more from the broader spatial distribution of single-antenna APs than from the local interference suppression offered by fewer, more capable APs.

VI. Conclusion

This paper proposed a failure-aware AP selection strategy for CF-mMIMO systems, aiming to enhance network resilience under realistic hardware failure conditions. By integrating channel quality and AP-specific failure probabilities through a tunable stress parameter $\alpha$, the FAAS algorithm adaptively selects reliable APs per user. Simulation results demonstrate that FAAS significantly improves minimum spectral efficiency and reduces outage probability compared to failure-agnostic clustering, especially under moderate to high failure stress.
Additionally, we observed that dense deployments of single-antenna APs offer better resilience than fewer multi-antenna APs due to increased macro-diversity. These findings underscore the importance of incorporating failure resilience as a core design aspect in CF-mMIMO networks. Future work will extend FAAS to disaggregated architectures like O-RAN, incorporating distinct failure models for different network components. FAAS could also be evaluated and optimized with respect to quantile-based performance metrics (e.g., 99% user rate guarantees), providing even stronger resilience assurances beyond expectation-based analysis.

References

[1] H. Q. Ngo, A. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, "Cell-free massive MIMO versus small cells," IEEE Transactions on Wireless Communications, vol. 16, no. 3, pp. 1834–1850, 2017.
[2] Y. Chu, M. Rahmani, J. Shackleton, D. Grace, K. Cumanan, H. Ahmadi, and A. Burr, "Testbed development: An intelligent O-RAN based cell-free MIMO network," arXiv preprint arXiv:2502.08529, 2025.
[3] H. Ahmadi, M. Rahmani, S. B. Chetty, E. E. Tsiropoulou, H. Arslan, M. Debbah, and T. Quek, "Towards sustainability in 6G and beyond: Challenges and opportunities of open RAN," arXiv preprint arXiv:2503.08353, 2025.
[4] G. Interdonato, P. Frenger, and E. G. Larsson, "Scalability aspects of cell-free massive MIMO," in ICC 2019 - 2019 IEEE International Conference on Communications (ICC). IEEE, 2019, pp. 1–6.
[5] M. R. Ghourtani, J. Zhao, Y. Chu, H. Ahmadi, D. Grace, R. G. Maunder, and A. Burr, "Link-level evaluation of uplink cell-free MIMO in 5G NR over frequency-selective channels," IEEE Open Journal of the Communications Society, 2025.
[6] M. Eskandari, M. Rahmani, and A. G. Burr, "Network slicing in O-RAN-enabled cell-free massive MIMO: A DRL-based power control," in 2025 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2025, pp. 1–7.
[7] M. Rahmani, M. Bashar, M. J. Dehghani, P. Xiao, R. Tafazolli, and M. Debbah, "Deep reinforcement learning-based power allocation in uplink cell-free massive MIMO," in 2022 IEEE Wireless Communications and Networking Conference (WCNC). IEEE, 2022, pp. 459–464.
[8] M. Rahmani, M. Bashar, M. J. Dehghani, A. Akbari, P. Xiao, R. Tafazolli, and M. Debbah, "Deep reinforcement learning-based sum rate fairness trade-off for cell-free mMIMO," IEEE Transactions on Vehicular Technology, vol. 72, no. 5, pp. 6039–6055, 2022.
[9] S. Mohammadzadeh, M. R. Ghourtani, K. Cumanan, A. Burr, and P. Xiao, "Pilot and data power control for scalable uplink cell-free massive MIMO," IEEE Open Journal of the Communications Society, vol. 6, pp. 10829–10844, 2025.
[10] L. Khaloopour, Y. Su, F. Raskob, T. Meuser, R. Bless, L. Janzen, K. Abedi, M. Andjelkovic, H. Chaari, P. Chakraborty et al., "Resilience-by-design in 6G networks: Literature review and novel enabling concepts," IEEE Access, 2024.
[11] H. Q. Ngo, G. Interdonato, E. G. Larsson, G. Caire, and J. G. Andrews, "Ultradense cell-free massive MIMO for 6G: Technical overview and open questions," Proceedings of the IEEE, 2024.
[12] A. Chowdhury and C. R. Murthy, "How resilient are cell-free massive MIMO OFDM systems to propagation delays?" in 2023 IEEE 24th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC). IEEE, 2023, pp. 581–585.
[13] J. Sadreddini, O. Haliloglu, and A. Reial, "Distributed MIMO precoding with routing constraints in segmented fronthaul," in 2023 IEEE 34th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC). IEEE, 2023, pp. 1–6.
[14] K. Weinberger, R.-J. Reifert, A. Sezgin, and E. Basar, "RIS-enhanced resilience in cell-free MIMO," in WSA & SCC 2023; 26th International ITG Workshop on Smart Antennas and 13th Conference on Systems, Communications, and Coding. VDE, 2023, pp. 1–6.
[15] A. Elkeshawy, H. Fares, and A. Nafkha, "Robust learning-based sparse recovery for device activity detection in grant-free random access cell-free massive MIMO: Enhancing resilience to impairments," arXiv preprint arXiv:2503.10280, 2025.
[16] W. Jiang and H. D. Schotten, "Nonlinear power amplifier-resilient cell-free massive MIMO: A joint optimization approach," arXiv preprint arXiv:2506.22094, 2025.
[17] E. Björnson and L. Sanguinetti, "Making cell-free massive MIMO competitive with MMSE processing and centralized implementation," IEEE Transactions on Wireless Communications, vol. 19, no. 1, pp. 77–90, 2019.
[18] E. Björnson and L. Sanguinetti, "Scalable cell-free massive MIMO systems," IEEE Transactions on Communications, vol. 68, no. 7, pp. 4247–4261, 2020.
[19] O. Kella and A. Löpker, "On binomial thinning and mixing," Indagationes Mathematicae, vol. 34, no. 5, pp. 1121–1145, 2023.
