Collaborative Safe Bayesian Optimization

Collaborati v e Safe Bayesian Optimization Alina Castell Blasco, Maxime Bouton Ericsson Resear ch, Stockholm Abstract —Mobile networks r equire safe optimization to adapt to changing conditions in trafﬁc demand and signal transmission quality , in addition to impro ving service performance metrics. With the increasing complexity of emerging mobile networks, traditional parameter tuning methods become too conservative or complex to e valuate. F or the ﬁrst time, we apply safe Bay esian op- timization to mobile networks. Moreover , we develop a new safe collaborative optimization algorithm called C O SB O, leveraging information from multiple optimization tasks in the network and considering multiple safety constraints. The resulting algorithm is capable of safely tuning the network parameter online with very few iterations. W e demonstrate that the proposed method impro ves sample efﬁciency in the early stages of the optimization process by comparing it against the S A F E O P T - M C algorithm in a mobile network scenario. Index T erms —Safe Bayesian Optimization, Gaussian Pr ocesses, Artiﬁcial Intelligence for Mobile Networks, 5G and 6G Networks. I . I N T R O D U C T I O N Modern mobile networks consist of complex systems where key parameters, such as antenna tilt or beamwidth, directly in- ﬂuence the coverage quality experienced by users. Optimizing these parameters is essential for maintaining efﬁcient network performance and adapting to changing environmental condi- tions. Howe ver , ev aluating the impact of new conﬁgurations is complex. It requires simulations or testing with real-world systems which might take days and can be challenging due to safety constraints such as maintaining minimum coverage lev els or minimum signal quality . This paper addresses the problem of safely and efﬁciently optimizing performance-related parameters in mobile network systems, since the e valuation of each ne w parameter might take up to a day to complete. The goal is to develop a method that maximizes performance subject to safety constraints through online optimization, that is, by safely exploring parameter values directly in the network. In addition, our objective is to use data from collaborativ e agents to improve the sample efﬁcienc y of current methods. W ith the increasing complexity of modern mobile networks, such as 5G and future 6G networks, traditional methods like rule-based strategies have become too conserv ativ e. Promising approaches like reinforcement learning (RL) require exten- siv e ofﬂine training data or high-quality simulators, and safe ev aluation in the netw ork is challenging. Bayesian Opti- mization (BO), and in particular safe Bayesian Optimization (safe BO), addresses some of these limitations. Unlike RL- based methods that require training, safe BO is an online This w ork has been performed with support of Sweden’ s Inno vation Agency . optimization approach. It uses probabilistic models, typically Gaussian processes (GPs), to explore the parameter space and iterativ ely update the knowledge about performance while ensuring that each new ev aluation is safe. Safe BO uses fewer e valuations than RL methods. Howe ver , it requires an initial safe set which might be small or poorly located, hence increasing the number of ev aluations needed to reach an optimal conﬁguration, or causing the algorithm to be stuck local optima. W e propose a new algorithm called Collabora- tiv e Safe Bayesian Optimization (CoSBO) that builds on the standard safe BO algorithm for multiple constraints S A F E O P T - M C [1], introducing a collaborative initialization strategy . The proposed method differs from previous methods in sev eral ways. Unlike rule-based systems, it adapts dynamically to the data. In contrast to RL-based methods, it optimizes online with no pre-training required. Although BO already improv es ef ﬁ- ciency by balancing exploration and exploitation, we further improv e sample efﬁcienc y by integrating external data into the optimization process and show that it helps discovering new optimal regions. The ﬁrst contribution of this paper is CoSBO, the new al- gorithm that takes advantage of the similarity between agents, such as antennas in similar geographical or netw ork traf ﬁc con- ditions, to enrich the initial estimation model of the network performance. By identifying the most correlated collaborator among a set of network optimization problems and transferring a subset of its data, the optimization process starts with a more informed model, without requiring additional e valuations at the initial stage. The quality and av ailability of the collaborator data are key factors to consider , since the beneﬁt may be reduced if the selected collaborator is poorly correlated. The second contribution consists of applying a safe Bayesian optimization method to the telecommunications domain for the ﬁrst time. This is done through a careful modeling of safety and performance functions for mobile networks and is ev aluated through series of simulation-based experiments. The method is shown to reach high-performing conﬁgurations with very few parameter ev aluations in the network while maintaining formal safety guarantees. In general, this work contributes to the dev elopment of adaptiv e, data-ef ﬁcient, and safety-constrained optimization techniques for modern mobile networks. The outline of this paper continues with related work and background, follo wed by a description of the problem and the CoSBO algorithm, and ﬁnishes with details on the experi- ments performed, results obtained, and concluding remarks. The code for the algorithm is a vailable at: https://github . com/EricssonResearch/collaborativ e- safe- bo, including hyper- parameters and simulated datasets used in our experiments. I I . R E L AT E D W O R K Recent studies provide RL-based approaches that optimize parameters in mobile networks in a multi-agent setting, such as antenna tilt optimization [2]. These methods require an ofﬂine training phase, using a large number of simulated data, since it is too risky and expensiv e to retriev e samples from real systems, and lack safety guarantees. In contrast, our work focuses on online optimization, aims to reduce the number of ev aluations to limit risks and computational cost of training, and considers safety guarantees. Instead of RL-based methods, we rely on Berkenkamp et al.’ s algorithm, S A F E O P T - M C [1], in the core structure of our method. This algorithm balances exploration of the safe set of conﬁgurations and exploitation of the optimal one, similar to Sui et al.’ s algorithm, S A F E O P T [3]. Maggi et al. [4] applied Bayesian optimization to radio resource management in cellular networks. Ho wever , their approach does not account for safety constraints during optimization nor collaborativ e data. In this paper , we use Berkenkamp et al.’ s approach that formalizes performance and safety as separate functions to optimize and adapt it to mobile network scenarios. Compared to C O S B O , none of the safe BO algorithms provides a collaborative framew ork. Existing multi-agent op- timization works create simulated functions [5], or optimize multiple agents in parallel [6]. By contrast, we work with real- world performance functions, and we aim to safely optimize a single agent, respectiv ely . Other works [7] [8] deﬁne a global re ward function and a global safety constraint, which is useful for planning instead of optimization. Our method uses similarities between different parameters and network areas to both speed up the optimization of the safe BO method and broaden the exploration while maintaining safety guarantees. I I I . B A C K G R O U N D In this section, we present Gaussian processes (GPs) and the standard safe Bayesian optimization method, which are the mathematical foundations of our Collaborativ e Safe Bayesian optimization algorithm in Section V. A. Gaussian Pr ocesses (GPs) Optimization problems inv olve ﬁnding the best parameter conﬁguration to achieve the maximum possible performance. Generally , this performance or objective function is unknown, so we use models that can approximate the behavior of the system based on previous observations. These approximations are called probabilistic surrogate models, they construct esti- mations ˆ f of the true objecti ve function f based on observa- tions or ev aluations of this function and quantify conﬁdence in these predictions [9]. W e use Gaussian processes (GPs) as our probabilistic surrogate model since it is suitable for complex black-box functions and can explicitly account for noisy performance measurements, with the predictiv e variance reﬂecting this uncertainty [1]. GPs represent a probability distribution over functions, i.e., the estimated function values are modeled as samples of a Gaussian distribution [10]. That is, the predicted value at any point in the function comes with an uncertainty estimate. This predicted point represents the point of the true objectiv e function, and this conﬁdence is used to decide which point to ev aluate next. GPs construct a GP prior distribution when no ev aluations have been done on the objectiv e function, this distribution is f ( x ) ∼ G P ( µ ( x ) , k ( x , x ′ )) , x , x ′ ∈ X , where µ ( x ) is the prior mean function and k ( x , x ′ ) is the covariance function or kernel between two points x , x ′ of X . In this work, the parameter space X corres ponds to domains of network conﬁguration parameters such as the electrical tilt of an antenna. The mean function represents prior knowledge about the function and the kernel controls the smoothness of the functions [9]. The kernel we use is the squared exponential (RBF), formally described in [9], eq. (15.9), due to its particular smoothness properties and common usage in the safe optimization literature. Giv en that we have a set of points x and their associated observ ations y , the next step is to predict the v alue ˆ y at point x ∗ conditioned on the previously observed values. Thus, we construct the GP posterior distrib ution , the distrib ution of possible unobserv ed values conditioned on observed values [9]. B. Safe Bayesian Optimization Safe BO algorithms aim to maximize an unknown objecti ve function while ensuring that ev ery selected parameter satisﬁes safety constraints with high conﬁdence. The optimization is restricted to a safe set of parameters that is gradually expanded as the algorithm gains more information about the behavior of the system. The algorithm starts with an initial safe set of points S 0 ⊆ X where safety constraints are satisﬁed. It then iterativ ely explores the parameter space to expand the region classiﬁed as safe while balancing exploitation to ﬁnd the optimal v alue. The point of con vergence is one reachable from the initial safe region through safe steps. Thus, if the initial set is poorly deﬁned and the exploration hyperparameter is conservati ve, the algorithm may fail to explore beyond it and will con verge to the optimal point within the safe region explored [1] [3]. I V . P RO B L E M S TA T E M E N T In this section, we present a formal deﬁnition of the constrained optimization for mobile networks and the system model. A. Constr ained Bayesian optimization Our goal is to maximize an unknown performance function f : X → R , e.g. throughput per user or signal quality of an antenna, where X ⊆ R d is the domain of input parameters such as electrical tilt. Since f is not known a priori , we can only learn about it through ev aluations at selected parameter points x ∈ X , e.g . certain tilt values, which yield observations that are noisy because signal propagation is inﬂuenced by div erse en vironmental and netw ork factors, of the form: ˆ f ( x ) = f ( x ) + ω n , ω ∼ N (0 , σ 2 n ) , (1) where ω n is a Gaussian noise with zero mean and σ 2 n noise variance. The v ariance quantiﬁes the uncertainty of the predictions. T o optimize our objectiv e function f is to obtain its maximum value. For that, we construct an accurate estimate function ˆ f , or surrogate model, based on these noisy observations. Speciﬁcally , we sequentially evaluate a ﬁnite set of parameter points X = [ x 1 , x 2 , ..., x n ] and obtain corresponding noisy observations ˆ f ( x i ) for each x i ∈ X . The surrogate model leverages these observations to approximate the true underlying function f , guiding the search for its maximum. The e valuated parameter points must satisfy a set of safety constraints (e.g., fraction of users above a signal quality lev el) deﬁned by q unknown constraint functions of the form g i ( x ) ≥ 0 , g i : X → R , where i ∈ I g = { 1 , . . . , q } are the safety function indices that belong to the set of function indices I g ⊂ I . Similarly as the objective function, the safety constraints are estimated from noisy observ ations since they are unkno wn a priori . Neither the performance function nor the safety constraints are known. Howe ver , we assume the e xistence of an initial safe set of parameters S 0 ⊆ X known to satisfy the safety conditions. This starting point is deriv ed from domain knowledge. In telecommunications applications, prior experience and system kno wledge allo w experts to identify parameter conﬁgurations that are guaranteed to be safe, for e xample, based on current deployment practices. W ith our C O S B O algorithm we pro vide an initial safe set intelligently informed from experience of dif ferent antenna parameter tuning problems. The optimization process then proceeds iteratively: at each time step t , the algorithm selects a ne w parameter point x t ∈ X to ev aluate, ensuring that x t is safe according to the current model estimates. Only parameters within the feasible region (where all safety constraints are satisﬁed) are ev aluated. Formally , the optimization problem is deﬁned as: max x ∈X f ( x ) subject to g i ( x ) ≥ 0 , i = 1 , . . . , q . (2) Nev ertheless, as mentioned in Section III-B, we can only ﬁnd the optimal point reachable from the initial safe region and learn the safety constraints up to some accuracy ϵ . Therefore, we deﬁne the reachable safe set R ϵ ( S 0 ) as the set of parameters that can be safely explored starting from the initial set S 0 , giv en the estimation accuracy ϵ , and without going through unsafe parameters. Consequently , the true optimal value is deﬁned as: f ∗ ϵ = max x ∈ R ϵ ( S 0 ) f ( x ) s. t. g i ( x ) ≥ 0 , i = 1 , . . . , q , (3) where R ϵ ( S ) := S ∪ \ i ∈I g { x ∈ X | ∃ x ′ ∈ S : g i ( x ′ ) − ϵ ≥ 0 } (4) B. System Model Having introduced the formal problem, we no w describe a real mobile network application. This scenario takes place in a region composed of multiple urban areas, where each has its o wn size, population density , geographical topology , and Fig. 1. Illustration of two mobile network scenarios where the antenna parameter being adjusted on the left is the horizontal beamwidth, and on the right is the electrical tilt. Each base station includes three cells. Both scenarios show two negati ve outcomes (non connected users) resulting from beamwidth/tilt conﬁguration and insufﬁcient coordination. mobile network infrastructure with base stations and antennas. These factors inﬂuence the performance of the antennas and are reﬂected on their performance functions which may differ or show similarities among areas with comparable characteris- tics. The optimization problem begins when a new antenna is to be deployed: its performance function is initially unknown and must be optimized to identify the best conﬁguration, while simultaneously satisfying a safety constraint that guarantees minimum cov erage for the served cell. W e consider two performance functions ( f ) corresponding to different parameters illustrated in Fig. 1, beamwidth and tilt, in addition to a coverage safety constraint ( g ). W e deﬁne the following measurements based on Reference Signal Received P ower (RSRP) since it is proportional to the antenna gain and the antenna gain is a function of tilt and beamwidth as described in [11], eq. (3). W e compute the Signal over Interfer ence and Noise Ratio (SINR) according to [11], eq. (4), and the throughput according to [11], eq. (5). The beamwidth performance f is quantiﬁed by the number of users with good coverage and low interference, the tilt performance f by the a verage user throughput and the safety constraint g by the number of users with acceptable RSRP . The respective formulas are as follows: T otalUsers = N X u =1 I  Interference u < h interference ∧ RS R P u > h RSRP  (5) Throughput = 1 N N X u =1 Throughput u (6) RSRP count = N X u =1 I  RSRP i > h RSRP  , (7) where u corresponds to users, N is the total number of users, and the thresholds h interference and h RSRP thresholds correspond to the 50th percentile of interference and 5th percentile of RSRP across users in dBs, respectively , measured in the initial state and used as constants throughout the optimization. V . T H E C O SBO M E T H O D In this section, we introduce a novel collaborative extension of the S A F E O P T - M C algorithm [1]. Our algorithm lev erages multiple agents by identifying the one most similar to the main function and transferring its data to the main agent, thereby enhancing the optimization process. W e illustrate an example run in Fig. 2, and the full procedure is described in Algorithm 1. A. Collabor ative initialization str ate gy W e propose a collaborativ e strategy to enhance the sample efﬁcienc y of S A F E O P T - M C [1] by using information from external agents, achiev ed by incorporating observations from collaborator agents into the main agent’ s GP model during initialization. The key is to identify the collaborator whose behavior most closely correlates with the main agent by calculating the similarity between the main agent and the collaborators. Howe ver , since at initialization there is no direct data a v ailable from the main agent, we propose to use an adjacent optimization domain, as described next. W e deﬁne X A as the main set of input parameters (e.g., electrical tilt) ov er which we optimize the main agent (e.g., an antenna). Then, we assume that there exists another do- main, denoted X B , over a separate set of input parameters (e.g., horizontal beamwidth). Additionally , we assume that all agents, both collaborators and the main agents, hav e estimated objectiv e functions on this domain and we hav e access to them. This assumption could easily be satisﬁed by starting a safe optimization on one parameter that we know is safer , as we demonstrate using safe BO. Alternativ ely , one could use the history of network conﬁguration changes performed on the area being optimized. Afterwards, we compute the Pearson correlation [12] be- tween the data points of the collaborators and the main agent on domain X B (e.g., horizontal beamwidth), formally: ρ ( x X B , x ′ X B ) = E [( x − µ x )( x ′ − µ x ′ )] σ x σ x ′ , (8) where x , x ′ ∈ X B represent the main agent and collaborator agents, and ( µ x , σ x ) and ( µ x ′ , σ x ′ ) correspond to their statisti- cal estimates, respectiv ely . Next, the collaborator with highest correlation is identiﬁed and, on the main input domain X A , we select the k most correlated samples, by default k = 10 , from its GP posterior distribution. These samples are then used to initialize the GP model of the main agent for optimization. Importantly , while the correlation is computed using X B , the transferred data originates from X A under the assumption that pairwise similarity across collaborators in X B reﬂects that in X A . In other words, two network areas that show strong correlation when being optimized for beamwidth will most likely be correlated for other parameters as well. This is prov en to hold for our simulation experiments in Section VI-C. B. The C O S B O algorithm Algorithm 1 starts by ﬁtting the GP models from a known safe initial parameter value x 0 (e.g., a tilt value) and its Algorithm 1 CoSBO Input: Initial safe parameter x 0 ∈ S 0 ⊆ X A , G P priors (( µ 0 , σ 0 , k ) , ( µ i , σ i , k i )) , ∀ i ∈ I g , G P posterior ˆ f X B , collaborator sets C X A , C X B 1: Initialize G P ← x 0 , ˆ f ( x 0 ) , ˆ g i ( x 0 ) , ∀ i ∈ I g 2: Identify most correlated collaborator: z c = ρ ( ˆ f X B , ˆ f C X B ) 3: Collaborator data: ( ˆ f ( x ) , ˆ g i ( x )) ∈ C X A , ∀ i ∈ I g 4: D k ← { top k points from collaborator data } 5: f or d = 1, . . . , k do 6: Let x d ∈ D k 7: Update G P ← x d , z c , ( ˆ f ( x d ) , ˆ g i ( x d )) ∈ C X A , ∀ i ∈ I g 8: end f or 9: f or t = 1, . . . do 10: Run S A F E O P T - M C 11: end f or measures ˆ f ( x 0 ) , ˆ g i ( x 0 ) (e.g., throughput per user and number of users with acceptable RSRP). Then, selects the most corre- lated collaborator using the Pearson correlation ρ ( ˆ f X B , ˆ f C X B ) between the main agent’ s GP posterior and the collaborator agents from the set C X B . After that, it selects the collabo- rator data from the main set C X A and identiﬁes the k most informativ e data points (the most correlated ones), which are stored in the set D k . Then, updates the G P model of the main agent’ s performance and safety functions with high-quality prior information. The optimization process then proceeds with the S A F E O P T - M C [1] algorithm. The visualization in Fig. 2 represents the optimization process of the horizontal beamwidth parameter of an antenna that aims to maximize the number of users with good coverage and low interference (performance function f ) while ensuring a minimum number of users with acceptable RSRP (safety constraint g ). The algorithm starts with the estimation of both the performance and safety function based on the transferred data from a collaborator (orange crosses). W ith the GP (light blue curv e) and the conﬁdence intervals (light blue areas), the algorithm is able to determine current observations and nearby parameter points as safe (green set). At each iteration, an ev aluation (red cross) of the real objectiv e function (gray curve) is done and added to the GP posterior . The selected point to ev aluate belongs to the safe set and is chosen because it either expands this safe set or maximizes the performance. The safety threshold (gray dashed line) indicates that the points below it are unsafe. In this example, the addition of collaborativ e data has created three separate safe regions, the algorithm takes advantage of this to explore all of them; by increasing the safe set of points, we improve the sample efﬁcienc y of the process. The data points obtained from the optimization process are ev aluations of the real objectiv e function, therefore, the GP models should ha ve more conﬁdence on those points than on the transferred ones from the collaborator . This is formally Fig. 2. Optimization process of the horizontal beamwidth of an antenna using the CoSBO algorithm to maximize the performance function (gray curve on the top) while satisfying the safety constraint (gray curve on the bottom). The visualization shows the progress of the estimated functions (light blue curves) at the ﬁrst, ﬁfth, and tenth iterations. The algorithm starts with collaborator data (orange crosses) that is added as observed data (black crosses). Then, based on the GP posterior and its conﬁdence intervals (light blue areas), it iterati vely selects and evaluates new data (red crosses) that are abov e the safety threshold (gray dashed line) and within the current safe set (green set). The input parameter region is truncated on the plot. deﬁned by e xtending the performance function to incorporate a context variable f : X × Z → R . Therefore, for each input x ∈ X the GP model is augmented with an associated conte xt v alue z ∈ Z and measurements are of the form ˆ f ( x , z ) [13]. On the one hand, the context value associated to the collaborator data is the correlation coefﬁcient between the main agent and the collaborator , formally , ρ ( x X B , x ′ X B ) = z c ∈ Z . On the other hand, the value associated to the measurements from the main agent is z m = 1 , the maximum Pearson correlation coefﬁcient which is 1 [12]. Hereby , the transferred data is of the form ( x , z c ) and is quantiﬁed with a higher uncertainty than the data e v aluated from the real objective function ( x , z m ) . Quantifying the context is essential to prev ent low-quality collaborators from degrading performance, since all trans- ferred samples are assigned lower conﬁdence than ev aluations obtained directly from the main agent. Consequently , if a col- laborator has misleading correlations, its inﬂuence decreases rapidly as real e valuations are accumulated. In the worst-case scenario, where none of the transferred data points contrib ute useful information for identifying the objective function, the effecti ve optimization process begins once the main agent’ s own ev aluations are incorporated. A safeguard mechanism against misleading correlations is proposed in Section VII. C. The S A F E O P T - M C algorithm This algorithm guarantees safety , and balances exploration and exploitation; that is, it iterativ ely selects points that can either expand the safe set of parameters (e.g., explore a region of the network with possibly poorer coverage) or maximize the objective function (e.g., maximize recei ved signal quality). T o ensure safety , the algorithm deﬁnes conﬁdence bounds around the G P posterior of both f and g i with its predictiv e posterior mean µ t ( x ) and variance σ 2 t ( x ) , which contain the true functions f and g i with high probability [1]: Q t ( x ) := [ µ t − 1 ( x ) ± β 1 / 2 t σ t − 1 ( x )] , (9) where the hyperparameter β controls the desired conﬁdence lev el. These conﬁdence interv als are used to compute safe regions and guide the selection of the next query point to ev aluate. A point is considered safe if its lower conﬁdence bound l ( x ) remains above the safety threshold h . The set of safe points are deﬁned as the safe region: S t = \ i ∈ I g [ x ∈ S t − 1 { x ′ ∈ X | l i t ( x ) ≥ h } , (10) which contains all points in the set S t − 1 [1]. T o trade off between exploration and exploitation, the algorithm identiﬁes two subsets of the safe set at ev ery iteration t : the set of potential maximizers (M), that could improve the estimate of the maximum, and potential expanders (E), that could e xpand the safe region. The maximizers are the safe points whose upper conﬁdence bound ( u ( x ) ) is higher than the largest lo wer conﬁdence bound ( l ( x ) ) of the performance function f . The expanders are the safe points that would produce a posterior distribution with a larger safe set if they were added to the G P of the safety functions g i . These points are the most uncertain, providing the most information and naturally lying near the boundary of the safe region [9]. These sets are originally deﬁned by Sui et al. [3]. Every time a new point is added to the GP model, the conﬁdence bounds Q , the safe set S , the maximizers set M and expanders set E are updated. This new point x t is the Fig. 3. T wo of the ﬁve simulator-generated topological maps for trafﬁc volumes 15 × 10 7 bit s − 1 . The mobile network is comprised by base stations (macro, red stars), the users (green dots), the obstacles or indoor areas (white squares), and the outdoor areas (black squares). The indoor-outdoor probability corresponds to a 0 . 5 q uantity . one with highest predictiv e uncertainty , thereby balancing safe exploration and performance improvement: x t = arg max x ∈ E t S M t max i ∈ I w n ( x , i ) , (11) w n ( x , i ) = u t ( x ) − l n ( x ) (12) This iterative process of expanding the safe set of points, exploiting promising regions, and exploring most uncertain ones, continues until no new safe points can be identiﬁed or no potential impro vement is possible. V I . E X P E R I M E N T S T o assess the performance of C O S B O we conduct ex- periments on a set of simulated mobile network scenarios and compare our algorithm against S A F E O P T - M C [1]. In particular , we aim to answer the following questions: Does C O S B O impro ve sample-efﬁcienc y? Is using data from a poorly correlated collaborator still more sample efﬁcient than no collaboration at all? A. Simulation En vir onment The experiments were carried out using a proprietary system lev el simulator that simulates realistic behavior of antennas and base stations modeling the signal propagation [14], and using 3GPP models [15] for NR urban macro scenarios. W e constructed ﬁve artiﬁcial maps to emulate the topology of a city based on a default square layout and placed sev en to twelve base stations according to a Poisson point process, providing spatial randomness in their distribution, with a minimum intersite distance. An example of these maps can be seen in Fig. 3. In addition to the topological v ariability of the different maps, we incorporated trafﬁc load variations which represent the amount of data moving across a network. W e deﬁned three load lev els: low , medium and high, correspond to a percentage of utilization of 5 − 10 % , 30 % and 60 − 80 % , and total trafﬁc volumes of 5 × 10 7 bit s − 1 , 15 × 10 7 bit s − 1 and 25 × 10 7 bit s − 1 , respectively . The trafﬁc volume is spread uniformly across uniformly distributed users in the map. All these factors inﬂuence the performance and safety functions producing unique network optimization problems for e very scenario. With 5 maps and 3 load lev els we have a total of 15 different network scenarios. The simulated data is available in our code repository . The optimization task was conducted over two parameters of the antennas: the horizontal beamwidth and the electrical tilt. These parameters were modeled using a urban macro propagation model with frequency of 2GHz using the standard reference of 3GPP TR 38.921 [15]. They inﬂuence the radia- tion pattern of the antenna and directly impact user experience metrics. The beamwidth determines the angular spread of the antenna’ s main lobe in the horizontal plane, narrower beamwidths result in poorer signal cov erage at the boundary of the cell sector (beam distortion is more likely to occur) while wider ones results in good signal coverage. The tilt controls the v ertical orientation of the antenna, narro wer tilts result in more focused coverage areas, which can increase signal strength but may cause some users to lose coverage; while wider tilts ensure no user loses coverages but also introduce interference if not properly aligned with the rest of antennas. B. Hyperpar ameter conﬁguration The hyperparameter conﬁguration we used to setup the topological map is described in T able I. In addition, we used a unique GP regression model to estimate each of the performance and safety functions that also has tunable hyperparameters including kernel function parameters, noise variance, and constants speciﬁc for the algorithms C O S B O and S A F E O P T - M C [1], all of which are described in T able II. W e tune the values of the variance and lengthscale, which controls the smoothness of the function, ho w quickly is it expected to change in terms of y values. The chosen value for the lengthscale is ℓ = 1 , as it provides a good balance of smoothness and is a standard choice in the literature. The noise variance is assigned dif ferent values for the performance and safety functions to rely more conﬁdently on safety estimates while accounting for potential variability in performance mea- surements. In the S A F E O P T - M C algorithm we use the explo- ration constant β which balances e xploration and e xploitation while preserving safety; we choose the value β = 2 since higher v alues focus on exploring the parameter space instead of exploiting it, while lower values focus on maximization and get stuck in local optima. In the C O S B O algorithm we deﬁne the safety threshold h , which determines the safe regions that can be explored. Since the safety constraint corresponds to the number of users with a signal quality level (RSRP) higher than − 130 dB (Eq. (7)), the threshold h is set to the 50th percentile of users that must have coverage. Moreov er , we scaled all the experimental scenarios since small dif ferences in the function values can greatly affect the correlation coefﬁcient and v alues outside the range of values of the GP prior distrib ution might be less likely to be observed, leading to inaccurate predictions. W e applied a Min-Max scaling to all the scenarios using scenario-speciﬁc minimum Fig. 4. Optimization method comparison for tilt and beamwidth using high-quality collaborators (on the left) and low-quality ones (on the right). Curv es correspond to the optimum obtained at each iteration of the process for a budget of 60 iterations. The shaded areas represent the 25 % - 75 % inter-quartile range of values obtained over the 15 s imulated networks times the 10 d ifferent starting points. CoSBO method is displayed in blue, SafeOpt-MC in orange, and an unsafe random exploration method in violet red. T ABLE I T O P O L O G I C A L M A P PAR A M E T E R S . Square Map Edge length 2000 m Number of bins 5000 Poisson Distribution λ m − 2 4 × 10 6 Min. intersite distance 400 m Deployment Number of users 1000 Sectors per site 3 T ABLE II G P M O D E L S A N D C O S B O PAR A M E T E R S . Kernel σ 2 ℓ f noise σ 2 g noise σ 2 β Threshold h k 0.5 1 1 × 10 − 4 1 × 10 − 5 2 0.4 10 and maximum values to normalize all function outputs to the [0 , 1] interval. This pre-processing step was possible because the functions were computed from simulations, allowing the true minimum and maximum values to be known beforehand. If those were not known, estimates based on collaborator data or domain-speciﬁc kno wledge could be used. In practice, his- torical performance management counters from a live network could be used to perform this normalization step. C. F ast online optimization in mobile networks W e measure the sample ef ﬁciency of the methods in terms of the highest average value of the objectiv e function found at each iteration in all scenarios of each input parameter , represented in each curve of Fig. 4. The experiments were repeated 10 times with dif ferent safe initial starting points randomly selected within a safe range. Our algorithm is represented in a blue curve, the baseline algorithm, S A F E O P T - M C [1], in an orange curve, and an unsafe random exploration process in violet red. The latter is a method that searches for the optimum by randomly exploring the entire parameter space. Since it does not consider a safety constraint nor a safe region of parameters, it randomly chooses any point from the entire parameter space and rapidly con ver ges tow ards the true optimal v alue. Kno wing that its conv ergence is faster than that of the safe optimization algorithms, we use it as a reference for the other two algorithms. W e separate the results by network parameter being tuned because their performance function values are too dif ferent to be plotted jointly . W e display the curves with conﬁdence intervals to represent the complete range of values, which are computed over the 15 simulated networks. In the graphic in the upper left of Fig. 4, the C O S B O (blue curve) reaches the asymptotic optimal value in fe wer iterations than the S A F E O P T - M C (orange curve); this is attributed to the addi- tional data receiv ed from the collaborator in the early stages of optimization. In particular , the optimal value of the tilt parameter is reached about sev en iterations earlier with the C O S B O algorithm than with S A F E O P T - M C , which in a liv e mobile network deployment can translate into an advantage of sev en days in the optimization process since measuring the performance function is often done using daily aggregate of network performance counters [2]. In the bottom left plot of Fig. 4, both algorithms (blue and orange curves) ov erlap no- tably with their conﬁdence intervals, meaning that they obtain similar optimal values at each iteration step, still, on average C O S B O obtains higher optimal values during the initial stage of the process. Therefore, the optimization results using the most correlated collaborator (i.e., high-quality) demonstrate that C O S B O is more sample ef ﬁcient than S A F E O P T - M C . Furthermore, our method con ver ges with fewer ev aluations compared to RL-based approaches in a similar simulation en vironment. Mendo et al. [2] (Figures 6 and 7) rely on the same simulation environment and report that their pre- trained agent requires at least 10 steps to ﬁne-tune to a new network scenario on the tilt optimization problem. Our approach achiev es it using fewer steps without the need for pre-training. A comprehensiv e comparison of BO and RL would signiﬁcantly widen the scope of this paper . Generally , GP based optimization may scale less than RL in terms of space complexity but requires orders of magnitude fewer samples. W e believe the overall trade-off strongly fav ors BO for the problem at hand. Exploring hybrid RL-BO strategies is an interesting future direction. W e further ev aluate the robustness of our collaborativ e strategy by testing it with lower -quality collaborators, i.e., the least correlated ones. The correlation coefﬁcients of these collaborators is, in av erage, a 0 . 39 f or the beamwidth pa- rameter functions and a 0 . 15 f or the tilt parameter functions across the 15 scenarios. These are signiﬁcantly low values considering that the maximum Pearson coefﬁcient is 1 a nd a coefﬁcient of 0 i ndicates no correlation [12]. In the plots on the right side of Fig. 4 the two compared algorithms perform similarly for both optimized parameters, as indicated by the considerable o verlap of their conﬁdence intervals and their con vergence to the asymptotic optimal value in similar itera- tions. These graphics show that when the chosen collaborator is not correlated at all, the algorithm does not gain information. Howe ver , the performance of our algorithm does not worsen; in fact, it achiev es optimal values comparable to S A F E O P T - M C [1]. Thus, ev en under limited collaboration, our algorithm remains as sample-efﬁcient as S A F E O P T - M C. This results from the smart use of the context variable computed with the correlation coefﬁcient of the selected collaborator discussed in Section V -B. Furthermore, our collaborativ e strategy relies on the assumption that pairwise similarity across collaborators in domain X B reﬂects that in the main domain X A . The fact that the C O S B O algorithm achiev es a higher sample efﬁcienc y with high-quality collaborators than with low-quality ones supports this assumption. V I I . C O N C L U S I O N S A N D F U T U R E W O R K W e presented a collaborative extension of the safe Bayesian optimization framew ork that enhances the sample-efﬁciency by le veraging observ ations collected from other agents de- ployed in similar , but not identical, en vironments. The method accelerates the identiﬁcation of safe and high-performing regions in the parameter space by integrating external data at early stages into the optimization process, i.e., directly into the Gaussian process model. W e demonstrated ho w to model mobile network optimization as a safe Bayesian optimization problem and how to apply S A F E O P T - M C [1]. Through real- istic simulations, we demonstrated that our proposed collab- orativ e strategy impro ves sample efﬁcienc y when using high- quality collaborators. In addition, we prove that our method achiev es comparable efﬁciency to S A F E O P T - M C even under low-quality collaborators. C O S B O is particularly ef fecti ve during the early stages of optimization and recommended in scenarios where safe e xploration is essential and the av ailable ev aluation budget is limited or complex. Finally , although our experiments rely on simulations, the proposed method could be implemented in a real network within the O-RAN architecture running as an rApp in a centralized non real time Radio Intelligent Controller (RIC) [16]. Future work includes exploring estimation-based ap- proaches to improve collaborator selection and looking into uncertainty-aware correlation metrics. Extending the telecom- munications application to multiple safety constraints which could inﬂuence the selection of collaborators is a straightfor- ward extension, as well as expanding the set of parameters to be tuned, which we refrained to do to focus the analysis on the collaborativ e aspect. Adding a safeguard approach to require a minimum Pearson correlation v alue for a collaborator to be used. Another avenue of future w ork could be to combine the collaborativ e safe exploration strategy proposed in this work to reinforcement learning based methods. R E F E R E N C E S [1] F . Berkenkamp, A. Krause, and A. P . Schoellig, “Bayesian optimization with safety constraints: Safe and automatic parameter tuning in robotics, ” Mac hine learning , v ol. 112, no. 10, pp. 3713–3747, 2023. [2] A. Mendo, J. Outes-Carnero, Y . Ng-Molina, and J. Ramiro-Moreno, “Multi-agent reinforcement learning with common policy for antenna tilt optimization, ” arXiv pr eprint arXiv:2302.12899 , 2023. [3] Y . Sui, A. Goto vos, J. Burdick, and A. Krause, “Safe exploration for optimization with gaussian processes, ” in International confer ence on machine learning , 2015. [4] L. Maggi, A. V alcarce, and J. Ho ydis, “Bayesian optimization for radio resource management: Open loop po wer control, ” IEEE J ournal on Selected Areas in Communications , vol. 39, no. 7, pp. 1858–1871, 2021. [5] J. L ¨ ubsen, C. Hespe, and A. Eichler, “T owards safe multi-task bayesian optimization, ” in 6th Annual Learning for Dynamics & Contr ol Confer ence , 2024. [6] Q. Chen, L. Jiang, H. Qin, and R. A. Kontar, “Multi-agent collaborati ve bayesian optimization via constrained gaussian processes, ” T echnometrics , pp. 1–23, 2024. [7] Z. Zhu, E. Bıyık, and D. Sadigh, “Multi-agent safe planning with gaussian processes, ” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IR OS) , 2020. [8] A. P . V inod, S. Safaoui, A. Chakrabarty, R. Quirynen, N. Y oshikawa, and S. Di Cairano, “Safe multi-agent motion planning via ﬁltered reinforcement learning, ” in 2022 International Conference on Robotics and Automation (ICRA) , 2022. [9] M. J. K ochenderfer and T . A. Wheeler, Algorithms for Optimization . The MIT Press Cambridge, 2019. [10] C. K. W illiams and C. E. Rasmussen, Gaussian pr ocesses for machine learning . MIT press Cambridge, MA, 2006, vol. 2. [11] H. Farooq, A. Imran, and M. Jaber, “Ai empowered smart user association in lte relays hetnets, ” in 2019 IEEE International Conference on Communications W orkshops (ICC W orkshops) , 2019. [12] D. Na varro and D. Foxcroft, Learning Statistics with jamovi: A T utorial for Be ginners in Statistical Analysis . Jan. 2025, I S B N : 978-1-80064-937-8. [13] A. Krause and C. S. Ong, “Contextual g aussian process bandit optimization, ” in Advances in Neural Information Pr ocessing Systems 24: 25th Annual Conference on Neural Information Pr ocessing Systems 2011. Pr oceedings of a meeting held 12-14 December 2011, Granada, Spain , 2011. [14] H. Asplund, M. Johansson, M. Lundev all, and N. Jald ´ en, “A set of propagation models for site-speciﬁc predictions, ” in 12th Eur opean Conference on Antennas and Propa gation (EuCAP 2018) , 2018. [15] 3GPP, “5g; study on international mobile telecommunications (imt) parameters for 6.425 - 7.025 ghz, 7.025 - 7.125 ghz and 10.0 - 10.5 ghz ( release 17), ” 3GPP, T ech. Rep. 3GPP TR 38.921 version 17.1.0 (2022-05)), 2022. [16] O-RAN Alliance, “O-ran architecture description, ” O-RAN Alliance, T ech. Rep. O-RAN.WG1.TS.O AD-R004-v15.00, 2025. Fig. 5. Optimization method comparison for tilt and beamwidth using high-quality collaborators and a GP model with lengthscale value 4 Curv es correspond to the optimum obtained at each iteration of the process for a budget of 60 iterations. The shaded areas represent the 25 % - 75 % inter- quartile range of v alues obtained over the 15 s imulated networks times the 10 d ifferent starting points. CoSBO method is displayed in blue, SafeOpt- MC in orange, and an unsafe random exploration method in violet red. A P P E N D I X T o further understand the behavior of both safe algorithms, C O S B O and S A F E O P T - M C [1], we conducted additional experiments by modifying a key GP hyperparameter: the kernel lengthscale ℓ , which inﬂuences the smoothness of the estimated functions ˆ f , ˆ g . W e observe the effect of a larger lengthscale value ( ℓ = 4 ) compared to the standard conﬁg- uration used previously ( ℓ = 1 ). The smoothness of the GP posterior increases with the lengthscale, which also affects the width of the conﬁdence intervals used to guide the exploration [10]. Smoother models produce narrower intervals near the observations, encouraging the selection of points farther from the known data. This leads to increased exploration, but also raises the risk of selecting unsafe points due to the wider safe region. Figure 6 presents additional results for the optimization of the horizontal beamwidth parameter , similar to those in Fig. 2 of the main text, but using the same topological map under a different trafﬁc v olume. The plot illustrates an increase in smoothness in the estimated function. Analogously to Fig. 4, in Fig. 5 we measure the sample efﬁcienc y of the methods using a GP model with lengthscale 4. The results indicate that smoother GPs better capture the tilt parameter functions (top), Fig. 6. Optimization process of the horizontal beamwidth of an antenna using the CoSBO algorithm to maximize the performance function (gray curve on the top) while satisfying the safety constraint (gray curve on the bottom). The visualization shows the progress of the estimated functions (light blue curves) at the ﬁrst, ﬁfth, and tenth iterations using a smooth GP model with lengthscale 4 The algorithm starts with collaborator data (orange crosses) that is added as observed data (black crosses). Then, based on the GP posterior and its conﬁdence intervals (light blue areas), it iteratively selects and evaluates new data (red crosses) that are abov e the safety threshold (gray dashed line) and within the current safe set (green set). which are inherently smooth, while slightly decreasing the performance on the beamwidth functions (bottom), which are highly v ariable. For the beamwidth optimization problem, we can see that our method still provides signiﬁcant impro vements in sample ef ﬁciency .

Collaborative Safe Bayesian Optimization

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment