Efficient Counting and Simulation in Content-Oblivious Rings

Jérémie Chalopin
Aix Marseille Univ, CNRS, LIS, Marseille, France

Yi-Jun Chang
National University of Singapore, Singapore

Giuseppe A. Di Luna
DIAG, Sapienza University of Rome, Italy

Haoran Zhou
National University of Singapore, Singapore

Abstract

In the content-oblivious (CO) model (proposed by Censor-Hillel et al. [6]), processes inhabit an asynchronous network and communicate only by exchanging pulses: zero-size messages that convey no information beyond their existence. A series of works has clarified the computational power of this model. In particular, it was shown that, when a leader is present and the network is 2-edge-connected, content-oblivious communication can simulate classical asynchronous message passing. Subsequent results extended this equivalence to leaderless oriented and unoriented rings, and, under non-uniform assumptions, to general 2-edge-connected networks.

While these results are decisive from a computability standpoint, they do not address efficiency. The simulator of Censor-Hillel et al. requires $O(n^3 b + n^3 \log n)$ pulses to emulate the send of a single $b$-bit message, making it impractical even on modest-size networks.

In this paper, we therefore focus on message-efficient computation in CO networks. We study the fundamental problem of counting in ring topologies, both because knowing the exact network size is a basic prerequisite for many distributed tasks and because counting immediately implies a broad class of aggregation primitives. We give an algorithm that counts using $O(n^{1.5})$ pulses in anonymous rings with a leader, and an algorithm that counts using $O(n \log^2 n)$ pulses in rings with IDs. Moreover, we show that any counting algorithm in CO requires $\Omega(n \log n)$ pulses.
Interestingly, in the course of this investigation, we design a simulator for classic message passing that enables parallel neighbor-to-neighbor communication: in one simulated round, each process can send a $b$-bit message to each of its neighbors using only $O(b)$ pulses per process. The simulator extends to general 2-edge-connected networks, after a pre-processing step that requires $O(n^8 \log n)$ pulses, where $n$ is the number of processes, thus allowing efficient simulation of asynchronous message passing in general 2-edge-connected networks.

2012 ACM Subject Classification: Theory of computation → Distributed algorithms

Keywords and phrases: Asynchronous Systems, Content-Oblivious Networks, Leader Election, Oriented Rings

Funding: This work has been partially supported by ANR project MIMETIQUE (ANR-25-CE48-4089-01), by the Ministry of Education, Singapore, under its Academic Research Fund Tier 1 (24-1323-A0001), and by the Italian MUR National Recovery and Resilience Plan funded by the European Union - NextGenerationEU through projects SERICS (PE00000014).

Contents

1 Introduction
1.1 Contributions
2 Related work
3 System model
4 Composability
5 Counting in anonymous ring with a leader using $O(n^{1.5})$ messages
6 Simulating the local model: Message exchange on rings
6.1 CCWBitSending Algorithm
6.1.1 Correctness of CCWBitsending
6.2 ComputeOR Algorithm
6.2.1 Correctness of the ComputeOR Algorithm
6.3 Correctness of the MsgExchange Algorithm
7 Message exchange on 2-edge-connected networks
8 Computing self-decomposable aggregation functions
9 Minimum finding and multiset computation
9.1 Minimum finding
9.2 Multiset computation
10 Lower bound
11 Conclusions and open problems

1 Introduction

A key aspect of distributed computing research is the design of computationally efficient primitives under different communication models. A wealth of research has studied how to create efficient algorithms, measured in terms of message complexity, giving us optimal solutions for foundational primitives such as consensus, reliable broadcast, leader election, counting, etc.

A defining aspect of the message-passing models is the amount of information that a message may carry. The usual spectrum in this regard goes from the LOCAL model, where messages have unrestricted size, to the CONGEST model, where messages are of logarithmic size, down to models where messages are of constant size.

In this paper, we focus on asynchronous systems where processes exchange messages that carry no content. This is the content-oblivious model (CO model) of Censor-Hillel et al. [6]. In this model, processes communicate over a bidirectional network, under an asynchronous scheduler, by exchanging pulses. A pulse carries no information, not even the identity of the sending process. This implies that the only information available to a process is the count of the pulses received on each local port. A content-oblivious model may arise in the case of fully defective channels, that is, when an adversary is able to corrupt the content of all messages in transit but cannot delete or inject new messages.
It is worth noting that there are other realistic physical models in which communication is based on pulses. Examples include ultraviolet communication (where light pulses are used to communicate [27]), neural systems (where electrical spikes across dendrites are used), and asynchronous circuits (where the absence of a clock signal requires pulse-based communication).

Censor-Hillel et al. [6] showed that this weak model is equivalent to asynchronous message passing when a leader is present and the network is 2-edge-connected. Further results [17, 7, 8] gave leader election algorithms for rings and, in the non-uniform case (that is, when an upper bound on the network size is known), for general 2-edge-connected networks, showing that in these cases the CO model and classic message passing are computationally equivalent.

However, computational equivalence says nothing about message efficiency. The simulator of Censor-Hillel et al., although a remarkable landmark from the point of view of computability equivalence, is highly inefficient, generating, in the worst case, $O(n^3 b + n^3 \log n)$ [6, Theorem 2] CO messages for each $b$-bit message sent by the simulated algorithm. This high number of messages makes the computation impractical even in networks of modest size.

In this paper we focus on message-efficient computations for CO networks. We focus in particular on the problem of counting the number of processes in a ring network. This problem is important per se, as the knowledge of the exact network size is a requirement for many algorithms and computations. Moreover, assuming that each process $p_i$ starts with an input $i_i \in I$, counting the number of occurrences of each input allows us to compute many functions of the processes' inputs, such as min, max, average, median, mode, and similar aggregates.
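As an illustration of the last point, the following minimal Python sketch derives several aggregates from a multiset of inputs represented as a map from value to multiplicity. The function name `aggregates_from_counts` and the sample inputs are hypothetical and chosen only for this example; the sketch says nothing about how the multiset is obtained in the CO model.

```python
from collections import Counter
from statistics import median

def aggregates_from_counts(counts):
    """Derive common aggregates from a multiset given as {value: multiplicity}."""
    n = sum(counts.values())            # number of processes
    values = sorted(counts.elements())  # expand the multiset into a sorted list
    return {
        "n": n,
        "min": values[0],
        "max": values[-1],
        "average": sum(values) / n,
        "median": median(values),
        "mode": counts.most_common(1)[0][0],
    }

# Hypothetical inputs of six processes.
inputs = [3, 1, 3, 7, 3, 1]
result = aggregates_from_counts(Counter(inputs))
```

Once every process knows the multiplicities, each of these functions is a purely local computation.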
Due to its centrality, counting has been investigated in several works [13, 12, 23], but never in the CO setting.

Quite interestingly, this study of the counting problem leads us to the design of a simulator from a local model to the CO model that is more efficient than the one of Censor-Hillel et al. and works for any 2-edge-connected network.

1.1 Contributions

We study distributed computation in oriented rings of $n$ processes communicating via content-oblivious messages. Throughout, we assume the presence of a unique leader. In rings where processes have unique IDs, this assumption is without loss of generality, as a content-oblivious leader election algorithm can be executed as a preprocessing step [7, 8, 17].

Self-decomposable aggregation. Our primary algorithmic goal is to compute global aggregation functions over data distributed across the ring. We focus on the class of self-decomposable aggregation functions, which admit a natural divide-and-combine structure. Formally, an aggregation function $f$ defined over multisets is self-decomposable if there exists an associative operator $\oplus$ such that, for any disjoint multisets $X_1$ and $X_2$,

$$f(X_1 \uplus X_2) = f(X_1) \oplus f(X_2),$$

where $\uplus$ denotes the multiset union. This class captures many fundamental distributed tasks, including counting, summation, maxima, parity, and related semigroup-based computations. Self-decomposability enables hierarchical aggregation: partial results computed on disjoint subsets of processes can be merged.

Algorithm simulation for content-oblivious rings. Our first technical contribution is an efficient algorithm simulator for anonymous content-oblivious rings. In particular, it allows the simulation of a synchronous round of the CONGEST[$b$] model, where each process sends a $b$-bit message to each of its neighbors, using only $O(b)$ pulses per process.
In this sense, the simulation is asymptotically message-optimal. The simulator supports the emulation of a logical ring in which only a subset of processes actively participate in the computation, while the remaining processes act solely as relays that forward information. This flexibility is crucial in our algorithms.

Efficient aggregation and counting with unique IDs. Using this simulator, we obtain an efficient deterministic aggregation algorithm by building a hierarchical decomposition of the ring. This decomposition is computed by simulating a deterministic maximal independent set (MIS) algorithm in the LOCAL model, which inherently requires unique IDs. The approach yields our main aggregation result.

▶ Theorem 1 (Efficient aggregation). Let $f$ be a self-decomposable aggregation function defined on multisets over a universe $U$. Consider a content-oblivious ring $C$ of $n$ processes with a designated leader, where each process $p$ is equipped with a distinct identifier of at most $\lambda$ bits and holds an element $x_p \in U$. Assume that for every non-empty subset $S$ of processes, the bit length of the value $f\left(\biguplus_{p \in S} \{x_p\}\right)$ is at most $\beta$. Then there exists a deterministic algorithm that quiescently terminates and computes $f\left(\biguplus_{p \in C} \{x_p\}\right)$ at all processes using $O((\lambda + \beta)\, n \log n)$ pulses.

As a direct corollary, we obtain an efficient deterministic counting algorithm.

▶ Corollary 2 (Efficient counting). In a content-oblivious ring of $n$ processes with distinct $O(\log n)$-bit IDs and a designated leader, there exists a deterministic, quiescently terminating algorithm that computes $n$ at all processes using $O(n \log^2 n)$ pulses.

Proof. We apply Theorem 1 with $\lambda = O(\log n)$, aggregation function $f(X) = |X|$, universe $U = \{1\}$, and $x_v = 1$ for all $v \in C$.
Then $f\left(\biguplus_{v \in C} \{x_v\}\right) = n$, and since $\beta = O(\log n)$, the total number of pulses is $O(n \log n\, (\lambda + \beta)) = O(n \log^2 n)$. ◀

We complement our upper bound with an $\Omega(n \log n)$-message lower bound. This result shows that our $O(n \log^2 n)$-message counting algorithm is optimal up to a multiplicative $O(\log n)$ factor.

▶ Theorem 3 (Lower bound). In oriented rings with content-oblivious messages, where processes have unique IDs and there is a pre-selected leader, any algorithm that correctly solves counting when processes do not know $n$ must have an execution with message complexity $\Omega(n \log n)$.

Counting with anonymity. We next turn to a strictly weaker model: an oriented ring with a unique leader¹, where all processes are anonymous. In this setting, deterministic local symmetry-breaking techniques, including the simulation of LOCAL algorithms that underpin Theorem 1, are no longer available.

We show that counting remains possible in this anonymous model, and that it simultaneously enables symmetry breaking. Specifically, our algorithm assigns to each process its clockwise distance from the leader, thereby inducing a unique identifier for every process. As a result, the algorithm not only counts the number of processes but also transforms an anonymous ring with a leader into an ID-equipped ring. This transformation serves as a bootstrapping step that enables the deterministic aggregation and MIS-based algorithms developed earlier.

▶ Theorem 4 (Counting with anonymity). There is a deterministic, quiescently terminating algorithm that counts the size of an anonymous ring with $O(n^{1.5})$ message complexity, provided that a unique leader exists. Moreover, each anonymous process on the ring outputs the clockwise distance from the unique leader to itself.

Finding the minimum and computing the multiset of the input.
When each process $p$ starts with an input $x_p \in \mathbb{N}$, we also consider two notable aggregation problems: finding the minimum input $x_{\min}$ and computing the multiset of inputs, i.e., the set of distinct inputs together with their multiplicities. For these two problems, we develop ad hoc techniques that improve on our bound of Theorem 1. Specifically, we show that:

▶ Theorem 5. In an anonymous ring of $n$ processes with a designated leader where each process has an input $x_p \in \mathbb{N}$, there exists a deterministic, quiescently terminating algorithm that computes $x_{\min} = \min_p \{x_p\}$ using $O(n\, |x_{\min}|)$ pulses.

▶ Theorem 6. In a ring of $n$ processes, with distinct $O(\log n)$-bit IDs and a designated leader, where each process has an input $x_p \in \mathbb{N}$, there exists a deterministic, quiescently terminating algorithm that computes the multiset $\{\{x_p\}\}$ of inputs using $O(nD(B + \log^2 n))$ pulses, where $B = \max_p \{|x_p|\}$ and $D$ is the number of different inputs.

A summary of our results pertaining to counting and related problems is in Table 1. All the algorithms that we present are composable, that is, they can be executed back-to-back.

¹ We stress that the presence of a predetermined leader is necessary in this setting due to the ring symmetry [2].

Algorithm simulation for general 2-edge-connected networks. Finally, we show that our simulator for content-oblivious rings extends to arbitrary synchronous message-passing algorithms on general 2-edge-connected networks. The main insight is to exploit a classical structural result from graph theory, which guarantees that the edges of any 2-edge-connected graph can be covered by three even-degree subgraphs, each of whose connected components admits an Eulerian cycle [1].
After a preprocessing phase that computes such a decomposition, we apply our ring simulator along these Eulerian cycles, thereby enabling the simulation of message-passing communication on the original network. As in the ring setting, the resulting simulator is asymptotically optimal: simulating a single round of the CONGEST[$b$] model incurs only $O(b)$ pulses of communication per edge.

▶ Theorem 7 (Algorithm simulation for 2-edge-connected networks). Let $A$ be an algorithm in a message-passing, synchronous, and anonymous 2-edge-connected network $G = (V, E)$, where each process can send $b$ bits per round on each edge. There is a quiescently terminating content-oblivious algorithm on a ring with a unique leader that simulates $A$, with the following specifications. There is a pre-processing step with $O(n^8 \log n)$ pulses, where $n$ is the number of processes in $G$. After that, simulating a round of CONGEST[$b$] requires sending $O(b)$ pulses along each edge, i.e., with a constant multiplicative overhead.

Table 1 Summary of the results for the Counting problem and the Input multiset problem presented in the paper. $n$ is the number of processes in the system, $B$ is the maximum bit length of an input, and $D$ is the number of distinct initial inputs.

Section   Problem           Leader            IDs   Message complexity
§5        Counting          Yes (necessary)   No    $O(n^{1.5})$
§8        Counting          Yes               Yes   $O(n \log^2 n)$
§9.2      Inputs multiset   Yes               Yes   $O(nD(B + \log^2 n))$
§10       Counting          Yes               Yes   $\Omega(n \log n)$

2 Related work

The investigation of CO models began with Censor-Hillel et al. [6], who showed that the presence of a leader and a 2-edge-connected topology suffices to simulate message-passing algorithms. Their simulator includes an initialization step requiring $O(n^8 \log n)$ pulses, and simulating the transmission of a $b$-bit message needs $O(n^3 b + n^3 \log n)$ pulses.
Other work on the CO model has mainly focused on the Leader Election (LE) problem [17, 8, 7]. Frei et al. [17] were the first to investigate LE in rings, giving a terminating LE algorithm for oriented rings that uses $O(n\, \mathrm{ID}_{\max})$ pulses, where $\mathrm{ID}_{\max}$ denotes the maximum identifier in the network, as well as a stabilizing algorithm for unoriented rings. This result was extended by Chalopin et al. [8], who presented a terminating LE algorithm for unoriented rings using $O(n\, \mathrm{ID}_{\max})$ pulses and, assuming a known upper bound $N$ on the network size, an LE algorithm for general 2-edge-connected networks using $O(m\, N\, \mathrm{ID}_{\min})$ pulses. The efficiency of LE in non-uniform oriented rings was studied by Chalopin et al. [7], who gave an algorithm requiring $O(N \log \mathrm{ID}_{\min})$ pulses.

Censor-Hillel et al. [6] established that no nontrivial task in the CO model can be solved in networks that are not 2-edge-connected. Recently, Chang, Chen, and Zhou [9] showed that certain meaningful tasks remain solvable in such networks, without contradicting this impossibility result. In particular, they proved that, given that the network topology is known to all processes, terminating leader election in trees is possible if and only if the tree is not edge-symmetric.

The pursuit of message efficiency has been a longstanding, overarching research topic in distributed computing. Here we limit ourselves to briefly describing the message efficiency concepts in the LOCAL and CONGEST models in order to highlight the radical differences with our environment. In the LOCAL model, the relationship between round complexity and message complexity has been investigated in [3, 15], leading to the conclusion that it is always possible to design algorithms that are simultaneously round-optimal and message-optimal, up to polylogarithmic factors.
In the CONGEST model, a line of work has developed message-complexity bounds for classic global graph problems, often yielding tight (or near-tight) results. This includes MST [16, 18, 21, 24] and broader families of optimization problems [14]. In addition to these problem-specific bounds, several papers focus explicitly on the relationship between round and message trade-offs [19, 20]. The techniques developed in the aforementioned work do not apply to our severely restricted model.

A closely related strand of work investigates the beeping communication model [4, 5, 11, 26]. In this abstraction, computation proceeds in discrete rounds and, at each round, a process chooses between two actions: it either broadcasts an undifferentiated "beep" to its neighborhood or it stays silent and listens. A key feature is that the framework assumes global synchrony. Our model lacks this kind of shared temporal reference, so such timing-based encodings are unavailable. For this reason, results in that setting do not extend to our context.

3 System model

We consider a distributed system of $n$ processes $P = \{p_0, p_1, \ldots, p_{n-1}\}$ that communicate exclusively by message passing. The processes are arranged either in a ring $C$ or in a general 2-edge-connected graph $G = (V, E)$. In a ring $C$, each process has exactly two communication ports, labeled 0 and 1. In a general 2-edge-connected graph $G$, each process corresponds to a vertex in $V$ and has one communication port for each incident edge. Each port connects to exactly one neighboring process and supports bidirectional communication.

The cost of an algorithm is measured by its message complexity, defined as the total number of messages sent by all processes during an execution.

IDs and Anonymity. We assume that each process is equipped with a distinct identifier drawn arbitrarily from $\mathbb{N}$.
Formally, for every $j \in [0, n-1]$, let $\mathrm{ID}_j$ denote the identifier of process $p_j$. We assume that there exists a distinguished leader $p_\ell$. In some sections we assume an anonymous system with a distinguished leader, i.e., all processes except the leader start in the same local state.

Asynchronous System. The underlying system is asynchronous. Messages may experience arbitrary but finite delays, with no known upper bound, and no message is ever dropped. If multiple messages are delivered to a process at the same instant, they are placed into a local buffer from which the process may retrieve them at arbitrary times.

The time needed for a local computation at a process is also arbitrary but bounded, and there is no global clock or shared notion of time. In our correctness and complexity arguments, we adopt the standard abstraction that all non-blocking local actions (i.e., all actions except receiving a message) take zero time.

Content-Oblivious Algorithms. We focus on content-oblivious algorithms, in which the payload of messages is irrelevant: the only information conveyed is the bare fact that a message was sent, and a process can only see the ports used to receive the message. One can think of each message as an empty string, or equivalently as a pulse [17].

This abstraction captures an adversarial environment that may arbitrarily corrupt message contents during transmission. However, the adversary is not allowed to remove messages from the network or to insert extra messages. All processes are fault-free and never crash.

Since messages carry no payload, reordering pulses on a link cannot affect the behavior of a content-oblivious algorithm. Hence we may assume FIFO links without loss of generality.

Oriented Rings. Throughout the paper, we work with oriented rings.
This means that all processes agree on a common direction of clockwise (CW) versus counter-clockwise (CCW) around the ring. Concretely, we consider a ring $(p_0, p_1, \ldots, p_{n-1})$ and assume that, for every $j \in [0, n-1]$: at process $p_j$, port 1 is connected to $p_{j+1}$, and port 0 is connected to $p_{j-1}$, where indices are taken modulo $n$. A message is said to travel clockwise if it is sent on port 1 and received on port 0 (i.e., it goes from $p_i$ to $p_{i+1}$). Symmetrically, a message travels counter-clockwise if it is sent on port 0 and received on port 1.

The assumption of an oriented ring is not restrictive in our model, as the leader can orient the entire ring with a trivial pre-processing step in which a single pulse is sent across the ring.

Quiescent Termination. A configuration consists of the local states of all processes together with the multiset of messages currently in transit on communication links. A configuration is quiescent if all processes are in a halting state and no messages are in transit. An execution is quiescently terminating if it reaches a quiescent configuration after finitely many steps, and an algorithm is quiescently terminating if every execution from an initial configuration has this property. A process quiescently terminates if, once it enters the halting state, it neither sends nor receives any further messages.

The LOCAL and CONGEST models. In this work, we are interested in simulating classical synchronous message-passing models within an asynchronous content-oblivious framework. In the LOCAL model [22], computation proceeds in synchronous rounds, and in each round every process may send a message of unbounded size along each of its incident edges. When communication is restricted to messages of at most $b$ bits per edge per round, the model is referred to as the CONGEST[$b$] model [25].
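Returning to the port convention for oriented rings stated earlier in this section, the wiring can be sketched in a few lines of Python. The helper names `neighbor`, `receiving_port`, and `travel` are hypothetical and serve only this illustration; processes are represented by their indices modulo $n$.

```python
def neighbor(j, port, n):
    """Index of the process reached from p_j through `port` on an oriented ring of size n."""
    if port not in (0, 1):
        raise ValueError("a ring process has only ports 0 and 1")
    # Port 1 leads clockwise to p_{j+1}; port 0 leads counter-clockwise to p_{j-1}.
    return (j + 1) % n if port == 1 else (j - 1) % n

def receiving_port(sending_port):
    """A pulse sent on port 1 arrives on port 0 of the neighbor, and vice versa."""
    return 1 - sending_port

def travel(start, port, hops, n):
    """Index of the process reached after forwarding a pulse `hops` times in one direction."""
    j = start
    for _ in range(hops):
        j = neighbor(j, port, n)
    return j
```

For instance, `travel(2, 1, 5, 5)` follows a clockwise pulse for a full traversal of a ring of size 5 and ends at the process where it started.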
4 Composability

We will often invoke multiple algorithms back-to-back (e.g., run algorithm $A$ and, as soon as it returns, start algorithm $B$). Since all pulses are indistinguishable, some care is needed in an asynchronous system: different processes may return from $A$ at different times, and a process still executing $A$ must not confuse pulses from $B$ with pulses from $A$. We handle this using a simple sufficient condition, called the composable ending property:

▶ Definition 8 (Composable ending property). Consider a CO algorithm $A$ executed by all processes of an oriented ring with FIFO links. We say that $A$ has the composable ending property if in every execution:
- $A$ is quiescently terminating.
- There exists a unique terminator process that returns last from $A$.
- There exists a direction $d \in \{0, 1\}$ such that $A$ ends with a termination wave initiated by the terminator on port $d$.
- From the moment the terminator sends this pulse until it terminates the execution of $A$, it never waits to receive a pulse on port $d$.
- The terminator returns immediately after receiving the termination wave back on port $1 - d$.
- Every other process returns from $A$ immediately after receiving (on port $1 - d$) the termination-wave pulse and forwarding exactly one pulse on port $d$.

▶ Theorem 9. If algorithm $A$ has the composable ending property, then it can be composed with any other algorithm $B$.

Proof. Fix any execution of the composed algorithm (run $A$, then run $B$). By the quiescent termination condition, once a process starts $B$ it will never again receive any pulse of $A$, so $A$-pulses cannot be mistaken for $B$-pulses. It remains to argue that a process still executing $A$ cannot consume a pulse sent by a neighbor that has already started $B$.

Let $d$ be the direction given by Definition 8. Consider any process $p$ different from the terminator that is still in $A$.
Let $p^-$ be the predecessor of $p$ along the termination-wave direction, so that pulses sent by $p^-$ on port $d$ are received by $p$ on port $1 - d$. Process $p^-$ starts $B$ only after returning from $A$, hence only after it has sent the termination-wave pulse to $p$ on this link. By FIFO, $p$ receives that termination-wave pulse before any later pulse on the same link, and after receiving it, $p$ returns from $A$. Therefore, $p$ cannot consume any $B$-pulse while still executing $A$.

Finally, consider the terminator $t$. The first process to return from $A$ is the neighbor reached from $t$ by following port $d$, and it may start $B$ long before $t$ starts $B$. Any pulse it sends to $t$ is delivered to $t$ on port $d$. By Definition 8, after initiating the termination wave the terminator does not receive on port $d$, so these pulses cannot be consumed during $A$ and are buffered until $t$ starts $B$. ◀

5 Counting in anonymous ring with a leader using $O(n^{1.5})$ messages

In this section, we describe an algorithm that counts the size of an anonymous ring, provided that a leader process exists. Additionally, each process eventually learns its position on the ring: formally, on a ring of size $n$, each process $p$ eventually outputs $n$, the ring size, and an integer $d \in [0, n]$ equal to the clockwise distance from the unique leader to $p$.

A naive $O(n^2)$ algorithm. One may immediately come up with the following $O(n^2)$-pulse algorithm, following an "explore-bounce back" motif. The leader process sends one pulse clockwise at initiation, and again whenever it receives a counter-clockwise pulse. A non-leader process reflects the first clockwise pulse it receives (it absorbs the clockwise pulse and sends back a counter-clockwise pulse), and behaves as a repeater thereafter. The first pulse that completes a full clockwise traversal indicates that the whole ring has been counted, and the ring size equals the number of counter-clockwise pulses the leader has received so far.
The leader now triggers a counter-clockwise pulse traversal to terminate the counting. The leader can then utilize a simple procedure to broadcast the size count (see Algorithms 3 and 4).

However, it becomes significantly more expensive to explore new processes in the later stages of the algorithm. On average, an "explore-bounce back" iteration travels a distance of $\Theta(n)$, hence resulting in the $O(n^2)$ complexity. Our objective in the following presentation is to significantly reduce the maximum distance traveled by an "explore-bounce back" iteration, by breaking the algorithm into phases.

▶ Theorem 4 (Counting with anonymity). There is a deterministic, quiescently terminating algorithm that counts the size of an anonymous ring with $O(n^{1.5})$ message complexity, provided that a unique leader exists. Moreover, each anonymous process on the ring outputs the clockwise distance from the unique leader to itself.

The pseudocode procedures for non-leader and leader processes are presented in Algorithms 1 and 2, while Algorithms 3 and 4 are invoked to communicate the ring size after counting is finished. Upon initiation, the leader process invokes Leader(phase = 1), while non-leader processes invoke NonLeader(phase = 1, counted = false, relayed = 0).

Technical overview. The algorithm progresses in phases, starting from phase 1. In phase $i$, a temporary leader probes $i$ consecutive processes in its clockwise neighborhood and transfers temporary leadership to the process at clockwise distance $i$ from it, i.e., the last process probed in this phase. The transfer of leadership is performed over the counter-clockwise channel, which distinguishes it from a probing pulse and makes it visible to all processes on the ring; therefore, all processes progress synchronously in rounds.
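The benefit of phases can be checked with a rough pulse count. The Python sketch below is only a back-of-the-envelope estimate under stated assumptions, not the paper's analysis: a probe reflected by the $j$-th clockwise neighbor costs $2j$ pulses ($j$ hops out, $j$ hops back), a leadership transfer costs two full traversals ($2n$ pulses), and the pulses spent on detecting completion, terminating, and broadcasting the result are ignored except for the single final traversal in the naive scheme. The function names are hypothetical.

```python
def naive_pulses(n):
    """Estimated pulses of the naive explore-bounce-back scheme on a ring of size n >= 2."""
    probes = sum(2 * j for j in range(1, n))  # n - 1 non-leaders, probe j costs 2j
    final_traversal = n                       # the last probe circles the whole ring
    return probes + final_traversal

def phased_pulses(n):
    """Estimated pulses of the phased scheme (assumption: completion and broadcast ignored)."""
    total, remaining, i = 0, n - 1, 1
    while remaining > 0:
        probed = min(i, remaining)       # phase i probes up to i new processes
        total += probed * (probed + 1)   # sum of 2j for j = 1..probed
        remaining -= probed
        if remaining > 0:
            total += 2 * n               # transfer pulse + confirmation pulse
        i += 1
    return total
```

Under these assumptions, `naive_pulses(100)` evaluates to 10000 while `phased_pulses(100)` evaluates to 3582. Roughly $\sqrt{2n}$ phases are needed, each costing $O(n)$, which is consistent with the $O(n^{1.5})$ bound of Theorem 4.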
Probing clockwise neighbors in phase $i$. To probe $i$ consecutive clockwise neighbors, a temporary leader $r$ sends and receives $i$ probing pulses (Lines 3-6 of Algorithm 2) on its port 1. A process yet to be probed is always executing the non-leader algorithm (Algorithm 1) and has variable counted set to false. When a process in the $i$-clockwise neighborhood of $r$ receives a probing pulse from port 0, it reflects the first pulse and sets counted to true to mark itself as already probed (Lines 3-5 of Algorithm 1). The newly probed process writes the current phase number to its global variable myPhase; thereafter, it acts as a repeater for any future incoming pulse from port 0, plus the next immediate pulse from port 1 (Lines 7-11 of Algorithm 1), while incrementing its variable relayed whenever it relays a round trip. Therefore, the $j$-th clockwise pulse sent by the temporary leader $r$ is reflected by its $j$-th clockwise neighbor before returning to $r$. Once $r$ has received $i$ pulses, all processes in $r$'s $i$-clockwise neighborhood have counted = true, among which the $i$-th clockwise neighbor is the only process with relayed = 0. The relayed variable therefore captures the number of processes probed in the same phase as, but after, the variable holder.

Transferring leadership. The temporary leader $r$ is now ready to transfer leadership to its $i$-th clockwise neighbor, which we alias as $r'$. First, $r$ sends a pulse on port 0 and receives a pulse on port 1 (Lines 12-13 of Algorithm 2). This counter-clockwise pulse traversal, which we call the transfer pulse, triggers Lines 12-13 of Algorithm 1 at all other processes on the ring. In particular, $r'$ is the only process that enters the branch of Line 14 in Algorithm 1, while all other non-leader processes enter Line 19 of Algorithm 1. At this moment, $r'$ is ready to become the new leader.
The new leader $r'$ sends a pulse on port 1 and receives a pulse on port 0. This clockwise pulse traversal, which we call the confirmation pulse, makes all other non-leader processes enter Line 21; therefore, all proceed to the next phase. The confirmation pulse also travels through the previous leader (Lines 14-15 of Algorithm 2), which makes the old leader also proceed to the next phase as a non-leader.

Detecting the completion of probing all processes. Define the original leader (before any leadership transfer) as $r_1$ and, similarly, the $i$-th temporary leader as $r_i$. Assume now that all processes have been probed (counted = true), and the temporary leader $r_i$ is about to probe $r_1$ again. This very probing pulse of $r_i$, unlike any previous probing pulse, travels around the ring in the clockwise direction before arriving at port 0 of $r_i$. To see this, notice that every process except $r_i$ triggers Lines 7-9 of Algorithm 1, as counted = true. Hence $r_i$ learns that all processes on the ring have been probed and performs Lines 8-9 of Algorithm 2, sending 3 pulses that traverse the ring in the counter-clockwise direction. The first counter-clockwise traversal invokes Lines 10-11 of Algorithm 1 at all other processes, the second traversal invokes Lines 12-13 and 19 of Algorithm 1, and the third traversal invokes Lines 25-26. All processes except $r_i$ are thus informed of the completion of counting and enter RcvCount() to receive the counting result.

Broadcasting the counting result. As all other processes are aware of the current phase $i$, the last leader $r_i$ only needs to communicate count, encoded in binary, for all processes to learn the ring size and their respective clockwise distance to the leader. Note that for the communication of count to terminate, $r_i$ needs to send a dedicated $\bot$ symbol at the end of its transmission.
Therefore, $r_i$ can use the orientation of 2 consecutive traversals, which has up to 4 possibilities, to encode an alphabet of size 3 ($\{0, 1, \bot\}$).

Ring size. Each process calculates the ring size as $n = \frac{i(i-1)}{2} + count + 1$. The $i-1$ complete phases count $\frac{i(i-1)}{2}$ processes, plus count-many successfully reflected probing pulses in phase $i$, plus one for the leader of the $i$-th phase, $r_i$.

Distance from leader. First, notice that for all processes except the last leader, $(relayed - 1)$-many (for the last leader, relayed-many) other processes are counted after itself in the same round. The minus 1 is necessary due to the counting-completion test: the last leader introduced an additional clockwise + counter-clockwise traversal (Lines 7-8 of Algorithm 2), which resulted in an increment of relayed for all processes except the last leader. Now, define $k = relayed$ for the last leader, and $k = relayed - 1$ for every other process. Depending on the myPhase variable of the process, there are two cases:

If $myPhase < i$: $d = \frac{myPhase\,(myPhase+1)}{2} - k - 1$. This is because the phases $1, 2, \ldots, myPhase$ are known to have completed. In particular, there are $k$-many processes that are counted after the process of concern in that same iteration, which do not contribute to the clockwise distance starting from the leader. The minus 1 is due to the fact that the leader itself was included in the counting as well.

If $myPhase = i$: $d = n - k - 1$. In this case, the last phase, myPhase, was not completed. We directly subtract from the ring size the number of processes probed in the last phase after the process of concern.

We dedicate the next lemma to establishing that the initial condition of every phase $i \le \lceil n^{0.5} \rceil$ is identical to the ending condition of its previous phase. The main purpose is to show inductively that every leadership transfer is successful.

▶ Lemma 10.
Consider a ring of processes performing the $O(n^{1.5})$ counting algorithm. Assume that at some moment, all processes other than $r_i = p_0$ have started to perform NonLeader(phase = $i$, counted, relayed), and $r_i = p_0$ has started to perform Leader(phase = $i$). Let $(p_0, p_1, \ldots, p_i)$ be a clockwise segment of processes that have counted = false. After a finite delay, the following statements become true.
(1) All processes except $p_0$ and $p_i$ call NonLeader(phase = $i+1$, counted, relayed), and their counted and relayed are not changed.
(2) Process $p_0$ calls NonLeader(phase = $i+1$, counted = true, relayed = 0).
(3) Process $p_i$ calls Leader(phase = $i+1$) after (1) and (2) become true.
(4) Processes $p_1, \ldots, p_i$ have counted = true and myPhase = $i$.
(5) Each process $p_j$, $j \le i$, has relayed = $i - j$.

Proof. We prove the lemma by induction, starting from $i = 1$. Recall that phase $i$ has two components: first, the current leader probes $i$ clockwise neighbors, and then it transfers leadership to $p_i$, the $i$-th clockwise neighbor.

We prove that (4) and (5) are immediately true for every $i$. Given that the $i$ clockwise neighbors of $p_0$ have counted = false initially, they reflect the first probe pulse and relay the later probing pulses, as described in Lines 3-11 of Algorithm 1. Observe that the probing is controlled entirely by $p_0$ and is local to the clockwise segment $(p_0, p_1, \ldots, p_i)$. (4) and (5) are immediate corollaries of the description of Lines 3-11 of Algorithm 1.

Now we prove (1), (2), and (3), first for $i = 1$. In this case, every process other than $p_0$ and $p_1$ has counted = false, relayed = 0, and myPhase undefined. Process $p_0$ transfers leadership to $p_i$ by sending a counter-clockwise traversing transfer pulse.
This pulse is relayed by all processes on the ring by running Lines 12-13 of Algorithm 1. The only process that triggers Line 14 of Algorithm 1 and becomes a new leader is $p_1$. It sends a clockwise traversing confirmation pulse, which eventually arrives back at itself. Note that all other processes on the ring see the clockwise traversing confirmation pulse only after the counter-clockwise traversing transfer pulse. At this point, $p_1$ initiates Leader(phase = $i+1$). We therefore conclude the proof of (1), (2), and (3) when $i = 1$.

Now let the lemma hold for $i = k$. For $i = k+1$, notice that we only need to argue that no process other than $p_i$ can become the new leader, that is, $p_i$ is the only process that can trigger Line 14 of Algorithm 1. Processes that do not belong to the clockwise segment $(p_0, p_1, \ldots, p_i)$ either have myPhase undefined, if they are not probed yet, or myPhase $< i$, if they were probed in an earlier phase. ◀

We now prove a lemma stating that every phase $i$, except for the last, incomplete counting phase, is composable: effectively, every process can correctly label a pulse it receives with the phase $i$ in which that pulse was produced.

▶ Lemma 11. For $i \le \lceil n^{0.5} \rceil$, the first $i$ phases can be composed.

Proof. We show that pulses belonging to consecutive phases are temporally separated. By the end of the $i$-th phase, the following condition holds: all processes other than $p_i$ (the last process probed in phase $i$) have started to perform NonLeader(phase = $i$, counted, relayed), which is guaranteed by applying clauses (1) and (2) to the $i$-th iteration; and $p_i$ has started to perform Leader(phase = $i$), which is guaranteed by applying clause (3) to the $i$-th iteration. If $p_i$ still has $i+1$ consecutive clockwise neighbors with counted = false, the induction hypothesis for phase $i+1$ is therefore met.
We argue that phase $i+1$ can be correctly concatenated after the execution up to phase $i$: by clause (3), the last probed process is the last to quiescently terminate during phase $i$, and it is also the process that initiates the communication of phase $i+1$. Therefore, pulses belonging to phases $i$ and $i+1$ are temporally separated. ◀

We now prove that the last counting phase has the composable ending property, which facilitates the concatenation of the ring size broadcast and reception afterwards.

▶ Lemma 12. Phase $i = \lceil n^{0.5} \rceil$ has the composable ending property.

Proof. This follows directly from Definition 8. The last counting phase is quiescently terminating: every process other than the last leader terminates upon receiving the last of the three consecutive counter-clockwise traversals initiated by the last leader (Lines 8-9 of Algorithm 2), and the leader terminates last, after the three consecutive counter-clockwise traversals conclude. The last counter-clockwise pulse traversal is the termination wave required by Definition 8. ◀

▶ Theorem 13. The total pulse complexity of the described counting algorithm is $O(n^{1.5})$.

Proof. In the $i$-th phase, $i$ new, unprobed processes are probed and have their counted set to true, except in the last phase, after which the entire ring has been probed. Therefore, there are $O(n^{0.5})$ probing phases. In each phase except the last, the generated pulses come from two sources: the current leader probing $i$ clockwise neighbors, which uses up to $O(i^2) = O(n)$ pulses, and the leadership transfer, which comprises two ring-traversal pulse waves (a counter-clockwise transfer pulse starting and ending at the old leader, and a clockwise confirmation pulse starting and ending at the next leader), which in total comprise $O(n)$ pulses.
In the last phase, instead of the usual leadership transfer, the last leader produces 4 pulse traversals (first clockwise, then three counter-clockwise) to signal the end of counting. This still takes $O(n)$ pulses. Finally, the last leader only has to broadcast its internal variable count (defined as the number of processes counted by the last leader in the last phase, therefore count $\le i$). For each bit of count, and additionally for an end-of-transmission symbol, two pulse traversals around the ring are required. In total, $O(n \log n)$ pulses are required for broadcasting the ring size. Summing the $O(n)$ pulses of each of the $O(n^{0.5})$ phases with the cost of the broadcast yields the claimed $O(n^{1.5})$ bound. ◀

Algorithm 1 NonLeader(phase, counted, relayed). The algorithm is called at the beginning on each non-leader process with parameters phase = 1, counted = false, relayed = 0, which are also local variables. The variable myPhase is local to a process and global in the sense that it is stored across possibly more than one invocation of NonLeader().

 1  while true do
 2      receive a message on port q
 3      if ¬counted ∧ q = 0 then                    // first time counted, pong back
 4          send a message on port 0
 5          counted ← true
 6          myPhase ← phase                         // I was counted in phase myPhase
 7      else if counted ∧ q = 0 then
 8          relayed ← relayed + 1                   // relayed remembers the number of pulses that are counted after me in the same phase
 9          send a message on port 1
10          receive a message on port 1
11          send a message on port 0
12      else if q = 1 then
13          send a message on port 0
14          if (phase = myPhase) ∧ (relayed = 0) then   // I have to become the new leader
15              send a message on port 1
16              receive a message on port 0
17              phase ← phase + 1
18              execute Leader(phase)
19          else
20              receive a message on port q
21              if q = 0 then
22                  send a message on port 1
23                  phase ← phase + 1
24                  continue
25              else
26                  send a message on port 0        // If I receive two ccw messages, the algorithm is done and I have to receive the count
27                  execute RcvCount()

Algorithm 2 Leader(phase), called at the beginning on the initial leader process with phase = 1.

 1  count ← 0
 2  repeat
 3      send a message on port 1
 4      receive a message on port q
 5      if q = 1 then
 6          count ← count + 1
 7      else
 8          send three messages on port 0
 9          receive three messages on port 1
10          execute SndCount(count)
11  until count = phase
12  send a message on port 0
13  receive a message on port 1
14  receive a message on port 0
15  send a message on port 1
16  phase ← phase + 1
17  execute NonLeader(phase, true, 0)

Algorithm 3 SndCount(count)

 1  for i = 0 to len(count) − 1 do
 2      if bit(count, i) = 0 then
 3          send two messages on port 0
 4          receive two messages on port 1
 5      else
 6          send two messages on port 1
 7          receive two messages on port 0
 8  send a message on port 1
 9  receive a message on port 0
10  send a message on port 0
11  receive a message on port 1
12  return n

Algorithm 4 RcvCount()

 1  L = []
 2  repeat
 3      receive a message on port q
 4      send a message on port 1 − q
 5      receive a message on port u
 6      send a message on port 1 − u
 7      if (q, u) = (1, 1) then
 8          L.append(0)
 9      else if (q, u) = (0, 0) then
10          L.append(1)
11  until (q, u) = (0, 1)
12  return bitencode(L)

6 Simulating the local model: Message exchange on rings

In this section, we describe an algorithm that enables processes to exchange messages of arbitrary size with their neighbors. This communication occurs in parallel across all processes and requires $O(n)$ messages to send and receive a single bit. The algorithm requires the existence of a leader process.

▶ Theorem 14. Let $\mathcal{A}$ be an algorithm in a message-passing, synchronous, possibly anonymous ring, where each process may send a different message on each incident edge in a round, and let $b$ be the maximum number of bits to be sent in the round.
Consider an oriented ring with a unique leader and a designated set of active processes (including the leader); all other processes are relays. Let $R$ be the virtual ring obtained by contracting each maximal relay segment into a single link. There is a quiescently terminating content-oblivious algorithm on the physical ring that simulates $\mathcal{A}$ round-by-round on $R$ with multiplicative overhead: each process sends $O(b)$ pulses per simulated round.

The pseudocode is shown in Algorithm 5. This algorithm is executed by all processes. Each process may send one message clockwise and another counter-clockwise. Messages are encoded as lists of bits; if a process does not wish to send a message, it may invoke the procedure with an empty list.

The algorithm performs a loop in which each process sends and receives one bit clockwise and one bit counter-clockwise (see lines 7-10). This is done by first sending a bit clockwise (and receiving one counter-clockwise) via the CWBitSending procedure, and then sending a bit counter-clockwise (and receiving one clockwise) via the CCWBitSending procedure. The loop continues until all processes have finished transmitting their messages. This termination condition is checked by having each process invoke ComputeOR, passing a variable set to true if it still has bits to send and false otherwise. The ComputeOR procedure computes the logical OR of these values across all processes. As we will show, each invoked procedure causes a process to send a constant number of messages, so the overall communication cost per loop iteration is $O(n)$ messages.

In many higher-level algorithms, a set of inactive processes act as relays by just forwarding messages; these processes essentially act as communication links.
We therefore distinguish between active processes, which call MsgExchange, and relay processes, which must still forward pulses to allow their neighbors to communicate. Relay processes execute the procedure MsgRelay (Algorithm 6). Intuitively, a relay behaves like a bidirectional link: whenever it receives a pulse on one port, it promptly forwards a pulse on the opposite port, thus preserving the structure of the bit-sending protocol. However, relays still participate in global termination detection. In each iteration, relay processes call ComputeOR(false). Since ComputeOR returns the logical OR of these values across all processes, both the MsgExchange and MsgRelay loops terminate simultaneously as soon as no active process has bits left to send. This guarantees that relay processes stop forwarding pulses exactly when the current message-exchange phase is complete.

Algorithm 5 MsgExchange(msg_cw, msg_ccw): sends message msg_cw to your cw neighbor and message msg_ccw to your ccw neighbor, while receiving a message from each of them.

 1  phase = 0
 2  rcvd_cw = []
 3  rcvd_ccw = []
 4  b+ = msg_cw[phase]                  // is ⊥ if phase > len(msg_cw)
 5  b− = msg_ccw[phase]                 // is ⊥ if phase > len(msg_ccw)
 6  active = (b+ ≠ ⊥) ∨ (b− ≠ ⊥)
 7  while ComputeOR(active) do
 8      rcvd_ccw.append(CWBitSending(b+))
 9      rcvd_cw.append(CCWBitSending(b−))
10      phase++
11      b+ = msg_cw[phase]              // is ⊥ if phase > len(msg_cw)
12      b− = msg_ccw[phase]             // is ⊥ if phase > len(msg_ccw)
13      active = (b+ ≠ ⊥) ∨ (b− ≠ ⊥)
14  return bitencode(rcvd_cw), bitencode(rcvd_ccw)

Algorithm 6 MsgRelay.

 1  while ComputeOR(false) do
 2      CWBitRelay()
 3      CCWBitRelay()
 4  return

6.1 CCWBitSending Algorithm

The CCWBitSending procedure is shown in Algorithm 7.
Each active process $p_i$ invokes the algorithm with a bit $b_i$ that it wishes to send to its counter-clockwise neighbor. The value $b_i$ can be 0, 1, or $\bot$, where $\bot$ is a special symbol indicating that no bit is being sent. The algorithm also enables process $p_i$ to receive the bit sent by its clockwise neighbor $p_{i+1}$.

Each process sends messages counter-clockwise through port 0, transmitting one, two, or zero messages depending on whether it needs to send a 0, 1, or $\bot$, respectively. The process also sets a variable cnt to the number of counter-clockwise messages it has sent. The leader process is the only one that also sends a message clockwise. This is a synchronization message that a process $p_i$ forwards only after receiving confirmation from its counter-clockwise neighbor $p_{i-1}$ that $p_{i-1}$ has received the bit transmitted by $p_i$.

Once a process receives a message from its clockwise neighbor (i.e., on port 1), it responds by sending a clockwise message (see lines 13 and 17). A process then waits until it has received from its counter-clockwise neighbor (on port 0) a number of messages equal to the number it sent to transmit its bit, plus one (see the predicate of the while loop at line 12). When this occurs, the process knows that the transmitted bit has been successfully received by its counter-clockwise neighbor and that one of the received messages is the synchronization one.

At this point, the process forwards the synchronization message clockwise (line 19). Note, however, that after forwarding, the process must still acknowledge incoming counter-clockwise messages by replying with clockwise pulses on port 1, since its clockwise neighbor may still be transmitting (see the repeat-until loop at line 23). Once the leader receives the synchronization pulse back, it sends a final pulse clockwise (line 19).
This pulse is then forwarded by each process and serves as a signal indicating that they should stop responding to counter-clockwise pulses and that the algorithm has terminated.

Relay behavior for counter-clockwise bit sending. In the description above we focused on CCWBitSending as executed by active processes. However, we may have inactive processes that behave like a communication link so that pulses can traverse the ring, but these processes should also terminate exactly when the active CCWBitSending instances terminate. Such processes execute the relay-side procedure CCWBitRelay (Algorithm 8). Intuitively, a relay forwards every incoming pulse to the opposite port and keeps track, in a single integer variable cnt, of the net number of pulses it has seen: cnt is initialized to 0, incremented for each pulse received from port 1, and decremented for each pulse received from port 0. Non-leader relays simply forward all pulses and stop when cnt drops below $-1$. The leader relay additionally injects the initial synchronization pulse on port 1, and stops forwarding exactly when it receives the final termination pulse on port 0 with cnt = $-1$. At that point cnt becomes $-2$ and the loop condition fails. This invariant ensures that an interval consisting only of relay processes is indistinguishable from a single link for the active processes executing CCWBitSending, and that all relays terminate synchronously with them at the end of the bit-sending phase.

CWBitSending. Since the CWBitSending and CWBitRelay algorithms are symmetric to the counter-clockwise ones, their code and correctness proofs are omitted.

Algorithm 7 CCWBitSending(bit): sends a bit counter-clockwise and receives one from the clockwise neighbor. The input bit $\in \{0, 1, \bot\}$, where $\bot$ denotes the absence of a bit to be sent. The function returns a value in $\{0, 1, \bot\}$.

 1  if bit = 0 then
 2      send a message on port 0
 3      cnt ← 1
 4  else if bit = 1 then
 5      send two messages on port 0
 6      cnt ← 2
 7  else
 8      cnt ← 0
 9  if leader then
10      send a message on port 1
11  in ← 0
12  while cnt ≥ 0 do
13      receive a message on port q
14      if q = 0 then
15          cnt ← cnt − 1
16      else
17          send a message on port 1
18          in ← in + 1
19  send a message on port 1
20  if leader then
21      receive a message on port 0
22  else
23      repeat
24          receive a message on port q
25          if q = 1 then
26              in ← in + 1
27              send a message on port q
28          else
29              send a message on port 1 − q
30      until q = 0
31  if in = 0 then
32      return ⊥
33  else if in = 1 then
34      return 0
35  else if in = 2 then
36      return 1

Algorithm 8 CCWBitRelay(): participates in CCWBitSending as a relay process; a relay behaves as a link, but it terminates its execution synchronously with the active processes executing CCWBitSending.

 1  cnt ← 0
 2  if leader then
 3      send a message on port 1
 4  while cnt ≥ −1 do
 5      receive a message on port q
 6      if ¬(leader ∧ q = 0 ∧ cnt = −1) then
 7          send a message on port 1 − q
 8      if q = 1 then
 9          cnt ← cnt + 1
10      else
11          cnt ← cnt − 1

6.1.1 Correctness of CCWBitSending

We now prove the correctness of CCWBitSending and CCWBitRelay.

▶ Lemma 15. Assume that all processes execute CCWBitSending (no relays). Then every process terminates and each $p_i$ returns $b_{i+1}$, the bit of its clockwise neighbor. During the execution of the algorithm at most $6n$ pulses are sent. Moreover, CCWBitSending has the composable ending property.

Proof. Let $\kappa(\bot) = 0$, $\kappa(0) = 1$, $\kappa(1) = 2$. By lines 1-8, $p_i$ sends $\kappa(b_i)$ messages on port 0 and sets $cnt_i \leftarrow \kappa(b_i)$. The only sends on port 0 in Algorithm 7 are the initial bit-encoding sends in lines 2-5. All synchronization and termination pulses are sent on port 1.
Hence every message received on port 1 by $p_i$ is a bit-encoding pulse originally sent on port 0 by its clockwise neighbor $p_{i+1}$. There are exactly $\kappa(b_{i+1})$ such pulses in the whole execution. Moreover, $in_i$ is incremented only when $p_i$ receives on port 1 (lines 18 and 26), and it is incremented exactly once per such receive. Thus, if we show that $p_i$ is still executing the algorithm when each of those $\kappa(b_{i+1})$ pulses arrives, then necessarily $in_i = \kappa(b_{i+1})$.

The leader sends one extra synchronization pulse clockwise on port 1 at line 10. This pulse makes two loops around the ring: at the end of the first loop, the leader knows that each process has received the bit sent to it (this is the transmission end wave); the second loop makes the processes terminate (this is the termination wave).

Now look at $p_i$ and its counter-clockwise neighbor $p_{i-1}$. While $p_i$ is in the while-loop, $cnt_i$ decreases only on receives from port 0, i.e., on pulses sent on port 1 by $p_{i-1}$. Symmetrically, $p_{i-1}$ decreases its own counter $cnt_{i-1}$ only on receives from its port 0, i.e., on pulses sent on port 1 by $p_{i-2}$, and so on around the ring.

Crucially, $p_i$ leaves the while-loop only once $cnt_i < 0$ (line 12), i.e., after it has received strictly more than $\kappa(b_i)$ pulses from $p_{i-1}$ on port 0. The first $\kappa(b_i)$ of these pulses are acknowledgments that $p_{i-1}$ has already seen all $\kappa(b_i)$ bit pulses sent by $p_i$ on port 0 (recall that $p_{i-1}$ always replies to a pulse on port 1, either by line 17 or by line 27; we are for now assuming that $p_{i-1}$ does not terminate); the "+1" comes from the wave initiated by the leader's transmission end pulse and propagated via forwards on port 1 (at line 19). This can be shown by a simple induction on processes.
Therefore, $p_i$ cannot send its transmission end pulse (line 19) before $p_{i-1}$ has finished processing all of $p_i$'s bit pulses. This also implies that $p_{i-1}$ cannot terminate before $p_i$ executes line 19; recall that the termination pulse is the second loop of the transmission end pulse.

Applying this argument consistently around the ring, we obtain that the leader's transmission end pulse and the local cnt-tests ensure that no process forwards the final clockwise transmission end wave until its counter-clockwise neighbor has consumed all relevant bit pulses. In particular, once the transmission end pulse reaches the leader again (which then exits the while-loop at line 12 and sends the termination wave at line 19), all transmissions have been received.

Consequently, for each $i$, all $\kappa(b_{i+1})$ bit pulses sent by $p_{i+1}$ on port 0 are received on port 1 at $p_i$ while $p_i$ is still executing; none can be lost or delayed past $p_i$'s termination. Thus $in_i$ is incremented exactly $\kappa(b_{i+1})$ times, and we conclude that $in_i = \kappa(b_{i+1})$.

Notice that once the leader sends the termination wave (executing line 19), all other processes are executing the repeat-until loop at line 23 and no other message is in the network; thus each process receiving it forwards the pulse (at line 29) and terminates with the correct output. The leader is the last process to receive the pulse back and the last to terminate with a correct output, leaving no remaining pulses in the network. Thus CCWBitSending satisfies Definition 8 with terminator = leader and $d = 1$.
Finally, from the above, a simple count of the pulses sent by each process shows that there are at most 6 (two pulses to transmit its bit, two pulses to acknowledge the received bits, and two pulses, one for the transmission end wave and one for the termination wave), giving a total bound of $6n$ pulses generated by the algorithm. ◀

▶ Lemma 16. Consider a maximal interval of processes that execute CCWBitRelay (Algorithm 8), while all other processes execute CCWBitSending. Then this interval is equivalent to a single link between its two neighboring active processes, and all relays in the interval terminate. If the leader executes CCWBitRelay, it is the last process to terminate.

Proof. Each relay forwards every incoming pulse on port $q$ to port $1-q$ (except in the single special case described below) and updates cnt by $+1$ on port 1 and by $-1$ on port 0 (lines 5-11). Thus pulses traverse the relay interval without duplication or loss, and in the same direction as in a single-link segment.

Non-leader relays run the loop while cnt $\ge -1$, so they stop only after the number of pulses from port 0 exceeds the number from port 1 by more than 1, which happens only when a relay forwards the termination pulse (observe that cnt = $-1$ once the relay has forwarded the transmission end pulse).

The leader relay additionally injects one synchronization pulse on port 1 (line 3) and suppresses the forwarding of exactly one final pulse arriving from port 0 when cnt = $-1$ (the guard in the if-statement); this pulse is the termination pulse returning to the leader. Processing that last pulse still decrements cnt to $-2$, so the loop condition fails and the leader relay terminates. Thus, from the point of view of its two active neighbors, every pulse entering the interval eventually exits on the other side, and each relay terminates exactly after forwarding the termination pulse. ◀

▶ Theorem 17.
In an oriented ring where active processes run CCWBitSending with inputs $b_i \in \{0, 1, \bot\}$ and all other processes run CCWBitRelay, every process terminates and each active process $p_i$ returns exactly the bit of its clockwise neighbor $p_{i+1}$ (treating relay segments as links). During the execution of the algorithm at most $6n$ pulses are sent. Moreover, the combined execution of CCWBitSending/CCWBitRelay has the composable ending property.

Proof. Collapse each maximal relay interval into a single abstract link. By Lemma 16, this does not change the behavior observed by the active endpoints, and all relays terminate when the endpoints do. On the resulting virtual ring, all processes are active and execute CCWBitSending, so Lemma 15 applies: every process terminates and $p_i$ returns $b_{i+1}$. The composable ending property follows from Lemma 15 on the virtual ring, together with the fact that relays only forward pulses and terminate in the same final termination wave (Lemma 16). ◀

6.2 ComputeOR Algorithm

The pseudocode is shown in Algorithm 9. In this algorithm, each process $p_i$ takes as input a Boolean variable $bool_i$, and the algorithm allows all processes to compute the global logical OR $\bigvee_{p_i \in \Pi} bool_i$.

The leader process $p_\ell$ kick-starts the computation by sending a counter-clockwise message if $bool_\ell$ is false and a clockwise message otherwise.

Suppose that $bool_\ell$ is false. The counter-clockwise message sent by $p_\ell$ is forwarded by each non-leader process $p_i$ for which $bool_i$ = false. If all processes have $bool_i$ = false, the leader eventually receives the message back on port 1 (see line 9). The leader then sends another counter-clockwise message, which circulates around the ring and informs each process that the result of the OR operation is false.
Each process detects this condition upon receiving two consecutive messages on port 1 (see line 28), after which it decides that the output is false. When this final message returns to the leader, the leader also sets its output to false.

Suppose now that at least one process $p_j$ has $bool_j$ = true, restricting ourselves for now to the case where $p_j \ne p_\ell$. In this case, when $p_j$ receives the counter-clockwise message initiated by the leader, it sends a clockwise message instead of forwarding the counter-clockwise one (see lines 21-23). This clockwise message propagates back toward the leader, indicating to each process receiving it that at least one process holds a true value. All processes in the clockwise segment between $p_\ell$ and $p_j$ (excluding $p_\ell$ and $p_j$) terminate outputting true when the message is received (they received two consecutive messages on ports that are not both 1).

When the leader receives this message on port 0 (see lines 14-18), it concludes that the global OR is true. The leader then sends a message clockwise. This message serves to inform all processes in the clockwise segment from $p_\ell$ to $p_j$ that the result of the OR is true. The message is forwarded until it reaches $p_j$, which is waiting for it at line 22. At this point, $p_j$ sends the message back counter-clockwise. This returning message ensures that all processes in the aforementioned segment with input false can decide: they detect two consecutive messages arriving on different ports (not both on port 1) and thus conclude with output true. Processes with input true also terminate during this phase, as they execute lines 21-23. Finally, the leader decides that the output is true upon receiving this message back.

When $bool_\ell$ = true, the leader sends a first message clockwise. This message is forwarded by all processes in the network until it returns to the leader.
Any forwarding process with input true then waits for a pulse travelling in the counter-clockwise direction. For this reason, the leader subsequently sends a counter-clockwise message to make all processes commit to a value. This counter-clockwise message is forwarded by all processes, which commit upon its reception to output true. Processes with input false recognize this condition by receiving two consecutive messages that do not both arrive on port 1.

Finally, to make ComputeOR safely composable with subsequent procedures (Section 4), we append a last clockwise barrier wave after the OR value has been determined. This barrier is collectively executed before the termination of ComputeOR. The leader sends one final pulse clockwise; each non-leader forwards it once and then returns; finally, the leader receives the pulse back and returns. This adds exactly $n$ pulses and ensures that the procedure has the composable ending property.

6.2.1 Correctness of the ComputeOR Algorithm

We now prove the correctness of Algorithm 9.

Algorithm 9 ComputeOR(bool): returns true if at least one process called it with true, false otherwise.
     1  ans ← true
     2  if leader then
     3      if bool then
     4          send a message on port 1
     5      else
     6          send a message on port 0
     7  receive message on port q
     8  if leader then
     9      if q = 1 then
    10          send message on port 0
    11          receive message on port 1
    12          ans ← false
    13      else
    14          if bool then
    15              send message on port 0
    16          else
    17              send message on port 1
    18          receive message on port 1
    19  else
    20      if bool then
    21          send message on port 1
    22          receive message on port 1 − q
    23          send message on port 0
    24      else
    25          send message on port 1 − q
    26          receive message on port u
    27          send message on port 1 − u
    28          if u = 1 ∧ q = 1 then
    29              ans ← false
        // Barrier for composability
    30  if leader then
    31      send message on port 1
    32      receive message on port 0
    33  else
    34      receive message on port 0
    35      send message on port 1
    36  return ans

To show that the returned values are correct, we distinguish cases based on the vector $(\mathit{bool}_i)_{p_i \in \Pi}$. We also remark that, by construction, at each instant of the execution of ComputeOR at most one pulse is present in the network.

▶ Lemma 18. Assume $\mathit{bool}_i$ = false for every process $p_i$. Then every process terminates and outputs false. Moreover, this execution has the composable ending property.

Proof. Since $\mathit{bool}_\ell$ = false, the leader $p_\ell$ sends its initial pulse counter-clockwise on port 0 at line 6. Consider the pulse that starts at $p_\ell$ on port 0. Let $p_i$ be the first non-leader reached counter-clockwise from $p_\ell$. By orientation, this pulse is received on port 1, hence in line 7 we have $q_i = 1$. Since $\mathit{bool}_i$ = false, $p_i$ executes the "false" branch at line 24 and sends a message on port $1 - q_i = 0$ at line 25, i.e., further counter-clockwise. By induction along the counter-clockwise direction, the same holds at every non-leader: the first pulse received is this pulse (arriving on port 1), so $q_i = 1$, and the process forwards the pulse on port 0.
Eventually the pulse reaches $p_\ell$ from its clockwise neighbor and is received on port 1, so at the leader we have $q_\ell = 1$ in line 7. Since $q_\ell = 1$, the leader takes the branch at line 9. It sends another pulse on port 0 at line 10, starting a second counter-clockwise wave, and then waits to receive a pulse on port 1 at line 11. Upon that receive, it sets ans ← false at line 12.

Now consider the second pulse emitted by the leader on port 0. Each non-leader $p_i$ has already executed lines 7 and 25, and is blocked at its second receive in line 26. The next message it receives is exactly this second wave, coming from its clockwise neighbor on port 1. Hence $u_i = 1$ for all non-leaders. Process $p_i$ then sends on port $1 - u_i = 0$ at line 27, forwarding the second wave, and since $u_i = 1 \wedge q_i = 1$ holds at line 28, it sets ans ← false at line 29.

The second wave eventually reaches the leader from its clockwise neighbor on port 1, unlocking the receive at line 11, after which the leader has ans = false. At this point the ring is quiescent, and the algorithm executes the final clockwise barrier wave (lines 30–36). Each process returns false after forwarding the barrier, and the leader returns last after receiving it back. ◀

▶ Lemma 19. Assume $\mathit{bool}_\ell$ = false and there exists at least one non-leader $p_j$ with $\mathit{bool}_j$ = true. Then every process terminates and outputs true. Moreover, this execution has the composable ending property.

Proof. Let $p_j$ be the first process with input true encountered when moving counter-clockwise from the leader. That is, $p_j$ is the first true process in the sequence $p_\ell, p_{\ell-1}, p_{\ell-2}, \ldots$. Partition the ring into three disjoint sets:
$A$: the non-leaders between $p_\ell$ and $p_j$ along this counter-clockwise path (excluding $p_\ell$ and $p_j$);
$B$: the non-leaders on the other (clockwise) arc between $p_\ell$ and $p_j$;
the two distinguished processes $p_\ell$ and $p_j$.
Since $\mathit{bool}_\ell$ = false, $p_\ell$ sends a pulse on port 0 at line 6. For each $p_i \in A$, the first pulse received (at line 7) is this one, arriving on port 1, so $q_i = 1$. Because $\mathit{bool}_i$ = false, each such $p_i$ executes the false branch at line 24, sends on port $1 - q_i = 0$ and waits at line 26. Hence the pulse is forwarded counter-clockwise through $A$ and reaches $p_j$ from its clockwise neighbor on port 1, giving $q_j = 1$.

Since $\mathit{bool}_j$ = true and $q_j = 1$, $p_j$ executes the true branch at line 20. It sends a message on port 1 (clockwise) at line 21 and then waits at line 22 for a message on port $1 - q_j = 0$. It will later send on port 0 at line 23 and then return true after the final barrier wave.

Consider the pulse that $p_j$ sends on port 1. It travels clockwise through all processes in $A$ and eventually reaches the leader. Each $p_i \in A$ has already set $q_i = 1$, sent a counter-clockwise message on port 0, and is blocked at its second receive in line 26. The next message delivered to $p_i$ is this clockwise wave, arriving from its counter-clockwise neighbor on port 0, so $u_i = 0$. Process $p_i$ then sends on port $1 - u_i = 1$ at line 27, forwarding the back wave, and since $u_i = 0$, the condition $u_i = 1 \wedge q_i = 1$ at line 28 is false. Hence $p_i$ keeps ans = true.

The same clockwise wave eventually reaches the leader, which receives it as its first message at line 7 on port 0, hence $q_\ell = 0$. At the leader we have $q_\ell = 0$ and $\mathit{bool}_\ell$ = false. So it takes the "else" branch of line 9 and the "else" at line 16, sending a message on port 1 at line 17, and then waiting for a message on port 1 at line 18.

Each $p_i \in B$ has not yet received any message. Its first receive at line 7 is the pulse sent by the leader at line 17, arriving from its counter-clockwise neighbor on port 0, so $q_i = 0$.
Then:
if $\mathit{bool}_i$ = false, $p_i$ executes the false branch, sends on port $1 - q_i = 1$ and waits for its second message at line 26;
if $\mathit{bool}_i$ = true, $p_i$ executes the true branch, sends on port 1 and waits for its second message on port $1 - q_i = 1$ at line 22.
In either case, the pulse continues clockwise through $B$ until it reaches $p_j$ from the other side on port 0.

Recall that $p_j$ is blocked waiting at line 22 for a message on port $1 - q_j = 0$. The pulse now arrives at $p_j$ from its counter-clockwise neighbor on port 0, satisfying this receive. Then $p_j$ sends a message on port 0 (counter-clockwise) at line 23 and keeps ans = true.

The counter-clockwise message sent by $p_j$ travels from $p_j$ back to the leader through the processes in $B$. It is sent on port 0, and thus each subsequent process in the counter-clockwise direction receives it on port 1. For a process $p_i \in B$:
If $\mathit{bool}_i$ = false, then it is blocked at line 26. It now receives the pulse on port 1, so $u_i = 1$. It sends on port $1 - u_i = 0$ at line 27, forwarding the pulse, and since $q_i = 0$ the condition $u_i = 1 \wedge q_i = 1$ is false; thus it keeps ans = true.
If $\mathit{bool}_i$ = true, then it is blocked at line 22 waiting on port $1 - q_i = 1$. The termination wave arrives on port 1 and satisfies this receive. The process then sends on port 0 and keeps ans = true.
Hence every process in $B$ has ans = true.

Finally, the pulse reaches the leader from its clockwise neighbor on port 1. This is the message the leader is waiting for at line 18. At this point the ring is quiescent and all processes execute the final clockwise barrier wave, after which every process returns true (line 36), with the leader returning last. ◀

▶ Lemma 20. Assume $\mathit{bool}_\ell$ = true (other inputs arbitrary). Then every process terminates and outputs true. Moreover, the execution has the composable ending property.

Proof.
Since $\mathit{bool}_\ell$ = true, the leader sends its initial pulse clockwise on port 1 at line 4. Follow this pulse from $p_\ell$. At each non-leader $p_i$, this wave is received from its counter-clockwise neighbor on port 0, hence the first receive in line 7 sets $q_i = 0$. If $\mathit{bool}_i$ = false, then $p_i$ executes the false branch at line 24, sends on port $1 - q_i = 1$ and waits at line 26. If $\mathit{bool}_i$ = true, it executes the true branch at line 20, sends on port 1 and waits at line 22 on port $1 - q_i = 1$. In all cases, the pulse is forwarded clockwise on port 1, so it does a full tour of the ring and eventually returns to the leader from its counter-clockwise neighbor on port 0. Therefore, the leader's first receive at line 7 has $q_\ell = 0$.

At the leader, we have $q_\ell = 0$ and $\mathit{bool}_\ell$ = true. Thus it executes the "if" branch at line 14: it sends a termination pulse on port 0 at line 15 and then waits for a pulse on port 1 at line 18.

Consider any non-leader $p_i$. After this first wave, it has $q_i = 0$, has sent once on port 1, and is blocked waiting for a second message:
on any port $u_i$ at line 26 if $\mathit{bool}_i$ = false;
on port $1 - q_i = 1$ at line 22 if $\mathit{bool}_i$ = true.
The final pulse propagates counter-clockwise: it is always sent on port 0 and thus received on port 1 by the next process in that direction. Hence, for every non-leader $p_i$, the second pulse it receives is the termination pulse on port 1.
If $\mathit{bool}_i$ = false, its second receive in line 26 yields $u_i = 1$. The process then sends on port $1 - u_i = 0$ at line 27, forwarding the termination pulse, and since $q_i = 0$ the condition $u_i = 1 \wedge q_i = 1$ is false; it therefore keeps ans = true.
If $\mathit{bool}_i$ = true, the process was waiting at line 22 for a pulse on port $1 - q_i = 1$. The termination pulse arrives on port 1 and unlocks this receive; the process then sends on port 0 at line 23 and keeps ans = true.
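The port bookkeeping used throughout these proofs can also be cross-checked mechanically. The following is a minimal simulation sketch, not part of the paper: it transcribes Algorithm 9 as a Python generator and runs it on small rings under one particular schedule. The names `compute_or`, `run_ring`, and `check_small_rings`, the counter-based inboxes, and the rule of serving port 0 first when both ports hold a pulse are all illustrative assumptions.

```python
from itertools import product

CW, CCW = 1, 0  # port 1 faces the clockwise neighbor, port 0 the counter-clockwise one


def compute_or(is_leader, my_bool):
    """One process of Algorithm 9, written as a generator.

    It yields ("send", port) or ("recv", port), where port None means "any port";
    the scheduler answers a recv with the port on which the pulse arrived.
    """
    ans = True
    if is_leader:                                   # lines 2-6
        yield ("send", CW if my_bool else CCW)
    q = yield ("recv", None)                        # line 7
    if is_leader:
        if q == 1:                                  # lines 9-12
            yield ("send", CCW)
            yield ("recv", 1)
            ans = False
        else:                                       # lines 13-18
            yield ("send", CCW if my_bool else CW)
            yield ("recv", 1)
    elif my_bool:                                   # lines 20-23
        yield ("send", CW)
        yield ("recv", 1 - q)
        yield ("send", CCW)
    else:                                           # lines 24-29
        yield ("send", 1 - q)
        u = yield ("recv", None)
        yield ("send", 1 - u)
        if u == 1 and q == 1:
            ans = False
    if is_leader:                                   # barrier, lines 30-35
        yield ("send", CW)
        yield ("recv", 0)
    else:
        yield ("recv", 0)
        yield ("send", CW)
    return ans


def run_ring(bools, leader=0):
    """Run every process to completion; return (outputs, pulses sent, pulses left over)."""
    n = len(bools)
    procs = [compute_or(i == leader, bools[i]) for i in range(n)]
    inbox = [[0, 0] for _ in range(n)]      # inbox[i][port]: undelivered pulses at process i
    pending = [next(p) for p in procs]      # the request each process is currently making
    outputs = [None] * n
    pulses = 0
    progress = True
    while progress:
        progress = False
        for i in range(n):
            while outputs[i] is None:
                kind, port = pending[i]
                reply = None
                if kind == "send":
                    target = (i + 1) % n if port == CW else (i - 1) % n
                    inbox[target][1 - port] += 1    # sent on port x, received on port 1 - x
                    pulses += 1
                else:
                    wanted = (0, 1) if port is None else (port,)
                    ready = [p for p in wanted if inbox[i][p]]
                    if not ready:
                        break                       # blocked until a pulse arrives
                    reply = ready[0]
                    inbox[i][reply] -= 1
                progress = True
                try:
                    pending[i] = procs[i].send(reply)
                except StopIteration as stop:
                    outputs[i] = stop.value
    return outputs, pulses, sum(map(sum, inbox))


def check_small_rings(max_n=6):
    """Compare the simulation with the stated guarantees on all rings of size 2..max_n."""
    for n in range(2, max_n + 1):
        for bools in product([False, True], repeat=n):
            for leader in range(n):
                outputs, pulses, leftover = run_ring(list(bools), leader)
                assert outputs == [any(bools)] * n
                assert pulses == 3 * n and leftover == 0
    return True
```

Calling `check_small_rings()` enumerates every input vector and leader position for rings of up to six processes and checks that all outputs equal the global OR, that exactly 3n pulses are sent, and that no pulse is left undelivered. Such a check only exercises small instances and a single schedule, so it complements the proofs rather than replacing them.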
After the counter-clockwise termination wave returns to the leader at line 18, the ring is quiescent and all processes execute the final clockwise barrier wave. Afterwards every process returns true (line 36), with the leader returning last. ◀

▶ Theorem 21. For any assignment of input bits $(\mathit{bool}_i)_{i=0}^{n-1}$, every process terminates when executing Algorithm 9, and all processes output the same Boolean: $\bigvee_{i=0}^{n-1} \mathit{bool}_i$. The total number of pulses used is exactly $3n$. Moreover, ComputeOR has the composable ending property.

Proof. There are two possible values for the global OR.
Case 1: $\bigvee_i \mathit{bool}_i$ = false. Then all inputs are false. By Lemma 18, every process terminates and outputs false, which equals the global OR.
Case 2: $\bigvee_i \mathit{bool}_i$ = true. Then at least one process has input true. If $\mathit{bool}_\ell$ = true, Lemma 20 applies, and every process terminates and outputs true. If $\mathit{bool}_\ell$ = false, Lemma 19 applies (since there exists at least one non-leader with input true), and every process terminates and outputs true.
In all three situations (Lemmas 18, 19, and 20) the composable ending property follows from the corresponding lemma.

Moreover, each non-leader process sends exactly two messages in the OR-computation phase: one after its first receive (line 21 or 25) and one after its second receive (line 23 or 27). In addition, each process sends exactly one message in the final clockwise barrier wave. The leader also sends exactly two messages in the OR-computation phase (line 4 or 6, followed by line 10, 15, or 17) and one message in the barrier. Hence every execution uses exactly $3n$ pulses. ◀

6.3 Correctness of the MsgExchange Algorithm

We are now ready to prove the correctness of the MsgExchange algorithm.

▶ Theorem 22. Consider an oriented ring with a unique leader.
Let each active process execute Algorithm 5 with input messages msg_cw and msg_ccw (possibly empty), and let each relay process execute Algorithm 6. Each active process receives exactly the messages sent to it by its clockwise and counter-clockwise active neighbors (treating relay segments as links), and Algorithm 5 returns these two messages (the message sent counter-clockwise by its clockwise neighbor and the message sent clockwise by its counter-clockwise neighbor). Moreover, the combined execution of Algorithms 5 and 6 has the composable ending property. Let $L$ be the length of the longest message sent. Then the execution sends at most $15nL + 3n$ pulses, that is, $O(L)$ pulses per process.

Proof. Collapse each maximal relay interval into a single abstract link. By Theorem 17 and symmetry, each invocation of CWBitSending/CWBitRelay and CCWBitSending/CCWBitRelay terminates and each active process receives exactly the corresponding bit from its neighbor (treating relay segments as links). By Theorem 21, each invocation of ComputeOR terminates and returns the global OR of the local active flags. Since these subroutines have the composable ending property, repeated applications of Theorem 9 imply that they can be executed back-to-back in each iteration without cross-interference. Therefore, in each phase, every active process sends at most one bit clockwise and one bit counter-clockwise, and appends exactly the bits it receives from its two neighbors. The loop continues until all active processes have no bits left to send, which is detected simultaneously when ComputeOR(active) returns false. Relay processes always pass false, hence they terminate in the same iteration as the active ones. Each invoked subroutine is quiescently terminating, thus the entire execution is quiescently terminating.
The pulse bound follows by summing the costs of the $L$ bit-sending phases ($\le 12nL$ pulses) and the $L + 1$ calls to ComputeOR ($= 3n(L + 1)$ pulses). Finally, since the last invoked procedure before returning from Algorithms 5 and 6 is ComputeOR, the combined message-exchange procedure also has the composable ending property. ◀

Proof of Theorem 14. Fix one synchronous round of $A$ on the virtual ring $R$ of active processes (relay segments contracted into links). In the simulation, each active process encodes the $b$ bits it would send clockwise (resp. counter-clockwise) in this round as the list msg_cw (resp. msg_ccw) and executes MsgExchange(msg_cw, msg_ccw); each relay process executes MsgRelay. By Theorem 22, each active process receives exactly the messages sent to it by its two active neighbors in $R$, and thus it can perform the same state update as in $A$ for this round. By the composable ending property in Theorem 22, the simulation of consecutive rounds does not create interference between pulses of different rounds. The complexity of $O(b)$ pulses per round follows directly from Theorem 22. ◀

7 Message exchange on 2-edge-connected networks

In this section, we extend the message sending routine of Section 6 to the case of general 2-edge-connected graphs.

▶ Theorem 7 (Algorithm simulation for 2-edge-connected networks). Let $A$ be an algorithm in a message-passing, synchronous, and anonymous 2-edge-connected network $G = (V, E)$, where each process can send $b$ bits per round on each edge. There is a quiescently terminating content-oblivious algorithm on a ring with a unique leader that simulates $A$, with the following specifications.
There is a pre-processing step with $O(n^8 \log n)$ pulses, where $n$ is the number of processes in $G$.
After that, simulating a round of CONGEST[$b$] requires sending $O(b)$ pulses along each edge, i.e., with a constant multiplicative overhead.

We outline one step of the simulation for a 2-edge-connected network here.

Robbins cycle construction. We first use Algorithm 4 of [6] to obtain a Robbins cycle of the graph. A Robbins cycle on $G$ is a directed cycle that goes through all vertices of $G$, and any edge of $G$ that appears multiple times in the Robbins cycle always follows the same orientation. This is possible from the initial assumption of a unique leader and distinct IDs.

▶ Theorem 23 ([6, Lemma 19]). Let $G = (V, E)$ be a 2-edge-connected network, and $n = |V|$. There is a content-oblivious algorithm that constructs a Robbins cycle $C$ of length $O(n^3)$, which takes $O(n^8 \log n)$ pulses, such that each process knows its clockwise and counter-clockwise neighbors for each of its occurrences in $C$.

Network topology construction. The Robbins cycle alone already suffices, in terms of feasibility, to simulate any algorithm.

▶ Theorem 24 (Theorem 10 of [6]). Let $G = (V, E)$ be a 2-edge-connected graph, $n = |V|$, and $C$ be a Robbins cycle over $G$. Any asynchronous protocol $\pi$ that transmits $b$ bits can be simulated with $O(|C| \cdot b + |C| \log |V|)$ content-oblivious pulses.

However, using such a simulation naively can be costly in the long run, as each message size explodes at least by a factor of $|C|$, which is at least $\Omega(n)$. Instead, we will use the Robbins cycle only to disseminate knowledge of the network topology, in order to devise a less costly simulation. Following the assumption of [6] that each ID has size $O(\log n)$, there is a protocol such that each process learns the network topology with $O(|m| \log n)$ bits (by reporting each edge), which requires $O(|m| \cdot n^3 \log n)$ pulses.
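To see why this dissemination stage is of lower order than the Robbins cycle construction, one can bound its cost explicitly. The following back-of-the-envelope calculation is a sketch under two assumptions that the text does not state: $|m|$ is read as the number of edges $|E|$ of $G$, and $G$ is a simple graph. It plugs $b = O(|E| \log n)$ and $|C| = O(n^3)$ into the bound of Theorem 24.

```latex
\begin{align*}
O\bigl(|C| \cdot b + |C| \log |V|\bigr)
  &= O\bigl(n^{3} \cdot |E| \log n + n^{3} \log n\bigr) \\
  &= O\bigl(|E| \cdot n^{3} \log n\bigr) \\
  &\subseteq O\bigl(n^{5} \log n\bigr)
  \qquad \text{since } |E| \le \tbinom{n}{2} = O(n^{2}).
\end{align*}
```

Under these assumptions the topology dissemination costs at most $O(n^5 \log n)$ pulses, which is below the $O(n^8 \log n)$ bound of Theorem 23.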
Combined with step 1, $O(n^8 \log n)$ pulses are required to construct the network topology at all processes.

Cycle cover finding. An Eulerian subgraph of $G$ is a subgraph where all vertices have even degree. We make use of the following fact.

▶ Lemma 25 ([1, Lemma 4.1]). Every 2-edge-connected graph can be covered by three Eulerian subgraphs (i.e., each edge appears in at least one Eulerian subgraph).

Further, every Eulerian subgraph can be broken down into cycles by iteratively removing cycles. Let every process $v$, with the knowledge of the network topology, run a deterministic algorithm to obtain a cycle cover of the graph $G$ locally as $\mathcal{C} = (C_1, C_2, \ldots, C_k)$. Let $\mathcal{C}_v$ denote the subsequence (maintaining order) containing the cycles in which process $v$ is present. Note that each edge appears in one, two, or three cycles.

On-cycle simulation. Now $v$ invokes the (efficient) CONGEST[$b$] simulation algorithm for rings (Algorithm 5), for each cycle contained in $\mathcal{C}_v$, in the order of $\mathcal{C}_v$. For a message that $v$ would like to send to neighbor $u$, $v$ transmits the message at its earliest possible timing, i.e., when simulating the first cycle among $\mathcal{C}_v$ that contains edge $\{u, v\}$. For subsequent cycles among $\mathcal{C}_v$ that also contain edge $\{u, v\}$, $v$ remains silent (i.e., sends the empty message $\bot$) to $u$. Notably, when simulating message passing for one cycle, only two incident edges of $v$ are relevant. For currently irrelevant edges, $v$ holds incoming pulses while the simulation of the current cycle is ongoing.

To prove the correctness of the entire simulation, assume process $v$ is contained in two distinct cycles $C_i$ and $C_j$ of the cover. We show that a pulse sent by $v$'s neighbor $u$ while simulating $C_i$ will not be processed by $v$ if the latter is simulating $C_j$.
If $u$ does not belong to $C_j$, $v$ will receive such a pulse on a port that is irrelevant to its ongoing simulation and will hold it temporarily during the simulation of $C_j$. Now, let $u$ belong to $C_j$, hence the edge $\{u, v\}$ belongs to both $C_i$ and $C_j$. It is proven in Theorem 22 that the simulation of $C_i$ (or $C_j$, whichever has the smaller index) has the composable ending property, hence it can be safely concatenated with the later simulation, according to Theorem 9. That means both $u$ and $v$ can correctly attribute pulses to the ring simulation that spawned them, which proves the correctness claim.

Complexity. The number of pulses needed at one process to simulate a round of passing a message of size $b$ in the 2-edge-connected network is $O(b)$, i.e., a constant multiplicative overhead. This follows from the multiplicative overhead of simulating one round of CONGEST[$b$] on a ring as in Theorem 14, together with the fact that each edge is covered by at most three cycles.

8 Computing self-decomposable aggregation functions

In this section, we design an efficient aggregation algorithm for content-oblivious rings by combining the simulator from Theorem 14 with the ComputeOR algorithm. The procedure ComputeOR is described in Algorithm 9, and its correctness and performance guarantees are established in Theorem 21. The simulator is used to run an MIS algorithm over a virtual ring over the active processes.

▶ Lemma 26 (Cole–Vishkin [10]). In the CONGEST[1] model, for a ring $C$ with a designated leader and whose processes have distinct $\lambda$-bit IDs, there exists a deterministic algorithm that computes an MIS of $C$ containing the leader in $O(\lambda)$ rounds.

Proof. The lemma follows from the classical Cole–Vishkin color reduction algorithm [10]. In an oriented ring, a proper $k$-coloring can be transformed into a proper $O(\log k)$-coloring in one round by exchanging colors with neighboring processes.
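One such reduction step can be sketched as follows. This is the textbook Cole–Vishkin rule written centrally for readability, not code from the paper: each process compares its color only with that of its counter-clockwise neighbor, whereas in the CONGEST[1] setting of the lemma the colors would be exchanged bit by bit. The function names and the choice of predecessor are illustrative assumptions, and the ring is assumed to have at least two processes.

```python
def cv_step(colors):
    """One Cole-Vishkin step; colors[i] belongs to process i, whose predecessor is i - 1."""
    n = len(colors)
    new_colors = []
    for i in range(n):
        own, pred = colors[i], colors[(i - 1) % n]
        diff = own ^ pred                        # non-zero because the coloring is proper
        k = (diff & -diff).bit_length() - 1      # index of the lowest bit where they differ
        new_colors.append(2 * k + ((own >> k) & 1))
    return new_colors


def is_proper(colors):
    """True if no two adjacent processes on the ring share a color."""
    n = len(colors)
    return all(colors[i] != colors[(i + 1) % n] for i in range(n))


def reduce_to_six_colors(ids):
    """Iterate cv_step on distinct IDs until every color lies in {0, ..., 5}."""
    colors, steps = list(ids), 0
    while max(colors) >= 6:
        colors, steps = cv_step(colors), steps + 1
    return colors, steps
```

For instance, `cv_step([5, 12, 3, 9, 14, 7])` yields `[2, 0, 1, 2, 0, 1]`, which is again a proper coloring of the ring. Each step maps colors of bit length $L$ to colors smaller than $2L$, which is the source of the logarithmic shrinkage used in the proof.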
Starting from the distinct IDs as an initial proper coloring, repeating this reduction for $O(\log^* n)$ iterations yields a proper $O(1)$-coloring. When implemented in the CONGEST[1] model, the total number of rounds required for the color-reduction process is
$$\lambda + O(\log \lambda) + O(\log \log \lambda) + \cdots = O(\lambda).$$
Finally, a proper $O(1)$-coloring can be converted into a maximal independent set that includes the leader in $O(1)$ additional rounds. Let $k = O(1)$ denote the number of colors. Initially, the leader joins the MIS. Then, for $i = 1, 2, \ldots, k$, each process of color $i$ joins the MIS if none of its neighbors has already joined. ◀

At a high level, our aggregation algorithm proceeds iteratively. In each iteration, starting from the original ring, we compute an MIS of the current ring by simulating the algorithm of Lemma 26 using the simulator from Theorem 14. Processes outside the independent set send their local aggregates to a neighboring process in the independent set and then become inactive. The remaining processes form a virtual ring on which the algorithm continues. After $O(\log n)$ iterations, only a single active process remains, having accumulated the aggregate of all values in the ring. Each iteration incurs $O(\lambda + \beta)$ pulses per process, where the $O(\lambda)$ term arises from the MIS computation and the $O(\beta)$ term from transmitting local aggregates. Thus, the overall message complexity is $O((\lambda + \beta) \log n)$ pulses per process.

Implementing this approach involves several subtleties. In particular, both the number of processes $n$ and the maximum identifier length $\lambda$ are initially unknown. Consequently, the algorithm must incorporate mechanisms that allow all processes to detect the termination of the overall computation, as well as the completion of individual subroutines; this is achieved using ComputeOR.

▶ Theorem 1 (Efficient aggregation).
Let $f$ be a self-decomposable aggregation function defined on multisets over a universe $U$. Consider a content-oblivious ring $C$ of $n$ processes with a designated leader, where each process $p$ is equipped with a distinct identifier of at most $\lambda$ bits and holds an element $x_p \in U$. Assume that for every non-empty subset $S$ of processes, the bit length of the value $f\left(\biguplus_{p \in S} \{x_p\}\right)$ is at most $\beta$. Then there exists a deterministic algorithm that quiescently terminates and computes $f\left(\biguplus_{p \in C} \{x_p\}\right)$ at all processes using $O((\lambda + \beta) n \log n)$ pulses.

Proof. We begin with a preprocessing step that allows all processes to determine $\lambda$, which is required to run the MIS algorithm of Lemma 26. The preprocessing proceeds as follows. For $i = 1, 2, 3, \ldots$, we execute ComputeOR, where the Boolean input of each process indicates whether its identifier length is at least $i$. The loop terminates once the outcome of ComputeOR is false. At that point, all processes learn that $\lambda$ equals the largest index for which the outcome was true. This preprocessing step requires $O(n\lambda)$ pulses.

The main algorithm operates in phases. At the beginning of phase $t$, there is a set $A(t)$ of active processes, which always includes the leader, and each active process $p$ stores a value $w_p(t)$. Initially, $A(0)$ consists of all processes in the ring, and $w_p(0) = f(\{x_p\})$ for every process $p$. Each phase consists of the following three steps.
1. Loneliness test. We run ComputeOR, where the Boolean input of each process indicates whether it is a non-leader in $A(t)$. If the outcome is false, then all processes know that $|A(t)| = 1$, and the algorithm terminates. This step requires $O(n)$ pulses.
2. MIS on active processes. We execute the MIS algorithm from Lemma 26 on the logical ring induced by $A(t)$ using the simulator of Theorem 14.
Let $I(t) \subseteq A(t)$ denote the resulting MIS, which, by Lemma 26, is guaranteed to include the leader. Each active process learns whether it belongs to $I(t)$. This step requires $O(\lambda n)$ pulses.
3. Aggregation. Each process in $I(t)$ becomes the center of a cluster and remains active in the next phase; thus, $A(t + 1) = I(t)$. Every process $p \in A(t) \setminus I(t)$ selects a neighboring process $p' \in I(t)$, joins the cluster of $p'$ as a leaf, and sends its value $w_p(t)$ to $p'$. Each cluster center $p'$ then applies the associative operator $\oplus$ to its own value $w_{p'}(t)$ and to all values received from its leaves, and sets $w_{p'}(t + 1)$ to the resulting aggregate. Recall that $\oplus$ satisfies, for any disjoint multisets $X_1$ and $X_2$, $f(X_1 \uplus X_2) = f(X_1) \oplus f(X_2)$. This aggregation step requires $O(\beta)$ rounds in the CONGEST[1] model and can therefore be implemented using the simulator from Theorem 14. We emphasize that the simulator works even when messages have varying lengths and the parameter $\beta$ is unknown. This step requires $O(\beta n)$ pulses.

By construction, the algorithm eventually terminates with the leader as the only active process, holding the value $f\left(\biguplus_{v \in C} \{x_v\}\right)$. Since an MIS of a ring contains at most half of the processes, the number of phases is $O(\log n)$. The total message complexity is therefore
$$O(\log n) \cdot \bigl(O(n) + O(\lambda n) + O(\beta n)\bigr) = O((\lambda + \beta) n \log n),$$
as required.

It remains to disseminate the final aggregate to all processes. One option is to use Algorithms 3 and 4: the leader runs SndCount$\left(f\left(\biguplus_{v \in C} \{x_v\}\right)\right)$ while all other processes run RcvCount(). The dissemination step requires $O(n\beta)$ pulses. Alternatively, the final value can be broadcast along the hierarchical decomposition constructed by the algorithm.
We traverse the phases in reverse order, and in each phase $t$, the center of each cluster broadcasts the value $f\left(\biguplus_{v \in C} \{x_v\}\right)$ to the leaves of its cluster. As in the aggregation step, this broadcast can be implemented using $O(\beta n)$ pulses. Since there are $O(\log n)$ phases, the total number of pulses required for this dissemination step is $O(\beta n \log n)$. ◀

9 Minimum finding and multiset computation

In this section, we describe an algorithm to compute the minimum input $x_{\min}$ of the processes and then show how to use it to compute the entire multiset of inputs. Applying the algorithm of Section 8 to the minimum-finding problem would yield an $O(Bn \log n)$-pulse algorithm, where $B$ is an upper bound on the size of the identifiers and on the size of the inputs. Moreover, this algorithm would require unique identifiers. In Section 9.1, we present an algorithm that uses $O(|x_{\min}| n)$ pulses and works without IDs, establishing the following theorem.

▶ Theorem 5. In an anonymous ring of $n$ processes with a designated leader where each process has an input $x_p \in \mathbb{N}$, there exists a deterministic, quiescently terminating algorithm that computes $x_{\min} = \min_p \{x_p\}$ using $O(n |x_{\min}|)$ pulses.

If we want to compute the set of inputs, we can iteratively apply the algorithm of Theorem 5 in order to discover all the initial inputs in increasing order. If we want to compute the multiset $\{\{x_p\}\}$ of values, i.e., the set of initial values together with their multiplicities, we show in Section 9.2 that we can do so by iteratively applying the algorithm of Theorem 5 and the counting algorithm of Theorem 4 or Corollary 2, proving the following theorem.

▶ Theorem 6.
In a ring of $n$ processes, with distinct $O(\log n)$-bit IDs and a designated leader, where each process has an input $x_p \in \mathbb{N}$, there exists a deterministic, quiescently terminating algorithm that computes the multiset $\{\{x_p\}\}$ of inputs using $O(nD(B + \log^2 n))$ pulses, where $B = \max_p \{|x_p|\}$ and $D$ is the number of different inputs.

9.1 Minimum finding

In this section we prove Theorem 5. Suppose that each process $p_j$ starts with an input $x_j$. We now discuss a leader-based algorithm to find the minimum input $x_{\min}$ with message complexity $O(n \cdot B)$. This procedure will then be used to compute the multiset of the inputs.

We first compute the length $B = |x_{\min}|$ of the minimum input $x_{\min}$ using ComputeOR. Starting with $B = 0$, we increment $B$ until there is an active process with an input of length $B$, and then we keep as active the processes that have an input of minimum length. This can be done using $B$ iterations of ComputeOR that use $O(nB)$ pulses in total (see footnote 2). From now on, we assume that every input is represented as a bit string of equal length $B$.

Footnote 2. Using binary search, one can even compute the length of the minimum input with $O(n \log B)$ pulses, but since the second phase of the algorithm uses $O(nB)$ pulses, this does not improve the asymptotic complexity of the algorithm.

The minimum is discovered bit by bit by comparing the array bits, which contains the bit encoding of the process's input, with the MSB at position 0 and the LSB at the last position. The procedure is in Algorithm 10. Each process maintains a local boolean variable active and a local array minBits. Intuitively, minBits stores the prefix of the minimum input that has been discovered so far, and a process is active exactly when its own input shares that prefix; only active processes can still be candidates for the global minimum.
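The prefix-filtering idea can be illustrated with a centralized sketch. It is not the distributed procedure itself: `global_or` is a hypothetical stand-in for one call to ComputeOR, the list `active` models the local flags of all processes at once, at least one process is assumed to be active, and the inputs are assumed to be already padded to a common length.

```python
def to_bits(x, width):
    """Bits of x, MSB first, padded to the given width."""
    return [(x >> (width - 1 - k)) & 1 for k in range(width)]


def global_or(flags):
    """Hypothetical stand-in for ComputeOR: the OR of all local flags."""
    return any(flags)


def min_finding(all_bits, active):
    """Return (minimum value among the active processes, final active flags)."""
    active = list(active)
    min_bits = []
    for phase in range(len(all_bits[0])):
        # Local flag of each process: "I am active and my current bit is 0".
        has_zero = [a and bits[phase] == 0 for a, bits in zip(active, all_bits)]
        if global_or(has_zero):
            active = has_zero            # active processes with bit 1 drop out
            min_bits.append(0)
        else:
            min_bits.append(1)           # every active process has bit 1 here
    value = 0
    for b in min_bits:                   # convert the MSB-first bit list to an integer
        value = 2 * value + b
    return value, active
```

As a usage example, with inputs 6, 3, 5, 3 encoded on three bits and all processes initially active, `min_finding([to_bits(x, 3) for x in (6, 3, 5, 3)], [True] * 4)` returns the value 3 together with flags that are true exactly at the two processes holding 3.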
Initially, the active flag records whether the process participates in the search (the designated starting active set). When we want to find the global minimum, all processes are active. In phase $i$, a process $p$ reads its local bit bit = bits[$i$]. To determine the minimum bit at this position, each process computes hasZero = (active ∧ (bit = 0)), and all processes invoke ComputeOR on this value. If the result existsZero is true, then there exists at least one active process with bit 0, so the minimum bit at position $i$ is 0. In this case, every active process with bit 1 sets active ← false, and all processes append 0 to minBits. If instead existsZero is false, then no active process has bit 0, and all active bits must be 1; the minimum bit is 1, the active set remains unchanged, and all processes append 1 to minBits. After this step, the set of active processes is precisely the subset of participants whose input agrees with the newly extended prefix minBits.

After len(bits) phases, the array minBits records the entire bit string of the global minimum input, and each process converts it into an integer and outputs it as $x_{\min}$. We omit the discussion of the correctness of the algorithm, as it is immediate from the correctness and composability of the ComputeOR procedure. The total cost of the procedure is $6nB$ pulses, since it calls ComputeOR $2B$ times and each call uses $3n$ pulses (Theorem 21).

Algorithm 10 MinFinding(bits, active): computes the minimum input among the processes that start with active = true; at the end the active flag remains true exactly at those minima. The MSB of bits is in position 0 and the LSB is in the last position.
    B ← 0
    repeat
        B ← B + 1
    until ComputeOR(active ∧ (B ≥ len(bits)))
    active ← active ∧ (len(bits) = B)
    phase ← 0
    minBits ← [ ]
    while phase < len(bits) do
        bit ← bits[phase]
        hasZero ← (active ∧ (bit = 0))
        existsZero ← ComputeOR(hasZero)
        if existsZero then
            if active ∧ (bit = 1) then
                active ← false
            minBits.append(0)
        else  // no active bit 0: all active bits are 1
            minBits.append(1)
        phase ← phase + 1
    minInput ← convertBitsToInt(minBits)
    return minInput

9.2 Multiset computation

In this section we prove Theorem 6. Suppose that each process $p_j$ starts with an input $x_j \in \mathit{Input}$. Our goal now is to compute the multiset of inputs. We assume the existence of a leader $p_\ell$ and work in the ID-equipped setting of Section 8 so that we can invoke its counting procedure. We maintain a local boolean flag eligible, initially set to true at every process, indicating that its input has not yet been accounted for. As long as ComputeOR(eligible) returns true, all processes execute MinFinding(bits, eligible) (Algorithm 10, described in the previous section). This algorithm identifies the current minimum value $x_{\min}$ among eligible processes and sets active = true at exactly the eligible processes with input $x_{\min}$. Next, these active processes execute the counting algorithm of Section 8 on the virtual ring induced by the active set (all other processes act as relays), and all processes learn the multiplicity $\#(x_{\min})$. Finally, each process updates eligible ← eligible ∧ ¬active, removing all occurrences of $x_{\min}$ from the eligible set, and the loop continues. The procedure above is correct, as all the algorithms composing it have a composable ending.

Complexity: Let $B$ be the input bit length, let $D$ be the number of distinct input values, and let $k_1, \ldots, k_D$ be their multiplicities (so $\sum_r k_r = n$). Each MinFinding invocation costs at most $O(nB)$ pulses, and it is executed once per distinct value, so the total cost of minimum finding is $O(nBD)$. For a value of multiplicity $k_r$, the counting step uses the algorithm of Section 8 to compute the sum of the local indicator bits active, which costs $O(n \log^2 n)$ pulses. Hence the total counting cost is $O(nD \log^2 n)$. Overall, the multiset computation uses $O(nBD + nD \log^2 n) = O(nD(B + \log^2 n))$ pulses.

10 Lower bound

In this section we prove that when processes do not know any upper bound on the ring size, any algorithm that solves counting must send $\Omega(n \log n)$ messages in some execution, even if processes have distinct identifiers and a leader process $p_\ell$ is present.

Interface traces of intervals. For an interval (contiguous subpath) $I = (p_a, p_{a+1}, \ldots, p_b)$ we denote its two boundary edges by $e_L(I) = \{p_{a-1}, p_a\}$ and $e_R(I) = \{p_b, p_{b+1}\}$. Fix an execution $E$ and fix its global interleaving order of atomic events (a total order chosen by the adversary). Let $\Sigma = \{L_{in}, L_{out}, R_{in}, R_{out}\}$. Consider the subsequence $(d_1, \ldots, d_m)$ of all delivery events in $E$ that occur on the two boundary edges $e_L(I)$ and $e_R(I)$, listed in that global order. Define a mapping $\pi_I$ from these delivery events to $\Sigma$ by

$$\pi_I(d) = \begin{cases} L_{in} & \text{if } d \text{ is a delivery at } p_a \text{ of a pulse that arrived from } p_{a-1}, \\ L_{out} & \text{if } d \text{ is a delivery at } p_{a-1} \text{ of a pulse that arrived from } p_a, \\ R_{in} & \text{if } d \text{ is a delivery at } p_b \text{ of a pulse that arrived from } p_{b+1}, \\ R_{out} & \text{if } d \text{ is a delivery at } p_{b+1} \text{ of a pulse that arrived from } p_b. \end{cases}$$

The interface trace of $I$ in $E$ is the word $\mathrm{trace}_E(I) := \pi_I(d_1)\,\pi_I(d_2) \cdots \pi_I(d_m) \in \Sigma^*$. Let $|\mathrm{trace}_E(I)|$ denote its length. If $M(e)$ is the number of pulses that traverse edge $e$, then

$$|\mathrm{trace}_E(I)| = M(e_L(I)) + M(e_R(I)). \tag{1}$$

Two intervals are trace-equivalent if their interface traces are identical words.

▶ Observation 27 (Counting traces). For $k \geq 0$, the number of possible words over $\Sigma = \{L_{in}, L_{out}, R_{in}, R_{out}\}$ of length at most $k$ is $\sum_{i=0}^{k} 4^i \leq 4^{k+1}$.

▶ Lemma 28 (Splicing with a shared boundary). Let $E$ be an execution of an algorithm $A$ on an oriented ring $R$. Let $I_1 \subset I_2$ be intervals that share their left boundary, $e_L(I_1) = e_L(I_2) =: e_L$, and assume the leader $p_\ell$ is not in $I_2$. If $I_1$ and $I_2$ are trace-equivalent, that is, $\mathrm{trace}_E(I_1) = \mathrm{trace}_E(I_2)$, then in the ring $R'$ obtained by replacing $I_2$ with $I_1$ (preserving orientation and wiring to the same two external neighbors) there exists an execution $E'$ in which every process outside $I_2$ has exactly the same local history as in $E$. In particular, the leader has the same history and output in $E$ and $E'$.

Proof. Fix the global interleaving (total order) of events in $E$. Let $O = (o_0, o_1, \ldots, o_f)$ be the subsequence of all events that are not boundary deliveries on the two boundary edges of $I_2$ (so $O$ includes all sends and all deliveries away from those two edges). For each boundary delivery $\delta$ on the boundaries of $I_2$, let $i(\delta) \in \{-1, 0, \ldots, f\}$ be the unique index such that $\delta$ occurs strictly after $o_{i(\delta)}$ (if $i(\delta) > -1$) and before $o_{i(\delta)+1}$ (if $i(\delta) < f$) in $E$. Multiple boundary deliveries may share the same index; their internal order is the one from $E$. Since $\mathrm{trace}_E(I_1) = \mathrm{trace}_E(I_2)$, the ordered list of left-boundary labels (those in $\{L_{in}, L_{out}\}$) and the ordered list of right-boundary labels (those in $\{R_{in}, R_{out}\}$) are identical for $I_1$ and $I_2$. We first describe the sequence of events in $E'$ and then show that this sequence is realizable.
Execution $E'$ on $R'$: (i) replay the entire outside sequence $O$ verbatim; (ii) for every boundary delivery $\delta$ on $I_2$ in $E$, insert in $E'$ a boundary delivery $\delta'$ placed in the same gap after $o_{i(\delta)}$ and before $o_{i(\delta)+1}$, with the same label $\pi_{I_2}(\delta) \in \{L_{in}, L_{out}, R_{in}, R_{out}\}$. If the label is in $\{L_{in}, L_{out}\}$, perform $\delta'$ on the same physical edge $e_L$; if it is in $\{R_{in}, R_{out}\}$, perform $\delta'$ on the right boundary of $I_1$ in $R'$ (which has the same external right neighbor as $I_2$ had).

Realizability of $E'$: The proof proceeds by a simple inductive argument. All events in $O$ can be replayed without problem until the first boundary event $\delta'$. If $\delta'$ is incoming to $I_1$ ($L_{in}$ or $R_{in}$), the sender is the external neighbor; since $O$ is unchanged, that neighbor issues the corresponding send in $E'$ as in $E$, and we choose a finite delay so that the delivery occurs at the prescribed position in $O$ (more precisely, between $o_{i(\delta)}$ and $o_{i(\delta)+1}$). If $\delta'$ is outgoing from $I_1$ ($L_{out}$ or $R_{out}$), in $E$ the processes of $I_1$ executed some internal schedule producing exactly this boundary word; by the inductive hypothesis we can replicate that internal schedule inside $I_1$ in $E'$ and place the corresponding send before the prescribed delivery, choosing a finite delay so that it arrives at the prescribed position in $O$ (more precisely, between $o_{i(\delta)}$ and $o_{i(\delta)+1}$). Pulses carry no content and delays are finite but otherwise unconstrained, so this is always feasible irrespective of the IDs of the processes in $I_1$. Because $O$ is replayed verbatim and each external neighbor sees the same ordered boundary deliveries at the same places among outside events, every process outside $I_2$, in particular the leader, has the same local history in $E'$ as in $E$. ◀

We now prove the lower bound.

▶ Theorem 3 (Lower bound).
In oriented rings with content-oblivious messages, where processes have unique IDs and there is a pre-selected leader, any algorithm that correctly solves counting when processes do not know $n$ must have an execution with message complexity $\Omega(n \log n)$.

Proof. Assume for contradiction that there exist a correct algorithm $A$ and an infinite set of sizes for which the worst-case message complexity is $o(n \log n)$. Take one such $n > 100$, specifically one where on a ring $R$ of size $n$ there is an execution $E$ of $A$ with total message complexity $M$ such that

$$4^{6\frac{M}{n} + 1} < n/4 - 1.$$

Since the message complexity of $A$ is $o(n \log n)$, such a situation exists for a sufficiently large $n$. The proof now proceeds by showing that such an execution $E$ makes the leader output a wrong count. Recall that $M(e)$ is the number of pulses that traverse edge $e$ (in both directions) in execution $E$, so $M = \sum_{e \in \mathit{Edges}(R)} M(e)$, where $\mathit{Edges}(R)$ is the set of ring edges and $|\mathit{Edges}(R)| = n$. Let $m := M/n$.

We now show that we can choose a sparse edge at distance at least $n/2 + 1$ from the leader. By averaging and the fact that each $M(e)$ is non-negative, at least $n/2$ edges satisfy $M(e) \leq 2m$; we call such edges sparse edges. Among those, pick an edge $e_l = \{u, v\}$ with both $u \neq \ell$ and $v \neq \ell$ that maximizes the distance from one of its endpoints to the leader $p_\ell$; let $v$ be that endpoint, and let $d$ be the number of edges encountered when moving on the maximum-distance arc from $v$ up to $p_\ell$. Without loss of generality we assume that this arc is clockwise; otherwise the proof is the same with the arc taken counter-clockwise. Since at least $n/2 > 50 \geq 5$ edges are sparse and, among any five edges, at least one has an endpoint $v$ whose longer arc to $p_\ell$ has length at least $n/2 + 1$, we have $d \geq n/2 + 1$. Recall that we have $M(e_l) \leq 2m$.
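The averaging step used above (at least $n/2$ edges carry at most $2m$ pulses, where $m = M/n$) can be checked numerically on concrete per-edge pulse counts. The following is a minimal sketch, not part of the paper's algorithm: the function names `sparse_edges` and `check_averaging` and the list-based representation of the ring edges are assumptions made for illustration.

```python
from typing import List


def sparse_edges(pulses_per_edge: List[int]) -> List[int]:
    """Return the indices of edges e with M(e) <= 2m, where m = M / n.

    pulses_per_edge[e] plays the role of M(e); n is the number of ring edges
    and M is the total number of pulses.
    """
    n = len(pulses_per_edge)
    total = sum(pulses_per_edge)
    # M(e) <= 2 * M / n is tested as M(e) * n <= 2 * M to stay in integers.
    return [e for e, count in enumerate(pulses_per_edge) if count * n <= 2 * total]


def check_averaging(pulses_per_edge: List[int]) -> bool:
    """Check the claim that at least n/2 edges are sparse."""
    return 2 * len(sparse_edges(pulses_per_edge)) >= len(pulses_per_edge)
```

The claim holds for any non-negative counts: if more than $n/2$ edges each carried more than $2m$ pulses, their total would exceed $M$.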
We now show that there are at least $n/4$ intervals with sparse interface traces. For each $t = 0, 1, \ldots, d-1$, define the clockwise interval $I_t := (v, v+1, \ldots, v+t)$, which does not contain $p_\ell$. Then $e_L(I_t) = e_l$, while $e_R(I_t) = \{v+t, v+t+1\}$. The edges $e_R(I_t)$ are distinct for different $t$ and lie on the clockwise arc from $v$ to $\ell$. Thus we have

$$\sum_{t=0}^{d-1} M(e_R(I_t)) \leq M.$$

Averaging over all $d$ edges, the average number of messages exchanged on these edges is

$$\frac{1}{d} \sum_{t=0}^{d-1} M(e_R(I_t)) \leq \frac{M}{d} \leq \frac{M}{n/2} = 2m.$$

Hence, using the same averaging argument as above, at least $d/2 \geq n/4$ indices $t$ satisfy $M(e_R(I_t)) \leq 4m$. For each such $t$,

$$|\mathrm{trace}_E(I_t)| \leq M(e_L(I_t)) + M(e_R(I_t)) \leq 2m + 4m = 6m.$$

Let $T := \{t \in \{1, \ldots, d-1\} : M(e_R(I_t)) \leq 4m\}$. We have $|T| \geq n/4 - 1$, and the family $\{I_t : t \in T\}$ is totally ordered by inclusion. By Observation 27, the number of distinct traces over $\Sigma$ of length at most $6m$ is at most $4^{6m+1}$, and by our choice of the ring size and execution we have $4^{6m+1} < n/4 - 1$. Hence, by the pigeonhole principle, there exist $t_1 < t_2$ in $T$ such that $\mathrm{trace}_E(I_{t_1}) = \mathrm{trace}_E(I_{t_2})$.

Set $I_1 := I_{t_1}$ and $I_2 := I_{t_2}$; then $I_1 \subset I_2$, they share the left boundary, and neither contains $p_\ell$, so we can apply Lemma 28. Let $R'$ be the ring obtained by replacing $I_2$ with $I_1$ (preserving orientation), and let $n' := |R'| < n$. By Lemma 28, there is an execution $E'$ of $A$ on $R'$ in which every process outside $I_2$, in particular the leader $\ell$, has the same local history as in $E$, and therefore the same output.

Since processes do not know $n$, correctness of counting requires that, on a ring of size $n$, the leader eventually outputs $n$, and on a ring of size $n'$, the leader eventually outputs $n'$.
But in $E$ and $E'$ the leader's local history is identical, so its output is the same in both executions. This cannot equal both $n$ and $n'$, a contradiction. ◀

11 Conclusions and open problems

In this work, we demonstrated that content-oblivious communication is far more powerful than previously understood. We showed that message-passing algorithms on rings, and more generally on 2-edge-connected networks, can be simulated in the content-oblivious setting with only a constant multiplicative overhead. This constitutes a substantial improvement over the previously known simulator of Censor-Hillel et al. [6], which incurs a large polynomial multiplicative overhead.

On rings, our simulator operates with no preprocessing beyond the existence of a pre-selected leader. For general 2-edge-connected networks, the simulation relies on a preprocessing phase that requires sending a polynomial number of pulses. While our results establish that such preprocessing suffices to enable constant-overhead simulation, an important open question is whether the cost of this preprocessing can be significantly reduced.

A central application of our simulator is a deterministic $O(n \log^2 n)$-message counting algorithm for rings with unique $O(\log n)$-bit IDs and a leader, which nearly matches our $\Omega(n \log n)$ lower bound. Closing the gap between these bounds remains an intriguing open problem.

The picture is far less complete in the anonymous setting, where the power and limitations of content-oblivious computation remain largely unknown. Here, despite the absence of IDs and the inherent difficulty of symmetry breaking, we showed that counting is possible using $O(n^{1.5})$ pulses in the presence of a leader. Our algorithm simultaneously solves the naming problem by assigning unique $O(\log n)$-bit IDs to all processes.
Is it possible to obtain substantially more efficient solutions for counting and naming in anonymous content-oblivious rings?

References

[1] N. Alon and M. Tarsi. Covering multigraphs by simple circuits. SIAM Journal on Algebraic Discrete Methods, 6(3):345–350, 1985.
[2] D. Angluin. Local and global properties in networks of processors (extended abstract). In STOC, pages 82–93, 1980.
[3] S. Bitton, Y. Emek, T. Izumi, and S. Kutten. Message reduction in the local model is a free lunch. In PODC, pages 300–302, 2019.
[4] A. Casteigts, Y. Métivier, J. M. Robson, and A. Zemmari. Counting in one-hop beeping networks. Theoretical Computer Science, 780:20–28, 2019.
[5] A. Casteigts, Y. Métivier, J. M. Robson, and A. Zemmari. Design patterns in beeping algorithms: Examples, emulation, and analysis. Information and Computation, 264:32–51, 2019.
[6] K. Censor-Hillel, S. Cohen, R. Gelles, and G. Sela. Distributed computations in fully-defective networks. Distributed Computing, 36(4):501–528, 2023.
[7] J. Chalopin, Y.-J. Chang, L. Chen, G. A. Di Luna, and H. Zhou. Brief Announcement: Non-Uniform Content-Oblivious Leader Election on Oriented Asynchronous Rings. In DISC, pages 51:1–51:7, 2025.
[8] J. Chalopin, Y.-J. Chang, L. Chen, G. A. Di Luna, and H. Zhou. Content-Oblivious Leader Election in 2-Edge-Connected Networks. In DISC, pages 21:1–21:22, 2025.
[9] Y.-J. Chang, L. Chen, and H. Zhou. Beyond 2-Edge-Connectivity: Algorithms and Impossibility for Content-Oblivious Leader Election. In ITCS, pages 36:1–36:23, 2026.
[10] R. Cole and U. Vishkin. Deterministic coin tossing with applications to optimal parallel list ranking. Information and Control, 70(1):32–53, 1986.
[11] A. Cornejo and F. Kuhn. Deploying wireless networks with beeps. In DISC, pages 148–162, 2010.
[12] G. A. Di Luna and G. Viglietta. Computing in anonymous dynamic networks is linear. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 1122–1133, Denver, CO, USA, 2022. IEEE. doi:10.1109/FOCS54457.2022.00108.
[13] G. A. Di Luna and G. Viglietta. Efficient computation in congested anonymous dynamic networks. Distributed Computing, 38:95–112, 2025. doi:10.1007/s00446-025-00481-z.
[14] F. Dufoulon, S. Pai, G. Pandurangan, S. V. Pemmaraju, and P. Robinson. The Message Complexity of Distributed Graph Optimization. In ITCS, pages 41:1–41:26, 2024.
[15] F. Dufoulon, G. Pandurangan, P. Robinson, and M. Scquizzato. The Singular Optimality of Distributed Computation in LOCAL. In OPODIS, pages 26:1–26:17, 2025.
[16] M. Elkin. A simple deterministic distributed MST algorithm with near-optimal time and message complexities. Journal of the ACM, 67(2):1–15, 2020.
[17] F. Frei, R. Gelles, A. Ghazy, and A. Nolin. Content-oblivious leader election on rings. In DISC, pages 26:1–26:20, 2024.
[18] M. Ghaffari and F. Kuhn. Distributed MST and Broadcast with Fewer Messages, and Faster Gossiping. In DISC, pages 30:1–30:12, 2018.
[19] R. Gmyr and G. Pandurangan. Time-Message Trade-Offs in Distributed Algorithms. In DISC, pages 32:1–32:18, 2018.
[20] B. Haeupler, D. E. Hershkowitz, and D. Wajc. Round- and message-optimal distributed graph algorithms. In PODC, pages 119–128, 2018.
[21] V. King, S. Kutten, and M. Thorup. Construction and impromptu repair of an MST in a distributed network with o(m) communication. In PODC, pages 71–80, 2015.
[22] N. Linial. Locality in distributed graph algorithms. SIAM Journal on Computing, 21(1):193–201, 1992.
[23] E. Le Merrer, A.-M. Kermarrec, and L. Massoulié. Peer to peer size estimation in large and dynamic networks: A comparative study. In Proceedings of the 15th IEEE International Symposium on High Performance Distributed Computing (HPDC-15), Paris, France, June 19–23, 2006, pages 7–17. IEEE, 2006. doi:10.1109/HPDC.2006.1652131.
[24] G. Pandurangan, P. Robinson, and M. Scquizzato. A time- and message-optimal distributed algorithm for minimum spanning trees. ACM Transactions on Algorithms, 16(1):1–27, 2019.
[25] D. Peleg. Distributed computing: a locality-sensitive approach. SIAM, 2000.
[26] R. Vacus and I. Ziccardi. Minimalist leader election under weak communication. In PODC, 2025.
[27] R. Xu, C. Gong, and Z. Xu. Pulse-laser based long-range NLOS ultraviolet communication: Pulse response position estimation and frame synchronization optimization. In ICCC, pages 163–168, 2019. doi:10.1109/ICCChina.2019.8855803.
