A Framework for Exploring Social Interactions in Multiagent Decision-Making for Two-Queue Systems

Mallory E. Gaspard, Naomi Ehrich Leonard

The work in this paper was supported by a gift from William H. Miller III. Research was sponsored by the Army Research Office and was accomplished under Grant Number W911NF2410126. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

M. Gaspard is with the Dept. of Mechanical and Aerospace Engineering and the Dept. of Ecology and Evolutionary Biology at Princeton University, Princeton, NJ, 08544 USA; mallory.gaspard@princeton.edu. N. E. Leonard is with the Dept. of Mechanical and Aerospace Engineering at Princeton University, Princeton, NJ, 08544 USA; naomi@princeton.edu.

Abstract: We introduce a new framework for multiagent decision-making in queueing systems that leverages the agility and robustness of nonlinear opinion dynamics to break indecision during queue selection and to capture the influence of social interactions on collective behavior. Queueing models are central to understanding multiagent behavior in service settings. Many prior models assume that each agent's decision-making process is optimization-based and governed by rational responses to changes in the queueing system. Instead, we introduce an internal opinion state, driven by nonlinear opinion dynamics, that represents the evolving strength of the agent's preference between two available queues. The opinion state is influenced by social interactions, which can modify purely rational responses. We propose a new subclass of queueing models in which each agent's behavioral decisions (e.g., joining or switching queues) are determined by this evolving opinion state. We prove a sufficient parameter condition that guarantees the Markov chain describing the evolving opinion and queueing system states reaches the Nash equilibrium of an underlying congestion game in finite expected time. We then explore the richness of the new framework through numerical simulations that illustrate the role of social interactions and an individual's access to system information in shaping collective behavior.

I. INTRODUCTION

From airport security screening lanes [1] to task allocation in distributed computing [2], queueing systems are ubiquitous in everyday life. Mathematical models of these systems are crucial tools for understanding an individual's decision-making process in service settings, as well as for gaining insight into how these individual choices shape the collective behavior.

Decision-making in queueing systems typically involves agents choosing actions (e.g., joining a queue, switching queues, or leaving altogether) [3] that lead them to one of several mutually exclusive physical states. Their choices are often motivated by a cost or reward associated with being in particular locations within the system. For example, an agent may decide to switch to a different queue if it is shorter or if another queue seems to have faster service [4]. Classical queueing frameworks typically assume that agents are rational in deciding their actions. Foundational models such as those presented in [5]–[8] derive threshold-based decision policies from individual optimizations of each agent's expected reward and waiting cost. Other common approaches such as those surveyed in [9] capture strategic decision-making in a non-cooperative game formulation and characterize desired behavior through Nash equilibria. However, psychological studies have demonstrated that humans do not respond rationally to changes in a queueing system [10], [11].
Feelings of stress, unfair treatment, and post-decision discomfort can lead to irrational decision-making by prompting unnecessary switching [12], driving waiting time misperception [13], and supporting biased attachment to a particular option [14].

Recent approaches presented in the service systems literature aim to relax the perfect rationality assumption and incorporate more realistic features of human decision-making. These approaches include models based on bounded rationality [15], as well as models that specifically account for emotional influences in an individual's decision-making process, consistent with observations from behavioral economics [16]. For example, [17] uses bounded rationality to capture imperfections in the agent's reasoning about the expected waiting time, and [18] uses an SIR model to illustrate how emotions such as patience, urgency, and friendliness spread at the population level and consequently influence agents' action choices.

Although useful, these approaches often aggregate the environmental and social factors that impact the decision-making process, and thus have limited capacity to describe how each agent responds individually and dynamically to these changing factors when choosing actions. To address this, we formulate a queueing framework in which each agent's decision-making process is governed by a continuously evolving internal state modeled by a nonlinear dynamical system. Using the nonlinear opinion dynamics (NOD) framework developed in [19] for fast and flexible multiagent decision-making in dynamic environments (see e.g., [20], [21]), we model the agent's internal state as an opinion that reflects their level of preference for one or the other queue in a two-queue system. In contrast to existing frameworks, our model makes no rationality or optimization-based assumptions about each agent's decision-making process.
Instead, any rational actions that arise are emergent phenomena resulting from the underlying opinion dynamics. Although this approach remains largely underexplored in the existing literature, it shows promise for modeling finer-grained social and environmental influences on each agent's decision-making process and permits a wider set of collective behaviors than its counterparts. Moreover, the NOD dynamical system has a bifurcation structure that supports multistability and hysteresis [19] and provides a mechanism for neutrality breaking in settings where an agent might find the available queues to be equally desirable choices.

Our primary contribution is the introduction of an agent-based queueing model (ABQM) in which the probability of an agent successfully executing a maneuver to one of two available queues is directly informed by their evolving opinion state (Section II). Given the range of possible collective behaviors that can arise from the NOD-driven maneuver dynamics, we discuss the interpretation of queue selection as a congestion game and highlight how Nash-equilibrium queue membership configurations can emerge from this dynamic process (Section III). We cast the ABQM as a Markov chain and use standard martingale and probability arguments to prove a sufficient parameter condition that permits the system to reach a Nash equilibrium in finite expected time (Section IV). Although our condition is restrictive, we illustrate through numerical simulation the existence of a wider parameter regime that leads to the same results, and we highlight scenarios in which the ABQM behaves consistently with a stochastic learning process in congestion games (Section V). We briefly conclude and mention future directions in Section VI.

II. OPINION-DRIVEN SELECTION BETWEEN TWO QUEUES

Consider a queueing system with two single-server queues ($A$ and $B$) and a waiting pool $W$ for agents who have not yet joined a queue.
The system starts with $N$ agents, and no new agents are allowed to enter the system. Once an agent joins a queue, they are allowed to switch between queues until they are served. We assume that each server operates according to a Poisson point process with rate $\mu_q$, $q \in \{A, B\}$, and serves an agent chosen uniformly at random when a service event occurs. Agents leave the system completely after service.

Notation: We denote vectors and matrices in bold. Agent $i$'s physical state at time $t$ is $\ell_i(t) \in \{0, \pm 1\}$, where $0$, $+1$, and $-1$ index $W$, $A$, and $B$, respectively. Agent $i$'s internal state is represented by the opinion $z_i(t) \in [-1, 1]$. The physical and internal states of the $N$-agent group at time $t$ are encoded in the vectors $\boldsymbol{\ell}(t) := (\ell_1(t), \dots, \ell_N(t)) \in \{0, \pm 1\}^N$ and $\mathbf{z}(t) := (z_1(t), \dots, z_N(t)) \in [-1, 1]^N$, respectively. The social connections among the agents are represented by an $N$-node network described by the adjacency matrix $\mathbf{A} := [a_{ij}] \in \mathbb{R}^{N \times N}$. The magnitude of $a_{ij}$ reflects the strength of the influence that agent $j$ has on the internal state of agent $i$, and $\mathrm{sgn}(a_{ij})$ indicates whether this influence is attractive ($+$) or repulsive ($-$). For simplicity, we restrict $a_{ij} \in \{0, \pm 1\}$. The matrix infinity norm is defined as $\|\mathbf{A}\|_\infty := \max_i \sum_j |a_{ij}|$. Lastly, for $a, b \in \mathbb{R}$, the statement $a \wedge b$ is equivalent to $\min(a, b)$.

A. Nonlinear Opinion Dynamics

We use nonlinear opinion dynamics (NOD) [19] to model the continuous evolution of each agent's internal state. Here, the opinion quantifies the agent's preference for one of two mutually exclusive options ($A$ or $B$) and is influenced by self-reinforcement, social connections, and the changing environment.
Following the setup for the NOD system presented in [19], agent $i$'s opinion evolves according to the ordinary differential equation

$$\dot{z}_i = -\lambda_i z_i + \tanh\Big( \omega_i u_i z_i + \alpha_i \sum_{j=1,\, j \neq i}^{N} a_{ij} \hat{z}_j - \gamma_i b(\cdot) \Big) \tag{1}$$

where $u_i = u_{0i} + K z_i^2$ represents the attention that agent $i$ pays to its own internal state, $u_{0i} \in \mathbb{R}$ is agent $i$'s basal attention, and $K \geq 0$ is the gain. $\hat{z}_j$ is agent $i$'s estimate of agent $j$'s opinion. In practice, $\hat{z}_j$ can be estimated by rapid analysis of sensory signals such as body orientation, gaze, and hand gestures [22], [23]. The parameter $\lambda_i \in \mathbb{R}_{0,+}$ is a damping coefficient, $\omega_i \in \mathbb{R}_{0,+}$ is a self-reinforcement weight, $\gamma_i$ weighs the importance of environmental information, and $\alpha_i \in \mathbb{R}_{0,+}$ is the social influence weight. The function $b(\cdot)$ captures the time-varying environmental input that agent $i$ receives from observations about the queueing system state (e.g., queue imbalance, wait time difference, or a combination of other metrics). In the ABQM, agents make decisions that affect queue states at discrete times, so Equation (1) is formally a hybrid dynamical system in which the opinions evolve in continuous time, while the environmental input $b(\cdot)$ is piecewise constant between the decision epochs.

When $b(\cdot) = 0$ and the basal attention is below a critical value, the neutral state ($\mathbf{z} = \mathbf{0}$) is a stable equilibrium of Equation (1). As shown in [19], this equilibrium can be destabilized when the basal attention exceeds a critical threshold $u_0^*$ or the input $b(\cdot)$ is large enough, i.e., greater than a $u_0$-dependent implicit threshold. Destabilization then induces opinion formation through a pitchfork bifurcation, which also serves as a neutrality breaking mechanism when agents regard all options as equally preferable.
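The behavior of Equation (1) around the neutral state can be explored numerically. The following is a minimal sketch, assuming homogeneous parameters, perfect opinion estimates ($\hat{z}_j = z_j$), an input $b$ held constant over the integration window, and a simple forward-Euler step. The function names `nod_rhs` and `integrate_nod` and all parameter values are illustrative assumptions, not code or settings taken from [19].

```python
import math


def nod_rhs(z, A, b, lam, omega, alpha, gamma, u0, K):
    """Right-hand side of Eq. (1) for all agents.

    Assumes homogeneous parameters and perfect estimates z_hat_j = z_j.
    """
    n = len(z)
    dz = []
    for i in range(n):
        u_i = u0 + K * z[i] ** 2  # attention u_i = u_0 + K z_i^2
        social = sum(A[i][j] * z[j] for j in range(n) if j != i)
        arg = omega * u_i * z[i] + alpha * social - gamma * b
        dz.append(-lam * z[i] + math.tanh(arg))
    return dz


def integrate_nod(z0, A, b, T, dt, **params):
    """Forward-Euler integration of Eq. (1) on [0, T] with b held constant."""
    z = list(z0)
    for _ in range(int(round(T / dt))):
        dz = nod_rhs(z, A, b, **params)
        z = [zi + dt * dzi for zi, dzi in zip(z, dz)]
    return z


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    base = dict(lam=1.0, omega=1.0, alpha=0.2, gamma=0.5, K=0.25)
    # One isolated agent with no input: basal attention below, then above, lam / omega.
    print(integrate_nod([0.05], [[0]], 0.0, 30.0, 0.01, u0=0.5, **base))
    print(integrate_nod([0.05], [[0]], 0.0, 30.0, 0.01, u0=1.25, **base))
    # Two anti-cooperative agents (a_12 = a_21 = -1).
    A_anti = [[0, -1], [-1, 0]]
    print(integrate_nod([0.05, -0.02], A_anti, 0.0, 30.0, 0.01, u0=1.25, **base))
```

Under these assumed values, the first run should decay toward the neutral state, the second should settle at a nonzero opinion, and the anti-cooperative pair should polarize to opinions of opposite sign. A higher-order integrator could replace the Euler step without changing the structure of the sketch.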
When the attention gain $K$ lies below a critical value $K^*$, the resulting bifurcation is a supercritical pitchfork with two stable branches corresponding to a strong preference for $A$ ($z_i > 0$) and a strong preference for $B$ ($z_i < 0$). When the population is homogeneous, closed-form expressions for $u_0^*$ and $K^*$ are available. Here, $u_{0i}^* = \frac{\lambda - \alpha \sigma(\mathbf{A})}{\omega}$, where $\sigma(\mathbf{A})$ is the largest eigenvalue of $\mathbf{A}$. A Lyapunov-Schmidt reduction further reveals that the bifurcation is a supercritical pitchfork when $K < K^* = \frac{\lambda^3}{3\omega}$.

B. Agent-Based Opinion-Driven Queueing Model

We model the queueing process as an agent-based model over a finite time horizon $[0, T]$. While each agent's opinion evolves continuously according to Equation (1), agents only select actions (e.g., join or switch) at discrete decision epochs, $k \in \{0, \dots, M\}$. Decision epochs have equal spacing $\Delta t_D$, and the time between epochs is a decision interval, $[t_k, t_{k+1})$. Let $z_i^k$ denote the opinion of agent $i$ at $t_k$. Similarly, let $\ell_i^k$ represent agent $i$'s queue membership at $t_k$, and $b^k$ be the input from the environment at $t_k$. The global state of the system at epoch $k$ is $(\mathbf{z}^k, \boldsymbol{\ell}^k)$.

The opinion state ODEs are continuously integrated over the decision interval $[t_k, t_{k+1}^-]$, where $t_{k+1}^-$ denotes the instant before the next epoch. The updated opinion value at the end of the interval is $z_i^{k+1}$. Actions are executed immediately before the start of the next decision interval. The outcomes are probabilistic, and the success probability is directly informed by $z_i^{k+1}$. At the end of the decision interval (just prior to $t_{k+1}$), every agent independently draws one value $U_i^k \sim \mathrm{Unif}(0, 1)$. If agent $i$ is in $W$ at step $k$, then they join a queue whenever

$$U_i^k \leq |z_i^{k+1}| \tag{2}$$

where $\mathrm{sgn}(z_i^{k+1})$ matches the index of the destination queue. If $i \in A^k \cup B^k$ (i.e., agent $i$ is already in a queue at epoch $k$), they switch queues whenever

$$U_i^k \leq |z_i^{k+1}| \, \mathbb{1}\{\mathrm{sgn}(z_i^{k+1}) \neq \ell_i^k\}. \tag{3}$$

The queue membership $\boldsymbol{\ell}^{k+1}$ is determined by the action outcomes, and $b^{k+1}$ is updated using the resulting membership counts. The process repeats through all decision intervals.

Since each agent's movement probability depends only on the current state $(\mathbf{z}^k, \boldsymbol{\ell}^k)$, and we use a time-explicit numerical integration scheme on the system defined in (1) to obtain $\mathbf{z}^{k+1}$, the joint state $\mathcal{X}_k := (\mathbf{z}^k, \boldsymbol{\ell}^k)$ evolves as a time-homogeneous Markov chain on the state space $\mathcal{S} := [-1, 1]^N \times \{0, \pm 1\}^N$. This representation allows us to use techniques from stochastic processes to analyze collective behavior. The model also has a meaningful game-theoretic interpretation. In particular, when $b^k$ captures the queue membership difference and $W$ is empty, $\boldsymbol{\ell}^k$ can be interpreted as a strategy profile in which each agent chooses a queue and incurs a cost based on the resulting queue membership counts. This naturally describes an underlying congestion game, and we aim to characterize parameter regimes and properties of the social network for which the NOD-driven, stochastic movement dynamics lead to emergent queue configurations corresponding to the Nash equilibria.

III. QUEUE SELECTION AND CONGESTION GAMES

We examine the connection between the NOD-driven ABQM and congestion games in the two-queue setting. Congestion games capture scenarios where agents must strategically choose from a set of available resources with usage-dependent costs [24]. In our setting, each queue is an available resource with membership-dependent cost. Rosenthal showed in [25] that congestion games are potential games [26] and thus admit at least one pure-strategy Nash equilibrium.

Definition 1 (Queue Selection Congestion Game). Consider $N$ agents choosing between two queues, $A$ and $B$. Let $n_A$ and $n_B$ be the number of agents in $A$ and $B$, respectively.
Denote the resource costs for $A$ and $B$ as $c_A(n_A)$ and $c_B(n_B)$, and assume they are nondecreasing functions of their arguments. Each agent selects a queue, and the resulting membership vector $\boldsymbol{\ell} \in \{-1, 1\}^N$ is a Nash equilibrium if no agent can unilaterally decrease their cost by switching queues.

When resource costs are the same across queues, it is straightforward to determine the Nash equilibrium analytically.

Proposition 1. Consider a congestion game where $N$ agents are choosing between two queues with identical resource costs, $c_A = c_B = c(\cdot)$. A queue membership configuration $(n_A, n_B)$ is a Nash equilibrium if and only if $|n_A - n_B| \leq 1$.

Proof. Suppose $|n_A - n_B| > 1$. Without loss of generality, assume $n_A > n_B + 1$, and consider an agent in $A$. If the agent in $A$ switches to $B$, then the queue membership counts become $n_A - 1$ and $n_B + 1$, and the agent incurs a cost of $c(n_B + 1)$. Since $n_A > n_B + 1$ and the resource costs are nondecreasing, it follows that $c(n_A) \geq c(n_B + 1)$, with strict inequality if $c(\cdot)$ is increasing. In that case the agent unilaterally decreases their cost by switching, so this configuration cannot be a Nash equilibrium.

Conversely, suppose $|n_A - n_B| \leq 1$, and consider an agent in $A$. If the agent switches to $B$, then the new membership counts become $n_A - 1$ and $n_B + 1$. Since $n_B + 1 \geq n_A$ and the costs are nondecreasing, it follows that $c(n_B + 1) \geq c(n_A)$. Thus, the agent cannot unilaterally decrease their cost by switching to $B$. The argument is symmetric for agents in $B$ switching to $A$, so the configuration $(n_A, n_B)$ is a Nash equilibrium.

Unlike standard congestion game models, our agents do not directly optimize the resource costs. When an agent in the more expensive queue is sensitive to environmental input and $b(\cdot)$ captures queue membership imbalances, the input bias may shift their internal state and increase their switching probability.
As such, an equilibrium queue configuration may emerge as a result of the NOD-driven decision-making dynamics. In this sense, our ABQM can be interpreted as a stochastic learning process whose dynamics may drive the system toward the Nash equilibrium of the underlying congestion game. In the next section, we establish a sufficient condition involving the NOD parameters and the social network structure that guarantees a finite expected Nash hitting time.

IV. SUFFICIENT CONDITION FOR HITTING THE SET OF NASH CONFIGURATIONS IN FINITE TIME

Determining when the joint-state Markov chain $\{\mathcal{X}_k\}_{k \geq 0}$ reaches a Nash configuration in finite expected time is crucial to understanding when our ABQM can be viewed as a stochastic learning mechanism and how systems of living agents may naturally settle in optimal configurations, even when they are not explicitly optimizing anything.

Let $\Delta Q := n_A - n_B$ denote the signed queue imbalance. At a Nash equilibrium, the magnitude of the imbalance is

$$\Delta Q^* := \begin{cases} 0, & \text{if } N \text{ is even} \\ 1, & \text{if } N \text{ is odd.} \end{cases}$$

Definition 2 (Nash Configuration Band). Consider the congestion game corresponding to selection between two identical queues. Let $n_A$, $n_B$, $n_W$ be the number of agents in $A$, $B$, and $W$, respectively. Denote the Markov chain state as $\mathcal{X}$. The band containing the Nash configurations for this game is

$$\mathcal{N} := \{\mathcal{X} \in \mathcal{S} : n_W = 0 \text{ and } |\Delta Q| \leq \Delta Q^*\}. \tag{4}$$

We assume that every agent has complete information about the system. They can perfectly compute $\hat{z}_j$ for all other agents, and they have exact knowledge of the signed queue imbalance $b^k = \Delta Q^k$ at each epoch. Outside of $\mathcal{N}$, the minimum queue imbalance magnitude is $b^\# := \min_{\mathcal{X}_k \notin \mathcal{N}} \{|b^k|\} = 2$.

Lemma 1 (Input Sign Alignment with the Cheaper Queue). Let $\sigma_k := -\mathrm{sgn}(b^k)$.
If

$$\gamma_i b^\# \geq \omega_i (u_{0i} + K) + \alpha_i \|\mathbf{A}\|_\infty + m \tag{5}$$

is satisfied for some $m > 0$ and all agents $i$, then for any state $\mathcal{X}_k \notin \mathcal{N}$,

$$\eta_i(t) := \sigma_k \Big( \omega_i u_i z_i + \alpha_i \sum_{j=1,\, j \neq i}^{N} a_{ij} \hat{z}_j - \gamma_i b^k \Big) \geq m$$

holds over $[t_k, t_{k+1})$. That is, the input to $\tanh(\cdot)$ in the opinion formation ODE is aligned with the cheaper queue.

Proof. Since $|z_i| \leq 1$, $u_i \leq (u_{0i} + K)$. Also, $|\hat{z}_j| \leq 1$ yields $\big| \sum_{j=1,\, j \neq i}^{N} a_{ij} \hat{z}_j \big| \leq \|\mathbf{A}\|_\infty$. By definition of $\sigma_k$, $\sigma_k(-\gamma_i b^k) = \gamma_i |b^k|$. So, $\eta_i(t)$ satisfies

$$\eta_i(t) \geq \gamma_i |b^k| - \omega_i (u_{0i} + K) - \alpha_i \|\mathbf{A}\|_\infty.$$

When $\mathcal{X}_k \notin \mathcal{N}$, $|b^k| \geq b^\#$, thus over $[t_k, t_{k+1})$,

$$\eta_i(t) \geq \gamma_i b^\# - \omega_i (u_{0i} + K) - \alpha_i \|\mathbf{A}\|_\infty \geq m.$$

Lemma 2 (Uniform Opinion Magnitude Bound). Assume the condition in Lemma 1 holds for all $i$ and some $m$. Recall $\Delta t_D := t_{k+1} - t_k$ and define $\beta_i := -e^{-\lambda_i \Delta t_D} + \frac{\tanh(m)}{\lambda_i} (1 - e^{-\lambda_i \Delta t_D})$. If $\Delta t_D$ satisfies

$$\Delta t_D > \max_i \Big\{ \frac{1}{\lambda_i} \ln\Big( \frac{\lambda_i + \tanh(m)}{\tanh(m)} \Big) \Big\} \tag{6}$$

then there exists a uniform constant $\beta > 0$ such that $\sigma_k z_i^{k+1} \geq \beta$ and thus $|z_i^{k+1}| \geq \beta$ for all $i$.

Proof. To avoid treating the sign cases for $b^k$ separately, recall $\sigma_k := -\mathrm{sgn}(b^k)$ and define $y_i(t) = \sigma_k z_i(t)$, so that $y_i(t_k) = \sigma_k z_i^k$. Since $b^k$ is constant on $[t_k, t_{k+1})$, $\sigma_k$ is also constant over the interval. Let $x_i$ denote the input of $\tanh(\cdot)$ and let $\nu = \tanh(m)$. Since $\tanh$ is odd, $\sigma_k \tanh(x_i) = \tanh(\sigma_k x_i) = \tanh(\eta_i)$. By the condition in Lemma 1, we have that $\tanh(\eta_i(t)) \geq \nu$. So, $\dot{y}_i + \lambda_i y_i \geq \nu$. We multiply both sides of the inequality by the integrating factor $e^{\lambda_i t}$. Integrating over the decision interval $[t_k, t_{k+1})$ and dividing both sides by $e^{\lambda_i t_{k+1}}$ yields

$$y_i(t_{k+1}^-) \geq e^{-\lambda_i \Delta t_D} y_i(t_k) + \frac{\nu}{\lambda_i} (1 - e^{-\lambda_i \Delta t_D}). \tag{7}$$

Using the bound $y_i(t_k) \in [-1, 1]$ for all $k$, we obtain

$$y_i(t_{k+1}^-) \geq -e^{-\lambda_i \Delta t_D} + \frac{\nu}{\lambda_i} (1 - e^{-\lambda_i \Delta t_D}) = \beta_i. \tag{8}$$

Thus, $\sigma_k z_i^{k+1} \geq \beta_i$.
Define $\beta := \min_i \{\beta_i\}$. Bounding $\beta_i$ away from zero for all agents yields the sufficient condition

$$\Delta t_D > \max_i \Big\{ \frac{1}{\lambda_i} \ln\Big( \frac{\lambda_i + \nu}{\nu} \Big) \Big\}. \tag{9}$$

When the condition above is satisfied, every agent has

$$\sigma_k z_i^{k+1} \geq \beta \quad \text{and} \quad |z_i^{k+1}| \geq \beta, \quad \forall i. \tag{10}$$

Thus, when the decision interval is sufficiently long, every agent's opinion is uniformly bounded away from neutrality in favor of the cheaper queue.

Theorem 1 (Nash Hitting in Finite Expected Time). Assume $\mathcal{X}_0 \notin \mathcal{N}$. Let $W^k$ denote the set of agents in $W$ at epoch $k$. Suppose there exists a constant $\zeta > 0$ such that when $n_W^k > 0$, there is at least one agent $i \in W^k$ with $|z_i^{k+1}| \geq \zeta$. If there exists an $m > 0$ such that

$$\gamma_i b^\# \geq \omega_i (u_{0i} + K) + \alpha_i \|\mathbf{A}\|_\infty + m \tag{11}$$

holds for all $i$, the decision interval duration $\Delta t_D$ satisfies the condition in Lemma 2, and there exists a constant $\psi \in (0, 1)$ such that $|z_i^{k+1}| \leq \psi$ for all agents in the more expensive queue when $\mathcal{X}_k \notin \mathcal{N}$ and $n_W^k = 0$, then the Markov chain $\{\mathcal{X}_k\}_{k \geq 0}$ will hit $\mathcal{N}$ in finite expected time.

Using standard drift arguments [27] and probability techniques [28], we prove Theorem 1 in two parts. We first bound the expected number of epochs to empty $W$. Then, we bound the expected number of additional epochs to reach $\mathcal{N}$ once $W$ is empty.

Proof. Define the natural filtration $\mathcal{F}_k = \sigma(\mathcal{X}_0, \dots, \mathcal{X}_k)$. Consider the queueing system starting from an initial state $\mathcal{X}_0 \notin \mathcal{N}$ with $n_W^0 \leq N$. Let $p_i^k = |z_i^{k+1}|$ denote the probability of agent $i \in W^k$ joining a queue during the decision interval $[t_k, t_{k+1})$, where $\mathrm{sgn}(z_i^{k+1})$ indicates the destination. Define $\tau_W := \inf\{k \geq 0 \mid n_W^k = 0\}$ and $J_W^k$ as a sum of Bernoulli random variables denoting the number of agents that leave the waiting pool during the decision interval. Since agents cannot join $W$, $n_W^{k+1} = n_W^k - J_W^k$ and the number of agents in $W$ is monotonically nonincreasing.
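By assumption, whenever $n_W^k > 0$ there is at least one $i \in W^k$ with $p_i^k \geq \zeta$, so

$$\mathbb{E}[n_W^{k+1} - n_W^k \mid \mathcal{F}_k] = -\mathbb{E}[J_W^k \mid \mathcal{F}_k] = -\sum_{i \in W^k} p_i^k \leq -\zeta.$$

Define the stopped process $\mathcal{W}_k = n_W^{k \wedge \tau_W} + \zeta (k \wedge \tau_W)$. $\mathcal{W}_k$ is a supermartingale, and thus $\mathbb{E}[n_W^{k \wedge \tau_W} + \zeta (k \wedge \tau_W)] \leq n_W^0$. Since $\mathbb{E}[n_W^{k \wedge \tau_W}] \geq 0$ and $n_W^0 \leq N$, $\zeta \, \mathbb{E}[k \wedge \tau_W] \leq N$. $k \wedge \tau_W$ approaches $\tau_W$ almost surely as $k \to \infty$, so by the Monotone Convergence Theorem, we conclude

$$\mathbb{E}[\tau_W] \leq N / \zeta. \tag{12}$$

Now, consider the process starting at $\tau_W$. Define $\tau_{\mathcal{N}} := \inf\{k \geq \tau_W \mid \mathcal{X}_k \in \mathcal{N}\}$. Let $\widetilde{Q}^k$ denote the set of agents in the more expensive queue at epoch $k$ and $|\widetilde{Q}^k|$ be its cardinality. Let $M_k$ be the number of switches from $\widetilde{Q}^k$ and $s_k$ be the number of switches from $\widetilde{Q}^k$ needed to hit $\mathcal{N}$ by the next epoch. Given $\mathcal{F}_k$, $M_k$ is a sum of conditionally independent Bernoulli random variables, and by the conclusion of Lemma 2 and the assumption that $|z_i^{k+1}| \leq \psi$, $\forall i \in \widetilde{Q}^k$ when $\mathcal{X}_k \notin \mathcal{N}$ and $n_W^k = 0$,

$$\mathbb{P}(\mathcal{X}_{k+1} \in \mathcal{N} \mid \mathcal{F}_k) \geq \binom{|\widetilde{Q}^k|}{s_k} \beta^{s_k} (1 - \psi)^{|\widetilde{Q}^k| - s_k}.$$

Thus, the probability that the Markov chain hits $\mathcal{N}$ by the next epoch is uniformly bounded below by

$$p_{\mathcal{N}} := \min_{1 \leq q \leq N} \; \min_{1 \leq s \leq q} \binom{q}{s} \beta^{s} (1 - \psi)^{q - s}. \tag{13}$$

Let $\Upsilon = \tau_{\mathcal{N}} - \tau_W$. So, for $\tau_W < k < \tau_{\mathcal{N}}$, $\mathbb{P}(\mathcal{X}_{k+1} \in \mathcal{N} \mid \mathcal{F}_k) \geq p_{\mathcal{N}}$. Further, $\forall m \geq 0$,

$$\mathbb{P}(\Upsilon > m + 1 \mid \mathcal{F}_{\tau_W + m}) \leq (1 - p_{\mathcal{N}}) \, \mathbb{1}_{\Upsilon > m}.$$

Taking the expectation of both sides yields

$$\mathbb{P}(\Upsilon > m + 1) \leq (1 - p_{\mathcal{N}}) \, \mathbb{P}(\Upsilon > m). \tag{14}$$

Induction on $m$ gives $\mathbb{P}(\Upsilon > m) \leq (1 - p_{\mathcal{N}})^m$. By the tail-sum formula and geometric series, we have

$$\mathbb{E}[\Upsilon] = \sum_{m=0}^{\infty} \mathbb{P}(\Upsilon > m) \leq \sum_{m=0}^{\infty} (1 - p_{\mathcal{N}})^m = \frac{1}{p_{\mathcal{N}}}. \tag{15}$$

Combining equations (12) and (15), we conclude

$$\mathbb{E}[\tau_{\mathcal{N}}] \leq \frac{N}{\zeta} + \frac{1}{p_{\mathcal{N}}} < \infty. \tag{16}$$

The sufficient condition presented in the statement of Theorem 1 is restrictive, and there are many parameter regimes in which it may be violated but the chain still hits $\mathcal{N}$ in finite time.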
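To get a feel for how restrictive the condition is and how conservative the bound in (16) can be, the quantities in Lemmas 1 and 2 and Theorem 1 can be evaluated directly. The sketch below assumes homogeneous agents; the function names and the numerical values of $m$, $\Delta t_D$, $\zeta$, and $\psi$ are hypothetical choices made only for illustration.

```python
import math


def margin(gamma, b_sharp, omega, u0, K, alpha, A_inf_norm):
    """Largest m allowed by condition (11); a non-positive value means it fails."""
    return gamma * b_sharp - omega * (u0 + K) - alpha * A_inf_norm


def min_decision_interval(lam, m):
    """Right-hand side of (6) for one agent: Delta t_D must exceed this value."""
    nu = math.tanh(m)
    return math.log((lam + nu) / nu) / lam


def beta_i(lam, m, dt_D):
    """Opinion-magnitude lower bound beta_i defined in Lemma 2."""
    nu = math.tanh(m)
    decay = math.exp(-lam * dt_D)
    return -decay + (nu / lam) * (1.0 - decay)


def p_nash(N, beta, psi):
    """Uniform one-step hitting probability p_N from Eq. (13)."""
    return min(
        math.comb(q, s) * beta**s * (1.0 - psi) ** (q - s)
        for q in range(1, N + 1)
        for s in range(1, q + 1)
    )


def hitting_time_bound(N, zeta, beta, psi):
    """Upper bound on the expected Nash hitting time (in epochs) from Eq. (16)."""
    return N / zeta + 1.0 / p_nash(N, beta, psi)


if __name__ == "__main__":
    lam, m, dt_D = 1.0, 0.5, 2.0  # hypothetical values
    beta = beta_i(lam, m, dt_D)
    print("required dt_D >", min_decision_interval(lam, m))
    print("beta =", beta)
    # zeta = beta is one admissible choice, since Lemma 2 bounds every |z_i| below by beta.
    print("bound on E[tau_N] =", hitting_time_bound(10, beta, beta, 0.9))
    # Section V parameter values, taking ||A||_inf = N = 10 for the all-ones networks.
    print("margin =", margin(0.5, 2, 1.0, 1.25, 0.25, 0.2, 10))
```

Because $p_{\mathcal{N}}$ is a minimum over all binomial terms, the resulting bound on the expected hitting time tends to be very large even for moderate $N$. With the parameter values used in Section V and $\|\mathbf{A}\|_\infty = 10$, the margin in (11) evaluates to a negative number, which is consistent with the simulations operating outside the sufficient regime.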
We further conjecture that anti-cooperative social network structures may even support persistence in $\mathcal{N}$, i.e., provide more than just finite-time hitting. We investigate these notions in Section V through numerical simulations.

V. NUMERICAL EXPERIMENTS

In all experiments, we consider $N = 10$ identical agents choosing between identical queues $A$ and $B$. All agents are initially in $W$ (i.e., $\boldsymbol{\ell}^0 = \mathbf{0}$), and we set $\mu_A = \mu_B = 0$ to focus on the simplest queue-selection congestion game. We run the ABQM simulation over a finite time horizon $T = 30$ with decision interval $\Delta t_D = 0.1$. We integrate the NOD ODEs using a Runge-Kutta-45 (RK-45) scheme with timestep $\Delta t = 0.01$. For all agents, we fix $\gamma_i = 0.5$, $\omega_i = 1$, $\lambda_i = 1$, and $\alpha_i = 0.2$. This regime corresponds to agents whose opinions are shaped by joint influences from social connections, self-reinforcement, and queue imbalance information. The critical parameters are $K^* = 1/3$ and $u_{0i}^* = 1$ for the anti-cooperative network. Taking $u_{0i} = 1.25$ and $K = 0.25$ for all $i$ places the system slightly above the bifurcation point and ensures a supercritical pitchfork. For each agent, $z_i^0$ is sampled from a mean-zero normal distribution with standard deviation $0.1$.

Our simulations aim to explore two main questions:
1) Are cooperative or anti-cooperative network structures more conducive to stabilizing queue balance?
2) How does limited access to queue information affect the system's ability to hit and persist in $\mathcal{N}$?

We consider fully cooperative networks ($\mathbf{A}^+ := \mathbf{1}\mathbf{1}^T$) and anti-cooperative networks ($\mathbf{A}^- := -\mathbf{1}\mathbf{1}^T$). To model incomplete information, we set $b^k = 0$, $\forall k$ for $N_b = \rho N$ randomly selected agents, where $\rho \in \{0, 0.2, 0.4, 0.6, 0.8, 1\}$. For each network structure and $\rho$, we run $10{,}000$ trials and record the average hitting time ($\tau_{\mathcal{N}}$), its standard deviation ($\sigma(\tau_{\mathcal{N}})$), the fraction of trials that hit $\mathcal{N}$ ($r$), the average number of queue switches per agent ($S$), and the average amount of time that the system stays at $\mathcal{N}$ after the last recorded hit ($T_{\mathcal{N}}$). Tables I and II present these statistics (rounded to two decimal places).

TABLE I
SUMMARY STATISTICS: COOPERATIVE NETWORK

ρ             τ_N     σ(τ_N)   r       S       T_N
1: ρ = 0      2.96    1.12     1       21.96   0.06
2: ρ = 0.2    3.01    1.17     1       18.01   0.12
3: ρ = 0.4    2.95    1.13     1       12.56   0.22
4: ρ = 0.6    3.45    1.19     0.74    1.58    0
5: ρ = 0.8    2.8     1.05     0.12    0.47    0
6: ρ = 1      1.958   0.50     0.003   0.12    0

TABLE II
SUMMARY STATISTICS: ANTI-COOPERATIVE NETWORK

ρ             τ_N     σ(τ_N)   r       S       T_N
1: ρ = 0      3.01    1.25     1       4.51    16.92
2: ρ = 0.2    3.07    1.21     1       4.53    17.03
3: ρ = 0.4    3.17    1.29     1       3.88    17.68
4: ρ = 0.6    3.43    1.63     0.9     2.05    20.46
5: ρ = 0.8    5.74    4.00     0.76    0.51    22.50
6: ρ = 1      7.32    4.53     0.86    0.04    22.56

Discussion: Data in Tables I and II indicate that the network structure itself does not have a significant influence on whether the agents are able to hit a Nash configuration in finite time, but anti-cooperative network structures are more conducive to queue balance stabilization overall. Even when less than half of the population has access to imbalance information (e.g., $\rho \geq 0.6$), the anti-cooperative agents still reach a Nash configuration at a high rate, while cooperative agents in limited information regimes often fail to hit $\mathcal{N}$ before $T$. Persistence in $\mathcal{N}$ is heavily influenced by both network structure and information access. Cooperative agents do not exhibit any notable persistence in $\mathcal{N}$, while anti-cooperative agents always persist in $\mathcal{N}$ and the duration increases as information access decreases. Figures 1 and 2 illustrate this behavior when $\rho = 0.6$. Despite only four agents having information access, the anti-cooperative agents quickly polarize into equal queue lengths without any switching. This illustrative example and the data in Table II suggest that anti-cooperative agents can successfully learn a Nash equilibrium of the underlying queue-selection congestion game in finite time. On the other hand, a group of cooperative agents herd around $t = 4$ when four agents simultaneously move from $B$ to $A$, resulting in all agents ending up in the same queue. Then, the four informed agents quickly react by settling in $B$ to avoid the high imbalance cost in $A$. This behavior reflects a fundamental tradeoff between responding rationally to environmental information and maintaining commitment to social connections, particularly in cooperative networks.

Fig. 1. Queue lengths (left) and per-agent opinions (right) over time for 10 anti-cooperative agents. 6 agents lack queue imbalance information. Agents polarize quickly, do not switch queues, and easily settle in $\mathcal{N}$.

Fig. 2. Queue lengths (left) and per-agent opinions (right) over time for 10 fully-cooperative agents when 6 agents lack imbalance information. After initial herding near $t = 4$ where all agents end up in $A$, the agents rapidly settle into two distinct groups based on their access to imbalance information.

VI. CONCLUSION AND FUTURE DIRECTIONS

We introduced an agent-based queueing framework where each agent's actions are driven by a continuously evolving internal opinion state. We proved a sufficient condition under which the system hits a Nash equilibrium queue configuration in finite expected time, and we numerically explored how social network structure and information access influence agent behavior. Our results suggest that network structure has limited impact on the group's ability to reach a Nash equilibrium in finite time, but it influences post-hitting behavior. Anti-cooperative networks support persistence in a Nash configuration after the final Nash hit.
Further, as information access decreases, anti-cooperative agents tend to settle into persistent polarization more quickly, while cooperative agents herd and split into groups reflecting their access to queue information. Future studies will analyze mixed-sign social networks and extend the Markov chain analysis to non-identical queues. We will also expand the framework to accommodate priority service ordering and compare our results to real data.

ACKNOWLEDGMENT

The authors thank Vaibhav Srivastava for the helpful discussion about the supermartingale arguments in Section IV.

REFERENCES

[1] X. Liu, Z. Yu, and W. Ma, “Deep reinforcement learning based approach for performance evaluation of airport security screening lanes,” in 2025 IEEE 7th International Conference on Civil Aviation Safety and Information Technology (ICCASIT). IEEE, 2025, pp. 675–682.
[2] M. Mitzenmacher, “The power of two choices in randomized load balancing,” IEEE Trans. Parallel Distrib. Syst., vol. 12, no. 10, pp. 1094–1104, 2002.
[3] J. F. Shortle, J. M. Thompson, D. Gross, and C. M. Harris, Fundamentals of Queueing Theory. John Wiley & Sons, 2018.
[4] E. Koenigsberg, “On jockeying in queues,” Management Science, vol. 12, no. 5, pp. 412–436, 1966.
[5] P. Naor, “The regulation of queue size by levying tolls,” Econometrica: Journal of the Econometric Society, pp. 15–24, 1969.
[6] N. Chr, “Individual and social optimization in a multiserver queue with a general cost-benefit structure,” Econometrica: Journal of the Econometric Society, pp. 515–528, 1972.
[7] N. M. Edelson and D. K. Hilderbrand, “Congestion tolls for Poisson queuing processes,” Econometrica: Journal of the Econometric Society, pp. 81–92, 1975.
[8] W. Lin and P. Kumar, “Optimal control of a queueing system with two heterogeneous servers,” IEEE Transactions on Automatic Control, vol. 29, no. 8, pp. 696–703, 1984.
[9] R. Hassin and M. Haviv, To Queue or Not to Queue: Equilibrium Behavior in Queueing Systems. Springer Science & Business Media, 2003, vol. 59.
[10] D. H. Maister, “The psychology of waiting lines,” Harvard Business School, Background Note 684-064, 1984.
[11] R. Zhou and D. Soman, “Looking back: exploring the psychology of queuing and the effect of the number of people behind,” Journal of Consumer Research, vol. 29, no. 4, pp. 517–530, 2003.
[12] Y.-N. Li, Z. Cui, J. Ji, and J. Wei, “When do people switch queues? an empirical study of discretionary queue switching at a physical examination center,” Decis. Sci., vol. 56, no. 4, pp. 361–382, 2025.
[13] R. C. Larson, “OR forum—perspectives on queues: social justice and the psychology of queueing,” Operations Research, vol. 35, no. 6, pp. 895–905, 1987.
[14] Z. Carmon, K. Wertenbroch, and M. Zeelenberg, “Option attachment: when deliberating makes choosing feel like losing,” Journal of Consumer Research, vol. 30, no. 1, pp. 15–29, 2003.
[15] H. A. Simon, “A behavioral model of rational choice,” The Quarterly Journal of Economics, vol. 69, no. 1, pp. 99–118, 1955.
[16] D. Ariely, “The end of rational economics,” Harvard Business Review, vol. 87, no. 7-8, pp. 78–84, 2009.
[17] T. Huang, G. Allon, and A. Bassamboo, “Bounded rationality in service systems,” Manufacturing & Service Operations Management, vol. 15, no. 2, pp. 263–279, 2013.
[18] J. Xue, M. Zhang, and H. Yin, “A personality-based model of emotional contagion and control in crowd queuing simulations,” ACM Trans. on Modeling and Computer Simulation, vol. 33, no. 1-2, pp. 1–23, 2023.
[19] A. Bizyaeva, A. Franci, and N. E. Leonard, “Nonlinear opinion dynamics with tunable sensitivity,” IEEE Transactions on Automatic Control, vol. 68, no. 3, pp. 1415–1430, 2022.
[20] C. Cathcart, M. Santos, S. Park, and N. E. Leonard, “Proactive opinion-driven robot navigation around human movers,” in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2023, pp. 4052–4058.
[21] H. Hu, J. F. Fisac, N. E. Leonard, D. Gopinath, J. DeCastro, and G. Rosman, “Think deep and fast: Learning neural nonlinear opinion dynamics from inverse dynamic games for split-second interactions,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 16 678–16 684.
[22] S. R. Langton, R. J. Watt, and V. Bruce, “Do the eyes have it? Cues to the direction of social attention,” Trends in Cognitive Sciences, vol. 4, no. 2, pp. 50–59, 2000.
[23] G. Csibra and G. Gergely, “‘Obsessed with goals’: functions and mechanisms of teleological interpretation of actions in humans,” Acta Psychologica, vol. 124, no. 1, pp. 60–78, 2007.
[24] T. Roughgarden, Selfish Routing and the Price of Anarchy. MIT Press, 2005.
[25] R. W. Rosenthal, “A class of games possessing pure-strategy Nash equilibria,” International Journal of Game Theory, vol. 2, no. 1, pp. 65–67, 1973.
[26] D. Monderer and L. S. Shapley, “Potential games,” Games and Economic Behavior, vol. 14, no. 1, pp. 124–143, 1996.
[27] S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability. Springer Science & Business Media, 2012.
[28] D. Williams, Probability with Martingales. Cambridge University Press, 1991.
