Adaptive Robust Game-Theoretic Decision Making for Autonomous Vehicles

Adapti ve Rob ust Game-Theoretic Decision Making for Autonomous V ehicles Gokul S. Sankar and K youngseok Han Abstract — In a typical trafﬁc scenario, autonomous vehicles are requir ed to share the road with other road participants, e.g., human driven vehicles, pedestrians, etc. T o successfully na vigate the trafﬁc, an adaptive robust level- k framework can be used by the autonomous agents to categorize the agents based on their depth of strategic thought and act accordingly . Howev er , mismatch between the vehicle dynamics and its predictions, and improper classiﬁcation of the agents can lead to undesirable behavior or collision. Rob ust appr oaches can handle the uncer - tainties, however , might result in a conser vative beha vior of the autonomous vehicle. This paper pr oposes an adaptive rob ust decision making strategy for autonomous vehicles to handle model mismatches in the prediction model while utilizing the conﬁdence of the driver behavior to obtain less conservati ve actions. The effectiveness of the proposed approach is validated for a lane change maneuver in a highway driving scenario. I . I N T RO D U C T I O N Connected and automated v ehicles (CA Vs) is a disrupti ve transportation technology that has the potential to change our habits and provide great safety beneﬁts. Despite man y recent advances in CA V technologies, full automation of the vehicles that provides better or at least as good driving proﬁciency as compared to the human driv ers are still ﬂawed to be deployed in the market [ 1 ]. One of the most signiﬁcant challenges is to plan the motion in mix ed trafﬁc scenarios, where the autonomous vehicles (A Vs) coe xist with the human driv en vehicles (HVs), bike riders and pedestrians [ 2 ]. In particular , describing the human decision making process is dif ﬁcult since humans do not alw ays exhibit optimized behavior due to limited rationality [3]. The interactions between agents, either vehicle-to-vehicle or vehicle-to-pedestrian, in a mixed trafﬁc scenario, are accounted by considering the stochastic reachable sets of the vehicles [ 4 ]. The desired path of each agent is assumed to be known, ho wev er , might be imperfect. Alternativ ely , game-theoretic framework can be used to m ake strategic decisions that handle interactions in the multi agent mixed trafﬁc scenario. In the le vel-k game-theoretic approach, the agents are categorized into hierarchical structure of their cognitive abilities [ 5 ]. The interactions are modeled taking into account the rationality of the other agents. This might lead to a less conservati ve action the reachable set method. Pre viously , the game-theoretic modeling approach has been applied to highway driving scenario in [ 6 ], and to an unsignalized intersection in [ 7 ], [ 8 ]. A multi-model strategy is used that assigns an agent to multiple le vels in the hierarchical structure Gokul S. Sankar and Kyoungseok Han are with the Depart- ment of Aerospace Engineering, Univ ersity of Michigan, Ann Ar- bor , MI, USA. Emails: ggowrisa@umich.edu (G. Sankar) and kyoungsh@umich.edu (K. Han) with certain conﬁdence from the perspectiv e of the A Vs. The estimate of the driver level and its probability of being at a certain lev el is updated at ev ery time step by the autonomous agents. Although these approaches effecti vely describe rational decision making for the HVs, uncertainties due to the simple vehicle dynamic model used within the framework, and an improper estimation of the driver level of the other agents, are not considered. These uncertainties in the position of the HVs in the decision making process might lead to collision. A high ﬁdelity dynamic model can be used to eliminate certain lev el of uncertainty but is accompanied by increased computational burden. Also, most likely , the interactions between agents in a giv en trafﬁc scenario might be short for the A Vs to estimate an accurate driv er model. In [ 4 ], the uncertainty in the interactions between the agents are described by probabilistic deviations from the desired path, ho wev er , as mentioned earlier, the desired path assumed might be inaccurate. Robust model-based approaches namely , feedback min- max model predictiv e control (MPC) [ 9 ], tube MPC [ 10 ] and constraint-tightening methodology [ 11 ] hav e been suggested and well established for the last couple of decades. These robust approaches provide constraint satisfaction guarantees for the uncertainties originating from a bounded and compact disturbance set. The constraint tightening approach has no additional online computational load and been successfully implemented in real time automoti ve applications [ 12 ], [ 13 ] and aerospace applications [ 11 ]. Ho wev er , due to its simplicity in handling the discrete input actions considered in this work, min-max strategy is employed to provide robustness to the uncertainties described above. The min-max strategy considers the worst-case disturbance af fecting the behavior/performance of the system and provides control actions to mitigate the effect of the worst-case disturbance. Therefore, these control actions might lead to a conserv ativ e behavior if the disturbance set size that should be handled is large. In [ 14 ], a conditional value-at-risk objectiv e function was used to account for the model mismatch. A constant conﬁdence level is introduced, which describes the degree of aggressiv eness of the HV . Ho wev er , a priori assumption on the conﬁdence level of the human drivers may not be realistic. T o provide robustness against all possible trafﬁc situations, a conservati ve driving policy has been suggested for the A V by [ 15 ], [ 16 ], Howe ver , conservati ve actions can hav e adverse effects on the trafﬁc, for instance, lead to a road congestion, disharmony with the other road participants. Less conserv ativ e motion can be planned for the A V when the behavior of human vehicles are predicted [ 17 ]. A A1 A2 A3 B1 B2 B3 B 1 2 3 Fig. 1: Sample lane changing sequences that can be chosen by the autonomous vehicle (blue) based on the motion of other non-autonomous vehicles (red) are { ( A, A 1) , ( A 1 , B 1) , ( B 1 , B ) } , { ( A, A 2) , ( A 2 , B 2) , ( B 2 , B ) } and { ( A, A 3) , ( A 3 , B 3) , ( B 3 , B ) } . In this paper, an adaptiv e robust approach has been proposed to provide less conservati ve actions to the A Vs to perform a desired maneuver in the multi agent trafﬁc scenario. The min-max approach anticipates the uncertainties originating from a disturbance set while computing the control action. The adaptiv e strategy proposed in this paper modiﬁes the size of the disturbance sets of the other interacting v ehicles in accordance to the belief of the aggressiv eness of the corresponding human drivers. The proposed approach can provide ‘balanced’ control actions for the A V which is more conserv ativ e than the nominal (non-robust) strategy to handle model mismatch and on the other hand, less conservati v e than traditional robust approach by e xploiting the conﬁdence on the estimated driv er model. The proposed approach is validated through simulation studies for performing a lane changing maneuver on a highway driving scenario with multiple agents. A. Notation The symbol Z [ a, b ] denotes a set of consecuti ve integers from a to b and 2 Z + denotes set of positiv e ev en integers; ∅ denotes an empty set. For a vector x , x > 0 denotes element- wise inequality . The operator ⊕ denotes the Minko wski addition, deﬁned for the sets A and B as A ⊕ B : = { a + b | a ∈ A ∀ b ∈ B } . I I . P RO B L E M S T ATE M E N T Consider a highway driving traf ﬁc scenario shown in the Fig. 1 with human driven vehicles shown in red and the autonomous vehicle in blue. The goal of the A V is to perform a lane change. A subset of the possible lane changing trajectories are sho wn in the Fig. 1. The trajectory { ( A, A 1) , ( A 1 , B 1) , ( B 1 , B ) } is considered to be an aggres- si ve maneuver as the lane change occurs closer to the human vehicle 3 and might lead to collision. On the other hand, { ( A, A 3) , ( A 3 , B 3) , ( B 3 , B ) } is considered conservati ve as the A V completes the maneuver by performing the lane change farthest from the HV 3. Being conservati ve can lead to road congestion and disharmony as mentioned earlier . Therefore, it is desirable to develop an adapti ve control strategy for the A V that is able to perform the desired maneuver with reduced conservatism whilst planning a safe motion to avoid collision as in { ( A, A 2) , ( A 2 , B 2) , ( B 2 , B ) } . The strategy should be able to modify the behavior of the A V according to the conﬁdence on the estimated behavior of the interacting vehicles. I I I . V E H I C L E DY N A M I C M O D E L The vehicle dynamics are represented by the following discrete model [18] x ( t + 1) = x ( t ) + v ( t ) cos ( ψ ( t ) + β ( t )) ∆ t + w x ( t ) (1a) y ( t + 1) = y ( t ) + v ( t ) sin ( ψ ( t ) + β ( t )) ∆ t + w y ( t ) (1b) ψ ( t + 1) = ψ ( t ) + v ( t ) l r sin ( β ( t )) ∆ t (1c) v ( t + 1) = v ( t ) + a ( t ) ∆ t (1d) β ( t ) = arctan  l r l r + l f tan ( δ f ( t ))  , (1e) where t denotes the discrete time instant; the pair ( x ( t ) , y ( t )) represent the global position of the center of mass of the vehicle; the vehicle’ s speed is denoted by v ( t ) ; β ( t ) is the angle of v ( t ) with respect to the longitudinal axis of the vehicle; ψ ( t ) denotes the vehicles yaw angle (the angle between the vehicles heading direction and the global x- direction); a ( t ) denotes the vehicles acceleration at time t ; ∆ t denotes the time step size; δ f ( t ) represents the front steering angle; and l f and l r are the distance of the center of the mass of the vehicle to the front and rear axles, respecti vely; w x ( t ) and w x ( t ) denote the uncertainty in the position of the center of mass, respectively . It is assumed that the uncertainties originate from a closed and compact set deﬁned as W : =  w = ( w x , w y ) | ζ w ≤ θ , ζ ∈ R a × 2 , θ ∈ R b  , (2) where a, b ∈ 2 Z + . The disturbance set is assumed to contain the origin. Furthermore, it is assumed that the rear wheels cannot be steered. Therefore, the control input to the model (1) , represented by γ ( t ) = ( a ( t ) , δ f ( t )) , is the acceleration and front steering angle pair . I V . R O B U S T G A M E - T H E O R E T I C D E C I S I O N M A K I N G At each time instant, each vehicle selects an input pair from the ﬁnite action set, Γ = { (0 , 0) , (0 , δ f , nom ) , (0 , − δ f , nom ) , ( a nom , 0) , ( − a nom , 0) , ( a max , 0) , ( − a max , 0) , ( a nom , δ f , max ) , ( a nom , − δ f , max ) } , where a nom , δ f , nom and a max , δ f , max are the nominal and maximum acceleration, front steer angle, respectiv ely . The inputs pairs in Γ correspond to the actions, { “maintain”, “turn slightly left”, “turn slightly right”, “accelerate”, “decelerate”, “maximum acceleration”, “maximum deceleration”, “turn left and accelerate”, “ turn right and accelerate” } , respectiv ely . The input pair to be applied at e very time step is decided based on optimizing a re ward function. A. Rewar d function The decision making process of the vehicle in choosing the optimal input pair follows a receding horizon strategy . The nominal model used within the prediction horizon is giv en by x t + j +1 = x t + j + v t + j cos ( ψ t + j + β t + j ) ∆ t (3a) y t + j +1 = y t + j + v t + j sin ( ψ t + j + β t + j ) ∆ t (3b) ψ t + j +1 = ψ t + j + v t + j l r sin ( β t + j ) ∆ t (3c) v t + j +1 = v t + j + a t + j ∆ t (3d) β t + j = arctan  l r l r + l f tan ( δ f , t + j )  , (3e) where j ∈ Z [0 , N − 1] represents the prediction step, and γ t + j = ( a t + j , δ f , t + j ) denotes the input pair applied to (3) at a prediction step j . A sequence of actions, γ t = { γ t , γ t +1 , · · · , γ t + N − 1 } , is chosen that maximizes a cumu- lativ e rew ard given by R ( γ t ) = N − 1 X j =0 λ j R t + j , (4) where R t + j is the stage reward at a prediction step j determined at time step t for an input, γ t + j ∈ Γ ; λ ∈ [0 , 1] is the discount factor . By the receding horizon strategy , the input applied to (1) , γ ( t ) , is the ﬁrst element of γ t ∗ =  γ ∗ t , γ ∗ t +1 , · · · , γ ∗ t + N − 1  is applied at each time instant t . The stage reward at a prediction step j , R t + j , is deﬁned as R t + j = α T φ t + j , (5) where φ t + j = { φ 1 , t + j , φ 2 , t + j , · · · , φ m, t + j } is the feature vector at step j and the weights for these features are in α = { α 1 , α 2 , · · · , α m } , in which α i > 0 , ∀ i ∈ Z [1 , m ] . For the lane changing scenario in Fig. 1, the features considered are described below . Rectangular outer approximation of the geometric contour of each vehicle is considered as shown by the dash-dotted boxes in Fig. 1. This outer approximation is referred as the collision av oidance zone (c). The features, φ 1 , t , φ 2 , t and φ 3 , t , are indicator functions based on the collision avoidance zone of the vehicles that respectively characterize: • Collision status - The intersection of the collision av oidance zone of the ego vehicle with that of any other vehicle indicates a collision or a danger of collision. If an ov erlap is detected then φ 1 , t is assigned a value − 1 ; and 0 , otherwise. • On-road status - The intersection of the collision av oid- ance zone of the ego vehicle with that of green regions shown in Fig. 1 indicates that the ego vehicle is outside the road boundaries. The feature φ 2 , t = − 1 if an overlap is detected; φ 2 , t = 0 , otherwise. • Safe zone violation status - A safe zone (s) of a vehicle is a rectangular area that subsumes the collision av oidance zone of the vehicles with a safety margin. The safety margin is chosen based on the minimum distance to be maintained from the surrounding vehicles. If an ov erlap of the safe zone of the ego vehicle with that of another vehicle is detected then φ 3 , t is assigned a value − 1 ; and 0 , otherwise. The other features considered in this work characterize: • Distance to objecti ve - In order to encourage the ego vehicle to change lane and reach a reference point in the new lane,  x ref , y ref  , the feature φ 4 , t is deﬁned as φ 4 , t = −    x t − x ref   +   y t − y ref    . (6) • Distance to lane center - The feature, φ 5 , t , deﬁned as φ 5 , t = −   y t − y lc   , (7) where y lc is the y-coordinate of the center of the current lane that is included to encourage the ego vehicle to be at the middle of the current lane. • V elocity error - The deviation of the velocity of the ego vehicle from a reference velocity , v ref , is described by the feature φ 6 , t as φ 6 , t = −   v t − v ref   , (8) where the reference velocity is typically chosen as the legislated speed limit. Intuitiv ely , the ﬁrst three features are prioritized to avoid actions taken by the controller leading to a catastrophe, and therefore, the tuning parameters are chosen as α 1 , α 2 , α 3  α 4 , α 5 , α 6 . B. Level-k framework In a multi-agent trafﬁc scenario, the interactiv e nature of the decision making process is taken into account by the features, φ 1 , t and φ 3 , t , of the stage rew ard in (5) that depend on the states of other vehicles. T o compute the cumulative rew ard in (4) , for a giv en sequence of actions of the l th au- tonomous vehicle, γ t [ l ] = { γ t [ l ] , γ t +1 [ l ] , · · · , γ t + N − 1 [ l ] } , it is required to predict the actions of other agents, γ t [ i ] = { γ t [ i ] , γ t +1 [ i ] , · · · , γ t + N − 1 [ i ] } , ∀ i ∈ O , where O =  i | i ∈ Z [1 , n ] , i 6 = l  with n representing the number of agents, and the corresponding state of the trafﬁc, s t + j , at prediction steps j = 0 , 1 , · · · , N − 1 , where s t = [ x t [1] , y t [1] , v t [1] , θ t [1] , · · · , x t [ n ] , y t [ n ] , v t [ n ] , θ t [ n ]] T . In this paper , lev el-k game theory [ 19 ], [ 20 ] is utilized to model the vehicle-to-vehicle interactions and thus predict the actions of the other agents o ver the horizon. In lev el- k game theory , it is assumed that the decisions taken by the strategic agents are based on the predictions of the actions of the other agents and the agents can have hav e dif ferent reasoning lev els. The reasoning depth of an agent is indicated by k ∈ { 0 , 1 , · · · } . The hierarchy begins with agents at lev el-0, where the agents make instinctive decisions to achieve the objective without accounting for the interactions between other agents. On the other hand, the agents at lev el- k ∀ k > 0 , consider the interactions by assuming that the other agents are at level- ( k − 1) and take decisions accordingly . For instance, a l th lev el-1 agent assumes other agents are at level-0 and predicts their ac- tion sequences, γ (0) t [ i ] = n γ (0) t [ i ] , γ (0) t +1 [ i ] , · · · , γ (0) t + N − 1 [ i ] o ∀ i ∈ O , to compute its own action sequence, γ (1) t [ l ] = n γ (1) t [ l ] , γ (1) t +1 [ l ] , · · · , γ (1) t + N − 1 [ l ] o . The lev el-k game theory was adapted to model the vehicle-to-v ehicle interactions at an unsignalized four-way intersection in [ 7 ]. It is assumed that lev el-0 vehicles consider the other vehicles in the trafﬁc scenario as stationary obstacles. Therefore, these level-0 driv ers implicitly assume the others will yield the right of way , and can be regarded ‘aggressiv e’. And level-1 dri vers consider other dri vers to be aggressive and take ‘cautious’ actions. In [ 7 ], the driv ers are categorized into le vel-0, 1 and 2. Since, the behavior of lev el-2 driv er will be similar to that of the lev el-0 driv ers, in this paper , only lev el-0 and 1 driv ers are considered. The stage reward value obtained for l th lev el- k agent at a prediction step j for an action γ ( k ) t + j [ l ] , depend on the current trafﬁc state, s 0 , the ego agent’ s actions, n γ ( k ) t [ l ] , γ ( k ) t +1 [ l ] , · · · , γ ( k ) t + j − 1 [ l ] o , and the actions of other agents, n γ ( k − 1) t [ i ] , γ ( k − 1) t +1 [ i ] , · · · , γ ( k − 1) t + j − 1 [ i ] o ∀ i ∈ O , is giv en as R ( k ) t + j [ l ] = R t + j  γ ( k ) t + j [ l ]    s 0 , γ ( k ) t [ l ] , γ ( k ) t +1 [ l ] , · · · , γ ( k ) t + j − 1 [ l ] , γ ( k − 1) t [ i ] , γ ( k − 1) t +1 [ i ] , · · · , γ ( k − 1) t + j − 1 [ i ]  , (9) and its cumulative reward is R ( k )  γ ( k ) t [ l ]  = N − 1 X j =0 λ j R ( k ) t + j [ l ] . (10) C. Multi-model strate gy Human dri vers, initially , do not hav e perfect knowledge about other driv ers. Howe ver , they gain better understandings of other driver’ s characteristics through interactions, and there- fore, resolve conﬂicts effecti vely . Similarly , the autonomous vehicles in a multi-agent trafﬁc scenario, hold an initial belief about the dri ver model (le vels-0 and 1) of the other v ehicles as a probability distribution over both models. This facilitates the expression of a driver’ s degree of aggressiv eness (or cautiousness) as a continuous parameter between lev els-0 and 1. Subsequently , based on the actual action applied by the other agents, the estimate of the probability distrib ution is updated at every step. From the perspecti ve of an l th autonomous agent, the probability that the i th other agent can be modeled as lev el- k is represented by P K l i = k . The probability of the model k is increased when it matches the actual action by k ∗ = arg min k ∈{ 0 , 1 }    γ [ i ]( t ) − γ ( k ) t [ i ]    (11a) ˜ P K l i = k ∗ ( t ) = P K l i = k ∗ ( t − 1) + ∆ P (11b) P K l i = k ( t ) = ˜ P K l i = k ( t ) P 1 ˜ k =0 ˜ P K l i = ˜ k ( t ) , ∀ k ∈ { 0 , 1 } , (11c) where γ [ i ]( t ) and γ ( k ) t [ i ] represent the actual and predicted action taken by i th agent assuming lev el- k model, respectiv ely; ∆ P > 0 is a constant that denotes the rate of increment of the probability;    γ [ i ]( t ) − γ ( k ) t [ i ]    =    a ( t ) − a ( k ) t    +    δ f ( t ) − δ ( k ) f , t    . When the input pair of the actual action is equal to that of the predicted action, the probability distribution remains unchanged. In order to incorporate the multi-model strategy in the deci- sion making process and select the optimal action according to the model of other agents, the expected cumulativ e reward for the l th agent, using (9) and (10), is giv en by R P ( γ t [ l ]) = 1 X k =0 P K l i = k ( t ) R ( k )  γ ( k ) t [ l ]  , ∀ i ∈ O . (12) D. Robust decision making The mismatch between the actual position of the center of mass of the vehicle which is used to determine the rectangular outer approximation of the vehicle (see Section IV -A) and its predictions obtained using the dynamic model in (3) might lead to collision. In the multi-agent traf ﬁc scenario under consideration, there are two sources of modeling errors: (i) the uncertainties, w ( m ) =  w ( m ) x , w ( m ) y  ∈ W m , resulting due to the use of a simpliﬁed model in (3) ; and (ii) the uncertainty , w ( d ) =  w ( d ) x , w ( d ) y  ∈ W d , arising due to unknown driv er model. Hence, the disturbance set deﬁned in (2) is W = W m ⊕ W d . (13) Robust approaches can be used to account for these uncertainties while computing the control actions. Since a discrete set of input actions is considered in this work, feedback min-max strategy [ 9 ] is utilized in this work for considering the uncertainties originating from the disturbance set W . Since the autonomous agents update the driv er models of the other agents in the multi-agent traf ﬁc scenario at each step according to (11) , an adaptiv e scheme is proposed to incorporate the conﬁdence on the driver models and leverage the fact that le vel-1 driv ers are cautious. The disturbance set considered at each time step is modiﬁed of the other agents and lev erage . ¯ W l i ( t ) = W m ⊕ P K l i =0 ( t ) W d (14) where at time t , P K l i =0 ( t ) is the probability that the i th agent is a lev el-0 driver from the perspective of the l th autonomous agent, and ¯ W l i ( t ) denotes the disturbance set. It is assumed that, initially , all the agents are level-0 driv ers, i.e., P K l i =0 (0) = 1 , ∀ i ∈ O . Essentially , this assumption allows the autonomous agent to be cautious with another interacting agent when there is no/less information about that agent. If an i th agent is le vel-0, as time evolv es, P K l i =0 ( t ) will continue to be equal to one, and hence, the autonomous agent take conservati ve actions (or behav e like level-1 driv er). On the other hand, when the i th agent is a le vel-1 dri ver , P K l i =0 ( t ) will decrease, resulting in a reduced disturbance set size, thereby , allowing the autonomous agent to take less conservati ve actions and adapt to the behavior of the interacting agents while capable of handling the uncertainties arising due to the use of a simple prediction model. When interacting with an agent i ∈ O , the objec- tiv e the autonomous agent l is to maximize the expected cumulativ e reward (10) , while accounting for the effect of the worst-case uncertainty from the possible distur - bance realizations. The optimal control sequence, γ ∗ t [ l ] =  γ ∗ t [ l ] , γ ∗ t +1 [ l ] , · · · , γ ∗ t + N − 1 [ l ]  , is obtained by solving the following optimization problem γ ∗ t [ l ] = arg max γ t [ l ] ∈ Γ min w p t + j ∈ ¯ W l i ( t ) R P ( γ t [ l ] ) (15a) s. t. ∀ j ∈ Z 0 , N − 1 , (3c) , (3d) , (3e) , ∀ i ∈ O , ∀ p ∈ P ˜ x t + j +1 = ˜ x t + j + v t + j cos ( ψ t + j + β t + j ) ∆ t + w p x, t + j (15b) ˜ y t + j +1 = ˜ y t + j + v t + j sin ( ψ t + j + β t + j ) ∆ t + w p x, t + j , (15c) where w p t + j =  w p x, t + j , w p y , t + j  denote a possible realization of the uncertainty in the global position of the center of mass in x and y directions, respectiv ely; and P represents the set of inde xes the realizations. Since the decision v ariables of (15) are discrete, it is solved using a decision tree approach. The autonomous agent then applies the ﬁrst element γ ∗ t [ l ] of the optimal control action sequence, i.e., γ ( t ) = γ ∗ t [ l ] . V . S I M U L A T I O N R E S U LT S The proposed adaptiv e robust approach is validated for the lane changing maneuver on a numerically simulated three lane highway section 1 . Consider the multi-agent trafﬁc scenario sho wn in the subﬁgure [1 , 1] (ﬁrst element represents ro w and second element represents column) in Fig. 2. The objective of the autonomous agent (blue) is to change from lane II to lane III, while the human agents keep their respectiv e lane. All the human driv ers are assumed to be lev el-1, i.e., they exhibit cautious behavior , howe ver , it is unknown to the A V . The sampling time is set to 0 . 5 s and two step prediction horizon is considered. The proposed methodology is compared to the nominal, and the robust strategies during the decision making process. The trafﬁc simulation under the nominal decision making strategy is shown in the subﬁgures in the ﬁrst column of Fig. 2. In this case, the disturbance set considered for solving (15) is chosen as an empty set, i.e., ¯ W l i ( t ) = ∅ . Since the disturbances are not considered in this case, it can be noted the A V chooses to steer left as soon as the simulation begins which provides the maximum reward. This move is considered to be aggressive. Howe ver , the human vehicle 4 , being cautious, reacts by steering left at time t = 2 s and returning to the lane center at t = 4 s once the A V has passed. The A V completes the lane change between 60 m and 70 m . The results obtained by using the robust strategy that considers ¯ W l i ( t ) = W , for the decision making process, is sho wn in the subﬁgures in the third column of Fig. 2. The 1 The code is made publicly available at https://github.com/ gokulsivasankar/RobustDecisionMaking T ABLE I: Collision and lane change rates under v arious strategies. Strategy Collision rate (%) Lane change rate (%) Nominal 21 78 Adaptiv e 2 93 Robust 1 75 disturbance set W is deﬁned as in (13) . As mentioned in Section IV -D, initially , all the human drivers are considered to be level-0 from the perspective of the A V , while the y are assigned to be level-1 driv ers. As seen in Fig. 3, due to interactions with vehicle 4 , as time ev olves, the A V is able to reduce the probability of vehicle 4 being lev el-0. It can also be noted from subﬁgure [3 , 3] in Fig. 2, robust control actions are taken to account for the set, s ⊕ ¯ W l i ( t ) , dash-dotted box surrounding the human dri vers shown in the Fig. 2. As a result, the A V is conserv ati ve in performing the lane change maneuver by completing it between 90 m and 100 m . Lastly , the simulation results for the adaptiv e control strategy is shown in the subﬁgures in the second column of Fig. 2. The initial disturbance set of all the human agents are similar to the previous case. Howe ver , by using the adaptiv e disturbance set in (14) for computing the control actions, the A V is able to complete the lane change around 70 m . Also, it can be observed that the size of the set, s ⊕ ¯ W l i ( t ) , ∀ i ∈ { 1 , 3 } , remains constant because, the level- 0 and 1 actions are the same for these two vehicles and therefore, according to (11) , the dri ver model is not updated. The proposed strategy allo ws the A V to behave cautiously when there is uncertainty in the dri ver model and adapt its beha vior according to the estimate of the model of the interacting driv er . Initializing from the same condition as mentioned abo ve, multiple simulation runs were performed for the trafﬁc scenario under each decision making strate gy . The collision and lane changing rates under the three strategies are shown in T able I. Due to space constraints, the discussions are omitted. V I . C O N C L U S I O N S An adaptive robust decision making strategy has been proposed for the autonomous vehicles sharing the road with human driv ers in a multi-agent trafﬁc scenario. The interactions between vehicles are modeled using a level- k game-theoretic framew ork. The proposed robust control strategy accounts for the uncertainties in the vehicle dynamic model and the driv er model estimation. The autonomous vehicle estimates the driv er model of the other agents at each time step and is sho wn to use it to adapt its behavior through numerical simulations of a lane changing maneuver . A C K N O W L E D G E M E N T W e would like to express our appreciation to Mr . Nan Li, Dr . Ilya K olmanovsky and Dr . Anouck Girard who provided expertise that greatly assisted the research. Distance (m) Distance (m) Distance (m) 1 2 3 4 I II III t = 0 s t = 1 s t = 2 s t = 3 s t = 4 s (a) Nominal (b) Adaptive (c) Robust Fig. 2: A four second simulation sequence with a one second time interv al (see along the column) showing the lane changing maneuver performed by the autonomous vehicle under (a) nominal; (b) adaptive; and (c) rob ust decision making strategies in a multi-trafﬁc scenario. The circled numbers indicate the vehicle ids and the Roman numerals denote the lane number in the subﬁgure [1 , 1] . The dash-dotted lines indicate the set s ⊕ ¯ W l i ( t ) , ∀ i ∈ O from the perspectiv e of the autonomous agent. R E F E R E N C E S [1] R. Okuda, Y . Kajiwara, and K. T erashima, “ A survey of technical trend of adas and autonomous driving, ” in T echnical P apers of 2014 International Symposium on VLSI Design, Automation and T est , pp. 1–4, IEEE, 2014. [2] D. A. Lazar, R. Pedarsani, K. Chandrasekher, and D. Sadigh, “Maxi- mizing road capacity using cars that inﬂuence people, ” in 2018 IEEE Confer ence on Decision and Contr ol (CDC) , pp. 1801–1808, IEEE, 2018. [3] T . L. Grifﬁths, F . Lieder , and N. D. Goodman, “Rational use of cognitiv e resources: Levels of analysis between the computational and the algorithmic, ” T opics in cognitive science , vol. 7, no. 2, pp. 217– 229, 2015. [4] M. Althoff, O. Stursberg, and M. Buss, “Model-based probabilistic collision detection in autonomous driving, ” IEEE T ransactions on Intelligent T ransportation Systems , vol. 10, no. 2, pp. 299–310, 2009. [5] D. O. Stahl, “Evolution of smartn players, ” Games and Economic Behavior , vol. 5, no. 4, pp. 604–617, 1993. [6] N. Li, D. W . Oyler, M. Zhang, Y . Y ildiz, I. Kolmano vsky , and A. R. Girard, “Game theoretic modeling of driver and vehicle interactions for veriﬁcation and validation of autonomous vehicle control systems, ” IEEE T ransactions on contr ol systems tec hnology , vol. 26, no. 5, pp. 1782–1797, 2017. [7] N. Li, I. Kolmano vsky , A. Girard, and Y . Yildiz, “Game theoretic modeling of vehicle interactions at unsignalized intersections and application to autonomous vehicle control, ” in 2018 Annual American T ime (s) P K 2 4 =0 Fig. 3: Probability that the human dri ven vehicle 4 can be modeled as lev el-0 from the perspective of the autonomous vehicle 2 reduces as time e volves since vehicle 4 is le vel-1. Contr ol Conference (ACC) , pp. 3215–3220, IEEE, 2018. [8] R. Tian, S. Li, N. Li, I. Kolmano vsky , A. Girard, and Y . Y ildiz, “ Adaptive game-theoretic decision making for autonomous vehicle control at roundabouts, ” in 2018 IEEE Confer ence on Decision and Contr ol (CDC) , pp. 321–326, IEEE, 2018. [9] P . O. Scokaert and D. Mayne, “Min-max feedback model predictive control for constrained linear systems, ” IEEE T ransactions on Automatic contr ol , vol. 43, no. 8, pp. 1136–1142, 1998. [10] D. Q. Mayne, M. M. Seron, and S. Rakovi ´ c, “Robust model predictiv e control of constrained linear systems with bounded disturbances, ” Automatica , vol. 41, no. 2, pp. 219–224, 2005. [11] A. Richards and J. P . How , “Model predictiv e control of vehicle maneuvers with guaranteed completion time and robust feasibility , ” in American Control Conference , 2003. Proceedings of the 2003 , vol. 5, pp. 4034–4040 vol.5, June 2003. [12] G. S. Sankar, R. C. Shekhar, C. Manzie, T . Sano, and H. Nakada, “Fast calibration of a robust model predictive controller for diesel engine airpath, ” IEEE T ransactions on Control Systems T echnology , pp. 1–15, 2019. [13] G. S. Sankar, R. C. Shekhar , C. Manzie, T . Sano, and H. Nakada, “Model predictive controller with av erage emissions constraints for diesel airpath, ” Control Engineering Practice , vol. 90, pp. 182 – 189, 2019. [14] G. I. Jin, S. Bastian, M. M. Richard, and A. Matthias, “Risk-aware motion planning for automated vehicle among human-driv en cars, ” in 2019 American Control Confer ence (A CC) , pp. –, IEEE, 2019. [15] L. Claussmann, A. Carvalho, and G. Schildbach, “ A path planner for autonomous driving on highways using a human mimicry approach with binary decision diagrams, ” in 2015 Eur opean Control Confer ence (ECC) , pp. 2976–2982, IEEE, 2015. [16] S. Brechtel, T . Gindele, and R. Dillmann, “Probabilistic decision- making under uncertainty for autonomous driving using continuous pomdps, ” in 17th International IEEE Conference on Intelligent T rans- portation Systems (ITSC) , pp. 392–399, IEEE, 2014. [17] D. Sadigh, S. Sastry , S. A. Seshia, and A. D. Dragan, “Planning for autonomous cars that lev erage effects on human actions., ” in Robotics: Science and Systems , vol. 2, Ann Arbor, MI, USA, 2016. [18] J. Kong, M. Pfeiffer , G. Schildbach, and F . Borrelli, “Kinematic and dynamic vehicle models for autonomous driving control design, ” in 2015 IEEE Intelligent V ehicles Symposium (IV) , pp. 1094–1099, IEEE, 2015. [19] M. A. Costa-Gomes and V . P . Crawford, “Cognition and behavior in two-person guessing games: An experimental study , ” American Economic Review , vol. 96, no. 5, pp. 1737–1768, 2006. [20] M. A. Costa-Gomes, N. Iriberri, and V . P . Crawford, “Comparing mod- els of strategic thinking in van huyck, battalio, and beil’ s coordination games, ” Journal of the Eur opean Economic Association , vol. 7, no. 2/3, pp. 365–376, 2009.

Adaptive Robust Game-Theoretic Decision Making for Autonomous Vehicles

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment