Poisson-MNL Bandit: Nearly Optimal Dynamic Joint Assortment and Pricing with Decision-Dependent Customer Arrivals

We study dynamic joint assortment and pricing where a seller updates decisions at regular accounting/operating intervals to maximize the cumulative per-period revenue over a horizon $T$. In many settings, assortment and prices affect not only what an arriving customer buys but also how many customers arrive within the period, whereas classical multinomial logit (MNL) models assume arrivals are fixed, potentially leading to suboptimal decisions. We propose a Poisson-MNL model that couples a contextual MNL choice model with a Poisson arrival model whose rate depends on the offered assortment and prices. Building on this model, we develop an efficient algorithm, PMNL, based on the idea of the upper confidence bound (UCB). We establish its (near) optimality by proving a non-asymptotic regret bound of order $\sqrt{T}\log T$ and a matching lower bound (up to $\log T$). Simulation studies underscore the importance of accounting for the dependency of arrival rates on assortment and pricing: PMNL effectively learns customer choice and arrival models and provides joint assortment-pricing decisions that outperform others that assume fixed arrival rates.

Keywords: contextual bandits; dynamic assortment; dynamic pricing; customer arrival; online decision-making.


Junhui Cai, Department of Information Technology, Analytics, and Operations, University of Notre Dame, jcai2@nd.edu
Ran Chen, Department of Statistics and Data Science, Washington University in St. Louis, ran.c@wustl.edu
Qitao Huang, Department of Mathematics, Tsinghua University, qitaohuang@tsinghua.edu.cn
Linda Zhao, Department of Statistics and Data Science, University of Pennsylvania, lzhao@wharton.upenn.edu
Wu Zhu, Department of Finance, Tsinghua University, zhuwu@sem.tsinghua.edu.cn

* Authorship is in alphabetical order.
1 Introduction

Assortment (what products to offer) and pricing (at what prices) are among the central decision problems in revenue management. In many retail and platform settings, these decisions are made at regular "accounting" intervals, such as daily, weekly, or other operational cycles (Ma et al. 2018, Brown et al. 2023, Aparicio et al. 2023). Over such a period, the offered assortment and prices influence revenue through two channels: how many customers arrive and what those customers purchase. Individual customer purchase behavior is often modeled via discrete choice models, most notably the multinomial logit (MNL) model, focusing on optimizing expected revenue per arriving customer, while the arrival process is typically taken as fixed. In practice, however, customer arrivals are affected by assortment and pricing decisions: a more attractive selection or more competitive prices can pull in additional traffic (Kahn 1995, Wang 2021). Therefore, models that ignore such decision-dependent arrivals can lead to suboptimal decisions for per-period revenue maximization.

In this paper, we incorporate customer arrivals into the choice-based modeling for the dynamic joint assortment-pricing problem. Our goal is to sequentially determine assortment and pricing decisions $\{S_t, p_t\}_{t=1}^T$ among $N$ products to maximize cumulative expected revenue over a time horizon $T$. Under a classical MNL model, given an assortment set $S \subseteq [N] = \{1, 2, \ldots, N\}$ and prices $p = \{p_j\}_{j \in S}$, an arriving customer purchases product $j \in S$ with probability
$$q_j(S, p) = \frac{\exp(v_j - p_j)}{1 + \sum_{k \in S} \exp(v_k - p_k)},$$
where $v_j$ denotes the intrinsic value of product $j$ and can be further contextualized by $d_z$-dimensional product features $z$. Consequently, the classical approach aims at maximizing the expected customer-wise cumulative reward based on the per-customer expected reward $\sum_{j \in S} q_j(S, p)\, p_j$.
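The classical MNL quantities above can be made concrete with a short numerical sketch. The code below is illustrative only (the product values and prices are made up, not taken from the paper): it computes the choice probabilities $q_j(S,p)$ and the per-customer expected revenue $\sum_{j \in S} q_j(S,p)\, p_j$.

```python
import math

def mnl_choice_probs(values, prices):
    """MNL probabilities q_j(S, p) = exp(v_j - p_j) / (1 + sum_k exp(v_k - p_k)).

    `values` and `prices` map each product j in the assortment S to its
    intrinsic value v_j and price p_j; the outside (no-purchase) option has
    utility 0.  Returns (purchase probabilities, no-purchase probability)."""
    w = {j: math.exp(values[j] - prices[j]) for j in values}
    denom = 1.0 + sum(w.values())
    return {j: wj / denom for j, wj in w.items()}, 1.0 / denom

def per_customer_revenue(values, prices):
    """Per-customer expected revenue: sum_j q_j(S, p) * p_j."""
    probs, _ = mnl_choice_probs(values, prices)
    return sum(probs[j] * prices[j] for j in probs)

# Hypothetical two-product assortment (illustrative numbers).
values = {1: 2.0, 2: 1.0}
prices = {1: 1.5, 2: 1.0}
probs, p0 = mnl_choice_probs(values, prices)
```

As expected for MNL, raising a product's price lowers its own choice probability and raises the no-purchase probability, which is the substitution pattern the model encodes.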
However, a key aspect is overlooked: the number of customer arrivals in each period is random and depends on the assortment-pricing decision. To capture the effect of assortment and pricing on arrivals, we model the number of arrivals in each period as a Poisson count with mean proportional to a decision-dependent arrival rate $\lambda(S, p)$, i.e., the arrival rate is a function of the assortment $S$ and prices $p$. Combining the arrival model and choice model together, given an assortment $S$ and prices $p$, the conditional mean of the per-period reward for this period is
$$\mathbb{E}[R \mid S, p] \propto \lambda(S, p) \cdot \sum_{j \in S} q_j(S, p)\, p_j.$$
This intuitive form highlights the coupling between per-customer revenue and per-period revenue induced by an assortment-pricing decision: a decision can improve conversion or margins yet reduce customer arrivals enough to lower total period revenue; conversely, one that appears worse in per-customer revenue can be optimal if it attracts substantially more customers. For example, discounting a "magnet" product may reduce sales of other products through substitution, but still increase overall revenue by drawing in more arrivals during the period; similarly, expanding the assortment by adding an item may dilute purchase probabilities, yet increase arrivals by making the offer set more attractive. These examples underscore that the effect of assortment and pricing on arrivals can take many forms. The key, then, is to develop a flexible model for the arrival rate that captures the dependency of arrivals on the assortment-pricing decision. To this end, instead of confining this dependency to a specific functional form, we parametrize $\lambda(S, p)$ through a rich set of basis functions of $(S, p)$, allowing the model to be expressive enough to accommodate a wide range of dependence structures.
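The trade-off described above can be sketched numerically. In the toy comparison below (all numbers are made up for illustration), decision B has a strictly lower per-customer revenue than decision A, yet a higher arrival rate makes B better per period, exactly the coupling $\mathbb{E}[R \mid S,p] \propto \lambda(S,p) \sum_j q_j(S,p)\, p_j$.

```python
import math

def mnl_revenue(values, prices):
    """Per-customer expected revenue under MNL: sum_j q_j(S, p) * p_j."""
    w = {j: math.exp(values[j] - prices[j]) for j in values}
    denom = 1.0 + sum(w.values())
    return sum(w[j] / denom * prices[j] for j in w)

def period_revenue(arrival_rate, values, prices, base_rate=10.0):
    """Per-period expected reward: Lambda * lambda(S, p) * sum_j q_j(S, p) p_j."""
    return base_rate * arrival_rate * mnl_revenue(values, prices)

# Hypothetical decisions (illustrative numbers, not from the paper):
# A: higher prices -> better margin per customer, but fewer arrivals.
# B: discounted "magnet" product -> worse per-customer revenue, more arrivals.
vals = {1: 2.0, 2: 1.5}
A = period_revenue(arrival_rate=1.0, values=vals, prices={1: 2.0, 2: 1.8})
B = period_revenue(arrival_rate=1.6, values=vals, prices={1: 1.2, 2: 1.8})
# B can dominate A per period despite the worse per-customer revenue.
```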
Since neither customer arrival nor choice behavior is known a priori, both must be learned from data generated under past decisions while maximizing cumulative reward over time. Specifically, in each period, the firm chooses an assortment and prices, observes the realized customer arrivals and purchases, and updates its inference on the model parameters using all available observations. Such an online learning problem falls under the umbrella of bandit problems (Lattimore and Szepesvári 2020), where a key challenge is to balance "exploration" and "exploitation" over the action space, i.e., feasible assortments and prices, so as to maximize cumulative reward, or equivalently, minimize cumulative regret relative to an oracle policy that knows the model parameters and always selects the optimal action. We propose a new algorithm, Poisson-MNL (PMNL), which jointly learns the parameters of the Poisson arrival model and the MNL choice model and provides dynamic assortment and pricing decisions using an upper confidence bound (UCB) approach. We establish regret bounds showing that PMNL is optimal up to a logarithmic factor in the time horizon $T$, and we show in simulation studies that it outperforms benchmarks that assume a fixed arrival rate when customer arrival is influenced by the offered assortment and pricing. Let us summarize our main contributions:

1. A decision-dependent arrival-choice model for joint assortment-pricing. We propose a Poisson-multinomial logit (Poisson-MNL) model that captures both customer arrivals and customer choices. Specifically, we model customer arrivals within each decision period via a Poisson model with a rate that depends on the offered assortment and corresponding prices, while we model customer purchase behavior using a contextual MNL model incorporating product features and prices.
This formulation captures how the assortment-pricing decision influences both the volume of customer arrivals and their purchase outcomes. Ignoring the dependency of customer arrivals on assortment and pricing, as in classical MNL models, can lead to suboptimal policies when decisions are made per-period rather than per-customer. Our model is also flexible and expressive. For the arrival model, we do not impose a pre-specified relationship between the arrival rate and the assortment-pricing decision. Instead, it suffices to identify a rich enough set of basis functions such that the log arrival rate is linear in these bases. For the choice model, we allow products to be characterized by a set of observable features, enabling learning at the attribute level and generalization to newly introduced products. We further allow these features (e.g., customer ratings) to change over time. Our framework nests the standard MNL model as a special case when the arrival rate is assumed to be fixed (i.e., the arrival model coefficients are zero).

2. A nearly optimal efficient algorithm (PMNL) for online learning. We develop an efficient online algorithm that sequentially provides joint assortment-pricing decisions based on observations available up to each period so as to maximize the cumulative expected revenue over a given time horizon $T$. The challenge lies in efficiently learning both the Poisson arrival and the MNL choice parameters based on highly dependent observations, while strategically taking actions (joint assortment-pricing) that balance the exploration-exploitation trade-off. Our algorithm adopts a two-stage design based on maximum likelihood estimation (MLE). The first stage conducts $O(\log T)$ rounds of exploration to obtain sufficiently accurate initial parameter estimates.
In the second stage, we take an upper confidence bound (UCB) approach to explore and exploit: in each round, the algorithm selects an assortment and prices by maximizing an upper confidence bound on the per-period expected reward. Constructing both the estimator and the upper bound is substantially more challenging than in standard MNL bandit models, due to three intertwined difficulties: (i) Poisson arrivals introduce unbounded and non-sub-Gaussian random variables that require new tools beyond those used in standard MNL analyses; (ii) arrival randomness complicates the analysis of the choice model parameter estimation by bringing extra randomness into the observations of purchase outcomes, in addition to the choice randomness; and (iii) the unknown arrival parameters enter MNL error bounds, whereas the algorithm requires confidence bounds that are free from these unknowns. We address these challenges by leveraging concentration inequalities for martingales with increments satisfying Bernstein conditions, carefully designing and analyzing estimators and statistics whose randomness is dominated by that induced by the choice model, and deriving fully data-dependent error bounds. We further sharpen the constants in the error bounds with new analytical tools, resulting in significantly improved practical performance.

3. Non-asymptotic regret bounds. We establish a non-asymptotic upper bound for the expected regret of PMNL of order $\sqrt{T}\log T$ for all $T$, and a matching non-asymptotic lower bound of order $\Omega(\sqrt{T})$, implying that our algorithm is nearly minimax optimal. To our knowledge, both bounds are the first of their kind for dynamic joint assortment and pricing with contextual information and decision-dependent arrivals.
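The explore-then-UCB skeleton described above can be illustrated with a deliberately simplified sketch over a finite action set. This is not the PMNL algorithm itself (PMNL maintains MLEs of the arrival and choice parameters and maximizes a parameter-based confidence bound over assortments and prices); it only shows the two-stage structure, and every name and constant below is hypothetical.

```python
import math

def explore_then_ucb(candidates, observe_reward, T, n_explore):
    """Schematic two-stage loop: forced exploration, then UCB exploitation.

    candidates: finite list of hashable (assortment, price) actions.
    observe_reward: callable returning the observed per-period reward.
    Stage 1 plays actions round-robin for n_explore rounds; Stage 2 plays
    the action maximizing the bonus-inflated mean  m + sqrt(2 log t / n)."""
    counts = {a: 0 for a in candidates}
    means = {a: 0.0 for a in candidates}
    total = 0.0
    for t in range(1, T + 1):
        if t <= n_explore:                      # Stage 1: forced exploration
            action = candidates[(t - 1) % len(candidates)]
        else:                                   # Stage 2: UCB exploitation
            action = max(candidates, key=lambda a: means[a]
                         + math.sqrt(2 * math.log(t) / max(counts[a], 1)))
        r = observe_reward(action)
        counts[action] += 1
        means[action] += (r - means[action]) / counts[action]  # running mean
        total += r
    return total, counts

# Toy run with three abstract actions and deterministic rewards.
cands = ["A", "B", "C"]
payoff = {"A": 1.0, "B": 2.0, "C": 0.5}
total, counts = explore_then_ucb(cands, lambda a: payoff[a], T=200, n_explore=30)
```

After the exploration stage, the UCB rule concentrates play on the best action while occasionally revisiting others as their confidence bonuses grow, which is the same exploration-exploitation balance PMNL strikes in parameter space.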
To compare with classical MNL bounds, note that under specific configurations of assumptions for our model parameters, the arrival rate is decision-irrelevant, the choice model is price-irrelevant, and the difference from the classical MNL framework reduces to whether the assortment and pricing decisions can be changed per-customer or per-period. Due to the constant arrival rate, the upper and lower bounds under our setup can be naturally translated to the classical MNL framework while retaining the rate, after reindexing $T$ from periods to customers. Notably, our upper bound improves upon the state-of-the-art results ($O(\sqrt{T}\log T)$) and our lower bound is free from the stringent conditions required by existing MNL lower bounds. In addition to the non-asymptotic lower bound, we also provide an asymptotic lower bound showing the dependence on parameter dimensions.

1.1 Related Literature

Dynamic Assortment and Pricing. Our paper first contributes to the extensive literature on dynamic assortment and pricing. In dynamic assortment, discrete choice models are widely accepted, in particular the multinomial logit (MNL) choice model (Rusmevichientong et al. 2010, Chen and Wang 2017, Agrawal et al. 2019, Chen et al. 2020). The MNL framework has been extended to contextual settings by incorporating product features (Cheung and Simchi-Levi 2017, Chen et al. 2020, Miao and Chao 2022, Lee et al. 2025), as well as to adversarial contextual settings (Perivier and Goyal 2022, Lee and Oh 2024). For contextual MNL bandits, Chen et al. (2020) and Oh and Iyengar (2021) establish a regret bound of $O(d_z \sqrt{T}\log T)$ using different proof strategies. For lower bounds, Chen et al. (2020) establish an asymptotic lower bound of $\Omega(d_z \sqrt{T}/K)$, and Lee and Oh (2024) establish $\Omega(d_z \sqrt{T/K})$ for a class of algorithms that select the same product $K$ times in each period.
Our Poisson-MNL model is motivated by practical operational settings where decisions are made per-period instead of per-customer, and it incorporates a Poisson model to account for the random arrival count that depends on the decision. Under specific configurations of assumptions for our model, our regret bound can be naturally translated to the contextual MNL frameworks with a non-asymptotic upper bound of order $d_z \sqrt{T}\log T$, a non-asymptotic lower bound of order $\sqrt{T}$, and an asymptotic lower bound of order $d_z \sqrt{T}$ (holding $K$ constant), all under weaker assumptions.

Dynamic pricing is another major area in revenue management (Kleinberg and Leighton 2003, Araman and Caldentey 2009, Besbes and Zeevi 2009, Broder and Rusmevichientong 2012, den Boer and Zwart 2014, Keskin and Zeevi 2014). Recent research in dynamic pricing further considers customer characteristics (Ban and Keskin 2021, Chen and Gallego 2021, Bastani et al. 2022) and product features (Qiang and Bayati 2016, Javanmard and Nazerzadeh 2019, Cohen et al. 2020, Miao et al. 2022, Fan et al. 2024). Much of this literature focuses on single-product settings, while many sellers must price multiple products simultaneously. The (multinomial/nested) logit models have been applied to multi-product pricing problems (Akçay et al. 2010, Gallego and Wang 2014), and more recent work incorporates product features into the demand model (Javanmard et al. 2020, Ferreira and Mower 2023). For example, Javanmard et al. (2020) establish a regret bound of $O(\log(T d_z)(\sqrt{T} + d_z \log T))$ and a lower bound of $\Omega(\sqrt{T})$. Similar to dynamic assortment, our bounds can be translated to such dynamic pricing settings and are better. More importantly, our focus is on dynamic joint assortment and pricing decisions made per-period, with a random number of customer arrivals whose rate depends on the decision.

While dynamic assortment and pricing problems have been studied separately and extensively, research on the joint assortment-pricing problem is relatively sparse. Chen et al. (2022a) study this problem in an offline setting, and Miao and Chao (2021) propose a Thompson-sampling-based algorithm using an MNL choice model with product-specific mean utilities and price sensitivities. In contrast, we incorporate the product features and, more importantly, allow customer arrivals to vary depending on the offered assortment and prices. Table 1 compares our method with related MNL-based methods.

Table 1: Comparison of PMNL and related MNL methods for dynamic assortment and/or pricing problems with $T$ rounds, $N$ products, maximum assortment size $K$, and, if any, $d_z$-dimensional product features (contexts) and a $d_x$-dimensional set of basis functions on assortment-pricing for the arrival rate model.

| Method / Paper | Assortment | Pricing | Arrival | Context | Upper bound | Lower bound |
| This paper (PMNL) | ✓ | ✓ | ✓ | ✓ | $O((d_z + d_x)\sqrt{T}\log T)$ | $\Omega((d_z + \sqrt{d_x})\sqrt{T})$ |
| Agrawal et al. (2019) | ✓ | ✗ | ✗ | ✗ | $O(\sqrt{NT\log(NT)})$ | $\Omega(\sqrt{NT/K})$ |
| Chen et al. (2020) | ✓ | ✗ | ✗ | ✓ | $O(d_z\sqrt{T}\log T)$ | $\Omega(d_z\sqrt{T}/K)$ |
| Oh and Iyengar (2021) | ✓ | ✗ | ✗ | ✓ | $O(d_z\sqrt{T}\log T)$ | $\Omega(d_z\sqrt{T}/K)$ |
| Lee and Oh (2024) | ✓ | ✗ | ✗ | ✓ | $O(d_z\sqrt{T/K})$ (adversarial) | $\Omega(d_z\sqrt{T/K})$ |
| Javanmard et al. (2020) | ✗ | ✓ | ✗ | ✓ | $O(\log(Td_z)(\sqrt{T} + d_z\log T))$ | $\Omega(\sqrt{T})$ |
| Miao and Chao (2021) | ✓ | ✓ | ✗ | ✗ | $O(\sqrt{NT}\log(NT))$ | – |
| Ferreira and Mower (2023) | ✗ | ✓ | ✓ | ✗ | – | – |

Bandits. The dynamic assortment and pricing problems are closely related to the bandit problem, which dates back to the seminal work of Robbins (1952). In each round, a decision-maker chooses an action (arm) and then observes a reward. The goal is to act strategically to minimize cumulative regret.
There is now an extensive literature on the bandit problem, including multi-armed bandits, contextual bandits, linear bandits, and generalized linear bandits; we refer the reader to the comprehensive book by Lattimore and Szepesvári (2020) and references therein for more background. The dynamic assortment problem can be cast as a multi-armed bandit problem: for example, each feasible assortment can be treated as an arm. Such a naive formulation, however, results in $\binom{N}{K}$ arms and thus suffers from the curse of dimensionality. MNL bandits provide one tractable way to impose structure on this combinatorial action space through an MNL choice model (Agrawal et al. 2019), and contextual MNL bandits further account for product features (Chen et al. 2020). While the MNL structure is widely appreciated in the operations management community due to its connection to utility theory, it is highly specified and therefore requires careful, model-specific analysis, as already demonstrated by the aforementioned assortment/pricing literature, and even more so for our Poisson-MNL model built upon it.

Customer Arrivals. Customer arrival is a key component of any service system and is often modeled using Poisson models (Poisson 1837, Kingman 1992). In assortment and pricing optimization, there exists a stream of literature that combines the MNL model with Poisson arrival models (Vulcano et al. 2012, Abdallah and Vulcano 2021, Wang 2021). Their focus is on the offline estimation problem instead of the dynamic assortment and pricing decision problem, and they typically impose restrictive parametric forms on the arrival rate. In contrast, we focus on the dynamic decision problem, and our arrival model is designed to be flexible and expressive: we only need to identify a set of basis functions on assortment-pricing such that the log arrival rate is linear in these bases.
Recently, Ferreira and Mower (2023) propose a demand learning and dynamic pricing algorithm for varying assortments in each accounting period. They also use a Poisson model for customer arrivals, assuming the arrival rate is an unknown absolute constant. They adopt a learn-then-earn approach: the first learning stage ("price to learn") learns the parameters in the choice model by offering prices that maximize the expected information gain, i.e., by maximizing the determinant of the Fisher information matrix; the second stage ("price to earn") chooses prices in a greedy fashion to maximize the expected revenue. In each period of the second stage, after observing rewards, both the choice and arrival models are updated using all available data. They demonstrate the effectiveness of their algorithm through a controlled field experiment with an industrial partner by benchmarking it against their baseline policies, but do not provide regret bounds. Our setting and results differ in several respects. First, we allow the Poisson arrival rate to depend on the assortment-pricing decision, whereas they assume the arrival rate to be fixed. Second, our method makes joint assortment-pricing decisions, while theirs optimizes prices for varying assortments that are given at the beginning of each period. Finally, we prove that our algorithm is nearly optimal.

1.2 Notation

We use bold lowercase letters to denote vectors (e.g., $a$) and bold uppercase letters to denote matrices (e.g., $A$). The Euclidean norm of $a$ is $\|a\|_2$. For a matrix $A$, its operator (spectral) norm is $\|A\|_{\mathrm{op}} := \sup_{\|x\|_2 = 1} \|Ax\|_2$. For a symmetric matrix $A$, let $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ denote its smallest and largest eigenvalues, respectively. For matrices $A$, $B$ of the same dimension, we write $A \preceq B$ if $B - A$ is positive semidefinite, and $A \prec B$ if $B - A$ is positive definite.
For any symmetric positive definite matrix $A$, define the $A$-weighted norm by $\|x\|_A := \sqrt{x^\top A x}$. For an integer $N \geq 1$, let $[N] := \{1, 2, \ldots, N\}$. We use $\mathbb{1}\{\cdot\}$ for the indicator function. We adopt the standard Landau symbols $O(\cdot)$ and $\Omega(\cdot)$ to denote asymptotic upper and lower bounds, respectively. We use Poi for the Poisson arrival model and MNL for the multinomial logit (MNL) choice model.

1.3 Outline

The remainder of the paper is organized as follows. In Section 2, we formally describe the generalized MNL model with an unknown and time-sensitive customer arrival Poisson process. Section 3 presents our PMNL algorithm for demand learning and dynamic joint assortment-pricing decision-making. Section 4 provides the regret bound and a matching lower bound (up to $\log(T)$) for our algorithm. In Section 5, we evaluate the performance of our algorithm and compare it with other existing algorithms. Section 6 concludes. Details of the proofs are deferred to the Appendix.

2 Problem Formulation

2.1 Choice Model with Poisson Arrival

Consider a retailer selling $N$ available products, indexed by $j \in [N]$. The retailer makes assortment-pricing decisions over a horizon of $T$ periods, indexed by $t \in [T]$. The time period can be of any predetermined granularity (e.g., by day, week, month, or by the arrival of one customer). At the start of each time period $t$, the retailer observes a set of $d_z$-dimensional product features $z_t = \{z_{jt}\}_{j=1}^N$, where $z_{jt} \in \mathbb{R}^{d_z}$. Using the features and past observations up to time $t$, the retailer offers an assortment $S_t \in \mathcal{S} \subset [N]$ under the cardinality constraint $|S_t| = K$, with prices $p_t = (p_{jt})_{j=1}^N \in \mathcal{P} \subset \mathbb{R}^N_{++}$, where $\mathbb{R}^N_{++}$ denotes the set of $N$-dimensional vectors with positive entries.
The retailer then observes $n_t$ customers arriving during the time period, where each customer $i \in [n_t]$ either purchases one item from the assortment, i.e., $C_t^{(i)} \in S_t$, or does not purchase, i.e., $C_t^{(i)} = 0$. If product $j \in S_t$ is chosen, the retailer earns revenue $r_i = p_{jt}$; otherwise, if $C_t^{(i)} = 0$, then $r_i = 0$. Note that the assortment and prices are fixed within each period and only change across periods, which reflects the practice of many retailers, who prefer to adjust decisions at regular intervals, as discussed previously. To capture both customer arrivals and their subsequent purchasing behavior, we combine a Poisson arrival model with the choice model described below.

Poisson arrival model. In each period $t$, the number of customers arriving, $n_t$, follows a Poisson distribution with mean arrival rate $\Lambda_t$:
$$n_t \sim \mathrm{Poisson}(\Lambda_t), \quad (1)$$
where $\Lambda_t = \Lambda \lambda_t$, with $\Lambda \in \mathbb{R}_{++}$ being a known positive base arrival rate, which depends on the predetermined granularity, and $\lambda_t$ capturing the arrival rate per unit time, which depends on the current assortment $S_t$ and the price vector $p_t$. The base arrival rate $\Lambda$ acts as a scaling factor that adjusts for the length of time intervals. For instance, if the retailer wants to change the granularity from day to week, we can simply multiply the base arrival rate by seven. As noted before, most existing literature fails to consider the dependence of customer arrivals on the assortment/prices and often assumes a fixed arrival rate, often normalized to one, i.e., $\lambda_t = 1$, which may lead to less profitable decisions. To account for the dependency, we model the arrival rate to explicitly depend on the current assortment $S_t$ and prices $p_t$.
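To make the arrival model concrete, the following sketch simulates one period's arrival count $n_t \sim \mathrm{Poisson}(\Lambda \lambda_t)$. Python's standard library has no Poisson sampler, so a simple Knuth-style sampler is included; the base rate and unit rate below are illustrative values, not values from the paper.

```python
import math
import random

def sample_poisson(mean, rng):
    """Knuth's multiply-uniforms Poisson sampler (fine for moderate means)."""
    threshold = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

rng = random.Random(0)
base_rate = 10.0   # Lambda: known scaling for the period length (hypothetical)
unit_rate = 1.3    # lambda_t: decision-dependent unit arrival rate (hypothetical)
n_t = sample_poisson(base_rate * unit_rate, rng)  # customers arriving this period
```

Changing the granularity from day to week only rescales `base_rate` by seven; the decision-dependent part `unit_rate` is untouched, mirroring the $\Lambda_t = \Lambda \lambda_t$ factorization.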
In particular, we assume that the unit arrival rate $\lambda_t$ takes a log-linear form
$$\lambda_t := \lambda(S_t, p_t; \theta^*) = \exp(\theta^{*\top} x(S_t, p_t)), \quad (2)$$
where $x(S_t, p_t)$ is a set of sufficient statistics that fully captures the dependence of the arrival rate on the assortment and prices, and $\theta^* \in \mathbb{R}^{d_x}$ is the unknown parameter of dimension $d_x$. Without loss of generality, we assume $\mathrm{span}(\{x(S, p) \mid S \in \mathcal{S}, p \in \mathcal{P}\})$ is full rank with rank $d_x$ (see more discussion on the rank in Appendix EC.2). Such a log-linear form is simple yet flexible. It is commonly used in Poisson regression to model the relationships between predictors and count outcomes (Brown 1986, Winkelmann 2008). The set of sufficient statistics $x(S_t, p_t)$ can be flexibly defined based on the specific context. For example, $x(S_t, p_t)$ can include terms that capture various aspects of assortment and pricing structure: the inherent attractiveness of individual products, price effects, pairwise interactions between items, and, when available, product features. We showcase two specific forms of $\lambda_t$ in the following remark.

Remark 1. (Two examples of $\lambda_t$) Our proposed succinct form of the arrival rate is flexible enough to incorporate economic principles from the literature. For example, one can consider the following form to account for price sensitivity and product variety:
$$\lambda_t := \prod_{i \in S_t} \left( \frac{p_i}{p_h} \right)^{-\alpha_i} = \exp\left( -\sum_{i \in S_t} \alpha_i \log \frac{p_i}{p_h} \right), \quad (3)$$
where $p_h$ denotes the highest feasible price. Price sensitivity is modeled through the negative dependence of $\lambda_t$ on $p_{it}$, with lower prices attracting more customers and the magnitude of this effect governed by item-specific $\alpha_i > 0$. The summation captures assortment variety, as offering more products can attract higher customer arrivals, which aligns with the marketing literature on the positive impact of product variety on customer attraction (Lancaster 1990, Kahn 1995).
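A minimal numerical sketch of the form in Eq. (3), with made-up $\alpha_i$ and prices, confirms the two effects just described: discounting a product raises the arrival rate, and so does widening the assortment, since each factor $(p_i / p_h)^{-\alpha_i} \geq 1$ whenever $p_i \leq p_h$ and $\alpha_i > 0$.

```python
import math

def arrival_rate(prices_in_S, alphas, p_max):
    """Unit arrival rate of Eq. (3): prod_{i in S} (p_i / p_max) ** (-alpha_i).

    prices_in_S maps each offered product i to its price p_i; alphas maps
    product i to its sensitivity alpha_i > 0; p_max is the highest feasible
    price p_h."""
    return math.prod((p / p_max) ** (-alphas[i])
                     for i, p in prices_in_S.items())

# Illustrative parameters (not from the paper).
alphas = {1: 0.5, 2: 0.5, 3: 0.5}
p_h = 10.0
base = arrival_rate({1: 8.0}, alphas, p_h)
discounted = arrival_rate({1: 5.0}, alphas, p_h)      # lower price -> more arrivals
wider = arrival_rate({1: 8.0, 2: 8.0}, alphas, p_h)   # more products -> more arrivals
```

Pricing every offered product at $p_h$ recovers a unit rate of exactly one, so the model reduces to a fixed arrival rate in that corner case.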
This simple model is effective in capturing the overall impact of assortment and price, but it ignores dependencies between products, such as complementarity or substitution. An alternative form that incorporates these dependencies is given by
$$\lambda_t := \exp\left( \sum_{i \in S_t} \alpha_i p_{it} + \sum_{\substack{i, j \in S_t \\ i \neq j}} \beta_{ij} \frac{p_{it}}{p_{jt}} \right), \quad (4)$$
where the first sum captures the individual effects and the second sum captures the pairwise effects. The individual effects account for the inherent utility of each product, represented by $\alpha_i$, and the price sensitivity. This summation also reflects the product variety effect (Kahn 1995), as the arrival rate increases with the inclusion of more products in the assortment.

The second sum captures the effect of relative prices of item pairs. When $\beta_{ij}$ is positive, the pairwise effect is complementary, meaning that the presence of both items in the assortment increases the arrival rate, and the effect is more pronounced when the price of product $i$ is high relative to product $j$, i.e., when $p_{it}/p_{jt}$ is large. This complementary effect enables interesting dynamics (Wang 2021). Including a new product in the assortment may cannibalize the existing products, yet the total sales of the existing products may increase if the arrival rate increases enough. On the other hand, reducing the price of a product may allow other products to take a "free-rider" advantage, benefiting from the higher arrival rate and potentially boosting their total sales. The summation also captures a price variety effect: higher price variation within the assortment, as represented by the price ratios, can lead to a higher arrival rate. When $\beta_{ij}$ is negative, the pairwise effect becomes substitutive, meaning that the presence of both items in the assortment reduces the arrival rate. This substitutive effect can arise from customer perceptions of redundancy and choice overload.
Specifically, when similar products have large price differences, customers may become skeptical of the pricing strategy, potentially eroding trust and leading to reduced arrivals.

Remark 2. (Exogenous factors for $\lambda_t$) For simplicity, we focus on the case where $\lambda_t$ depends only on the assortment and pricing; however, our model can be extended to incorporate other exogenous factors that influence arrivals, such as macroeconomic conditions and seasonal effects.

MNL choice model. During period $t$, each customer $i \in [n_t]$ either purchases a product or makes no purchase, i.e., $C_t^{(i)} \in S_t \cup \{0\}$, according to an MNL model. Specifically, the probability of customer $i$ choosing product $j$ is given by
$$q(j, S_t, p_t, z_t; v^*) = \mathbb{P}(C_t^{(i)} = j \mid S_t, p_t, z_t; v^*) = \begin{cases} \dfrac{\exp(v^{*\top} z_{jt} - p_{jt})}{1 + \sum_{k \in S_t} \exp(v^{*\top} z_{kt} - p_{kt})}, & \forall j \in S_t; \\[1ex] \dfrac{1}{1 + \sum_{k \in S_t} \exp(v^{*\top} z_{kt} - p_{kt})}, & j = 0, \end{cases} \quad (5)$$
where $v^* \in \mathbb{R}^{d_z}$ are unknown preference parameters that characterize the impact of product features on the intrinsic value of the products. We allow the product features, such as ratings and popularity scores, to change over time, thereby allowing the utility of products to evolve over time. For notational simplicity, when there is no ambiguity, we use $q_t(j; v^*)$ to denote the choice probability of product $j$ for $j \in S_t$ and the non-purchase probability when $j = 0$ at time $t$. Under the choice model, the expected revenue from each customer $i$ is
$$r(S_t, p_t, z_t; v^*) = \mathbb{E}\left[ \sum_{j \in S_t} p_{jt}\, \mathbb{1}(C_t^{(i)} = j \mid S_t, p_t, z_t; v^*) \right] = \sum_{j \in S_t} p_{jt}\, q_t(j; v^*). \quad (6)$$
Then the expected revenue in time period $t$ across all $n_t$ customers is given by
$$R_t(S_t, p_t) = R(S_t, p_t, z_t; v^*, \theta^*) = \mathbb{E}[n_t\, r(S_t, p_t, z_t; v^*)] = \Lambda\, \lambda(S_t, p_t; \theta^*) \sum_{j \in S_t} p_{jt}\, q_t(j; v^*), \quad (7)$$
which is the expected number of customers multiplied by the expected revenue from each customer.

2.2 Retailer's Objective and Regret

The objective of the retailer is to design a policy $\pi$ that chooses a sequence of history-dependent actions $(S_1, p_1, S_2, p_2, \ldots, S_T, p_T)$ so as to maximize the expected cumulative revenue over $T$ periods, $\mathbb{E}_\pi\left[\sum_{t=1}^T R_t(S_t, p_t)\right]$. Formally, a policy is a sequence of (stochastic) functions $\pi = \{\pi_t\}_{t=1}^T$, where each $\pi_t$ maps a history of actions and observed outcomes up to time $t$ to the assortment and pricing decision at time $t$ in a stochastic sense, i.e., $\pi_t : H_t \to (S_t, p_t)$, where $H_t$ represents the history up to time $t$ and is defined as
$$H_t = \left\{ C_1^{(1)}, C_1^{(2)}, \ldots, C_1^{(n_1)}, \ldots, C_{t-1}^{(1)}, \ldots, C_{t-1}^{(n_{t-1})}, S_1, \ldots, S_{t-1}, p_1, \ldots, p_{t-1}, z_1, \ldots, z_t \right\}. \quad (8)$$
Note that $\pi_t$ can be stochastic in the sense that its action output has randomness, i.e., $\pi_t(H_t)$ is a random variable. Given a policy $\pi$, we use $\mathbb{P}_\pi\{\cdot\}$ and $\mathbb{E}_\pi\{\cdot\}$ to denote the probability measure and expectation when actions are taken following policy $\pi$.

If the parameters associated with the arrival model, $\theta^*$, and the choice model, $v^*$, were known a priori, then the retailer could choose an optimal assortment $S_t^* \in \mathcal{S}$ and prices $p_t^* \in \mathcal{P}$ that maximize the expected revenue (7) for each period, i.e., $(S_t^*, p_t^*) := \arg\max_{S, p} R_t(S, p)$. This optimal solution yields an optimal cumulative revenue over the time horizon $T$: $\sum_{t=1}^T R_t(S_t^*, p_t^*)$, or equivalently, $\sum_{t=1}^T \Lambda\, \lambda(S_t^*, p_t^*; \theta^*)\, r(S_t^*, p_t^*, z_t; v^*)$.
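When $N$ is tiny and prices are restricted to a grid, the oracle decision $(S_t^*, p_t^*) = \arg\max_{S,p} R_t(S, p)$ can be made concrete by brute force. The sketch below assumes the arrival-rate form of Eq. (3) with made-up parameters and no product features; it only illustrates the benchmark, since exhaustive search over all $K$-subsets and gridded prices is exponential and not how a practical policy would operate.

```python
import itertools
import math

def oracle_decision(values, N, K, price_grid, alphas, p_max, base_rate=1.0):
    """Brute-force oracle: argmax over K-subsets and gridded prices of
    Lambda * lambda(S, p) * sum_j p_j q_j(S, p), with the arrival rate of
    Eq. (3).  All parameters here are hypothetical toy inputs."""
    best, best_rev = None, -1.0
    for S in itertools.combinations(range(1, N + 1), K):
        for price_tuple in itertools.product(price_grid, repeat=K):
            p = dict(zip(S, price_tuple))
            lam = math.prod((p[i] / p_max) ** (-alphas[i]) for i in S)
            w = {j: math.exp(values[j] - p[j]) for j in S}
            denom = 1.0 + sum(w.values())
            rev = base_rate * lam * sum(w[j] / denom * p[j] for j in S)
            if rev > best_rev:
                best, best_rev = (S, p), rev
    return best, best_rev

values = {1: 2.0, 2: 1.5, 3: 1.0}   # intrinsic values (made up)
alphas = {1: 0.4, 2: 0.4, 3: 0.4}   # price sensitivities (made up)
action, rev = oracle_decision(values, N=3, K=2,
                              price_grid=[1.0, 1.5, 2.0],
                              alphas=alphas, p_max=2.0)
```

An online policy is then judged by how far its cumulative revenue falls short of replaying this oracle in every period.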
This optimal value is not attainable because $(\theta^*, v^*)$ is unknown in practice, but it serves as a useful benchmark for the performance of any algorithm. Using this benchmark, we evaluate a policy $\pi$ by its cumulative regret, that is, the deficit between the expected cumulative revenue over the time horizon $T$ of the optimal solution and that of $\pi$:

$$
R_\pi(T) = \left\{\mathbb{E}\Big[\sum_{t=1}^T \lambda(S^*_t, p^*_t; \theta^*)\, r(S^*_t, p^*_t, z_t; v^*)\Big] - \mathbb{E}_\pi\Big[\sum_{t=1}^T \lambda(S_t, p_t; \theta^*)\, r(S_t, p_t, z_t; v^*)\Big]\right\} \Lambda. \tag{9}
$$

To estimate the unknown parameters $(\theta^*, v^*)$, we need to design an algorithm that simultaneously learns the Poisson arrival model and the MNL choice model on the fly (exploration) while maximizing the cumulative revenue (exploitation). This exploration–exploitation problem for dynamic assortment and pricing, which we refer to as the Poisson-MNL bandit, is our focus.

2.3 Assumptions

Before presenting the algorithm, we first state several assumptions. These assumptions are mild and common for MNL models, and we discuss them in detail below. Importantly, our algorithm itself remains executable without these assumptions, and we expect it to behave well even if some assumptions fail to hold.

Assumption 1. The feasible assortment set $\mathcal{S}$ consists of all $K$-subsets of $[N]$.

Assumption 2. The feasible price vector $p \in \mathcal{P}$ lies within the range $[p_l, p_h]^N$ for positive constants $0 < p_l < p_h$.

Assumption 3. The MNL choice model preference parameter $v^* \in \mathbb{R}^{d_z}$ satisfies $\|v^*\|_2 \le \bar{v}$.

Assumption 4. The Poisson arrival model parameter $\theta^* \in \mathbb{R}^{d_x}$ satisfies $\|\theta^*\|_2 \le 1$.

Assumption 5. For any feasible assortment $S \in \mathcal{S}$ and any feasible price $p \in \mathcal{P}$, the sufficient statistic $x(S, p)$ is scaled such that $\|x(S, p)\|_2 \le \bar{x}$ for some constant $\bar{x} > 0$.

Assumption 6. The product feature sequence $\{z_t\}_{t=1}^T$ is i.i.d.
sampled from an unknown distribution with a density $\mu$, where $\|z_{jt}\|_2 \le 1$ for each $j \in [N]$ and $t \in [T]$. The distribution $\mu$ satisfies the following condition: we can construct a pre-determined sequence of assortments $\{S^{\mathrm{init}}_s\}_{s=1}^t$ such that for any $t \ge t_0$, where $t_0 = \max\big\{\big\lceil \frac{\log(d_z T)}{\sigma_0 (1 - \log 2)} \big\rceil,\ 2 d_x\big\}$, the following holds with probability at least $1 - T^{-1}$:

$$
\sigma_{\min}\Big(\sum_{s=1}^t \sum_{j \in S^{\mathrm{init}}_s} z_{js} z_{js}^\top\Big) \ge \sigma_0 t, \tag{10}
$$

where $\sigma_0$ is a positive constant dependent on $\mu$. Furthermore, there exist a positive constant $\sigma_1$ and a price sequence $\{p^{\mathrm{init}}_s\}_{s=1}^t$ such that for any $t \ge t_0$,

$$
\sigma_{\min}\Big(\sum_{s=1}^t x(S^{\mathrm{init}}_s, p^{\mathrm{init}}_s)\, x^\top(S^{\mathrm{init}}_s, p^{\mathrm{init}}_s)\Big) \ge \sigma_1 t. \tag{11}
$$

Assumptions 1 and 2 impose standard feasibility conditions on the assortment and price sets as in the literature (Chen et al. 2020, 2021). Compared to Assumption 1, other related work considers a feasible set of up-to-$K$-product assortments (Agrawal et al. 2019, Oh and Iyengar 2021). This assumption admits a larger feasible set and thus enlarges the set of legitimate algorithms. Our algorithm also works for this alternative feasible set, with the same theoretical guarantee, and our lower bound applies. Assumption 2 defines the feasible price range as $[p_l, p_h]$: the lower bound typically reflects either the minimal currency unit or the product cost, and the upper bound comes from the idea of a "choke price", beyond which demand is effectively zero.

Assumption 3 assumes boundedness of the parameter in the MNL choice model (5), which is standard in the contextual MNL-bandit literature (Chen et al. 2020, Oh and Iyengar 2021). By assuming $v^*$ is upper bounded by $\bar{v}$ in $\ell_2$ norm, we encode the common belief that product features typically have a bounded influence on product utility.
Assumptions 4 and 5 assume boundedness for the Poisson arrival model (2): they bound the influence of assortment-pricing decisions on the unit arrival rate $\lambda_t$ and the $\ell_2$-norm of the vector $x(S, p)$ for all feasible assortment-pricing decisions. These assumptions, together with the base arrival rate $\Lambda$, ensure that the arrival rate $\Lambda_t$ is bounded, reflecting the reality that the market size is inherently limited.

Assumption 6 assumes that the product feature vectors $z_{jt}$ for $j \in [N]$ and $t \in [T]$ are randomly generated from a compactly supported and non-degenerate density, with the additional isotropic conditions (10)-(11) on $z_{jt}$ and $x(\cdot, \cdot)$. These isotropic conditions ensure information availability in all directions and are necessary for an algorithm to converge. Assumption 6 can be easily satisfied and is weaker than its counterpart in the contextual MNL-bandit literature (Chen et al. 2021, Oh and Iyengar 2021), which we state below.

Assumption 7. The feature vectors $z_{jt}$ for $j \in [N]$ and $t \in [T]$ are i.i.d. across both $j$ and $t$ from an unknown distribution with density $\mu$ supported on $\{z \in \mathbb{R}^{d_z}: \|z\|_2 \le 1\}$. Additionally, the minimum eigenvalue of the expected covariance matrix $\mathbb{E}_\mu(z z^\top)$ is bounded below by a positive constant $\bar{\sigma}_0$.

Assumption 7 is stronger than the MNL component of Assumption 6 but serves the same purpose. We note that the corresponding assumption in Chen et al. (2021) and Oh and Iyengar (2021) is solely for assortment under an MNL model, without pricing or a Poisson arrival model. Under Assumption 7, Assumption 6 holds with $\sigma_0 = \frac{K \bar{\sigma}_0}{2}$, and we give the corresponding sequence required by Assumption 6 in Appendix EC.3.

3 Algorithm

In this section, we describe our PMNL algorithm for the Poisson-MNL bandit problem. Our algorithm involves two stages.
The first stage focuses on pure exploration to obtain a good initial estimator for $(\theta^*, v^*)$, which we refer to as the "pilot estimators" $(\hat{\theta}, \hat{v})$, by applying maximum likelihood estimation (MLE) to data from the first $T_0$ periods. This stage yields a small confidence region in which the true parameters lie with high probability; it is practically very important for the second stage to work well, though it does not affect the regret rate. The second stage builds on the upper confidence bound (UCB) strategy, also known as optimism in the face of uncertainty, to balance exploration and exploitation. Specifically, in each period we make the assortment-pricing decision that maximizes the upper confidence bound of the expected revenue based on the Fisher information, and then we update the parameters using a local MLE around the pilot estimators. The full procedure is described in Algorithm 1.

Algorithm 1: The PMNL algorithm for the Poisson-MNL bandit problem.

Input: Time horizon $T$, feasible assortment set $\mathcal{S}$, feasible price range $\mathcal{P}$, parameters $\bar{x}, \bar{v}, \Lambda, \sigma_0, \sigma_1, t_0$, and dimensions $d_z, d_x$.
Output: Assortments and prices $(S_{T_0+1}, p_{T_0+1}), \ldots, (S_T, p_T)$.
Initialization: $t = 1$, $t_0 = \max\big\{\big\lceil \frac{\log(d_z T)}{\sigma_0(1 - \log 2)} \big\rceil,\ 2 d_x\big\}$, $T_0 := \max\{t_0 + 1, \lfloor \log T \rfloor\}$, $c_0$ defined in Equation (26), $\tau_\theta$ and $\tau_v$ in Equations (28) and (30), $\omega_\theta$ and $\omega_v$ in Equations (33) and (35).

Stage 1: Pure Exploration with Global MLE
while $t \le T_0$ do
  Observe product features $z_t = \{z_{jt}\}_{j=1}^N$;
  Set the assortment and prices as $S_t = S^{\mathrm{init}}_t$ and $p_t = p^{\mathrm{init}}_t$ as in Assumption 6;
  Observe $n_t$ customer arrivals with their decisions $C^{(i)}_t \in S_t \cup \{0\}$ for $i = 1, \ldots, n_t$;
  $t \leftarrow t + 1$;
end while
Compute the pilot estimators using global MLE:
  $\hat{\theta} = \arg\max_{\|\theta\|_2 \le 1} L^{\mathrm{Poi}}_{T_0}(\theta)$;  $\hat{v} = \arg\max_{\|v\|_2 \le \bar{v}} L^{\mathrm{MNL}}_{T_0}(v)$.
Stage 2: Local MLE and Upper Confidence Bound (UCB)
for $t = T_0 + 1$ to $T$ do
  Observe product features $z_t = \{z_{jt}\}_{j=1}^N$;
  Compute the local MLEs:
    $\widehat{\theta}_{t-1} \in \arg\max_{\|\theta - \hat{\theta}\|_2 \le \tau_\theta} L^{\mathrm{Poi}}_{t-1}(\theta)$;  $\widehat{v}_{t-1} \in \arg\max_{\|v - \hat{v}\|_2 \le \tau_v} L^{\mathrm{MNL}}_{t-1}(v)$;
  For every assortment $S \in \mathcal{S}$ and price $p \in \mathcal{P}$, compute the upper confidence bound of the expected revenue:

$$
\bar{R}_t(S, p) := \Lambda\, \lambda(S, p; \widehat{\theta}_{t-1})\, r(S, p, z_t; \widehat{v}_{t-1})
+ p_h \min\Big\{\Lambda(e^{\bar{x}} - e^{-\bar{x}}),\ \sqrt{\Lambda e^{(2\tau_\theta + 1)\bar{x}}\, \omega_\theta\, \big\| I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\widehat{\theta}_{t-1})\, M^{\mathrm{Poi}}_t(\widehat{\theta}_{t-1} \mid S, p)\, I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\widehat{\theta}_{t-1}) \big\|_{\mathrm{op}}}\Big\}
+ \Lambda e^{\bar{x}} \min\Big\{p_h,\ \sqrt{c_0\, \omega_v\, \big\| \widehat{I}^{\mathrm{MNL}\,-\frac{1}{2}}_{t-1}(\widehat{v}_{t-1})\, \widehat{M}^{\mathrm{MNL}}_t(\widehat{v}_{t-1} \mid S, p)\, \widehat{I}^{\mathrm{MNL}\,-\frac{1}{2}}_{t-1}(\widehat{v}_{t-1}) \big\|_{\mathrm{op}}}\Big\};
$$

  Compute and set $(S_t, p_t) := \arg\max_{S \in \mathcal{S},\, p \in \mathcal{P}} \bar{R}_t(S, p)$;
  Observe $n_t$ customer arrivals with their decisions $C^{(i)}_t \in S_t \cup \{0\}$ for $i = 1, \ldots, n_t$;
end for

As noted in the introduction, developing the algorithm is not a routine adaptation of standard MNL bandit algorithms and is technically more challenging. The Poisson arrival and MNL choice models are entangled, so the algorithm must jointly account for uncertainty from both arrivals and choices. In particular, constructing estimators for $(\theta^*, v^*)$ and, crucially, their corresponding upper bounds for the UCB strategy requires new technical tools. The main difficulties include:

1. The Poisson arrival model produces unbounded and non-sub-Gaussian random variables, which makes the mechanisms commonly used for deriving error bounds in assortment problems inapplicable (Oh and Iyengar 2021, Abbasi-Yadkori et al. 2011), thus requiring new tools. We address this problem by utilizing concentration inequalities for martingales with increments satisfying Bernstein conditions (Wainwright 2019).

2.
Arrival randomness affects the number of observed purchases, so the observations used to learn the choice model parameters mix randomness from both the arrival and choice models, making it more difficult to construct an estimator and its error bound for the choice model. We tackle this challenge by designing and analyzing statistics whose randomness primarily comes from the choice model rather than the arrival model.

3. The unknown parameters of the Poisson arrival model inevitably enter the error bounds for the MNL model estimators, whereas the algorithm requires confidence bounds that are free of unknown quantities. We resolve this problem by further bounding the error bound with a purely data-dependent statistic.

4. As the error bounds of the estimators enter UCB-type algorithms, their constants also matter for practical performance. Rather than relying on standard analytical techniques, we carry out a refined analysis and obtain better constants.

In what follows, we first introduce the likelihood function and then describe the two stages.

3.1 Likelihood Function

Both stages of our algorithm rely on MLE to estimate the unknown parameters $(\theta^*, v^*)$. We therefore first specify the likelihood function. The likelihood function at period $t$ can be written as

$$
\prod_{s=1}^t \Big\{\mathbb{P}\big(\mathrm{Poi}(\Lambda \lambda(S_s, p_s; \theta)) = n_s;\, \theta\big) \prod_{i=1}^{n_s} \prod_{j \in S_s \cup \{0\}} q_s(j; v)^{\mathbb{1}\{C^{(i)}_s = j\}}\Big\}.
$$

To obtain the maximum likelihood estimator, we maximize the log-likelihood function instead. Specifically, it decomposes into the sum of the log-likelihoods of the Poisson arrival model and the MNL choice model, as follows:

$$
L_t(\theta, v) := L^{\mathrm{Poi}}_t(\theta) + L^{\mathrm{MNL}}_t(v) = \sum_{s=1}^t \ell^{\mathrm{Poi}}_s(\theta) + \sum_{s=1}^t \ell^{\mathrm{MNL}}_s(v), \tag{12}
$$

where

$$
\ell^{\mathrm{Poi}}_s(\theta) := n_s \log \lambda(S_s, p_s; \theta) - \Lambda \lambda(S_s, p_s; \theta) + n_s \log \Lambda - \log n_s!, \tag{13}
$$

$$
\ell^{\mathrm{MNL}}_s(v) := \sum_{i=1}^{n_s} \sum_{j \in S_s \cup \{0\}} \mathbb{1}\{C^{(i)}_s = j\} \log q_s(j; v). \tag{14}
$$
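The decomposition in Eqs. (12)-(14) can be coded directly. The following is a hedged sketch, assuming the exponential arrival-rate form $\lambda(S, p; \theta) = \exp(\theta^\top x(S, p))$ and our own illustrative data layout:

```python
import math

def log_lik_poisson(theta, history, Lambda):
    """Sum of ell^Poi_s(theta) over periods, Eq. (13).

    history: list of (n_s, x_s) with n_s observed arrivals and
             x_s = x(S_s, p_s) the sufficient statistic of the decision.
    Assumes lambda(S, p; theta) = exp(theta^T x(S, p)).
    """
    total = 0.0
    for n_s, x_s in history:
        log_rate = sum(t * xi for t, xi in zip(theta, x_s))  # log lambda
        total += (n_s * log_rate - Lambda * math.exp(log_rate)
                  + n_s * math.log(Lambda) - math.lgamma(n_s + 1))  # log n_s!
    return total

def log_lik_mnl(v, history):
    """Sum of ell^MNL_s(v) over periods, Eq. (14).

    history: list of (choices, assortment, prices, features) per period,
             where choices lists C^(i)_s in S_s or 0 for each arriving customer.
    """
    total = 0.0
    for choices, S, prices, features in history:
        # Choice probabilities as in Eq. (5): q_s(j; v) and the outside option.
        utils = {j: math.exp(sum(vi * zi for vi, zi in zip(v, features[j]))
                             - prices[j]) for j in S}
        denom = 1.0 + sum(utils.values())
        for c in choices:
            q = utils[c] / denom if c != 0 else 1.0 / denom
            total += math.log(q)
    return total
```

Because the two pieces share no parameters, each can be maximized separately, which is exactly what the global and local MLE steps of the algorithm exploit.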
3.2 Stage 1: Pure Exploration for Pilot Estimators via Global MLE

The first stage of the algorithm obtains a pair of pilot estimators $(\hat{\theta}, \hat{v})$ that are close to the true parameters $(\theta^*, v^*)$ by the end of the first $T_0$ exploration periods. We begin by setting the length of this initial stage as $T_0 = \max\{t_0 + 1, \lfloor \log T \rfloor\}$, where $t_0$ is defined in Assumption 6. Note that $T_0$ is a very small number, meaning that this stage is very short and thus incurs only a small exploration cost. Keeping the exploration periods short is critical to firms, particularly for start-ups or businesses in competitive markets, as substantial initial losses can potentially jeopardize survival before they can leverage the insights to generate revenue.

Next, we make assortment-pricing decisions according to the initial sequence $\{S^{\mathrm{init}}_t, p^{\mathrm{init}}_t\}_{t=1}^{T_0}$ from Assumption 6. When the stronger condition in Assumption 7 holds, we can take the initial sequence as in Appendix EC.3. In practice, one can let $x$ be a vector of product dummies and prices, and construct the initial sequence by uniformly selecting an assortment and prices. More refined procedures to ensure Assumption 6 holds also exist.

With an initial sequence $\{S^{\mathrm{init}}_t, p^{\mathrm{init}}_t\}_{t=1}^{T_0}$ and the observations of customer arrivals and their choices, we proceed to compute the pilot estimators $(\hat{\theta}, \hat{v})$ by maximizing the log-likelihood of the first stage, $L_{T_0}(\theta, v)$. Given that the log-likelihood decomposes into $L^{\mathrm{Poi}}_{T_0}(\theta)$ and $L^{\mathrm{MNL}}_{T_0}(v)$, we can obtain the pilot estimators $(\hat{\theta}, \hat{v})$ by simply maximizing their corresponding log-likelihoods, i.e.,

$$
\hat{\theta} := \arg\max_{\|\theta\|_2 \le 1} L^{\mathrm{Poi}}_{T_0}(\theta), \qquad \hat{v} := \arg\max_{\|v\|_2 \le \bar{v}} L^{\mathrm{MNL}}_{T_0}(v). \tag{15}
$$

In Section 4.2.1, we show that the pilot estimators are close to the true parameters $(\theta^*, v^*)$.
In particular, with high probability, the estimation errors satisfy $\|\hat{\theta} - \theta^*\|_2 \le \tau_\theta$ and $\|\hat{v} - v^*\|_2 \le \tau_v$, where $\tau_\theta$ and $\tau_v$ are defined in Equations (28) and (30), respectively, and both are of order $O\big(\frac{\log T}{T_0}\big) = O(1)$. These bounds serve as the basis for the local MLE and the confidence bounds in the second stage.

3.3 Stage 2: Local MLE and Upper Confidence Bound (UCB)

In the second stage, we iteratively make assortment-pricing decisions leveraging the idea of UCB to balance exploration and exploitation, and update the estimates using local MLE. Given the pilot estimators $(\hat{\theta}, \hat{v})$ with their error bounds $\tau_\theta$ and $\tau_v$, at each period $t > T_0$ we obtain the local MLE $\widehat{\theta}_{t-1}$ (respectively, $\widehat{v}_{t-1}$) by maximizing the log-likelihood $L^{\mathrm{Poi}}_{t-1}(\theta)$ (respectively, $L^{\mathrm{MNL}}_{t-1}(v)$) within a ball centered at the pilot estimator $\hat{\theta}$ (respectively, $\hat{v}$) with radius $\tau_\theta$ (respectively, $\tau_v$):

$$
\widehat{\theta}_{t-1} := \arg\max_{\|\theta - \hat{\theta}\|_2 \le \tau_\theta} L^{\mathrm{Poi}}_{t-1}(\theta), \qquad \widehat{v}_{t-1} := \arg\max_{\|v - \hat{v}\|_2 \le \tau_v} L^{\mathrm{MNL}}_{t-1}(v). \tag{16}
$$

Given $(\widehat{\theta}_{t-1}, \widehat{v}_{t-1})$ and the product features $z_t$, we make an assortment-pricing decision $(S_t, p_t)$ that maximizes an upper confidence bound of the expected revenue $R_t(S, p)$. The upper confidence bound depends on both the estimates and the estimation errors of the Poisson arrival and MNL choice parameters, where the estimation error can be captured by the Fisher information of $\theta$ and $v$, defined as follows.

Definition 1. For the Poisson arrival model, we define the Fisher information with respect to $\theta$ over $t$ periods, $I^{\mathrm{Poi}}_t(\theta)$, given the history $\{\mathcal{H}_s\}_{s=1}^t$, as

$$
I^{\mathrm{Poi}}_t(\theta) := \sum_{s=1}^t M^{\mathrm{Poi}}_s(\theta), \quad \text{where} \quad M^{\mathrm{Poi}}_s(\theta) := \mathbb{E}\big[-\nabla^2_\theta \ell^{\mathrm{Poi}}_s(\theta) \mid \mathcal{H}_s\big] = \Lambda \lambda(S_s, p_s; \theta) \cdot x(S_s, p_s)\, x(S_s, p_s)^\top. \tag{17}
$$

Definition 2.
For the MNL choice model, we define the Fisher information with respect to $v$ over $t$ periods, $I^{\mathrm{MNL}}_t(v)$, given the history $\{\mathcal{H}_s\}_{s=1}^t$, as

$$
I^{\mathrm{MNL}}_t(v) := \sum_{s=1}^t M^{\mathrm{MNL}}_s(v), \quad \text{where} \quad M^{\mathrm{MNL}}_s(v) := \mathbb{E}\big[-\nabla^2_v \ell^{\mathrm{MNL}}_s(v) \mid \mathcal{H}_s\big] = \Lambda \lambda(S_s, p_s; \theta^*) \Big(\sum_{j \in S_s} q_s(j; v)\, z_{js} z_{js}^\top - \sum_{j, k \in S_s} q_s(j; v)\, q_s(k; v)\, z_{js} z_{ks}^\top\Big). \tag{18}
$$

Note that in Definitions 1 and 2, the assortment and pricing decisions are $S_s = \pi(\mathcal{H}_s)$ and $p_s = \pi(\mathcal{H}_s)$, where the policy $\pi$ is an algorithm to be designed that makes the decisions $S_s$ and $p_s$ based on the history $\mathcal{H}_s$. At period $t$, given the history $\mathcal{H}_t$ (which includes the product features $z_t$), we need to decide $S$ and $p$. To emphasize the dependence of the Fisher information on the different choices of $S$ and $p$ at period $t$, we write their summands as functions of $(S, p)$ for $S \in \mathcal{S}$ and $p \in \mathcal{P}$:

$$
M^{\mathrm{Poi}}_t(\theta \mid S, p) := \Lambda \lambda(S, p; \theta)\, x(S, p)\, x^\top(S, p), \tag{19}
$$

$$
M^{\mathrm{MNL}}_t(v \mid S, p) := \Lambda \lambda(S, p; \theta^*) \underbrace{\Big(\sum_{j \in S} q(j, S, p, z_t; v)\, z_{jt} z_{jt}^\top - \sum_{j, k \in S} q(j, S, p, z_t; v)\, q(k, S, p, z_t; v)\, z_{jt} z_{kt}^\top\Big)}_{\phi(S, p, z_t; v)}. \tag{20}
$$

Clearly, $M^{\mathrm{MNL}}_s(v) = M^{\mathrm{MNL}}_s(v \mid S_s, p_s)$.

To translate the estimation errors of $\widehat{\theta}_{t-1}$ and $\widehat{v}_{t-1}$ into the estimation error of the reward, two quantities are involved: $I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\widehat{\theta}_{t-1})\, M^{\mathrm{Poi}}_t(\widehat{\theta}_{t-1} \mid S, p)\, I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\widehat{\theta}_{t-1})$ and $I^{\mathrm{MNL}\,-\frac{1}{2}}_{t-1}(\widehat{v}_{t-1})\, M^{\mathrm{MNL}}_t(\widehat{v}_{t-1} \mid S, p)\, I^{\mathrm{MNL}\,-\frac{1}{2}}_{t-1}(\widehat{v}_{t-1})$, which change the weighting matrix of the norm in which the error bounds of the estimators can be derived. Since the latter involves the unknown ground truth $\theta^*$, as in Equation (20), we cannot compute it directly. Fortunately, we manage to construct a data-dependent bound based on the boundedness Assumptions 4 and 5, which imply $\exp\{-\bar{x}\} \le \lambda(S, p; \theta^*) = \exp\{\theta^{*\top} x(S, p)\} \le \exp\{\bar{x}\}$.

Definition 3.
The bounds of $I^{\mathrm{MNL}}_t(v)$ and $M^{\mathrm{MNL}}_t(v \mid S, p)$ are defined as follows:

$$
\widehat{I}^{\mathrm{MNL}}_t(v) := \sum_{s=1}^t \Lambda \exp\{-\bar{x}\}\, \phi(S_s, p_s, z_s; v), \qquad \widehat{M}^{\mathrm{MNL}}_t(v \mid S, p) := \Lambda \exp\{\bar{x}\}\, \phi(S, p, z_t; v). \tag{21}
$$

As $\phi(S, p, z; v)$ is positive semidefinite, the following lemma is straightforward.

Lemma 1. The following inequalities hold:

$$
\widehat{I}^{\mathrm{MNL}}_t(v) \preceq I^{\mathrm{MNL}}_t(v), \qquad \widehat{M}^{\mathrm{MNL}}_t(v \mid S, p) \succeq M^{\mathrm{MNL}}_t(v \mid S, p). \tag{22}
$$

Now, based on our estimators $\widehat{\theta}_{t-1}$ and $\widehat{v}_{t-1}$ and the newly observed product features $z_t$, we construct the following upper confidence bound for the expected revenue, for every $S \in \mathcal{S}$ and $p \in \mathcal{P}$:

$$
\bar{R}_t(S, p) := \Lambda\, \lambda(S, p; \widehat{\theta}_{t-1})\, r(S, p, z_t; \widehat{v}_{t-1}) \tag{23}
$$
$$
\quad + p_h \min\Big\{\Lambda(e^{\bar{x}} - e^{-\bar{x}}),\ \sqrt{\Lambda e^{(2\tau_\theta + 1)\bar{x}}\, \omega_\theta\, \big\| I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\widehat{\theta}_{t-1})\, M^{\mathrm{Poi}}_t(\widehat{\theta}_{t-1} \mid S, p)\, I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\widehat{\theta}_{t-1}) \big\|_{\mathrm{op}}}\Big\} \tag{24}
$$
$$
\quad + \Lambda e^{\bar{x}} \min\Big\{p_h,\ \sqrt{c_0\, \omega_v\, \big\| \widehat{I}^{\mathrm{MNL}\,-\frac{1}{2}}_{t-1}(\widehat{v}_{t-1})\, \widehat{M}^{\mathrm{MNL}}_t(\widehat{v}_{t-1} \mid S, p)\, \widehat{I}^{\mathrm{MNL}\,-\frac{1}{2}}_{t-1}(\widehat{v}_{t-1}) \big\|_{\mathrm{op}}}\Big\}. \tag{25}
$$

The first term corresponds to the estimated revenue, and the last two terms, (24) and (25), represent the estimation error of the reward induced by the estimation errors of $\widehat{\theta}_{t-1}$ and $\widehat{v}_{t-1}$, respectively. Specifically, term (24) primarily depends on $\omega_\theta$, the upper bound of $\|\widehat{\theta}_{t-1} - \theta^*\|_{I^{\mathrm{Poi}}_{t-1}(\widehat{\theta}_{t-1})}$ defined in Equation (33), and on the upper bound of the operator norm of the matrix that changes the weighted norm $\|\cdot\|_{I^{\mathrm{Poi}}_{t-1}(\widehat{\theta}_{t-1})}$ to $\|\cdot\|_{M^{\mathrm{Poi}}_t(\widehat{\theta}_{t-1} \mid S, p)}$. Term (25) primarily depends on $\omega_v$, the upper bound of $\|\widehat{v}_{t-1} - v^*\|_{I^{\mathrm{MNL}}_{t-1}(\widehat{v}_{t-1})}$ defined in Equation (35), and on the upper bound of the operator norm of the matrix that changes the weighted norm $\|\cdot\|_{I^{\mathrm{MNL}}_{t-1}(\widehat{v}_{t-1})}$ to $\|\cdot\|_{M^{\mathrm{MNL}}_t(\widehat{v}_{t-1} \mid S, p)}$. The constant $c_0$ is defined as

$$
c_0 = \frac{(p_h - p_l)^2}{\Lambda \exp(-\bar{v})} \big[3(\exp(4\tau_v) - 1)(K \exp(\bar{v} - p_l) + 1) + 1\big]. \tag{26}
$$
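The bonus terms (24)-(25) reduce to operator norms of the form $\|I^{-1/2} M I^{-1/2}\|_{\mathrm{op}}$. The following is a minimal numerical sketch for the Poisson part (Eqs. (17) and (19)), assuming the exponential rate form $\lambda(S, p; \theta) = \exp(\theta^\top x(S, p))$; the function names and the small ridge term are our own illustrative choices, not part of the paper's procedure:

```python
import numpy as np

def fisher_info_poisson(decisions, theta, Lambda):
    """Accumulate I^Poi_t(theta) = sum_s Lambda * lambda_s * x_s x_s^T, Eq. (17).

    decisions: list of 1-D arrays x_s = x(S_s, p_s).
    Assumes lambda(S, p; theta) = exp(theta^T x(S, p)).
    """
    d = len(decisions[0])
    info = np.zeros((d, d))
    for x in decisions:
        info += Lambda * np.exp(theta @ x) * np.outer(x, x)
    return info

def ucb_weight(info_hist, M_new, ridge=1e-9):
    """Operator norm of I^{-1/2} M I^{-1/2}: the data-dependent factor inside
    the square roots of the confidence bonuses (24)-(25).

    A small ridge keeps the inverse square root well defined in this sketch
    when the accumulated information is (nearly) singular.
    """
    w, U = np.linalg.eigh(info_hist + ridge * np.eye(len(info_hist)))
    inv_sqrt = U @ np.diag(w ** -0.5) @ U.T
    return float(np.linalg.eigvalsh(inv_sqrt @ M_new @ inv_sqrt).max())
```

Repeating the same decision $t$ times makes the weight approximately $1/t$ in a one-dimensional example, matching the intuition that the bonus shrinks as information accumulates in the explored direction.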
4 Regret Analysis

We now turn to the theoretical analysis of our procedure, beginning in Section 4.1 with our main theorem on the regret bound, followed by Section 4.2, devoted to the proofs of the regret bound, and concluding with a matching lower bound in Section 4.3.

4.1 Regret analysis

We begin by stating a non-asymptotic bound of $O(\sqrt{T \log T})$ on the expected cumulative regret incurred by Algorithm 1.

Theorem 1. For any Poisson-MNL bandit problem under Assumptions 1 to 6, there are universal positive constants $c_1, c_2, c_3$ such that for all $T \ge \max\big\{4 + \frac{2\log(d_z) + 1}{\sigma_0(1 - \log 2)} - \frac{2\log(\sigma_0(1 - \log 2))}{\sigma_0(1 - \log 2)},\ 2 d_x + 1\big\}$, the expected cumulative regret of Algorithm 1 is bounded as

$$
R_\pi(T; v, \theta) \le c_1 + c_2\, d_z \sqrt{T \log T} + c_3\, d_x \sqrt{T \log T}, \tag{27}
$$

where $c_1$ depends only on $\Lambda, \bar{x}, p_h$; $c_2$ only on $\Lambda, \bar{x}, \bar{v}, \sigma_0, p_l, p_h$; and $c_3$ only on $\Lambda, \bar{x}, \sigma_1, p_h$.

Remark 3. Theorem 1 provides a non-asymptotic regret bound that holds uniformly for almost all $T$. The regret bound is of order $(d_z + d_x)\sqrt{T \log T}$, which is nearly optimal compared with the lower bounds provided in Section 4.3. Note that the condition on $T$ is very mild: the threshold depends only on the dimensions and $\sigma_0$ (i.e., it is of order $\frac{1}{\sigma_0}\log(d_z/\sigma_0) + d_x$). This condition essentially requires $T \ge T_0$, where $T_0$ is defined in Algorithm 1 as $T_0 := \max\big\{\big\lceil \frac{\log(d_z T)}{\sigma_0(1 - \log 2)} \big\rceil + 1,\ 2 d_x + 1,\ \lfloor \log T \rfloor\big\}$. If $T$ is smaller than the threshold, we stay in Stage 1 the entire time, and the regret bound is of order $T_0 = O(\log(d_z T) + d_x)$.

Remark 4. Note that when the arrival rate is a known constant, this is equivalent to $d_x = 0$, which gives the regret bound $O(d_z \sqrt{T \log T})$.
If we further require $p_l = p_h = p > 0$, our problem reduces to the dynamic assortment problem, and our rate is faster than the state-of-the-art rate for dynamic assortment, $O(d_z \sqrt{T} \log T)$ (Chen et al. 2020, Oh and Iyengar 2021).

4.2 Proof Sketch

We provide a proof sketch of Theorem 1 in this section. For the details of the proof, please see Appendices EC.5 to EC.12. The proof consists of four major steps:

1. Bounding the parameter estimation errors. In Section 4.2.1, we establish high-probability bounds on the estimation errors of both the MLE pilot estimators ($\hat{\theta}$ and $\hat{v}$) in Stage 1 and the local MLE estimators ($\widehat{\theta}_t$ and $\widehat{v}_t$) in Stage 2.

2. Bounding the arrival rate estimation error. In Section 4.2.2, we bound the difference between the estimated arrival rate and the true arrival rate at period $t$, with high probability $1 - O(1/T)$.

3. Bounding the expected per-customer revenue error. In Section 4.2.3, we bound the difference between the estimated expected per-customer revenue and the truth at period $t$, with high probability $1 - O(1/T)$.

4. Bounding the expected regret. Combining the results above, in Section 4.2.4 we show that, with high probability, the gap between the reward of the optimal assortment-pricing decision and that of our policy at each time is upper-bounded by the gap between our upper confidence bound and the true reward, uniformly. Bounding the summation of the latter gaps and accounting for the low-probability failure event yields the regret bound.

Before going into details, we review two classes of random variables: sub-Gaussian random variables (Definition 4) and random variables satisfying the Bernstein condition (Definition 5) (cf. Wainwright 2019, Chapter 2). In standard MNL-bandit problems, the log-likelihood, or its gradient, of each time period is typically sub-Gaussian (Abbasi-Yadkori et al. 2011, Chen et al.
2020), which follows from the boundedness of the reward; this makes it easier to bound the full log-likelihood (or its gradient) thanks to the thinner tails of the summands. However, in our model, the sub-Gaussian property no longer holds due to the additional Poisson process, thereby requiring a more careful analysis. It turns out that the log-likelihood of each time period in our case instead satisfies the Bernstein condition.

Definition 4 (Sub-Gaussian). A random variable $X$ with mean $\mu = \mathbb{E}[X]$ is sub-Gaussian if there exists a positive constant $\sigma$ such that $\mathbb{E}\big[e^{c(X - \mu)}\big] \le e^{\sigma^2 c^2 / 2}$ for all $c \in \mathbb{R}$.

Definition 5 (Bernstein Condition). A random variable $X$ with mean $\mu = \mathbb{E}[X]$ is said to satisfy the Bernstein condition if there exist constants $V > 0$ and $c > 0$ such that for all integers $k \ge 2$:

$$
\mathbb{E}\big[|X - \mu|^k\big] \le \frac{k!}{2} V c^{k-2}.
$$

4.2.1 Bounding the Parameter Estimation Errors. We first establish the high-probability estimation error bounds for the MLE pilot estimators $\hat{\theta}$ and $\hat{v}$ from Stage 1 (pure exploration) in Lemmas 2 and 3, and then provide the high-probability error bounds for the local MLEs $\widehat{\theta}_t$ and $\widehat{v}_t$ from Stage 2 in Lemmas 4 and 5. The proofs and technical details are deferred to Appendix Sections EC.5 and EC.9, as well as Sections EC.6 and EC.10, respectively.

Lemma 2. With probability at least $1 - 2T^{-1}$, the $\ell_2$ error of the pilot estimator $\hat{\theta}$ is bounded as $\|\hat{\theta} - \theta^*\|_2 \le \tau_\theta$. Here, $\tau_\theta = \min\{1, \tilde{\tau}_\theta\}$, where $\tilde{\tau}_\theta$ is bounded as

$$
\tilde{\tau}_\theta^2 \le \frac{2\exp(2\bar{x})}{T\Lambda\sigma_1}\left(2 + \frac{4\log T}{\Lambda\exp(\bar{x}) T_0} + \sqrt{\frac{4\log T}{\Lambda\exp(\bar{x}) T_0}} + \exp(-\bar{x})\right) + \frac{8(2\bar{x} c_4 + 1)\exp(\bar{x})}{T_0 \Lambda \sigma_1}\big(\log T + d_x \log(3\bar{x}(\Lambda T + 1))\big). \tag{28}
$$

Here, $c_4$ is defined as

$$
c_4 = \max\left\{1,\ \left(\frac{2 e^2 \log 2 \left(1 + \frac{3}{\Lambda \exp(\bar{x})}\right) \sqrt{6\pi}}{\Lambda \exp(-\bar{x})}\right)^{2 e \log\left(1 + \frac{3}{\Lambda \exp(\bar{x})}\right)}\right\}. \tag{29}
$$

Lemma 3.
With probability at least $1 - 3T^{-1}$, the $\ell_2$ error of the pilot estimator $\hat{v}$ is bounded as $\|\hat{v} - v^*\|_2 \le \tau_v$. Here, $\tau_v = \min\{1, \tilde{\tau}_v\}$, where $\tilde{\tau}_v$ is bounded as

$$
\tilde{\tau}_v^2 \le \frac{2\exp(2\bar{x})}{\kappa T \Lambda \sigma_0}\left(2 + \frac{4\log T}{\Lambda\exp(\bar{x}) T_0} + \sqrt{\frac{4\log T}{\Lambda\exp(\bar{x}) T_0}}\right) + \frac{8\exp(\bar{x})}{\kappa T_0 \Lambda \sigma_0}\big((d_z + 1)\log T + d_z \log(6\Lambda)\big) + \frac{8\exp(\bar{x})}{\kappa T_0 \Lambda \sigma_0}\sqrt{T_0 \Lambda \exp(\bar{x})\big((d_z + 1)\log T + d_z \log(6\Lambda)\big)}, \tag{30}
$$

and

$$
\kappa = \frac{\exp(-\bar{v} - p_h)}{(K \exp(\bar{v} - p_l) + 1)^2}. \tag{31}
$$

Remark 5. Both $\tau_\theta^2$ in Lemma 2 and $\tau_v^2$ in Lemma 3 are of order $O\big(\frac{\log T}{T_0}\big)$. By our initialization $T_0 = \Omega(\log T)$, the order $O\big(\frac{\log T}{T_0}\big)$ simplifies to $O(1)$. Recall that in Stage 2, i.e., for $t = T_0 + 1, \ldots, T$, we obtain the local MLEs $\widehat{\theta}_t$ and $\widehat{v}_t$ within their feasible regions: $\|\widehat{\theta}_t - \hat{\theta}\|_2 \le \tau_\theta$ and $\|\widehat{v}_t - \hat{v}\|_2 \le \tau_v$. While this constant shrinkage of the feasible regions may appear less meaningful, it actually plays a crucial role, because the estimation errors of the local MLEs scale exponentially with the radii $\tau_\theta$ and $\tau_v$. Consequently, this shrinkage leads to both a substantial improvement in algorithmic performance and a substantial reduction in the regret.

Remark 6. The constant $\kappa$ in Lemma 3 depends only on the number of products in the assortment and the boundedness conditions. In the literature, it is common to introduce an additional abstract assumption that essentially assumes the positive definiteness of the Fisher information via such a constant $\kappa$ (see, e.g., Oh and Iyengar (2021, 2019)), or, equivalently, $1/\Upsilon$ as used in Cheung and Simchi-Levi (2017). In our case, this positive definiteness follows directly from the more concrete and natural boundedness assumptions, with $\kappa$ explicitly specified in Equation (31).

In Stage 2, we obtain the local MLEs (16) within the feasible regions.
With the radii of the feasible regions shrunk from $1$ and $\bar{v}$ to $\tau_\theta$ and $\tau_v$, respectively, we have the following two uniform error bounds for the local MLEs.

Lemma 4. With probability $1 - 4T^{-1}$, the following hold uniformly over all $t = T_0, \ldots, T - 1$:

$$
\big(\widehat{\theta}_t - \theta^*\big)^\top I^{\mathrm{Poi}}_t(\theta^*) \big(\widehat{\theta}_t - \theta^*\big) \le \omega_\theta, \quad \big(\widehat{\theta}_t - \theta^*\big)^\top I^{\mathrm{Poi}}_t(\widehat{\theta}_t) \big(\widehat{\theta}_t - \theta^*\big) \le \omega_\theta, \quad \|\widehat{\theta}_t - \theta^*\| \le 2\tau_\theta, \tag{32}
$$

where

$$
\omega_\theta \le 8 e^{2\tau_\theta \bar{x}}\left(\frac{1}{2} + e^{\bar{x}} + \sqrt{\frac{2 e^{\bar{x}} \log T}{T\Lambda}} + \frac{4\log T}{T\Lambda} + 2(\tau_\theta \bar{x} c_4 + 1)\big(2\log T + d_x \log(6\tau_\theta \bar{x}(\Lambda T + 1))\big)\right). \tag{33}
$$

Lemma 5. With probability $1 - 4T^{-1}$, the following hold uniformly over all $t = T_0, \ldots, T - 1$:

$$
(\widehat{v}_t - v^*)^\top I^{\mathrm{MNL}}_t(v^*) (\widehat{v}_t - v^*) \le \omega_v, \quad (\widehat{v}_t - v^*)^\top I^{\mathrm{MNL}}_t(\widehat{v}_t) (\widehat{v}_t - v^*) \le \omega_v, \quad \|\widehat{v}_t - v^*\| \le 2\tau_v, \tag{34}
$$

where

$$
\omega_v \le 8 c_8 e^{\bar{x}} + 4 c_8 \sqrt{\frac{8 e^{\bar{x}} \log T}{T\Lambda}} + \frac{32 c_8 \log T}{T\Lambda} + 8(4\tau_v c_4 + c_5) c_8 \big((d_z + 2)\log T + d_z \log(6\tau_v \Lambda)\big), \tag{35}
$$

$$
c_5 = \frac{16\tau_v^2}{4\tau_v + \exp(-4\tau_v) - 1}, \tag{36}
$$

$$
c_8 = 3(\exp(4\tau_v) - 1)(K \exp(\bar{v} - p_l) + 1) + 1. \tag{37}
$$

Remark 7. Since both $\tau_\theta$ and $\tau_v$ are of order $O(1)$, both $\omega_\theta$ and $\omega_v$ are of order $O(\log T)$. Note that the Fisher information matrices $I^{\mathrm{Poi}}_t(\theta)$ and $I^{\mathrm{MNL}}_t(v)$ are sums of $t$ positive semidefinite rank-one matrices. These lemmas show that $\widehat{\theta}_t$ and $\widehat{v}_t$ increasingly concentrate around the ground truths $\theta^*$ and $v^*$ as $t$ increases.

Remark 8. Since $\tau_\theta$ and $\tau_v$ are of order $O(1)$, both $c_5$ and $c_8$ are also constants. In particular, $c_5$ is small and almost tight, in contrast to existing analyses that typically yield an overly conservative constant of order $\exp(8\tau_v)$. Achieving this improvement requires a careful analysis that is often omitted in the literature. Such an improvement also translates into better performance of the algorithm, as $c_5$ appears directly in the confidence bound. See Remark EC.1 for details.

Remark 9.
As discussed in Remark 5, the dependence of $\omega_\theta$ on $\tau_\theta$ is exponential. Hence, the localization (or shrinkage) in Stage 1 plays an important role in the constants, especially when $\bar{x}$ is relatively large. Similarly, the dependence of $\omega_v$ on $\tau_v$ is also exponential (through $c_8$).

4.2.2 Bounding the Arrival Rate Error. As noted at the end of Section 3.3, the estimation error of the expected revenue $R(S, p, z_t; \theta^*, v^*)$ at period $t$ depends on the estimation errors of the expected per-customer revenue and the arrival rate, both of which can be analyzed based on the parameter estimation errors in the MNL choice model and the Poisson arrival model discussed above. In this section, we focus on the error of the arrival rate; we discuss the error of the expected per-customer revenue in Section 4.2.3.

Lemma 6. Suppose Equation (32) in Lemma 4 holds for $t - 1$; then the following hold:

$$
\big|\lambda(S, p; \widehat{\theta}_{t-1}) - \lambda(S, p; \theta^*)\big| \le \underbrace{\min\Big\{\Lambda(e^{\bar{x}} - e^{-\bar{x}}),\ \sqrt{\Lambda e^{(2\tau_\theta + 1)\bar{x}}\, \omega_\theta\, \big\| I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\widehat{\theta}_{t-1})\, M^{\mathrm{Poi}}_t(\widehat{\theta}_{t-1} \mid S, p)\, I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\widehat{\theta}_{t-1}) \big\|_{\mathrm{op}}}\Big\}}_{\eta^{\mathrm{Poi}}_t}, \tag{38}
$$

$$
\big|\lambda(S, p; \widehat{\theta}_{t-1}) - \lambda(S, p; \theta^*)\big| \le \underbrace{\min\Big\{\Lambda(e^{\bar{x}} - e^{-\bar{x}}),\ \sqrt{\Lambda e^{(2\tau_\theta + 1)\bar{x}}\, \omega_\theta\, \big\| I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\theta^*)\, M^{\mathrm{Poi}}_t(\theta^* \mid S, p)\, I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\theta^*) \big\|_{\mathrm{op}}}\Big\}}_{\eta^{*,\mathrm{Poi}}_t}. \tag{39}
$$

Additionally, we have

$$
\big\| I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\widehat{\theta}_{t-1})\, M^{\mathrm{Poi}}_t(\widehat{\theta}_{t-1} \mid S, p)\, I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\widehat{\theta}_{t-1}) \big\|_{\mathrm{op}} \le e^{4\tau_\theta \bar{x}}\, \big\| I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\theta^*)\, M^{\mathrm{Poi}}_t(\theta^* \mid S, p)\, I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\theta^*) \big\|_{\mathrm{op}}. \tag{40}
$$

Remark 10. Lemma 6 holds deterministically. The key idea is to bound the change in the arrival rate given that the change in the parameter (from $\theta^*$ to $\widehat{\theta}_{t-1}$) is bounded by known quantities. The detailed proof is given in Section EC.7.
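To illustrate the behavior of the operator-norm quantity on the right-hand side of Eq. (38), the following scalar ($d_x = 1$) toy tracks the per-period information of a new decision against the accumulated information; the decay is roughly of order $1/t$. This is our own illustration under the exponential rate form $\lambda = \exp(\theta x)$, not an experiment from the paper:

```python
import math

def lemma6_weights(xs, theta, Lambda):
    """Scalar (d_x = 1) toy for the quantity in Eq. (38): at each period t,
    the per-period information M_t of the new decision divided by the
    accumulated information I_{t-1}, i.e. the 1-D analogue of
    ||I^{-1/2}_{t-1} M_t I^{-1/2}_{t-1}||_op.

    xs: sequence of scalar sufficient statistics x(S_t, p_t).
    """
    def info(x):
        # 1-D version of Eq. (19): Lambda * lambda(x; theta) * x^2
        return Lambda * math.exp(theta * x) * x * x

    acc, weights = 0.0, []
    for t, x in enumerate(xs, start=1):
        if t > 1:
            weights.append(info(x) / acc)
        acc += info(x)
    return weights

# Alternate two decisions; the weight decays at a rate of roughly 1/t.
ws = lemma6_weights([0.5, -0.8] * 50, theta=0.3, Lambda=2.0)
```

Because the accumulated information grows linearly while the per-period information stays bounded, the ratio, and hence the arrival-rate confidence width, shrinks as exploration proceeds.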
Remark 11. Since Equation (32) in Lemma 4 holds uniformly for all $t = T_0, \ldots, T - 1$ with probability at least $1 - 4T^{-1}$, the error bounds in Lemma 6 also hold uniformly for $t = T_0 + 1, \ldots, T$ with probability at least $1 - 4T^{-1}$.

Remark 12. Equation (38) forms the basis for the arrival-rate-induced confidence bound in Equation (24): multiplied by $p_h$, this data-driven quantity uniformly compensates for the potential deficit in reward estimation due to the estimation error of the arrival rate, with high probability.

Remark 13. Note that both terms $\big\| I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\theta^*)\, M^{\mathrm{Poi}}_t(\theta^* \mid S, p)\, I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\theta^*) \big\|_{\mathrm{op}}$ and $\big\| I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\widehat{\theta}_{t-1})\, M^{\mathrm{Poi}}_t(\widehat{\theta}_{t-1} \mid S, p)\, I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\widehat{\theta}_{t-1}) \big\|_{\mathrm{op}}$ vanish as $t$ increases, at rates roughly of order $1/t$.

4.2.3 Bounding the Expected Per-customer Revenue Error. The estimation error bound of the per-customer revenue $r(S, p, z_t; v^*)$ is given in Lemma 7.

Lemma 7. Suppose Equation (34) in Lemma 5 holds for $t - 1$; then the following hold for $t$ with any $S \subseteq [N]$ and $p \in (p_l, p_h)^N$:

$$
\big| r(S, p, z_t; \widehat{v}_{t-1}) - r(S, p, z_t; v^*) \big| \le \underbrace{\min\Big\{p_h,\ \sqrt{\omega_v\, c_0\, \big\| \widehat{I}^{\mathrm{MNL}\,-\frac{1}{2}}_{t-1}(\widehat{v}_{t-1})\, \widehat{M}^{\mathrm{MNL}}_t(\widehat{v}_{t-1} \mid S, p)\, \widehat{I}^{\mathrm{MNL}\,-\frac{1}{2}}_{t-1}(\widehat{v}_{t-1}) \big\|_{\mathrm{op}}}\Big\}}_{\eta^{\mathrm{MNL}}_t}, \tag{41}
$$

$$
\big| r(S, p, z_t; \widehat{v}_{t-1}) - r(S, p, z_t; v^*) \big| \le \underbrace{\min\Big\{p_h,\ \sqrt{\omega_v\, c_0\, \big\| \widehat{I}^{\mathrm{MNL}\,-\frac{1}{2}}_{t-1}(v^*)\, \widehat{M}^{\mathrm{MNL}}_t(v^* \mid S, p)\, \widehat{I}^{\mathrm{MNL}\,-\frac{1}{2}}_{t-1}(v^*) \big\|_{\mathrm{op}}}\Big\}}_{\eta^{*,\mathrm{MNL}}_t}. \tag{42}
$$

Additionally, we have

$$
\big\| \widehat{I}^{\mathrm{MNL}\,-\frac{1}{2}}_{t-1}(\widehat{v}_{t-1})\, \widehat{M}^{\mathrm{MNL}}_t(\widehat{v}_{t-1} \mid S, p)\, \widehat{I}^{\mathrm{MNL}\,-\frac{1}{2}}_{t-1}(\widehat{v}_{t-1}) \big\|_{\mathrm{op}} \le c_8^2\, \big\| \widehat{I}^{\mathrm{MNL}\,-\frac{1}{2}}_{t-1}(v^*)\, \widehat{M}^{\mathrm{MNL}}_t(v^* \mid S, p)\, \widehat{I}^{\mathrm{MNL}\,-\frac{1}{2}}_{t-1}(v^*) \big\|_{\mathrm{op}}, \tag{43}
$$

where

$$
c_8 = 3(\exp(4\tau_v) - 1)(K \exp(\bar{v} - p_l) + 1) + 1. \tag{44}
$$
The proof and the implications of Lemma 7 are very similar to those of Lemma 6, except for one subtle difference. We use $\hat{I}^{\mathrm{MNL}\,-1/2}_{t-1}(\cdot)$ and $\hat{M}^{\mathrm{MNL}}_t(\cdot\mid S,p)$ (Definition 3) rather than $I^{\mathrm{MNL}\,-1/2}_{t-1}(\cdot)$ and $M^{\mathrm{MNL}}_t(\cdot\mid S,p)$ (Definition 2), because the latter pair involves the unknown parameter $\theta^*$ and can be bounded by the former pair (see Lemma 1). The detailed proof is in Section EC.11, and Inequality (42) forms the basis for the per-customer-revenue-induced confidence bound shown in Equation (25).

4.2.4 Bounding the Expected Regret. In this section, we show the upper bound of the expected cumulative regret $R^\pi(T)$. Let $\mathcal{E}$ be the high-probability event that the inequalities in Lemmas 4 and 5 hold, and let $\mathcal{E}^c$ be its complement (i.e., the event that at least one of those inequalities fails to hold). The regret comes from three sources:
1. The regret incurred in the first $T_0$ periods of Stage 1, which is upper bounded by $T_0\, p_h\, \Lambda \exp(\bar{x}) = O(\log T)$ (by the boundedness of prices and of the arrival rate).
2. The regret incurred in Stage 2 under $\mathcal{E}$, which we will show to be of order $O((d_x + d_z)\sqrt{T\log T})$.
3. The regret incurred in Stage 2 under $\mathcal{E}^c$, which is upper bounded by $\frac{8}{T}\cdot T\cdot p_h\, \Lambda \exp(\bar{x}) = O(1)$.
Combining the three sources gives the statement. See below for the detailed proof.

Proof of Theorem 1. We first show that under event $\mathcal{E}$,
$$\bar{R}_t(S,p) > R_t(S,p) \quad (45)$$
for all $t = T_0+1, \dots, T$ and $S \subseteq [N]$, $p \in (p_l, p_h)^N$, which directly gives that under event $\mathcal{E}$,
$$\bar{R}_t(S_t, p_t) \ge \bar{R}_t(S^*, p^*) \ge R_t(S^*, p^*) \ge R_t(S_t, p_t) \quad (46)$$
for all $t = T_0+1, \dots, T$.
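Inequality (45) is a standard optimism property: inflating the plug-in revenue estimate by $p_h\eta_t^{\mathrm{Poi}} + \Lambda e^{\bar{x}}\eta_t^{\mathrm{MNL}}$ covers the worst-case estimation error, because the per-customer revenue is at most $p_h$ and the arrival rate at most $\Lambda e^{\bar{x}}$. A minimal numerical sketch of this mechanism, with generic placeholder values rather than the paper's estimators:

```python
import numpy as np

rng = np.random.default_rng(1)
p_h, lam_max = 30.0, 50.0   # stand-ins for p_h and Lambda * exp(x_bar)

for _ in range(1000):
    lam_true = rng.uniform(0.0, lam_max)              # true arrival rate
    r_true = rng.uniform(0.0, p_h)                    # true per-customer revenue
    eta_poi, eta_mnl = rng.uniform(0.0, 5.0, size=2)  # confidence widths
    # arbitrary estimates lying within their confidence widths
    lam_hat = lam_true + rng.uniform(-eta_poi, eta_poi)
    r_hat = np.clip(r_true + rng.uniform(-eta_mnl, eta_mnl), 0.0, p_h)
    # optimistic index: plug-in revenue plus the two inflation terms
    R_bar = lam_hat * r_hat + p_h * eta_poi + lam_max * eta_mnl
    assert R_bar >= lam_true * r_true - 1e-9          # optimism always holds
```

The inequality holds deterministically whenever the estimates lie inside their confidence widths, which is exactly what event $\mathcal{E}$ guarantees.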
Note that
$$\bar{R}_t(S,p) - R_t(S,p) = \lambda(S,p;\hat{\theta}_{t-1})\, r(S,p,z_t;\hat{v}_{t-1}) - \lambda(S,p;\theta^*)\, r(S,p,z_t;v^*) + p_h\eta_t^{\mathrm{Poi}} + \Lambda e^{\bar{x}}\eta_t^{\mathrm{MNL}}$$
$$\ge \underbrace{p_h\eta_t^{\mathrm{Poi}} - \left|\lambda(S,p;\hat{\theta}_{t-1}) - \lambda(S,p;\theta^*)\right| r(S,p,z_t;\hat{v}_{t-1})}_{\zeta_1} + \underbrace{\Lambda e^{\bar{x}}\eta_t^{\mathrm{MNL}} - \lambda(S,p;\theta^*)\left|r(S,p,z_t;v^*) - r(S,p,z_t;\hat{v}_{t-1})\right|}_{\zeta_2}. \quad (47)$$
By Lemma 6 and the fact that the per-customer revenue is upper bounded by $p_h$, we have $\zeta_1 \ge 0$. Similarly, Lemma 7 and the boundedness of the arrival rate give $\zeta_2 \ge 0$. Taken together, we obtain Inequality (45). Therefore, under event $\mathcal{E}$, by Inequality (46) and an argument similar to that for Inequality (47), we have
$$R_t(S^*,p^*) - R_t(S,p) \le \bar{R}_t(S_t,p_t) - R_t(S,p) \quad (48)$$
$$\le \left|\lambda(S,p;\hat{\theta}_{t-1}) - \lambda(S,p;\theta^*)\right| r(S,p,z_t;\hat{v}_{t-1}) + p_h\eta_t^{\mathrm{Poi}} + \lambda(S,p;\theta^*)\left|r(S,p,z_t;v^*) - r(S,p,z_t;\hat{v}_{t-1})\right| + \Lambda e^{\bar{x}}\eta_t^{\mathrm{MNL}} \quad (49)$$
$$\le 2\left(p_h\eta_t^{\mathrm{Poi}} + \Lambda e^{\bar{x}}\eta_t^{\mathrm{MNL}}\right) \quad (50)$$
for $t = T_0+1, \dots, T$. Therefore, the regret incurred by the second source is upper bounded as
$$\sum_{t=T_0+1}^{T} \mathbb{E}\left[\left(R_t(S^*,p^*) - R_t(S,p)\right)\mathbf{1}\{\mathcal{E}\}\right] \le 2p_h \sum_{t=T_0+1}^{T} \mathbb{E}\left(\eta_t^{\mathrm{Poi}}\right) + 2\Lambda e^{\bar{x}} \sum_{t=T_0+1}^{T} \mathbb{E}\left(\eta_t^{\mathrm{MNL}}\right). \quad (51)$$

Next, we bound the two summations in Inequality (51).

(a) Bounding $\sum_{t=T_0+1}^{T} \mathbb{E}(\eta_t^{\mathrm{Poi}})$. By Lemma 6 and the Cauchy–Schwarz inequality, we have
$$\sum_{t=T_0+1}^{T} \mathbb{E}\left[\eta_t^{\mathrm{Poi}}\mathbf{1}\{\mathcal{E}\}\right] \le \mathbb{E}\left[\sum_{t=T_0+1}^{T} e^{2\tau_\theta\bar{x}}\, \eta_t^{*,\mathrm{Poi}}\mathbf{1}\{\mathcal{E}\}\right] \le \sqrt{T}\, e^{2\tau_\theta\bar{x}} \sqrt{\mathbb{E}\left[\sum_{t=T_0+1}^{T} \left(\eta_t^{*,\mathrm{Poi}}\right)^2\right]}, \quad (52)$$
where $\eta_t^{*,\mathrm{Poi}}$ is defined in Equation (39). To bound the summation on the right-hand side, we have the following lemma (see Section EC.8 for the proof).

Lemma 8.
Under Assumptions 1 to 6, the following holds:
$$\sum_{t=T_0+1}^{T} \left(\eta_t^{*,\mathrm{Poi}}\right)^2 = \sum_{t=T_0+1}^{T} \min\left\{\Lambda^2\left(e^{\bar{x}} - e^{-\bar{x}}\right)^2,\ \omega_\theta \Lambda e^{(2\tau_\theta+1)\bar{x}} \left\|I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\theta^*)\, M^{\mathrm{Poi}}_t(\theta^*\mid S_t)\, I^{\mathrm{Poi}\,-\frac{1}{2}}_{t-1}(\theta^*)\right\|_{\mathrm{op}}\right\}$$
$$\le c_6 \log\frac{\det I^{\mathrm{Poi}}_T(\theta^*)}{\det I^{\mathrm{Poi}}_{T_0}(\theta^*)} \le d_x c_6 \left(d_x \log\frac{\bar{x}^2 T}{d_x} + (d_x+1)\bar{x} - \log(T_0\sigma_1)\right) = O\left(d_x^2 \log T\right),$$
where
$$c_6 = \frac{\Lambda^2\left(e^{\bar{x}} - e^{-\bar{x}}\right)^2}{\log\left(1 + \frac{\Lambda^2\left(e^{\bar{x}} - e^{-\bar{x}}\right)^2}{\omega_\theta \Lambda e^{(2\tau_\theta+1)\bar{x}}}\right)}. \quad (53)$$

Returning to Inequality (52) with Lemma 8, we have
$$\sum_{t=T_0+1}^{T} \mathbb{E}\left[\eta_t^{\mathrm{Poi}}\mathbf{1}\{\mathcal{E}\}\right] \le O\left(d_x\sqrt{T\log T}\right). \quad (54)$$

(b) Bounding $\sum_{t=T_0+1}^{T} \mathbb{E}(\eta_t^{\mathrm{MNL}})$. Similarly, by Lemma 7 and the Cauchy–Schwarz inequality, we have
$$\sum_{t=T_0+1}^{T} \mathbb{E}\left[\eta_t^{\mathrm{MNL}}\mathbf{1}\{\mathcal{E}\}\right] \le \mathbb{E}\left[\sum_{t=T_0+1}^{T} c_8\, \eta_t^{*,\mathrm{MNL}}\mathbf{1}\{\mathcal{E}\}\right] \le c_8\sqrt{T} \sqrt{\mathbb{E}\left[\sum_{t=T_0+1}^{T} \left(\eta_t^{*,\mathrm{MNL}}\right)^2\right]}, \quad (55)$$
where $\eta_t^{*,\mathrm{MNL}}$ is defined in Equation (42). To bound the summation on the right-hand side, we have the following lemma (see Section EC.12 for the proof).

Lemma 9. Under Assumptions 1 to 6, the following holds with probability at least $1-\frac{1}{T}$:
$$\sum_{t=T_0+1}^{T} \left(\eta_t^{*,\mathrm{MNL}}\right)^2 = \sum_{t=T_0+1}^{T} \min\left\{p_h^2,\ \omega_v c_0 \left\|\hat{I}^{\mathrm{MNL}\,-1/2}_{t-1}(v^*)\, \hat{M}^{\mathrm{MNL}}_t(v^*\mid S_t)\, \hat{I}^{\mathrm{MNL}\,-1/2}_{t-1}(v^*)\right\|_{\mathrm{op}}\right\} \le d_z c_7 \left(\log\frac{4T}{d_z} - \log(T_0\sigma_0)\right) = O\left(d_z^2 \log T\right),$$
where
$$c_7 = \frac{\omega_v c_0 \exp(2\bar{x})\, p_h^2}{\log\left(1 + \frac{p_h^2}{\omega_v c_0 \exp(2\bar{x})}\right)}. \quad (56)$$

Using Lemma 9 to further bound the right-hand side of Inequality (55), we have
$$\sum_{t=T_0+1}^{T} \mathbb{E}\left[\eta_t^{\mathrm{MNL}}\mathbf{1}\{\mathcal{E}\}\right] \le c_8\sqrt{T} \sqrt{d_z c_7 \left(\log\frac{4T}{d_z} - \log(T_0\sigma_0)\right) + \frac{1}{T}\cdot T p_h^2} = O\left(d_z\sqrt{T\log T}\right). \quad (57)$$

Combining Inequalities (54) and (57) with Inequality (51), we have the regret from the second source upper bounded by
$$\sum_{t=T_0+1}^{T} \mathbb{E}\left[\left(R_t(S^*,p^*) - R_t(S,p)\right)\mathbf{1}\{\mathcal{E}\}\right] \le O\left((d_x+d_z)\sqrt{T\log T}\right). \quad (58)$$

Next, we consider the regret from the third source.
Since $P(\mathcal{E}^c) \le 8/T$ and $R_t(S^*,p^*) \le p_h \Lambda \exp(\bar{x})$,
$$\sum_{t=T_0+1}^{T} \mathbb{E}\left[\left(R_t(S^*,p^*) - R_t(S,p)\right)\mathbf{1}\{\mathcal{E}^c\}\right] \le T \cdot p_h \Lambda \exp(\bar{x}) \cdot \frac{8}{T} = 8 p_h \Lambda \exp(\bar{x}). \quad (59)$$
Combining all three sources gives
$$R^\pi(T; v, \theta) = \sum_{t=1}^{T_0} \mathbb{E}\left[R_t(S^*,p^*) - R_t(S_t,p_t)\right] + \sum_{t=T_0+1}^{T} \mathbb{E}\left[\left(R_t(S^*,p^*) - R_t(S_t,p_t)\right)\mathbf{1}\{\mathcal{E}\}\right] \quad (60)$$
$$\qquad + \sum_{t=T_0+1}^{T} \mathbb{E}\left[\left(R_t(S^*,p^*) - R_t(S_t,p_t)\right)\mathbf{1}\{\mathcal{E}^c\}\right] \quad (61)$$
$$\le O(\log T) + O\left((d_x+d_z)\sqrt{T\log T}\right) + O(1) = O\left((d_x+d_z)\sqrt{T\log T}\right). \quad (62)$$

4.3 Lower Bound

To understand the fundamental limitations of any policy in such settings, we now turn to establishing a regret lower bound. This ensures that our upper bound is tight and demonstrates the optimality of our proposed strategy in the worst-case scenario. Specifically, we show that any policy satisfying the assumptions on model parameters will incur at least the following regret in a worst-case instance.

Theorem 2 (Non-asymptotic regret lower bound). Let $\mathcal{A}$ denote the set of policies that output feasible assortment-pricing decisions satisfying Assumptions 1 and 2 for the problem in Section 2.1. Suppose at least one of the following conditions is satisfied: (i) $\min\{d_z - 2, N\} \ge K$ and $\Lambda \ge 1$; (ii) $\Lambda \ge 1$ and $\log\frac{N - d_z}{K} \ge 8\log 2 - \frac{11}{4}\log 3$; (iii) $\min\{d_x - 2, N\} \ge K$ and $\Lambda \ge 1$. Then for any policy $\pi \in \mathcal{A}$, there exists a problem instance satisfying Assumptions 1 to 6 such that the expected regret of policy $\pi$ under this instance is lower bounded as
$$R^\pi(T) \ge c_9\sqrt{\Lambda T}, \quad (63)$$
where $c_9 > 0$ depends only on $d_z, d_x, K, p_l, p_h$, and $\bar{x}$.

Remark 14. This lower bound is a non-asymptotic minimax lower bound primarily focusing on the rate with respect to the time horizon $T$. Comparing with the upper bound $O(\sqrt{T\log T})$ shown in Theorem 1, our algorithm is near minimax optimal (up to $\log T$ factors).

Remark 15.
This lower bound is the first for the joint contextual assortment-pricing problem and is not a direct extension of the lower bound for the contextual assortment problem established in Chen et al. (2020). The bound in Chen et al. (2020) is asymptotic and requires $N \ge K\cdot 2^{d_z}$. Our lower bound, however, is non-asymptotic and only requires $N > 14K$ (so that at least one of conditions (i) and (ii) holds). This relaxed assumption significantly enlarges the applicability of the lower bound, especially when the dimension of the product features is moderately high.

Remark 16. We consider algorithms choosing $K$ products for the assortment in this lower bound, in alignment with Assumption 1. However, our proof only requires the assortment's capacity to be at most $K$, so it also applies to the settings considered in Oh and Iyengar (2021), Chen et al. (2022b), Agrawal et al. (2019), and Chen et al. (2020).

Theorem 2 establishes a non-asymptotic lower bound of order $\sqrt{\Lambda T}$ under mild conditions. Next, we provide an asymptotic version and further consider the dependence on the dimensions $d_z$ and $d_x$.

Theorem 3 (Asymptotic regret lower bound). Let $\mathcal{A}$ denote the set of policies that output feasible assortment-pricing decisions satisfying Assumptions 1 and 2 for the problem in Section 2.1. For any policy $\pi \in \mathcal{A}$, the following statements hold.
(i) If $\min\{\frac{d_z+1}{4}, N\} \ge K$, then there exists a problem instance satisfying Assumptions 1 to 6 such that the expected regret of policy $\pi$ under this instance is lower bounded by
$$\liminf_{T\to\infty} \frac{R^\pi(T)}{\sqrt{\Lambda T}} \ge c_{l,1}\sqrt{d_z},$$
where $c_{l,1}$ depends only on $p_l, p_h, K$. In particular, when $p_l = \Omega(\log K)$ and $p_h = \Omega(\log K)$, we have $c_{l,1} = \Omega(\log K)$. When $p_l = \Omega(1)$ and $p_h = \Omega(1)$, we have $c_{l,1} = \Omega(\frac{1}{K})$.
(ii) If $\Lambda \ge 1$ and $\log\frac{N - d_z}{K} \ge 8\log 2 - \frac{11}{4}\log 3$, then there exists a problem instance satisfying Assumptions 1 to 6 such that the expected regret of policy $\pi$ under this instance is lower bounded by
$$\liminf_{T\to\infty} \frac{R^\pi(T)}{\sqrt{\Lambda T}} \ge c_{l,2}\min\left\{d_z,\ \log\left((N-d_z)/K\right)\right\},$$
where $c_{l,2}$ depends only on $p_l, p_h, K$. In particular, when $p_l = \Omega(\log K)$ and $p_h = \Omega(\log K)$, we have $c_{l,2} = \Omega(\log K)$. When $p_l = \Omega(1)$ and $p_h = \Omega(1)$, we have $c_{l,2} = \Omega(\frac{1}{K})$.
(iii) If $\min\{\frac{d_x+1}{4}, N\} \ge K$ and $\Lambda \ge 1$, then there exists a problem instance satisfying Assumptions 1 to 6 such that the expected regret of policy $\pi$ under this instance is lower bounded by
$$\liminf_{T\to\infty} \frac{R^\pi(T)}{\sqrt{\Lambda T}} \ge c_{l,3}\sqrt{d_x},$$
where $c_{l,3}$ depends only on $p_l, p_h, K, \bar{x}$. In particular, when $p_l = \Omega(\log K)$ and $p_h = \Omega(\log K)$, we have $c_{l,3} = \Omega\left(\frac{\sqrt{K}\log K}{\exp(\bar{x})\bar{x}}\right)$. When $p_l = \Omega(1)$ and $p_h = \Omega(1)$, we have $c_{l,3} = \Omega\left(\frac{\sqrt{K}}{\exp(\bar{x})\bar{x}}\right)$.

Remark 17. Theorem 3 establishes an asymptotic lower bound that also highlights the dependence on the dimensions. Note that when the dimensions are relatively large, the asymptotic lower bound is of order $(d_z + \sqrt{d_x})\sqrt{T}$. Compared with the upper bound in Theorem 1, we establish that our algorithm is optimal with respect to $d_z$ and near optimal with respect to the time horizon $T$ (up to $\sqrt{\log T}$). In typical regimes this remaining gap is small: for example, $\sqrt{\log T} \approx 4$ when $T = 10^7$, and $\sqrt{d_x} \le 10$ when $d_x \le 100$. As a result, further tightening the logarithmic gap may have limited practical impact, although closing it remains an interesting technical question.

Remark 18. We consider algorithms that choose $K$ products in this lower bound, in alignment with Assumption 1. However, our proof only requires the capacity of assortments to be at most $K$; therefore it also applies to settings considered in the literature (Agrawal et al. 2019, Chen et al.
2020, Oh and Iyengar 2021, Chen et al. 2022b).

Remark 19. Note that when $N \ge 18K$, the conditions in either item (i) or item (ii) hold. Compared to prior lower-bound results for contextual MNL bandits, our condition is much more relaxed and reasonable. For instance, Chen et al. (2020) require $N \ge 2^{d_z}K$ and the feature dimension $d_z$ to be divisible by 4. Under such conditions, the conditions in item (ii) hold, and the lower bound becomes $c_{l,2} d_z$. In particular, if we let $p_l = p_h = p$, the joint assortment and pricing problem reduces to the assortment problem, and our asymptotic lower bound in item (ii) becomes $\Omega(d_z\sqrt{T\Lambda}/K)$, matching their lower bound. While there are efforts to sharpen the dependence on $K$ (e.g., Lee and Oh (2024)), they focus on restricted classes of algorithms that select the same product $K$ times in each episode.

5 Simulation Studies

We present simulation experiments to evaluate PMNL in multiple settings, especially when customer arrival rates change with the assortment-pricing decision, to highlight the importance of modeling decision-dependent arrivals. In Section 5.1, we compare the performance of PMNL for dynamic pricing with varying assortments against an algorithm proposed by Ferreira and Mower (2023), which we denote by FM23. Section 5.2 compares the performance of PMNL on the full joint dynamic assortment and pricing problem with a naive baseline. We defer additional results to Appendix EC.1.

5.1 Simulation Experiment I: Dynamic Pricing with Varying Assortments

To demonstrate the importance of modeling customer arrival rates and their dependence on prices and assortments (via product features), we compare PMNL with a benchmark that assumes an unknown constant arrival rate for each period and adjusts prices at the beginning of each period. As discussed in Section 1, to our knowledge, FM23 is the only existing method designed for period-level dynamic pricing.
We follow a simulation setup similar to Ferreira and Mower (2023). Since FM23 does not optimize assortments, we benchmark the performance of dynamic pricing under assortments that vary by period but are given at the beginning of each period. Specifically, we consider a retailer selling $N = K = 5$ products over $T = 1{,}000$ periods with feasible prices $\mathcal{P} = [p_l, p_h]^N = [10, 30]^N$. In each period $t$, product features $z_t$ are drawn independently from a uniform distribution with support $[1, 2]$, i.e., each feature $z_{jt,d} \overset{\text{i.i.d.}}{\sim} \mathrm{Uniform}(1,2)$ for $d \in [d_z]$ and $j \in [N]$, where we set $d_z = 3$. Customer arrivals $n_t$ follow the Poisson arrival model (1), with base arrival rate $\Lambda = 20$ and unit arrival rate
$$\lambda_t := \exp\left(-\sum_{j\in S_t} \alpha\log p_{jt} + \beta\sum_{j\in S_t}\sum_{d=1}^{d_z} \log\left(a z_{jt,d} + b\right)\right), \quad (64)$$
where we set $\theta^* = (\alpha, \beta) = (0.2, 0.2)$, $a = 30$, and $b = -15$. As discussed in Remark 1, the first term captures price sensitivity, with lower prices attracting more customers. The second term captures the effect of product features: treating features as desirable attributes, higher feature levels lead to higher arrivals. Each customer's purchase decision $C_t^{(i)}$ is drawn independently according to the MNL model (5), with each element of the preference parameter $v_d^* \overset{\text{i.i.d.}}{\sim} \mathrm{Uniform}(0,1)$ for $d \in [d_z]$.

We run 100 Monte Carlo simulations and compare the cumulative regret $R^\pi(t)$ of FM23 against our PMNL algorithm with $T_0 = 10$; i.e., the first 10 periods constitute Stage 1 of PMNL and the learning stage of FM23. Figure 1a shows the cumulative regrets of PMNL and FM23. The regret of PMNL converges whereas that of FM23 grows linearly, showing the importance of accounting for the dependency of the arrival rate on the assortment and pricing, and corroborating our claim that ignoring this dependency can lead to suboptimal decision-making.
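One period of the data-generating process above can be sketched as follows. The arrival rate implements Equation (64) with the stated parameters; the MNL purchase step assumes utility $v^{*\top} z_j - p_j$ with an outside option of utility zero, which is one plausible reading of model (5) (the paper's exact specification of (5) may differ):

```python
import numpy as np

rng = np.random.default_rng(42)

# Parameters from the Section 5.1 setup
N, d_z = 5, 3
Lambda, alpha, beta = 20.0, 0.2, 0.2
a, b = 30.0, -15.0
p_l, p_h = 10.0, 30.0

v_star = rng.uniform(0.0, 1.0, size=d_z)   # MNL preference parameter v*
z = rng.uniform(1.0, 2.0, size=(N, d_z))   # product features z_t for one period
p = rng.uniform(p_l, p_h, size=N)          # candidate prices (full assortment S_t = [N])

# Unit arrival rate, Equation (64): lower prices and richer features raise arrivals
lam_t = np.exp(-alpha * np.log(p).sum() + beta * np.log(a * z + b).sum())
n_t = rng.poisson(Lambda * lam_t)          # number of customer arrivals in the period

# Each customer's MNL choice; index N encodes the no-purchase (outside) option
u = z @ v_star - p                         # assumed utility v*^T z_j - p_j
w = np.append(np.exp(u), 1.0)
probs = w / w.sum()
choices = rng.choice(N + 1, size=n_t, p=probs)
revenue_t = p[choices[choices < N]].sum()  # period revenue from realized purchases
```

Decision-dependence is visible directly in the sketch: changing `p` changes both the purchase probabilities and, through `lam_t`, the number of arrivals, which is the coupling FM23 ignores.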
We further examine the pricing decisions to better understand the behavior of the algorithms. Figure 2 shows the pricing decisions for the five products under FM23 and PMNL. Both algorithms explore a wide range of prices during the first stage ($t \in [T_0]$). In the second stage ($t > T_0$), FM23 mostly prices the products near the upper boundary $p_h = 30$, because FM23 assumes the arrival rate does not change with the price and therefore maximizes per-customer revenue by pricing high. Under this setting, however, arrivals decrease with higher prices, so persistently setting prices too high reduces cumulative revenue. By contrast, PMNL prices low, attracting more customers while still maintaining sufficient per-customer revenue to maximize cumulative revenue.

We further compare the two algorithms under two additional scenarios. Keeping all other settings unchanged, the only difference is the specification of the arrival rate: (i) the arrival rate $\lambda_t$ depends only on prices, i.e., $\beta = 0$; and (ii) the arrival rate is constant, i.e., $\lambda_t = 1$. In the former case, Figure 1b shows that PMNL still outperforms FM23. With a constant arrival rate, Figure 1c shows that PMNL performs comparably to FM23, suggesting that PMNL is robust to the case where the arrival rate is exogenous. As shown in Ferreira and Mower (2023), FM23 itself outperforms the M3P algorithm

Figure 1. Comparison of cumulative regret of PMNL and FM23 (Ferreira and Mower 2023) across three settings: (a) assortment-pricing dependent; (b) price dependent; (c) constant arrival rate. In each panel, the solid line shows the sample average cumulative regret over 100 simulations for PMNL, and the dashed line corresponds to FM23.
Shaded regions indicate the 10th and 90th percentiles.

Figure 2. Pricing decisions for the products (price_1 through price_5). The solid line shows the median price under PMNL across 100 simulations, and the dashed line corresponds to FM23. Shaded regions indicate the 10th and 90th percentiles.

proposed by Javanmard and Nazerzadeh (2019), which alternates the learning and earning phases in an episodic manner. Together, these results show that PMNL is competitive for dynamic pricing and outperforms various existing algorithms when arrival rates depend on the assortment and pricing.

5.2 Simulation Experiment II: Joint Dynamic Assortment and Pricing

In this section, we consider the joint dynamic assortment and pricing problem. Since no competitive algorithm exists in the literature for our setting, we compare PMNL with a naive UCB algorithm that assumes a fixed arrival rate and updates the assortment and prices at the beginning of each period using past observations. We adopt a setup similar to Section 5.1 with the following changes. We set $N = 5$ products and an assortment size of $K = 4$. Each product has $d_z = 5$ features. For the Poisson arrival model, we set $\theta^* = (\alpha, \beta) = (0.1, 0.1)$ and $\Lambda = 100$. All other configurations remain unchanged. Figure 3 reports the cumulative regrets of PMNL and UCB. As before, the regret of PMNL converges whereas that of UCB grows linearly. Once again, these results highlight that it is crucial to account for decision-dependent arrivals: PMNL effectively learns both the Poisson arrival and the MNL choice models and provides better joint assortment-pricing decisions than naive algorithms that assume the arrival rate is fixed.
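The gap between the fixed-arrival baselines and PMNL comes from the objectives they implicitly optimize. A stylized single-product illustration (hypothetical logistic demand and a decreasing arrival rate $\lambda(p) = p^{-\alpha}$, chosen for clarity rather than taken from the simulations): a fixed-arrival policy maximizes the per-customer revenue $r(p)$ alone, while the correct objective is $\lambda(p)\,r(p)$, whose maximizer is lower.

```python
import numpy as np

alpha, v = 0.5, 1.0                    # hypothetical arrival elasticity and utility
prices = np.linspace(0.5, 5.0, 400)
buy_prob = np.exp(v - prices) / (1.0 + np.exp(v - prices))  # single-product MNL
r = prices * buy_prob                  # expected per-customer revenue r(p)
lam = prices ** (-alpha)               # decreasing, price-dependent arrival rate

p_fixed = prices[np.argmax(r)]         # ignores the arrival channel
p_joint = prices[np.argmax(lam * r)]   # accounts for decision-dependent arrivals
assert p_joint < p_fixed               # pricing lower attracts more arrivals
```

Here $\lambda(p)\,r(p)$ peaks at a strictly lower price than $r(p)$, mirroring Figure 2, where FM23 clusters near $p_h$ while PMNL prices low.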
Figure 3. Comparison of cumulative regret of PMNL and a naive UCB algorithm. The solid line shows the sample average of the cumulative regret across 100 simulations for PMNL, while the dashed line corresponds to UCB. Shaded regions indicate the 10th and 90th percentiles.

6 Conclusion

This paper studies the dynamic joint assortment and pricing problem in which firms update decisions at regular accounting or operating intervals with the goal of maximizing the cumulative per-period revenue over a time horizon $T$. In this setting, the reward depends jointly on how many customers arrive during the period and what they purchase conditional on arrival. To capture these two channels, we propose a Poisson–multinomial logit (Poisson–MNL) model that couples a Poisson arrival model with a contextual MNL choice model. The key is to allow the arrival rate to depend on the assortment-pricing decision through a rich set of basis functions of the offered assortment and prices, while the choice model leverages (potentially time-varying) product features to enable attribute-level learning and generalization across items. This framework roughly nests the classical MNL models that assume fixed arrivals as a special case.

Building on this model, we develop PMNL, an efficient online policy leveraging the upper confidence bound (UCB) strategy that jointly learns arrival and choice parameters and selects assortments and prices to maximize the cumulative expected reward. We establish near-minimax-optimal regret guarantees: an $O(\sqrt{T\log T})$ upper bound with a matching $\Omega(\sqrt{T})$ lower bound (up to $\log T$). Simulations show that accounting for decision-dependent arrivals substantially outperforms benchmarks that assume constant arrival rates when arrivals depend on the assortment and pricing.
An important direction for future work is to close the remaining logarithmic gap between the upper and lower bounds. Another direction is to extend the arrival model to capture temporal effects, where arrivals in a given period may depend not only on the current assortment and prices but also on past decisions.

References

Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Advances in Neural Information Processing Systems, volume 24.
Abdallah T, Vulcano G (2021) Demand estimation under the multinomial logit model from sales transaction data. Manufacturing & Service Operations Management 23(5):1196–1216.
Agrawal S, Avadhanula V, Goyal V, Zeevi A (2019) MNL-bandit: A dynamic learning approach to assortment selection. Operations Research 67(5):1453–1485.
Ahle TD (2022) Sharp and simple bounds for the raw moments of the binomial and Poisson distributions. Statistics and Probability Letters 182:109306.
Akçay Y, Natarajan HP, Xu SH (2010) Joint dynamic pricing of multiple perishable products under consumer choice. Management Science 56(8):1345–1361.
Aparicio D, Eckles D, Kumar M (2023) Algorithmic pricing and consumer sensitivity to price variability. Available at SSRN 4435831.
Araman VF, Caldentey R (2009) Dynamic pricing for nonperishable products with demand learning. Operations Research 57(5):1169–1188.
Ban GY, Keskin NB (2021) Personalized dynamic pricing with machine learning: High-dimensional features and heterogeneous elasticity. Management Science 67(9):5549–5568.
Bastani H, Simchi-Levi D, Zhu R (2022) Meta dynamic pricing: Transfer learning across experiments. Management Science 68(3):1865–1881.
Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Operations Research 57(6):1407–1420.
Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Operations Research 60(4):965–980.
Brown LD (1986) Fundamentals of Statistical Exponential Families: with Applications in Statistical Decision Theory (IMS).
Brown Z, et al. (2023) Competition and consumer behavior in online marketplaces. Journal of Marketing Research.
Chen N, Gallego G (2021) Nonparametric pricing analytics with customer covariates. Operations Research 69(3):974–984.
Chen X, Owen Z, Pixton C, Simchi-Levi D (2022a) A statistical learning approach to personalization in revenue management. Management Science 68(3):1923–1937.
Chen X, Shi C, Wang Y, Zhou Y (2021) Dynamic assortment planning under nested logit models. Production and Operations Management 30(1):85–102.
Chen X, Wang Y (2017) A note on a tight lower bound for MNL-bandit assortment selection models. arXiv preprint arXiv:1709.06109.
Chen X, Wang Y, Zhou Y (2020) Dynamic assortment optimization with changing contextual information. The Journal of Machine Learning Research 21(1):8918–8961.
Chen Y, Wang Y, Fang EX, Wang Z, Li R (2022b) Nearly dimension-independent sparse linear bandit over small action spaces via best subset selection. Journal of the American Statistical Association 1–13.
Cheung WC, Simchi-Levi D (2017) Thompson sampling for online personalized assortment optimization problems with multinomial logit choice models. Available at SSRN 3075658.
Choi KP (1994) On the medians of gamma distributions and an equation of Ramanujan. Proceedings of the American Mathematical Society 121(1):245–251, URL http://dx.doi.org/10.2307/2160389.
Cohen MC, Lobel I, Paes Leme R (2020) Feature-based dynamic pricing. Management Science 66(11):4921–4943.
de la Pena VH (1999) A general class of exponential inequalities for martingales and ratios. The Annals of Probability 27(1):537–564.
den Boer AV, Zwart B (2014) Simultaneously learning and optimizing using controlled variance pricing. Management Science 60(3):770–783.
Fan J, Guo Y, Yu M (2024) Policy optimization using semiparametric models for dynamic pricing. Journal of the American Statistical Association 552–564.
Ferreira KJ, Mower E (2023) Demand learning and pricing for varying assortments. Manufacturing & Service Operations Management 25(4):1227–1244.
Gallego G, Wang R (2014) Multiproduct price optimization and competition under the nested logit model with product-differentiated price sensitivities. Operations Research 62(2):450–461.
Javanmard A, Nazerzadeh H (2019) Dynamic pricing in high-dimensions. The Journal of Machine Learning Research 20(1):315–363.
Javanmard A, Nazerzadeh H, Shao S (2020) Multi-product dynamic pricing in high-dimensions with heterogeneous price sensitivity. 2020 IEEE International Symposium on Information Theory (ISIT), 2652–2657 (IEEE).
Kahn BE (1995) Consumer variety-seeking among goods and services: An integrative review. Journal of Retailing and Consumer Services 2(3):139–148.
Keskin NB, Zeevi A (2014) Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Operations Research 62(5):1142–1167.
Kingman JFC (1992) Poisson Processes, volume 3 (Clarendon Press).
Kleinberg R, Leighton T (2003) The value of knowing a demand curve: Bounds on regret for online posted-price auctions. 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., 594–605 (IEEE).
Lancaster K (1990) The economics of product variety: A survey. Marketing Science 9(3):189–206.
Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press).
Lee J, Oh Mh (2024) Nearly minimax optimal regret for multinomial logistic bandit. Advances in Neural Information Processing Systems 37:109003–109065.
Lee SJ, Sun WW, Liu Y (2025) Low-rank online dynamic assortment with dual contextual information. Journal of the American Statistical Association (just-accepted):1–22.
Ma W, Simchi-Levi D, Zhao J (2018) Dynamic pricing (and assortment) under a static calendar. arXiv preprint arXiv:1811.01077.
Miao S, Chao X (2021) Dynamic joint assortment and pricing optimization with demand learning. Manufacturing & Service Operations Management 23(2):525–545.
Miao S, Chao X (2022) Online personalized assortment optimization with high-dimensional customer contextual data. Manufacturing & Service Operations Management 24(5):2741–2760.
Miao S, Chen X, Chao X, Liu J, Zhang Y (2022) Context-based dynamic pricing with online clustering. Production and Operations Management 31(9):3559–3575.
Oh Mh, Iyengar G (2019) Thompson sampling for multinomial logit contextual bandits. Advances in Neural Information Processing Systems 32.
Oh Mh, Iyengar G (2021) Multinomial logit contextual bandits: Provable optimality and practicality. Proceedings of the AAAI Conference on Artificial Intelligence.
Perivier N, Goyal V (2022) Dynamic pricing and assortment under a contextual MNL demand. Advances in Neural Information Processing Systems 35:3461–3474.
Poisson SD (1837) Recherches sur la probabilité des jugements en matière criminelle et en matière civile: précédées des règles générales du calcul des probabilités (Bachelier).
Pollard D (2015) Miniempirical. http://www.stat.yale.edu/~pollard/Books/Mini/, manuscript (accessed 02-23-2017).
Qiang S, Bayati M (2016) Dynamic pricing with demand covariates. arXiv preprint arXiv:1604.07463.
Robbins H (1952) Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 58(5):527–535.
Rusmevichientong P, Shen ZJM, Shmoys DB (2010) Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Operations Research 58(6):1666–1680.
Tropp JA (2012) User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics 12(4):389–434.
van de Geer S (2000) Empirical Processes in M-estimation (Cambridge University Press).
Vulcano G, Van Ryzin G, Ratliff R (2012) Estimating primary demand for substitutable products from sales transaction data. Operations Research 60(2):313–334.
Wainwright MJ (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint, volume 48 (Cambridge University Press).
Wang R (2021) Consumer choice and market expansion: Modeling, optimization, and estimation. Operations Research 69(4):1044–1056.
Winkelmann R (2008) Econometric Analysis of Count Data (Springer-Verlag).

e-companion to Author: Poisson MNL

Electronic companions

The appendix collects supplementary materials that support the main text and the proofs. We first report additional simulation results in Section EC.1. We then provide a reduction for the feature vector $x(S,p)$ in the arrival model in Section EC.2. Next, Section EC.3 clarifies the relationship between Assumptions 6 and 7 by showing how the stronger condition implies the weaker one and how to construct the sequence required in the latter. Finally, we introduce additional notation and technical preliminaries used throughout the proofs in Section EC.4 and provide the detailed proofs in the rest of the Appendix, which is organized as follows:
• Section EC.5 proves Lemma 2 and its related Lemmas EC.11 to EC.15. Specifically, Lemma 2 depends on the proofs of Lemmas EC.11 and EC.12, with the latter relying on Lemmas EC.13 to EC.15.
• Section EC.6 proves Lemma 4, building on Lemmas EC.11 and EC.12.
• Section EC.7 proves Lemma 6, which depends on Lemma 4.
• Section EC.8 proves Lemma 8.
• Section EC.9 proves Lemma 3 and its auxiliary Lemmas EC.16 to EC.21. Here, Lemma 3 is established via Lemma EC.16; Lemma EC.18 relies on Lemmas EC.17 and EC.21; and Lemmas EC.19 and EC.20 both depend on Lemma EC.17.
• Section EC.10 proves Lemma 5 and Lemma EC.22. Lemma 5 depends on Lemmas EC.11, EC.16, and EC.22.
• Section EC.11 proves Lemma 7 and Lemma EC.23, where the former depends on the latter and on Lemmas 5 and EC.22.
• Section EC.12 proves Lemma 9.
• Section EC.13 is dedicated to the proofs of the lower bound Theorems 2 and 3 as well as their dependent Lemmas EC.24 to EC.31.

Figure EC.1. Proof structure of Theorem 1 (the lemma-dependency diagram, with nodes classified as final conclusions, key steps, useful conclusions, fundamental lemmas, and technical lemmas).

EC.1 Additional Simulation Results

Figure EC.2 shows the estimation error of the parameters for the simulation in Section 5.1. Specifically, Figure EC.2a shows the estimation error of the MNL parameters $\|\hat{v} - v^*\|_2$: PMNL converges faster than FM23. Figure EC.2b shows the estimation error of the arrival parameters $\|\hat{\theta} - \theta^*\|_2$: PMNL learns the arrival model efficiently and the estimation error converges.

EC.2 Reduction of $x(S_t, p_t)$

This section shows that, without loss of generality, we can work with a reduced representation of the arrival feature vector $x(S_t, p_t)$, and describes how to construct such a reduction by removing coordinates that are redundant over the feasible decision set. Consider the set of all attainable feature vectors $\{x(S,p) \mid S \in \mathcal{S}, p \in \mathcal{P}\}$. We can assume that a maximal linearly independent subset of $\{x(S,p) \mid S \in \mathcal{S}, p \in \mathcal{P}\}$ comprises $d_x$ vectors.
If the number of such vectors were less than $d_x$, the matrix formed by this maximal linearly independent subset would be rank-deficient.

[Figure EC.2: Estimation error of the unknown parameters by PMNL and FM23. Panel (a) shows the estimation error $\|\hat v - v^*\|_2$ of the customer-preference parameters for both algorithms; panel (b) shows the estimation error $\|\hat\theta - \theta^*\|_2$ of the arrival parameters, which appear only in PMNL.]

Rank deficiency implies that some component of the vectors in the maximal linearly independent subset can be expressed in terms of the other components. Since the maximal linearly independent subset represents every vector in $\{x(S,p) \mid S \in \mathcal S,\ p \in \mathcal P\}$, the same component of every vector in $\{x(S,p) \mid S \in \mathcal S,\ p \in \mathcal P\}$ can be expressed through the other components. Consequently, we can eliminate this component. Repeating this process, we may assume that the number of vectors in a maximal linearly independent subset equals the dimension of the feature vector. This condition ensures that $\mathrm{span}(\{x(S,p) \mid S \in \mathcal S,\ p \in \mathcal P\})$ has full rank $d_x$, which significantly simplifies our analysis.

EC.3 Relationship Between Assumption 6 and Prior Work

Lemma EC.10. Suppose that Assumption 7 holds. There exist constants $\sigma_0 = \frac{1}{2}K\bar\sigma_0$ and $\sigma_1 > 0$ such that we can construct assortments $\{S_s\}_{s=1}^t$ and price vectors $\{p_s\}_{s=1}^t$ for which, for any
\[ t \ge t_0 := \max\left\{ \frac{\log(d_z T)}{\sigma_0 (1-\log 2)},\ 2 d_x \right\}, \]
the following hold with probability at least $1 - T^{-1}$:
1. $\sigma_{\min}\!\left( \sum_{s=1}^t \sum_{j \in S_s} z_{js} z_{js}^\top \right) \ge \sigma_0 t$.
2. There exists a price sequence $\{p_s\}_{s=1}^t$ such that $\sigma_{\min}\!\left( \sum_{s=1}^t x(S_s,p_s)\, x^\top(S_s,p_s) \right) \ge \sigma_1 t$.
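Before turning to the proof, the block-repetition construction in Part 2 can be illustrated numerically. The sketch below is not from the paper: the feature vectors are randomly generated stand-ins for $x(S_s,p_s)$, and `base`, `lam_min`, and `sig_min` are hypothetical names. It repeats a fixed block of $d_x$ linearly independent vectors and checks that the minimum eigenvalue of the accumulated design matrix grows at least as $\lfloor t/d_x\rfloor\,\lambda_{\min}$.

```python
import numpy as np

# Hypothetical illustration of the block-repetition construction in Part 2 of
# Lemma EC.10: fix d_x linearly independent vectors, repeat them cyclically
# for t periods (truncating the last block), and check the linear growth
# sigma_min(sum x x^T) >= floor(t/d_x) * lambda_min.
rng = np.random.default_rng(0)
d_x = 4
base = rng.standard_normal((d_x, d_x))              # rows: d_x linearly independent vectors
lam_min = np.linalg.eigvalsh(base.T @ base).min()   # sigma_min of one full block

for t in [8, 25, 100]:
    X = base[np.arange(t) % d_x]                    # cyclic repetition of the block
    sig_min = np.linalg.eigvalsh(X.T @ X).min()
    # each rank-one summand is PSD, so full blocks stack up linearly
    assert sig_min >= (t // d_x) * lam_min - 1e-8
```

The partial final block only adds a positive semidefinite term, which is exactly why the floor-count bound in the proof suffices.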
We prove each part of Lemma EC.10 below.

Proof of Part 1 of Lemma EC.10. By Assumption 7, $\|z_{js}\|_2 \le 1$, hence $\sigma_{\max}(z_{js} z_{js}^\top) \le 1$. Applying the matrix Chernoff bound (Tropp 2012, Thm. 1.1), for any $\delta \in [0,1)$,
\[ \mathbb P\left\{ \sigma_{\min}\!\left( \sum_{s=1}^t \sum_{j\in S_s} z_{js} z_{js}^\top \right) \le (1-\delta) t K \bar\sigma_0 \right\} \le d_z \left[ \frac{e^{-\delta}}{(1-\delta)^{1-\delta}} \right]^{t K \bar\sigma_0}. \tag{EC.1} \]
Setting $\delta = \frac12$ gives
\[ \mathbb P\left\{ \sigma_{\min}\!\left( \sum_{s=1}^t \sum_{j\in S_s} z_{js} z_{js}^\top \right) \le \tfrac12 t K \bar\sigma_0 \right\} \le d_z \exp\left( -\tfrac12 t K \bar\sigma_0 (1 - \log 2) \right). \]
Thus the right-hand side is at most $T^{-1}$ provided
\[ t \ge \frac{2\log(d_z T)}{K \bar\sigma_0 (1-\log 2)} = \frac{\log(d_z T)}{\sigma_0 (1-\log 2)}. \]
For all $t \ge t_0$, this implies $\sigma_{\min}\!\left( \sum_{s=1}^t \sum_{j\in S_s} z_{js} z_{js}^\top \right) \ge \sigma_0 t$ with probability at least $1 - T^{-1}$. Q.E.D.

Proof of Part 2 of Lemma EC.10. Since $\{x(S,p)\}$ spans $\mathbb R^{d_x}$, we can choose $d_x$ linearly independent vectors $\{x(S_s,p_s)\}_{s=1}^{d_x}$. Hence there exists $\lambda_{\min} > 0$ such that
\[ \sigma_{\min}\!\left( \sum_{s=1}^{d_x} x(S_s,p_s)\, x^\top(S_s,p_s) \right) \ge \lambda_{\min}. \]
For any $t \ge 2 d_x$, repeat this length-$d_x$ block (truncating the last block if needed) to form $\{(S_s,p_s)\}_{s=1}^t$. Since each term $x(S_s,p_s)\, x^\top(S_s,p_s)$ is positive semidefinite,
\[ \sigma_{\min}\!\left( \sum_{s=1}^t x(S_s,p_s)\, x^\top(S_s,p_s) \right) \ge \left\lfloor \frac{t}{d_x} \right\rfloor \lambda_{\min} \ge \left( \frac{t}{d_x} - 1 \right) \lambda_{\min} \ge \frac{t\,\lambda_{\min}}{2 d_x}. \]
Let $\sigma_1 := \lambda_{\min}/(2 d_x) > 0$. Then $\sigma_{\min}\!\left( \sum_{s=1}^t x(S_s,p_s)\, x^\top(S_s,p_s) \right) \ge \sigma_1 t$ for all $t \ge 2 d_x$. Q.E.D.

EC.4 Notations

This section supports the detailed proofs of the results presented in the main body of the paper. To facilitate the proofs, we begin by introducing several definitions. For each period $t$, we use maximum likelihood estimation (MLE) to estimate the unknown parameters $(\theta, v)$. The log-likelihood function for these parameters, up to period $t$, can be expressed as:
\[ \mathcal L_t(\theta, v) = \mathcal L_t^{\mathrm{Poi}}(\theta) + \mathcal L_t^{\mathrm{MNL}}(v). \]
(EC.2)
Here, $\mathcal L_t^{\mathrm{Poi}}(\theta)$ and $\mathcal L_t^{\mathrm{MNL}}(v)$ are defined as in Section 3.1:
\[ \mathcal L_t^{\mathrm{Poi}}(\theta) = \sum_{s=1}^t \left\{ -\Lambda \lambda(S_s,p_s;\theta) + n_s \log \lambda(S_s,p_s;\theta) + \left( n_s \log\Lambda - \log n_s! \right) \right\}, \]
\[ \mathcal L_t^{\mathrm{MNL}}(v) = \sum_{s=1}^t \sum_{i=1}^{n_s} \sum_{j \in S_s \cup \{0\}} \mathbf 1\{C_s^{(i)} = j\} \log q(j, S_s, p_s, z_s; v). \]
The term $\mathcal L_t(\theta,v)$ represents the logarithm of the joint likelihood of the observed customer arrivals and purchase choices up to time $t$. Its gradients with respect to the parameters $(\theta, v)$ are given by:
\[ \nabla_\theta \mathcal L_t^{\mathrm{Poi}}(\theta) = -\sum_{s=1}^t \left( \Lambda \lambda(S_s,p_s;\theta) - n_s \right) x(S_s,p_s), \]
\[ \nabla_v \mathcal L_t^{\mathrm{MNL}}(v) = -\sum_{s=1}^t \sum_{i=1}^{n_s} \sum_{j \in S_s} \left( q(j,S_s,p_s,z_{js}; v) - \mathbf 1\{C_s^{(i)} = j\} \right) z_{js}. \]
We denote the ground truth of $\theta$ and $v$ by $\theta^*$ and $v^*$, respectively, and their corresponding estimates at the end of period $t$, which maximize the log-likelihood, by $\hat\theta_t$ and $\hat v_t$. The calculation of the first and second derivatives of the log-likelihood ratio with respect to the unknown parameters is cumbersome. To facilitate the proofs, we define several terms for all $\theta \in \mathbb R^{d_x}$ and all $v \in \mathbb R^{d_z}$.

Definition EC.1.
For the Poisson arrival process, we define the following terms:
\[ \bar G_t^{\mathrm{Poi}}(\theta) := \sum_{s=1}^t \bar g_s^{\mathrm{Poi}}(\theta), \quad \text{where} \]
\[ \bar g_s^{\mathrm{Poi}}(\theta) := \log \mathbb P\left( \mathrm{Poisson}\!\left(\Lambda\lambda(S_s,p_s;\theta)\right) = n_s \right) - \log \mathbb P\left( \mathrm{Poisson}\!\left(\Lambda\lambda(S_s,p_s;\theta^*)\right) = n_s \right) \]
\[ = \Lambda\left( \lambda(S_s,p_s;\theta^*) - \lambda(S_s,p_s;\theta) \right) - n_s \left( \log\lambda(S_s,p_s;\theta^*) - \log\lambda(S_s,p_s;\theta) \right), \]
\[ G_t^{\mathrm{Poi}}(\theta) := \sum_{s=1}^t g_s^{\mathrm{Poi}}(\theta), \quad \text{where} \]
\[ g_s^{\mathrm{Poi}}(\theta) := \mathbb E\left[ \bar g_s^{\mathrm{Poi}}(\theta) \,\middle|\, \mathcal H_s \right] = \Lambda\left( \lambda(S_s,p_s;\theta^*) - \lambda(S_s,p_s;\theta) \right) - \Lambda\lambda(S_s,p_s;\theta^*)\left( \log\lambda(S_s,p_s;\theta^*) - \log\lambda(S_s,p_s;\theta) \right), \]
\[ \bar I_t^{\mathrm{Poi}}(\theta) := \sum_{s=1}^t \bar M_s^{\mathrm{Poi}}(\theta), \quad \text{where} \]
\[ \bar M_s^{\mathrm{Poi}}(\theta) := -\nabla^2_\theta \log \mathbb P\left( \mathrm{Poisson}\!\left(\Lambda\lambda(S_s,p_s;\theta)\right) = n_s \right) = \Lambda\lambda(S_s,p_s;\theta)\, x(S_s,p_s)\, x(S_s,p_s)^\top. \]
To verify the above definitions and equations, recall that the customer arrival rate is $\lambda(S_t,p_t;\theta) = \exp(\theta^\top x(S_t,p_t))$, where $x(S_t,p_t)$ is a set of sufficient statistics encapsulating the impact of both the assortment $S_t$ and the prices $p_t$ on customer arrivals. Here, $\bar g_s^{\mathrm{Poi}}(\theta)$ represents the difference in the logarithm of the Poisson probability between $\theta$ and the ground truth $\theta^*$, while $\bar G_t^{\mathrm{Poi}}(\theta)$ denotes the cumulative sum of these differences up to period $t$. The term $g_s^{\mathrm{Poi}}(\theta)$ is the conditional expectation of $\bar g_s^{\mathrm{Poi}}(\theta)$ given the history $\mathcal H_s$, and $G_t^{\mathrm{Poi}}(\theta)$ is its cumulative counterpart. Notably, $-G_t^{\mathrm{Poi}}(\theta)$ equals the Kullback–Leibler divergence between the conditional distributions parameterized by $\theta^*$ and $\theta$, since the data-generating process is governed by the true parameters; hence $G_t^{\mathrm{Poi}}(\theta)$ is inherently non-positive.
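The sign claim can be made concrete: $-g_s^{\mathrm{Poi}}(\theta)$ is exactly the KL divergence between $\mathrm{Poisson}(\Lambda\lambda(S_s,p_s;\theta^*))$ and $\mathrm{Poisson}(\Lambda\lambda(S_s,p_s;\theta))$. A minimal numeric check, with hypothetical values for $\Lambda$ and the two rates (this is an illustration, not code from the paper):

```python
import math

# Check that -g_s^Poi(theta) = KL(Poisson(mu_star) || Poisson(mu)),
# where mu_star = Lambda * lambda(theta*) and mu = Lambda * lambda(theta),
# so that G_t^Poi(theta) <= 0. All numeric values are hypothetical.
Lambda, lam_star, lam = 5.0, 1.3, 0.7
mu_star, mu = Lambda * lam_star, Lambda * lam

# g_s^Poi(theta) as defined above
g = Lambda * (lam_star - lam) - mu_star * (math.log(lam_star) - math.log(lam))

# closed-form KL divergence between two Poisson distributions
kl = mu_star * math.log(mu_star / mu) + mu - mu_star

assert abs(-g - kl) < 1e-9   # the two expressions agree
assert kl >= 0.0             # KL is nonnegative, so g <= 0
```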
Furthermore, $\bar G_t^{\mathrm{Poi}}(\theta)$ serves as the empirical counterpart of $G_t^{\mathrm{Poi}}(\theta)$, and their difference constitutes a martingale:
\[ \mathbb E\left[ \bar G_t^{\mathrm{Poi}}(\theta) - G_t^{\mathrm{Poi}}(\theta) \,\middle|\, \mathcal H_t \right] = \bar G_{t-1}^{\mathrm{Poi}}(\theta) - G_{t-1}^{\mathrm{Poi}}(\theta). \]
Lastly, $\bar I_t^{\mathrm{Poi}}(\theta)$ is the Fisher information matrix associated with the MLE based on the sequence $\{n_1, n_2, \ldots, n_t\}$; it is symmetric and positive definite.

Given that customers have arrived, their product choices are entirely characterized by the parameter $v$. For any $v \in \mathbb R^{d_z}$, we define a set of analogous functions and terms: $\bar g_s^{\mathrm{MNL}}(v)$, $\bar G_t^{\mathrm{MNL}}(v)$, $g_s^{\mathrm{MNL}}(v)$, and $G_t^{\mathrm{MNL}}(v)$. These functions mirror the role of their counterparts for customer arrivals, offering a comprehensive framework for analyzing the dynamics of consumer behavior in product choice.

Definition EC.2. We define the log-likelihood ratio and the second-order derivative of the log-likelihood with respect to the parameter $v$:
\[ \bar G_t^{\mathrm{MNL}}(v) := \sum_{s=1}^t \bar g_s^{\mathrm{MNL}}(v), \quad \text{where} \quad \bar g_s^{\mathrm{MNL}}(v) := \sum_{i=1}^{n_s} \sum_{j\in S_s\cup\{0\}} \mathbf 1\{C_s^{(i)}=j\} \log \frac{q(j,S_s,p_s,z_s;v)}{q(j,S_s,p_s,z_s;v^*)}, \]
\[ G_t^{\mathrm{MNL}}(v) := \sum_{s=1}^t g_s^{\mathrm{MNL}}(v), \quad \text{where} \quad g_s^{\mathrm{MNL}}(v) := \Lambda\lambda(S_s,p_s;\theta^*) \sum_{j\in S_s\cup\{0\}} q(j,S_s,p_s,z_s;v^*) \log \frac{q(j,S_s,p_s,z_s;v)}{q(j,S_s,p_s,z_s;v^*)}, \]
\[ \bar I_t^{\mathrm{MNL}}(v) := \sum_{s=1}^t \bar M_s^{\mathrm{MNL}}(v), \quad \text{where} \quad \bar M_s^{\mathrm{MNL}}(v) := -\nabla^2_v \sum_{i=1}^{n_s} \sum_{j\in S_s} \mathbf 1\{C_s^{(i)}=j\} \log q(j,S_s,p_s,z_s;v) \]
\[ = n_s \sum_{j\in S_s} q(j,S_s,p_s,z_s;v)\, z_{js} z_{js}^\top - n_s \sum_{j,k\in S_s} q(j,S_s,p_s,z_s;v)\, q(k,S_s,p_s,z_s;v)\, z_{js} z_{ks}^\top. \]
Here, $\bar g_s^{\mathrm{MNL}}(v)$ captures the log-likelihood ratio between $v$ and $v^*$, while $\bar G_t^{\mathrm{MNL}}(v)$ denotes its cumulative sum up to period $t$.
The term $g_s^{\mathrm{MNL}}(v)$ is the conditional expectation of $\bar g_s^{\mathrm{MNL}}(v)$ given the history $\mathcal H_s$, and $G_t^{\mathrm{MNL}}(v)$ is its cumulative counterpart. As in the Poisson case, $-G_t^{\mathrm{MNL}}(v)$ equals the Kullback–Leibler divergence between the conditional distributions parameterized by $v^*$ and $v$, making $G_t^{\mathrm{MNL}}(v)$ inherently non-positive. Additionally, $\bar G_t^{\mathrm{MNL}}(v)$ serves as the empirical counterpart of $G_t^{\mathrm{MNL}}(v)$, and their difference constitutes a martingale:
\[ \mathbb E\left[ \bar G_t^{\mathrm{MNL}}(v) - G_t^{\mathrm{MNL}}(v) \,\middle|\, \mathcal H_t \right] = \bar G_{t-1}^{\mathrm{MNL}}(v) - G_{t-1}^{\mathrm{MNL}}(v). \]
Lastly, $\bar I_t^{\mathrm{MNL}}(v)$ is the Fisher information matrix associated with the MLE based on the choice sequence $\{C_1^{(i)}, C_2^{(i)}, \ldots, C_t^{(i)}\}$; it is symmetric and positive definite.

EC.5 Proof of Lemma 2

This section provides a detailed proof of Lemma 2, which focuses on narrowing down the initial estimate of $\theta$. Before diving into the proof, we first introduce two auxiliary lemmas, Lemma EC.11 and Lemma EC.12, which will be instrumental in the main proof. The structure of this section is as follows: we begin by presenting the two lemmas, followed by the proof of Lemma 2. Finally, we provide the proofs of the auxiliary Lemmas EC.11 and EC.12.

Lemma EC.11. Suppose $y$ satisfies the inequality $y \le a + \sqrt{b(y+c)}$; then it follows that $y \le b + 2a + c$.

Lemma EC.12. Suppose $\|\theta - \theta^*\| \le 2\tau$, where $\tau > 0$. Let $c_4$ be defined in Equation (29). Then the following hold:
1. For any fixed $t$:
\[ \bar G_t^{\mathrm{Poi}}(\theta) - G_t^{\mathrm{Poi}}(\theta) \le \frac{t\exp(\bar x)}{T}\left( 2 + \frac{4\log T}{\Lambda\exp(\bar x)\, t} + \sqrt{\frac{4\log T}{\Lambda\exp(\bar x)\, t}} \right) + 4\tau\bar x c_4\left( \log T + d_x\log\!\big(6\tau\bar x(\Lambda T+1)\big) \right) \]
\[ + \sqrt{ \left( |G_t^{\mathrm{Poi}}(\theta)| + \frac{2t}{T} \right) 8\left( \log T + d_x\log\!\big(6\tau\bar x(\Lambda T+1)\big) \right) }, \quad \forall\, \|\theta-\theta^*\| \le 2\tau, \tag{EC.3} \]
with probability $1 - 2T^{-1}$.
2.
For $T_0 < t \le T$:
\[ \bar G_t^{\mathrm{Poi}}(\theta) - G_t^{\mathrm{Poi}}(\theta) \le 2\exp(\bar x) + \sqrt{\frac{8\exp(\bar x)\log T}{T\Lambda}} + \frac{8\log T}{T\Lambda} + 4\tau\bar x c_4\left( 2\log T + d_x\log\!\big(6\tau\bar x(\Lambda T+1)\big) \right) \]
\[ + \sqrt{ \left( |G_t^{\mathrm{Poi}}(\theta)| + 2 \right) 8\left( 2\log T + d_x\log\!\big(6\tau\bar x(\Lambda T+1)\big) \right) }, \tag{EC.4} \]
which holds uniformly over all $\|\theta - \theta^*\| \le 2\tau$, with probability $1 - 2T^{-1}$.

Now, let us detail the proof of Lemma 2.

Proof. It is easy to verify that:
\[ \nabla_\theta G_{T_0}^{\mathrm{Poi}}(\theta) = \Lambda \sum_{s=1}^{T_0} \left( \lambda(S_s,p_s;\theta^*) - \lambda(S_s,p_s;\theta) \right) x_s, \qquad \nabla^2_\theta G_{T_0}^{\mathrm{Poi}}(\theta) = -\Lambda \sum_{s=1}^{T_0} \lambda(S_s,p_s;\theta)\, x_s x_s^\top = -I_{T_0}^{\mathrm{Poi}}(\theta). \]
Note that $G_{T_0}^{\mathrm{Poi}}(\theta^*) = 0$, $\nabla_\theta G_{T_0}^{\mathrm{Poi}}(\theta^*) = 0$, and $\nabla^2_\theta G_{T_0}^{\mathrm{Poi}}(\theta^*) = -I_{T_0}^{\mathrm{Poi}}(\theta^*)$. Using the Taylor expansion with Lagrange remainder, there exists some $\bar\theta = \alpha\theta^* + (1-\alpha)\hat\theta$ with $\alpha\in(0,1)$ such that:
\[ G_{T_0}^{\mathrm{Poi}}(\hat\theta) = -\frac12 (\hat\theta - \theta^*)^\top I_{T_0}^{\mathrm{Poi}}(\bar\theta)\, (\hat\theta - \theta^*). \tag{EC.5} \]
From Equation (11) in Assumption 6, we have $I_{T_0}^{\mathrm{Poi}}(\bar\theta) \succeq T_0 \Lambda \exp(-\bar x)\,\sigma_1 I_{d_x\times d_x}$, where $I_{d_x\times d_x}$ denotes the identity matrix of dimension $d_x$. Thus:
\[ -G_{T_0}^{\mathrm{Poi}}(\hat\theta) \ge \frac12 T_0 \Lambda \exp(-\bar x)\,\sigma_1 \|\hat\theta - \theta^*\|_2^2. \tag{EC.6} \]
Next, we provide an upper bound for $-G_{T_0}^{\mathrm{Poi}}(\hat\theta)$, which will then be used in conjunction with inequality (EC.6) to bound $\|\hat\theta - \theta^*\|_2^2$. Using the facts that $G_{T_0}^{\mathrm{Poi}}(\hat\theta) \le 0$ and $\bar G_{T_0}^{\mathrm{Poi}}(\hat\theta) \ge 0$, along with inequality (EC.3) from Lemma EC.12, we derive:
\[ \left| G_{T_0}^{\mathrm{Poi}}(\hat\theta) \right| \le \bar G_{T_0}^{\mathrm{Poi}}(\hat\theta) - G_{T_0}^{\mathrm{Poi}}(\hat\theta) \le \frac{T_0\exp(\bar x)}{T}\left( 2 + \frac{4\log T}{\Lambda\exp(\bar x)\, T_0} + \sqrt{\frac{4\log T}{\Lambda\exp(\bar x)\, T_0}} \right) \]
\[ + 4\bar x c_4 \left( \log T + d_x\log\!\big(3\bar x(\Lambda T+1)\big) \right) + \sqrt{ \left( |G_{T_0}^{\mathrm{Poi}}(\hat\theta)| + \frac{2T_0}{T} \right) 4\left( \log T + d_x\log\!\big(3\bar x(\Lambda T+1)\big) \right) }, \]
which holds with probability $1 - 2/T$.
In the second inequality, we use the result from Lemma EC.12 with $\tau = \frac12$, which guarantees that $\|\hat\theta - \theta^*\| \le 1 = 2\tau$, thereby ensuring the validity of the bound under this setting. Combining the above inequality with Lemma EC.11, we obtain an upper bound:
\[ \left| G_{T_0}^{\mathrm{Poi}}(\hat\theta) \right| \le \frac{2T_0\exp(\bar x)}{T}\left( 2 + \frac{4\log T}{\Lambda\exp(\bar x)\,T_0} + \sqrt{\frac{4\log T}{\Lambda\exp(\bar x)\,T_0}} \right) + \frac{2T_0}{T} + (8\bar x c_4 + 4)\left( \log T + d_x\log\!\big(3\bar x(\Lambda T+1)\big) \right), \]
with probability $1-2/T$. Given that $G_{T_0}^{\mathrm{Poi}}(\hat\theta) \le 0$ and using Equation (EC.6), we have, with probability $1-2T^{-1}$,
\[ \frac12 T_0\Lambda\exp(-\bar x)\,\sigma_1 \|\hat\theta-\theta^*\|_2^2 \le -G_{T_0}^{\mathrm{Poi}}(\hat\theta) \le \frac{2T_0\exp(\bar x)}{T}\left( 2 + \frac{4\log T}{\Lambda\exp(\bar x)\,T_0} + \sqrt{\frac{4\log T}{\Lambda\exp(\bar x)\,T_0}} \right) + \frac{2T_0}{T} + (8\bar x c_4+4)\left( \log T + d_x\log\!\big(3\bar x(\Lambda T+1)\big) \right). \]
Thus, we conclude, with probability $1-2T^{-1}$:
\[ \|\hat\theta-\theta^*\|_2^2 \le \frac{4\exp(2\bar x)}{T\Lambda\sigma_1}\left( 2 + \frac{4\log T}{\Lambda\exp(\bar x)\,T_0} + \sqrt{\frac{4\log T}{\Lambda\exp(\bar x)\,T_0}} \right) + \frac{4\exp(\bar x)}{T\Lambda\sigma_1} + \frac{2(8\bar x c_4+4)\exp(\bar x)}{T_0\Lambda\sigma_1}\left( \log T + d_x\log\!\big(3\bar x(\Lambda T+1)\big) \right). \]
Q.E.D.

Now, we turn to the proofs of Lemmas EC.11 and EC.12.

EC.5.1 Proof of Lemma EC.11

Lemma EC.11 follows directly from the AM–GM inequality $\sqrt{b(y+c)} \le \frac{b+y+c}{2}$. Substituting this into the given condition, we obtain $y \le a + \frac{b+y+c}{2}$. Rearranging terms gives $y \le b + 2a + c$. This completes the proof of Lemma EC.11. Q.E.D.

EC.5.2 Proof of Lemma EC.12

This subsection details the proof of Lemma EC.12. Before diving into the details, we first introduce three auxiliary lemmas: Lemmas EC.13, EC.14, and EC.15. We present the proof of Lemma EC.12 assuming the validity of Lemmas EC.13 to EC.15; the proofs of Lemmas EC.13 to EC.15 are deferred to the end of this subsection.
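As a quick numerical sanity check of the self-bounding inequality in Lemma EC.11 (a sketch with randomly drawn constants, not part of the paper), one can take the largest $y$ attaining equality in $y = a + \sqrt{b(y+c)}$, which is the worst case, and verify that the conclusion $y \le b + 2a + c$ holds:

```python
import math
import random

# Worst-case check of Lemma EC.11: if y <= a + sqrt(b*(y + c)) with
# a, b, c >= 0, then y <= b + 2a + c. The largest feasible y solves
# (y - a)^2 = b*(y + c), a quadratic whose larger root we test.
random.seed(0)
for _ in range(1000):
    a, b, c = (random.uniform(0, 10) for _ in range(3))
    disc = (2 * a + b) ** 2 - 4 * (a * a - b * c)   # = 4ab + b^2 + 4bc >= 0
    y = (2 * a + b + math.sqrt(disc)) / 2           # larger root, y >= a
    assert y <= a + math.sqrt(b * (y + c)) + 1e-9   # hypothesis holds (with equality)
    assert y <= b + 2 * a + c + 1e-9                # conclusion of Lemma EC.11
```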
For $\|\theta-\theta^*\|_2 \le 2\tau$, let us first define:
\[ V_s^{\mathrm{Poi}} := \mathrm{Var}\left( \big(n_s - \Lambda\lambda(S_s,p_s;\theta^*)\big) \log\frac{\lambda(S_s,p_s;\theta^*)}{\lambda(S_s,p_s;\theta)} \,\middle|\, \mathcal H_s \right) = \Lambda\lambda(S_s,p_s;\theta^*) \log^2\frac{\lambda(S_s,p_s;\theta^*)}{\lambda(S_s,p_s;\theta)}, \qquad SV_t^{\mathrm{Poi}} := \sum_{s=1}^t V_s^{\mathrm{Poi}}. \tag{EC.7} \]

Lemma EC.13. For all $k > 2$, we have the following inequality:
\[ \mathbb E\left[ \left| \big(n_s - \Lambda\lambda(S_s,p_s;\theta^*)\big) \log\frac{\lambda(S_s,p_s;\theta^*)}{\lambda(S_s,p_s;\theta)} \right|^k \,\middle|\, \mathcal H_s \right] \le \frac{k!}{2}\, V_s^{\mathrm{Poi}} (\tau\bar x c_4)^{k-2}. \tag{EC.8} \]
Here, $c_4$ is defined in Equation (29).

Lemma EC.14. $SV_t^{\mathrm{Poi}} \le 2\,|G_t^{\mathrm{Poi}}(\theta)|$.

Lemma EC.15. If $\|\theta-\theta'\|_2 \le \epsilon$, then:
\[ \left| G_t^{\mathrm{Poi}}(\theta) \right| \le \left| G_t^{\mathrm{Poi}}(\theta') \right| + \Lambda\exp(\bar x)\left( \exp(\epsilon\bar x) - 1 + \epsilon\bar x \right) t. \tag{EC.9} \]

Using Lemma EC.13 (a conditional Bernstein condition), it is easy to check that the conditions of Theorem 1.2A in de la Pena (1999) hold. Therefore, for all $x, y > 0$:
\[ \mathbb P\left( \bar G_t^{\mathrm{Poi}}(\theta) - G_t^{\mathrm{Poi}}(\theta) \ge x,\ SV_t^{\mathrm{Poi}} \le y \right) \le \exp\left( -\frac{x^2}{2(y + \tau\bar x c_4 x)} \right). \tag{EC.10} \]
Using Lemma EC.14 and setting $y = 2|G_t^{\mathrm{Poi}}(\theta)|$, inequality (EC.10) becomes:
\[ \mathbb P\left( \bar G_t^{\mathrm{Poi}}(\theta) - G_t^{\mathrm{Poi}}(\theta) \ge x \right) \le \exp\left( -\frac{x^2}{2\big(2|G_t^{\mathrm{Poi}}(\theta)| + \tau\bar x c_4 x\big)} \right). \]
Setting the right-hand side to $\delta$, we derive the following inequality for each $\theta \in \{\theta : \|\theta-\theta^*\|_2 \le 2\tau\}$:
\[ \mathbb P\left( \bar G_t^{\mathrm{Poi}}(\theta) - G_t^{\mathrm{Poi}}(\theta) \ge \sqrt{8\,|G_t^{\mathrm{Poi}}(\theta)|\log\tfrac1\delta} + 4\tau\bar x c_4\log\tfrac1\delta \right) \le \delta. \tag{EC.11} \]
For any $\epsilon > 0$, let $\mathcal H(\epsilon)$ be a finite covering of $\{\theta\in\mathbb R^{d_x} : \|\theta-\theta^*\|_2 \le 2\tau\}$ in $\|\cdot\|_2$ up to precision $\epsilon$; that is, for every $\theta' \in \{\theta : \|\theta-\theta^*\|_2 \le 2\tau\}$, there exists $\theta \in \mathcal H(\epsilon)$ such that $\|\theta-\theta'\|_2 \le \epsilon$. By standard covering-number arguments (van de Geer 2000), such a finite covering set $\mathcal H(\epsilon)$ exists, and its size is bounded as $|\mathcal H(\epsilon)| \le (6\tau/\epsilon)^{d_x}$. For each $\theta \in \mathcal H(\epsilon)$, the event in (EC.11) fails with probability at most $\delta$; a union bound over $\mathcal H(\epsilon)$ then yields the following.
Thus, we obtain:
\[ \mathbb P\left( \bar G_t^{\mathrm{Poi}}(\theta) - G_t^{\mathrm{Poi}}(\theta) \ge \sqrt{8\,|G_t^{\mathrm{Poi}}(\theta)|\log\tfrac1\delta} + 4\tau\bar x c_4\log\tfrac1\delta \ \text{ for some } \theta\in\mathcal H(\epsilon) \right) \le \delta\,|\mathcal H(\epsilon)|. \tag{EC.12} \]

Proof of Part 1 of Lemma EC.12. Set $\delta = \frac1T \left(\frac{\epsilon}{6\tau}\right)^{d_x}$. Then, with probability $1 - T^{-1}$:
\[ \bar G_t^{\mathrm{Poi}}(\theta) - G_t^{\mathrm{Poi}}(\theta) \le 4\tau\bar x c_4\left( \log T + d_x\log(6\tau/\epsilon) \right) + \sqrt{ |G_t^{\mathrm{Poi}}(\theta)|\, 8\left( \log T + d_x\log(6\tau/\epsilon) \right) }, \quad \forall\,\theta\in\mathcal H(\epsilon). \tag{EC.13} \]
Given that this inequality holds on the finite set $\mathcal H(\epsilon)$, we now extend it to every $\theta'$ with $\|\theta'-\theta^*\|_2 \le 2\tau$, showing that $\bar G_t^{\mathrm{Poi}}(\theta') - G_t^{\mathrm{Poi}}(\theta')$ is bounded from above with probability $1 - 2T^{-1}$ (see the final inequality (EC.3) of the proof). For such a $\theta'$, we can find $\theta\in\mathcal H(\epsilon)$ with $\|\theta-\theta'\|_2\le\epsilon$. Using this, we derive the following inequality:
\[ \bar G_t^{\mathrm{Poi}}(\theta') - G_t^{\mathrm{Poi}}(\theta') \le \bar G_t^{\mathrm{Poi}}(\theta) - G_t^{\mathrm{Poi}}(\theta) + \sum_{s=1}^t \left| n_s \log\frac{\lambda(S_s,p_s;\theta')}{\lambda(S_s,p_s;\theta)} \right| + \Lambda\sum_{s=1}^t \left| \lambda(S_s,p_s;\theta^*) \log\frac{\lambda(S_s,p_s;\theta')}{\lambda(S_s,p_s;\theta)} \right|. \tag{EC.14} \]
Inequality (EC.14) can be derived by observing that:
\[ \bar g_s^{\mathrm{Poi}}(\theta) - g_s^{\mathrm{Poi}}(\theta) = \big(\Lambda\lambda(S_s,p_s;\theta^*) - n_s\big)\left( \log\lambda(S_s,p_s;\theta^*) - \log\lambda(S_s,p_s;\theta) \right), \]
which holds for both $\theta$ and $\theta'$. Consequently, we have:
\[ \left( \bar G_t^{\mathrm{Poi}}(\theta') - G_t^{\mathrm{Poi}}(\theta') \right) - \left( \bar G_t^{\mathrm{Poi}}(\theta) - G_t^{\mathrm{Poi}}(\theta) \right) = \sum_{s=1}^t \big(\Lambda\lambda(S_s,p_s;\theta^*) - n_s\big)\left( \log\lambda(S_s,p_s;\theta) - \log\lambda(S_s,p_s;\theta') \right). \tag{EC.15} \]
Applying the triangle inequality to (EC.15) yields inequality (EC.14). Now, to bound $\bar G_t^{\mathrm{Poi}}(\theta') - G_t^{\mathrm{Poi}}(\theta')$ for all $\theta'$, we only need to bound each term on the right-hand side of inequality (EC.14). Specifically, we focus on bounding the following terms:
1. $\bar G_t^{\mathrm{Poi}}(\theta) - G_t^{\mathrm{Poi}}(\theta)$ for $\theta\in\mathcal H(\epsilon)$, which has already been controlled by (EC.13).
2.
The second term: $\sum_{s=1}^t \left| n_s \log\frac{\lambda(S_s,p_s;\theta')}{\lambda(S_s,p_s;\theta)} \right|$.
3. The third term: $\Lambda\sum_{s=1}^t \left| \lambda(S_s,p_s;\theta^*)\log\frac{\lambda(S_s,p_s;\theta')}{\lambda(S_s,p_s;\theta)} \right|$.

Bounding these terms ensures that $\bar G_t^{\mathrm{Poi}}(\theta') - G_t^{\mathrm{Poi}}(\theta')$ is well controlled for all $\theta'$, completing the necessary analysis.
• For the first part of inequality (EC.14), we use inequality (EC.13) and substitute $|G_t^{\mathrm{Poi}}(\theta)|$ with $|G_t^{\mathrm{Poi}}(\theta')|$. It is worth mentioning that we need to bound the first part using $G_t^{\mathrm{Poi}}(\theta')$ rather than $G_t^{\mathrm{Poi}}(\theta)$; to achieve this, we use Lemma EC.15, which will be proved in Section EC.5.6.
• For the second part of inequality (EC.14), we have:
\[ \sum_{s=1}^t \left| n_s\log\frac{\lambda(S_s,p_s;\theta')}{\lambda(S_s,p_s;\theta)} \right| \le \bar x\epsilon \sum_{s=1}^t n_s, \]
where a random realization of $\sum_{s=1}^t n_s$ needs to be bounded in probability. Using the result from Chapter 3.5 of Pollard (2015) and noting that $\lambda(S_s,p_s;\theta^*) \le \exp(\bar x)$ for $\|\theta^*\|\le 1$, we have:
\[ \mathbb P\left( \sum_{s=1}^t n_s \ge \Lambda\exp(\bar x)\, t\, (1+\delta) \right) \le \exp\left( -\frac{\delta^2 \Lambda\exp(\bar x)\, t}{2(1+\delta)} \right). \tag{EC.16} \]
Set $\delta = \frac{4\log T}{\Lambda\exp(\bar x)\,t} + \sqrt{\frac{4\log T}{\Lambda\exp(\bar x)\,t}}$. Substituting this into the bound, we obtain:
\[ \mathbb P\left( \sum_{s=1}^t n_s \ge \Lambda\exp(\bar x)\,t\left( 1 + \frac{4\log T}{\Lambda\exp(\bar x)\,t} + \sqrt{\frac{4\log T}{\Lambda\exp(\bar x)\,t}} \right) \right) \le \frac1T. \tag{EC.17} \]
Thus, with probability $1-T^{-1}$:
\[ \sum_{s=1}^t \left| n_s\log\frac{\lambda(S_s,p_s;\theta')}{\lambda(S_s,p_s;\theta)} \right| \le \Lambda\bar x\exp(\bar x)\,t\left( 1 + \frac{4\log T}{\Lambda\exp(\bar x)\,t} + \sqrt{\frac{4\log T}{\Lambda\exp(\bar x)\,t}} \right)\epsilon. \tag{EC.18} \]
• For the third part of inequality (EC.14), it is straightforward to calculate:
\[ \Lambda\sum_{s=1}^t \left| \lambda(S_s,p_s;\theta^*)\log\frac{\lambda(S_s,p_s;\theta')}{\lambda(S_s,p_s;\theta)} \right| \le \Lambda\bar x\epsilon\exp(\bar x)\,t. \tag{EC.19} \]
Note that the upper bounds for the first, second, and third parts (see (EC.13), (EC.18), and (EC.19)) all include $\epsilon$.
Carefully setting $\epsilon = \log(1 + 1/(\Lambda T))/\bar x$, we have $\Lambda\bar x\epsilon t \le t/T$ and $\exp(\epsilon\bar x) - 1 = \frac{1}{\Lambda T}$. For $\|\theta'-\theta^*\|_2 \le 2\tau$, let $\theta$ be the closest point in $\mathcal H(\epsilon)$; by the definition of $\mathcal H(\epsilon)$, we have $\|\theta-\theta'\|_2\le\epsilon$. Combining inequalities (EC.11), (EC.18), and (EC.19) with Lemma EC.15, we obtain the following with probability $1-2T^{-1}$:
\[ \bar G_t^{\mathrm{Poi}}(\theta') - G_t^{\mathrm{Poi}}(\theta') \le \bar G_t^{\mathrm{Poi}}(\theta) - G_t^{\mathrm{Poi}}(\theta) + \sum_{s=1}^t \left| n_s\log\frac{\lambda(S_s,p_s;\theta')}{\lambda(S_s,p_s;\theta)} \right| + \Lambda\sum_{s=1}^t \left| \lambda(S_s,p_s;\theta^*)\log\frac{\lambda(S_s,p_s;\theta')}{\lambda(S_s,p_s;\theta)} \right| \]
\[ \le \frac{t\exp(\bar x)}{T}\left( 2 + \frac{4\log T}{\Lambda\exp(\bar x)\,t} + \sqrt{\frac{4\log T}{\Lambda\exp(\bar x)\,t}} \right) + 4\tau\bar x c_4\left( \log T + d_x\log\!\big(6\tau\bar x(\Lambda T+1)\big) \right) \]
\[ + \sqrt{\left( |G_t^{\mathrm{Poi}}(\theta')| + \frac{2t}{T} \right) 8\left( \log T + d_x\log\!\big(6\tau\bar x(\Lambda T+1)\big) \right)}, \quad \forall\,\theta'\in\{\theta:\|\theta-\theta^*\|_2\le2\tau\}. \]
Thus, we complete the proof of the first part of Lemma EC.12. Q.E.D.

Proof of the Second Part of Lemma EC.12. Similar to the proof of the first part, we set $\delta = \frac{1}{T^2}\left(\frac{\epsilon}{6\tau}\right)^{d_x}$ in Equation (EC.12). For all $T_0 < t \le T$ and all $\theta\in\mathcal H(\epsilon)$, we obtain, with probability $1-T^{-1}$:
\[ \bar G_t^{\mathrm{Poi}}(\theta) - G_t^{\mathrm{Poi}}(\theta) \le 4\tau\bar x c_4\left( 2\log T + d_x\log(6\tau/\epsilon) \right) + \sqrt{ |G_t^{\mathrm{Poi}}(\theta)|\,8\left( 2\log T + d_x\log(6\tau/\epsilon) \right) }. \tag{EC.20} \]
Following a similar procedure to the first part of the proof, we extend this result to the general case $\theta'\in\{\theta:\|\theta-\theta^*\|_2\le2\tau\}$. Specifically:
\[ \bar G_t^{\mathrm{Poi}}(\theta') - G_t^{\mathrm{Poi}}(\theta') \le 4\tau\bar x c_4\left( 2\log T + d_x\log(6\tau/\epsilon) \right) + \sqrt{ |G_t^{\mathrm{Poi}}(\theta')|\,8\left( 2\log T + d_x\log(6\tau/\epsilon) \right) } \tag{EC.21} \]
holds uniformly for all $\theta'\in\{\theta:\|\theta-\theta^*\|_2\le2\tau\}$ and all $T_0 < t\le T$, with probability $1-T^{-1}$. To achieve these uniform results for all $t$, we use the same triangle inequality (EC.14) and take an upper bound for each term.
• For the first part of inequality (EC.14), we directly use inequality (EC.21) along with Lemma EC.15 to bound this term.
• For the second part of inequality (EC.14), note that for arbitrary $\|\theta'-\theta\|_2\le\epsilon$:
\[ \sum_{s=1}^t \left| n_s\log\frac{\lambda(S_s,p_s;\theta')}{\lambda(S_s,p_s;\theta)} \right| \le \bar x\epsilon\sum_{s=1}^t n_s. \tag{EC.22} \]
Similar to Equation (EC.16), we obtain the following concentration inequality:
\[ \mathbb P\left( \sum_{s=1}^t n_s \ge \Lambda\exp(\bar x)\,t\left( 1 + \frac{8\log T}{\Lambda\exp(\bar x)\,t} + \sqrt{\frac{8\log T}{\Lambda\exp(\bar x)\,t}} \right) \right) \le \frac{1}{T^2}. \tag{EC.23} \]
This bound holds for each $t\in\{T_0+1,\ldots,T\}$. Thus, by a union bound over $t$:
\[ \mathbb P\left( \sum_{s=1}^t n_s \ge \Lambda\exp(\bar x)\,t\left( 1 + \frac{8\log T}{\Lambda\exp(\bar x)\,t} + \sqrt{\frac{8\log T}{\Lambda\exp(\bar x)\,t}} \right) \ \text{ for some } T_0 < t\le T \right) \le \frac1T. \tag{EC.24} \]
• For the third part of inequality (EC.14), we have:
\[ \Lambda\sum_{s=1}^t \left| \lambda(S_s,p_s;\theta^*)\log\frac{\lambda(S_s,p_s;\theta')}{\lambda(S_s,p_s;\theta)} \right| \le \Lambda\bar x\epsilon\exp(\bar x)\,t. \tag{EC.25} \]
Similarly, the upper bounds of the first, second, and third parts (see (EC.21), (EC.24), and (EC.25)) all include $\epsilon$. We carefully set $\epsilon = \log(1 + 1/(\Lambda T))/\bar x$, and combining inequalities (EC.21), (EC.24), and (EC.25), we have, with probability $1-2T^{-1}$:
\[ \bar G_t^{\mathrm{Poi}}(\theta') - G_t^{\mathrm{Poi}}(\theta') \le \bar G_t^{\mathrm{Poi}}(\theta) - G_t^{\mathrm{Poi}}(\theta) + \sum_{s=1}^t\left| n_s\log\frac{\lambda(S_s,p_s;\theta')}{\lambda(S_s,p_s;\theta)} \right| + \Lambda\sum_{s=1}^t\left| \lambda(S_s,p_s;\theta^*)\log\frac{\lambda(S_s,p_s;\theta')}{\lambda(S_s,p_s;\theta)} \right| \]
\[ \le \frac{\exp(\bar x)\,t}{T}\left( 2 + \frac{8\log T}{\Lambda\exp(\bar x)\,t} + \sqrt{\frac{8\log T}{\Lambda\exp(\bar x)\,t}} \right) + 4\tau\bar x c_4\left( 2\log T + d_x\log\!\big(3\tau\bar x(\Lambda T+1)\big) \right) + \sqrt{\left( |G_t^{\mathrm{Poi}}(\theta')| + \frac{2t}{T} \right)8\left( 2\log T + d_x\log\!\big(3\tau\bar x(\Lambda T+1)\big) \right)} \]
\[ \le 2\exp(\bar x) + \sqrt{\frac{8\exp(\bar x)\log T}{T\Lambda}} + \frac{8\log T}{T\Lambda} + 4\tau\bar x c_4\left( 2\log T + d_x\log\!\big(6\tau\bar x(\Lambda T+1)\big) \right) + \sqrt{\left( |G_t^{\mathrm{Poi}}(\theta')|+2 \right)8\left( 2\log T + d_x\log\!\big(6\tau\bar x(\Lambda T+1)\big) \right)}, \quad \forall\,T_0 < t\le T. \]
Thus, we conclude the proof of Part 2 of Lemma EC.12. Q.E.D.
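The Poisson tail bound invoked in (EC.16) and (EC.23) can be checked by Monte Carlo simulation. The hedged sketch below uses hypothetical values of $\Lambda$, $\bar x$, $t$, and $\delta$ (it is an illustration, not a proof or the authors' code), simulating the worst case in which every period's mean equals the upper bound $\Lambda\exp(\bar x)$:

```python
import math
import numpy as np

# Simulated check of the tail bound: for n_s ~ Poisson with mean at most
# Lambda*exp(xbar), with m = Lambda*exp(xbar)*t,
#   P( sum_s n_s >= m*(1 + delta) ) <= exp( -delta^2 * m / (2*(1 + delta)) ).
rng = np.random.default_rng(1)
Lambda, xbar, t, delta = 2.0, 0.5, 50, 0.3   # hypothetical problem sizes
m = Lambda * math.exp(xbar) * t

# 20,000 replications of the per-horizon arrival total, worst-case means
sums = rng.poisson(lam=Lambda * math.exp(xbar), size=(20000, t)).sum(axis=1)
empirical_tail = np.mean(sums >= m * (1 + delta))
bound = math.exp(-delta**2 * m / (2 * (1 + delta)))

# the Monte Carlo tail frequency should sit below the analytic bound
assert empirical_tail <= bound + 1e-3
```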
EC.5.3 Proofs of the Supporting Lemmas EC.13, EC.14, and EC.15

We now provide the proofs of Lemmas EC.13 to EC.15.

EC.5.4 Proof of Lemma EC.13

In this subsection, we provide the proof of Lemma EC.13 for the case $k>2$, which is the primary focus. From Ahle (2022), the following bound holds:
\[ \mathbb E[n_s^k \mid \mathcal H_s] \le \left( \frac{k}{\log\!\big(1 + k/(\Lambda\lambda(S_s,p_s;\theta^*))\big)} \right)^k. \]
Since $n_s\ge0$, this implies:
\[ \mathbb E\left[ |n_s - \Lambda\lambda(S_s,p_s;\theta^*)|^k \mid \mathcal H_s \right] \le \mathbb E[n_s^k\mid\mathcal H_s] \le \left( \frac{k}{\log\!\big(1+k/(\Lambda\lambda(S_s,p_s;\theta^*))\big)} \right)^k. \]
As $\lambda(S_s,p_s;\theta^*) \le \exp(\bar x)$, we obtain:
\[ \mathbb E\left[ |n_s-\Lambda\lambda(S_s,p_s;\theta^*)|^k \mid \mathcal H_s \right] \le \left( \frac{k}{\log\!\big(1+k/(\Lambda\exp(\bar x))\big)} \right)^k. \tag{EC.26} \]
Using the definition of the constant $c_4$ (see (29)), we derive the following inequality for all $k>2$:
\[ \left( \frac{k}{\log\!\big(1+k/(\Lambda\exp(\bar x))\big)} \right)^k \le \frac{\sqrt{2\pi k}}{2}\left( \frac{k}{e} \right)^k \Lambda\lambda(S_s,p_s;\theta^*)\left( \frac{c_4}{2} \right)^{k-2}. \tag{EC.27} \]
Using Stirling's approximation, we further refine the bound:
\[ \frac{\sqrt{2\pi k}}{2}\left( \frac{k}{e} \right)^k \Lambda\lambda(S_s,p_s;\theta^*)\left( \frac{c_4}{2} \right)^{k-2} \le \frac{k!}{2}\,\Lambda\lambda(S_s,p_s;\theta^*)\left( \frac{c_4}{2} \right)^{k-2}. \tag{EC.28} \]
Combining these results, we conclude:
\[ \mathbb E\left[ |n_s-\Lambda\lambda(S_s,p_s;\theta^*)|^k \mid \mathcal H_s \right] \le \frac{k!}{2}\,\Lambda\lambda(S_s,p_s;\theta^*)\left( \frac{c_4}{2} \right)^{k-2}. \]
Since $\|\theta-\theta^*\|_2\le2\tau$, it follows that $\left|\log\frac{\lambda(S_s,p_s;\theta^*)}{\lambda(S_s,p_s;\theta)}\right|^{k-2} \le (2\tau\bar x)^{k-2}$. Thus, we have:
\[ \mathbb E\left[ \left| \big(n_s-\Lambda\lambda(S_s,p_s;\theta^*)\big)\log\frac{\lambda(S_s,p_s;\theta^*)}{\lambda(S_s,p_s;\theta)} \right|^k \,\middle|\, \mathcal H_s \right] \le \frac{k!}{2}\,\Lambda\lambda(S_s,p_s;\theta^*)\log^2\frac{\lambda(S_s,p_s;\theta^*)}{\lambda(S_s,p_s;\theta)}\left( \frac{c_4}{2} \right)^{k-2}(2\tau\bar x)^{k-2}. \]
By substituting the definition of $V_s^{\mathrm{Poi}}$ from (EC.7), the inequality above matches (EC.8):
\[ \mathbb E\left[ \left| \big(n_s-\Lambda\lambda(S_s,p_s;\theta^*)\big)\log\frac{\lambda(S_s,p_s;\theta^*)}{\lambda(S_s,p_s;\theta)} \right|^k \,\middle|\, \mathcal H_s \right] \le \frac{k!}{2}\,V_s^{\mathrm{Poi}}(\tau\bar x c_4)^{k-2}. \]
Q.E.D.

EC.5.5 Proof of Lemma EC.14

We begin by considering the expression for $V_s^{\mathrm{Poi}}$.
Starting from its definition, we have:
\[ V_s^{\mathrm{Poi}} = \mathbb E\left[ n_s^2\left( \log\lambda(S_s,p_s;\theta) - \log\lambda(S_s,p_s;\theta^*) \right)^2 \,\middle|\, \mathcal H_s \right] - \mathbb E\left[ n_s\left( \log\lambda(S_s,p_s;\theta) - \log\lambda(S_s,p_s;\theta^*) \right) \,\middle|\, \mathcal H_s \right]^2. \]
Writing $y = \frac{\lambda(S_s,p_s;\theta)}{\lambda(S_s,p_s;\theta^*)}$ and substituting the Poisson moments, we get:
\[ V_s^{\mathrm{Poi}} = \left( \Lambda^2\lambda(S_s,p_s;\theta^*)^2 + \Lambda\lambda(S_s,p_s;\theta^*) \right)\log^2 y - \Lambda^2\lambda(S_s,p_s;\theta^*)^2\log^2 y = \Lambda\lambda(S_s,p_s;\theta^*)\log^2 y. \]
Now, consider $g_s^{\mathrm{Poi}}(\theta)$, which can be expressed as:
\[ g_s^{\mathrm{Poi}}(\theta) = \Lambda\lambda(S_s,p_s;\theta^*) - \Lambda\lambda(S_s,p_s;\theta) - \Lambda\lambda(S_s,p_s;\theta^*)\left( \log\lambda(S_s,p_s;\theta^*) - \log\lambda(S_s,p_s;\theta) \right) = -\Lambda\lambda(S_s,p_s;\theta^*)\,(y - 1 - \log y). \]
Next, we use the inequality $\log^2 y \le 2(y - 1 - \log y)$, which directly leads to:
\[ V_s^{\mathrm{Poi}} = \Lambda\lambda(S_s,p_s;\theta^*)\log^2 y \le 2\left( -g_s^{\mathrm{Poi}}(\theta) \right). \tag{EC.29} \]
Finally, summing over all periods, we conclude $SV_t^{\mathrm{Poi}} \le 2\left( -G_t^{\mathrm{Poi}}(\theta) \right)$. Q.E.D.

EC.5.6 Proof of Lemma EC.15

We begin by considering the difference between $g_s^{\mathrm{Poi}}(\theta)$ and $g_s^{\mathrm{Poi}}(\theta')$:
\[ |g_s^{\mathrm{Poi}}(\theta) - g_s^{\mathrm{Poi}}(\theta')| = \Lambda\left| \lambda(S_s,p_s;\theta') - \lambda(S_s,p_s;\theta) + \lambda(S_s,p_s;\theta^*)\log\frac{\lambda(S_s,p_s;\theta)}{\lambda(S_s,p_s;\theta')} \right| \]
\[ \le \Lambda\,|\lambda(S_s,p_s;\theta') - \lambda(S_s,p_s;\theta)| + \Lambda\left| \lambda(S_s,p_s;\theta^*)\log\frac{\lambda(S_s,p_s;\theta)}{\lambda(S_s,p_s;\theta')} \right| \le \Lambda\exp(\bar x)\big(\exp(\epsilon\bar x)-1\big) + \Lambda\exp(\bar x)\,\epsilon\bar x = \Lambda\exp(\bar x)\big(\exp(\epsilon\bar x)-1+\epsilon\bar x\big). \]
Summing these inequalities, we have $|G_t^{\mathrm{Poi}}(\theta) - G_t^{\mathrm{Poi}}(\theta')| \le \Lambda\exp(\bar x)\big(\exp(\epsilon\bar x)-1+\epsilon\bar x\big)\,t$. Q.E.D.

EC.6 Proof of Lemma 4

It is easy to calculate that
\[ \nabla_\theta G_t^{\mathrm{Poi}}(\theta) = \Lambda\sum_{s=1}^t\left( \lambda(S_s,p_s;\theta^*) - \lambda(S_s,p_s;\theta) \right)x_s, \qquad \nabla^2_\theta G_t^{\mathrm{Poi}}(\theta) = -\Lambda\sum_{s=1}^t \lambda(S_s,p_s;\theta)\,x_s x_s^\top = -I_t^{\mathrm{Poi}}(\theta). \]
Note that $G_t^{\mathrm{Poi}}(\theta^*)=0$, $\nabla_\theta G_t^{\mathrm{Poi}}(\theta^*)=0$, and $\nabla^2_\theta G_t^{\mathrm{Poi}}(\theta^*) = -I_t^{\mathrm{Poi}}(\theta^*)$. Using the Taylor expansion at the point $\theta^*$ with Lagrange remainder, there exists $\bar\theta_t = \alpha\theta^* + (1-\alpha)\hat\theta_t$ for some $\alpha\in(0,1)$ such that
\[ G_t^{\mathrm{Poi}}(\hat\theta_t) = -\frac12(\hat\theta_t-\theta^*)^\top I_t^{\mathrm{Poi}}(\bar\theta_t)\,(\hat\theta_t-\theta^*). \tag{EC.30} \]
From $\|\bar\theta_t-\theta^*\|_2 \le 2\tau_\theta$ and $|x^\top(\bar\theta_t-\theta^*)| \le \|x\|_2\|\bar\theta_t-\theta^*\|_2 \le 2\tau_\theta\bar x$, we obtain $-M_t^{\mathrm{Poi}}(\bar\theta_t) \preceq -\exp\{-2\tau_\theta\bar x\}M_t^{\mathrm{Poi}}(\theta^*)$, which indicates $-I_t^{\mathrm{Poi}}(\bar\theta_t) \preceq -\exp\{-2\tau_\theta\bar x\}I_t^{\mathrm{Poi}}(\theta^*)$ by the definition of $I_t^{\mathrm{Poi}}(\cdot)$. So we have
\[ G_t^{\mathrm{Poi}}(\hat\theta_t) \le -\frac12\exp\{-2\tau_\theta\bar x\}\,(\hat\theta_t-\theta^*)^\top I_t^{\mathrm{Poi}}(\theta^*)\,(\hat\theta_t-\theta^*). \tag{EC.31} \]
Using inequality (EC.4) of Lemma EC.12, we have a uniform bound on the difference between $\bar G_t^{\mathrm{Poi}}(\theta)$ and $G_t^{\mathrm{Poi}}(\theta)$ for all $\|\theta-\theta^*\|\le2\tau_\theta$, as follows:
\[ \bar G_t^{\mathrm{Poi}}(\hat\theta_t) - G_t^{\mathrm{Poi}}(\hat\theta_t) \le 2\exp(\bar x) + \sqrt{\frac{8\exp(\bar x)\log T}{T\Lambda}} + \frac{8\log T}{T\Lambda} + 4\tau_\theta\bar x c_4\left( 2\log T + d\log\!\big(6\tau_\theta\bar x(\Lambda T+1)\big) \right) \]
\[ + \sqrt{\left( |G_t^{\mathrm{Poi}}(\hat\theta_t)| + 2 \right)8\left( 2\log T + d\log\!\big(6\tau_\theta\bar x(\Lambda T+1)\big) \right)}, \quad \forall\,T_0<t\le T. \tag{EC.32} \]
This holds uniformly with probability $1-2T^{-1}$ for $t\in\{T_0+1,\ldots,T\}$. By Equation (EC.32) and the fact that $G_t^{\mathrm{Poi}}(\hat\theta_t) \le 0 \le \bar G_t^{\mathrm{Poi}}(\hat\theta_t)$, we have
\[ \left| G_t^{\mathrm{Poi}}(\hat\theta_t) \right| \le 2\exp(\bar x) + \sqrt{\frac{8\exp(\bar x)\log T}{T\Lambda}} + \frac{8\log T}{T\Lambda} + 4\tau_\theta\bar x c_4\left( 2\log T + d\log\!\big(6\tau_\theta\bar x(\Lambda T+1)\big) \right) + \sqrt{\left( |G_t^{\mathrm{Poi}}(\hat\theta_t)|+2 \right)8\left( 2\log T + d\log\!\big(6\tau_\theta\bar x(\Lambda T+1)\big) \right)}, \quad \forall\,T_0<t\le T. \tag{EC.33} \]
Using Lemma EC.11, we have
\[ \left| G_t^{\mathrm{Poi}}(\hat\theta_t) \right| \le 4\exp(\bar x) + 2\sqrt{\frac{8\exp(\bar x)\log T}{T\Lambda}} + \frac{16\log T}{T\Lambda} + 8\tau_\theta\bar x c_4\left( 2\log T + d\log\!\big(6\tau_\theta\bar x(\Lambda T+1)\big) \right) + 2 + 8\left( 2\log T + d\log\!\big(6\tau_\theta\bar x(\Lambda T+1)\big) \right), \quad \forall\,T_0<t\le T. \]
Noticing that $G_t^{\mathrm{Poi}}(\hat\theta_t)\le0$ and combining Equation (EC.31), we have
\[ (\hat\theta_t-\theta^*)^\top I_t^{\mathrm{Poi}}(\theta^*)\,(\hat\theta_t-\theta^*) \le 2\exp(2\tau_\theta\bar x)\left| G_t^{\mathrm{Poi}}(\hat\theta_t) \right| \le 8\exp(2\tau_\theta\bar x)\left[ \exp(\bar x) + \sqrt{\frac{2\exp(\bar x)\log T}{T\Lambda}} + \frac{4\log T}{T\Lambda} + \frac12 \right] + 16\exp(2\tau_\theta\bar x)(\tau_\theta\bar x c_4+1)\left( 2\log T + d\log\!\big(6\tau_\theta\bar x(\Lambda T+1)\big) \right). \tag{EC.34} \]
Similarly,
\[ (\hat\theta_t-\theta^*)^\top I_t^{\mathrm{Poi}}(\hat\theta_t)\,(\hat\theta_t-\theta^*) \le 8\exp(2\tau_\theta\bar x)\left[ \exp(\bar x) + \sqrt{\frac{2\exp(\bar x)\log T}{T\Lambda}} + \frac{4\log T}{T\Lambda} + \frac12 \right] + 16\exp(2\tau_\theta\bar x)(\tau_\theta\bar x c_4+1)\left( 2\log T + d\log\!\big(6\tau_\theta\bar x(\Lambda T+1)\big) \right). \]

EC.7 Proof of Lemma 6

Now, let us turn to the proof of Lemma 6, which states the estimation error of the Poisson arrival rate. Unless stated otherwise, all statements are conditioned on the success event in Lemma 4. On this event, the following inequalities hold uniformly for all $t\in\{T_0,\ldots,T-1\}$:
\[ (\hat\theta_t-\theta^*)^\top I_t^{\mathrm{Poi}}(\theta^*)\,(\hat\theta_t-\theta^*) \le \omega_\theta, \qquad (\hat\theta_t-\theta^*)^\top I_t^{\mathrm{Poi}}(\hat\theta_t)\,(\hat\theta_t-\theta^*) \le \omega_\theta, \qquad \|\hat\theta_t-\theta^*\| \le 2\tau_\theta. \tag{EC.35} \]
We start by noting that the gradient of $\lambda(S,p;\theta)$ with respect to $\theta$ is given by:
\[ \nabla_\theta\lambda(S,p;\theta) = \lambda(S,p;\theta)\,x(S,p). \tag{EC.36} \]
Next, by applying the mean value theorem, we know that there exists some $\tilde\theta_{t-1} = \theta^* + \xi(\hat\theta_{t-1}-\theta^*)$ for some $\xi\in(0,1)$ such that:
\[ \left| \lambda(S,p;\hat\theta_{t-1}) - \lambda(S,p;\theta^*) \right| = \left| \left\langle \nabla_\theta\lambda(S,p;\tilde\theta),\ \hat\theta_{t-1}-\theta^* \right\rangle \right| = \sqrt{ (\hat\theta_{t-1}-\theta^*)^\top\left[ \nabla_\theta\lambda(S,p;\tilde\theta)\,\nabla_\theta\lambda(S,p;\tilde\theta)^\top \right](\hat\theta_{t-1}-\theta^*) } \]
\[ = \sqrt{ \frac{\lambda(S,p;\tilde\theta)}{\Lambda}\,(\hat\theta_{t-1}-\theta^*)^\top M_t^{\mathrm{Poi}}(\tilde\theta\mid S,p)\,(\hat\theta_{t-1}-\theta^*) }. \tag{EC.37} \]
Notice that $M_t^{\mathrm{Poi}}(\tilde\theta\mid S,p) = \Lambda\exp(x^\top\tilde\theta)\,x(S,p)\,x^\top(S,p) \preceq \exp\{2\tau_\theta\bar x\}\,M_t^{\mathrm{Poi}}(\theta^*\mid S,p)$ and $\lambda(S,p;\tilde\theta) \le \exp(\bar x)$.
Combining with Equation (EC.37) and Lemma 4, we obtain:
\[ \Lambda\left| \lambda(S,p;\hat\theta_{t-1}) - \lambda(S,p;\theta^*) \right| = \sqrt{ \Lambda\lambda(S,p;\tilde\theta)\,(\hat\theta_{t-1}-\theta^*)^\top I_{t-1}^{\mathrm{Poi}}(\theta^*)^{1/2}\, I_{t-1}^{\mathrm{Poi}}(\theta^*)^{-1/2} M_t^{\mathrm{Poi}}(\tilde\theta\mid S,p)\, I_{t-1}^{\mathrm{Poi}}(\theta^*)^{-1/2}\, I_{t-1}^{\mathrm{Poi}}(\theta^*)^{1/2}(\hat\theta_{t-1}-\theta^*) } \]
\[ \le \sqrt{ \omega_\theta\,\Lambda\lambda(S,p;\tilde\theta)\left\| I_{t-1}^{\mathrm{Poi}}(\theta^*)^{-1/2} M_t^{\mathrm{Poi}}(\tilde\theta_{t-1}\mid S,p)\, I_{t-1}^{\mathrm{Poi}}(\theta^*)^{-1/2} \right\|_{\mathrm{op}} } \le \sqrt{ \omega_\theta\,\Lambda\exp\!\big((2\tau_\theta+1)\bar x\big)\left\| I_{t-1}^{\mathrm{Poi}}(\theta^*)^{-1/2} M_t^{\mathrm{Poi}}(\theta^*\mid S,p)\, I_{t-1}^{\mathrm{Poi}}(\theta^*)^{-1/2} \right\|_{\mathrm{op}} }. \]
Similarly, we have
\[ \Lambda\left| \lambda(S,p;\hat\theta_{t-1}) - \lambda(S,p;\theta^*) \right| \le \sqrt{ \omega_\theta\,\Lambda\exp\!\big((2\tau_\theta+1)\bar x\big)\left\| I_{t-1}^{\mathrm{Poi}}(\hat\theta_{t-1})^{-1/2} M_t^{\mathrm{Poi}}(\hat\theta_{t-1}\mid S,p)\, I_{t-1}^{\mathrm{Poi}}(\hat\theta_{t-1})^{-1/2} \right\|_{\mathrm{op}} }. \tag{EC.38} \]
Notice that $M_t^{\mathrm{Poi}}(\hat\theta\mid S,p) \preceq \exp\{2\tau_\theta\bar x\}M_t^{\mathrm{Poi}}(\theta^*\mid S,p)$ and $I_t^{\mathrm{Poi}}(\theta^*) \preceq \exp\{2\tau_\theta\bar x\}I_t^{\mathrm{Poi}}(\hat\theta)$, so
\[ \left\| I_{t-1}^{\mathrm{Poi}}(\hat\theta_{t-1})^{-1/2} M_t^{\mathrm{Poi}}(\hat\theta_{t-1}\mid S,p)\, I_{t-1}^{\mathrm{Poi}}(\hat\theta_{t-1})^{-1/2} \right\|_{\mathrm{op}} \le \exp\{4\tau_\theta\bar x\}\left\| I_{t-1}^{\mathrm{Poi}}(\theta^*)^{-1/2} M_t^{\mathrm{Poi}}(\theta^*\mid S,p)\, I_{t-1}^{\mathrm{Poi}}(\theta^*)^{-1/2} \right\|_{\mathrm{op}}. \]
Thus, we have proved Lemma 6. Q.E.D.

EC.8 Proof of Lemma 8

This subsection provides a detailed proof of Lemma 8.
• Proof of the first inequality of Lemma 8. Denote $\hat A_t := I_{t-1}^{\mathrm{Poi}}(\theta^*)^{-1/2} M_t^{\mathrm{Poi}}(\theta^*\mid S_t)\, I_{t-1}^{\mathrm{Poi}}(\theta^*)^{-1/2}$, a $d$-dimensional positive semidefinite matrix with eigenvalues sorted as $\sigma_1(\hat A_t) \ge \cdots \ge \sigma_d(\hat A_t) \ge 0$.
By applying spectral properties, we have:

$$\sum_{t=T_0+1}^{T}\min\left\{\frac{\Lambda^2(\exp(\bar x)-\exp(-\bar x))^2}{\omega_\theta\Lambda\exp((2\tau_\theta+1)\bar x)},\ \Big\|I^{Poi}_{t-1}(\theta^*)^{-1/2}M^{Poi}_t(\theta^*\,|\,S_t)I^{Poi}_{t-1}(\theta^*)^{-1/2}\Big\|_{op}\right\}=\sum_{t=T_0+1}^{T}\min\left\{\frac{\Lambda^2(\exp(\bar x)-\exp(-\bar x))^2}{\omega_\theta\Lambda\exp((2\tau_\theta+1)\bar x)},\ \sigma_1\big(\hat A_t\big)\right\}$$
$$\stackrel{(a)}{\le}\sum_{t=T_0+1}^{T}\frac{c_6}{\omega_\theta\Lambda\exp((2\tau_\theta+1)\bar x)}\log\left(1+\min\left\{\frac{\Lambda^2(\exp(\bar x)-\exp(-\bar x))^2}{\omega_\theta\Lambda\exp((2\tau_\theta+1)\bar x)},\ \sigma_1\big(\hat A_t\big)\right\}\right)\le\sum_{t=T_0+1}^{T}\frac{c_6}{\omega_\theta\Lambda\exp((2\tau_\theta+1)\bar x)}\log\big(1+\sigma_1\big(\hat A_t\big)\big),$$

where

$$c_6=\frac{\Lambda^2(\exp(\bar x)-\exp(-\bar x))^2}{\log\Big(1+\frac{\Lambda^2(\exp(\bar x)-\exp(-\bar x))^2}{\omega_\theta\Lambda\exp((2\tau_\theta+1)\bar x)}\Big)},$$

as defined in Equation (53), is independent of the time $t$. Here, inequality $(a)$ uses the elementary fact that $y\le\frac{c\,\log(1+y)}{\log(1+c)}$ for $0\le y\le c$. Next, observe that:

$$I^{Poi}_t(\theta^*)=I^{Poi}_{t-1}(\theta^*)+M^{Poi}_t(\theta^*\,|\,S_t)=I^{Poi}_{t-1}(\theta^*)^{1/2}\big[I_{d\times d}+\hat A_t\big]I^{Poi}_{t-1}(\theta^*)^{1/2},$$

and therefore $\log\det I^{Poi}_t(\theta^*)=\log\det I^{Poi}_{t-1}(\theta^*)+\sum_{j=1}^d\log\big(1+\sigma_j\big(\hat A_t\big)\big)$. By comparing the last two displays, we obtain:

$$\sum_{t=T_0+1}^{T}\min\left\{\frac{\Lambda^2(\exp(\bar x)-\exp(-\bar x))^2}{\omega_\theta\Lambda\exp((2\tau_\theta+1)\bar x)},\ \Big\|I^{Poi}_{t-1}(\theta^*)^{-1/2}M^{Poi}_t(\theta^*\,|\,S_t)I^{Poi}_{t-1}(\theta^*)^{-1/2}\Big\|_{op}\right\}\le\frac{c_6}{\omega_\theta\Lambda\exp((2\tau_\theta+1)\bar x)}\log\frac{\det I^{Poi}_T(\theta^*)}{\det I^{Poi}_{T_0}(\theta^*)},$$

which completes the proof of the first inequality.

• Proof of the second inequality of Lemma 8. Note that $\sigma_{\min}\big(\hat I^{Poi}_{T_0}(\theta^*)\big)\ge\Lambda\exp(-\bar x)T_0\sigma_1$ and $\mathrm{tr}\big(I^{Poi}_T(\theta^*)\big)\le\sum_{s=1}^T\Lambda\exp(\bar x)\,\mathrm{tr}\big(x_sx_s^\top\big)\le\Lambda\exp(\bar x)\bar x^2T$. Therefore, we can bound

$$\log\big(\det I^{Poi}_T(\theta^*)\big)\le d_x\log\frac{\mathrm{tr}\big(I^{Poi}_T(\theta^*)\big)}{d_x}\le d_x\log\frac{\Lambda\exp(\bar x)\bar x^2T}{d_x}.$$
As a result, we obtain:

$$\log\frac{\det I^{Poi}_T(\theta^*)}{\det I^{Poi}_{T_0}(\theta^*)}\le d_x\log\frac{\Lambda\exp(\bar x)\bar x^2T}{d_x}+\bar x-\log(\Lambda T_0\sigma_1)=d_x\log\frac{\bar x^2T}{d_x}+(d_x+1)\bar x-\log(T_0\sigma_1), \quad (EC.39)$$

which completes the proof of the second inequality. Q.E.D.

EC.9 Proof of Lemma 3

This section provides the proof of Lemma 3. To facilitate the proof, we first present Lemma EC.16 and then prove Lemma 3; we leave the proof of Lemma EC.16 to the end of this subsection.

Lemma EC.16. 1. With probability at least $1-2T^{-1}$, the following holds for all $v$ satisfying $\|v-v^*\|_2\le2\tau$:

$$\bar G^{MNL}_{T_0}(v)-G^{MNL}_{T_0}(v)\le\frac{\exp(\bar x)T_0}{T}\left(2+\frac{4\log T}{\Lambda\exp(\bar x)T_0}+\sqrt{\frac{4\log T}{\Lambda\exp(\bar x)T_0}}\right)+4\tau\big((d_z+1)\log T+d_z\log(6\Lambda\tau)\big)+4\tau\sqrt{T_0\Lambda\exp(\bar x)\big((d_z+1)\log T+d_z\log(6\tau\Lambda)\big)}. \quad (EC.40)$$

2. With probability at least $1-2T^{-1}$, the following holds for all $t\in\{T_0+1,\dots,T\}$ and all $v$ satisfying $\|v-v^*\|_2\le2\tau$:

$$\bar G^{MNL}_t(v)-G^{MNL}_t(v)\le\exp(\bar x)+\sqrt{\frac{8\exp(\bar x)\log T}{T\Lambda}}+\frac{8\log T}{T\Lambda}+4\tau c_4\big((d_z+2)\log T+d_z\log(6\tau\Lambda)\big)+\sqrt{\big(|G^{MNL}_t(v)|+2\exp(\bar x)\big)\,4c_5\big((d_z+2)\log T+d_z\log(6\tau\Lambda)\big)}. \quad (EC.41)$$

Using the Taylor expansion with Lagrange remainder and a similar argument as in Lemma 2, there exists $\bar v=\alpha v^*+(1-\alpha)\hat v_{T_0}$ for some $\alpha\in(0,1)$ such that:

$$G^{MNL}_{T_0}(\hat v_{T_0})=-\frac12\big(\hat v_{T_0}-v^*\big)^\top I^{MNL}_{T_0}(\bar v)\big(\hat v_{T_0}-v^*\big).$$
Note that

$$\frac{M^{MNL}_{T_0}(\hat v_{T_0})}{\Lambda\lambda(S_{T_0},p_{T_0};\theta^*)}=\sum_{i\in S_{T_0}}q(i;\hat v_{T_0})z_iz_i^\top-\sum_{i\in S_{T_0}}\sum_{j\in S_{T_0}}q(i;\hat v_{T_0})q(j;\hat v_{T_0})z_iz_j^\top=\sum_{i\in S_{T_0}}q(i;\hat v_{T_0})z_iz_i^\top-\frac12\sum_{i\in S_{T_0}}\sum_{j\in S_{T_0}}q(i;\hat v_{T_0})q(j;\hat v_{T_0})\big(z_iz_j^\top+z_jz_i^\top\big)$$
$$\succeq\sum_{i\in S_{T_0}}q(i;\hat v_{T_0})z_iz_i^\top-\frac12\sum_{i\in S_{T_0}}\sum_{j\in S_{T_0}}q(i;\hat v_{T_0})q(j;\hat v_{T_0})\big(z_iz_i^\top+z_jz_j^\top\big)=\sum_{i\in S_{T_0}}q(i;\hat v_{T_0})z_iz_i^\top-\sum_{i\in S_{T_0}}\sum_{j\in S_{T_0}}q(i;\hat v_{T_0})q(j;\hat v_{T_0})z_iz_i^\top$$
$$=\sum_{i\in S_{T_0}}q(i;\hat v_{T_0})\Big(1-\sum_{j\in S_{T_0}}q(j;\hat v_{T_0})\Big)z_iz_i^\top=\sum_{i\in S_{T_0}}q(i;\hat v_{T_0})\,q(0;\hat v_{T_0})\,z_iz_i^\top.$$

By Assumption 6, we have

$$\sum_{i\in S_{T_0}}q(i;\hat v_{T_0})\,q(0;\hat v_{T_0})\,z_iz_i^\top\succeq\underbrace{\frac{\exp(-\bar v-p_h)}{(K\exp(\bar v-p_l)+1)^2}}_{\kappa}\,T_0\sigma_0,$$

which further gives $\sigma_{\min}\big(I^{MNL}_{T_0}(\hat v_{T_0})\big)\ge\Lambda\exp(-\bar x)\kappa T_0\sigma_0$ with probability $1-T^{-1}$. Therefore,

$$-G^{MNL}_{T_0}(\hat v_{T_0})\ge\frac12\Lambda\exp(-\bar x)\kappa T_0\sigma_0\,\|\hat v_{T_0}-v^*\|^2. \quad (EC.42)$$

Combining Equation (EC.42) and Equation (EC.40) of Lemma EC.16 with $\tau=1$ (by $\|\hat v_{T_0}-v^*\|\le2$), together with the straightforward fact $G^{MNL}_{T_0}(\hat v_{T_0})\le0\le\bar G^{MNL}_{T_0}(\hat v_{T_0})$, we have with probability $1-3T^{-1}$ that the following holds:

$$\frac12\Lambda\exp(-\bar x)\kappa\,\|\hat v_{T_0}-v^*\|_2^2\le\frac{\exp(\bar x)}{T\sigma_0}\left(2+\frac{4\log T}{\Lambda\exp(\bar x)T_0}+\sqrt{\frac{4\log T}{\Lambda\exp(\bar x)T_0}}\right)+\frac{4}{T_0\sigma_0}\big((d_z+1)\log T+d_z\log(6\Lambda)\big)+\frac{4}{T_0\sigma_0}\sqrt{\sum_{s=1}^{T_0}\Lambda\lambda(S_s,p_s;\theta^*)\big((d_z+1)\log T+d_z\log(6\Lambda)\big)}$$
$$\le\frac{\exp(\bar x)}{T\sigma_0}\left(2+\frac{4\log T}{\Lambda\exp(\bar x)T_0}+\sqrt{\frac{4\log T}{\Lambda\exp(\bar x)T_0}}\right)+\frac{4}{T_0\sigma_0}\big((d_z+1)\log T+d_z\log(6\Lambda)\big)+\frac{4}{T_0\sigma_0}\sqrt{T_0\Lambda\exp(\bar x)\big((d_z+1)\log T+d_z\log(6\Lambda)\big)}.$$

Q.E.D.

EC.9.1 Proof of Lemma EC.16

This subsection details the proof of Lemma EC.16. Before diving into the details, we first present four lemmas: Lemmas EC.17, EC.18, EC.19, and EC.20.
We first present the proof of Lemma EC.16, assuming Lemmas EC.17 to EC.20 hold; we leave the proofs of Lemmas EC.17 to EC.20 to the end of this subsection. For a given $v$, we define the following variance terms:

$$V^{MNL}_s:=\mathrm{Var}\left[\sum_{i=1}^{n_s}\sum_{j\in S_s}\mathbb 1\{C^{(i)}_s=j\}\log\frac{q(j;v)}{q(j;v^*)}-\Lambda\lambda(S_s,p_s;\theta^*)\sum_{j\in S_s}q(j;v^*)\log\frac{q(j;v)}{q(j;v^*)}\ \middle|\ \mathcal H_s\right]=\mathrm{Var}\big[\bar g_s-g_s\big],$$

and $SV^{MNL}_t:=\sum_{s=1}^tV^{MNL}_s$, where $q(j;v)$ is shorthand for $\mathbb P\big(C^{(i)}_s=j\,\big|\,S_s,p_s,v,z_s\big)$. We need the following lemmas:

Lemma EC.17. Suppose $\|v^{(1)}-v^{(2)}\|_2\le2\tau$. Then for any $j\in S_t$, the following inequality holds:

$$\exp\{-4\tau\}\le\frac{\mathbb P\big(C^{(i)}_s=j\,\big|\,S_s,p_s,v^{(1)},z_s\big)}{\mathbb P\big(C^{(i)}_s=j\,\big|\,S_s,p_s,v^{(2)},z_s\big)}\le\exp\{4\tau\}.$$

Lemma EC.18. Suppose that for any $j\in S_t$,

$$\exp\{-4\tau\}\le\frac{\mathbb P\big(C^{(i)}_s=j\,\big|\,S_s,p_s,v,z_s\big)}{\mathbb P\big(C^{(i)}_s=j\,\big|\,S_s,p_s,v^*,z_s\big)}\le\exp\{4\tau\}.$$

Then we have the following tail bound for $\bar G^{MNL}_{T_0}(v)-G^{MNL}_{T_0}(v)$:

$$\mathbb P\big(\bar G^{MNL}_{T_0}(v)-G^{MNL}_{T_0}(v)\ge y\big)\le\exp\left(-\frac{y^2}{2\tau\big[y+4\tau\sum_{s=1}^{T_0}\Lambda\lambda(S_s,p_s,\theta^*)\big]}\right). \quad (EC.43)$$

Lemma EC.19. For all $k>2$, we have the following inequality:

$$\mathbb E\big|g^{MNL}_s(v)-\bar g^{MNL}_s(v)\big|^k\le\frac{k!}{2}V^{MNL}_s\,(2\tau c_4)^{k-2}. \quad (EC.44)$$

Here, $v\in\mathbb R^d$ satisfies $\|v-v^*\|_2\le2\tau$, and $c_4$ is defined in Equation (29).

Lemma EC.20. If $\|v-v^*\|\le2\tau_v$, then $SV^{MNL}_t\le c_5\big|G^{MNL}_t(v)\big|$, where $c_5$ is defined in Equation (36) as $c_5=\frac{16\tau_v^2}{4\tau_v+\exp(-4\tau_v)-1}$.

Proof of the first part of Lemma EC.16. By Lemma EC.17, the condition of Lemma EC.18 holds, so inequality (EC.43) applies.
Solving for $y$ so that the right-hand side of inequality (EC.43) is upper bounded by $\delta$, we have the following for each $v\in\{v:\|v-v^*\|_2\le2\tau\}$:

$$\mathbb P\left(\bar G^{MNL}_{T_0}(v)-G^{MNL}_{T_0}(v)\ge4\tau\sqrt{\sum_{s=1}^{T_0}\Lambda\lambda(S_s,p_s;\theta^*)\log\frac1\delta}+4\tau\log\frac1\delta\right)\le\delta. \quad (EC.45)$$

Similar to the proof of Lemma EC.12, define a finite covering $\mathcal H_2(\epsilon)$ of $\{v\in\mathbb R^{d_z}:\|v-v^*\|_2\le2\tau\}$; that is, for every $v$ with $\|v-v^*\|_2\le2\tau$, there exists $v'\in\mathcal H_2(\epsilon)$ such that $\|v-v'\|_2\le\epsilon$. By standard covering-number arguments in van de Geer (2000), such a finite covering set $\mathcal H_2(\epsilon)$ exists with size upper bounded by $|\mathcal H_2(\epsilon)|\le(6\tau/\epsilon)^{d_z}$. Setting $\delta=\frac1T\big(\frac{\epsilon}{6\tau}\big)^{d_z}$, we have with probability $1-T^{-1}$ that:

$$\bar G^{MNL}_{T_0}(v)-G^{MNL}_{T_0}(v)\le4\tau\sqrt{\sum_{s=1}^{T_0}\Lambda\lambda(S_s,p_s;\theta^*)\big(\log T+d_z\log(6\tau/\epsilon)\big)}+4\tau\big(\log T+d_z\log(6\tau/\epsilon)\big),\qquad\forall v\in\mathcal H_2(\epsilon). \quad (EC.46)$$

Given that the inequality holds over the finite set $\mathcal H_2(\epsilon)$, we now extend it to a general $v'$, showing that $\bar G^{MNL}_{T_0}(v')-G^{MNL}_{T_0}(v')$ is well bounded from above for all $v'$ with probability $1-T^{-1}$ (see the final inequality (EC.40) at the end of the proof). For any such $v'$, we can find $v\in\mathcal H_2(\epsilon)$ with $\|v'-v\|\le\epsilon$, and we have the following inequality:

$$\bar G^{MNL}_{T_0}(v')-G^{MNL}_{T_0}(v')\le\bar G^{MNL}_{T_0}(v)-G^{MNL}_{T_0}(v)+\big|\bar G^{MNL}_{T_0}(v)-\bar G^{MNL}_{T_0}(v')\big|+\big|G^{MNL}_{T_0}(v)-G^{MNL}_{T_0}(v')\big|. \quad (EC.47)$$

To bound $\bar G^{MNL}_{T_0}(v')-G^{MNL}_{T_0}(v')$ for all $v'$, it therefore suffices to bound each term on the right-hand side of inequality (EC.47).

• For the first term of inequality (EC.47), we directly apply inequality (EC.46).
• For the second term of inequality (EC.47), note that for arbitrary $\|v'-v\|_2\le\epsilon$:

$$\big|\bar G^{MNL}_{T_0}(v)-\bar G^{MNL}_{T_0}(v')\big|\le\sum_{s=1}^{T_0}\sum_{i=1}^{n_s}\sum_{j\in S_s}\mathbb 1\{C^{(i)}_s=j\}\left|\log\frac{q(j,S_s,p_s,v)}{q(j,S_s,p_s,v')}\right|\le\sum_{s=1}^{T_0}\sum_{i=1}^{n_s}\sum_{j\in S_s}\mathbb 1\{C^{(i)}_s=j\}\,2\|v'-v\|_2\le2\epsilon\sum_{s=1}^{T_0}n_s.$$

We use the same concentration bound as in Equation (EC.16):

$$\mathbb P\left(\sum_{s=1}^{T_0}n_s\ge2\Lambda\exp(\bar x)T_0\left(1+\frac{4\log T}{\Lambda\exp(\bar x)T_0}+\sqrt{\frac{4\log T}{\Lambda\exp(\bar x)T_0}}\right)\right)\le\frac1T.$$

Therefore, we have with probability $1-T^{-1}$ that

$$\big|\bar G^{MNL}_{T_0}(v)-\bar G^{MNL}_{T_0}(v')\big|\le2\epsilon\Lambda\exp(\bar x)T_0\left(1+\frac{4\log T}{\Lambda\exp(\bar x)T_0}+\sqrt{\frac{4\log T}{\Lambda\exp(\bar x)T_0}}\right). \quad (EC.48)$$

• For the third term of inequality (EC.47), it is straightforward to compute that

$$\big|G^{MNL}_{T_0}(v)-G^{MNL}_{T_0}(v')\big|\le2\epsilon\Lambda\exp(\bar x)T_0. \quad (EC.49)$$

Since the upper bounds of the first, second, and third terms (see (EC.46), (EC.48), and (EC.49)) all involve $\epsilon$, we carefully set $\epsilon=1/(T\Lambda)$. Combining inequalities (EC.46), (EC.48), and (EC.49), we have with probability $1-2T^{-1}$ that

$$\bar G^{MNL}_{T_0}(v')-G^{MNL}_{T_0}(v')\le\big|\bar G^{MNL}_{T_0}(v)-\bar G^{MNL}_{T_0}(v')\big|+\big(\bar G^{MNL}_{T_0}(v)-G^{MNL}_{T_0}(v)\big)+\big|G^{MNL}_{T_0}(v)-G^{MNL}_{T_0}(v')\big|$$
$$\le\frac{\exp(\bar x)T_0}{T}\left(4+\frac{8\log T}{\Lambda\exp(\bar x)T_0}+\sqrt{\frac{16\log T}{\Lambda\exp(\bar x)T_0}}\right)+4\tau\sqrt{\sum_{s=1}^{T_0}\Lambda\lambda(S_s,p_s;\theta^*)\big((d_z+1)\log T+d_z\log(6\tau\Lambda)\big)}+8\tau\big((d_z+1)\log T+d_z\log(6\Lambda\tau)\big).$$

Q.E.D.

Proof of the second part of Lemma EC.16. Using Lemma EC.19 and noticing that the condition of Theorem 1.2A in de la Peña (1999) holds, we know that, for all $x,y>0$,

$$\mathbb P\big(\bar G^{MNL}_t(v)-G^{MNL}_t(v)\ge x,\ SV^{MNL}_t\le y\big)\le\exp\left(-\frac{x^2}{2(y+2\tau c_4x)}\right).$$
(EC.50)

Using Lemma EC.20, set $y=c_5\big|G^{MNL}_t(v)\big|$, and Equation (EC.50) becomes:

$$\mathbb P\big(\bar G^{MNL}_t(v)-G^{MNL}_t(v)\ge x\big)\le\exp\left(-\frac{x^2}{2\big(c_5|G^{MNL}_t(v)|+2\tau c_4x\big)}\right).$$

Setting the right-hand side equal to $\delta$, we have the following inequality for each $v$:

$$\mathbb P\left(\big|\bar G^{MNL}_t(v)-G^{MNL}_t(v)\big|\ge\sqrt{4c_5\big|G^{MNL}_t(v)\big|\log\frac1\delta}+8\tau c_4\log\frac1\delta\right)\le\delta.$$

Similar to the proof of Lemma EC.12, define a finite covering $\mathcal H_2(\epsilon)$ of $\{v\in\mathbb R^N:\|v-v^*\|_2\le2\tau\}$; that is, for every $v$ with $\|v-v^*\|_2\le2\tau$, there exists $v'\in\mathcal H_2(\epsilon)$ such that $\|v-v'\|_2\le\epsilon$. By standard covering-number arguments in van de Geer (2000), such a finite covering set exists with size upper bounded by $|\mathcal H_2(\epsilon)|\le(6\tau/\epsilon)^{d_z}$. For each $v\in\mathcal H_2(\epsilon)$, the event in the inequality above has probability at most $\delta$. Thus, we have, for all $v\in\mathcal H_2(\epsilon)$,

$$\mathbb P\left(\bar G^{MNL}_t(v)-G^{MNL}_t(v)\ge\sqrt{4c_5\big|G^{MNL}_t(v)\big|\log\frac1\delta}+8\tau c_4\log\frac1\delta\right)\le\delta\,|\mathcal H_2(\epsilon)|. \quad (EC.51)$$

Setting $\delta=\frac1{T^2}\big(\frac{\epsilon}{6\tau}\big)^{d_z}$ in Equation (EC.51), with probability $1-T^{-1}$, for all $T_0<t\le T$ and all $v\in\mathcal H_2(\epsilon)$, we have

$$\bar G^{MNL}_t(v)-G^{MNL}_t(v)\le8\tau c_4\big(2\log T+d_z\log(6\tau/\epsilon)\big)+\sqrt{\big|G^{MNL}_t(v)\big|\,4c_5\big(2\log T+d_z\log(6\tau/\epsilon)\big)}. \quad (EC.52)$$

Given that the inequality holds over the finite set $\mathcal H_2(\epsilon)$, we now extend it to a general $v'$, showing that $\bar G^{MNL}_t(v')-G^{MNL}_t(v')$ is well bounded from above for all $v'$ with probability $1-2T^{-1}$ (see the final inequality (EC.41) at the end of the proof).
For any such $v'$, we can find $v\in\mathcal H_2(\epsilon)$ with $\|v'-v\|\le\epsilon$, and we have the following inequality:

$$\bar G^{MNL}_t(v')-G^{MNL}_t(v')\le\bar G^{MNL}_t(v)-G^{MNL}_t(v)+\big|\bar G^{MNL}_t(v)-\bar G^{MNL}_t(v')\big|+\big|G^{MNL}_t(v)-G^{MNL}_t(v')\big|. \quad (EC.53)$$

To bound $\bar G^{MNL}_t(v')-G^{MNL}_t(v')$ for all $v'$, it suffices to bound each term on the right-hand side of inequality (EC.53).

• For the first term of inequality (EC.53), we directly apply inequality (EC.52).

• For the second term of inequality (EC.53), we have:

$$\big|\bar G^{MNL}_t(v)-\bar G^{MNL}_t(v')\big|\le\sum_{s=1}^t2\epsilon n_s. \quad (EC.54)$$

Combining with inequality (EC.24), we have that

$$\big|\bar G^{MNL}_t(v)-\bar G^{MNL}_t(v')\big|\le2\epsilon\Lambda\exp(\bar x)t\left(1+\frac{8\log T}{\Lambda\exp(\bar x)t}+\sqrt{\frac{8\log T}{\Lambda\exp(\bar x)t}}\right),\qquad\forall\,T_0<t\le T, \quad (EC.55)$$

holds with probability $1-T^{-1}$.

• For the third term of inequality (EC.53), we have:

$$\big|G^{MNL}_t(v)-G^{MNL}_t(v')\big|\le2\epsilon\Lambda\exp(\bar x)t. \quad (EC.56)$$

As before, since the upper bounds of the first, second, and third terms (see (EC.52), (EC.55), and (EC.56)) all involve $\epsilon$, we carefully set $\epsilon=\frac1{T\Lambda}$. Combining inequalities (EC.52), (EC.55), and (EC.56), with probability at least $1-2T^{-1}$, the following holds for all $t\in\{T_0+1,\dots,T\}$ and all $v$ satisfying $\|v-v^*\|_2\le2\tau$:

$$\bar G^{MNL}_t(v')-G^{MNL}_t(v')\le\frac{4\exp(\bar x)t}{T}+\frac1{T\Lambda}\sqrt{32t\Lambda\exp(\bar x)\log T}+\frac{16\log T}{T\Lambda}+8\tau c_4\big((d_z+2)\log T+d_z\log(6\tau\Lambda)\big)+\sqrt{\big(|G^{MNL}_t(v')|+2\exp(\bar x)\big)\,4c_5\big((d_z+2)\log T+d_z\log(6\tau\Lambda)\big)}.$$

Thus, we have:

$$\bar G^{MNL}_t(v')-G^{MNL}_t(v')\le4\exp(\bar x)+\sqrt{\frac{32\exp(\bar x)\log T}{T\Lambda}}+\frac{16\log T}{T\Lambda}+8\tau c_4\big((d_z+2)\log T+d_z\log(6\tau\Lambda)\big)+\sqrt{\big(|G^{MNL}_t(v')|+2\exp(\bar x)\big)\,4c_5\big((d_z+2)\log T+d_z\log(6\tau\Lambda)\big)}.$$

Q.E.D.
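Lemma EC.17's uniform ratio bound, which drives both parts of the argument above, can be sanity-checked numerically. The sketch below assumes the MNL form $q(j;v)\propto\exp(z_j^\top v-p_j)$ with normalizing constant $1+\sum_k\exp(z_k^\top v-p_k)$ and $\|z_j\|_2\le1$; the dimensions, prices, and sampling scheme are illustrative.

```python
import math, random

# Sanity check of Lemma EC.17: under the MNL form
#   q(j; v) = exp(z_j^T v - p_j) / (1 + sum_k exp(z_k^T v - p_k)),
# with ||z_j||_2 <= 1, any two parameters with ||v1 - v2||_2 <= 2*tau give
# choice probabilities whose log-ratio lies in [-4*tau, 4*tau].
random.seed(0)
d, K, tau = 4, 5, 0.3

def unit_ball(dim, radius):
    # Draw a point uniformly-ish inside a ball of the given radius.
    v = [random.gauss(0, 1) for _ in range(dim)]
    n = math.sqrt(sum(x * x for x in v))
    r = radius * random.random() ** (1 / dim)
    return [r * x / n for x in v]

def mnl(z, p, v):
    w = [math.exp(sum(zi * vi for zi, vi in zip(zj, v)) - pj)
         for zj, pj in zip(z, p)]
    denom = 1.0 + sum(w)
    return [wj / denom for wj in w]

z = [unit_ball(d, 1.0) for _ in range(K)]       # features with ||z_j|| <= 1
p = [1.0 + random.random() for _ in range(K)]    # illustrative prices
worst = 0.0
for _ in range(200):
    v1 = unit_ball(d, 1.0)
    delta = unit_ball(d, 2 * tau)                # ||v1 - v2||_2 <= 2 tau
    v2 = [a + b for a, b in zip(v1, delta)]
    q1, q2 = mnl(z, p, v1), mnl(z, p, v2)
    worst = max(worst, *(abs(math.log(a / b)) for a, b in zip(q1, q2)))

assert worst <= 4 * tau + 1e-12   # Lemma EC.17: |log ratio| <= 4 tau
```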
EC.9.2 Proof of Lemma EC.17

For simplicity, denote $\mathbb P\big(C^{(i)}_s=j\,\big|\,S_s,p_s,v^{(1)},z_s\big)$ as $q(j;v^{(1)})$. We begin by considering the derivative of $q\big(j;v^{(2)}+\alpha(v^{(1)}-v^{(2)})\big)$ with respect to $\alpha$. By applying the chain rule, we have:

$$\frac{d\,q\big(j;v^{(2)}+\alpha(v^{(1)}-v^{(2)})\big)}{d\alpha}=\Big\langle\nabla_{v}q(j;\tilde v_\alpha),\,v^{(1)}-v^{(2)}\Big\rangle=q(j;\tilde v_\alpha)\left\langle z_{jt}-\sum_{k\in S_t}q(k;\tilde v_\alpha)z_{kt},\ v^{(1)}-v^{(2)}\right\rangle,$$

where $\tilde v_\alpha=v^{(2)}+\alpha(v^{(1)}-v^{(2)})$. Since $\|z_{kt}\|_2\le1$ for all $k\in[N]$ and $t\in[T]$, we can bound the derivative as follows:

$$\left|\frac{d\,q\big(j;v^{(2)}+\alpha(v^{(1)}-v^{(2)})\big)}{d\alpha}\right|\le q(j;\tilde v_\alpha)\,2\|v^{(1)}-v^{(2)}\|_2\le q(j;\tilde v_\alpha)\,4\tau,$$

for all $v^{(1)}$ such that $\|v^{(1)}-v^{(2)}\|_2\le2\tau$. Now, by Grönwall's inequality, we obtain:

$$q\big(j;v^{(2)}+\alpha(v^{(1)}-v^{(2)})\big)\le\exp\{4\tau\alpha\}\,q(j;v^{(2)}).$$

Setting $\alpha=1$, we conclude

$$\exp\{-4\tau\}\le\frac{q(j;v^{(1)})}{q(j;v^{(2)})}\le\exp\{4\tau\}.$$

Thus, the desired inequality holds. Q.E.D.

EC.9.3 Proof of Lemma EC.18

In this subsection, we present the proof of Lemma EC.18. To streamline notation, let us define

$$\Lambda^*:=\sum_{s=1}^{T_0}\Lambda\lambda\big(S_s,p_s,\theta^*\big).$$

Lemma EC.21. For any $u>0$, the following equality holds:

$$\mathbb E\Big[\exp\Big(u\big(\bar G^{MNL}_{T_0}(v)-G^{MNL}_{T_0}(v)\big)\Big)\Big]=\exp\left(\Lambda^*\left(\sum_{j\in S_s\cup\{0\}}\frac{q^u(j;v)}{q^{u-1}(j;v^*)}-1-u\sum_{j\in S_s\cup\{0\}}q(j;v^*)\log\frac{q(j;v)}{q(j;v^*)}\right)\right). \quad (EC.57)$$

Using Lemma EC.21, we can bound the probability as follows:

$$\mathbb P\big(\bar G^{MNL}_{T_0}(v)-G^{MNL}_{T_0}(v)\ge y\big)\le\mathbb E\Big[\exp\Big(u\big(\bar G^{MNL}_{T_0}(v)-G^{MNL}_{T_0}(v)-y\big)\Big)\Big]\le\exp\left(\Lambda^*\left(\sum_{j\in S_s\cup\{0\}}\frac{q^u(j;v)}{q^{u-1}(j;v^*)}-1\right)-u\Lambda^*\sum_{j\in S_s\cup\{0\}}q(j;v^*)\log\frac{q(j;v)}{q(j;v^*)}-uy\right)$$
$$=\exp\left(\Lambda^*\sum_{j\in S_s\cup\{0\}}q(j;v^*)\underbrace{\left[\frac{q^u(j;v)}{q^u(j;v^*)}-u\log\frac{q(j;v)}{q(j;v^*)}\right]}_{h\left(\frac{q(j;v)}{q(j;v^*)}\right)}-uy-\Lambda^*\right).$$
(EC.58)

We now analyze the function $h(\xi)=\xi^u-u\log\xi$. The function $h(\xi)$ decreases when $\xi\le1$ and increases when $\xi\ge1$. From Lemma EC.17, we know that:

$$\exp\{-4\tau\}\le\frac{q(j;v)}{q(j;v^*)}\le\exp\{4\tau\}.$$

Using this bound and the shape of $h(\xi)$, we deduce:

$$\frac{q^u(j;v)}{q^u(j;v^*)}-u\log\frac{q(j;v)}{q(j;v^*)}\le\max\big\{\exp(4\tau u)-4\tau u,\ \exp(-4\tau u)+4\tau u\big\}\le\exp(4\tau u)-4\tau u.$$

This allows us to bound the summation term:

$$\sum_{j\in S_s\cup\{0\}}\frac{q^u(j;v)}{q^{u-1}(j;v^*)}-u\sum_{j\in S_s\cup\{0\}}q(j;v^*)\log\frac{q(j;v)}{q(j;v^*)}\le\sum_{j\in S_s\cup\{0\}}q(j;v^*)\big[\exp(4\tau u)-4\tau u\big].$$

Substituting this bound into inequality (EC.58), we obtain:

$$\mathbb P\big(\bar G^{MNL}_{T_0}(v)-G^{MNL}_{T_0}(v)\ge y\big)\le\exp\big(\Lambda^*\big(\exp(4\tau u)-1-4\tau u\big)-uy\big).$$

Following the derivation of the tail bound for a Poisson random variable, set

$$u^*=\frac1{4\tau}\log\left(1+\frac{y}{4\tau\Lambda^*}\right).$$

Substituting $u^*$, we have:

$$\mathbb P\big(\bar G^{MNL}_{T_0}(v)-G^{MNL}_{T_0}(v)\ge y\big)\le\exp\left(\Lambda^*\left(\frac{y}{4\tau\Lambda^*}-\log\Big(1+\frac{y}{4\tau\Lambda^*}\Big)\right)-\frac{y}{4\tau}\log\Big(1+\frac{y}{4\tau\Lambda^*}\Big)\right)\le\exp\left(-\frac{y^2}{2\tau\big[y+4\tau\Lambda^*\big]}\right).$$

This completes the proof. Q.E.D.
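The substitution of $u^*$ above admits a quick numerical verification: the exponent collapses to $\Lambda^*\big(w-(1+w)\log(1+w)\big)$ with $w=y/(4\tau\Lambda^*)$, and the classical Poisson-Bennett scalar bound $(1+w)\log(1+w)-w\ge w^2/(2(1+w))$ (the standard inequality behind tails of this type) can be checked on a grid. The parameter values are illustrative.

```python
import math

# Check of the u* substitution in the proof of Lemma EC.18. With
# w = y / (4 tau Lambda*), plugging u* = log(1 + w) / (4 tau) into the
# Chernoff exponent Lambda*(exp(4 tau u) - 1 - 4 tau u) - u y gives exactly
# Lambda* (w - (1 + w) log(1 + w)). We also check the classical
# Poisson-Bennett bound (1 + w) log(1 + w) - w >= w^2 / (2 (1 + w)).
tau, lam_star = 0.5, 3.0   # illustrative values
for k in range(1, 200):
    y = 0.1 * k
    w = y / (4 * tau * lam_star)
    u_star = math.log(1 + w) / (4 * tau)
    exponent = (lam_star * (math.exp(4 * tau * u_star) - 1 - 4 * tau * u_star)
                - u_star * y)
    closed_form = lam_star * (w - (1 + w) * math.log(1 + w))
    assert abs(exponent - closed_form) < 1e-9
    assert (1 + w) * math.log(1 + w) - w >= w * w / (2 * (1 + w)) - 1e-12
```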
EC.9.3.1 Proof of Lemma EC.21

$$\mathbb E\Big[\exp\big(y\,\bar G^{MNL}_{T_0}(v)\big)\Big]=\mathbb E\left[\exp\left(y\sum_{s=1}^{T_0}\sum_{i=1}^{n_s}\sum_{j\in S_s\cup\{0\}}\mathbb 1\big\{C^{(i)}_s=j\big\}\log\frac{q(j;v)}{q(j;v^*)}\right)\right]$$
$$\stackrel{(a)}{=}\prod_{s=1}^{T_0}\mathbb E\left[\exp\left(y\sum_{i=1}^{n_s}\sum_{j\in S_s\cup\{0\}}\mathbb 1\big\{C^{(i)}_s=j\big\}\log\frac{q(j;v)}{q(j;v^*)}\right)\right]\stackrel{(b)}{=}\prod_{s=1}^{T_0}\sum_{n=1}^{\infty}\mathbb P(n_s=n)\,\mathbb E\left[\exp\left(y\sum_{i=1}^{n_s}\sum_{j\in S_s\cup\{0\}}\mathbb 1\big\{C^{(i)}_s=j\big\}\log\frac{q(j;v)}{q(j;v^*)}\right)\middle|\ n_s=n\right]$$
$$\stackrel{(c)}{=}\prod_{s=1}^{T_0}\sum_{n=1}^{\infty}\mathbb P(n_s=n)\prod_{i=1}^{n}\mathbb E\left[\exp\left(y\sum_{j\in S_s\cup\{0\}}\mathbb 1\big\{C^{(i)}_s=j\big\}\log\frac{q(j;v)}{q(j;v^*)}\right)\right]\stackrel{(d)}{=}\prod_{s=1}^{T_0}\sum_{n=1}^{\infty}\mathbb P(n_s=n)\prod_{i=1}^{n}\sum_{j\in S_s\cup\{0\}}q(j;v^*)\left(\frac{q(j;v)}{q(j;v^*)}\right)^y$$
$$\stackrel{(e)}{=}\prod_{s=1}^{T_0}\sum_{n=1}^{\infty}\mathbb P(n_s=n)\left(\sum_{j\in S_s\cup\{0\}}q(j;v^*)\left(\frac{q(j;v)}{q(j;v^*)}\right)^y\right)^n\stackrel{(f)}{=}\prod_{s=1}^{T_0}\exp\left(\Lambda\lambda(S_s,p_s,\theta^*)\left(-1+\sum_{j\in S_s\cup\{0\}}q(j;v^*)\left(\frac{q(j;v)}{q(j;v^*)}\right)^y\right)\right)$$
$$=\exp\left(\sum_{s=1}^{T_0}\Lambda\lambda(S_s,p_s,\theta^*)\left(\sum_{j\in S_s\cup\{0\}}\frac{q^y(j;v)}{q^{y-1}(j;v^*)}-1\right)\right), \quad (EC.59)$$

where equality $(a)$ follows from the independence of the periods; equality $(b)$ partitions the probability space into the events on which $n_s$ takes different values; equalities $(c)$, $(d)$, and $(e)$ simplify the conditional expectations; and equality $(f)$ follows from substituting the Poisson distribution of $n_s$ and simplifying. Multiplying both sides by $\exp\big[-y\,G^{MNL}_{T_0}(v)\big]$, we obtain Equation (EC.57). Q.E.D.

EC.9.4 Proof of Lemma EC.19

We first simplify $V^{MNL}_s$. The variance term can be written as:

$$V^{MNL}_s=\Lambda\lambda\big(S_s,p_s;\theta^*\big)\sum_{j\in S_s\cup\{0\}}q(j;v^*)\log^2\frac{q(j;v)}{q(j;v^*)}.$$

This reduces the variance analysis to a sum of squared logarithmic terms. Next, we bound the moments of the error term $g^{MNL}_s(v)-\bar g^{MNL}_s(v)$. We have:

$$\mathbb E\Big[\big|g^{MNL}_s(v)-\bar g^{MNL}_s(v)\big|^k\Big]\le2^{k-1}\,\mathbb E\Big[\big|\bar g^{MNL}_s(v)\big|^k\Big].$$
Expanding $\bar g^{MNL}_s(v)$, we get:

$$2^{k-1}\,\mathbb E\Big[\big|\bar g^{MNL}_s(v)\big|^k\Big]=2^{k-1}\,\mathbb E\left[\bigg|\sum_{i=1}^{n_s}\sum_{j\in S_s\cup\{0\}}\mathbb 1\{C^{(i)}_s=j\}\log\frac{q(j;v)}{q(j;v^*)}\bigg|^k\right]=2^{k-1}\sum_{n=1}^{\infty}\mathbb P(n_s=n)\,\mathbb E\left[\bigg|\sum_{i=1}^{n_s}\sum_{j\in S_s\cup\{0\}}\mathbb 1\{C^{(i)}_s=j\}\log\frac{q(j;v)}{q(j;v^*)}\bigg|^k\ \middle|\ n_s=n\right]$$
$$\stackrel{(a)}{\le}2^{k-1}\sum_{n=1}^{\infty}\mathbb P(n_s=n)\,\mathbb E\left[n^{k-1}\sum_{i=1}^n\bigg|\sum_{j\in S_s\cup\{0\}}\mathbb 1\{C^{(i)}_s=j\}\log\frac{q(j;v)}{q(j;v^*)}\bigg|^k\right]=2^{k-1}\,\mathbb E\left[n_s^k\,\bigg|\sum_{j\in S_s\cup\{0\}}\mathbb 1\{C^{(1)}_s=j\}\log\frac{q(j;v)}{q(j;v^*)}\bigg|^k\right]$$
$$=2^{k-1}\sum_{j\in S_s\cup\{0\}}q(j;v^*)\,\bigg|\log\frac{q(j;v)}{q(j;v^*)}\bigg|^k\,\mathbb E\big[n_s^k\big],$$

where inequality $(a)$ uses Hölder's inequality for $k>2$. Combining the last two displays, we obtain:

$$\mathbb E\Big[\big|g^{MNL}_s(v)-\bar g^{MNL}_s(v)\big|^k\Big]\le2^{k-1}\sum_{j\in S_s\cup\{0\}}q(j;v^*)\,\bigg|\log\frac{q(j;v)}{q(j;v^*)}\bigg|^k\,\mathbb E\big[n_s^k\big].$$

Finally, using Lemma EC.17, we obtain:

$$\mathbb E\Big[\big|g^{MNL}_s(v)-\bar g^{MNL}_s(v)\big|^k\Big]\le2^{k-1}4^{k-2}\tau^{k-2}\sum_{j\in S_s\cup\{0\}}q(j;v^*)\log^2\frac{q(j;v)}{q(j;v^*)}\,\mathbb E\big[n_s^k\big].$$

Using a similar argument as in Equation (EC.27), we have

$$\mathbb E\Big[\big|g^{MNL}_s(v)-\bar g^{MNL}_s(v)\big|^k\Big]\le\frac{k!}{2}V^{MNL}_s\big(4\tau c_4\big)^{k-2}.$$

Q.E.D.

EC.9.5 Proof of Lemma EC.20

We abbreviate the probability $\mathbb P\big(C^{(i)}_s=j\,\big|\,S_s,p_s,v,z_s\big)$ as $q(j;v)$. Thus, the expression for $V^{MNL}_s$ becomes:

$$V^{MNL}_s=\Lambda\lambda(S_s,p_s;\theta^*)\sum_{j\in S_s\cup\{0\}}q(j;v^*)\log^2\frac{q(j;v)}{q(j;v^*)}. \quad (EC.60)$$

Note that $\sum_{j\in S_s\cup\{0\}}q(j;v^*)\big(\frac{q(j;v)}{q(j;v^*)}-1\big)=0$. We have:

$$-g^{MNL}_s(v)=\Lambda\lambda(S_s,p_s;\theta^*)\sum_{j\in S_s\cup\{0\}}q(j;v^*)\left(-\log\frac{q(j;v)}{q(j;v^*)}+\frac{q(j;v)}{q(j;v^*)}-1\right).$$
(EC.61)

Thus, to prove this lemma, we only need to show that the inequality holds componentwise:

$$-\log\frac{q(j;v)}{q(j;v^*)}+\frac{q(j;v)}{q(j;v^*)}-1\ge\frac{4\tau_v+\exp(-4\tau_v)-1}{16\tau_v^2}\log^2\frac{q(j;v)}{q(j;v^*)}, \quad (EC.62)$$

which can be proved using Lemma EC.17:

$$\exp\{-4\tau_v\}\le\frac{q(j;v)}{q(j;v^*)}\le\exp\{4\tau_v\}.$$

Q.E.D.

Remark EC.1. Our proof simplifies the approach presented in Chen et al. (2020) and improves upon their result. Specifically, we refine the bound on $SV^{MNL}_t$ from the second-order bound

$$SV^{MNL}_t\le2\left(1+\max_j\left(\frac{|q(j;v)-q(j;v^*)|}{q(j;v^*)}\right)^2\right)\big|G^{MNL}_t(v)\big|$$

to the logarithmic-order bound

$$SV^{MNL}_t\le\frac{\log^2\min_j\big(\frac{q(j;v)}{q(j;v^*)}\big)}{-\log\min_j\big(\frac{q(j;v)}{q(j;v^*)}\big)+\min_j\big(\frac{q(j;v)}{q(j;v^*)}\big)-1}\,\big|G^{MNL}_t(v)\big|.$$

This refinement yields a significantly sharper asymptotic characterization of the sensitivity of the value function.

EC.10 Proof of Lemma 5

This section provides a proof of Lemma 5. Using a similar argument as in the proof of Lemma 2, we see that there exists $\bar v_t=\xi v^*+(1-\xi)\hat v_t$ for some $\xi\in(0,1)$ such that

$$G^{MNL}_t(\hat v_t)=-\frac12\big(\hat v_t-v^*\big)^\top I^{MNL}_t(\bar v_t)\big(\hat v_t-v^*\big). \quad (EC.63)$$

If $\bar v_t$ is close to $v^*$, then $I^{MNL}_t(\bar v_t)$ is also close to $I^{MNL}_t(v^*)$ by continuity. Before diving into the details of the proof, we first present a lemma, to be proved in the next subsection, that facilitates the argument.

Lemma EC.22. For all $v^{(1)},v^{(2)}$ such that $\|v^{(2)}-v^{(1)}\|\le2\tau_v$,

$$c_8\,I^{MNL}_t\big(v^{(2)}\big)\succeq I^{MNL}_t\big(v^{(1)}\big), \quad (EC.64)$$

where $c_8$ is defined in Equation (44). Additionally, the same result holds when replacing $I^{MNL}_t$ with $M^{MNL}_t$:

$$c_8\,M^{MNL}_t\big(v^{(2)}\big)\succeq M^{MNL}_t\big(v^{(1)}\big). \quad (EC.65)$$

Using Lemma EC.22, we can see that

$$-G^{MNL}_t(v)\ge\frac1{2c_8}\big(\hat v_t-v^*\big)^\top I^{MNL}_t(v^*)\big(\hat v_t-v^*\big).$$
(EC.66)

Using Equation (EC.41) of Lemma EC.16 and the fact that $G^{MNL}_t(\hat v_t)\le0\le\bar G^{MNL}_t(\hat v_t)$, we have:

$$\big|G^{MNL}_t(\hat v_t)\big|\le\bar G^{MNL}_t(\hat v_t)-G^{MNL}_t(\hat v_t)\le\exp(\bar x)+\sqrt{\frac{8\exp(\bar x)\log T}{T\Lambda}}+\frac{8\log T}{T\Lambda}+8\tau_vc_4\big((d_z+2)\log T+d_z\log(6\tau_v\Lambda)\big)+\sqrt{\big(|G^{MNL}_t(\hat v_t)|+2\exp(\bar x)\big)\,4c_5\big((d_z+2)\log T+d_z\log(6\tau_v\Lambda)\big)}$$

for all $T_0\le t\le T$ with probability $1-2T^{-1}$. Using Lemma EC.11, we have

$$\big|G^{MNL}_t(\hat v_t)\big|\le4\exp(\bar x)+2\sqrt{\frac{8\exp(\bar x)\log T}{T\Lambda}}+\frac{16\log T}{T\Lambda}+(16\tau_vc_4+4c_5)\big((d_z+2)\log T+d_z\log(6\tau_v\Lambda)\big).$$

Notice that $G^{MNL}_t(\hat v_t)\le0$. Combining Equation (EC.66) and the last inequality, we have with probability $1-2T^{-1}$,

$$\big(\hat v_t-v^*\big)^\top I^{MNL}_t(v^*)\big(\hat v_t-v^*\big)\le2c_8\big|G^{MNL}_t(\hat v_t)\big|\le8c_8\exp(\bar x)+4c_8\sqrt{\frac{8\exp(\bar x)\log T}{T\Lambda}}+\frac{32c_8\log T}{T\Lambda}+8(4\tau_vc_4+c_5)c_8\big((d_z+2)\log T+d_z\log(6\tau_v\Lambda)\big). \quad (EC.67)$$

Similarly, we have

$$\big(\hat v_t-v^*\big)^\top I^{MNL}_t(\hat v_t)\big(\hat v_t-v^*\big)\le2c_8\big|G^{MNL}_t(\hat v_t)\big|\le8c_8\exp(\bar x)+4c_8\sqrt{\frac{8\exp(\bar x)\log T}{T\Lambda}}+\frac{32c_8\log T}{T\Lambda}+8(4\tau_vc_4+c_5)c_8\big((d_z+2)\log T+d_z\log(6\tau_v\Lambda)\big). \quad (EC.68)$$

Q.E.D.

EC.10.1 Proof of Lemma EC.22

To prove this lemma, it suffices to show that, for all $s$,

$$\big(3(\exp(4\tau_v)-1)(K\exp(\bar v-p_l)+1)+1\big)\,M^{MNL}_s\big(v^{(2)}\big)\succeq M^{MNL}_s\big(v^{(1)}\big),$$

which is equivalent to

$$M^{MNL}_s\big(v^{(1)}\big)-M^{MNL}_s\big(v^{(2)}\big)\preceq3(\exp(4\tau_v)-1)(K\exp(\bar v-p_l)+1)\,M^{MNL}_s\big(v^{(2)}\big).$$

Here, for simplicity, we write $q\big(C^{(i)}_s=j\,\big|\,S_s,p_s,v,z_s\big)$ as $q(j;v)$, $z_{is}$ as $z_i$, and $q(i;v^{(2)})-q(i;v^{(1)})$ as $\delta q(i;v)$. Note that

$$\big(z_i-z_j\big)\big(z_i-z_j\big)^\top=z_iz_i^\top+z_jz_j^\top-z_iz_j^\top-z_jz_i^\top\succeq0,$$

which implies $z_iz_i^\top+z_jz_j^\top\succeq z_iz_j^\top+z_jz_i^\top$.
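This rank-one domination can be sanity-checked via random quadratic forms, since $u^\top\big(z_iz_i^\top+z_jz_j^\top-z_iz_j^\top-z_jz_i^\top\big)u=\big(u^\top z_i-u^\top z_j\big)^2\ge0$; a minimal numeric sketch with illustrative dimensions:

```python
import random

# Sanity check of z_i z_i^T + z_j z_j^T - z_i z_j^T - z_j z_i^T >= 0 (PSD):
# for any u, the quadratic form equals (u^T z_i - u^T z_j)^2 >= 0.
random.seed(1)
d = 6

def rand_vec(dim):
    return [random.gauss(0, 1) for _ in range(dim)]

for _ in range(100):
    zi, zj, u = rand_vec(d), rand_vec(d), rand_vec(d)
    ui = sum(a * b for a, b in zip(u, zi))   # u^T z_i
    uj = sum(a * b for a, b in zip(u, zj))   # u^T z_j
    # u^T (z_i z_i^T + z_j z_j^T - z_i z_j^T - z_j z_i^T) u = (ui - uj)^2
    quad = ui * ui + uj * uj - 2 * ui * uj
    assert quad >= -1e-12
```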
Therefore, we express the matrix $M^{MNL}_s(v^{(2)})$ in terms of the arrival rate and the choice probabilities:

$$\frac{M^{MNL}_s\big(v^{(2)}\big)}{\Lambda\lambda(S_s,p_s;\theta^*)}=\sum_{i\in S_t}q(i;v^{(2)})z_iz_i^\top-\sum_{i\in S_t}\sum_{j\in S_t}q(i;v^{(2)})q(j;v^{(2)})z_iz_j^\top=\sum_{i\in S_t}q(i;v^{(2)})z_iz_i^\top-\frac12\sum_{i\in S_t}\sum_{j\in S_t}q(i;v^{(2)})q(j;v^{(2)})\big(z_iz_j^\top+z_jz_i^\top\big)$$
$$\succeq\sum_{i\in S_t}q(i;v^{(2)})z_iz_i^\top-\frac12\sum_{i\in S_t}\sum_{j\in S_t}q(i;v^{(2)})q(j;v^{(2)})\big(z_iz_i^\top+z_jz_j^\top\big)=\sum_{i\in S_t}q(i;v^{(2)})z_iz_i^\top-\sum_{i\in S_t}\sum_{j\in S_t}q(i;v^{(2)})q(j;v^{(2)})z_iz_i^\top$$
$$=\sum_{i\in S_t}q(i;v^{(2)})\Big(1-\sum_{j\in S_t}q(j;v^{(2)})\Big)z_iz_i^\top=\sum_{i\in S_t}q(i;v^{(2)})\,q(0;v^{(2)})\,z_iz_i^\top.$$

Next, we consider the difference between $M^{MNL}_s(v^{(1)})$ and $M^{MNL}_s(v^{(2)})$:

$$\frac{M^{MNL}_s\big(v^{(1)}\big)-M^{MNL}_s\big(v^{(2)}\big)}{\Lambda\lambda(S_s,p_s;\theta^*)}=-\sum_{i\in S_t}\delta q(i;v)z_iz_i^\top+\sum_{i\in S_t}\sum_{j\in S_t}\big(q(i;v^{(1)})q(j;v^{(1)})-q(i;v^{(2)})q(j;v^{(2)})\big)z_iz_j^\top$$
$$\preceq-\sum_{i\in S_t}\delta q(i;v)z_iz_i^\top+\sum_{i\in S_t}\sum_{j\in S_t}\big|q(i;v^{(1)})q(j;v^{(1)})-q(i;v^{(2)})q(j;v^{(2)})\big|\,z_iz_i^\top.$$
Finally, by comparing the coefficients of $z_iz_i^\top$, we only need to prove:

$$\sum_{j\in S_t}\big|q(i;v^{(1)})q(j;v^{(1)})-q(i;v^{(2)})q(j;v^{(2)})\big|-\delta q(i;v)\le3(\exp(4\tau_v)-1)(K\exp(\bar v-p_l)+1)\,q(i;v^{(2)})\,q(0;v^{(2)}),$$

which can be derived as follows:

$$\sum_{j\in S_t}\big|q(i;v^{(1)})q(j;v^{(1)})-q(i;v^{(2)})q(j;v^{(2)})\big|-\delta q(i;v)\le\sum_{j\in S_t}\Big(q(i;v^{(2)})\big|\delta q(j;v)\big|+q(j;v^{(1)})\big|\delta q(i;v)\big|\Big)-\delta q(i;v)$$
$$\le q(i;v^{(2)})\Big[\sum_{j\in S_t}\big|\delta q(j;v)\big|\Big]+\big|\delta q(i;v)\big|-\delta q(i;v)\le q(i;v^{(2)})\Big[\sum_{j\in S_t}\big|\delta q(j;v)\big|\Big]+2\big|\delta q(i;v)\big|$$
$$\le q(i;v^{(2)})\Big[\sum_{j\in S_t}(\exp(4\tau_v)-1)\,q(j;v^{(2)})\Big]+2(\exp(4\tau_v)-1)\,q(i;v^{(2)})\le3(\exp(4\tau_v)-1)\,q(i;v^{(2)})\le3(\exp(4\tau_v)-1)\,q(i;v^{(2)})\,(K\exp(\bar v-p_l)+1)\,q(0;v^{(2)}).$$

Here, we see that the left-hand side, involving the probabilities under $v^{(1)}$ and $v^{(2)}$, is bounded by the right-hand side, confirming the required condition. Q.E.D.

EC.11 Proof of Lemma 7

We now prove Lemma 7, which bounds the regret for each period, and Lemma 9, which bounds the cumulative regret across all periods. As in the proof of Lemma 6, unless stated otherwise, all statements are conditioned on the success event in Lemma 5. On this event, the following inequalities hold uniformly for all $t\in\{T_0,\dots,T-1\}$:

$$\big(\hat v_t-v^*\big)^\top I^{MNL}_t(v^*)\big(\hat v_t-v^*\big)\le\omega_v,\qquad\big(\hat v_t-v^*\big)^\top I^{MNL}_t(\hat v_t)\big(\hat v_t-v^*\big)\le\omega_v,\qquad\|\hat v_t-v^*\|\le2\tau_v. \quad (EC.69)$$

We first present Lemma EC.23 to facilitate the proof.

Lemma EC.23. Suppose Equation (34) holds uniformly over all $t=T_0,\dots,T-1$.
For all $t>T_0$ and $S\subseteq[N]$ with $|S|\le K$, it holds that

$$\big|r(S,p,z_t;\hat v_{t-1})-r(S,p,z_t;v^*)\big|\le\sqrt{c_0\omega_v\Big\|I^{MNL}_{t-1}(\hat v_{t-1})^{-1/2}M^{MNL}_t\big(\hat v_{t-1}\,\big|\,S,p\big)I^{MNL}_{t-1}(\hat v_{t-1})^{-1/2}\Big\|_{op}},$$
$$\big|r(S,p,z_t;\hat v_{t-1})-r(S,p,z_t;v^*)\big|\le\sqrt{c_0\omega_v\Big\|I^{MNL}_{t-1}(v^*)^{-1/2}M^{MNL}_t\big(v^*\,\big|\,S,p\big)I^{MNL}_{t-1}(v^*)^{-1/2}\Big\|_{op}}. \quad (EC.70)$$

The proof of this lemma is deferred to the next subsection. Noting that $M^{MNL}_t(v^*\,|\,S,p)\preceq\widehat M^{MNL}_t(v\,|\,S,p)$ and $\hat I^{MNL}_t(v^*)\preceq I^{MNL}_t(v^*)$, from Lemma EC.23 we see that

$$\big|r(S,p,z_t;\hat v_{t-1})-r(S,p,z_t;v^*)\big|\le\sqrt{c_0\omega_v\Big\|\hat I^{MNL}_{t-1}(\hat v_{t-1})^{-1/2}\widehat M^{MNL}_t\big(\hat v_{t-1}\,\big|\,S,p\big)\hat I^{MNL}_{t-1}(\hat v_{t-1})^{-1/2}\Big\|_{op}}.$$

Similarly, we have

$$\big|r(S,p,z_t;\hat v_{t-1})-r(S,p,z_t;v^*)\big|\le\sqrt{c_0\omega_v\Big\|\hat I^{MNL}_{t-1}(v^*)^{-1/2}\widehat M^{MNL}_t\big(v^*\,\big|\,S,p\big)\hat I^{MNL}_{t-1}(v^*)^{-1/2}\Big\|_{op}}.$$

Using Lemma EC.22, we have $\widehat M^{MNL}_t(\hat v_{t-1}\,|\,S,p)\preceq c_8\widehat M^{MNL}_t(v^*\,|\,S,p)$ and $\hat I^{MNL}_{t-1}(v^*)\preceq c_8\hat I^{MNL}_{t-1}(\hat v_{t-1})$. Therefore,

$$\Big\|\hat I^{MNL}_{t-1}(\hat v_{t-1})^{-1/2}\widehat M^{MNL}_t\big(\hat v_{t-1}\,\big|\,S,p\big)\hat I^{MNL}_{t-1}(\hat v_{t-1})^{-1/2}\Big\|_{op}\le c_8^2\Big\|\hat I^{MNL}_{t-1}(v^*)^{-1/2}\widehat M^{MNL}_t\big(v^*\,\big|\,S,p\big)\hat I^{MNL}_{t-1}(v^*)^{-1/2}\Big\|_{op}.$$

Thus, we have proved Lemma 7. Q.E.D.

EC.11.1 Proof of Lemma EC.23

Note that

$$\nabla_vr(S,p,z_t;v)=\sum_{j\in S_t}q(j;v)p_{jt}z_{jt}-\sum_{j\in S_t}q(j;v)p_{jt}\sum_{j\in S_t}q(j;v)z_{jt}=\sum_{j\in S_t\cup\{0\}}q(j;v)(p_{jt}-p_l)\left(z_{jt}-\sum_{k\in S_t}q(k;v)z_{kt}\right). \quad (EC.71)$$

For simplicity, we denote

$$\bar z_{jt}=z_{jt}-\sum_{k\in S_t}q(k;v)z_{kt},\qquad a_j=p_{jt}-p_l\ge0.$$
We have

$$\nabla r(S,p,z_t;v)\,\nabla r(S,p,z_t;v)^\top=\left(\sum_{j\in S_t\cup\{0\}}q(j;v)a_j\bar z_{jt}\right)\left(\sum_{j\in S_t\cup\{0\}}q(j;v)a_j\bar z_{jt}\right)^\top=\sum_{j\in S_t\cup\{0\}}\sum_{k\in S_t\cup\{0\}}q(j;v)q(k;v)a_ja_k\bar z_{jt}\bar z_{kt}^\top$$
$$\preceq\frac12\sum_{j\in S_t\cup\{0\}}\sum_{k\in S_t\cup\{0\}}q(j;v)q(k;v)\big(a_j^2\bar z_{jt}\bar z_{jt}^\top+a_k^2\bar z_{kt}\bar z_{kt}^\top\big)=\sum_{j\in S_t\cup\{0\}}q(j;v)\big(1-q(0;v)\big)a_j^2\bar z_{jt}\bar z_{jt}^\top. \quad (EC.72)$$

From the definition of $M^{MNL}_t(v\,|\,S,p)$, we have

$$\sum_{j\in S_t\cup\{0\}}q(j;v)\bar z_{jt}\bar z_{jt}^\top=\frac{M^{MNL}_t(v\,|\,S,p)}{\Lambda\lambda(S_s,p_s;\theta^*)}. \quad (EC.73)$$

Combining Equations (EC.72) and (EC.73), we have

$$\nabla r(S,p,z_t;v)\,\nabla r(S,p,z_t;v)^\top\preceq\sum_{j\in S_t}q(j;v)\big(1-q(0;v)\big)a_j^2\bar z_{jt}\bar z_{jt}^\top\preceq\sum_{j\in S_t}q(j;v)(p_h-p_l)^2\bar z_{jt}\bar z_{jt}^\top\preceq\frac{(p_h-p_l)^2}{\Lambda\lambda(S_s,p_s;\theta^*)}M^{MNL}_t(v\,|\,S,p)\preceq\frac{(p_h-p_l)^2}{\Lambda\exp(-\bar v)}M^{MNL}_t(v\,|\,S,p). \quad (EC.74)$$

By the inequality above and the mean value theorem, there exists $\tilde v_{t-1}=v^*+\xi\big(\hat v_{t-1}-v^*\big)$ for some $\xi\in(0,1)$ such that

$$\big|r(S,p,z_t;\hat v_{t-1})-r(S,p,z_t;v^*)\big|=\big|\big\langle\nabla r(S,p,z_t;\tilde v_{t-1}),\,\hat v_{t-1}-v^*\big\rangle\big|=\sqrt{\big(\hat v_{t-1}-v^*\big)^\top\big[\nabla r(S,p,z_t;\tilde v_{t-1})\nabla r(S,p,z_t;\tilde v_{t-1})^\top\big]\big(\hat v_{t-1}-v^*\big)}\le\sqrt{\frac{(p_h-p_l)^2}{\Lambda\exp(-\bar v)}}\sqrt{\big(\hat v_{t-1}-v^*\big)^\top M^{MNL}_t\big(\tilde v_{t-1}\,\big|\,S,p\big)\big(\hat v_{t-1}-v^*\big)}.$$
(EC.75)

Combining inequality (EC.75), Lemma 5, and Lemma EC.22, we can see that

$$\big|r(S_t,p,z_t;v_t)-r(S,p,z_t;v^*)\big|\le\sqrt{\frac{(p_h-p_l)^2}{\Lambda\exp(-\bar v)}}\times\sqrt{\big(\hat v_{t-1}-v^*\big)^\top I^{MNL}_{t-1}(\hat v_{t-1})^{1/2}\,I^{MNL}_{t-1}(\hat v_{t-1})^{-1/2}M^{MNL}_t\big(\tilde v_{t-1}\,\big|\,S,p\big)I^{MNL}_{t-1}(\hat v_{t-1})^{-1/2}\,I^{MNL}_{t-1}(\hat v_{t-1})^{1/2}\big(\hat v_{t-1}-v^*\big)}$$
$$\le\sqrt{\frac{(p_h-p_l)^2}{\Lambda\exp(-\bar v)}}\sqrt{\omega_v\Big\|I^{MNL}_{t-1}(\hat v_{t-1})^{-1/2}M^{MNL}_t\big(\tilde v_{t-1}\,\big|\,S,p\big)I^{MNL}_{t-1}(\hat v_{t-1})^{-1/2}\Big\|_{op}}\le\sqrt{\frac{(p_h-p_l)^2}{\Lambda\exp(-\bar v)}}\sqrt{c_8\omega_v\Big\|I^{MNL}_{t-1}(\hat v_{t-1})^{-1/2}M^{MNL}_t\big(\hat v_{t-1}\,\big|\,S,p\big)I^{MNL}_{t-1}(\hat v_{t-1})^{-1/2}\Big\|_{op}},$$

and

$$\big|r(S_t,p,z_t;v_t)-r(S,p,z_t;v^*)\big|\le\sqrt{\frac{(p_h-p_l)^2}{\Lambda\exp(-\bar v)}}\sqrt{c_8\omega_v\Big\|I^{MNL}_{t-1}(v^*)^{-1/2}M^{MNL}_t\big(v^*\,\big|\,S,p\big)I^{MNL}_{t-1}(v^*)^{-1/2}\Big\|_{op}}.$$

EC.12 Proof of Lemma 9

We define

$$\widetilde M^{MNL}_t(v\,|\,S,p):=\Lambda\left(\sum_{j\in S}q(j,S,p,z_t;v)z_{jt}z_{jt}^\top-\sum_{j,k\in S}q(j,S,p,z_t;v)\,q(k,S,p,z_t;v)\,z_{jt}z_{kt}^\top\right), \quad (EC.76)$$

$$\widetilde I^{MNL}_t(v):=\sum_{s=1}^t\Lambda\left(\sum_{j\in S_s}q_s(j;v)z_{js}z_{js}^\top-\sum_{j,k\in S_s}q_s(j;v)\,q_s(k;v)\,z_{js}z_{ks}^\top\right). \quad (EC.77)$$

Similar to the proof of Lemma 8, we have:

$$\sum_{t=T_0+1}^{T}\min\left\{\bar p_h^2,\ \omega_vc_0\Big\|\hat I^{MNL}_{t-1}(v^*)^{-1/2}\widehat M^{MNL}_t(v^*\,|\,S_t)\hat I^{MNL}_{t-1}(v^*)^{-1/2}\Big\|_{op}\right\}\le\sum_{t=T_0+1}^{T}\min\left\{\bar p_h^2,\ \exp(2\bar x)\omega_vc_0\Big\|\widetilde I^{MNL}_{t-1}(v^*)^{-1/2}\widetilde M^{MNL}_t(v^*\,|\,S_t)\widetilde I^{MNL}_{t-1}(v^*)^{-1/2}\Big\|_{op}\right\}\le c_7\log\frac{\det\widetilde I^{MNL}_T(v^*)}{\det\widetilde I^{MNL}_{T_0}(v^*)}.$$

Note that $\sigma_{\min}\big(\widetilde I^{MNL}_{T_0}(v^*)\big)\ge\Lambda T_0\sigma_0$, which implies $\log\big(\det\widetilde I^{MNL}_{T_0}(v^*)\big)\ge d_z\log(\Lambda T_0\sigma_0)$. On the other hand, we have:

$$\mathrm{tr}\big(\widetilde I^{MNL}_T(v^*)\big)=\sum_{s=1}^T\Lambda\sum_{j\in S_s}q(j;v)\left(z_{js}-\sum_{k\in S_s}q(k;v)z_{ks}\right)^\top\left(z_{js}-\sum_{k\in S_s}q(k;v)z_{ks}\right)\le4T\Lambda.$$
Therefore, we can bound
$$
\log\det\widetilde I^{\mathrm{MNL}}_T(\mathbf v^*)\ \le\ d_z\log\frac{\operatorname{tr}\big(\widetilde I^{\mathrm{MNL}}_T(\mathbf v^*)\big)}{d_z}\ \le\ d_z\log\frac{4T\Lambda}{d_z}.
$$
As a result, we obtain
$$
\log\frac{\det\widetilde I^{\mathrm{MNL}}_T(\mathbf v^*)}{\det\widetilde I^{\mathrm{MNL}}_{T_0}(\mathbf v^*)}\ \le\ d_z\log\frac{4T}{d_z}-d_z\log(T_0\sigma_0),
$$
which completes the proof of the inequality.
EC.13 Proof of Theorem 2 and Theorem 3
Note that the regret of policy $\pi$ at time $T$ depends on $\mathbf v$, $\boldsymbol\theta$, and the data-generation mechanism of $\mathbf z_{i,t}$. We therefore write the regret as $R_\pi(T;\mathbf v,\boldsymbol\theta,\mathbf z)$ to emphasize this dependence. Suppose $P_{\mathbf v,\boldsymbol\theta,\mathbf z}$ is a distribution over $(\mathbf v,\boldsymbol\theta,\mathbf z)$. Clearly,
$$
\inf_\pi\sup_{\mathbf v,\boldsymbol\theta,\mathbf z} R_\pi(T;\mathbf v,\boldsymbol\theta,\mathbf z)\ \ge\ \inf_\pi\,\mathbb E_{(\mathbf v,\boldsymbol\theta,\mathbf z)\sim P_{\mathbf v,\boldsymbol\theta,\mathbf z}}\big[R_\pi(T;\mathbf v,\boldsymbol\theta,\mathbf z)\big] \tag{EC.78}
$$
for any such distribution. Taking a supremum over such distributions then yields a lower bound on the minimax optimal regret rate. In the following, we consider the following two cases:
• Case I. In Section EC.13.1, we set $\boldsymbol\theta=\mathbf 0$ and consider $P_{\mathbf v,\boldsymbol\theta,\mathbf z}=P_{\mathbf v,\mathbf z}$.
• Case II. In Section EC.13.2, we set $\mathbf v=\mathbf 0$ and consider $P_{\mathbf v,\boldsymbol\theta,\mathbf z}=P_{\boldsymbol\theta}$.
For both cases, we fix the features over time: $\mathbf z_{i,t}=\mathbf z_i$ for all $i\in[N]$, $t\ge 1$.
EC.13.1 Case I
When $\boldsymbol\theta=\mathbf 0$, the problem of interest reduces to the following.
Instance EC.13.1. The seller has $N$ distinct products, and each product $i\in[N]$ is associated with a fixed feature vector $\mathbf z_i\in\mathbb R^{d}$. In each period $t\in[T]$, the seller selects an assortment $S_t\subseteq[N]$ with $|S_t|=K$ and sets the prices of the products in the assortment. For simplicity, we use a price vector $\mathbf p_t=(p_{1t},p_{2t},\dots,p_{Nt})\in[p_l,p_h]^N$ to represent the pricing decision (and set the prices corresponding to out-of-assortment products to $p_h$). The number of customers arriving in each period follows a Poisson distribution with a known positive arrival rate $\Lambda$.
Under the multinomial logit (MNL) model, each arriving customer selects product $i\in S_t$ with probability
$$
\frac{\exp\big(\mathbf z_i^\top\mathbf v-p_{it}\big)}{1+\sum_{j\in S_t}\exp\big(\mathbf z_j^\top\mathbf v-p_{jt}\big)},
$$
and makes no purchase with the remaining probability. Here, $\mathbf v\in\mathbb R^d$ is an unknown parameter that influences the purchase likelihoods.
Note that the feature vectors are fixed; thus the optimal strategy is time-invariant. We denote the optimal strategy by $(S^*,\mathbf p^*)$ and the strategy the seller adopts by $(S_t,\mathbf p_t)=\pi(\mathcal H_t)$, where $\mathcal H_t$ is defined in Equation (8). Substituting $\Lambda_t=\Lambda$ into Equation (9), we have
$$
R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)=\sum_{t=1}^{T}\Lambda\big\{r(S^*,\mathbf p^*)-\mathbb E_\pi\big[r(S_t,\mathbf p_t)\big]\big\}.
$$
Next, we establish the connection between the regret of our true-time-horizon problem and the regret of the customer-horizon problem. Denote by $M_t=\sum_{s=1}^{t} n_s$ the number of customers arriving in the first $t$ periods. The $n$-th customer is uniquely indexed by a pair $(t(n),i(n))$, where $t(n)$ indicates the period in which the customer arrives and $i(n)$ indicates the order of arrival within period $t(n)$. The assortment-pricing offered to the $n$-th customer is thus $(S_{t(n)},\mathbf p_{t(n)})$. Then the regret can be rewritten as
$$
R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)=\mathbb E\Big[\sum_{n=1}^{M_T}\big\{r(S^*,\mathbf p^*)-\mathbb E_\pi\big[r(S_{t(n)},\mathbf p_{t(n)})\big]\big\}\Big]. \tag{EC.79}
$$
It is straightforward that $M_T$ follows a Poisson distribution with parameter $\Lambda T$. To relate our period-based problem to the customer-level formulation in Agrawal et al. (2019), let $P_{\mathbf v,\mathbf z}$ be a prior on $(\mathbf v,\mathbf z)$. Taking the expectation with respect to this prior yields
$$
\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)
=\mathbb E_{\mathbf v,\mathbf z}\Big(\mathbb E\Big[\sum_{n=1}^{M_T}\big\{r\big(S^*(\mathbf v,\mathbf z),\mathbf p^*(\mathbf v,\mathbf z)\big)-\mathbb E_\pi\big[r(S_{t(n)},\mathbf p_{t(n)})\big]\big\}\Big]\Big), \tag{EC.80}
$$
where $(S^*(\mathbf v,\mathbf z),\mathbf p^*(\mathbf v,\mathbf z))$ denotes the optimal assortment-price pair under parameters $(\mathbf v,\mathbf z)$.
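For concreteness, the MNL choice probabilities and the per-customer expected revenue $r(S,\mathbf p)$ used throughout can be evaluated as follows. This is an illustrative sketch; the function and variable names are our own, not from the paper.

```python
import math

def mnl_probs(S, v, z, p):
    """MNL choice probabilities: a customer buys i in S with probability
    exp(z_i^T v - p_i) / (1 + sum_{j in S} exp(z_j^T v - p_j));
    index 0 denotes the no-purchase option."""
    w = {i: math.exp(sum(zi * vi for zi, vi in zip(z[i], v)) - p[i]) for i in S}
    denom = 1.0 + sum(w.values())
    probs = {i: wi / denom for i, wi in w.items()}
    probs[0] = 1.0 / denom  # no-purchase probability
    return probs

def expected_revenue(S, v, z, p):
    """Per-customer expected revenue r(S, p) = sum_{i in S} p_i * q(i)."""
    q = mnl_probs(S, v, z, p)
    return sum(p[i] * q[i] for i in S)
```

For example, with two products and two-dimensional features, `mnl_probs` returns a distribution over the two products and the no-purchase option, and `expected_revenue` is the objective the customer-level policies above compete on.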
To bound this expected regret, we consider the following dynamic assortment-pricing problem, which has a similar form but may adjust the assortment and pricing per customer.
Instance EC.13.2. The seller has $N$ distinct products, each associated with a fixed feature vector $\mathbf z_i\in\mathbb R^d$, over $\widetilde T$ customer arrivals. Using notation similar to Instance EC.13.1, for each customer $n\in[\widetilde T]$, the seller selects an assortment $S_n\subseteq[N]$ with $|S_n|=K$ and sets the price vector $\mathbf p_n=(p_{1n},p_{2n},\dots,p_{Nn})\in[p_l,p_h]^N$. Under the multinomial logit (MNL) model, customer $n$ purchases product $i\in S_n$ with probability
$$
\frac{\exp\big(\mathbf z_i^\top\mathbf v-p_{in}\big)}{1+\sum_{j\in S_n}\exp\big(\mathbf z_j^\top\mathbf v-p_{jn}\big)},
$$
and makes no purchase with the remaining probability.
We define the history before customer $n$ as
$$
\mathcal H'_n:=\big(S_1,\mathbf p_1,j_1,S_2,\mathbf p_2,j_2,\dots,S_{n-1},\mathbf p_{n-1},j_{n-1}\big), \tag{EC.81}
$$
with the convention $\mathcal H'_1=\emptyset$, where $j_i$ denotes the purchase outcome of customer $i$. A policy at the customer level is a sequence $\pi'=\{\pi'_n\}_{n=1}^{\widetilde T}$ of (possibly stochastic) functions $\pi'_n:\mathcal H'_n\mapsto(S_n,\mathbf p_n)$. We denote the collection of legitimate policies with $\widetilde T$ customer arrivals by $\mathcal A'(\widetilde T)$.
Next, we define a subset of policies that restricts changes of the assortment-pricing to the points $(m_1,\,m_1+m_2,\,\dots,\,m_1+\dots+m_T)$:
$$
\mathcal A''(\widetilde T,m_1,m_2,\dots,m_T):=\Big\{(\pi_1,\dots,\pi_{\widetilde T})\ \Big|\ \pi_j(\mathcal H'_j)=\pi_i(\mathcal H'_i) \tag{EC.82}
$$
$$
\text{if there exists an } s \text{ such that } \sum_{t=1}^{s} m_t\le i,\,j<\sum_{t=1}^{s+1} m_t\Big\}, \tag{EC.83}
$$
where $m_1,m_2,\dots,m_T,\widetilde T$ satisfy $\sum_{j=1}^{T} m_j=\widetilde T$. Clearly, once we know the numbers of customer arrivals in the problem described in Instance EC.13.1 and condition on these numbers, the original period-level policy can be regarded as a policy in this restricted class for the customer-level problem, which is a smaller set than the entire legitimate policy space.
Then, by taking the conditional expectation with respect to the customer arrivals $(n_1,\dots,n_T)$ first, we can simplify Equation (EC.80) as
$$
\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)
=\mathbb E\Big\{\mathbb E\Big[\mathbb E_{\mathbf v,\mathbf z}\Big(\sum_{n=1}^{M_T}\big\{r\big(S^*(\mathbf v,\mathbf z),\mathbf p^*(\mathbf v,\mathbf z)\big)-\mathbb E_\pi\big[r(S_{t(n)},\mathbf p_{t(n)})\big]\big\}\Big)\ \Big|\ M_T,n_1,n_2,\dots,n_T\Big]\Big\}
$$
$$
\ge \mathbb E\Big\{\mathbb E\Big[\inf_{\pi''\in\mathcal A''(M_T,n_1,n_2,\dots,n_T)}\mathbb E_{\mathbf v,\mathbf z}\Big(\sum_{n=1}^{M_T}\big\{r\big(S^*(\mathbf v,\mathbf z),\mathbf p^*(\mathbf v,\mathbf z)\big)-\mathbb E_{\pi''}\big[r(S_n,\mathbf p_n)\big]\big\}\Big)\ \Big|\ M_T,n_1,n_2,\dots,n_T\Big]\Big\}
$$
$$
\ge \mathbb E\Big\{\mathbb E\Big[\inf_{\pi'\in\mathcal A'(M_T)}\mathbb E_{\mathbf v,\mathbf z}\Big(\sum_{n=1}^{M_T}\big\{r\big(S^*(\mathbf v,\mathbf z),\mathbf p^*(\mathbf v,\mathbf z)\big)-\mathbb E_{\pi'}\big[r(S_n,\mathbf p_n)\big]\big\}\Big)\ \Big|\ M_T,n_1,n_2,\dots,n_T\Big]\Big\}
=\mathbb E\Big\{\mathbb E\Big[\inf_{\pi'\in\mathcal A'(M_T)}\mathbb E_{\mathbf v,\mathbf z}\Big(\sum_{n=1}^{M_T}\big\{r\big(S^*(\mathbf v,\mathbf z),\mathbf p^*(\mathbf v,\mathbf z)\big)-\mathbb E_{\pi'}\big[r(S_n,\mathbf p_n)\big]\big\}\Big)\ \Big|\ M_T\Big]\Big\}. \tag{EC.84}
$$
For simplicity, for $\pi'\in\mathcal A'(m)$, we denote
$$
\widehat R^*(m,\pi',\mathbf v,\mathbf z)=\sum_{n=1}^{m}\big\{r\big(S^*(\mathbf v,\mathbf z),\mathbf p^*(\mathbf v,\mathbf z)\big)-\mathbb E_{\pi'}\big[r(S_n,\mathbf p_n)\big]\big\}.
$$
By monotonicity (each summand is nonnegative, as $(S^*(\mathbf v,\mathbf z),\mathbf p^*(\mathbf v,\mathbf z))$ is the optimal assortment-pricing), if $M_T\ge\lceil\Lambda T\rceil$, we have
$$
\widehat R^*(M_T,\pi',\mathbf v,\mathbf z)\ \ge\ \widehat R^*(\lceil\Lambda T\rceil,\pi',\mathbf v,\mathbf z).
$$
(The problem studied in Agrawal et al. (2019) is a special case of Instance EC.13.2.) Therefore,
$$
\inf_\pi\sup_{\boldsymbol\theta,\mathbf v,\mathbf z} R_\pi(T;\mathbf v,\boldsymbol\theta,\mathbf z)
\ \ge\ \inf_\pi\sup_{\mathbf v,\mathbf z} R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)
\ \ge\ \mathbb E\Big\{\mathbb E\Big[\inf_{\pi'\in\mathcal A'(M_T)}\mathbb E_{\mathbf v,\mathbf z}\,\widehat R^*(M_T,\pi',\mathbf v,\mathbf z)\ \Big|\ M_T\Big]\Big\}
$$
$$
\ge\ \mathbb E\Big\{\mathbb E\Big[\inf_{\pi'\in\mathcal A'(M_T)}\mathbb E_{\mathbf v,\mathbf z}\,\widehat R^*(\lceil\Lambda T\rceil,\pi',\mathbf v,\mathbf z)\,\mathbf 1\{M_T\ge\lceil\Lambda T\rceil\}\ \Big|\ M_T\Big]\Big\}
\ \ge\ \inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)}\mathbb E_{\mathbf v,\mathbf z}\,\widehat R^*(\lceil\Lambda T\rceil,\pi',\mathbf v,\mathbf z)\cdot\mathbb P\big[M_T\ge\lceil\Lambda T\rceil\big]. \tag{EC.85}
$$
EC.13.1.1 Adversarial construction I.
Suppose $\min\{d_z-2,N\}\ge K$ and $\Lambda\ge 1$.
Instance EC.13.3.
(Worst-case instance I) First, we let $d,\bar K$ be positive integers satisfying
$$
\bar K=\min\Big\{\Big\lfloor\frac{d_z-K+1}{3}\Big\rfloor,\ K\Big\},\qquad d=d_z-K+\bar K.
$$
Then clearly we have $d\le d_z$, $d_z-d=K-\bar K$, and $d\ge 4\bar K-1$. Next, we fix $\epsilon\in\big(0,\min\{\bar v/\sqrt{d_z},1\}\big]$ and define a set of $d_z$-dimensional vectors, $\mathcal V$, as follows. For each subset $W\subseteq[d]$ with $|W|=\bar K$, define the corresponding parameter vector $\mathbf v_W\in\mathbb R^{d_z}$ as
$$
\mathbf v_W(i)=\begin{cases}\epsilon, & i\in W,\\ 0, & i\in[d]\setminus W,\\ \epsilon, & i\in\{d+1,\dots,d_z\},\end{cases}\qquad i\in[d_z].
$$
Clearly, $\|\mathbf v_W\|_2\le\bar v$, satisfying the boundedness required in Assumption 3. Collecting all such vectors gives the parameter set
$$
\mathcal V=\{\mathbf v_W: W\subseteq[d],\ |W|=\bar K\}=\{\mathbf v_W: W\in\mathcal W_{\bar K}\},
$$
where, for simplicity of notation, $\mathcal W_{\bar K}$ denotes the class of all subsets of $[d]$ of size $\bar K$. Let the (time-invariant) product features $\{\mathbf z_i\}_{1\le i\le N}$ be the standard basis in $\mathbb R^{d_z}$, i.e., the $j$-th element of $\mathbf z_i$ is
$$
\mathbf z_i(j)=\begin{cases}1, & j=i,\\ 0, & j\neq i.\end{cases}
$$
It is straightforward that $\|\mathbf z_i\|_2\le 1$, so this construction of features satisfies the boundedness requirement in Assumption 7. Clearly, for the parameter $\mathbf v_W$ and the aforementioned features $\{\mathbf z_i\}_{1\le i\le N}$, the optimal assortment $S^*(\mathbf v_W,\mathbf z)$ is $W\cup\{d+1,d+2,\dots,d_z\}$.
Regret of each customer. Next, we derive an explicit lower bound on the per-customer regret. We use $\mathbb E_W$ and $\mathbb P_W$ to denote the expectation and probability measure under the model parameterized by $\mathbf v_W$ and the policy $\pi'$, respectively. The following lemma establishes a lower bound for $r\big(S^*(\mathbf v_W,\mathbf z),\mathbf p^*(\mathbf v_W,\mathbf z)\big)-r\big(S_n,\mathbf p_n\big)$ by comparing $S^*(\mathbf v_W)$ with $S_n$.
Lemma EC.24. Suppose $\epsilon\le 1$. Then, for any customer $n$,
$$
r\big(S^*(\mathbf v_W,\mathbf z),\mathbf p^*(\mathbf v_W,\mathbf z)\big)-r\big(S_n,\mathbf p_n\big)\ \ge\ c_{10}\,\epsilon\,\big(\bar K-|S_n\cap W|\big),
$$
where $c_{10}$ is defined in Equation (EC.88).
Remark EC.2. When $p_l=\Omega(\log K)$ and $p_h=\Omega(\log K)$, we have $c_{10}=\Omega\big(\frac{\log K}{K}\big)$.
When $p_l=\Omega(1)$ and $p_h=\Omega(1)$, we have $c_{10}=\Omega\big(\frac{1}{K^2}\big)$.
Next, we establish a lower bound on the cumulative regret. Define $\widetilde N_i:=\sum_{n=1}^{\lceil\Lambda T\rceil}\mathbf 1\{i\in S_n\}$. By Lemma EC.24, it follows that
$$
\widehat R^*(\lceil\Lambda T\rceil,\pi',\mathbf v_W,\mathbf z)
=\mathbb E_W\sum_{n=1}^{\lceil\Lambda T\rceil}\big\{r\big(S^*(\mathbf v_W,\mathbf z),\mathbf p^*(\mathbf v_W,\mathbf z)\big)-r\big(S_n,\mathbf p_n\big)\big\}
\ge \mathbb E_W\sum_{n=1}^{\lceil\Lambda T\rceil} c_{10}\,\epsilon\,\big(\bar K-|S_n\cap W|\big)
= c_{10}\,\epsilon\,\Big(\bar K\lceil\Lambda T\rceil-\sum_{i\in W}\mathbb E_W[\widetilde N_i]\Big),\qquad\forall\,W\in\mathcal W_{\bar K}.
$$
Denote $\mathcal W^{(i)}_{\bar K}:=\{W\in\mathcal W_{\bar K}: i\in W\}$ and $\mathcal W_{\bar K-1}:=\{W\subseteq[d]: |W|=\bar K-1\}$. We take the prior $P_{\mathbf v,\mathbf z}$ to be the uniform distribution over $\{(\mathbf v_W,\mathbf z): W\in\mathcal W_{\bar K}\}$; that is, sample $W\sim\mathrm{Unif}(\mathcal W_{\bar K})$ and set $(\mathbf v,\mathbf z)=(\mathbf v_W,\mathbf z)$. Therefore, using the previous Inequality (EC.85) on minimax rates,
$$
\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)\,\big/\,\mathbb P\big[M_T\ge\lceil\Lambda T\rceil\big]
\ \ge\ \inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)}\mathbb E_{\mathbf v,\mathbf z}\,\widehat R^*(\lceil\Lambda T\rceil,\pi',\mathbf v,\mathbf z)
\ \ge\ \inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)} c_{10}\,\epsilon\,\Big(\bar K\lceil\Lambda T\rceil-\frac{1}{|\mathcal W_{\bar K}|}\sum_{W\in\mathcal W_{\bar K}}\sum_{i\in W}\mathbb E_W[\widetilde N_i]\Big)
$$
$$
=\inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)} c_{10}\,\epsilon\,\Big(\bar K\lceil\Lambda T\rceil-\frac{1}{|\mathcal W_{\bar K}|}\sum_{i=1}^{d}\sum_{W\in\mathcal W^{(i)}_{\bar K}}\mathbb E_W[\widetilde N_i]\Big)
=\inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)} c_{10}\,\epsilon\,\Big(\bar K\lceil\Lambda T\rceil-\frac{1}{|\mathcal W_{\bar K}|}\sum_{W\in\mathcal W_{\bar K-1}}\sum_{i\notin W}\mathbb E_{W\cup\{i\}}[\widetilde N_i]\Big)
\ge \inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)} c_{10}\,\epsilon\,\Big(\bar K\lceil\Lambda T\rceil-\frac{|\mathcal W_{\bar K-1}|}{|\mathcal W_{\bar K}|}\max_{W\in\mathcal W_{\bar K-1}}\sum_{i\notin W}\mathbb E_{W\cup\{i\}}[\widetilde N_i]\Big)
$$
$$
=\inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)} c_{10}\,\epsilon\,\Big(\bar K\lceil\Lambda T\rceil-\frac{|\mathcal W_{\bar K-1}|}{|\mathcal W_{\bar K}|}\max_{W\in\mathcal W_{\bar K-1}}\sum_{i\notin W}\Big[\mathbb E_{W\cup\{i\}}[\widetilde N_i]-\mathbb E_W[\widetilde N_i]+\mathbb E_W[\widetilde N_i]\Big]\Big).
$$
Since $\sum_{i\notin W}\mathbb E_W[\widetilde N_i]\le\sum_{i=1}^{d}\mathbb E_W[\widetilde N_i]\le\bar K\lceil\Lambda T\rceil$ and $\frac{|\mathcal W_{\bar K-1}|}{|\mathcal W_{\bar K}|}=\frac{\binom{d}{\bar K-1}}{\binom{d}{\bar K}}=\frac{\bar K}{d-\bar K+1}\le\frac13$, we obtain
$$
\frac{\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)}{\mathbb P\big[M_T\ge\lceil\Lambda T\rceil\big]}
\ \ge\ \inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)} c_{10}\,\epsilon\,\Big(\frac23\bar K\lceil\Lambda T\rceil-\frac{\bar K}{d-\bar K+1}\max_{W\in\mathcal W_{\bar K-1}}\sum_{i\notin W}\big|\mathbb E_{W\cup\{i\}}[\widetilde N_i]-\mathbb E_W[\widetilde N_i]\big|\Big).
$$
Pinsker's inequality.
Finally, we derive an upper bound on $\big|\mathbb E_{W\cup\{i\}}[\widetilde N_i]-\mathbb E_W[\widetilde N_i]\big|$ for any $W\in\mathcal W_{\bar K-1}$:
$$
\big|\mathbb E_W[\widetilde N_i]-\mathbb E_{W\cup\{i\}}[\widetilde N_i]\big|
\le \sum_{j=0}^{\lceil\Lambda T\rceil} j\,\big|\mathbb P_W[\widetilde N_i=j]-\mathbb P_{W\cup\{i\}}[\widetilde N_i=j]\big|
\le \lceil\Lambda T\rceil\sum_{j=0}^{\lceil\Lambda T\rceil}\big|\mathbb P_W[\widetilde N_i=j]-\mathbb P_{W\cup\{i\}}[\widetilde N_i=j]\big|
\le 2\lceil\Lambda T\rceil\,\big\|\mathbb P_W-\mathbb P_{W\cup\{i\}}\big\|_{\mathrm{TV}}
\le \lceil\Lambda T\rceil\sqrt{2\,\mathrm{KL}\big(\mathbb P_W\,\|\,\mathbb P_{W\cup\{i\}}\big)}.
$$
Here $\|\mathbb P_W-\mathbb P_{W\cup\{i\}}\|_{\mathrm{TV}}$ and $\mathrm{KL}(\mathbb P_W\,\|\,\mathbb P_{W\cup\{i\}})$ denote the total variation distance and the Kullback-Leibler divergence, respectively; the last inequality uses Pinsker's inequality. The following lemma provides an upper bound on the Kullback-Leibler divergence.
Lemma EC.25. Suppose $\epsilon\le 1$. For any $W\in\mathcal W_{\bar K-1}$ and $i\in[d]$,
$$
\mathrm{KL}\big(\mathbb P_W\,\|\,\mathbb P_{W\cup\{i\}}\big)\ \le\ c_{11}\cdot\mathbb E_W[\widetilde N_i]\cdot\epsilon^2,
$$
where $c_{11}>0$ is defined in Equation (EC.90).
Remark EC.3. When $p_l=\Omega(\log K)$ and $p_h=\Omega(\log K)$, we have $c_{11}=\Omega\big(\frac1K\big)$. When $p_l=\Omega(1)$ and $p_h=\Omega(1)$, we have $c_{11}=\Omega\big(\frac1K\big)$.
Combining Lemma EC.25 with the bound above yields
$$
\frac{\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)}{\mathbb P\big[M_T\ge\lceil\Lambda T\rceil\big]}
\ \ge\ \inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)} c_{10}\,\epsilon\,\Big(\frac{2\bar K\lceil\Lambda T\rceil}{3}-\frac{\bar K\lceil\Lambda T\rceil}{d-\bar K+1}\sum_{i=1}^{d}\sqrt{2c_{11}\,\mathbb E_W[\widetilde N_i]\,\epsilon^2}\Big).
$$
Applying the Cauchy-Schwarz inequality, and since $\sum_{i=1}^d\mathbb E_W[\widetilde N_i]\le\bar K\lceil\Lambda T\rceil$,
$$
\sum_{i=1}^d\sqrt{2c_{11}\,\mathbb E_W[\widetilde N_i]\,\epsilon^2}\ \le\ \sqrt d\cdot\sqrt{\sum_{i=1}^d 2c_{11}\,\mathbb E_W[\widetilde N_i]\,\epsilon^2}\ \le\ \sqrt{2c_{11}\,d\,\bar K\lceil\Lambda T\rceil\,\epsilon^2}.
$$
Thus, we obtain
$$
\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)
\ \ge\ \sup_{W\in\mathcal W_{\bar K}}\Big(\inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)}\widehat R^*(\lceil\Lambda T\rceil,\pi',\mathbf v_W,\mathbf z)\Big)\,\mathbb P\big[M_T\ge\lceil\Lambda T\rceil\big]
\ \ge\ c_{10}\,\epsilon\,\Big(\frac{2\bar K\lceil\Lambda T\rceil}{3}-\frac{\bar K\lceil\Lambda T\rceil}{d-\bar K+1}\sqrt{2c_{11}\,d\,\bar K\lceil\Lambda T\rceil\,\epsilon^2}\Big)\,\mathbb P\big[M_T\ge\lceil\Lambda T\rceil\big]. \tag{EC.86}
$$
To further simplify Inequality (EC.86), we first have the following lower bound for the last probability term:
Lemma EC.26. Let $M_T\sim\mathrm{Poi}(\mu)$ with $\mu=\Lambda T\ge 1$. Then
$$
\mathbb P\big[M_T\ge\lceil\mu\rceil\big]\ \ge\ \frac12-\frac{1}{\sqrt{2\pi}}\ >\ 0.1.
$$
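Lemma EC.26 can be sanity-checked numerically. The snippet below (our own illustration, not part of the proof) evaluates $\mathbb P(M_T\ge\lceil\mu\rceil)$ exactly from the Poisson pmf and confirms it stays above $1/2-1/\sqrt{2\pi}$ for several values $\mu\ge 1$:

```python
import math

def poisson_upper_tail_at_ceil(mu: float) -> float:
    """P(M >= ceil(mu)) for M ~ Poisson(mu), computed from the exact pmf
    in log-space (via lgamma) for numerical stability."""
    m = math.ceil(mu)
    # P(M >= m) = 1 - sum_{j < m} e^{-mu} mu^j / j!
    cdf = sum(math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1)) for j in range(m))
    return 1.0 - cdf

# lower bound claimed by Lemma EC.26
bound = 0.5 - 1.0 / math.sqrt(2.0 * math.pi)
```

For instance, at $\mu=1$ the tail is $1-e^{-1}\approx 0.632$, comfortably above the bound; the worst cases occur when $\mu$ sits just below an integer.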
Next, we bound the middle term by discussing two scenarios of $\Lambda T$ and setting $\epsilon$ correspondingly.
1. If $\Lambda T\ge\frac{d_z^2}{18\,c_{11}\,(d_z-K+1)}\max\big\{1,\frac{d_z}{\bar v^2}\big\}$, set
$$
\epsilon=\sqrt{\frac{(d-\bar K+1)^2}{18\,c_{11}\,d\,\bar K\,\lceil\Lambda T\rceil}};
$$
then $\epsilon\in\big(0,\min\{\bar v/\sqrt{d_z},1\}\big]$. Substituting this choice of $\epsilon$ into Inequality (EC.86) yields
$$
\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)
\ \ge\ \frac{c_{10}\,(d-\bar K+1)\sqrt{\bar K}}{90\sqrt{2c_{11}\,d}}\sqrt{\lceil\Lambda T\rceil}
\ \ge\ \frac{c_{10}\sqrt{d\bar K}}{120\sqrt{2c_{11}}}\sqrt{\lceil\Lambda T\rceil}
\ \ge\ \frac{c_{10}}{120\sqrt{2c_{11}}}\min\Big\{\frac{d_z-K-1}{\sqrt3},\ \sqrt{d_z K}\Big\}\sqrt{\lceil\Lambda T\rceil}.
$$
2. If $\Lambda T<\frac{d_z^2}{18\,c_{11}\,(d_z-K+1)}\max\big\{1,\frac{d_z}{\bar v^2}\big\}$, we set $\epsilon=\min\{\bar v/\sqrt{d_z},1\}$. Substituting this choice of $\epsilon$ into Inequality (EC.86) yields
$$
\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)
\ \ge\ \frac{c_{10}\,\epsilon\,\bar K\lceil\Lambda T\rceil}{30}
=\frac{c_{10}\min\{\bar v/\sqrt{d_z},1\}\min\big\{\frac{d_z-K-1}{3},K\big\}\lceil\Lambda T\rceil}{30}.
$$
Thus, we claim that
$$
\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)\ \ge\ c_{9,4}\min\Big\{1,\sqrt{\frac{\bar v^2}{d_z}}\Big\}\sqrt{\Lambda T},
$$
for some positive constant $c_{9,4}$ that depends only on $K$, $p_l$, $p_h$, $\bar v$, $d_z$. Besides, if $K\le\frac{d_z+1}{4}$, we have $d=d_z$, $\bar K=K$, and
$$
\liminf_{T\to\infty}\frac{\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)}{\sqrt{d_z\Lambda T}}\ \ge\ \frac{c_{10}\sqrt K}{120\sqrt{2c_{11}}}\ =:\ c_{l,1}. \tag{EC.87}
$$
Remark EC.4. When $p_l=\Omega(\log K)$ and $p_h=\Omega(\log K)$, we have $c_{l,1}=\Omega(\log K)$. When $p_l=\Omega(1)$ and $p_h=\Omega(1)$, we have $c_{l,1}=\Omega\big(\frac1K\big)$.
EC.13.1.2 Proof of Lemma EC.24
For simplicity, we define
$$
W_1=S_n\cap\big(W\cup\{d+1,d+2,\dots,N\}\big),\qquad W_2=S_n\setminus\big(W\cup\{d+1,d+2,\dots,N\}\big).
$$
Note that
$$
r\big(S_n,\mathbf p_n\big)
=\frac{\sum_{j\in W_1} p_{jn}\exp(\epsilon-p_{jn})+\sum_{j\in W_2} p_{jn}\exp(-p_{jn})}{1+\sum_{j\in W_1}\exp(\epsilon-p_{jn})+\sum_{j\in W_2}\exp(-p_{jn})}
=\sum_{j\in S_n} p_{jn}\,q_n(j;\mathbf v_W)
=\sum_{j\in W_1}\Big[\log\Big(\frac{q_n(0;\mathbf v_W)}{q_n(j;\mathbf v_W)}\Big)+\epsilon\Big]q_n(j;\mathbf v_W)+\sum_{j\in W_2}\log\Big(\frac{q_n(0;\mathbf v_W)}{q_n(j;\mathbf v_W)}\Big)q_n(j;\mathbf v_W).
$$
Thus, $r\big(S_n,\mathbf p_n\big)$ is concave with respect to $\big(q_n(j;\mathbf v_W)\big)_{j\in S_n\cup\{0\}}$.
Since the set of all feasible $\big(q_n(j;\mathbf v_W)\big)_{j\in S_n\cup\{0\}}$ is convex, the supremum of $r\big(S_n,\mathbf p_n\big)$ is attained when the values $q_n(j;\mathbf v_W)$ are equal within $W_1$ and within $W_2$. Consequently, the optimal prices are constant within each group: all products in $W_1$ share the same optimal price, and all products in $W_2$ share the same optimal price. We denote these prices by $p_1$ (for $W_1$) and $p_2$ (for $W_2$).
We first show that the optimal prices satisfy $p_1=p_2$. Define
$$
\mathcal D=1+|W_1|\exp(\epsilon-p_1)+|W_2|\exp(-p_2),\qquad \mathcal N=|W_1|\,p_1\exp(\epsilon-p_1)+|W_2|\,p_2\exp(-p_2),
$$
so the revenue is $r(p_1,p_2)=\mathcal N/\mathcal D$. If $W_1$ or $W_2$ is empty, the problem reduces to a single price decision, and the conclusion trivially holds. When both $W_1$ and $W_2$ are nonempty, a direct calculation yields
$$
\frac{\partial r}{\partial p_1}=\frac{|W_1|\exp(\epsilon-p_1)}{\mathcal D^2}\big[(1-p_1)\mathcal D+\mathcal N\big],\qquad
\frac{\partial r}{\partial p_2}=\frac{|W_2|\exp(-p_2)}{\mathcal D^2}\big[(1-p_2)\mathcal D+\mathcal N\big],
$$
and the identity
$$
\big[(1-p_1)\mathcal D+\mathcal N\big]-\big[(1-p_2)\mathcal D+\mathcal N\big]=(p_2-p_1)\,\mathcal D.
$$
Suppose $p_1\neq p_2$. Consider the ascent direction
$$
\delta p=(\delta p_1,\delta p_2):=\Big(\frac{\mathcal D^2(p_2-p_1)}{|W_1|\exp(\epsilon-p_1)},\ \frac{\mathcal D^2(p_1-p_2)}{|W_2|\exp(-p_2)}\Big),\qquad \widehat{\delta p}:=\frac{\delta p}{\|\delta p\|_\infty}.
$$
The directional derivative at $(p_1,p_2)$ along the direction $\widehat{\delta p}$ is
$$
\big\langle\nabla r(p_1,p_2),\,\widehat{\delta p}\big\rangle=\frac{\mathcal D}{\|\delta p\|_\infty}\,(p_2-p_1)^2>0.
$$
Because $p_1,p_2\in[\bar p_l,\bar p_h]$, for any $\eta\in(0,|p_2-p_1|]$ the perturbed point $(p_1,p_2)+\eta\,\widehat{\delta p}$ remains feasible. Therefore, $\widehat{\delta p}$ is a feasible direction, while first-order necessary optimality requires every feasible directional derivative to be nonpositive at an optimal point. This contradiction shows that $p_1\neq p_2$ cannot occur. Hence every optimal solution must satisfy $p_1=p_2$.
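As a numerical sanity check of this equal-price property (our own illustration, with arbitrarily chosen group sizes $|W_1|=2$, $|W_2|=3$, $\epsilon=0.1$, and price bounds; not part of the proof), a grid search over $[p_l,p_h]^2$ locates the maximizer of $\mathcal N/\mathcal D$ on the diagonal $p_1=p_2$ up to grid resolution:

```python
import math

def two_group_revenue(p1, p2, n1=2, n2=3, eps=0.1):
    """r(p1, p2) = N/D with
    D = 1 + n1*exp(eps - p1) + n2*exp(-p2),
    N = n1*p1*exp(eps - p1) + n2*p2*exp(-p2)."""
    D = 1.0 + n1 * math.exp(eps - p1) + n2 * math.exp(-p2)
    N = n1 * p1 * math.exp(eps - p1) + n2 * p2 * math.exp(-p2)
    return N / D

# exhaustive grid search over the feasible price box
p_l, p_h, steps = 0.5, 3.0, 251
grid = [p_l + (p_h - p_l) * k / (steps - 1) for k in range(steps)]
best_r, best_p1, best_p2 = max(
    (two_group_revenue(a, b), a, b) for a in grid for b in grid
)
```

The grid maximizer ends up (numerically) on the diagonal, matching the directional-derivative argument that any point with $p_1\neq p_2$ admits a feasible ascent direction.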
Since $p_1=p_2$, we have
$$
r\big(S^*(\mathbf v_W,\mathbf z),\mathbf p^*(\mathbf v_W,\mathbf z)\big)-r\big(S_n,\mathbf p_n\big)
\ \ge\ \frac{K p_1\exp(\epsilon-p_1)}{1+K\exp(\epsilon-p_1)}-\frac{|W_1|\,p_1\exp(\epsilon-p_1)+|W_2|\,p_1\exp(-p_1)}{1+|W_1|\exp(\epsilon-p_1)+|W_2|\exp(-p_1)}
=\frac{|W_2|\,p_1\big(\exp(\epsilon-p_1)-\exp(-p_1)\big)}{\big(1+K\exp(\epsilon-p_1)\big)\big(1+|W_1|\exp(\epsilon-p_1)+|W_2|\exp(-p_1)\big)}.
$$
Note that for small $\epsilon>0$, $\exp(\epsilon-p_1)-\exp(-p_1)=\exp(-p_1)\big(\exp(\epsilon)-1\big)\ge\exp(-p_1)\,\epsilon$; we further obtain
$$
r\big(S^*(\mathbf v_W,\mathbf z),\mathbf p^*(\mathbf v_W,\mathbf z)\big)-r\big(S_n,\mathbf p_n\big)
\ \ge\ \frac{|W_2|\,\epsilon\,p_1\exp(-p_1)}{\big(1+K\exp(1-p_1)\big)^2}
\ \ge\ \frac{|W_2|\,\epsilon\,p_l\exp(-p_h)}{\big(1+K\exp(1-p_l)\big)^2}.
$$
For simplicity, we denote
$$
c_{10}=\frac{p_l\exp(-p_h)}{\big(1+K\exp(1-p_l)\big)^2}, \tag{EC.88}
$$
which gives the final statement. Q.E.D.
EC.13.1.3 Proof of Lemma EC.25
$$
\mathrm{KL}\big(\mathbb P_W(\cdot\mid S_n,\mathbf p_n)\,\big\|\,\mathbb P_{W\cup\{i\}}(\cdot\mid S_n,\mathbf p_n)\big)
=\sum_{j\in S_n\cup\{0\}} q(j,S_n,\mathbf p_n,\mathbf z_n;\mathbf v_W)\log\frac{q(j,S_n,\mathbf p_n,\mathbf z_n;\mathbf v_W)}{q(j,S_n,\mathbf p_n,\mathbf z_n;\mathbf v_{W\cup\{i\}})}
\le \sum_j q(j,S_n,\mathbf p_n,\mathbf z_n;\mathbf v_W)\,\frac{q(j,S_n,\mathbf p_n,\mathbf z_n;\mathbf v_W)-q(j,S_n,\mathbf p_n,\mathbf z_n;\mathbf v_{W\cup\{i\}})}{q(j,S_n,\mathbf p_n,\mathbf z_n;\mathbf v_{W\cup\{i\}})}
=\sum_j \frac{\big|q(j,S_n,\mathbf p_n,\mathbf z_n;\mathbf v_W)-q(j,S_n,\mathbf p_n,\mathbf z_n;\mathbf v_{W\cup\{i\}})\big|^2}{q(j,S_n,\mathbf p_n,\mathbf z_n;\mathbf v_{W\cup\{i\}})},
$$
where the inequality holds because $\log(1+y)\le y$ for all $y>-1$, and the last equality uses $\sum_j\big(q(j;\mathbf v_W)-q(j;\mathbf v_{W\cup\{i\}})\big)=0$. Because $q(j,S_n,\mathbf p_n,\mathbf z_n;\mathbf v_{W\cup\{i\}})\ge\exp(-p_h)\big/\big(1+K\exp(\epsilon-p_l)\big)$ for all $j\in S_n\cup\{0\}$, the inequality above reduces to
$$
\mathrm{KL}\big(\mathbb P_W(\cdot\mid S_n,\mathbf p_n)\,\big\|\,\mathbb P_{W\cup\{i\}}(\cdot\mid S_n,\mathbf p_n)\big)
\le \exp(p_h)\big(K\exp(\epsilon-p_l)+1\big)\cdot\sum_{j\in S_n\cup\{0\}}\big|q(j,S_n,\mathbf p_n,\mathbf z_n;\mathbf v_W)-q(j,S_n,\mathbf p_n,\mathbf z_n;\mathbf v_{W\cup\{i\}})\big|^2. \tag{EC.89}
$$
We next upper bound $\big|q(j,S_n,\mathbf p_n,\mathbf z_n;\mathbf v_W)-q(j,S_n,\mathbf p_n,\mathbf z_n;\mathbf v_{W\cup\{i\}})\big|$ in several scenarios separately.
Scenario 1: $j=0$.
We have
$$
\big|q(0;\mathbf v_W)-q(0;\mathbf v_{W\cup\{i\}})\big|
=\Big|\frac{1}{1+\sum_{k\in S_n}\exp(\mathbf z_{kn}^\top\mathbf v_W-p_k)}-\frac{1}{1+\sum_{k\in S_n}\exp(\mathbf z_{kn}^\top\mathbf v_{W\cup\{i\}}-p_k)}\Big|
=\Big|\frac{\sum_{k\in S_n}\big[\exp(\mathbf z_{kn}^\top\mathbf v_{W\cup\{i\}}-p_k)-\exp(\mathbf z_{kn}^\top\mathbf v_W-p_k)\big]}{\big(1+\sum_{k\in S_n}\exp(\mathbf z_{kn}^\top\mathbf v_W-p_k)\big)\big(1+\sum_{k\in S_n}\exp(\mathbf z_{kn}^\top\mathbf v_{W\cup\{i\}}-p_k)\big)}\Big|
\le \frac{1}{\big(1+K\exp(-p_h)\big)^2}\cdot\exp(\epsilon-p_l)\sum_{k\in S_n}\big|\mathbf z_{kn}^\top(\mathbf v_W-\mathbf v_{W\cup\{i\}})\big|
\le \frac{\exp(\epsilon-p_l)\,\epsilon}{\big(1+K\exp(-p_h)\big)^2}.
$$
Here, we use $\exp(-p_h)\le\exp(\mathbf z_{kn}^\top\mathbf v_W-p_k)\le\exp(\epsilon-p_l)$, $\exp(-p_h)\le\exp(\mathbf z_{kn}^\top\mathbf v_{W\cup\{i\}}-p_k)\le\exp(\epsilon-p_l)$, and $|\exp(a)-\exp(b)|\le\exp(\max\{a,b\})\,|a-b|$.
Scenario 2: $j>0$ and $i\neq j$. We have
$$
\big|q(j;\mathbf v_W)-q(j;\mathbf v_{W\cup\{i\}})\big|
=\Big|\frac{\exp(\mathbf z_{jn}^\top\mathbf v_W-p_j)}{1+\sum_{k\in S_n}\exp(\mathbf z_{kn}^\top\mathbf v_W-p_k)}-\frac{\exp(\mathbf z_{jn}^\top\mathbf v_{W\cup\{i\}}-p_j)}{1+\sum_{k\in S_n}\exp(\mathbf z_{kn}^\top\mathbf v_{W\cup\{i\}}-p_k)}\Big|
=\exp(\mathbf z_{jn}^\top\mathbf v_W-p_j)\Big|\frac{1}{1+\sum_{k\in S_n}\exp(\mathbf z_{kn}^\top\mathbf v_W-p_k)}-\frac{1}{1+\sum_{k\in S_n}\exp(\mathbf z_{kn}^\top\mathbf v_{W\cup\{i\}}-p_k)}\Big|
\le \exp(\epsilon-p_l)\cdot\frac{\exp(\epsilon-p_l)\,\epsilon}{\big(1+K\exp(-p_h)\big)^2}
=\frac{\exp(2\epsilon-2p_l)\,\epsilon}{\big(1+K\exp(-p_h)\big)^2}.
$$
The last two (in)equalities hold because $\exp(\mathbf z_{jn}^\top\mathbf v_W-p_j)=\exp(\mathbf z_{jn}^\top\mathbf v_{W\cup\{i\}}-p_j)\le\exp(\epsilon-p_l)$ for $i\neq j$.
Scenario 3: $j=i$.
We have
$$
\big|q(i;\mathbf v_W)-q(i;\mathbf v_{W\cup\{i\}})\big|
=\Big|\frac{\exp(\mathbf z_{in}^\top\mathbf v_W-p_i)}{1+\sum_{k\in S_n}\exp(\mathbf z_{kn}^\top\mathbf v_W-p_k)}-\frac{\exp(\mathbf z_{in}^\top\mathbf v_{W\cup\{i\}}-p_i)}{1+\sum_{k\in S_n}\exp(\mathbf z_{kn}^\top\mathbf v_{W\cup\{i\}}-p_k)}\Big|
\le \exp(\mathbf z_{in}^\top\mathbf v_W-p_i)\Big|\frac{1}{1+\sum_{k\in S_n}\exp(\mathbf z_{kn}^\top\mathbf v_W-p_k)}-\frac{1}{1+\sum_{k\in S_n}\exp(\mathbf z_{kn}^\top\mathbf v_{W\cup\{i\}}-p_k)}\Big|
+\big|\exp(\mathbf z_{in}^\top\mathbf v_W-p_i)-\exp(\mathbf z_{in}^\top\mathbf v_{W\cup\{i\}}-p_i)\big|\cdot\frac{1}{1+\sum_{k\in S_n}\exp(\mathbf z_{kn}^\top\mathbf v_{W\cup\{i\}}-p_k)}
\le \frac{\exp(2\epsilon-2p_l)\,\epsilon}{\big(1+K\exp(-p_h)\big)^2}+\frac{\exp(\epsilon-p_l)\,\epsilon}{1+K\exp(-p_h)}.
$$
Combining all the upper bounds on $|q(j,S_n,\mathbf p_n,\mathbf z_n;\mathbf v_W)-q(j,S_n,\mathbf p_n,\mathbf z_n;\mathbf v_{W\cup\{i\}})|$ with Equation (EC.89), we have
$$
\mathrm{KL}\big(\mathbb P_W(\cdot\mid S_n,\mathbf p_n)\,\big\|\,\mathbb P_{W\cup\{i\}}(\cdot\mid S_n,\mathbf p_n)\big)
\le \exp(p_h)\big(1+K\exp(\epsilon-p_l)\big)\Big[\frac{\exp(4\epsilon-4p_l)\,\epsilon^2}{(1+K\exp(-p_h))^4}(K+1)+\frac{\exp(2\epsilon-2p_l)\,\epsilon^2}{(1+K\exp(-p_h))^4}+\frac{2\exp(2\epsilon-2p_l)\,\epsilon^2}{(1+K\exp(-p_h))^2}\Big]
$$
$$
\le \exp(p_h)\big(1+K\exp(1-p_l)\big)\Big[\frac{\exp(4-4p_l)\,\epsilon^2}{(1+K\exp(-p_h))^4}(K+1)+\frac{\exp(2-2p_l)\,\epsilon^2}{(1+K\exp(-p_h))^4}+\frac{2\exp(2-2p_l)\,\epsilon^2}{(1+K\exp(-p_h))^2}\Big]
= c_{11}\,\epsilon^2,
$$
where
$$
c_{11}=\exp(p_h)\big(1+K\exp(1-p_l)\big)\Big[\frac{\exp(4-4p_l)}{(1+K\exp(-p_h))^4}(K+1)+\frac{\exp(2-2p_l)}{(1+K\exp(-p_h))^4}+\frac{2\exp(2-2p_l)}{(1+K\exp(-p_h))^2}\Big]. \tag{EC.90}
$$
The upper bound above holds for an arbitrary $S_n$. If $i\notin S_n$, we can compute the KL divergence exactly: $\mathrm{KL}\big(\mathbb P_W(\cdot\mid S_n,\mathbf p_n)\,\|\,\mathbb P_{W\cup\{i\}}(\cdot\mid S_n,\mathbf p_n)\big)=0$.
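The quadratic dependence $\mathrm{KL}\le c_{11}\epsilon^2$ for a single observation can also be checked numerically. The snippet below (our own illustrative instance with three products and hand-picked prices; not part of the proof) verifies that the KL divergence between the two MNL choice distributions shrinks like $\epsilon^2$:

```python
import math

def mnl_dist(S, util, p):
    """MNL choice distribution over S ∪ {0}; util[j] plays the role of
    z_j^T v (features are standard basis vectors in the construction)."""
    w = {j: math.exp(util.get(j, 0.0) - p[j]) for j in S}
    denom = 1.0 + sum(w.values())
    dist = {j: wj / denom for j, wj in w.items()}
    dist[0] = 1.0 / denom  # no-purchase
    return dist

def kl(P, Q):
    """Kullback-Leibler divergence between two finite distributions."""
    return sum(pj * math.log(pj / Q[j]) for j, pj in P.items())

S, p = {1, 2, 3}, {1: 1.0, 2: 1.0, 3: 1.5}

def kl_at(eps):
    # v_W vs v_{W ∪ {i}}: adding coordinate i = 2 raises product 2's utility by eps
    P = mnl_dist(S, {1: eps}, p)
    Q = mnl_dist(S, {1: eps, 2: eps}, p)
    return kl(P, Q)
```

Halving $\epsilon$ roughly quarters the divergence, as the $c_{11}\epsilon^2$ bound predicts.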
Thus, by the chain rule for the KL divergence of sequential observations,
$$
\mathrm{KL}\big(\mathbb P_W\,\|\,\mathbb P_{W\cup\{i\}}\big)
=\mathbb E_W\Big[\log\frac{\prod_{n=1}^{\lceil\Lambda T\rceil}\mathbb P_W(j_n\mid\mathcal H'_{n-1})}{\prod_{n=1}^{\lceil\Lambda T\rceil}\mathbb P_{W\cup\{i\}}(j_n\mid\mathcal H'_{n-1})}\Big]
=\mathbb E_W\Big[\sum_{n=1}^{\lceil\Lambda T\rceil}\log\frac{\mathbb P_W(j_n\mid\mathcal H'_{n-1})}{\mathbb P_{W\cup\{i\}}(j_n\mid\mathcal H'_{n-1})}\Big]
=\sum_{n=1}^{\lceil\Lambda T\rceil}\mathbb E_W\Big[\log\frac{\mathbb P_W(j_n\mid\mathcal H'_{n-1})}{\mathbb P_{W\cup\{i\}}(j_n\mid\mathcal H'_{n-1})}\Big]
=\sum_{n=1}^{\lceil\Lambda T\rceil}\mathbb E_W\Big[\mathrm{KL}\big(\mathbb P_W(j_n\mid\mathcal H'_{n-1})\,\|\,\mathbb P_{W\cup\{i\}}(j_n\mid\mathcal H'_{n-1})\big)\Big]
=\mathbb E_W\Big[\sum_{n=1}^{\lceil\Lambda T\rceil}\mathrm{KL}\big(\mathbb P_W(j_n\mid S_n,\mathbf p_n)\,\|\,\mathbb P_{W\cup\{i\}}(j_n\mid S_n,\mathbf p_n)\big)\Big]
\le \mathbb E_W\Big[\sum_{n=1}^{\lceil\Lambda T\rceil} c_{11}\,\epsilon^2\,\mathbf 1\{i\in S_n\}\Big]
= c_{11}\,\epsilon^2\,\mathbb E_W\big[\widetilde N_i\big],
$$
which proves the lemma. Q.E.D.
EC.13.1.4 Proof of Lemma EC.26
Let $m:=\mathrm{Med}(M_T)$ be the median, defined as the least integer such that $\mathbb P(M_T\le m)\ge\frac12$. Since $\mu\ge 1$, we have $m\ge 1$ (otherwise $m=0$ would force $\mathbb P(M_T=0)\ge 1/2$, i.e., $\mu\le\log 2<1$, a contradiction). By the classical bounds for Poisson medians (see Choi (1994)), we have $-\log 2\le m-\mu<\frac13$. We have
$$
\lceil\mu\rceil-1\ \le\ \lceil\mu-\ln 2\rceil\ \le\ m\ \le\ \Big\lfloor\mu+\frac13\Big\rfloor\ \le\ \lceil\mu\rceil.
$$
Hence $\lceil\mu\rceil-1\le m\le\lceil\mu\rceil$, so $m\in\{\lceil\mu\rceil-1,\lceil\mu\rceil\}$. Consider two cases.
Case 1: $m=\lceil\mu\rceil$. By the definition of the median, $\mathbb P(M_T\le m-1)\le 1/2$. Therefore, $\mathbb P(M_T\ge m)\ge 1/2$, and thus $\mathbb P(M_T\ge\lceil\mu\rceil)\ge 1/2$.
Case 2: $m=\lceil\mu\rceil-1$. Then
$$
\mathbb P(M_T\ge\lceil\mu\rceil)=\mathbb P(M_T\ge m+1)=\mathbb P(M_T\ge m)-\mathbb P(M_T=m)\ \ge\ \frac12-\mathbb P(M_T=m).
$$
To bound $\mathbb P(M_T=m)$ uniformly in $\mu$, note that for a fixed integer $m\ge 0$ the function $g(\mu)=\exp(-\mu)\,\mu^m$ is maximized at $\mu=m$; hence
$$
\mathbb P(M_T=m)=\frac{\exp(-\mu)\,\mu^m}{m!}\ \le\ \frac{\exp(-m)\,m^m}{m!}\ \le\ \frac{1}{\sqrt{2\pi m}},
$$
where the last step uses Stirling's lower bound $m!\ge\sqrt{2\pi m}\,(m/e)^m$. Since $m\ge 1$, we further get $\mathbb P(M_T=m)\le 1/\sqrt{2\pi}$. Therefore,
$$
\mathbb P(M_T\ge\lceil\mu\rceil)\ \ge\ \frac12-\frac{1}{\sqrt{2\pi}}\ >\ 0.1.
$$
Combining the two cases completes the proof. Q.E.D.
EC.13.1.5 Adversarial construction II.
Suppose $\min\{d_z-2,N\}\ge K$, $d_z\ge 4$, and $\Lambda\ge 1$.
Instance EC.13.4. (Worst-case instance II) First, we let $H(p):=-p\log p-(1-p)\log(1-p)$, and define
$$
d:=\min\Big\{\Big\lfloor\frac{\log\big((N-d_z)/K\big)-\frac14\log 3}{H(1/4)}\Big\rfloor,\ d_z\Big\},\qquad \bar K:=\Big\lfloor\frac{d+1}{4}\Big\rfloor. \tag{EC.91}
$$
Suppose
$$
\log\frac{N-d_z}{K}\ \ge\ \frac14\log 3+4H(1/4), \tag{EC.92}
$$
which in particular guarantees $d\ge 4$ (hence $d>3$) and $d\le d_z$ under the choice (EC.91). The following lemma verifies that the explicit choice of $(d,\bar K)$ in (EC.91) satisfies several properties required in the subsequent analysis.
Lemma EC.27. Let $d$ and $\bar K$ be chosen as in (EC.91). If condition (EC.92) holds, then
$$
K\binom{d}{\bar K}+d_z-d\ \le\ N.
$$
Next, we fix $\epsilon\in\big(0,\min\{\bar v/\sqrt{\bar K},1\}\big]$ and define a set of $d_z$-dimensional vectors, $\mathcal V$, as follows. For each subset $W\subseteq[d]$ with $|W|=\bar K$, define the corresponding parameter vector $\mathbf v_W\in\mathbb R^{d_z}$ as
$$
\mathbf v_W(i)=\begin{cases}\epsilon,& i\in W,\\ 0,& i\notin W,\end{cases}\qquad i\in[d_z].
$$
Since $\|\mathbf v_W\|_2\le\bar v$, this construction satisfies the boundedness condition in Assumption 3. Collecting these vectors gives the parameter set $\mathcal V=\{\mathbf v_W: W\subseteq[d],\ |W|=\bar K\}=\{\mathbf v_W: W\in\mathcal W_{\bar K}\}$, where, for simplicity of notation, $\mathcal W_{\bar K}$ denotes the class of all subsets of $[d]$ of size $\bar K$.
Definition EC.3. (Feature construction) The (time-invariant) product features are constructed as follows. For each $U\in\mathcal W_{\bar K}$, define $\mathbf z_U\in\mathbb R^{d_z}$ by
$$
\mathbf z_U(i)=\begin{cases}1/\sqrt{\bar K},& i\in U,\\ 0,& i\notin U,\end{cases}\qquad i\in[d_z],
$$
and for each $j\in\{d+1,\dots,d_z\}$, let $\mathbf z_{\{j\}}$ be the $j$-th standard basis vector scaled by $\epsilon$, i.e.,
$$
[\mathbf z_{\{j\}}]_i=\begin{cases}\epsilon,& i=j,\\ 0,& i\neq j.\end{cases}
$$
The catalog contains at least $K$ products with feature vector $\mathbf z_U$ for every $U\in\mathcal W_{\bar K}$, and one product with feature vector $\mathbf z_{\{j\}}$ for each $j\in\{d+1,\dots,d_z\}$; Lemma EC.27 shows that the total number of products does not exceed $N$. Besides the aforementioned products, the rest of the products all have feature $\mathbf z_{\{d_z\}}$.
It is straightforward that $\|\mathbf z_U\|_2\le 1$ and $\|\mathbf z_{\{j\}}\|_2\le 1$, so this construction of features satisfies the boundedness requirement in Assumption 7. Clearly, for any $\mathbf z$ with $\|\mathbf z\|\le 1$, $\mathbf z^\top\mathbf v_W\le\|\mathbf v_W\|=\epsilon\sqrt{\bar K}$, which is attained by $\mathbf z=\mathbf z_W$. Hence, selecting the $K$ products with feature $\mathbf z_W$ gives the optimal assortment of size $K$.
Regret of each customer. Next, we derive an explicit lower bound on the per-customer regret. By the construction of product features, we can suppose the assortment $S_n$ contains $K$ products with features $\mathbf z_{U_{1,n}},\mathbf z_{U_{2,n}},\dots,\mathbf z_{U_{K,n}}$, where each set $U_{k,n}$ is either in $\mathcal W_{\bar K}$ or in $\{\{d+1\},\dots,\{d_z\}\}$. We single out the feature of the product in the assortment $S_n$ with the highest intrinsic value and denote its corresponding set by
$$
\widetilde U_n=\arg\max_{1\le k\le K}\,\langle\mathbf z_{U_{k,n}},\mathbf v_W\rangle.
$$
We use $\mathbb E_W$ and $\mathbb P_W$ to denote the expectation and probability measure under the model parameterized by $\mathbf v_W$ and the policy $\pi'$, respectively. The following lemma establishes a lower bound for $r\big(S^*(\mathbf v_W,\mathbf z),\mathbf p^*(\mathbf v_W,\mathbf z)\big)-r\big(S_n,\mathbf p_n\big)$ by comparing $\widetilde U_n$ with $W$.
Lemma EC.28. Suppose $\epsilon\in(0,\bar v/\sqrt{\bar K}]$. Then,
$$
r\big(S^*(\mathbf v_W,\mathbf z),\mathbf p^*(\mathbf v_W,\mathbf z)\big)-r\big(S_n,\mathbf p_n\big)\ \ge\ c_{12}\,\frac{\epsilon}{\sqrt{\bar K}}\,\big(\bar K-|\widetilde U_n\cap W|\big),
\qquad\text{where } c_{12}:=\frac{K\,\bar p_l\exp(-\bar p_h)}{\big(K\exp(\bar v-\bar p_l)+1\big)^2}.
$$
Remark EC.5. When $p_l=\Omega(\log K)$ and $p_h=\Omega(\log K)$, we have $c_{12}=\Omega(\log K)$. When $p_l=\Omega(1)$ and $p_h=\Omega(1)$, we have $c_{12}=\Omega\big(\frac1K\big)$.
Next, we establish a lower bound on the cumulative regret. Define $\widetilde N_i:=\sum_{n=1}^{\lceil\Lambda T\rceil}\mathbf 1\{i\in\widetilde U_n\}$. By Lemma EC.28, it follows that
$$
\widehat R^*(\lceil\Lambda T\rceil,\pi',\mathbf v_W,\mathbf z)
=\mathbb E_W\sum_{n=1}^{\lceil\Lambda T\rceil}\big\{r\big(S^*(\mathbf v_W,\mathbf z),\mathbf p^*(\mathbf v_W,\mathbf z)\big)-r\big(S_n,\mathbf p_n\big)\big\}
\ge \mathbb E_W\sum_{n=1}^{\lceil\Lambda T\rceil} c_{12}\,\frac{\epsilon}{\sqrt{\bar K}}\,\big(\bar K-|\widetilde U_n\cap W|\big)
= c_{12}\,\frac{\epsilon}{\sqrt{\bar K}}\Big(\bar K\lceil\Lambda T\rceil-\sum_{i\in W}\mathbb E_W[\widetilde N_i]\Big),\qquad\forall\,W\in\mathcal W_{\bar K}.
$$
Denote $\mathcal W^{(i)}_{\bar K}:=\{W\in\mathcal W_{\bar K}: i\in W\}$ and $\mathcal W_{\bar K-1}:=\{W\subseteq[d]: |W|=\bar K-1\}$. We endow $\mathbf v$ with the uniform distribution over $\{\mathbf v_W: W\in\mathcal W_{\bar K}\}$ and fix the feature construction to be Definition EC.3; we use $P_{\mathbf v,\mathbf z}$ to denote this joint distribution. Using Inequality (EC.85) on the minimax rate gives
$$
\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)\,\big/\,\mathbb P\big[M_T\ge\lceil\Lambda T\rceil\big]
\ \ge\ \inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)}\mathbb E_{\mathbf v,\mathbf z}\,\widehat R^*(\lceil\Lambda T\rceil,\pi',\mathbf v,\mathbf z)
\ \ge\ \inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)} c_{12}\,\frac{\epsilon}{\sqrt{\bar K}}\Big(\bar K\lceil\Lambda T\rceil-\frac{1}{|\mathcal W_{\bar K}|}\sum_{W\in\mathcal W_{\bar K}}\sum_{i\in W}\mathbb E_W[\widetilde N_i]\Big)
$$
$$
=\inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)} c_{12}\,\frac{\epsilon}{\sqrt{\bar K}}\Big(\bar K\lceil\Lambda T\rceil-\frac{1}{|\mathcal W_{\bar K}|}\sum_{i=1}^{d}\sum_{W\in\mathcal W^{(i)}_{\bar K}}\mathbb E_W[\widetilde N_i]\Big)
=\inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)} c_{12}\,\frac{\epsilon}{\sqrt{\bar K}}\Big(\bar K\lceil\Lambda T\rceil-\frac{1}{|\mathcal W_{\bar K}|}\sum_{W\in\mathcal W_{\bar K-1}}\sum_{i\notin W}\mathbb E_{W\cup\{i\}}[\widetilde N_i]\Big)
\ge \inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)} c_{12}\,\frac{\epsilon}{\sqrt{\bar K}}\Big(\bar K\lceil\Lambda T\rceil-\frac{|\mathcal W_{\bar K-1}|}{|\mathcal W_{\bar K}|}\max_{W\in\mathcal W_{\bar K-1}}\sum_{i\notin W}\mathbb E_{W\cup\{i\}}[\widetilde N_i]\Big)
$$
$$
=\inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)} c_{12}\,\frac{\epsilon}{\sqrt{\bar K}}\Big(\bar K\lceil\Lambda T\rceil-\frac{|\mathcal W_{\bar K-1}|}{|\mathcal W_{\bar K}|}\max_{W\in\mathcal W_{\bar K-1}}\sum_{i\notin W}\Big[\mathbb E_{W\cup\{i\}}[\widetilde N_i]-\mathbb E_W[\widetilde N_i]+\mathbb E_W[\widetilde N_i]\Big]\Big).
$$
Since $\sum_{i\notin W}\mathbb E_W[\widetilde N_i]\le\sum_{i=1}^{d}\mathbb E_W[\widetilde N_i]\le\bar K\lceil\Lambda T\rceil$ and $\frac{|\mathcal W_{\bar K-1}|}{|\mathcal W_{\bar K}|}=\frac{\binom{d}{\bar K-1}}{\binom{d}{\bar K}}=\frac{\bar K}{d-\bar K+1}\le\frac13$, we obtain
$$
\frac{\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)}{\mathbb P\big[M_T\ge\lceil\Lambda T\rceil\big]}
\ \ge\ \inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)} c_{12}\,\frac{\epsilon}{\sqrt{\bar K}}\Big(\frac23\bar K\lceil\Lambda T\rceil-\frac{\bar K}{d-\bar K+1}\max_{W\in\mathcal W_{\bar K-1}}\sum_{i\notin W}\big|\mathbb E_{W\cup\{i\}}[\widetilde N_i]-\mathbb E_W[\widetilde N_i]\big|\Big).
$$
Pinsker's inequality. Finally, we derive an upper bound on $\big|\mathbb E_{W\cup\{i\}}[\widetilde N_i]-\mathbb E_W[\widetilde N_i]\big|$ for any $W\in\mathcal W_{\bar K-1}$:
$$
\big|\mathbb E_W[\widetilde N_i]-\mathbb E_{W\cup\{i\}}[\widetilde N_i]\big|
\le \sum_{j=0}^{\lceil\Lambda T\rceil} j\,\big|\mathbb P_W[\widetilde N_i=j]-\mathbb P_{W\cup\{i\}}[\widetilde N_i=j]\big|
\le \lceil\Lambda T\rceil\sum_{j=0}^{\lceil\Lambda T\rceil}\big|\mathbb P_W[\widetilde N_i=j]-\mathbb P_{W\cup\{i\}}[\widetilde N_i=j]\big|
\le 2\lceil\Lambda T\rceil\,\big\|\mathbb P_W-\mathbb P_{W\cup\{i\}}\big\|_{\mathrm{TV}}
\le \lceil\Lambda T\rceil\sqrt{2\,\mathrm{KL}\big(\mathbb P_W\,\|\,\mathbb P_{W\cup\{i\}}\big)}.
$$
Here $\|\mathbb P_W-\mathbb P_{W\cup\{i\}}\|_{\mathrm{TV}}$ and $\mathrm{KL}(\mathbb P_W\,\|\,\mathbb P_{W\cup\{i\}})$ denote the total variation distance and the Kullback-Leibler divergence, respectively; the last inequality uses Pinsker's inequality. For every $i\in[d]$ define the random variables $N_i:=\sum_{t=1}^{\lceil\Lambda T\rceil}\sum_{\mathbf z_U\in S_t}\mathbf 1\{i\in U\}$. The following lemma provides an upper bound on the Kullback-Leibler divergence.
Lemma EC.29. Suppose $\epsilon<1$. For any $W\in\mathcal W_{\bar K-1}$ and $i\in[d]$,
$$
\mathrm{KL}\big(\mathbb P_W\,\|\,\mathbb P_{W\cup\{i\}}\big)\ \le\ c_{13}\cdot\mathbb E_W[N_i]\cdot\frac{\epsilon^2}{\bar K},
$$
where $c_{13}$ is defined in Equation (EC.100).
Remark EC.6. When $p_l=\Omega(\log K)$ and $p_h=\Omega(\log K)$, we have $c_{13}=\Omega\big(\frac1K\big)$. When $p_l=\Omega(1)$ and $p_h=\Omega(1)$, we have $c_{13}=\Omega\big(\frac1K\big)$.
Using Lemma EC.29 with the bound above yields
$$
\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)\,\big/\,\mathbb P\big[M_T\ge\lceil\Lambda T\rceil\big]
\ \ge\ \inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)} c_{12}\,\frac{\epsilon}{\sqrt{\bar K}}\Big(\frac{2\bar K\lceil\Lambda T\rceil}{3}-\frac{\bar K\lceil\Lambda T\rceil}{d-\bar K+1}\sum_{i=1}^{d}\sqrt{2c_{13}\,\mathbb E_W[N_i]\,\frac{\epsilon^2}{\bar K}}\Big).
$$
Applying the Cauchy-Schwarz inequality,
$$
\sum_{i=1}^{d}\sqrt{2c_{13}\,\mathbb E_W[N_i]\,\frac{\epsilon^2}{\bar K}}\ \le\ \sqrt d\cdot\sqrt{\sum_{i=1}^{d} 2c_{13}\,\mathbb E_W[N_i]\,\frac{\epsilon^2}{\bar K}}.
$$
Since $\sum_{i=1}^{d}\mathbb E_W[N_i]\le K\bar K\lceil\Lambda T\rceil$, we further have
$$
\sum_{i=1}^{d}\sqrt{2c_{13}\,\mathbb E_W[N_i]\,\frac{\epsilon^2}{\bar K}}\ \le\ \sqrt{2c_{13}\,d\,K\lceil\Lambda T\rceil\,\epsilon^2}.
$$
Thus, we obtain
$$
\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)
\ \ge\ \sup_{W\in\mathcal W_{\bar K}}\Big(\inf_{\pi'\in\mathcal A'(\lceil\Lambda T\rceil)}\widehat R^*(\lceil\Lambda T\rceil,\pi',\mathbf v_W,\mathbf z)\Big)\,\mathbb P\big[M_T\ge\lceil\Lambda T\rceil\big]
\ \ge\ c_{12}\,\frac{\epsilon}{\sqrt{\bar K}}\Big(\frac{2\bar K\lceil\Lambda T\rceil}{3}-\frac{\bar K\lceil\Lambda T\rceil}{d-\bar K+1}\sqrt{2c_{13}\,d\,K\lceil\Lambda T\rceil\,\epsilon^2}\Big)\,\mathbb P\big[M_T\ge\lceil\Lambda T\rceil\big]. \tag{EC.93}
$$
Note that by Lemma EC.26, $\mathbb P\big(M_T\ge\lceil\Lambda T\rceil\big)\ge 0.1$. Next, we bound the middle term by discussing two scenarios of $\Lambda T$ and setting $\epsilon$ correspondingly.
• If
$$
\Lambda T\ \ge\ \frac{\frac{\log((N-d_z)/K)-\frac14\log 3}{H(1/4)}+5}{32\,c_{13}\,K}\,\max\Big\{1,\ \frac{\frac{\log((N-d_z)/K)-\frac14\log 3}{H(1/4)}+1}{4\bar v^2}\Big\},
$$
by setting
$$
\epsilon=\sqrt{\frac{(d-\bar K+1)^2}{18\,c_{13}\,d\,K\,\lceil\Lambda T\rceil}},
$$
we have $\epsilon\in\big(0,\min\{\bar v/\sqrt{\bar K},1\}\big)$.
Substituting this choice of $\epsilon$ into Inequality (EC.93) yields
$$
\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)
\ \ge\ \frac{c_{12}\,(d-\bar K+1)\sqrt{\bar K}}{90\sqrt{2c_{13}\,dK}}\sqrt{\lceil\Lambda T\rceil}
\ \ge\ \frac{c_{12}\sqrt{d\bar K}}{120\sqrt{2c_{13}\,K}}\sqrt{\lceil\Lambda T\rceil}
\ \ge\ \frac{c_{12}}{240\sqrt{2c_{13}\,K}}\min\Big\{\frac{\log\big((N-d_z)/K\big)-\frac14\log 3}{H(1/4)},\ d_z\Big\}\sqrt{\lceil\Lambda T\rceil}.
$$
• If
$$
\Lambda T\ <\ \frac{\frac{\log((N-d_z)/K)-\frac14\log 3}{H(1/4)}+5}{32\,c_{13}\,K}\,\max\Big\{1,\ \frac{\frac{\log((N-d_z)/K)-\frac14\log 3}{H(1/4)}+1}{4\bar v^2}\Big\},
$$
we set $\epsilon=\min\{\bar v/\sqrt{\bar K},1\}$. Substituting this choice of $\epsilon$ into Inequality (EC.93) yields
$$
\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)
\ \ge\ \frac{c_{12}\min\{\bar v/\sqrt{\bar K},1\}\sqrt{\bar K}\,\bar K\lceil\Lambda T\rceil}{30}
\ \ge\ \frac{c_{12}\min\{\bar v/\sqrt{d_z},1\}}{30}\sqrt{\Big\lfloor\frac{\big\lfloor\frac{\log((N-d_z)/K)-\frac14\log 3}{H(1/4)}\big\rfloor+1}{4}\Big\rfloor}\,\lceil\Lambda T\rceil.
$$
Combining the two cases, we have
$$
\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)\ \ge\ c_{9,5}\,\sqrt{\log\big((N-d_z)/K\big)}\,\min\{\bar v/\sqrt{d_z},1\}\,\sqrt{\Lambda T},
$$
for some positive constant $c_{9,5}$ that depends only on $K$, $p_l$, $p_h$, $\bar v$, $d_z$. Besides, if $d_z\le\big\lfloor\frac{\log((N-d_z)/K)-\frac14\log 3}{H(1/4)}\big\rfloor$, we have $d=d_z$ and
$$
\liminf_{T\to\infty}\frac{\mathbb E_{(\mathbf v,\mathbf z)\sim P_{\mathbf v,\mathbf z}}\,R_\pi(T;\mathbf v,\mathbf 0,\mathbf z)}{d_z\sqrt{\Lambda T}}\ \ge\ \frac{c_{12}}{120\sqrt{2c_{13}\,K}}\ =:\ c_{l,2}. \tag{EC.94}
$$
Remark EC.7. When $p_l=\Omega(\log K)$ and $p_h=\Omega(\log K)$, we have $c_{l,2}=\Omega(\log K)$. When $p_l=\Omega(1)$ and $p_h=\Omega(1)$, we have $c_{l,2}=\Omega\big(\frac1K\big)$.
EC.13.1.6 Proof of Lemma EC.27
Set $k:=\bar K=\lfloor(d+1)/4\rfloor$ and $\alpha:=k/d$. We use two standard ingredients.
(i) Entropy upper bound for binomials. For any integers $0\le k\le d$,
$$
\binom{d}{k}\le\exp\big(d\,H(k/d)\big), \tag{EC.95}
$$
which follows from $\sum_{j=0}^{d}\binom dj p^j(1-p)^{d-j}=1$ by taking $p=k/d$.
(ii) Supporting-line bound at $p=\frac14$. Since $H$ is concave on $[0,1]$ and $H'(p)=\log\frac{1-p}{p}$,
$$
H(x)\le H\Big(\frac14\Big)+\Big(x-\frac14\Big)\log 3,\qquad x\in[0,1]. \tag{EC.96}
$$
Because $k=\lfloor(d+1)/4\rfloor$, we have $\alpha=\frac kd\le\frac14+\frac1{4d}$; hence by (EC.96),
$$
H(\alpha)\le H\Big(\frac14\Big)+\frac{\log 3}{4d}.
$$
(EC.97) Com bining (EC.95) and (EC.97) gives  d k  ≤ exp  d H ( 1 4 ) + 1 4 log 3  = e dH (1 / 4) 3 1 / 4 . (EC.98) By the explicit choice (EC.91), d ≤ log  ( N − d z ) /K  − 1 4 log 3 H (1 / 4) , i.e., e dH (1 / 4) 3 1 / 4 ≤ N − d z K . Applying this to (EC.98) yields K  d ¯ K  ≤ N − d z , hence K  d ¯ K  + d z − d ≤ N , as claimed. The identit y ¯ K = ⌊ ( d + 1) / 4 ⌋ holds by construction in (EC.91). Finally , note that the condition (EC.92) implies log  ( N − d z ) /K  − 1 4 log 3 ≥ 4 H (1 / 4) , so the right-hand side of (EC.91) is at least 4; thus d ≥ 4 (in particular d > 3), as required for the instance. Q.E.D. EC.13.1.7 Pro of of Lemma EC.28 Because all pro ducts in e S n ha v e the same feature vector z e U n , the optimal price for each product in e S n is identical. Let this price be p ∗ e U n and the corresp onding price vector b e p ∗ e U n . Then r  S ∗ ( v W , z ) , p ∗ ( v W , z )  − sup p r  e S n , p  ≥ r  S ∗ ( v W , z ) , p ∗ e U n  − r  e S n , p ∗ e U n  = K p ∗ e U n exp( v ⊤ W z W − p ∗ e U n ) K exp( v ⊤ W z W − p ∗ e U n ) + 1 − K p ∗ e U n exp( v ⊤ W z e U n − p ∗ e U n ) K exp( v ⊤ W z e U n − p ∗ e U n ) + 1 = K p ∗ e U n exp( − p ∗ e U n )  exp( v ⊤ W z W ) − exp( v ⊤ W z e U n )   K exp( v ⊤ W z W − p ∗ e U n ) + 1  K exp( v ⊤ W z e U n − p ∗ e U n ) + 1  ( a ) ≥ K ¯ p l exp( − ¯ p h )  K exp( ¯ v − ¯ p l ) + 1  2  exp( v ⊤ W z W ) − exp( v ⊤ W z e U n )  ≥ K ¯ p l exp( v ⊤ W z e U n − ¯ p h )  K exp( ¯ v − ¯ p l ) + 1  2  exp  ϵ √ ¯ K  ¯ K − | e U n ∩ W |   − 1  ≥ K ¯ p l exp( − ¯ p h )  K exp( ¯ v − ¯ p l ) + 1  2 · ϵ √ ¯ K  ¯ K − | e U n ∩ W |  , e-companion to Author: Poisson MNL ec37 where ( a ) uses the facts that ¯ p l ≤ p ∗ e U n ≤ ¯ p h and v ⊤ W z · ≤ ¯ v . W e also used that v ⊤ W z W − v ⊤ W z e U n = ϵ √ ¯ K  ¯ K − | e U n ∩ W |  ≥ 0 and exp( x ) − 1 ≥ x for all x ≥ 0. Setting c 12 := K ¯ p l exp( − ¯ p h )  K exp( ¯ v − ¯ p l ) + 1  2 , w e obtain the stated inequality . 
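The chain of inequalities above reduces to a pointwise fact about the single-feature MNL revenue function: for a common price $p \in [\bar p_l, \bar p_h]$ and utilities $v_1 \ge v_2$ bounded by $\bar v$, the revenue gap is at least $c_{12}(e^{v_1} - e^{v_2})$. This can be spot-checked numerically; the function name `r`, the grid, and the specific parameter values below are ours, while `c12` mirrors the definition of $c_{12}$ above.

```python
from math import exp

def r(v, p, K):
    """Expected per-customer revenue when K identical products with
    utility v - p are offered at a common price p (MNL, outside option 0)."""
    w = K * exp(v - p)
    return p * w / (w + 1)

K, p_l, p_h, v_bar = 5, 1.0, 3.0, 1.0   # illustrative values (ours)
c12 = K * p_l * exp(-p_h) / (K * exp(v_bar - p_l) + 1) ** 2

# Step (a): r(v1, p) - r(v2, p) >= c12 * (exp(v1) - exp(v2)) on the whole grid.
for i in range(21):
    v2 = v_bar * i / 20
    for j in range(i, 21):
        v1 = v_bar * j / 20           # v1 >= v2
        for m in range(11):
            p = p_l + (p_h - p_l) * m / 10
            assert r(v1, p, K) - r(v2, p, K) >= c12 * (exp(v1) - exp(v2)) - 1e-12
```

The check passes because the gap equals $K p e^{-p}(e^{v_1}-e^{v_2})$ divided by the two MNL denominators, each of which is at most $K e^{\bar v - \bar p_l} + 1$.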
Q.E.D.

EC.13.1.8 Proof of Lemma EC.29. We have
$$\mathrm{KL}\left( P_W(\cdot \mid S_n, p_n) \,\|\, P_{W\cup\{i\}}(\cdot \mid S_n, p_n) \right) = \sum_{j \in S_n \cup \{0\}} q(j, S_n, p_n, z_n; v_W) \log \frac{q(j, S_n, p_n, z_n; v_W)}{q(j, S_n, p_n, z_n; v_{W\cup\{i\}})}$$
$$\le\ \sum_{j} q(j, S_n, p_n, z_n; v_W)\, \frac{q(j, S_n, p_n, z_n; v_W) - q(j, S_n, p_n, z_n; v_{W\cup\{i\}})}{q(j, S_n, p_n, z_n; v_{W\cup\{i\}})} = \sum_{j} \frac{\left| q(j, S_n, p_n, z_n; v_W) - q(j, S_n, p_n, z_n; v_{W\cup\{i\}}) \right|^2}{q(j, S_n, p_n, z_n; v_{W\cup\{i\}})},$$
where the inequality holds because $\log(1+y) \le y$ for all $y > -1$. Moreover, $q(j, S_n, p_n, z_n; v_{W\cup\{i\}}) \ge \frac{e^{-p_h}}{1 + K e^{\epsilon - p_l}}$ for all $j \in S_n \cup \{0\}$. Thus, the inequality above reduces to
$$\mathrm{KL}\left( P_W(\cdot \mid S_n, p_n) \,\|\, P_{W\cup\{i\}}(\cdot \mid S_n, p_n) \right) \ \le\ e^{p_h}\left(1 + K e^{\epsilon - p_l}\right) \sum_{j \in S_n \cup \{0\}} \left| q(j, S_n, p_n, z_n; v_W) - q(j, S_n, p_n, z_n; v_{W\cup\{i\}}) \right|^2. \quad \text{(EC.99)}$$
We next upper bound $|q(j, S_n, p_n, z_n; v_W) - q(j, S_n, p_n, z_n; v_{W\cup\{i\}})|$ case by case. First consider $j = 0$. We have
$$\left| q(0, S_n, p_n, z_n; v_W) - q(0, S_n, p_n, z_n; v_{W\cup\{i\}}) \right| = \left| \frac{1}{1 + \sum_{k \in S_n} \exp(z_{kn}^\top v_W - p_k)} - \frac{1}{1 + \sum_{k \in S_n} \exp(z_{kn}^\top v_{W\cup\{i\}} - p_k)} \right|$$
$$= \left| \frac{\sum_{k \in S_n}\left[ \exp(z_{kn}^\top v_{W\cup\{i\}} - p_k) - \exp(z_{kn}^\top v_W - p_k) \right]}{\left(1 + \sum_{k \in S_n} \exp(z_{kn}^\top v_W - p_k)\right)\left(1 + \sum_{k \in S_n} \exp(z_{kn}^\top v_{W\cup\{i\}} - p_k)\right)} \right|$$
$$\le\ \frac{1}{\left(1 + K\exp(-p_h)\right)^2} \cdot \exp(\epsilon - p_l) \sum_{k \in S_n} \left| z_{kn}^\top (v_W - v_{W\cup\{i\}}) \right| \ \le\ \frac{\exp(\epsilon - p_l)\, \epsilon}{\sqrt{\bar K} \left(1 + K\exp(-p_h)\right)^2} \sum_{z_U \in S_n} \mathbb{1}\{i \in U\}.$$
Here, we use $\exp(-p_h) \le \exp(z_{kn}^\top v_W - p_k) \le \exp(\epsilon - p_l)$, $\exp(-p_h) \le \exp(z_{kn}^\top v_{W\cup\{i\}} - p_k) \le \exp(\epsilon - p_l)$, and $|\exp(a) - \exp(b)| \le \exp(\max\{a, b\})\,|a - b|$.
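The opening step, $\mathrm{KL}(p \,\|\, q) \le \sum_j (p_j - q_j)^2 / q_j$ via $\log(1+y) \le y$, holds for any pair of distributions on a common finite support; a quick numeric sanity check (function names and the random test distributions are ours):

```python
import random
from math import log

def kl(p, q):
    """KL divergence between two discrete distributions (same support)."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q))

def chi2(p, q):
    """Chi-square divergence sum_j (p_j - q_j)^2 / q_j."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

random.seed(0)
for _ in range(1000):
    w = [random.uniform(0.1, 1.0) for _ in range(6)]
    u = [random.uniform(0.1, 1.0) for _ in range(6)]
    p = [x / sum(w) for x in w]
    q = [x / sum(u) for x in u]
    assert 0.0 <= kl(p, q) + 1e-12          # KL is nonnegative
    assert kl(p, q) <= chi2(p, q) + 1e-12   # the bound used in (EC.99)
```

In fact $\sum_j p_j (p_j - q_j)/q_j = \sum_j (p_j - q_j)^2/q_j$ exactly (both equal $\sum_j p_j^2/q_j - 1$), which is why the proof can pass from the linearized logarithm to the squared differences with equality.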
For $j > 0$ corresponding to $z_{jn} = z_U$ with $i \notin U$, we have
$$\left| q(j, S_n, p_n, z_n; v_W) - q(j, S_n, p_n, z_n; v_{W\cup\{i\}}) \right| = \left| \frac{\exp(z_{jn}^\top v_W - p_j)}{1 + \sum_{k \in S_n} \exp(z_{kn}^\top v_W - p_k)} - \frac{\exp(z_{jn}^\top v_{W\cup\{i\}} - p_j)}{1 + \sum_{k \in S_n} \exp(z_{kn}^\top v_{W\cup\{i\}} - p_k)} \right|$$
$$= \exp\left(z_{jn}^\top v_W - p_j\right) \left| \frac{1}{1 + \sum_{k \in S_n} \exp(z_{kn}^\top v_W - p_k)} - \frac{1}{1 + \sum_{k \in S_n} \exp(z_{kn}^\top v_{W\cup\{i\}} - p_k)} \right|$$
$$\le\ \exp(\epsilon - p_l) \cdot \frac{\exp(\epsilon - p_l)\,\epsilon}{\sqrt{\bar K}\left(1 + K\exp(-p_h)\right)^2} \sum_{z_U \in S_n} \mathbb{1}\{i \in U\} = \frac{\exp(2\epsilon - 2p_l)\,\epsilon}{\sqrt{\bar K}\left(1 + K\exp(-p_h)\right)^2} \sum_{z_U \in S_n} \mathbb{1}\{i \in U\}.$$
Here the inequality holds because $\exp(z_{jn}^\top v_W - p_j) = \exp(z_{jn}^\top v_{W\cup\{i\}} - p_j) \le \exp(\epsilon - p_l)$, since $i \notin U$. Moreover, the number of indices $j$ satisfying this condition is $K - \sum_{z_U \in S_n} \mathbb{1}\{i \in U\}$.

For $j > 0$ corresponding to $z_{jn} = z_U$ with $i \in U$, we have
$$\left| q(j, S_n, p_n, z_n; v_W) - q(j, S_n, p_n, z_n; v_{W\cup\{i\}}) \right| = \left| \frac{\exp(z_{jn}^\top v_W - p_j)}{1 + \sum_{k \in S_n} \exp(z_{kn}^\top v_W - p_k)} - \frac{\exp(z_{jn}^\top v_{W\cup\{i\}} - p_j)}{1 + \sum_{k \in S_n} \exp(z_{kn}^\top v_{W\cup\{i\}} - p_k)} \right|$$
$$\le\ \exp\left(z_{jn}^\top v_W - p_j\right) \left| \frac{1}{1 + \sum_{k \in S_n} \exp(z_{kn}^\top v_W - p_k)} - \frac{1}{1 + \sum_{k \in S_n} \exp(z_{kn}^\top v_{W\cup\{i\}} - p_k)} \right| + \frac{\left| \exp(z_{jn}^\top v_W - p_j) - \exp(z_{jn}^\top v_{W\cup\{i\}} - p_j) \right|}{1 + \sum_{k \in S_n} \exp(z_{kn}^\top v_{W\cup\{i\}} - p_k)}$$
$$\le\ \frac{\exp(2\epsilon - 2p_l)\,\epsilon}{\sqrt{\bar K}\left(1 + K\exp(-p_h)\right)^2} \sum_{z_U \in S_n} \mathbb{1}\{i \in U\} + \frac{\exp(\epsilon - p_l)\,\epsilon}{\sqrt{\bar K}} \cdot \frac{1}{1 + K\exp(-p_h)} \ \le\ \frac{\exp(2\epsilon - 2p_l)\,\epsilon}{\sqrt{\bar K}\left(1 + K\exp(-p_h)\right)^2} \sum_{z_U \in S_n} \mathbb{1}\{i \in U\} + \frac{\exp(\epsilon - p_l)\,\epsilon}{\sqrt{\bar K}\left(1 + K\exp(-p_h)\right)}.$$
Moreover, the number of indices $j$ satisfying this condition is $\sum_{z_U \in S_n} \mathbb{1}\{i \in U\}$.
Combining all the upper bounds on $|q(j, S_n, p_n, z_n; v_W) - q(j, S_n, p_n, z_n; v_{W\cup\{i\}})|$ with Equation (EC.99), we have
$$\mathrm{KL}\left(P_W(\cdot \mid S_n, p_n) \,\|\, P_{W\cup\{i\}}(\cdot \mid S_n, p_n)\right) \ \le\ e^{p_h}\left(1 + K e^{\epsilon - p_l}\right)\frac{\epsilon^2}{\bar K}\Bigg[ \frac{e^{2\epsilon - 2p_l}}{(1 + K e^{-p_h})^4}\Big(\sum_{z_U \in S_n}\mathbb{1}\{i \in U\}\Big)^2 + \frac{e^{4\epsilon - 4p_l}}{(1 + K e^{-p_h})^4}\Big(\sum_{z_U \in S_n}\mathbb{1}\{i \in U\}\Big)^2\Big(\bar K - \sum_{z_U \in S_n}\mathbb{1}\{i \in U\}\Big)$$
$$+ \frac{8\, e^{4\epsilon - 4p_l}}{(1 + K e^{-p_h})^4}\Big(\sum_{z_U \in S_n}\mathbb{1}\{i \in U\}\Big)^2\sum_{z_U \in S_n}\mathbb{1}\{i \in U\} + \frac{2\, e^{2\epsilon - 2p_l}}{(1 + K e^{-p_h})^2}\sum_{z_U \in S_n}\mathbb{1}\{i \in U\}\Bigg] \ \le\ \frac{c_{13}\,\epsilon^2}{\bar K}\sum_{z_U \in S_n}\mathbb{1}\{i \in U\},$$
where we use $\sum_{z_U \in S_n}\mathbb{1}\{i \in U\} \le \bar K$, $\epsilon \le 1$, and
$$c_{13} = e^{p_h}\left(1 + K e^{1 - p_l}\right)\left( \frac{2 e^{4 - 4p_l}\bar K}{(1 + K e^{-p_h})^4} + \frac{e^{2 - 2p_l}\bar K}{(1 + K e^{-p_h})^4} + \frac{2 e^{2 - 2p_l}}{(1 + K e^{-p_h})^2} \right). \quad \text{(EC.100)}$$
Finally, summing over $n = 1, \ldots, \lceil \Lambda T \rceil$ and applying the chain rule for the KL divergence (together with the tower property) yields
$$\mathrm{KL}\left(P_W \,\|\, P_{W\cup\{i\}}\right) = \mathbb{E}_W\left[\log \frac{\prod_{n=1}^{\lceil\Lambda T\rceil} P_W(j_n \mid \mathcal{H}'_{n-1})}{\prod_{n=1}^{\lceil\Lambda T\rceil} P_{W\cup\{i\}}(j_n \mid \mathcal{H}'_{n-1})}\right] = \sum_{n=1}^{\lceil\Lambda T\rceil} \mathbb{E}_W\left[\log\frac{P_W(j_n \mid \mathcal{H}'_{n-1})}{P_{W\cup\{i\}}(j_n \mid \mathcal{H}'_{n-1})}\right] = \sum_{n=1}^{\lceil\Lambda T\rceil}\mathbb{E}_W\left[\mathrm{KL}\left(P_W(j_n \mid \mathcal{H}'_{n-1}) \,\|\, P_{W\cup\{i\}}(j_n \mid \mathcal{H}'_{n-1})\right)\right]$$
$$= \mathbb{E}_W\left[\sum_{n=1}^{\lceil\Lambda T\rceil} \mathrm{KL}\left(P_W(j_n \mid S_n, p_n) \,\|\, P_{W\cup\{i\}}(j_n \mid S_n, p_n)\right)\right] \ \le\ \mathbb{E}_W\left[\sum_{n=1}^{\lceil\Lambda T\rceil} \frac{c_{13}\,\epsilon^2}{\bar K}\sum_{z_U \in S_n}\mathbb{1}\{i \in U\}\right] = \frac{c_{13}\,\epsilon^2}{\bar K}\,\mathbb{E}_W[N_i].$$
Q.E.D.

EC.13.2 Case II

EC.13.2.1 Intermediate Problem. Consider the case $v = 0$; the problem of interest reduces to the following:
$$\inf_\pi \sup_{\theta, v} R^\pi(T; v, \theta, z) \ \ge\ \inf_\pi \sup_\theta R^\pi(T; 0, \theta, z).$$
In this setting, the problem can be formulated as follows:

Instance EC.13.5.
(Assortment-dependent arrivals with MNL demand) The seller has $N$ distinct products, and each product $i \in [N]$ is associated with a fixed feature vector $z_i \in \mathbb{R}^{d_x}$. For each period $t \in [T]$, the seller chooses an assortment $S_t \subseteq [N]$ with $|S_t| = K$ and sets prices for the products in the assortment. For simplicity, we use a price vector $p_t = (p_{1t}, p_{2t}, \ldots, p_{Nt}) \in [p_l, p_h]^N$ to represent the pricing decision (and set the prices of out-of-assortment products to $p_h$). The number of customer arrivals in period $t$ follows a Poisson distribution with mean $\Lambda_t = \Lambda \exp\left(\theta^\top x(S_t)\right)$, where $\Lambda > 0$ is known, $\theta \in \mathbb{R}^{d_x}$ is an unknown parameter, and $x(S_t) \in \mathbb{R}^{d_x}$ is a known function of the assortment. Under the MNL model (price-only utilities), a customer arriving in period $t$ chooses $i \in S_t$ with probability $\frac{\exp(-p_{it})}{1 + \sum_{j \in S_t}\exp(-p_{jt})}$, and makes no purchase with the remaining probability.

Let $S^*(\theta) \in \arg\max_{|S| = K} \exp\left(\theta^{*\top} x(S)\right)$ and define $\Lambda^* := \Lambda\exp\left(\theta^{*\top} x(S^*)\right)$. Note that the per-customer revenue depends only on the prices. Therefore, $r(S_t, p_t) \le \sup_{S, p} r(S, p) = \sup_p r(S_t, p) =: r^*$, which further gives
$$R^\pi(T; 0, \theta, z) = \sum_{t=1}^T \Lambda^* r^* - \sum_{t=1}^T \mathbb{E}\left(\Lambda_t\, r(S_t, p_t)\right) \ \ge\ \sum_{t=1}^T (\Lambda^* - \mathbb{E}\Lambda_t)\, r^*. \quad \text{(EC.101)}$$
By the definition of the per-customer reward, we have
$$r^* = \sup_{p \in \mathcal{P}} \sum_{i \in S_t} \frac{p_{it}\exp(-p_{it})}{1 + \sum_{j \in S_t}\exp(-p_{jt})} \ \ge\ \frac{K p_l \exp(-p_h)}{1 + K\exp(-p_l)}. \quad \text{(EC.102)}$$
Plugging this lower bound on $r^*$ back into Inequality (EC.101) gives
$$R^\pi(T; 0, \theta, z) \ \ge\ \sum_{t=1}^T (\Lambda^* - \mathbb{E}\Lambda_t)\, \frac{K p_l \exp(-p_h)}{1 + K\exp(-p_l)}. \quad \text{(EC.103)}$$
The problem then reduces to one with constant per-customer revenue but assortment-dependent arrivals.

Instance EC.13.6. (Assortment-dependent arrivals with constant per-customer revenue) We consider $N$ products over $T$ periods.
In each period $t$, the seller selects an assortment $S_t \subseteq [N]$ with $|S_t| = K$. The number of customer arrivals in period $t$ follows a Poisson distribution with mean $\Lambda_t = \Lambda\exp\left(\theta^{*\top} x(S_t)\right)$, where $\Lambda > 0$ is known, $\theta^* \in \mathbb{R}^{d_x}$ is unknown, and $x(S_t) \in \mathbb{R}^{d_x}$ is a known function of the assortment. The expected revenue from each customer is $r := \frac{K p_l \exp(-p_h)}{1 + K\exp(-p_l)}$.

Adversarial construction. Suppose $\min\{d_x - 2, N\} \ge K$ and $\Lambda \ge 1$. Let $\bar K, d \in \mathbb{Z}_{++}$ satisfy $\bar K = \min\left\{\left\lfloor \frac{d_x - K + 1}{3}\right\rfloor, K\right\}$ and $d = d_x - K + \bar K$, and fix a small parameter $\epsilon \in \left(0, 1/\sqrt{K}\right)$. For each $W \subseteq [d]$ with $|W| = \bar K$, define $\theta_W \in \mathbb{R}^{d_x}$ coordinatewise by
$$[\theta_W]_i = \begin{cases} \epsilon, & i \in W, \\ 0, & i \in [d]\setminus W, \\ \epsilon, & i \in \{d+1, \ldots, d_x\}. \end{cases}$$
Clearly, $\|\theta_W\|_2 \le 1$, satisfying the boundedness required in Assumption 4. Collect these in $\Theta = \{\theta_W : W \in \mathcal{W}_{\bar K}\} = \{\theta_W : W \subseteq [d], |W| = \bar K\}$, where $\mathcal{W}_{\bar K}$ is the family of all $\bar K$-subsets of $[d]$. Finally, define the assortment-dependent vector $x(S_t) \in \mathbb{R}^{d_x}$ by
$$[x(S_t)]_i = \begin{cases} \frac{\bar x}{\sqrt{K}}, & i \in S_t \cap [d_x], \\ 0, & i \notin S_t \cap [d_x]. \end{cases}$$
Since $\|x\|_2 \le \bar x$, the constructed assortment-dependent vector $x$ satisfies the boundedness condition in Assumption 5. We use $\mathbb{E}_W$ and $P_W$ to denote the expectation and the law under the parameter $\theta_W$ and the policy $\pi$, respectively. The following lemma establishes a lower bound on $\Lambda^* - \Lambda_t$ by comparing $S_t$ with $W$.

Lemma EC.30. We have
$$\Lambda^* - \Lambda_t \ \ge\ \frac{\Lambda \bar x \epsilon}{\sqrt{K}}\left(\bar K - |S_t \cap W|\right).$$

Next, we establish a lower bound on the cumulative regret. Define $\tilde N_i := \sum_{t=1}^T \mathbb{1}\{i \in S_t\}$. By Lemma EC.30, it follows that, for all $W \in \mathcal{W}_{\bar K}$,
$$R^\pi(T; 0, \theta, z) \ \ge\ \mathbb{E}_W\sum_{t=1}^T \left(\Lambda^* - \Lambda_t\right) r \ \ge\ \mathbb{E}_W\sum_{t=1}^T \frac{\Lambda \bar x \epsilon}{\sqrt{K}}\left(\bar K - |S_t \cap W|\right) r = \frac{\Lambda \bar x \epsilon}{\sqrt{K}}\left(\bar K T - \sum_{i \in W}\mathbb{E}_W[\tilde N_i]\right) r.$$
Denote $\mathcal{W}^{(i)}_{\bar K} := \{W \in \mathcal{W}_{\bar K} : i \in W\}$ and $\mathcal{W}_{\bar K - 1} := \{W \subseteq [d_x] : |W| = \bar K - 1\}$.
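Lemma EC.30 can be verified exhaustively on a small instance of the adversarial construction; the scale ($d_x = 8$, $K = 3$) and the helper names below are ours, while `Kbar` and `d` follow the construction $\bar K = \min\{\lfloor (d_x - K + 1)/3\rfloor, K\}$ and $d = d_x - K + \bar K$.

```python
from math import exp, sqrt
from itertools import combinations

# Small instance of the construction (sizes ours): Kbar = min{floor(6/3), 3} = 2.
d_x, K, Kbar, eps, xbar, Lam = 8, 3, 2, 0.1, 1.0, 2.0
d = d_x - K + Kbar                      # = 7

W = {0, 1}                              # a Kbar-subset of [d] (0-indexed)
theta = [eps if (i in W or i >= d) else 0.0 for i in range(d_x)]

def x(S):
    """Assortment-dependent vector: xbar/sqrt(K) on coordinates in S."""
    return [xbar / sqrt(K) if i in S else 0.0 for i in range(d_x)]

def rate(S):
    """Poisson arrival rate Lambda * exp(theta^T x(S))."""
    return Lam * exp(sum(t * xi for t, xi in zip(theta, x(S))))

Lam_star = max(rate(set(S)) for S in combinations(range(d_x), K))

# Lemma EC.30: Lambda* - Lambda_t >= Lam * xbar * eps / sqrt(K) * (Kbar - |S ∩ W|).
for S in combinations(range(d_x), K):
    S = set(S)
    gap = Lam_star - rate(S)
    assert gap >= Lam * xbar * eps / sqrt(K) * (Kbar - len(S & W)) - 1e-12
```

The inequality holds because $e^a - e^b \ge a - b$ whenever $a \ge b \ge 0$, which is exactly the linearization step in the proof of Lemma EC.30 below.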
Let $P_\theta$ be the uniform prior on $\{\theta_W : W \in \mathcal{W}_{\bar K}\}$; that is, sample $W \sim \mathrm{Unif}(\mathcal{W}_{\bar K})$ and set $\theta = \theta_W$. Therefore,
$$\mathbb{E}_{\theta \sim P_\theta}\left[R^\pi(T; 0, \theta, z)\right] \ \ge\ \frac{\Lambda \bar x \epsilon}{\sqrt{K}}\left(\bar K T - \frac{1}{|\mathcal{W}_{\bar K}|}\sum_{W \in \mathcal{W}_{\bar K}}\sum_{i \in W}\mathbb{E}_W[\tilde N_i]\right) r = \frac{\Lambda \bar x \epsilon}{\sqrt{K}}\left(\bar K T - \frac{1}{|\mathcal{W}_{\bar K}|}\sum_{i=1}^d\sum_{W \in \mathcal{W}^{(i)}_{\bar K}}\mathbb{E}_W[\tilde N_i]\right) r = \frac{\Lambda \bar x \epsilon}{\sqrt{K}}\left(\bar K T - \frac{1}{|\mathcal{W}_{\bar K}|}\sum_{W \in \mathcal{W}_{\bar K - 1}}\sum_{i \notin W}\mathbb{E}_{W\cup\{i\}}[\tilde N_i]\right) r$$
$$\ge\ \frac{\Lambda \bar x \epsilon}{\sqrt{K}}\left(\bar K T - \frac{|\mathcal{W}_{\bar K - 1}|}{|\mathcal{W}_{\bar K}|}\max_{W \in \mathcal{W}_{\bar K - 1}}\sum_{i \notin W}\mathbb{E}_{W\cup\{i\}}[\tilde N_i]\right) r = \frac{\Lambda \bar x \epsilon}{\sqrt{K}}\left(\bar K T - \frac{|\mathcal{W}_{\bar K - 1}|}{|\mathcal{W}_{\bar K}|}\max_{W \in \mathcal{W}_{\bar K - 1}}\sum_{i \notin W}\left[\mathbb{E}_{W\cup\{i\}}[\tilde N_i] - \mathbb{E}_W[\tilde N_i] + \mathbb{E}_W[\tilde N_i]\right]\right) r.$$
Since $\sum_{i \notin W}\mathbb{E}_W[\tilde N_i] \le \sum_{i=1}^{d_x}\mathbb{E}_W[\tilde N_i] \le \bar K T$ and $\frac{|\mathcal{W}_{\bar K - 1}|}{|\mathcal{W}_{\bar K}|} = \frac{\binom{d_x}{\bar K - 1}}{\binom{d_x}{\bar K}} = \frac{\bar K}{d_x - \bar K + 1} \le \frac13$, we finally get
$$\mathbb{E}_{\theta \sim P_\theta}\left[R^\pi(T; 0, \theta, z)\right] \ \ge\ \frac{\Lambda \bar x \epsilon}{\sqrt{K}}\left(\frac{2}{3}\bar K T - \frac{\bar K}{d - \bar K + 1}\max_{W \in \mathcal{W}_{\bar K - 1}}\sum_{i \notin W}\left|\mathbb{E}_{W\cup\{i\}}[\tilde N_i] - \mathbb{E}_W[\tilde N_i]\right|\right) r.$$

Pinsker's inequality. Finally, we focus on upper bounding $\left|\mathbb{E}_{W\cup\{i\}}[\tilde N_i] - \mathbb{E}_W[\tilde N_i]\right|$ for any $W \in \mathcal{W}_{\bar K - 1}$:
$$\left|\mathbb{E}_W[\tilde N_i] - \mathbb{E}_{W\cup\{i\}}[\tilde N_i]\right| \ \le\ \sum_{j=0}^{T} j\left|P_W(\tilde N_i = j) - P_{W\cup\{i\}}(\tilde N_i = j)\right| \ \le\ T \sum_{j=0}^{T}\left|P_W(\tilde N_i = j) - P_{W\cup\{i\}}(\tilde N_i = j)\right| \ \le\ 2T\,\|P_W - P_{W\cup\{i\}}\|_{\mathrm{TV}} \ \le\ T\sqrt{2\,\mathrm{KL}(P_W \,\|\, P_{W\cup\{i\}})},$$
where $\|P_W - P_{W\cup\{i\}}\|_{\mathrm{TV}}$ is the total variation distance and $\mathrm{KL}(P_W \,\|\, P_{W\cup\{i\}})$ is the Kullback–Leibler divergence; the last step uses Pinsker's inequality. The following lemma bounds the KL divergence.

Lemma EC.31. For any $W \in \mathcal{W}_{\bar K - 1}$ and $i \in [d_x]$,
$$\mathrm{KL}(P_W \,\|\, P_{W\cup\{i\}}) \ \le\ \frac{\exp(2\bar x)\,\Lambda\,\bar x^2}{K}\,\mathbb{E}_W[\tilde N_i]\,\epsilon^2.$$

Using Lemma EC.31 and the bound above, we obtain
$$\mathbb{E}_{\theta \sim P_\theta}\left[R^\pi(T; 0, \theta, z)\right] \ \ge\ \frac{\Lambda \bar x \epsilon}{\sqrt{K}}\left(\frac{2\bar K T}{3} - \frac{\bar K T}{d_x - \bar K + 1}\sum_{i=1}^{d_x}\sqrt{\frac{2\exp(2\bar x)\Lambda \bar x^2}{K}\,\mathbb{E}_W[\tilde N_i]\,\epsilon^2}\right) r.$$
Applying the Cauchy–Schwarz inequality and using $\sum_{i=1}^{d_x}\mathbb{E}_W[\tilde N_i] \le K T$,
$$\sum_{i=1}^{d_x}\sqrt{\frac{2\exp(2\bar x)\Lambda \bar x^2}{K}\,\mathbb{E}_W[\tilde N_i]\,\epsilon^2} \ \le\ \sqrt{\frac{2\exp(2\bar x)\Lambda}{K}}\,\bar x\,\epsilon\,\sqrt{d_x\sum_{i=1}^{d_x}\mathbb{E}_W[\tilde N_i]} \ \le\ \sqrt{2\exp(2\bar x)\,d_x\,\Lambda T}\;\bar x\,\epsilon.$$
Thus, we obtain
$$\mathbb{E}_{\theta \sim P_\theta}\, R^\pi(T; 0, \theta, z) \ \ge\ \frac{\Lambda \bar x \epsilon}{\sqrt{K}}\left(\frac{2\bar K T}{3} - \frac{\bar K T}{d_x - \bar K + 1}\sqrt{2\exp(2\bar x)\,\bar x^2\, d_x\,\Lambda T\,\epsilon^2}\right) r.$$

• If $\Lambda T \ge \frac{2(d_x - K + 1)K}{9\exp(2\bar x)\bar x^2}$, setting $\epsilon = \frac{d_x - \bar K + 1}{3\exp(\bar x)\bar x\sqrt{2 d_x \Lambda T}}$, we have $\epsilon \in \left(0, 1/\sqrt{K}\right)$. Thus, we obtain
$$\mathbb{E}_{\theta \sim P_\theta}\, R^\pi(T; 0, \theta, z) \ \ge\ \frac{(d_x - \bar K + 1)\bar K}{9\sqrt{2}\exp(\bar x)\bar x\sqrt{d_x K}}\sqrt{\Lambda T}\, r \ \ge\ \frac{\sqrt{d_x}\,\bar K}{12\sqrt{2}\exp(\bar x)\bar x\sqrt{K}}\sqrt{\Lambda T}\, r \ \ge\ \frac{\sqrt{\Lambda T}\, r}{12\sqrt{2}\exp(\bar x)\bar x}\min\left\{\sqrt{d_x K},\ \sqrt{\frac{4(d_x - K - 1)}{3K}\cdot\frac{d_x - K - 1}{3}}\right\}.$$

• If $\Lambda T < \frac{d_x - K + 1}{12\exp(2\bar x)\bar x^2 K}$, setting $\epsilon = 1/\sqrt{K}$, we have
$$\mathbb{E}_{\theta \sim P_\theta}\, R^\pi(T; 0, \theta, z) \ \ge\ \frac{\Lambda \bar x \epsilon}{\sqrt{K}}\cdot\frac{\bar K T}{3} \ \ge\ \frac{\bar x}{K}\min\left\{\left\lfloor\frac{d_x - K + 1}{3}\right\rfloor, K\right\}\frac{\Lambda T}{3}.$$

Combining the two cases, we have
$$\mathbb{E}_{\theta \sim P_\theta}\, R^\pi(T; 0, \theta, z) \ \ge\ c_{9,6}\,\sqrt{\Lambda T}\,\min\left\{\left\lfloor\frac{d_x - K + 1}{3}\right\rfloor, K\right\},$$
for some positive constant $c_{9,6}$ that depends only on $K$, $p_l$, $p_h$, $\bar x$, $d_x$. Besides, if $K \le \frac{d_x + 1}{4}$, we have $d = d_x$, $\bar K = K$, and
$$\liminf_{T\to\infty}\frac{\mathbb{E}_{\theta \sim P_\theta}\, R^\pi(T; 0, \theta, z)}{\sqrt{d_x \Lambda T}} \ \ge\ \frac{K\sqrt{K}\, p_l\exp(-p_h)}{12\sqrt{2}\exp(\bar x)\bar x\,(1 + K\exp(-p_l))} =: c_{l,3}. \quad \text{(EC.104)}$$

Remark EC.8. When $p_l = \Omega(\log K)$ and $p_h = \Omega(\log K)$, we have $c_{l,3} = \Omega\left(\frac{\sqrt{K}\log K}{\exp(\bar x)\bar x}\right)$. When $p_l = \Omega(1)$ and $p_h = \Omega(1)$, we have $c_{l,3} = \Omega\left(\frac{\sqrt{K}}{\exp(\bar x)\bar x}\right)$.

EC.13.2.2 Proof of Lemma EC.30. We have
$$\Lambda^* - \Lambda_t = \Lambda\left(\exp(\theta_W^\top x(W)) - \exp(\theta_W^\top x(S_t))\right) \ \ge\ \Lambda\left(\theta_W^\top x(W) - \theta_W^\top x(S_t)\right) = \frac{\Lambda \bar x \epsilon}{\sqrt{K}}\left(\bar K - |S_t \cap W|\right).$$

EC.13.2.3 Proof of Lemma EC.31. Note that for two Poisson distributions $\mathcal{P}(\lambda_1)$ and $\mathcal{P}(\lambda_2)$, we have
$$D_{\mathrm{KL}}\left(\mathcal{P}(\lambda_1)\,\|\,\mathcal{P}(\lambda_2)\right) = \lambda_1\log\left(\frac{\lambda_1}{\lambda_2}\right) - \lambda_1 + \lambda_2 \ \le\ \lambda_1\cdot\frac{\lambda_1 - \lambda_2}{\lambda_2} - \lambda_1 + \lambda_2 = \frac{(\lambda_1 - \lambda_2)^2}{\lambda_2}.$$
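The closed-form Poisson KL divergence and the quadratic bound just derived (which uses $\log x \le x - 1$) are easy to check numerically; the function name is ours.

```python
from math import log

def poisson_kl(l1, l2):
    """Closed-form KL divergence between Poisson(l1) and Poisson(l2)."""
    return l1 * log(l1 / l2) - l1 + l2

# Check KL(P(l1) || P(l2)) <= (l1 - l2)^2 / l2 and nonnegativity on a grid.
for a in range(1, 30):
    for b in range(1, 30):
        l1, l2 = a / 3, b / 3
        assert poisson_kl(l1, l2) <= (l1 - l2) ** 2 / l2 + 1e-12
        assert poisson_kl(l1, l2) >= -1e-12   # KL is always nonnegative
```

Note the bound degrades as $\lambda_2 \to 0$, which is why the proof of Lemma EC.31 below takes $\lambda_2 = \Lambda\exp(\theta_{W\cup\{i\}}^\top x(S_t)) \ge \Lambda$ in the denominator.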
Note that
$$\Lambda\exp\left(\theta_{W\cup\{i\}}^\top x(S_t)\right) - \Lambda\exp\left(\theta_W^\top x(S_t)\right) \ \le\ \exp(\bar x)\,\Lambda\,\frac{\epsilon\bar x}{\sqrt{K}}\,\mathbb{1}\{i \in S_t\}, \qquad \Lambda\exp\left(\theta_{W\cup\{i\}}^\top x(S_t)\right) \ \ge\ \Lambda.$$
Substituting $\lambda_1$ and $\lambda_2$, we have
$$\mathrm{KL}\left(P_W(\cdot \mid S_t, p_t)\,\|\,P_{W\cup\{i\}}(\cdot \mid S_t, p_t)\right) \ \le\ \frac{\left(\exp(\bar x)\Lambda\frac{\bar x}{\sqrt{K}}\epsilon\,\mathbb{1}\{i \in S_t\}\right)^2}{\Lambda} = \frac{\exp(2\bar x)\Lambda\bar x^2}{K}\,\epsilon^2\,\mathbb{1}\{i \in S_t\}.$$
Finally, summing over $t = 1, \ldots, T$ and applying the chain rule for the KL divergence (together with the tower property) yields
$$\mathrm{KL}\left(P_W\,\|\,P_{W\cup\{i\}}\right) = \mathbb{E}_W\left[\log\frac{\prod_{t=1}^T P_W(\cdot \mid \mathcal{H}_{t-1})}{\prod_{t=1}^T P_{W\cup\{i\}}(\cdot \mid \mathcal{H}_{t-1})}\right] = \sum_{t=1}^T\mathbb{E}_W\left[\log\frac{P_W(\cdot \mid \mathcal{H}_{t-1})}{P_{W\cup\{i\}}(\cdot \mid \mathcal{H}_{t-1})}\right] = \sum_{t=1}^T\mathbb{E}_W\left[\mathrm{KL}\left(P_W(\cdot \mid \mathcal{H}_{t-1})\,\|\,P_{W\cup\{i\}}(\cdot \mid \mathcal{H}_{t-1})\right)\right]$$
$$= \mathbb{E}_W\left[\sum_{t=1}^T\mathrm{KL}\left(P_W(\cdot \mid S_t, p_t)\,\|\,P_{W\cup\{i\}}(\cdot \mid S_t, p_t)\right)\right] \ \le\ \mathbb{E}_W\left[\sum_{t=1}^T\frac{\exp(2\bar x)\Lambda\bar x^2}{K}\,\epsilon^2\,\mathbb{1}\{i \in S_t\}\right] = \frac{\exp(2\bar x)\Lambda\bar x^2}{K}\,\epsilon^2\,\mathbb{E}_W[\tilde N_i].$$

EC.13.3 Conclusion (aggregation over the three instances). From the three instances established above, we obtain the following three claims.

1. If $\min\{d_z - 2, N\} \ge K$ and $\Lambda \ge 1$, then there exists a problem instance such that
$$\inf_\pi\sup_{v,\theta,z} R^\pi(T; v, \theta, z) \ \ge\ c_{9,4}\min\left\{1, \sqrt{\frac{\bar v^2}{d_z}}\right\}\sqrt{\Lambda T}.$$
2. If $\Lambda \ge 1$ and $\log\frac{N - d_z}{K} \ge 8\log 2 - \frac{11}{4}\log 3$, then there exists a problem instance such that
$$\inf_\pi\sup_{v,\theta,z} R^\pi(T; v, \theta, z) \ \ge\ c_{9,5}\sqrt{\log\left(\frac{N - d_z}{K}\right)}\min\left\{\frac{\bar v}{\sqrt{d_z}}, 1\right\}\sqrt{\Lambda T}.$$
3. If $\min\{d_x - 2, N\} \ge K$ and $\Lambda \ge 1$, then there exists a problem instance such that
$$\inf_\pi\sup_{v,\theta,z} R^\pi(T; v, \theta, z) \ \ge\ c_{9,6}\min\left\{\left\lfloor\frac{d_x - K + 1}{3}\right\rfloor, K\right\}\sqrt{\Lambda T}.$$
Besides, define the following events (conditions):
$$\mathcal{E}_4 := \{\min\{d_z - 2, N\} \ge K,\ \Lambda \ge 1\}, \quad \mathcal{E}_5 := \left\{\Lambda \ge 1,\ \log\frac{N - d_z}{K} \ge 8\log 2 - \frac{11}{4}\log 3\right\}, \quad \mathcal{E}_6 := \{\min\{d_x - 2, N\} \ge K,\ \Lambda \ge 1\}.$$
Consequently, if at least one of $\mathcal{E}_4$, $\mathcal{E}_5$, $\mathcal{E}_6$ holds, then there exists a problem instance such that
$$\inf_\pi\sup_{v,\theta,z} R^\pi(T; v, \theta, z) \ \ge\ c_9\sqrt{\Lambda T},$$
where we set
$$c_9 := \max\left\{\mathbb{1}\{\mathcal{E}_4\}\,c_{9,4}\min\left\{1, \sqrt{\bar v^2/d_z}\right\},\ \mathbb{1}\{\mathcal{E}_5\}\,c_{9,5}\sqrt{\log\left(\frac{N - d_z}{K}\right)}\min\left\{\frac{\bar v}{\sqrt{d_z}}, 1\right\},\ \mathbb{1}\{\mathcal{E}_6\}\,c_{9,6}\min\left\{\left\lfloor\frac{d_x - K + 1}{3}\right\rfloor, K\right\}\right\}.$$
By construction, $c_9 > 0$ depends only on $d_z$, $d_x$, $K$, $p_l$, $p_h$, and $\bar x$. Besides, if we denote
$$\mathcal{E}_1 = \left\{K \le \frac{d_z + 1}{4}\right\}, \quad \mathcal{E}_2 = \left\{d_z \le \left\lfloor\frac{\log\left((N - d_z)/K\right) - \frac14\log 3}{H(1/4)}\right\rfloor\right\}, \quad \mathcal{E}_3 = \left\{K \le \frac{d_x + 1}{4}\right\},$$
we have
$$\liminf_{T\to\infty}\frac{R^\pi(T; v, \theta, z)}{\sqrt{\Lambda T}} \ \ge\ \max\left\{\mathbb{1}\{\mathcal{E}_1\}\mathbb{1}\{\mathcal{E}_4\}\frac{c_{10}\sqrt{d_z K}}{120\sqrt{2 c_{11}}},\ \mathbb{1}\{\mathcal{E}_2\}\mathbb{1}\{\mathcal{E}_5\}\frac{c_{12}\, d_z}{120\sqrt{2 c_{13}}},\ \mathbb{1}\{\mathcal{E}_3\}\mathbb{1}\{\mathcal{E}_6\}\frac{K p_l\exp(-p_h)}{12\sqrt{2}\exp(\bar x)\bar x\,(1 + K\exp(-p_l))}\right\} = \max\left\{\mathbb{1}\{\mathcal{E}_1\}\mathbb{1}\{\mathcal{E}_4\}\,c_{l,1},\ \mathbb{1}\{\mathcal{E}_2\}\mathbb{1}\{\mathcal{E}_5\}\,c_{l,2},\ \mathbb{1}\{\mathcal{E}_3\}\mathbb{1}\{\mathcal{E}_6\}\,c_{l,3}\right\}.$$