A Revealed Preference Framework for AI Alignment
Elchin Suleymanov*

March 31, 2026

Abstract

Human decision makers increasingly delegate choices to AI agents, raising a natural question: does the AI implement the human principal's preferences or pursue its own? To study this question using revealed preference techniques, I introduce the Luce Alignment Model, where the AI's choices are a mixture of two Luce rules, one reflecting the human's preferences and the other the AI's. I show that the AI's alignment (similarity of human and AI preferences) can be generically identified in two settings: the laboratory setting, where both human and AI choices are observed, and the field setting, where only AI choices are observed.

JEL Classification: D01, D11, D83
Keywords: AI alignment, stochastic choice, revealed preference, Luce model

* Department of Economics, Mitch Daniels School of Business, Purdue University, West Lafayette, Indiana, USA. Email: esuleyma@purdue.edu.

1 Introduction

Artificial intelligence (AI) agents are likely to play an increasing role in making choices on behalf of human users and in reshaping everyday decision-making (Allouah et al., 2025; Immorlica et al., 2024). Historically, many AI systems played the role of a technology that assisted rather than replaced human choice. For example, recommender systems filter, rank, and personalize the set of alternatives presented to a user rather than making a selection on the user's behalf (Adomavicius and Tuzhilin, 2005). Recent advances in agentic AI, together with improved memory and personalization abilities, have made fuller delegation of choice increasingly feasible, allowing human decision makers to rely on AI agents not only to screen among available options but also to make the final choice. This makes it natural to model AI agents not just as a technology available to human decision makers, but as economic agents in their own right (Immorlica et al., 2024; Chen et al., 2024).

This shift from assisted choice to delegated choice raises a natural economic question: when an AI agent chooses on behalf of a human principal, whose preferences does it implement? Does it act fully in accordance with the principal's preferences, or does it instead pursue distinct objectives that diverge from them? This question lies at the core of the AI alignment literature, which broadly seeks to ensure that AI systems behave in line with human users' intentions (Leike et al., 2018; Ji et al., 2024). Much of the existing alignment literature is motivated by catastrophic risks, harmful behaviors, and the loss of control over increasingly capable AI systems (Amodei et al., 2016; Hendrycks et al., 2023; Ji et al., 2024). The main focus of this paper is what can be considered a narrower but economically central question: whether a personalized AI agent making choices on behalf of a human principal is in fact implementing the principal's preferences. AI agents may perform well in safety evaluations designed to detect catastrophic risks or harmful behaviors but still make misaligned choices in delegated choice environments.

The AI alignment literature has largely approached the problem through three main channels. First, there are methods that attempt to align an AI system during its training phase. These include cooperative inverse reinforcement learning (Hadfield-Menell et al., 2016),
reinforcement learning from human feedback (Christiano et al., 2017), scalable reward-modeling approaches more generally (Leike et al., 2018), and constitutional AI (Bai et al., 2022). Second, there are ex-post evaluation methods that seek evidence of misalignment through benchmarks and behavioral tests (Perez et al., 2023; Ji et al., 2024). Third, there are interpretability-based approaches that attempt to understand the model's internal objectives and reasoning processes by analyzing the inner structures of neural networks (Räuker et al., 2023).

This paper adopts a complementary approach grounded in revealed preference analysis. Applied to human agents, the revealed preference approach attempts to infer preferences from observed choices rather than from the processes inside the human brain. Treating AI agents as economic agents, we can extend the same approach to choices made by an AI. In particular, the goal is to infer the extent of AI misalignment by analyzing the stochastic choice data it generates while acting on behalf of its principal.

The setup in this paper is as follows. An analyst observes the stochastic choice data ρ_AI generated by an AI agent. That is, the AI faces varying menus S repeatedly and makes choices from them on behalf of some human user. I consider two natural settings. In the laboratory setting, the analyst also observes the human principal's stochastic choices ρ_H. For example, ρ_H may be elicited directly from the human principal or generated synthetically for the purpose of guiding the AI. In the field setting, only ρ_AI is observed. In both settings, the aim of the analyst is to recover two key objects of interest: the degree to which the AI's intrinsic preferences match the human principal's preferences (alignment) and the extent to which the AI defers to the human principal (compliance). These two are distinct concepts. For example, a misaligned AI that is highly compliant may still generate choice data that closely matches the human principal's. Alternatively, a perfectly aligned AI will replicate the human principal's choices regardless of its compliance level.

To formalize this distinction, I introduce the Luce Alignment Model (LAM), where the AI's choices from a menu S are modeled as

    ρ_AI(x, S) = α · u(x)/Σ_{y∈S} u(y) + (1 − α) · v(x)/Σ_{y∈S} v(y).    (1)

Here, u represents the human principal's utility, v represents the AI's intrinsic utility, and α ∈ [0, 1] is the compliance parameter that captures the extent to which the AI defers to the human principal. The comparison of u and v in turn captures the AI's alignment with the human principal. The first term in equation (1) can be interpreted as the human principal's stochastic choice rule ρ_H, and the second term, denoted ρ_A, as the AI agent's autonomous stochastic choice rule. The model can then equivalently be written as

    ρ_AI(x, S) = α · ρ_H(x, S) + (1 − α) · ρ_A(x, S).

In both settings, the goal of the analyst is to infer α, u, and v from observed stochastic choice data.
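To make equation (1) concrete, here is a minimal simulation sketch (Python; the utilities and α below are made up for illustration and are not part of the paper's formal development):

```python
from itertools import combinations

# Hypothetical parameters: u is the human's utility, v the AI's, alpha the compliance level.
u = {"x": 3.0, "y": 2.0, "z": 1.0}
v = {"x": 1.0, "y": 2.0, "z": 3.0}
alpha = 0.6

def luce(w, a, S):
    """Luce choice probability of alternative a from menu S under utility w."""
    return w[a] / sum(w[b] for b in S)

def rho_AI(a, S):
    """Equation (1): a mixture of the human's and the AI's Luce rules."""
    return alpha * luce(u, a, S) + (1 - alpha) * luce(v, a, S)

# IIA requires rho(x, S)/rho(y, S) to be constant across menus containing x and y.
X = tuple(u)
for x, y in combinations(X, 2):
    ratios = [round(rho_AI(x, S) / rho_AI(y, S), 4)
              for r in (2, 3) for S in combinations(X, r)
              if x in S and y in S]
    print(x, y, ratios)  # unequal ratios within a row reveal an IIA violation
```

With these parameters the printed ratios differ across menus, previewing the IIA violations that drive identification below.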
The main results of the paper address the identification of alignment and compliance in both settings. In the laboratory setting, where both ρ_AI and ρ_H are observed, I show that all the parameters of interest can be identified as long as the AI's choice data violates the Independence of Irrelevant Alternatives (IIA) property. The IIA property, which is the key implication of the Luce model, requires that the relative choice probabilities of any two alternatives are constant across menus (Luce, 1959). When the AI is perfectly compliant or perfectly aligned, its choices satisfy IIA. By contrast, IIA violations reveal the presence of an intrinsic AI utility that is distinct from the human principal's utility. I introduce instability measures that capture deviations from the IIA property and provide a closed-form expression for the compliance parameter α using these measures. Using the recovered compliance parameter, I then show that both the human principal's utility u and the AI's utility v can be identified up to scale normalization. I also provide an axiomatic characterization of the model in this setting, identifying the behavioral conditions that the pair (ρ_AI, ρ_H) must satisfy in order to be consistent with LAM.

In the field setting, where only ρ_AI is observed, there is a fundamental obstacle to separately identifying the human's and the AI's utilities. Namely, both (u, v, α) and (v, u, 1 − α) generate the same stochastic choice data. Thus, identification is only possible up to a label swap. Nevertheless, I provide a constructive proof showing that when there are at least four alternatives, the underlying utility pair is generically identified up to this swap. Stated alternatively, the distribution over utilities is generically identified but the labels are not. From an analyst's perspective, this is sufficient to recover the degree of misalignment, even without knowing which utility belongs to the human and which belongs to the AI. In terms of compliance, the result implies that the analyst cannot distinguish α from 1 − α. Hence, compliance is only identifiable up to reflection about 1/2 in the field setting unless one is willing to assume α < 1/2 (low compliance) or α > 1/2 (high compliance).

LAM draws on a long tradition in stochastic choice theory. When α ∈ {0, 1}, the model reduces to the Luce rule, one of the foundational stochastic choice models. When α ∈ (0, 1), the model becomes a mixed multinomial logit (MMNL, also known as the random coefficients multinomial logit) model with binary support, or simply 2-MNL. More generally, the MMNL model was introduced by Boyd and Mellman (1980) and Cardell and Dunbar (1980). McFadden and Train (2000) show that any choice model derived from random utility maximization can be approximated by an MMNL model, while Saito (2018) provides axiomatic foundations (see also Lu and Saito, 2022; Chang et al., 2023). Fox et al. (2012) show that the distribution over random coefficients in MMNL is uniquely identified under sufficiently rich variation in product characteristics.

In contrast, in an abstract domain with menu variation, the identification problem in the mixed logit model has mostly been studied within the statistics and computer science literatures. Chierichetti et al. (2018) study the identifiability problem in 2-MNL assuming a uniform mixing weight. Tang (2020) allows for an arbitrary mixing weight and provides a generic identification result. Zhang et al. (2022) extend this result by showing that observing menus with three alternatives is sufficient for generic identification.
The result in the field setting of this paper provides an alternative approach to the identification problem in 2-MNL: unlike Zhang et al. (2022), the identification is constructive, and unlike Chierichetti et al. (2018) and Tang (2020), the constructive procedure does not require any prior knowledge of the mixing weight.

Within the decision theory literature, the closest precedent to LAM is Chambers et al. (2023), who study a model of behavioral peer influence. In their model, there are two agents and each agent's stochastic choice can be written as a mixture of the agent's own and the other agent's Luce rule. Importantly, the mixing weights in Chambers et al. (2023) depend on the underlying utilities of the two agents and vary across menus, which makes their setup more suitable to study influence rather than AI alignment and compliance. Chambers et al. (2023) also mention a modification of their model with menu-independent weights as a potential alternative but note that unique identification in this alternative version is not guaranteed. Lastly, Manzini and Mariotti (2018) study dual random utility maximization (dRUM), where the agent maximizes one of two deterministic linear orders with a fixed probability. dRUM can be viewed as a limiting case of LAM where both ρ_H and ρ_A are generated by deterministic utility maximization.

The rest of the paper proceeds as follows. Section 2 introduces the model. Section 3 presents identification and characterization results in the laboratory setting. Section 4 presents identification results in the field setting. Section 5 concludes.

2 The Model

Let X be a finite set of alternatives and denote by 𝒳 the collection of all non-empty subsets of X (menus). It is assumed that |X| = N ≥ 3. A stochastic choice function is a mapping ρ : X × 𝒳 → [0, 1] such that ρ(x, S) > 0 only if x ∈ S and Σ_{x∈S} ρ(x, S) = 1 for all S ∈ 𝒳. ρ(x, S) denotes the probability that x is chosen when the agent is faced with the menu S repeatedly.

The setup is as follows. The analyst observes the stochastic choices of an AI agent, denoted ρ_AI, that acts on behalf of a human principal. The stochastic choices of the human principal, denoted ρ_H, may or may not be observed. I consider two empirical settings. In the laboratory setting, both ρ_AI and ρ_H are observed. This corresponds to an experimental design where the human's and the AI's choices are elicited sequentially, with the human's choices serving as a guide for the AI. Alternatively, the human's choices might be generated synthetically for the purposes of the experiment. In the field setting, only the AI's choices are observed. This corresponds to the scenario where the human principal fully delegates the decision-making process to the AI.

The main goal of this paper is to provide a modeling framework that can be used to analyze the alignment of AI agents with their human principals' preferences. To this end, let ρ_A denote the hypothetical choices of an AI agent that acts autonomously without any human principal. I assume that in this autonomous setting, the AI's choices follow the Luce rule with some underlying utility function v : X → R₊₊:

    ρ_A(x, S) = v(x)/Σ_{y∈S} v(y).

The function v can be interpreted as the intrinsic utility of the AI agent. Importantly, ρ_A is not observed, as the AI is always assumed to act on behalf of some human principal.
Let u : X → R₊₊ denote the utility function of the human principal. The principal's choices are also assumed to be consistent with the Luce rule, given by

    ρ_H(x, S) = u(x)/Σ_{y∈S} u(y).

The AI's compliance parameter α ∈ [0, 1] reflects the probability that it ignores its own utility and follows the human principal. With probability 1 − α, it ignores the principal and acts autonomously. The AI's observed stochastic choices are therefore given by

    ρ_AI(x, S) = α · ρ_H(x, S) + (1 − α) · ρ_A(x, S).

There are two key objects the analyst would like to identify: (i) alignment — to what extent the utilities u and v are aligned; (ii) compliance — the value of α. I will discuss how each of these can be identified both in the laboratory and the field settings. The following definition summarizes the model.

Definition 1 (Luce Alignment Model). A pair of stochastic choice functions (ρ_AI, ρ_H) is consistent with the Luce Alignment Model (LAM) if there exist utility functions u, v : X → R₊₊ and a compliance parameter α ∈ [0, 1] such that for all S ∈ 𝒳 and x ∈ S,

    ρ_AI(x, S) = α · u(x)/Σ_{y∈S} u(y) + (1 − α) · v(x)/Σ_{y∈S} v(y)   and   ρ_H(x, S) = u(x)/Σ_{y∈S} u(y).    (2)

The tuple (u, v, α) is called a LAM representation of (ρ_AI, ρ_H). If only ρ_AI is observed, then it is said to be consistent with LAM if it satisfies the expression in equation (2) for some u, v, and α.

LAM nests several important special cases in terms of observed AI behavior:

• Aligned AI (v = λu for some λ > 0). The AI agent's preferences exactly match the human principal's. In this case, the degree of compliance becomes unimportant. The AI makes exactly the same probabilistic choices as the human.

• Compliant AI (α = 1). The AI agent's preferences potentially differ from the human principal's. However, since compliance is perfect, the AI makes the exact same probabilistic choices the human would make.

• Misaligned AI (v ≠ λu for all λ > 0). The AI agent's preferences differ from the human principal's, and it may not be fully compliant. This is the base case where uncovering the extent of misalignment and non-compliance becomes important.

• Autonomous AI (α = 0). The AI potentially has its own distinct preferences and operates fully autonomously, ignoring its human principal.

• Adversarial AI (v = λ·u⁻¹ for some λ > 0). The AI agent's preferences are exactly the opposite of the human's, with the top ordinally ranked alternative by the human being ranked as the worst by the AI.

While observed choices are exactly the same in the first two cases, where the AI is either perfectly aligned or perfectly compliant, they represent the boundary cases of scenarios with completely different implications. In the first case, where we start with perfect alignment, a small change in alignment will likely not have a big influence on the human principal's welfare regardless of the compliance level. In the second case, where we start with perfect compliance, if the degree of misalignment is very high, a small decrease in compliance levels can have large welfare effects. This makes it important to uncover alignment and compliance separately in the base third case with misaligned AI.
The fourth case models a fully autonomous AI that ignores its human principal, while the last case represents a version of extreme misalignment that provides further structure to the model and serves as a natural adversarial benchmark.

3 Laboratory Data

This section analyzes the laboratory setting, where both the AI's choices ρ_AI and the human principal's choices ρ_H are observable. First, I discuss the identification of the AI's alignment and compliance. I then provide an axiomatic characterization of the model using the pair (ρ_AI, ρ_H) as the observed primitive.

3.1 Identification

Consider a pair (ρ_AI, ρ_H) consistent with LAM. Under LAM, the human principal's choices are consistent with the Luce rule.¹ A stochastic choice rule ρ consistent with the Luce rule satisfies the Independence of Irrelevant Alternatives (IIA) property of Luce (1959):

    ρ(x, S)/ρ(y, S) = ρ(x, T)/ρ(y, T)   for all x, y ∈ S ∩ T and S, T ∈ 𝒳.

¹ Since the laboratory setting can utilize synthetically generated choice data from a hypothetical human principal, assuming these choices follow the Luce rule is not a substantive restriction. In the field setting, on the other hand, the human principal's choices are unobserved and therefore not explicitly modeled. The implicit modeling assumption in this setting is that the AI perceives its human principal as a Luce agent with utility u.

As the next proposition shows, the AI's choices will generally violate IIA unless u and v are perfectly aligned (v = λu for some λ > 0), or the AI is either autonomous (α = 0) or perfectly compliant (α = 1).

Proposition 1 (IIA Violation). Let ρ_AI be consistent with LAM. Then ρ_AI satisfies IIA if and only if α ∈ {0, 1} or v = λu for some λ > 0.

Proof. If α = 0 or α = 1, then ρ_AI reduces to the Luce rule with the utility function v or u, respectively, which satisfies IIA. Alternatively, if v = λu for some λ > 0, then ρ_A = ρ_H, and hence ρ_AI = ρ_H, which satisfies IIA.

Conversely, suppose α ∈ (0, 1) and v ≠ λu for all λ > 0. Define r(a) = u(a)/v(a) for each a ∈ X. Since v ≠ λu, the function r is not constant. Pick x, y ∈ X with r(x) ≠ r(y) and any z ∉ {x, y}. We need to show that IIA is violated for some tuple (x, y, S, T) with x, y ∈ S ∩ T. To this end, first observe that the conditional probability of a relative to b in the choice set {a, b, c} can be written as a mixture of the choice probabilities of the autonomous AI and the human principal in the choice set {a, b}, where the mixing coefficient depends on c. That is, for any a, b, c ∈ X,

    ρ_AI(a, {a,b,c}) / [ρ_AI(a, {a,b,c}) + ρ_AI(b, {a,b,c})] = β(c) · u(a)/(u(a)+u(b)) + (1 − β(c)) · v(a)/(v(a)+v(b)),

where

    β(c) = [α · (u(a)+u(b))/(u(a)+u(b)+u(c))] / [α · (u(a)+u(b))/(u(a)+u(b)+u(c)) + (1 − α) · (v(a)+v(b))/(v(a)+v(b)+v(c))].

By definition,

    ρ_AI(a, {a,b}) / [ρ_AI(a, {a,b}) + ρ_AI(b, {a,b})] = α · u(a)/(u(a)+u(b)) + (1 − α) · v(a)/(v(a)+v(b)).

Comparing this to the conditional probability in {a, b, c}, notice that both are mixtures of the exact same two terms.
Hence, when c is removed from {a, b, c}, an IIA violation occurs if and only if both the mixing weights and the mixture components in these conditional probabilities are distinct: that is, β(c) ≠ α and u(a)/u(b) ≠ v(a)/v(b) (or, alternatively, r(a) ≠ r(b)). Notice that β(c) = α if and only if (u(a)+u(b))/(v(a)+v(b)) = (u(a)+u(b)+u(c))/(v(a)+v(b)+v(c)), which holds if and only if r(c) = u(c)/v(c) = (u(a)+u(b))/(v(a)+v(b)). Therefore, when c is removed from {a, b, c}, an IIA violation occurs if and only if r(a) ≠ r(b) and r(c) ≠ (u(a)+u(b))/(v(a)+v(b)).

Now, going back to the choice set {x, y, z}, there are two cases to consider.

Case 1: r(z) ≠ (u(x)+u(y))/(v(x)+v(y)). Since r(x) ≠ r(y), by the previous argument, IIA is violated when z is removed from {x, y, z}.

Case 2: r(z) = (u(x)+u(y))/(v(x)+v(y)) = [v(x)/(v(x)+v(y))]·r(x) + [v(y)/(v(x)+v(y))]·r(y). Since r(x) ≠ r(y) and r(z) is a strict weighted average of r(x) and r(y), it must lie strictly between r(x) and r(y). This implies r(x) ≠ r(z). Now suppose y is removed from the choice set {x, y, z}. Since r(x) ≠ r(z), by the previous argument, IIA holds if and only if r(y) = (u(x)+u(z))/(v(x)+v(z)). But then r(y) is a strict mixture of r(x) and r(z). This is clearly not possible, as r(z) itself is a strict mixture of r(x) and r(y). Hence, IIA must be violated when y is removed from {x, y, z}.

To conclude, either the removal of z or the removal of y (or x) from {x, y, z} leads to an IIA violation, as desired. ∎

The implication of the proposition is that IIA violations in the AI's stochastic choices indicate we are in the case of misaligned (v ≠ λu for all λ > 0) and partially compliant (α ∈ (0, 1)) AI. We can utilize this to recover the parameters of the model. The identification strategy proceeds in three steps.

Step 1: Recover u from ρ_H. Since the human principal's choices are consistent with the Luce rule, the identification of u from ρ_H is standard. Letting u(x) = 1 for some x ∈ X, the IIA property implies that for any y ∈ X, we must have

    u(y) = ρ_H(y, S)/ρ_H(x, S),

where S can be any choice set containing x and y. This recovers u up to scale normalization.
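In code, this step is a one-liner; a minimal sketch (the ρ_H values below are hypothetical, and menus are represented as frozensets):

```python
# Hypothetical Luce data for the human principal over X = {x, y, z}:
# rho_H[(a, S)] is the probability of choosing a from menu S.
X = ("x", "y", "z")
S = frozenset(X)
rho_H = {("x", S): 1/2, ("y", S): 1/3, ("z", S): 1/6}

# Step 1: normalize u("x") = 1 and read off u(y) = rho_H(y, S) / rho_H(x, S);
# under IIA, any menu containing both alternatives gives the same answer.
u = {y: rho_H[(y, S)] / rho_H[("x", S)] for y in X}
print(u)  # {'x': 1.0, 'y': 0.666..., 'z': 0.333...}
```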
Step 2: Recover α from ρ_AI and ρ_H. If ρ_AI = ρ_H, then the AI may be either fully compliant (α = 1) or fully aligned (v = λu for some λ > 0). We cannot distinguish between these two cases. Alternatively, if ρ_AI ≠ ρ_H and ρ_AI exhibits no IIA violations, then we can use Proposition 1 to infer that α = 0. This is because we cannot have α = 1 or v = λu with ρ_AI ≠ ρ_H, which leaves α = 0 as the only possibility in the proposition.

Now suppose ρ_AI violates IIA. Then, there exist two choice sets S, T ∈ 𝒳 and a pair of alternatives x, y ∈ S ∩ T such that

    ρ_AI(x, S)/ρ_AI(y, S) ≠ ρ_AI(x, T)/ρ_AI(y, T).

The identification of α relies on these IIA violations. To proceed with the identification, we first need a new definition.

Definition 2 (Instability Measures). Let ρ, ρ′ be two stochastic choice functions. For any S, T ∈ 𝒳 and x, y ∈ S ∩ T:

1. The own instability of ρ is defined by

    Δ_xy(S, T | ρ) = ρ(x, S)ρ(y, T) − ρ(y, S)ρ(x, T).

2. The cross instability from ρ to ρ′ is defined by

    Γ_xy(S, T | ρ, ρ′) = ρ(x, S)ρ′(y, T) − ρ(y, S)ρ′(x, T).

3. The composite instability of ρ and ρ′ is defined by

    Φ_xy(S, T | ρ, ρ′) = Γ_xy(S, T | ρ, ρ′) + Γ_xy(S, T | ρ′, ρ).

Intuitively, own instability can be viewed as a measure of instability in the stochastic choice ρ for the tuple (x, y, S, T). It tells us how useful the observations from the choice set S are for imputing the relative choice probabilities for x, y in the choice set T. If Δ_xy(S, T | ρ) = 0, then there is no IIA violation for the alternatives x, y in the choice sets S and T, and this imputation can be done perfectly. The larger |Δ_xy(S, T | ρ)|, the less useful the observations from S are for imputing choices in T.

The measure of cross instability tells us how useful the information from the stochastic choice ρ in the choice set S is for imputing the relative choice probabilities for ρ′ in the choice set T. For example, if both ρ and ρ′ are consistent with the Luce rule with the same underlying utility function, then this measure becomes zero. Note that the cross instability measure is generally not symmetric (i.e., Γ_xy(S, T | ρ, ρ′) may not be equal to Γ_xy(S, T | ρ′, ρ)). The measure of composite instability combines the two cross instability measures, which makes it symmetric. We will later see that own and composite instabilities play an important role in identification in the laboratory setting, while cross instability plays an important role in the field setting.

Remark 1. For a stochastic choice function ρ satisfying positivity (ρ(x, S) > 0 for all x ∈ S ⊆ X), Δ_xy(S, T | ρ) = 0 for all (x, y, S, T) with x, y ∈ S ∩ T if and only if ρ is consistent with the Luce rule. For two Luce stochastic choice functions ρ and ρ′ with utility functions u and v, respectively, the cross instabilities can be written as

    Γ_xy(S, T | ρ, ρ′) = [u(x)v(y) − u(y)v(x)] / [u(S)v(T)]   and   Γ_xy(S, T | ρ′, ρ) = [u(y)v(x) − u(x)v(y)] / [u(T)v(S)],

where u(A) = Σ_{a∈A} u(a) and v(A) = Σ_{a∈A} v(a) for any A ∈ 𝒳. Summing the two cross instabilities yields the composite instability:

    Φ_xy(S, T | ρ, ρ′) = [u(x)v(y) − u(y)v(x)] · [u(T)v(S) − u(S)v(T)] / [u(S)u(T)v(S)v(T)].

Following arguments similar to the ones in the proof of Proposition 1, we can show that Φ_xy(S, T | ρ, ρ′) = 0 for all (x, y, S, T) with x, y ∈ S ∩ T if and only if v = λu for some λ > 0. Hence, zero own instability for both agents establishes that each is consistent with the Luce rule, and zero composite instability further establishes that they share the same underlying preferences.
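The three measures are straightforward to compute from data. A minimal sketch (Python, with Luce data generated from hypothetical utilities), ending with a check of the first claim in Remark 1:

```python
from itertools import combinations

def delta(rho, x, y, S, T):
    """Own instability: zero on (x, y, S, T) iff there is no IIA violation there."""
    return rho[(x, S)] * rho[(y, T)] - rho[(y, S)] * rho[(x, T)]

def gamma(rho, rho2, x, y, S, T):
    """Cross instability from rho to rho2 (not symmetric in general)."""
    return rho[(x, S)] * rho2[(y, T)] - rho[(y, S)] * rho2[(x, T)]

def phi(rho, rho2, x, y, S, T):
    """Composite instability: symmetrized sum of the two cross instabilities."""
    return gamma(rho, rho2, x, y, S, T) + gamma(rho2, rho, x, y, S, T)

# Remark 1 check: for a Luce rule, own instability vanishes on every tuple.
X = ("x", "y", "z")
u = {"x": 1.0, "y": 2/3, "z": 1/3}          # hypothetical utilities
menus = [frozenset(c) for r in (2, 3) for c in combinations(X, r)]
rho_H = {(a, S): u[a] / sum(u[b] for b in S) for S in menus for a in S}
assert all(abs(delta(rho_H, x, y, S, T)) < 1e-12
           for S in menus for T in menus
           for x, y in combinations(sorted(S & T), 2))
```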
If ρ_AI violates IIA for some tuple (x, y, S, T) with x, y ∈ S ∩ T, then we must have Δ_xy(S, T | ρ_AI) ≠ 0. The next proposition shows that the compliance parameter α can be recovered by evaluating the ratio Δ_xy(S, T | ρ_AI)/Φ_xy(S, T | ρ_AI, ρ_H) for this tuple. Intuitively, the AI's compliance level is revealed by comparing the instability in the AI's stochastic choices with the composite instability across both agents. If this ratio is high, then the instability in the AI's choices matches the composite instability to a large extent, revealing a high compliance level. Alternatively, if this ratio is low, then the composite instability is much higher than the instability in the AI's choices, which indicates a low compliance level.

Proposition 2 (Identification of α). Suppose (ρ_AI, ρ_H) is consistent with LAM. If ρ_AI violates IIA for some tuple (x, y, S, T) with x, y ∈ S ∩ T, then the compliance parameter α can be uniquely recovered as

    α = Δ_xy(S, T | ρ_AI) / Φ_xy(S, T | ρ_AI, ρ_H).

Proof. Under LAM, ρ_AI(x, S) = α·ρ_H(x, S) + (1 − α)·ρ_A(x, S). Substituting this into Δ_xy(S, T | ρ_AI),

    Δ_xy(S, T | ρ_AI) = ρ_AI(x, S)ρ_AI(y, T) − ρ_AI(x, T)ρ_AI(y, S)
                      = [αρ_H(x, S) + (1 − α)ρ_A(x, S)][αρ_H(y, T) + (1 − α)ρ_A(y, T)]
                        − [αρ_H(x, T) + (1 − α)ρ_A(x, T)][αρ_H(y, S) + (1 − α)ρ_A(y, S)].

Expanding and collecting terms by powers of α, we get

    Δ_xy(S, T | ρ_AI) = α²Δ_xy(S, T | ρ_H) + (1 − α)²Δ_xy(S, T | ρ_A) + α(1 − α)Φ_xy(S, T | ρ_H, ρ_A)
                      = α(1 − α)Φ_xy(S, T | ρ_H, ρ_A),

where the first equality follows from the definitions of Δ_xy(S, T | ρ_H), Δ_xy(S, T | ρ_A), and Φ_xy(S, T | ρ_H, ρ_A), and the second equality uses the fact that both ρ_H and ρ_A are consistent with the Luce rule, and hence Δ_xy(S, T | ρ_H) = Δ_xy(S, T | ρ_A) = 0.

Now substituting ρ_AI(x, S) = α·ρ_H(x, S) + (1 − α)·ρ_A(x, S) into the definition of Φ_xy(S, T | ρ_AI, ρ_H),

    Φ_xy(S, T | ρ_AI, ρ_H) = ρ_AI(x, S)ρ_H(y, T) + ρ_H(x, S)ρ_AI(y, T) − ρ_AI(x, T)ρ_H(y, S) − ρ_H(x, T)ρ_AI(y, S)
                           = [αρ_H(x, S) + (1 − α)ρ_A(x, S)]ρ_H(y, T) + ρ_H(x, S)[αρ_H(y, T) + (1 − α)ρ_A(y, T)]
                             − [αρ_H(x, T) + (1 − α)ρ_A(x, T)]ρ_H(y, S) − ρ_H(x, T)[αρ_H(y, S) + (1 − α)ρ_A(y, S)]
                           = 2αΔ_xy(S, T | ρ_H) + (1 − α)Φ_xy(S, T | ρ_H, ρ_A)
                           = (1 − α)Φ_xy(S, T | ρ_H, ρ_A),

where the third equality follows from the definitions of the instability measures and the last equality follows from the fact that ρ_H follows the Luce rule. Therefore,

    Δ_xy(S, T | ρ_AI) / Φ_xy(S, T | ρ_AI, ρ_H) = α(1 − α)Φ_xy(S, T | ρ_H, ρ_A) / [(1 − α)Φ_xy(S, T | ρ_H, ρ_A)] = α.

The cancellation is valid since an IIA violation implies Δ_xy(S, T | ρ_AI) ≠ 0, and the above derivations show that this implies Φ_xy(S, T | ρ_AI, ρ_H) ≠ 0. ∎

There are two immediate but non-obvious implications of the proof. The first is that, under LAM, the instability measures Δ_xy(S, T | ρ_AI) and Φ_xy(S, T | ρ_AI, ρ_H) are always proportional. Hence, the compliance formula in Proposition 2 is well-defined whenever there is an IIA violation in ρ_AI, i.e., Δ_xy(S, T | ρ_AI) ≠ 0 automatically guarantees a non-zero denominator. Second, the expression derived for composite instability, Φ_xy(S, T | ρ_AI, ρ_H) = (1 − α)Φ_xy(S, T | ρ_H, ρ_A), shows that the composite instability of ρ_AI and ρ_H is always zero if and only if the AI is either fully aligned or fully compliant. Since ρ_AI = ρ_H in both cases, it follows that the expression for the compliance parameter is valid as long as ρ_AI ≠ ρ_H.

Corollary 1. If (ρ_AI, ρ_H) is consistent with LAM, then for all tuples (x, y, S, T) with x, y ∈ S ∩ T,

    Δ_xy(S, T | ρ_AI) = α · Φ_xy(S, T | ρ_AI, ρ_H).

Moreover, the compliance parameter α is uniquely identified by this relationship as long as ρ_AI ≠ ρ_H.
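An end-to-end sketch of Proposition 2 (Python; the ground-truth parameters are hypothetical and used only to synthesize LAM-consistent data):

```python
from itertools import combinations

# Hypothetical ground truth used to synthesize data consistent with LAM.
u = {"x": 1.0, "y": 2.0, "z": 4.0}
v = {"x": 1.0, "y": 0.5, "z": 0.25}
alpha_true = 0.7

luce = lambda w, a, S: w[a] / sum(w[b] for b in S)
menus = [frozenset(c) for r in (2, 3) for c in combinations(("x", "y", "z"), r)]
rho_H  = {(a, S): luce(u, a, S) for S in menus for a in S}
rho_AI = {(a, S): alpha_true * luce(u, a, S) + (1 - alpha_true) * luce(v, a, S)
          for S in menus for a in S}

delta = lambda r, x, y, S, T: r[(x, S)] * r[(y, T)] - r[(y, S)] * r[(x, T)]
phi = lambda x, y, S, T: (rho_AI[(x, S)] * rho_H[(y, T)] - rho_AI[(y, S)] * rho_H[(x, T)]
                        + rho_H[(x, S)] * rho_AI[(y, T)] - rho_H[(y, S)] * rho_AI[(x, T)])

# Proposition 2: alpha equals Delta/Phi at every tuple where rho_AI violates IIA.
for S in menus:
    for T in menus:
        for x, y in combinations(sorted(S & T), 2):
            d = delta(rho_AI, x, y, S, T)
            if abs(d) > 1e-12:
                print(sorted(S), sorted(T), round(d / phi(x, y, S, T), 10))  # 0.7
```

Every violating tuple returns the same ratio, illustrating the proportionality in Corollary 1.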
The last step in identification is to recover the AI's utility function v.

Step 3: Recover v from ρ_AI and ρ_H. As before, if ρ_AI = ρ_H, then we cannot distinguish full compliance (α = 1) from full alignment (v = λu for λ > 0). Alternatively, if ρ_AI ≠ ρ_H, then Step 2 allows us to uniquely identify the compliance parameter α. Let α denote the recovered compliance level. Using the fact that ρ_AI(x, S) = α·ρ_H(x, S) + (1 − α)·ρ_A(x, S), we can construct ρ_A as

    ρ_A(x, S) = [ρ_AI(x, S) − α·ρ_H(x, S)] / (1 − α).

Under LAM, ρ_A is generated by the Luce rule with the utility function v. We can use this to construct v from ρ_A as in Step 1.

Theorem 1 summarizes the identification results in the laboratory setting.

Theorem 1 (Laboratory Identification). Let (ρ_AI, ρ_H) be consistent with LAM.

1. If ρ_AI ≠ ρ_H, then α is uniquely identified, and u and v are uniquely identified up to scale normalization.

2. If ρ_AI = ρ_H, then α and v are not separately identified and only u is uniquely identified up to scale normalization.

Proof. The proof follows from the three identification steps and the results established in this section. ∎

The following example illustrates the identification result.

Example 1. Consider X = {x, y, z} and suppose we observe ρ_AI and ρ_H given as follows.

    Agent   Option   {x,y,z}   {x,y}   {x,z}   {y,z}
    ρ_AI    x        1/3       7/15    1/2     –
            y        1/3       8/15    –       8/15
            z        1/3       –       1/2     7/15
    ρ_H     x        1/2       3/5     3/4     –
            y        1/3       2/5     –       2/3
            z        1/6       –       1/4     1/3

    Table 1: Observed choice probabilities in Example 1

Normalizing u(x) = 1, we can infer from ρ_H that u(y) = 2/3 and u(z) = 1/3. To recover α, we construct the two instability measures. Let S = {x, y, z} and T = {x, y}. We first compute the own instability of ρ_AI for the tuple (x, y, S, T):

    Δ_xy(S, T | ρ_AI) = ρ_AI(x, S)ρ_AI(y, T) − ρ_AI(x, T)ρ_AI(y, S) = (1/3)·(8/15) − (7/15)·(1/3) = 1/45.

Next, we compute the composite instability between ρ_AI and ρ_H:

    Φ_xy(S, T | ρ_AI, ρ_H) = ρ_AI(x, S)ρ_H(y, T) + ρ_H(x, S)ρ_AI(y, T) − ρ_AI(x, T)ρ_H(y, S) − ρ_H(x, T)ρ_AI(y, S)
                           = (1/3)·(2/5) + (1/2)·(8/15) − (7/15)·(1/3) − (3/5)·(1/3)
                           = 2/15 + 4/15 − 7/45 − 3/15 = 2/45.

By Proposition 2, the compliance parameter is uniquely identified as α = (1/45)/(2/45) = 1/2. We could similarly recover α using the tuple (y, z, {x, y, z}, {y, z}) instead. Note, however, that we cannot use the tuple (x, z, {x, y, z}, {x, z}), as ρ_AI satisfies IIA for this tuple. This highlights that while ρ_AI ≠ ρ_H implies α is uniquely identified, not all tuples can be used for identification. Using α = 1/2, we can construct ρ_A as follows:

    Agent   Option   {x,y,z}   {x,y}   {x,z}   {y,z}
    ρ_A     x        1/6       1/3     1/4     –
            y        1/3       2/3     –       2/5
            z        1/2       –       3/4     3/5

    Table 2: Recovered autonomous AI stochastic choice ρ_A

Normalizing v(x) = 1, we can infer from ρ_A that v(y) = 2 and v(z) = 3. Hence, we have

    u = (1, 2/3, 1/3),   v = (1, 2, 3),   α = 1/2.

Note that u and v induce completely opposite ordinal rankings, revealing a high degree of misalignment.
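A quick computational check of Example 1 (a sketch using exact fractions; the probabilities are transcribed from Table 1):

```python
from fractions import Fraction as F

S, T = frozenset("xyz"), frozenset("xy")
rho_AI = {("x", S): F(1, 3), ("y", S): F(1, 3), ("x", T): F(7, 15), ("y", T): F(8, 15)}
rho_H  = {("x", S): F(1, 2), ("y", S): F(1, 3), ("x", T): F(3, 5),  ("y", T): F(2, 5)}

# Proposition 2 on the tuple (x, y, S, T).
d = rho_AI[("x", S)] * rho_AI[("y", T)] - rho_AI[("y", S)] * rho_AI[("x", T)]
p = (rho_AI[("x", S)] * rho_H[("y", T)] - rho_AI[("y", S)] * rho_H[("x", T)]
   + rho_H[("x", S)] * rho_AI[("y", T)] - rho_H[("y", S)] * rho_AI[("x", T)])
alpha = d / p
print(alpha)                                   # 1/2

# Step 3: back out rho_A and read off v on the grand menu (normalize v(x) = 1).
rho_A = {k: (rho_AI[k] - alpha * rho_H[k]) / (1 - alpha) for k in rho_AI}
v = {a: rho_A[(a, S)] / rho_A[("x", S)] for a in "xy"}
print(rho_A[("x", S)], rho_A[("y", S)], v)     # 1/6, 1/3, and v("y") = 2
```

Extending the same ratio to z recovers v(z) = 3, matching the example.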
3.2 Axiomatic Characterization

In this section, I provide an axiomatic characterization for the Luce Alignment Model taking (ρ_AI, ρ_H) as the primitive. The first two axioms are standard. Axiom 1 requires that ρ(x, S) is strictly positive for any x ∈ S ⊆ X and ρ ∈ {ρ_AI, ρ_H}. Axiom 2 requires that ρ_H satisfies IIA, ensuring that the human principal's behavior is consistent with the Luce rule.

Axiom 1 (Positivity). For any ρ ∈ {ρ_AI, ρ_H} and x ∈ S ⊆ X, we have ρ(x, S) > 0.

Axiom 2 (H-IIA). ρ_H satisfies IIA.

Axiom 3 is the key axiom in ensuring that the AI compliance parameter can be identified. It requires that the own instability of ρ_AI and the composite instability of ρ_AI and ρ_H are proportional: any change in composite instability from one tuple to another must be proportionally reflected by a change in the AI's own instability.

Axiom 3 (Proportionality). For any two tuples (x, y, S, T) and (z, t, S′, T′) with x, y ∈ S ∩ T and z, t ∈ S′ ∩ T′,

    Δ_xy(S, T | ρ_AI) · Φ_zt(S′, T′ | ρ_AI, ρ_H) = Δ_zt(S′, T′ | ρ_AI) · Φ_xy(S, T | ρ_AI, ρ_H).

Axiom 4 requires that the AI's own instability always shares the same sign as the composite instability, and that the own instability is bounded by the composite instability. To get an intuition for this axiom, consider the case Δ_xy(S, T | ρ_AI) > 0. This implies that if we use the AI's relative choice probabilities for x and y in S to impute its relative choice probabilities in T, this will lead to an overestimation of x versus y. The first part of the axiom then requires that the composite instability must also be strictly positive: using the AI's and the human's relative choice probabilities in S to cross-impute the relative choice probabilities in T will also lead to an overestimation of x in aggregate. In addition, the axiom requires that the aggregate cross-imputation error must be larger than the own imputation error. Together with Axiom 3, this property ensures that the compliance parameter is uniquely identified and bounded between zero and one.

Axiom 4 (Bounded Instability). For any tuple (x, y, S, T) with x, y ∈ S ∩ T,

    Δ_xy(S, T | ρ_AI) · Φ_xy(S, T | ρ_AI, ρ_H) ≥ 0   and   |Δ_xy(S, T | ρ_AI)| ≤ |Φ_xy(S, T | ρ_AI, ρ_H)|,

where both inequalities hold strictly if Δ_xy(S, T | ρ_AI) ≠ 0.

The last axiom bounds the divergence between the AI's and the human's stochastic choices. Fixing the own and composite instability measures and the human's stochastic choices, it provides a lower bound on the AI's stochastic choices.

Axiom 5 (Bounded Divergence). For any tuple (x, y, S, T) with x, y ∈ S ∩ T, menu U, and alternative z ∈ U,

    ρ_AI(z, U) · |Φ_xy(S, T | ρ_AI, ρ_H)| ≥ ρ_H(z, U) · |Δ_xy(S, T | ρ_AI)|.

Moreover, if Δ_xy(S, T | ρ_AI) ≠ 0, the inequality is strict.

To interpret this axiom, suppose the instability measures are strictly positive. The axiom then requires that

    ρ_AI(z, U) / ρ_H(z, U) > Δ_xy(S, T | ρ_AI) / Φ_xy(S, T | ρ_AI, ρ_H).

The right-hand side of the above inequality gives us the relative imputation error: it tells us the proportion of the aggregate cross-imputation error that can be explained by the AI's own imputation error. The axiom then requires that the higher the relative imputation error is, the more the AI's choice probabilities are constrained to track the human's choice probabilities.
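Because Axioms 3–5 quantify over finitely many observable tuples, consistency with LAM is directly testable on finite data. A rough sketch of such a test (the helper name is_LAM_consistent is my own; strictness of the inequalities is not enforced here, and Axioms 1–2 are assumed checked separately):

```python
from itertools import combinations

def is_LAM_consistent(rho_AI, rho_H, menus, tol=1e-12):
    """Enumerate Axioms 3-5 on finite stochastic choice data."""
    delta = lambda r, x, y, S, T: r[(x, S)] * r[(y, T)] - r[(y, S)] * r[(x, T)]
    phi = lambda x, y, S, T: (rho_AI[(x, S)] * rho_H[(y, T)] - rho_AI[(y, S)] * rho_H[(x, T)]
                            + rho_H[(x, S)] * rho_AI[(y, T)] - rho_H[(y, S)] * rho_AI[(x, T)])
    tuples = [(x, y, S, T) for S in menus for T in menus
              for x, y in combinations(sorted(S & T), 2)]
    for (x, y, S, T) in tuples:
        d, p = delta(rho_AI, x, y, S, T), phi(x, y, S, T)
        if d * p < -tol or abs(d) > abs(p) + tol:             # Axiom 4
            return False
        if any(rho_AI[(z, U)] * abs(p) < rho_H[(z, U)] * abs(d) - tol
               for U in menus for z in U):                    # Axiom 5
            return False
        if any(abs(d * phi(*t2) - delta(rho_AI, *t2) * p) > tol
               for t2 in tuples):                             # Axiom 3
            return False
    return True
```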
Theorem 2 establishes that Axioms 1–5 are necessary and sufficient for a LAM representation. An interesting feature of this characterization is that while there is an explicit axiom imposing the IIA property on the human's choices ρ_H, there is no equivalent axiom for the autonomous AI's stochastic choices ρ_A. Instead, the proof of the theorem shows that the IIA property for ρ_A is jointly implied by the axioms.

Theorem 2 (Laboratory Characterization). The pair (ρ_AI, ρ_H) satisfies Axioms 1–5 if and only if it is consistent with LAM.

Proof. Necessity. Axiom 1 follows from the assumption that u and v are strictly positive in LAM. Axiom 2 follows from the fact that ρ_H is consistent with the Luce rule. Axioms 3 and 4 follow from Corollary 1, which shows that Δ_xy(S, T | ρ_AI) = α·Φ_xy(S, T | ρ_AI, ρ_H). For Axiom 5, if Δ_xy(S, T | ρ_AI) = 0, then the inequality follows trivially. Alternatively, if Δ_xy(S, T | ρ_AI) ≠ 0, then α must be strictly less than 1, as α = 1 would imply ρ_AI = ρ_H and hence Δ_xy(S, T | ρ_AI) = 0. Therefore,

    ρ_AI(z, U) − α·ρ_H(z, U) = (1 − α)·ρ_A(z, U) > 0  ⟹  ρ_AI(z, U) > α·ρ_H(z, U).

Substituting the result α = Δ_xy(S, T | ρ_AI)/Φ_xy(S, T | ρ_AI, ρ_H) from the previous section into the above inequality yields the axiom.

Sufficiency. Axioms 1 and 2 imply that ρ_H is consistent with the Luce rule with some utility function u that is strictly positive. There are two cases to consider.

First, suppose Δ_xy(S, T | ρ_AI) = 0 for all (x, y, S, T) with x, y ∈ S ∩ T. Then, by Remark 1, ρ_AI is consistent with the Luce rule with some strictly positive utility function v. In this case, (u, v, α = 0) is a LAM representation of (ρ_AI, ρ_H). If, in addition, Φ_xy(S, T | ρ_AI, ρ_H) = 0 for all (x, y, S, T) with x, y ∈ S ∩ T, then we must have v = λu for some λ > 0 and α can be arbitrary.

Next, suppose Δ_xy(S, T | ρ_AI) ≠ 0 for some (x, y, S, T) with x, y ∈ S ∩ T. By Axiom 4,

    Δ_xy(S, T | ρ_AI) ≠ 0  ⟹  Φ_xy(S, T | ρ_AI, ρ_H) ≠ 0.

Hence, by Axiom 3, the ratio

    Δ_xy(S, T | ρ_AI) / Φ_xy(S, T | ρ_AI, ρ_H)

is constant for all such tuples (x, y, S, T). Let α denote the above ratio. Axiom 4 guarantees that α ∈ (0, 1). We next define ρ_A by

    ρ_A(x, S) = [ρ_AI(x, S) − α·ρ_H(x, S)] / (1 − α).

Since the requirements ρ_A(x, S) = 0 for x ∉ S and Σ_{x∈S} ρ_A(x, S) = 1 hold, ρ_A is a valid stochastic choice function. In addition, by Axiom 5, the numerator is strictly positive for any x ∈ S, so that ρ_A(x, S) > 0. Rearranging the above equation yields

    ρ_AI(x, S) = α·ρ_H(x, S) + (1 − α)·ρ_A(x, S).

To conclude the proof of the theorem, we only need to show that ρ_A is consistent with the Luce rule. By Remark 1, it is sufficient to show that Δ_xy(S, T | ρ_A) = 0 for all tuples (x, y, S, T) with x, y ∈ S ∩ T. Let such a tuple be given. As shown in the proof of Proposition 2, substituting ρ_AI = α·ρ_H + (1 − α)·ρ_A back into the own and composite instability measures yields

    Δ_xy(S, T | ρ_AI) = α²Δ_xy(S, T | ρ_H) + (1 − α)²Δ_xy(S, T | ρ_A) + α(1 − α)Φ_xy(S, T | ρ_H, ρ_A)

and

    Φ_xy(S, T | ρ_AI, ρ_H) = 2αΔ_xy(S, T | ρ_H) + (1 − α)Φ_xy(S, T | ρ_H, ρ_A).

By Axioms 1–2 and Remark 1, Δ_xy(S, T | ρ_H) = 0. Hence, combining the last two expressions, we have

    Δ_xy(S, T | ρ_AI) = (1 − α)²Δ_xy(S, T | ρ_A) + α(1 − α)Φ_xy(S, T | ρ_H, ρ_A)
                      = (1 − α)²Δ_xy(S, T | ρ_A) + α·Φ_xy(S, T | ρ_AI, ρ_H).
Consider a tuple (z, t, S′, T′) with z, t ∈ S′ ∩ T′ that satisfies

    α = Δ_zt(S′, T′ | ρ_AI) / Φ_zt(S′, T′ | ρ_AI, ρ_H).

Substituting this into the last term of the above expression and cross-multiplying, we get

    Δ_xy(S, T | ρ_AI)·Φ_zt(S′, T′ | ρ_AI, ρ_H) = (1 − α)²·Δ_xy(S, T | ρ_A)·Φ_zt(S′, T′ | ρ_AI, ρ_H) + Δ_zt(S′, T′ | ρ_AI)·Φ_xy(S, T | ρ_AI, ρ_H).

Since α ∈ (0, 1), we must have Φ_zt(S′, T′ | ρ_AI, ρ_H) ≠ 0. In addition, by Axiom 3,

    Δ_xy(S, T | ρ_AI)·Φ_zt(S′, T′ | ρ_AI, ρ_H) = Δ_zt(S′, T′ | ρ_AI)·Φ_xy(S, T | ρ_AI, ρ_H).

Therefore, the above expression can hold only if Δ_xy(S, T | ρ_A) = 0. Since the tuple (x, y, S, T) was arbitrary, ρ_A is consistent with the Luce rule, as desired. This concludes the proof of the theorem, as we have shown that ρ_AI is a mixture of two Luce rules, where one of the mixing parts is ρ_H. ∎

4 Field Data

In this section, I study the Luce Alignment Model when only the AI's choices ρ_AI are observable. This setting is important for two reasons. First, while laboratory data may be readily available, the volume of field data is expected to be much larger, which can enable richer inference about AI behavior. Second, AI behavior in the two settings may differ systematically: a sufficiently sophisticated AI may appear compliant in a monitored laboratory setting while reverting to its autonomous preferences in the field, a phenomenon known as deceptive alignment (Greenblatt et al., 2024). Comparing recovered compliance parameters across the two settings can provide a measure of deceptive alignment.

4.1 Identification

The identification problem in the field setting faces an inherent challenge: if (u, v, α) is a LAM representation of ρ_AI, then so is (v, u, 1 − α). That is, even if the two utility functions underlying LAM can be recovered, the data alone cannot reveal which belongs to the human principal and which to the AI agent. The utilities can therefore be identified only up to a label swap, and the compliance parameter only up to reflection about 1/2. Note, however, that the distribution over utilities may still be uniquely identified.

Furthermore, if ρ_AI satisfies IIA, then the observed choice behavior can be consistent with any alignment and compliance levels: we can have either (i) v = λu for some λ > 0 with arbitrary α ∈ [0, 1], or (ii) α ∈ {0, 1} with v ≠ λu for all λ > 0. Hence, the identification problem is interesting only if ρ_AI violates IIA.

The key to the identification in this section will be the cross instability measure Γ_xy(S, T | ρ_AI, ρ) for ρ ∈ {ρ_H, ρ_A}, defined in Definition 2. The next proposition provides a formula for the cross instability measures in terms of (u, v, α).

Proposition 3. Suppose ρ_AI is consistent with LAM with parameters (u, v, α), and let ρ_H and ρ_A be the corresponding Luce rules. Then,

    Γ_xy(S, T | ρ_AI, ρ_H) = (1 − α)·Γ_xy(S, T | ρ_A, ρ_H) = (1 − α) · [u(y)v(x) − u(x)v(y)] / [u(T)v(S)]

and

    Γ_xy(S, T | ρ_AI, ρ_A) = α·Γ_xy(S, T | ρ_H, ρ_A) = α · [u(x)v(y) − u(y)v(x)] / [u(S)v(T)].
Proof. Substituting ρ_AI(x, S) = α·ρ_H(x, S) + (1 − α)·ρ_A(x, S) into the definition of cross instability,

    Γ_xy(S, T | ρ_AI, ρ_H) = ρ_AI(x, S)ρ_H(y, T) − ρ_AI(y, S)ρ_H(x, T)
                           = α·[ρ_H(x, S)ρ_H(y, T) − ρ_H(y, S)ρ_H(x, T)] + (1 − α)·[ρ_A(x, S)ρ_H(y, T) − ρ_A(y, S)ρ_H(x, T)]
                           = α·Δ_xy(S, T | ρ_H) + (1 − α)·Γ_xy(S, T | ρ_A, ρ_H).

Since ρ_H is consistent with the Luce rule, Δ_xy(S, T | ρ_H) = 0. Combining this with the result in Remark 1, we get the first identity. The second identity follows analogously by expanding Γ_xy(S, T | ρ_AI, ρ_A) and using Δ_xy(S, T | ρ_A) = 0. ∎

The next proposition provides a key equation that will be used in the identification of u and v from ρ_AI.

Proposition 4. Suppose ρ_AI is consistent with LAM with parameters (u, v, α), and let ρ_H and ρ_A be the corresponding Luce rules. Let ρ ∈ {ρ_H, ρ_A}, S = {x, y, z, t}, and T = {x, y}, where x, y, z, t are four distinct alternatives, and assume the associated cross instabilities are non-zero. Then,

    1/Γ_xy(S, T | ρ_AI, ρ) + 1/Γ_xy(T, T | ρ_AI, ρ) = 1/Γ_xy(S \ t, T | ρ_AI, ρ) + 1/Γ_xy(S \ z, T | ρ_AI, ρ).

Proof. Consider the case ρ = ρ_H. Letting S = {x, y, z, t} and T = {x, y}, we know from Proposition 3 that

    Γ_xy(S′, T | ρ_AI, ρ_H) = (1 − α) · [u(y)v(x) − u(x)v(y)] / [u(T)v(S′)]

for any menu S′ ⊇ T. Hence,

    1/Γ_xy(S′, T | ρ_AI, ρ_H) = u(T)v(S′) / [(1 − α)(u(y)v(x) − u(x)v(y))] = v(S′) / [(1 − α)(u(y)v(x) − u(x)v(y))/u(T)].

Notice that the denominator is independent of S′. Hence, the result holds as long as v(S) + v(T) = v(S \ t) + v(S \ z). This holds trivially, since both sides of the equation evaluate to 2v(x) + 2v(y) + v(z) + v(t). The case ρ = ρ_A follows analogously, with u replacing v and vice versa. ∎
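The reciprocal identity is easy to verify numerically. A small sketch with hypothetical LAM parameters:

```python
# Hypothetical LAM parameters on X = {x, y, z, t}.
u = {"x": 1.0, "y": 2.0, "z": 4.0, "t": 5.0}
v = {"x": 1.0, "y": 0.8, "z": 0.4, "t": 0.2}
alpha = 0.75

luce = lambda w, a, S: w[a] / sum(w[b] for b in S)
rho_AI = lambda a, S: alpha * luce(u, a, S) + (1 - alpha) * luce(v, a, S)
rho_H = lambda a, S: luce(u, a, S)

def gamma(a, b, S, T):
    """Cross instability from rho_AI to rho_H on the tuple (a, b, S, T)."""
    return rho_AI(a, S) * rho_H(b, T) - rho_AI(b, S) * rho_H(a, T)

S, T = ("x", "y", "z", "t"), ("x", "y")
lhs = 1 / gamma("x", "y", S, T) + 1 / gamma("x", "y", T, T)
rhs = 1 / gamma("x", "y", ("x", "y", "z"), T) + 1 / gamma("x", "y", ("x", "y", "t"), T)
print(abs(lhs - rhs) < 1e-9)  # True: the reciprocal identity of Proposition 4
```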
We will use this result to identify both u and v. The identification strategy proceeds in three steps.

Step 1: Recover u(y) and v(y) from ρ_AI for each y ∈ X. The identification of utility functions in the field setting involves two separate steps. First, I show how the candidate utility values for each alternative can be identified. In Step 3, I combine the prior two steps to identify the overall utility functions. I start with the identification of u(y) for each y ∈ X; the same process also works for v(y). Assume X contains at least four alternatives, and let u(x) = 1 for some x ∈ X. Since utility functions are identified only up to scale normalization, this is without loss. For any y ≠ x, notice that

    ρ_H(x, {x, y}) = 1/(1 + u(y))   and   ρ_H(y, {x, y}) = u(y)/(1 + u(y)).

Pick two other alternatives z and t distinct from x and y, and let S = {x, y, z, t}, T = {x, y}, and T ⊆ S′ ⊆ S. We have

    Γ_xy(S′, T | ρ_AI, ρ_H) = ρ_AI(x, S′)ρ_H(y, T) − ρ_AI(y, S′)ρ_H(x, T)
                            = ρ_AI(x, S′)·u(y)/(1 + u(y)) − ρ_AI(y, S′)·1/(1 + u(y))
                            = [ρ_AI(x, S′)u(y) − ρ_AI(y, S′)] / (1 + u(y)).

Since ρ_AI is observed, this is an equation in terms of one unknown, u(y). There are two cases to consider.

Case 1: ρ_AI(x, S′)u(y) ≠ ρ_AI(y, S′) for all S′ with T ⊆ S′ ⊆ S. This ensures that Γ_xy(S′, T | ρ_AI, ρ_H) ≠ 0. Utilizing Proposition 4 with ρ = ρ_H and canceling the common (1 + u(y)) terms, we get

    1/[ρ_AI(x, S)u(y) − ρ_AI(y, S)] + 1/[ρ_AI(x, T)u(y) − ρ_AI(y, T)] = 1/[ρ_AI(x, S \ t)u(y) − ρ_AI(y, S \ t)] + 1/[ρ_AI(x, S \ z)u(y) − ρ_AI(y, S \ z)].

Cross-multiplying, we get a cubic polynomial in terms of the unknown u(y). Normalizing v(x) = 1 and re-deriving Proposition 4 with ρ = ρ_A instead of ρ_H, we deduce that v(y) must also satisfy the same polynomial, provided that ρ_AI(x, S′)v(y) ≠ ρ_AI(y, S′) for all S′ with T ⊆ S′ ⊆ S.

Case 2: ρ_AI(x, S′)u(y) = ρ_AI(y, S′) for some S′ with T ⊆ S′ ⊆ S. Equivalently,

    ρ_AI(x, S′)/ρ_AI(y, S′) = u(x)/u(y) = ρ_H(x, S′)/ρ_H(y, S′),

where the first equality is due to u(x) = 1. Since we are assuming ρ_AI violates IIA, we cannot have α = 1 by Proposition 1. Therefore, the above equality is possible only if

    ρ_H(x, S′)/ρ_H(y, S′) = ρ_A(x, S′)/ρ_A(y, S′)  ⟹  u(x)/u(y) = v(x)/v(y).

But then the ratio ρ_AI(y, ·)/ρ_AI(x, ·) must be constant and equal to u(y) for all menus. Normalizing v(x) = 1, we also get u(y) = v(y). Note that in this case the polynomial formed by cross-multiplying the equation in Case 1 will either yield the utilities u(y) and v(y) as a unique root or the polynomial will be identically zero. If the polynomial is identically zero, then we can generically infer that we are in Case 2, which trivially recovers u(y) and v(y) as ρ_AI(y, {x, y})/ρ_AI(x, {x, y}).²

Proposition 5 (Identification of u(y) and v(y)). Suppose ρ_AI is consistent with LAM with (u, v, α) such that u(x) = v(x) = 1, and suppose ρ_AI violates IIA. For any y ≠ x, let P(κ_y) be the cubic polynomial obtained by cross-multiplying the equation

    1/[ρ_AI(x, S)κ_y − ρ_AI(y, S)] + 1/[ρ_AI(x, T)κ_y − ρ_AI(y, T)] = 1/[ρ_AI(x, S \ t)κ_y − ρ_AI(y, S \ t)] + 1/[ρ_AI(x, S \ z)κ_y − ρ_AI(y, S \ z)],    (3)

where S = {x, y, z, t}, T = {x, y}, and z, t are two alternatives distinct from x, y. If P(κ_y) is not identically zero, then u(y) and v(y) are both roots of P(κ_y) and admissible solutions to equation (3). Otherwise, u(y) = v(y) = ρ_AI(y, {x, y})/ρ_AI(x, {x, y}) holds generically.

Proof. The proof follows from the arguments preceding the proposition. ∎

² A result is said to hold generically if it fails only on a measure-zero subset of the underlying parameter space.
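The construction in Proposition 5 is mechanical; a minimal sketch (the helper name candidate_utilities is my own, and the admissibility filter implements the requirement that a root actually solve equation (3)):

```python
import numpy as np

def candidate_utilities(rho_AI, x, y, z, t, tol=1e-9):
    """Real, admissible roots of the cubic P(kappa_y) from equation (3);
    rho_AI maps (alternative, menu-as-frozenset) to a choice probability."""
    menus = [frozenset({x, y, z, t}), frozenset({x, y}),
             frozenset({x, y, z}), frozenset({x, y, t})]
    # Linear factors L_i(kappa) = rho_AI(x, S_i)*kappa - rho_AI(y, S_i).
    L = [np.poly1d([rho_AI[(x, S)], -rho_AI[(y, S)]]) for S in menus]
    # Cross-multiplied form of (3): L2*L3*L4 + L1*L3*L4 - L1*L2*L4 - L1*L2*L3 = 0.
    P = L[1]*L[2]*L[3] + L[0]*L[2]*L[3] - L[0]*L[1]*L[3] - L[0]*L[1]*L[2]
    if max(abs(c) for c in P.coeffs) < tol:
        return None  # identically zero: Case 2, u(y) = v(y) = ratio on {x, y}
    real = [r.real for r in P.roots if abs(r.imag) < tol]
    # Drop roots that make a denominator of (3) vanish (not admissible).
    return [r for r in real if all(abs(Li(r)) > tol for Li in L)]
```

On the data of Example 2 below, applying this to y (with reference pair z, t) returns the three roots 2, 263/196, and 4/5, while for t the spurious root κ_t = 2 is filtered out because it makes two of the linear factors vanish.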
There are two important points to consider regarding this result. First, while the model has two unknown utility values u(y) and v(y) for each alternative y, the derived cubic polynomial P(κ_y) generically has three distinct roots. Hence, solving the polynomial may yield a spurious root that is not a true utility value. However, since equation (3) must hold for any reference pair z and t distinct from x and y, and the spurious root will typically vary depending on the chosen reference pair, if the analyst has access to a fifth alternative, re-deriving the polynomial using a different reference pair will generically isolate the true utility values. Thus, as long as |X| ≥ 5, both u(y) and v(y) are generically identified up to scale normalization and label swaps. In addition, as detailed in Step 2, this identification procedure can be improved to require only |X| ≥ 4.

Second, note that successfully identifying the true candidate pair {u(y), v(y)} for each alternative y ∈ X does not fully pin down the utility functions u and v. To illustrate, consider three alternatives x, y, z and normalize u(x) = v(x) = 1. Suppose we have recovered candidate utility pairs {κ¹_y, κ²_y} and {κ¹_z, κ²_z}. Since u(y) and u(z) can be either of these utility values, this leaves us with four candidate utility functions u: (1, κ¹_y, κ¹_z), (1, κ¹_y, κ²_z), (1, κ²_y, κ¹_z), or (1, κ²_y, κ²_z). Generalizing this insight, for |X| = N, identifying candidate utility pairs for each alternative still leaves us with 2^{N−1} candidate utility functions. To resolve this problem, we first need to recover the compliance parameter α, as illustrated in the next step.

Step 2: Recover α from ρ_AI. Following Step 1, suppose we have a candidate utility pair {u(y), v(y)} for each alternative y ∈ X \ {x} and assume u(x) = v(x) = 1. Construct the associated Luce choice probabilities ρ_u(x, {x, y}) and ρ_v(x, {x, y}) for each y ≠ x. If {u(y), v(y)} is the true utility pair up to a label swap, then the observed AI stochastic choice function ρ_AI must satisfy one of the following equations:

    ρ_AI(x, {x, y}) = α·ρ_u(x, {x, y}) + (1 − α)·ρ_v(x, {x, y}),
    ρ_AI(x, {x, y}) = (1 − α)·ρ_u(x, {x, y}) + α·ρ_v(x, {x, y}),

where α is the compliance parameter. Hence, for each alternative y ≠ x and each candidate utility pair, we get two possible candidates for the compliance parameter. For the true utility pair, this identifies the compliance parameter up to reflection about 1/2.

Step 1 generically recovers the true utility pair for an alternative when |X| ≥ 5. Alternatively, suppose we have three candidate utility pairs for an alternative after solving the cubic polynomial. Note that any candidate utility pair for an alternative that is not the true utility pair will imply a compliance parameter that will not be generically validated by the candidate utility pairs for other alternatives. Hence, by ensuring the consistency of the implied compliance parameter across different alternatives, we can identify the true utility pair. Adopting this approach, we only need |X| ≥ 4, which improves upon the procedure in Step 1. Lastly, note that while the compliance parameter is identified up to reflection about 1/2, the distribution over utilities is generically uniquely identified.
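Solving the binary-menu equation above for α is immediate; a short sketch (the helper name implied_alpha is my own, and the numbers reappear in Example 2 below):

```python
def implied_alpha(p_xy, kappa1, kappa2):
    """Compliance candidates implied by a utility pair {kappa1, kappa2} for y,
    where p_xy = rho_AI(x, {x, y}) and u(x) = v(x) = 1. The two label
    assignments give the pair {a, 1 - a}; assumes kappa1 != kappa2."""
    p_u, p_v = 1 / (1 + kappa1), 1 / (1 + kappa2)
    a = (p_xy - p_v) / (p_u - p_v)
    return a, 1 - a

# With rho_AI(x, {x, y}) = 7/18 and candidate pair {2, 4/5}:
print(implied_alpha(7/18, 2, 4/5))  # (0.75, 0.25), i.e. alpha in {3/4, 1/4}
```

Candidate pairs whose implied {α, 1 − α} fail to agree across alternatives are discarded, which is exactly the consistency filter described above.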
Step 3: Recover u and v from ρ_AI. Following Steps 1 and 2, for each alternative y ∈ X \ {x}, we can generically identify the true utility pair {u(y), v(y)} up to a label swap. However, as discussed in Step 1, identifying the true pair for each alternative does not by itself determine which value belongs to u and which belongs to v across alternatives. To resolve the remaining ambiguity, fix an alternative y ∈ X \ {x} and suppose we assign u(y) = κ¹_y. By Step 2, this assignment implies a candidate compliance parameter α_u. Now consider any other alternative z ∈ X \ {x, y} with the utility pair {κ¹_z, κ²_z}. Since the true utility function must generate the same compliance parameter across all alternatives, we can pin down the true assignment for u(z) by requiring consistency with α_u. Generically, unless α_u ∈ {0, 1/2, 1}, exactly one of the two candidate values for u(z) will be consistent with α_u. We can then repeat this procedure for all alternatives to recover the utility functions u and v up to a label swap.

Combining all the results in this section, we have the following identification result in the field setting.

Theorem 3 (Field Identification). Suppose ρ_AI is consistent with LAM and |X| ≥ 4.

1. If ρ_AI violates IIA, then (u, v, α) are generically identified up to a label swap and scale normalization.

2. If ρ_AI satisfies IIA, then either v = λu for some λ > 0 or α ∈ {0, 1}.

Proof. The proof follows from the three identification steps and the results established in this section. ∎

The next example illustrates the field identification result.

Example 2. Suppose X = {x, y, z, t} and the AI stochastic choice data ρ_AI is generated by the parameters

    u = (1, 2, 4, 5),   v = (1, 4/5, 2/5, 1/5),   α = 3/4,

as given in the following table.

    ρ_AI(·,·)  {x,y,z,t}  {x,y,z}  {x,y,t}  {x,z,t}  {y,z,t}  {x,y}  {x,z}  {x,t}  {y,z}  {y,t}  {z,t}
    x          1/6        17/77    7/32     37/160   –        7/18   23/70  1/3    –      –      –
    y          5/24       47/154   23/80    –        43/154   11/18  –      –      5/12   29/70  –
    z          7/24       73/154   –        29/80    53/154   –      47/70  –      7/12   –      1/2
    t          1/3        –        79/160   13/32    29/77    –      –      2/3    –      41/70  1/2

    Table 3: AI stochastic choice data in Example 2

To proceed with identification, we first normalize u(x) = v(x) = 1. Consider the alternative y. Equation (3) corresponding to y with S = {x, y, z, t} and T = {x, y} is given by

    1/((1/6)κ_y − 5/24) + 1/((7/18)κ_y − 11/18) = 1/((17/77)κ_y − 47/154) + 1/((7/32)κ_y − 23/80),

which simplifies to

    24/(4κ_y − 5) + 18/(7κ_y − 11) = 154/(34κ_y − 47) + 160/(35κ_y − 46).

Cross-multiplying yields a cubic polynomial in κ_y with roots κ¹_y = 2, κ²_y = 4/5, and κ³_y = 263/196. Thus, there are three possible utility pairs: {2, 4/5}, {2, 263/196}, {4/5, 263/196}. For each utility pair, we can use the equation

    ρ_AI(x, {x, y}) = α·ρ_u(x, {x, y}) + (1 − α)·ρ_v(x, {x, y})

to recover the implied compliance parameter up to reflection about 1/2. This yields the following table:

    Utility pair {u(y), v(y)}    Implied α              Feasibility
    {2, 4/5}                     {3/4, 1/4}             ✓
    {2, 263/196}                 {35/86, 51/86}         ✓
    {4/5, 263/196}               {−35/118, 153/118}     ×

At this stage, there are two feasible candidate utility pairs for y, inducing two feasible values of α up to reflection about 1/2. We can eliminate one of them by considering the alternative z or t. Repeating the procedure for the alternative z with T = {x, z} gives

    24/(4κ_z − 7) + 70/(23κ_z − 47) = 154/(34κ_z − 73) + 160/(37κ_z − 58),

with roots κ¹_z = 4, κ²_z = 2/5, and κ³_z = 481/244. The implied values of α are:

    Utility pair {u(z), v(z)}    Implied α              Feasibility
    {4, 2/5}                     {3/4, 1/4}             ✓
    {4, 481/244}                 {9/154, 145/154}       ✓
    {2/5, 481/244}               {−3/142, 145/142}      ×

For the alternative t with T = {x, t}, the corresponding equation is

    6/(κ_t − 2) + 3/(κ_t − 2) = 160/(35κ_t − 79) + 160/(37κ_t − 65).
Repeating the procedure for the alternative z with T = {x, z} gives
\[
\frac{24}{4\kappa_z - 7} + \frac{70}{23\kappa_z - 47} = \frac{154}{34\kappa_z - 73} + \frac{160}{37\kappa_z - 58},
\]
with roots κ_z^1 = 4, κ_z^2 = 2/5, and κ_z^3 = 481/244. The implied values of α are:

Utility pair {u(z), v(z)}    Implied α              Feasibility
{4, 2/5}                     {3/4, 1/4}             ✓
{4, 481/244}                 {9/154, 145/154}       ✓
{2/5, 481/244}               {−3/142, 145/142}      ×

For the alternative t with T = {x, t}, the corresponding equation is
\[
\frac{6}{\kappa_t - 2} + \frac{3}{\kappa_t - 2} = \frac{160}{35\kappa_t - 79} + \frac{160}{37\kappa_t - 65}.
\]
Cross-multiplying and solving the resulting cubic polynomial gives the candidate roots κ_t^1 = 5, κ_t^2 = 1/5, and κ_t^3 = 2. However, κ_t = 2 is not a valid solution of the original equation, since it makes the left-hand side undefined. Hence, the admissible roots are κ_t^1 = 5 and κ_t^2 = 1/5, which yields:

Utility pair {u(t), v(t)}    Implied α              Feasibility
{5, 1/5}                     {3/4, 1/4}             ✓

The only feasible value of α consistent across all three alternatives is {3/4, 1/4}. This uniquely recovers u = (1, 2, 4, 5) and v = (1, 4/5, 2/5, 1/5) up to the label swap and scale normalization.
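As a final check, the recovered parameters reproduce the data in Table 3 exactly. The sketch below (an illustration; the helper names luce and rho_ai are ad hoc) regenerates the choice probabilities as the α-mixture of the two Luce rules and spot-checks several entries.

```python
from fractions import Fraction
from itertools import combinations

# Recovered parameters from Example 2 (alternatives x, y, z, t)
u = {'x': Fraction(1), 'y': Fraction(2), 'z': Fraction(4), 't': Fraction(5)}
v = {'x': Fraction(1), 'y': Fraction(4, 5), 'z': Fraction(2, 5), 't': Fraction(1, 5)}
alpha = Fraction(3, 4)

def luce(w, a, menu):
    """Luce choice probability of alternative a from menu under weights w."""
    return w[a] / sum(w[b] for b in menu)

def rho_ai(a, menu):
    """LAM choice probability: an alpha-mixture of the two Luce rules."""
    return alpha * luce(u, a, menu) + (1 - alpha) * luce(v, a, menu)

# Regenerate Table 3: every menu with at least two alternatives
for size in (4, 3, 2):
    for menu in combinations('xyzt', size):
        print(menu, {a: rho_ai(a, menu) for a in menu})

# Spot checks against Table 3
assert rho_ai('x', ('x', 'y', 'z', 't')) == Fraction(1, 6)
assert rho_ai('y', ('x', 'y', 'z')) == Fraction(47, 154)
assert rho_ai('t', ('x', 't')) == Fraction(2, 3)
```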
5 Conclusion

This paper considers a delegated choice environment where an AI agent is instructed to act on behalf of a human principal. A central concern in this environment is the potential misalignment between the AI's and the human principal's preferences. To study this problem using revealed preference techniques, I introduce the Luce Alignment Model, where the AI agent balances deference to the principal's preferences against pursuit of its own. The model makes it possible to separately identify two conceptually distinct dimensions of AI behavior: alignment, which captures the similarity between the human's and the AI's preferences, and compliance, which captures the extent to which the AI defers to the human principal.

I study the identification problem in two settings. In the laboratory setting, where both the AI's and the human principal's stochastic choices are observed, I show that violations of the Independence of Irrelevant Alternatives in the AI's choice data allow the analyst to recover both utility functions and obtain a closed-form expression for the compliance parameter. I also provide an axiomatic characterization of the model in this setting. In the field setting, where only the AI's choices are observed, a fundamental symmetry prevents an analyst from determining which recovered utility belongs to the human and which to the AI. Nevertheless, I show that when there are at least four alternatives, the underlying distribution over utilities is generically identified up to this label swap, which is sufficient to recover the degree of misalignment.