Learning to Order Things



Journal of Artiial In telligene Resear h 10 (1999) 243-270 Submitted 10/98; published 5/99 Learning to Order Things William W. Cohen w ohenresear h.a tt.om Rob ert E. S hapire shapireresear h.a tt.om Y oram Singer singerresear h.a tt.om A T&T L abs, Shannon L ab or atory, 180 Park A venue Florham Park, NJ 07932, USA Abstrat There are man y appliations in whi h it is desirable to order rather than lassify in- stanes. Here w e onsider the problem of learning ho w to order instanes giv en feedba k in the form of preferene judgmen ts, i.e. , statemen ts to the eet that one instane should b e rank ed ahead of another. W e outline a t w o-stage approa h in whi h one rst learns b y on v en tional means a binary pr efer en e funtion indiating whether it is advisable to rank one instane b efore another. Here w e onsider an on-line algorithm for learning preferene funtions that is based on F reund and S hapire's \Hedge " algorithm. In the seond stage, new instanes are ordered so as to maximize agreemen t with the learned preferene fun- tion. W e sho w that the problem of nding the ordering that agrees b est with a learned preferene funtion is NP-omplete. Nev ertheless, w e desrib e simple greedy algorithms that are guaran teed to nd a go o d appro ximation. Finally , w e sho w ho w metasear h an b e form ulated as an ordering problem, and presen t exp erimen tal results on learning a om- bination of \sear h exp erts," ea h of whi h is a domain-sp ei query expansion strategy for a w eb sear h engine. 1. In tro dution W ork in indutiv e learning has mostly onen trated on learning to lassify . Ho w ev er, there are man y appliations in whi h it is desirable to order rather than lassify instanes. An example migh t b e a p ersonalized email lter that prioritizes unread mail. 
Here w e will onsider the problem of learning ho w to onstrut su h orderings giv en feedba k in the form of pr efer en e judgments , i.e. , statemen ts that one instane should b e rank ed ahead of another. Su h orderings ould b e onstruted based on a learned probabilisti lassier or regres- sion mo del and in fat often are. F or instane, it is ommon pratie in information retriev al to rank do umen ts aording to their probabilit y of relev ane to a query , as estimated b y a learned lassier for the onept \relev an t do umen t." An adv an tage of learning orderings diretly is that preferene judgmen ts an b e m u h easier to obtain than the lab els required for lassiation learning. F or instane, in the email appliation men tioned ab o v e, one approa h migh t b e to rank messages aording to their estimated probabilit y of mem b ership in the lass of \urgen t" messages, or b y some n umerial estimate of urgeny obtained b y regression. Supp ose, ho w ev er, that a user is presen ted with an ordered list of email messages, and elets to read the third message rst. Giv en this eletion, it is not neessarily the ase that message three is urgen t, nor is there suÆien t information to estimate an y n umerial urgeny measures.   1999 AI Aess F oundation and Morgan Kaufmann Publishers. All righ ts reserv ed. Cohen, Shapire, & Singer Ho w ev er, it seems quite reasonable to infer that message three should ha v e b een rank ed ahead of the others. Th us, in this setting, obtaining preferene information ma y b e easier and more natural than obtaining the lab els needed for a lassiation or regression approa h. Another appliation domain that requires ordering instanes is  ol lab or ative ltering ; see, for instane, the pap ers on tained in Resni k and V arian (1997). In a t ypial ollab orativ e ltering task, a user seeks reommendations, sa y , on mo vies that she is lik ely to enjo y . 
Su h reommendations are usually expressed as ordered lists of reommended mo vies, pro dued b y om bining mo vie ratings supplied b y other users. Notie that ea h user's mo vie ratings an b e view ed as a set of preferene judgemen ts. In fat, in terpreting ratings as preferenes is adv an tageous in sev eral w a ys: for instane, it is not neessary to assume that a rating of \7" means the same thing to ev ery user. In the remainder of this pap er, w e will in v estigate the follo wing t w o-stage approa h to learning ho w to order. In stage one, w e learn a pr efer en e funtion , a t w o-argumen t funtion PREF( u; v ) whi h returns a n umerial measure of ho w ertain it is that u should b e rank ed b efore v . In stage t w o, w e use the learned preferene funtion to order a set of new instanes X ; to aomplish this, w e ev aluate the learned funtion PREF( u; v ) on all pairs of instanes u; v 2 X , and  ho ose an ordering of X that agrees, as m u h as p ossible, with these pairwise preferene judgmen ts. F or stage one, w e desrib e a sp ei algorithm for learning a preferene funtion from a set of \ranking-exp erts". The algorithm is an on-line w eigh t allo ation algorithm, m u h lik e the w eigh ted ma jorit y algorithm (Littlestone & W arm uth, 1994) and Winno w (Littlestone, 1988), and, more diretly , F reund and S hapire's (1997) \Hedge" algorithm. F or stage t w o, w e sho w that nding a total order that agrees b est with su h a preferene funtion is NP- omplete. Nev ertheless, w e sho w that there are eÆien t greedy algorithms that alw a ys nd a go o d appro ximation to the b est ordering. W e then presen t some exp erimen tal results in whi h these algorithm are used to om bine the results of sev eral \sear h exp erts," ea h of whi h is a domain-sp ei query expansion strategy for a w eb sear h engine. Sine our w ork tou hes sev eral dieren t elds w e defer the disussion of related w ork to Se. 6. 2. 
2. Preliminaries

Let X be a set of instances. For simplicity, in this paper, we always assume that X is finite. A preference function PREF is a binary function PREF : X × X → [0,1]. A value of PREF(u,v) which is close to 1 (respectively 0) is interpreted as a strong recommendation that u should be ranked above (respectively, below) v. A value close to 1/2 is interpreted as an abstention from making a recommendation. As noted earlier, the hypothesis of our learning system will be a preference function, and new instances will be ranked so as to agree as much as possible with the preferences predicted by this hypothesis.

In standard classification learning, a hypothesis is constructed by combining primitive features. Similarly, in this paper, a preference function will be a combination of primitive preference functions. In particular, we will typically assume the availability of a set of N primitive preference functions R_1, ..., R_N. These can then be combined in the usual ways, for instance with a boolean or linear combination of their values. We will be especially interested in the latter combination method.

Figure 1 (graphs omitted): Left and middle: Two ordering functions, f (with f(a)=1, f(b)=2, f(c)=0, f(d)=⊥) and g (with g(a)=0, g(b)=2, g(c)=1, g(d)=2), and their graph representation. Right: The graph representation of the preference function created by a weighted (1/4 and 3/4) combination of the two functions. Edges with weight of 1/2 or 0 are omitted.

It is convenient to assume that the R_i's are well-formed in certain ways. To this end, we introduce a special kind of preference function called a rank ordering, which is defined by an ordering function. Let S be a totally ordered set. We assume without loss of generality that S ⊆ ℝ. An ordering function into S is any function f : X →
S, where we interpret an inequality f(u) > f(v) to mean that u is ranked above v by f. It is sometimes convenient to allow an ordering function to "abstain" and not give a preference for a pair u, v. We therefore allow S to include a special symbol ⊥ not in ℝ, and we interpret f(u) = ⊥ to mean that u is "unranked." We define the symbol ⊥ to be incomparable to all the elements in S (that is, ⊥ ≮ s and s ≮ ⊥ for all s ∈ S).

An ordering function f induces the preference function R_f, defined as

R_f(u,v) = 1 if f(u) > f(v); 0 if f(u) < f(v); 1/2 otherwise.

We call R_f a rank ordering for X into S. If R_f(u,v) = 1, then we say that u is preferred to v, or u is ranked higher than v. Note that R_f(u,v) = 1/2 if either u or v (or both) is unranked.

We will sometimes describe and manipulate preference functions as directed weighted graphs. The nodes of a graph correspond to the instances in X. Each pair (u,v) is connected by a directed edge with weight PREF(u,v). Since an ordering function f induces a preference function R_f, we can also describe ordering functions as graphs. In Fig. 1 we give an example of two ordering functions and their corresponding graphs. For brevity, we do not draw edges (u,v) such that PREF(u,v) = 1/2 or PREF(u,v) = 0.

To give a concrete example of rank orderings, imagine learning to order documents based on the words that they contain. To model this, let X be the set of all documents in a repository, and for N words w_1, ..., w_N, let f_i(u) be the number of occurrences of word w_i in document u. Then R_{f_i} will prefer u to v whenever w_i occurs more often in u than in v. As a second example, consider a metasearch application in which the goal is to combine the rankings of several web search engines on some fixed query.
F or N sear h engines e 1 ; : : : ; e N , one migh t dene f i so that R f i prefers w eb page u to w eb page v whenev er u is rank ed ahead of v in the list L i pro dued b y the orresp onding sear h engine. T o do this, one ould let f i ( u ) =  k for the w eb page u app earing in the k -th p osition in the list L i , and let f i ( u ) =  M (where M > j L i j ) for an y w eb page u not app earing in L i . F eedba k from the user will b e represen ted in a similar but more general w a y . W e will assume that feedba k is a set elemen t pairs ( u; v ), ea h represen ting an assertion of the form \ u should b e preferred to v ." This denition of feedba k is less restrited than ordering funtions. In partiular, w e will not assume that the feedba k is onsisten t|yles, su h as a > b > a , will b e allo w ed. 3. Learning a Com bination of Ordering F untions In this setion, w e onsider the problem of learning a go o d linear om bination of a set of ordering funtions. Sp eially , w e assume aess to a set of r anking exp erts , ea h of whi h generates an ordering funtion when pro vided with a set of instanes. F or instane, in a metasear h problem, ea h ranking exp ert migh t b e a funtion that submits the user's query to a dieren t sear h engine; the domain of instanes migh t b e the set of all w eb pages returned b y an y of the ranking exp erts; and the ordering funtion asso iated with ea h ranking exp ert migh t b e represen ted as in the example ab o v e ( i.e. , letting f i ( u ) =  k for the k -the w eb page u returned b y i -th sear h engine, and letting f i ( u ) =  M for an y w eb page u not retriev ed b y the i -th sear h engine). The user's feedba k will b e a set of pairwise preferenes b et w een w eb pages. 
This feedba k ma y b e obtained diretly , for example, b y asking the user to expliitly rank the URL's returned b y the sear h engine; or the feedba k ma y b e obtained indiretly , for example, b y measuring the time sp en t viewing ea h of the returned pages. W e note that for the metasear h problem, an approa h that w orks diretly with the n umerial sores asso iated with the dieren t sear h engines migh t not b e feasible; these n umerial sores migh t not b e omparable aross dieren t sear h engines, or migh t not b e pro vided b y all sear h engines. Another problem is that most w eb pages will not b e indexed b y all sear h engines. This an b e easily mo deled in our setting: rather than letting f i ( u ) =  M for a w eb page u that is not rank ed b y sear h engine i , one ould let f i ( u ) = ? . This orresp onds to the assumption that the sear h engine's preferene for u relativ e to rank ed w eb pages is unkno wn. W e no w desrib e a w eigh t allo ation algorithm that uses the preferene funtions R i to learn a preferene funtion of the form PREF( u; v ) = P N i =1 w i R i ( u; v ). W e adopt the on-line learning framew ork rst studied b y Littlestone (1988) in whi h the w eigh t w i assigned to ea h ranking exp ert i is up dated inremen tally . F ormally , learning is assumed to tak e plae in a sequene of rounds. On ea h round t , w e assume the learning algorithm is pro vided with a set X t of instanes to b e rank ed, for whi h ea h ranking exp ert i 2 f 1 ; : : : ; N g pro vides an ordering funtion f t i . (In metasear h, for instane, f t i is the ordering funtion asso iated with the list L t i of w eb pages returned b y the i -th ranking exp ert for the t -th query , and X t is the set of all w eb pages that app ear in an y of the lists L t 1 ; : : : ; L t N .) Ea h ordering funtion f t i indues a preferene funtion R f t i , whi h w e denote for brevit y b y R t i . 
The learner ma y ompute R t i ( u; v ) for an y and all preferene 246 Learning to Order Things funtions R t i and pairs u; v 2 X t b efore pro duing a om bined preferene funtion PREF t , whi h is then used to pro due an ordering ^  t of X t . (Metho ds for pro duing an ordering from a preferene funtion will b e disussed b elo w.) After pro duing the ordering ^  t , the learner reeiv es feedba k from the en vironmen t. W e assume that the feedba k is an arbitrary set of assertions of the form \ u should b e preferred to v ." That is, the feedba k on the t -th round is a set F t of pairs ( u; v ). The algorithm w e prop ose for this problem is based on the \w eigh ted ma jorit y algo- rithm" of Littlestone and W arm uth (1994) and, more diretly , on F reund and S hapire's (1997) \Hedge" algorithm. W e dene the loss of a preferene funtion R with resp et to the user's feedba k F as Loss( R ; F ) = P ( u;v ) 2 F (1  R ( u; v )) j F j = 1  1 j F j X ( u;v ) 2 F R ( u; v ) : (1) This loss has a natural probabilisti in terpretation. If R is view ed as a randomized predition algorithm that predits that u will preede v with probabilit y R ( u; v ), then Loss( R ; F ) is the probabilit y of R disagreeing with the feedba k on a pair ( u; v )  hosen uniformly at random from F . It is w orth noting that the assumption on the form of the feedba k an b e further relaxed b y allo wing the user to indiate the degree to whi h she prefers u o v er v . In this ase, the loss should b e normalized b y the w eigh ted sum of feedba k pairs. Sine this generalization is rather straigh tforw ard, w e assume for brevit y that the feedba k is an un w eigh ted set of assertions o v er elemen t pairs. W e no w an use the Hedge algorithm almost v erbatim, as sho wn in Figure 2. The algorithm main tains a p ositiv e w eigh t v etor whose v alue at time t is denoted b y w t = ( w t 1 ; : : : ; w t N ). 
If there is no prior knowledge about the ranking experts, we set all initial weights to be equal so that w_i^1 = 1/N. On each round t, the weight vector w^t is used to combine the preference functions of the different experts to obtain the preference function PREF^t(u,v) = Σ_{i=1}^{N} w_i^t R_i^t(u,v). This preference function is next converted into an ordering ρ̂^t on the current set of elements X^t. For the purposes of this section, the method of producing an ordering is immaterial; in particular, any of the methods described in Sec. 4 could be used here. Based on this ordering, the user provides feedback F^t, and the loss for each preference function, Loss(R_i^t, F^t), is evaluated as in Eq. (1). Finally, the weight vector w^t is updated using the multiplicative rule

w_i^{t+1} = w_i^t · β^{Loss(R_i^t, F^t)} / Z^t

where β ∈ [0,1] is a parameter, and Z^t is a normalization constant, chosen so that the weights sum to one after the update. Thus, in each round, the weights are adjusted so that the weights of experts producing preference functions with relatively large agreement with the feedback are increased.

We now give the theoretical rationale behind this algorithm. Freund and Schapire (1997) prove general results about Hedge which can be applied directly to this loss function. Their results imply almost immediately a bound on the cumulative loss of the preference function PREF^t in terms of the loss of the best ranking expert, specifically:

Allocate Weights for Ranking Experts

Parameters: β ∈ [0,1]; initial weight vector w^1 ∈ [0,1]^N with Σ_{i=1}^{N} w_i^1 = 1; N ranking experts; number of rounds T.

Do for t = 1, 2, ..., T:

1. Receive a set of elements X^t and ordering functions f_1^t, ..., f_N^t. Let R_i^t denote the preference function induced by f_i^t.

2.
Compute a total order ρ̂^t which approximates

PREF^t(u,v) = Σ_{i=1}^{N} w_i^t R_i^t(u,v).

(Sec. 4 describes several ways of approximating a preference function with a total order.)

3. Order X^t using ρ̂^t.

4. Receive feedback F^t from the user.

5. Evaluate losses Loss(R_i^t, F^t) as defined in Eq. (1).

6. Set the new weight vector

w_i^{t+1} = w_i^t · β^{Loss(R_i^t, F^t)} / Z^t

where Z^t is a normalization constant, chosen so that Σ_{i=1}^{N} w_i^{t+1} = 1.

Figure 2: The on-line weight allocation algorithm.

Theorem 1 For the algorithm of Fig. 2,

Σ_{t=1}^{T} Loss(PREF^t, F^t) ≤ a_β min_i Σ_{t=1}^{T} Loss(R_i^t, F^t) + c_β ln N

where a_β = ln(1/β)/(1−β) and c_β = 1/(1−β).

Note that Σ_t Loss(PREF^t, F^t) is the cumulative loss of the combined preference functions PREF^t, and Σ_t Loss(R_i^t, F^t) is the cumulative loss of the i-th ranking expert. Thus, Theorem 1 states that the cumulative loss of the combined preference functions will not be much worse than that of the best ranking expert.

Proof: We have that

Loss(PREF^t, F^t) = 1 − (1/|F^t|) Σ_{(u,v)∈F^t} Σ_i w_i^t R_i^t(u,v)
= Σ_i w_i^t ( 1 − (1/|F^t|) Σ_{(u,v)∈F^t} R_i^t(u,v) )
= Σ_i w_i^t Loss(R_i^t, F^t).

Therefore, by Freund and Schapire's (1997) Theorem 2,

Σ_{t=1}^{T} Loss(PREF^t, F^t) = Σ_{t=1}^{T} Σ_i w_i^t Loss(R_i^t, F^t) ≤ a_β min_i Σ_{t=1}^{T} Loss(R_i^t, F^t) + c_β ln N. □

Of course, we are not interested in the loss of PREF^t (since it is not an ordering), but rather in the performance of the actual ordering ρ̂^t computed by the learning algorithm. Fortunately, the losses of these can be related using a kind of triangle inequality.
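The multiplicative update of step 6 of Figure 2 is easy to state in code. A minimal sketch (assumed helper names, not the authors' code):

```python
# One round of the weight update of Fig. 2, step 6:
# w_i <- w_i * beta**Loss(R_i, F) / Z, with Z chosen so the weights sum to one.
def hedge_update(weights, losses, beta=0.5):
    """Multiplicatively downweight experts in proportion to their loss."""
    new = [w * beta ** L for w, L in zip(weights, losses)]
    Z = sum(new)                 # normalization constant Z^t
    return [w / Z for w in new]

# Two ranking experts with uniform initial weights w^1_i = 1/N.
w = [0.5, 0.5]
# Suppose expert 1 agreed with all feedback (loss 0) and expert 2 with none (loss 1):
w = hedge_update(w, [0.0, 1.0], beta=0.5)
print(w)  # expert 1's weight grows to 2/3, expert 2's shrinks to 1/3
```

With beta close to 1 the update is conservative; smaller beta shifts weight toward low-loss experts more aggressively, which is the trade-off governed by a_β and c_β in Theorem 1.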
Let

DISAGREE(ρ, PREF) = Σ_{u,v: ρ(u)>ρ(v)} (1 − PREF(u,v)).   (2)

Theorem 2 For any PREF, F, and total order defined by an ordering function ρ,

Loss(R_ρ, F) ≤ DISAGREE(ρ, PREF)/|F| + Loss(PREF, F).   (3)

Proof: For x, y ∈ [0,1], let us define d(x,y) = x(1−y) + y(1−x). We now show that d satisfies the triangle inequality. Let x, y, and z be in [0,1], and let X, Y, and Z be independent Bernoulli ({0,1}-valued) random variables with probability of outcome 1 equal to x, y, and z, respectively. Then

d(x,z) = Pr[X ≠ Z] = Pr[(X ≠ Y ∧ Y = Z) ∨ (X = Y ∧ Y ≠ Z)]
≤ Pr[X ≠ Y ∨ Y ≠ Z] ≤ Pr[X ≠ Y] + Pr[Y ≠ Z] = d(x,y) + d(y,z).

For [0,1]-valued functions f, g defined on X × X, we next define

D(f,g) = Σ_{u,v: u≠v} d(f(u,v), g(u,v)).

Clearly, D also satisfies the triangle inequality. Let χ_F be the characteristic function of F, so that χ_F : X × X → {0,1} and χ_F(u,v) = 1 if and only if (u,v) ∈ F. Then from the definitions of Loss and DISAGREE, we have

|F| Loss(R_ρ, F) = D(R_ρ, χ_F) ≤ D(R_ρ, PREF) + D(PREF, χ_F)
= DISAGREE(ρ, PREF) + |F| Loss(PREF, F). □

Notice that the learning algorithm Hedge minimizes the second term on the right-hand side of Eq. (3). Below, we consider the problem of finding an ordering ρ which minimizes the first term, namely, DISAGREE.

4. Ordering Instances with a Preference Function

4.1 Measuring the Quality of an Ordering

We now consider the complexity of finding a total order that agrees best with a learned preference function. To analyze this, we must first quantify the notion of agreement between a preference function PREF and an ordering. One natural notion is the following: Let X be a set, PREF be a preference function, and let ρ be a total ordering of X, expressed again as an ordering function (i.e.
, ρ(u) > ρ(v) if and only if u is above v in the order). For the analysis of this section, it is convenient to use the measure AGREE(ρ, PREF), which is defined to be the sum of PREF(u,v) over all pairs u,v such that u is ranked above v by ρ:

AGREE(ρ, PREF) = Σ_{u,v: ρ(u)>ρ(v)} PREF(u,v).   (4)

Clearly, AGREE is a linear transformation of the measure DISAGREE introduced in Eq. (2), and hence maximizing AGREE is equivalent to minimizing DISAGREE. This definition is also closely related to similarity metrics used in decision theory and information processing (Kemeny & Snell, 1962; Fishburn, 1970; Roberts, 1979; French, 1989; Yao, 1995) (see the discussion in Sec. 6).

4.2 Finding an Optimal Ordering is Hard

Ideally one would like to find a ρ that maximizes AGREE(ρ, PREF). The general optimization problem is of little interest in our setting, since there are many constraints on the preference function that are imposed by the learning algorithm. Using the learning algorithm of Sec. 3, for instance, PREF will always be a linear combination of simpler functions. However, the theorem below shows that this optimization problem is NP-complete even if PREF is restricted to be a linear combination of well-behaved preference functions. In particular, the problem is NP-complete even if all the primitive preference functions used in the linear combination are rank orderings which map into a set S with only three elements, one of which may or may not be ⊥. (Clearly, if S consists of more than three elements then the problem is still hard.)

Theorem 3 The following decision problem is NP-complete for any set S with |S| ≥ 3:

Input: A rational number τ; a set X; a collection of N ordering functions f_i : X →
S; and a preference function PREF defined as

PREF(u,v) = Σ_{i=1}^{N} w_i R_{f_i}(u,v)   (5)

where w = (w_1, ..., w_N) is a rational weight vector in [0,1]^N with Σ_{i=1}^{N} w_i = 1.

Question: Does there exist a total order ρ such that AGREE(ρ, PREF) ≥ τ?

Proof: The problem is clearly in NP since a nondeterministic algorithm can guess a total order and check the weighted number of agreements in polynomial time.

To prove that the problem is NP-hard we reduce from CYCLIC-ORDERING (Galil & Megiddo, 1977; Garey & Johnson, 1979), defined as follows: "Given a set A and a collection C of ordered triples (a, b, c) of distinct elements from A, is there a one-to-one function f : A → {1, 2, ..., |A|} such that for each (a, b, c) ∈ C we have either f(a) > f(b) > f(c) or f(b) > f(c) > f(a) or f(c) > f(a) > f(b)?"

Without loss of generality, S is either {0, 1, ⊥} or {0, 1, 2}. We first show that the problem of finding an optimal total order is hard when S = {0, 1, ⊥}. Given an instance of CYCLIC-ORDERING, we let X = A. For each triplet t = (a, b, c) we will introduce three ordering functions f_{t,1}, f_{t,2}, and f_{t,3}, and define them so that f_{t,1}(a) > f_{t,1}(b), f_{t,2}(b) > f_{t,2}(c), and f_{t,3}(c) > f_{t,3}(a). To do this, we let f_{t,1}(a) = f_{t,2}(b) = f_{t,3}(c) = 1, f_{t,1}(b) = f_{t,2}(c) = f_{t,3}(a) = 0, and f_{t,i}(·) = ⊥ in all other cases. We let the weight vector be uniform, so that w_{t,i} = 1/(3|C|). Let

τ = 5/3 + ( |A|(|A|−1)/2 − 3 ) / 2.

Define R_t(u,v) = Σ_{i=1}^{3} w_{t,i} R_{f_{t,i}}(u,v), which is the contribution of these three functions to PREF(u,v). Notice that for any triplet t = (a, b, c) ∈ C, R_t(a,b) = 2/(3|C|) whereas R_t(b,a) = 1/(3|C|), and similarly for b,c and c,a.
In addition, for any pair u,v ∈ A such that at least one of them does not appear in t, we get that R_t(u,v) = 1/(2|C|). Since a total order ρ can satisfy at most two of the three conditions ρ(a) > ρ(b), ρ(b) > ρ(c), and ρ(c) > ρ(a), the largest possible weighted number of agreements associated with this triple is exactly 5/(3|C|). If the number of weighted agreements is at least τ, it must be exactly τ, by the argument above; and if there are exactly τ weighted agreements, then the total order must satisfy exactly 2 out of the possible 3 relations for each three elements that form a triplet from C. Thus, the constructed rank ordering instance will be positive if and only if the original CYCLIC-ORDERING instance is positive.

The case for S = {0, 1, 2} uses a similar construction; however, for each triplet t = (a, b, c), we define six ordering functions, f^j_{t,1}, f^j_{t,2}, and f^j_{t,3}, where j ∈ {0, 1}. The basic idea here is to replace each f_{t,i} with two functions, f^0_{t,i} and f^1_{t,i}, that agree on the single ordering constraint associated with f_{t,i}, but disagree on all other orderings. For instance, we will define these functions so that f^j_{t,1}(a) > f^j_{t,1}(b) for j = 0 and j = 1, but for all other pairs u,v, f^1_{t,1}(u) > f^1_{t,1}(v) iff f^0_{t,1}(v) > f^0_{t,1}(u). Averaging the two orderings f^0_{t,1} and f^1_{t,1} will thus yield the same preference expressed by the original function f_{t,1} (i.e., a preference for a > b only). In more detail, we let f^j_{t,1}(a) = f^j_{t,2}(b) = f^j_{t,3}(c) = 2 − j, f^j_{t,1}(b) = f^j_{t,2}(c) = f^j_{t,3}(a) = 1 − j, and f^j_{t,i}(·) = 2j in all other cases. We again let the weight vector be uniform, so that w^j_{t,i} = 1/(6|C|). Similar to the first case, we define R_t(u,v) = Σ_{i,j} w^j_{t,i} R_{f^j_{t,i}}(u,v).
It an b e v eried that R t is iden tial to the R t onstruted in the rst ase. Therefore, b y the same argumen t, the onstruted rank ordering instane will b e p ositiv e if and only if the original CYCLIC-ORDERING instane is p ositiv e. 2 Although this problem is hard when j S j  3, the next theorem sho ws that it b eomes tratable for linear om binations of rank orderings in to a set S of size t w o. Of ourse, when j S j = 2, the rank orderings are really only binary lassiers. The fat that this sp eial ase is tratable undersores the fat that manipulating orderings (ev en relativ ely simple 251 Cohen, Shapire, & Singer ones) an b e omputationally more diÆult than p erforming the orresp onding op erations on binary lassiers. Theorem 4 The fol lowing optimization pr oblem is solvable in line ar time: Input: A set X ; a set S with j S j = 2 ; a  ol le tion of N or dering funtions f i : X ! S ; and a pr efer en e funtion PREF dene d by Eq. (5). Output: A total or der dene d by an or dering funtion  whih maximizes A GREE ( ; PREF ) . Pro of: Assume without loss of generalit y that the t w o-elemen t set S is f 0 ; 1 g , and dene  ( u ) = P i w i f i ( u ). W e no w sho w that an y total order 1 onsisten t with  maximizes A GREE ( ; PREF). Fix a pair u; v 2 X and let q b 1 b 2 = X i s.t. f i ( u )= b 1 ;f i ( v )= b 2 w i : W e an no w rewrite  and PREF as  ( u ) = q 10 + q 11 PREF( u; v ) = q 10 + 1 2 q 11 + 1 2 q 00  ( v ) = q 01 + q 11 PREF( v ; u ) = q 01 + 1 2 q 11 + 1 2 q 00 : Note that b oth  ( u )   ( v ) and PREF( u; v )  PREF( v ; u ) are equal to q 10  q 01 . Hene, if  ( u ) >  ( v ) then PREF( u; v ) > PREF( v ; u ). Therefore, for ea h pair u; v 2 X , the order dened b y  agrees on al l pairs with the pairwise preferene dened b y PREF. In other w ords, w e ha v e sho wn that A GREE ( ; PREF ) = X f u;v g max f PREF ( u; v ) ; PREF ( v ; u ) g (6) where the sum is o v er all unordered pairs. 
Clearly, the right-hand side of Eq. (6) maximizes the right-hand side of Eq. (4), since at most one of (u,v) or (v,u) can be included in the latter sum. □

¹ Notice that in case of a tie, so that ρ(u) = ρ(v) for distinct u, v, ρ defines only a partial order. The theorem holds for any total order which is consistent with this partial order, i.e., for any ρ' such that ρ(u) > ρ(v) ⇒ ρ'(u) > ρ'(v).

4.3 Finding an Approximately Optimal Ordering

Theorem 3 implies that we are unlikely to find an efficient algorithm that finds the optimal total order for a weighted combination of rank orderings. Fortunately, there do exist efficient algorithms for finding an approximately optimal total order. In fact, finding a good total order is closely related to the problem of finding the minimum feedback arc set, for which there exist good approximation algorithms; see, for instance, Shmoys (1997) and the references therein. However, the algorithms that achieve the good approximation results for the minimum feedback arc set problem are based on (or further approximate) a linear-programming relaxation (Seymour, 1995; Even, Naor, Rao, & Schieber, 1996; Berger & Shor, 1997; Even, Naor, Schieber, & Sudan, 1998) which is rather complex to implement and quite slow in practice.

Algorithm Greedy-Order
Inputs: an instance set X; a preference function PREF
Output: an approximately optimal ordering function ρ̂

let V = X
for each v ∈ V do π(v) = Σ_{u∈V} PREF(v,u) − Σ_{u∈V} PREF(u,v)
while V is non-empty do
    let t = arg max_{u∈V} π(u)
    let ρ̂(t) = |V|
    V = V − {t}
    for each v ∈ V do π(v) = π(v) + PREF(t,v) − PREF(v,t)
endwhile

Figure 3: The greedy ordering algorithm.

We describe instead a simple greedy algorithm which is very easy to implement. Figure 3 summarizes the greedy algorithm.
As we will shortly demonstrate, this algorithm produces a good approximation to the best total order. The algorithm is easiest to describe by thinking of PREF as a directed weighted graph, where initially the set of vertices V is equal to the set of instances X, and each edge u → v has weight PREF(u,v). We assign to each vertex v ∈ V a potential value π(v), which is the weighted sum of the outgoing edges minus the weighted sum of the ingoing edges. That is,

π(v) = Σ_{u∈V} PREF(v,u) − Σ_{u∈V} PREF(u,v).

The greedy algorithm then picks some node t that has maximum potential², and assigns it a rank by setting ρ̂(t) = |V|, effectively ordering it ahead of all the remaining nodes. This node, together with all incident edges, is then deleted from the graph, and the potential values π of the remaining vertices are updated appropriately. This process is repeated until the graph is empty. Notice that nodes removed in subsequent iterations will have progressively smaller and smaller ranks.

As an example, consider the preference function defined by the leftmost graph of Fig. 4. (This graph is identical to the weighted combination of the two ordering functions from Fig. 1.) The initial potentials the algorithm assigns are: π(b) = 2, π(d) = 3/2, π(c) = −5/4, and π(a) = −9/4. Hence, b has maximal potential. It is given a rank of 4, and then node b and all incident edges are removed from the graph. The result is the middle graph of Fig. 4. After deleting b, the potentials of the remaining nodes are updated: π(d) = 3/2, π(c) = −1/4, and π(a) = −5/4. Thus, d will be assigned rank |V| = 3 and removed from the graph, resulting in the rightmost graph of Fig. 4. After updating potentials again, π(c) = 1/2 and π(a) = −1/2.
Now c will be assigned rank |V| = 2 and removed, resulting in a graph containing the single node a, which will finally be assigned the rank |V| = 1. The ordering produced by the greedy algorithm is thus b > d > c > a.

Footnote 2: Ties can be broken arbitrarily in case of two or more nodes with the same potential.

Figure 4: Behavior of the greedy ordering algorithm. The leftmost graph is the original input. From this graph, node b will be assigned maximal rank and deleted, leading to the middle graph; from this graph, node d will be deleted, leading to the rightmost graph. In the rightmost graph, node c will be ranked ahead of node a, yielding the total ordering b > d > c > a.

The next theorem shows that this greedy algorithm comes within a factor of two of optimal.

Theorem 5  Let OPT(PREF) be the weighted agreement achieved by an optimal total order for the preference function PREF, and let APPROX(PREF) be the weighted agreement achieved by the greedy algorithm. Then

    APPROX(PREF) ≥ (1/2) OPT(PREF).

Proof: Consider the edges that are incident on the node v_j which is selected on the j-th repetition of the while loop of Figure 3. The ordering produced by the algorithm will agree with all of the outgoing edges of v_j and disagree with all of the ingoing edges. Let a_j be the sum of the weights of the outgoing edges of v_j, and d_j the sum of the weights of the ingoing edges. Clearly APPROX(PREF) = Σ_{j=1}^{|V|} a_j. However, at every repetition, the total weight of all incoming edges must equal the total weight of all outgoing edges. This means that Σ_{v∈V} π(v) = 0, and hence for the node v* that has maximal potential, π(v*) ≥ 0.
Thus on every repetition j, it must be that a_j ≥ d_j, so we have that

    OPT(PREF) ≤ Σ_{j=1}^{|V|} (a_j + d_j) ≤ Σ_{j=1}^{|V|} (a_j + a_j) = 2 · APPROX(PREF).

The first inequality holds because OPT(PREF) can at best include every edge in the graph, and since every edge is removed exactly once, each edge must contribute to some a_j or some d_j. □

Figure 5: An example of a graph (left) for which the node-based greedy algorithm achieves an approximation factor of 1/2 by constructing the partial order on the right.

In passing, we note that there are other natural greedy algorithms that do not achieve good approximations. Consider, for example, an algorithm that starts from a graph consisting of all the nodes but with no edges, and iteratively adds the highest weighted edge in the graph, while avoiding cycles. It can be shown that this algorithm can produce a very poor partial order, given an adversarially chosen graph; there are cases where the optimal total order achieves a multiplicative factor of O(|V|) more weighted agreements than this "edge-based" greedy algorithm.

4.4 Improvements to the Greedy Algorithm

The approximation factor of two given in Theorem 5 is tight. That is, there exist problems for which the greedy algorithm's approximation is worse than the optimal solution by a factor arbitrarily close to two. Consider the graph shown on the left-hand side of Fig. 5. An optimal total order ranks the instances according to their position in the figure, left to right, breaking ties randomly, and achieves OPT(PREF) = 2k + 2 weighted agreements. However, the greedy algorithm picks the node labeled k + 1 first and orders all the remaining nodes randomly, achieving as few as APPROX(PREF) = k + 2 agreements. For large k, the ratio APPROX(PREF)/OPT(PREF) approaches 1/2.
For the graph of Figure 5, there is another simple algorithm which produces an optimal ordering: since the graph is already a partial order, picking any total order consistent with this partial order gives an optimal result. To cope with problems such as the one of Figure 5, we devised an improvement to the greedy algorithm which combines a greedy method with topological sorting. The aim of the improvement is to find better approximations for graphs which are composed of many strongly connected components.

As before, the modified algorithm is easiest to describe by thinking of PREF as a weighted directed graph. Recall that for each pair of nodes u and v, there exist two edges: one from u to v with weight PREF(u, v), and one from v to u with weight PREF(v, u). In the modified greedy algorithm we will pre-process the graph.

Algorithm SCC-Greedy-Order
Inputs:  an instance set X; a preference function PREF
Output:  an approximately optimal ordering function ρ̂

    Define PREF′(u, v) = max{PREF(u, v) − PREF(v, u), 0}.
    Find the strongly connected components U_1, ..., U_k of the graph G = (V, E),
        where V = X and E = {(u, v) | PREF′(u, v) > 0}.
    Order the strongly connected components in any way consistent with the
        partial order <_s:  U <_s U′ iff ∃ u ∈ U, u′ ∈ U′ : (u, u′) ∈ E.
    Use algorithm Greedy-Order or full enumeration to order the instances
        within each component U_i according to PREF′.

Figure 6: The improved greedy ordering algorithm.

For each pair of nodes, we remove the edge with the smaller weight and set the weight of the other edge to be

    |PREF(v, u) − PREF(u, v)|.

For the special case where PREF(v, u) = PREF(u, v) = 1/2, we remove both edges. In the reduced graph, there is at most one directed edge between each pair of nodes.
Note that the greedy algorithm would behave identically on the transformed graph, since it is based on the weighted differences between the incoming and outgoing edges. We next find the strongly connected components (Footnote 3: Two nodes u and v are in the same strongly connected component iff there are directed paths from u to v and from v to u.) of the reduced graph, ignoring (for now) the weights. One can now split the edges of the reduced graph into two classes: inter-component edges connect nodes u and v where u and v are in different strongly connected components, and intra-component edges connect nodes u and v from the same strongly connected component.

It is straightforward to verify that any optimal order agrees with all the inter-component edges. Put another way, if there is an edge from node u to node v of two different connected components in the reduced graph, then ρ(u) > ρ(v) for any optimal total order ρ. The first step of the improved algorithm is thus to totally order the strongly connected components in some way consistent with the partial order defined by the inter-component edges. More precisely, we pick a total ordering for the components consistent with the partial order <_s, defined as follows: for components U and U′, U <_s U′ iff there is an edge from some node u ∈ U to some node u′ ∈ U′ in the reduced graph. We next order the nodes within each strongly connected component, thus providing a total order of all nodes. Here the greedy algorithm can be used. As an alternative, in cases where a component contains only a few elements (say, at most five), one can find the optimal order between the elements of the component by a brute-force approach, i.e., by full enumeration of all permutations.
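A compact sketch of this SCC-based refinement follows (our own illustrative Python, not the authors' implementation; it uses Kosaraju's algorithm to find the components in topological order, then applies the greedy rule inside each component).

```python
from collections import defaultdict

def scc_greedy_order(instances, pref):
    """Sketch of SCC-Greedy-Order (Figure 6): reduce the graph, find strongly
    connected components, order components topologically, then run the greedy
    rule inside each component.  Returns a dict mapping instance -> rank."""
    X = list(instances)

    def pref2(u, v):
        # Reduced graph PREF': keep only the heavier edge of each pair.
        return max(pref(u, v) - pref(v, u), 0.0)

    succ = {u: [v for v in X if v != u and pref2(u, v) > 0] for u in X}
    pred = defaultdict(list)
    for u in X:
        for v in succ[u]:
            pred[v].append(u)

    # Kosaraju's algorithm; components emerge in topological order of the
    # condensation (most-preferred components first).
    finish, seen = [], set()

    def dfs(u):
        seen.add(u)
        for v in succ[u]:
            if v not in seen:
                dfs(v)
        finish.append(u)

    for u in X:
        if u not in seen:
            dfs(u)
    comps, assigned = [], set()
    for u in reversed(finish):
        if u in assigned:
            continue
        comp, stack = [], [u]
        assigned.add(u)
        while stack:                      # collect u's component on the
            n = stack.pop()               # reverse graph
            comp.append(n)
            for v in pred[n]:
                if v not in assigned:
                    assigned.add(v)
                    stack.append(v)
        comps.append(comp)

    # Greedy-Order within each component, highest ranks assigned first.
    rho, next_rank = {}, len(X)
    for comp in comps:
        V = set(comp)
        pi = {v: sum(pref2(v, u) - pref2(u, v) for u in V) for v in V}
        while V:
            t = max(V, key=lambda v: pi[v])
            rho[t] = next_rank
            next_rank -= 1
            V.remove(t)
            for v in V:
                pi[v] += pref2(t, v) - pref2(v, t)
    return rho
```

On an input where one node is strongly preferred to a cycle of remaining nodes, the node forms its own component and is ranked ahead of the cycle, mirroring the toy example of Figure 7.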
Figure 7: An illustration of the approximation algorithm for finding a total order from a weighted combination of ordering functions. The original graph (top left) is reduced by removing at least one edge of each edge pair (u, v) and (v, u) (middle). The strongly connected components are then found (right). Finally, an ordering is found within each strongly connected component, which yields the order b > c > d > a (bottom).

The improved algorithm is summarized in Figure 6 and illustrated in Figure 7. There are four elements in Figure 7, which constitute two strongly connected components in the reduced graph ({b} and {a, c, d}). Therefore, b is assigned the top rank and ranked above a, c and d. If the brute-force algorithm were used to order the components, then we would check all 3! permutations of a, c and d and output the total order b > c > d > a, which is the optimal order in this toy example.

In the worst case, the reduced graph contains only a single strongly connected component. In this case, the improved algorithm generates the same ordering as the greedy algorithm. However, in the experiments on metasearch problems described in Sec. 5, many of the strongly connected components are small; the average size of a strongly connected component is less than five. In cases such as these, the improved algorithm will often improve on the simple greedy algorithm.

4.5 Experiments with the Ordering Algorithms

Ideally, each algorithm would be evaluated by determining how closely it approximates the optimal ordering on large, realistic problems. Unfortunately, finding the optimal ordering for large graphs is impractical.
We thus performed two sets of experiments with the ordering algorithms described above. In the first set of experiments, we evaluated the algorithms on small graphs: specifically, graphs for which the optimal ordering could be feasibly found with brute-force enumeration. In these experiments, we measure the "goodness" of the resulting orderings relative to the optimal ordering. In the second set of experiments, we evaluated the algorithms on large graphs for which the optimal orderings are unknown. In these experiments, we compute a "goodness" measure which depends on the total weight of all edges, rather than the optimal ordering.

In addition to the simple greedy algorithm and its improvement, we also considered the following simple randomized algorithm: pick a permutation at random, and then output the better of that permutation and its reverse. It can be easily shown that this algorithm achieves the same approximation bound on expected performance as the greedy algorithm. (Briefly, one of the two permutations must agree with at least half of the weighted edges in the graph.) The random algorithm can be improved by repeating the process, i.e., examining many random permutations and their reverses, and choosing the permutation that achieves the largest number of weighted agreements.

In a first set of experiments, we compared the performance of the greedy approximation algorithm, the improved algorithm which first finds strongly connected components, and the randomized algorithm on graphs of nine or fewer elements. For each number of elements, we generated 10,000 random graphs by choosing PREF(u, v) uniformly at random, and setting PREF(v, u) to 1 − PREF(u, v). For the randomized algorithm, we evaluated 10n random permutations (and their reverses), where n is the number of instances (nodes).
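The randomized baseline is equally short to sketch (again illustrative Python, not the authors' code; by the argument above, the returned permutation is guaranteed to agree with at least half of the total pair weight).

```python
import random

def agreement(order, pref):
    """Weighted agreement of a ranking `order` (best first) with PREF."""
    return sum(pref(order[i], order[j])
               for i in range(len(order)) for j in range(i + 1, len(order)))

def randomized_order(instances, pref, trials=10):
    """Pick random permutations and keep the better of each permutation and
    its reverse; more trials give more chances at a high-agreement order."""
    best, best_w = None, float('-inf')
    for _ in range(trials):
        p = list(instances)
        random.shuffle(p)
        for cand in (p, p[::-1]):
            w = agreement(cand, pref)
            if w > best_w:
                best, best_w = cand, w
    return best
```

Since agreement(p) + agreement(reversed p) equals the total pair weight, the better of the two always reaches at least half of it, whatever the random draw.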
T o ha v e a fair omparison b et w een the dieren t algorithms on the smaller graphs, w e alw a ys used the greedy algorithm (rather than a brute-fore algorithm) to order the elemen ts of ea h strongly onneted omp onen t of a graph. T o ev aluate the algorithms, w e examined the redued graph and alulated the a v erage ratio of the w eigh ts of the edges  hosen b y the appro ximation algorithm to the w eigh ts of the edges that w ere  hosen b y the optimal order. More preisely , let  b e the optimal order and ^  b e an order  hosen b y an appro ximation algorithm. Then for ea h random graph, w e alulated X u; v : ^  ( u ) > ^  ( v ) max f PREF( u; v )  PREF( v ; u ) ; 0 g X u; v :  ( u ) >  ( v ) max f PREF( u; v )  PREF( v ; u ) ; 0 g : If this measure is 0.9, for instane, then the total w eigh t of the edges in the total order pi k ed b y the appro ximation algorithm is 90% of the orresp onding gure for the optimal algorithm. W e a v eraged the ab o v e ratios o v er all random graphs of the same size. The results are sho wn on the left hand side of Figure 8. On the righ t hand side of the gure, w e sho w the a v erage running time for ea h of the algorithms as a funtion of the n um b er of elemen ts. When the n um b er of rank ed elemen ts is more than v e, the greedy algorithms outp erform the randomized algorithm, while their running time is m u h smaller. Th us, if a full en umeration had b een used to nd the optimal order of small strongly onneted omp onen ts, the appro ximation w ould ha v e b een onsisten tly b etter than the randomized algorithm. W e note that the greedy algorithm also generally p erforms b etter on a v erage than the lo w er b ound giv en in Theorem 5. In fat, om bining the greedy algorithm with pre- partitioning of the graph in to strongly onneted omp onen ts often yields the optimal order. In the seond set of exp erimen ts, w e measured p erformane and running time for larger random graphs. 
Sine for large graphs w e annot nd the optimal solution b y brute-fore en umeration, w e use as a \go o dness" measure the ratio of the w eigh ts of the edges that w ere left in the redued graph after applying an appro ximation algorithm to the total w eigh t of 258 Learning to Order Things 3 4 5 6 7 8 9 0.88 0.9 0.92 0.94 0.96 0.98 1 Number of elements Fraction of optimal solution Greedy SCC + Greedy Randomized 3 4 5 6 7 8 9 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 Number of elements Running time (seconds) Greedy SCC + Greedy Randomized Figure 8: Comparison of go o dness (left) and the running time (righ t) of the appro ximations a hiev ed b y the greedy algorithms and the randomized algorithm as a funtion of the n um b er of rank ed elemen ts for random preferene funtions with 3 through 9 elemen ts. 5 10 15 20 25 30 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 Number of elements Fraction of total weight Greedy SCC + Greedy Randomized 5 10 15 20 25 30 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 Number of elements Running time (seconds) Greedy SCC + Greedy Randomized Figure 9: Comparison of go o dness (left) and the running time (righ t) of the appro ximations a hiev ed b y the greedy algorithms and the randomized algorithm as a funtion of the n um b er of rank ed elemen ts for random preferene funtions with 3 through 30 elemen ts. Note that the graphs for Greedy and SCC+Greedy oinide for most of the p oin ts. 259 Cohen, Shapire, & Singer edges in the graph. That is, for ea h random graph w e alulated X u; v : ^  ( u ) > ^  ( v ) max f PREF( u; v )  PREF( v ; u ) ; 0 g X u; v max f PREF( u; v )  PREF( v ; u ) ; 0 g : W e ran the three algorithms with the same parameters as ab o v e ( i.e. , 10 ; 000 random graphs). The results are giv en in Figure 9. The adv an tage of the greedy algorithms o v er the randomized algorithm is ev en more apparen t on these larger problems. 
Note also that for large graphs the performance of the two greedy algorithms is indistinguishable. This is mainly due to the fact that large random graphs are strongly connected with high probability.

To summarize the experiments: when there are six or more elements, the greedy algorithm clearly outperforms the randomized algorithm, even if many randomly chosen permutations are examined. Furthermore, the improved algorithm which first finds the strongly connected components outperforms the randomized algorithm for all graph sizes. In practice the improved greedy algorithm achieves very good approximations, within about 5 percent of optimal for the cases in which the optimal ordering can be feasibly found.

5. Experimental Results for Metasearch

So far, we have described a method for learning a preference function, and a means of converting a preference function into an ordering of new instances. We will now present some experimental results in learning to order. In particular, we will describe results on learning to combine the orderings of several web "search experts," using the algorithm of Figure 2 to learn a preference function, and the simple greedy algorithm to order instances using the learned preference function. The goals of these experiments are to illustrate the type of problems that can be solved with our method; to empirically evaluate the learning method; to evaluate the ordering algorithm on large, non-random graphs, such as might arise in a realistic application; and to confirm the theoretical results of the preceding sections.
W e th us restrit ourselv es to omparing the learned orderings to individual sear h exp erts, as is suggested b y Theorem 1, rather than attempt to ompare this appliation of learning- to-order with previous exp erimen tal te hniques for metasear h, e.g., (Lo  h baum & Streeter, 1989; Kan tor, 1994; Bo y an, F reitag, & Joa hims, 1994; Bartell, Cottrell, & Belew, 1994). W e note that this metasear h problem exhibits sev eral prop erties that suggest a general approa h su h as ours. F or instane, approa hes that learn to om bine similarit y sores are not appliable, sine the similarit y sores of w eb sear h engines are often una v ailable. In the exp erimen ts presen ted here, the learning algorithm w as pro vided with ordered lists for ea h sear h engine without an y asso iated sores. T o further demonstrate the merits of our approa h, w e also desrib e exp erimen ts with partial feedba k|that is, with preferene judgmen ts that are less informativ e than the relev ane judgmen ts more t ypially used in impro ving sear h engines. 
ML Search Experts                       UNIV Search Experts
NAME                                    NAME
"NAME"                                  "NAME"
title:"NAME"                            "NAME" PLACE
NAME +LASTNAME title:"home page"        title:NAME
NAME +LASTNAME title:homepage           title:"NAME"
NAME +LASTNAME machine learning         title:"NAME" PLACE
NAME +LASTNAME "machine learning"       NAME title:"home page"
NAME +LASTNAME case based reasoning     NAME title:"homepage"
NAME +LASTNAME "case based reasoning"   NAME welcome
NAME +LASTNAME PLACE                    NAME url:index.html
NAME +LASTNAME "PLACE"                  NAME url:home.html
NAME +LASTNAME url:index.html           "NAME" title:"home page"
NAME +LASTNAME url:home.html            "NAME" title:"homepage"
NAME +LASTNAME url:~*LASTNAME*          "NAME" welcome
NAME +LASTNAME url:~LASTNAME            "NAME" url:index.html
NAME +LASTNAME url:LASTNAME             "NAME" url:home.html
                                        "NAME" PLACE title:"home page"
                                        "NAME" PLACE title:"homepage"
                                        "NAME" PLACE welcome
                                        "NAME" PLACE url:index.html
                                        "NAME" PLACE url:home.html

Table 1: Search (and ranking) experts used in the metasearch experiments. In the associated queries, NAME is replaced with the person's (or university's) full name, LASTNAME with the person's last name, and PLACE is replaced with the person's affiliation (or the university's location). Sequences of words enclosed in quotes must appear as a phrase, and terms prefixed by title: and url: must appear in that part of the web page. Words prefixed by a "+" must appear in the web page; other words may or may not appear.

5.1 Test Problems and Encoding

We chose to simulate the problem of learning a domain-specific search engine, i.e., an engine that searches for pages of a particular, narrow type. Ahoy! (Shakes, Langheinrich, & Etzioni, 1997) is one instance of such a domain-specific search engine. As test cases, we picked two problems: retrieving the home pages of machine learning researchers (ML), and retrieving the home pages of universities (UNIV).
To obtain sample queries, we obtained a listing of machine learning researchers, identified by name and affiliated institution, together with their home pages (Footnote 4: From http://www.ai.nrl.navy.mil/~aha/research/machine-learning.html, a list maintained by David Aha.), and a similar list for universities, identified by name and (sometimes) geographical location (Footnote 5: From Yahoo!). Each entry on a list was viewed as a query, with the associated URL the sole relevant web page.

We then constructed a series of special-purpose "search experts" for each domain. These were implemented as query expansion methods which converted a name/affiliation pair (or a name/location pair) to a likely-seeming AltaVista query. For example, one expert for the UNIV domain searched for the university name appearing as a phrase, together with the phrase "home page" in the title; another expert for the ML domain searched for all the words in the person's name plus the words "machine" and "learning," and further enforced a strict requirement that the person's last name appear. Overall, we defined 16 search experts for the ML domain and 22 for the UNIV domain; these are summarized in Table 1. Each search expert returned the top 30 ranked web pages. In the ML domain, there were 210 searches for which at least one search expert returned the named home page; for the UNIV domain, there were 290 such searches. The task of the learning system is to find an appropriate way of combining the output of these search experts.

To give a more precise description of the search experts: for each query t, we first constructed the set X_t consisting of all web pages returned by all of the expanded queries defined by the search experts. Next, each search expert i was represented as a preference function R_i^t.
W e  hose these preferene funtions to b e rank orderings dened with resp et to an ordering funtion f t i in the natural w a y: w e assigned a rank of f t i = 30 to the rst listed page, f t i = 29 to the seond-listed page, and so on, nally assigning a rank of f t i = 0 to ev ery page not retriev ed in the top 30 b y the expanded query asso iated with exp ert i . T o eno de feedba k, w e onsidered t w o s hemes. In the rst, w e sim ulated omplete relev ane feedba k|that is, for ea h query , w e onstruted feedba k in whi h the sole relev an t page w as preferred to all other pages. In the seond, w e sim ulated the sort of feedba k that ould b e olleted from \li k data"| i.e. , from observing a user's in terations with a metasear h system. F or ea h query , after presen ting a rank ed list of pages, w e noted the rank of the one relev an t w eb page. W e then onstruted a feedba k ranking in whi h the relev an t page is preferred to all preeding pages. This w ould orresp ond to observing whi h link the user atually follo w ed, and making the assumption that this link w as preferred to previous links. It should b e emphasized that b oth of these forms of feedba k are sim ulated, and on tain less noise than w ould b e exp eted from real user data. In realit y some fration of the relev ane feedba k w ould b e missing or erroneous, and some fration of li k data w ould not satisfy the assumption stated ab o v e. 5.2 Ev aluation and Results T o ev aluate the exp eted p erformane of a fully-trained system on no v el queries in this domain, w e emplo y ed lea v e-one-out testing. F or ea h query t , w e trained the learning system on all the other queries, and then reorded the rank of the learned system on query t . 
F or omplete relev ane feedba k, this rank is in v arian t of the ordering of the training examples, but for the \li k data" feedba k, it is not; the feedba k olleted at ea h stage dep ends on the b eha vior of the partially learned system, whi h in turn dep ends on the previous training examples. Th us for li k data training, w e trained on 100 randomly  hosen p erm utations of the training data and reorded the median rank for t . 262 Learning to Order Things 5.2.1 Perf ormane Rela tive to Individual Exper ts The theoretial results pro vide a guaran tee of p erformane relativ e to the p erformane of the b est individual sear h (ranking) exp ert. It is therefore natural to onsider omparing the p erformane of the learned system to the b est of the individual exp erts. Ho w ev er, for ea h sear h exp ert, only the top 30 rank ed w eb pages for a query are kno wn; if the single relev an t page for a query is not among these top 30, then it is imp ossible to ompute an y natural measures of p erformane for this query . This ompliates an y omparison of the learned system to the individual sear h exp erts. Ho w ev er, in spite of the inomplete information ab out the p erformane of the sear h exp erts, it is usually p ossible to tell if the learned system ranks a w eb page higher than a partiular exp ert. 6 Motiv ated b y this, w e p erformed a sign test: w e ompared the rank of the learning systems to the rank giv en b y ea h sear h exp ert,  he king to see whether this rank w as lo w er, and disarding queries for whi h this omparison w as imp ossible. W e then used a normal appro ximation to the binomial distribution to test the follo wing t w o n ull h yp otheses (where the probabilit y is tak en o v er the distribution from whi h the queries are dra wn): H1. With probabilit y at least 0.5, the sear h exp ert p erforms b etter than the learning system ( i.e. 
, gives a lower rank to the relevant page than the learning system does).

H2. With probability at least 0.5, the search expert performs no worse than the learning system (i.e., gives an equal or lower rank to the relevant page).

In training, we explored learning rates β in the range [0.001, 0.999]. For complete feedback in the ML domain, hypothesis H1 can be rejected with high confidence (p > 0.999) for every search expert and every learning rate 0.01 ≤ β ≤ 0.99. The same holds in the UNIV domain for all learning rates 0.02 ≤ β ≤ 0.99. The results for click data training are nearly as strong, except that 2 of the 22 search experts in the UNIV domain show a greater sensitivity to the learning rate: for these engines, H1 can only be rejected with high confidence for 0.3 ≤ β ≤ 0.6. To summarize: with high confidence, in both domains, the learned ranking system is no worse than any individual search expert for moderate values of β.

Hypothesis H2 is more stringent, since it can be rejected only if we are sure that the learned system is strictly better than the expert. With complete feedback in the ML domain and 0.3 ≤ β ≤ 0.8, hypothesis H2 can be rejected with confidence p > 0.999 for 14 of the 16 search experts. For the remaining two experts the learned system does perform better more often, but the difference is not significant. In the UNIV domain, the results are similar. For 0.2 ≤ β ≤ 0.99, hypothesis H2 can be rejected with confidence p > 0.999 for 21 of the 22 search experts, and the learned engine tends to perform better than the single remaining expert. Again, the results for click data training are only slightly weaker. In the ML domain, hypothesis H2 can be rejected for all but three experts for all but the most extreme learning rates; in the UNIV domain, hypothesis H2 can be rejected for all but two experts for 0.4 ≤ β ≤ 0.6.
For the remaining experts and learning rates the differences are not statistically significant; however, it is not always the case that the learned engine tends to perform better.

Footnote 6: The only time this cannot be determined is when neither the learned system nor the expert ranks the relevant web page in the top 30, a case of little practical interest.

To summarize the experiments: for moderate values of β, the learned system is, with high confidence, strictly better than most of the search experts in both domains, and never significantly worse than any expert. When trained with full relevance judgments, the learned system performs better on average than any individual expert.

5.2.2 Other Performance Measures

We measured the number of queries for which the correct web page was in the top k ranked pages, for various values of k. These results are shown in Figure 10. Here the lines show the performance of the learned systems (with β = 0.5, a generally favorable learning rate) and the points correspond to the individual experts. In most cases, the learned system closely tracks the performance of the best expert at every value of k. This is especially interesting since no single expert is best at all values of k. The final graph in this figure investigates the sensitivity of this measure to the learning rate β. As a representative illustration, we varied β in the ML domain and plotted the top-k performance of the system learned from complete feedback for three values of k. Note that performance is roughly comparable over a wide range of values for β.

Another plausible measure of performance is the average rank of the (single) relevant web page. We computed an approximation to average rank by artificially assigning a rank of 31 to every page that was either unranked, or whose rank was greater than 30.
(The latter ase is to b e fair to the learned system, whi h is the only one for whi h a rank greater than 30 is p ossible.) A summary of these results for  = 0 : 5 is giv en in T able 2, together with some additional data on top- k p erformane. In the table, w e giv e the top- k p erformane for three v alues of k , and a v erage rank for sev eral ranking systems: the t w o learned systems; the naiv e query , i.e. , the p erson or univ ersit y's name; and the single sear h exp ert that p erformed b est with resp et to ea h p erformane measure. Note that not all of these exp erts are distint sine sev eral exp erts sored the b est on more than one measure. The table illustrates the robustness of the learned systems, whi h are nearly alw a ys omp etitiv e with the b est exp ert for ev ery p erformane measure listed. The only exeption to this is that the system trained on li k data trails the b est exp ert in top- k p erformane for small v alues of k . It is also w orth noting that in b oth domains, the naiv e query (simply the p erson or univ ersit y's name) is not v ery eetiv e: ev en with the w eak er li k data feedba k, the learned system a hiev es a 36% derease in a v erage rank o v er the naiv e query in the ML domain, and a 46% derease in the UNIV domain. T o summarize the exp erimen ts, on these domains the learned system not only p erforms m u h b etter than naiv e sear h strategies, but also onsisten tly p erforms at least as w ell as, and p erhaps sligh tly b etter than, an y single domain-sp ei sear h exp ert. This observ ation holds regardless of the p erformane metri onsidered; for nearly ev ery metri w e omputed, the learned system alw a ys equals, and usually exeeds, the p erformane of the sear h exp ert that is b est for that metri. Finally , the p erformane of the learned system is almost as go o d with the w eak er \li k data" training as with omplete relev ane feedba k. 
Figure 10: Top and middle: performance of the learned system versus individual experts for two different domains (the number of queries answered in the top k, as a function of k, for the ML and UNIV domains, with curves for full feedback and click data training and points for the individual rankers). Bottom: the percentage of time the relevant web page was in the top-k list, for k = 1, 4, and 8, as a function of the learning rate.

                         ML Domain                       University Domain
                      Top 1  Top 10  Top 30  Avg Rank    Top 1  Top 10  Top 30  Avg Rank
Learned (Full Feed.)   114    185     198      4.9        111    225     253      7.8
Learned (Click Data)    93    185     198      4.9         87    229     259      7.8
Naive                   89    165     176      7.7         79    157     191     14.4
Best (Top 1)           119    170     184      6.7        112    221     247      8.2
Best (Top 10)          114    182     190      5.3        111    223     249      8.0
Best (Top 30)           97    181     194      5.6        111    223     249      8.0
Best (Avg Rank)        114    182     190      5.3        111    223     249      8.0

Table 2: Comparison of learned systems and individual search queries.

6. Related Work

Problems that involve ordering and ranking have been investigated in various fields, such as decision theory, the social sciences, information retrieval, and mathematical economics (Black, 1958; Kemeny & Snell, 1962; Cooper, 1968; Fishburn, 1970; Roberts, 1979; Salton & McGill, 1983; French, 1989; Yao, 1995). Among the wealth of literature on the subject, the closest to ours appears to be the work of Kemeny and Snell (1962), which was extended by Yao (1995) and used by Balabanović and Shoham (1997) in their FAB collaborative filtering system.
These works use a similar notion of ordering functions and feedback; however, they assume that both the ordering functions and the feedback are complete and transitive. Hence, it is not possible to leave elements unranked, or to have inconsistent feedback which violates the transitivity requirements. It is therefore difficult to combine and fuse inconsistent and incomplete orderings in the Kemeny and Snell model.

There are also several related intractability results. Most of them are concerned with the difficulty of reaching consensus in voting systems based on preference orderings. Specifically, Bartholdi, Tovey and Trick (1989) study the problem of finding a winner in an election when the preferences of all voters are irreflexive, antisymmetric, transitive, and complete. Thus, their setting is more restrictive than ours. They study two similar schemes for deciding on the winner of an election. The first was invented by Dodgson (1876) (better known by his pen name, Lewis Carroll) and the second is due to Kemeny (1959). For both models, they show that the problem of finding a winner in an election is NP-hard. Of these two models, the one suggested by Kemeny is the closest to ours. However, as mentioned above, this model is more restrictive, as it does not allow voters to abstain (preferences are required to be complete) or to be inconsistent (all preferences are transitive).

As illustrated by the experiments, the problem of learning to rank is closely related to the problem of combining the results of different search engines. Many methods for this have been proposed by the information retrieval community, and many of these are adaptive, using relevance judgments to make an appropriate choice of parameters. Generally, however, rankings are combined by combining the scores that were used to rank documents (Lochbaum & Streeter, 1989; Kantor, 1994).
It is also frequently assumed that other properties of the objects (documents) to be ranked are available, such as word frequencies. In contrast, in our experiments, instances are atomic entities with no associated properties except for their position in various rank-orderings. Similarly, we make minimal assumptions about the rank-orderings; in particular, we do not assume that scores are available. Our methods are thus applicable to a broader class of ranking problems.

General optimization methods have also been adopted to adjust the parameters of an IR system so as to improve agreement with a set of user-given preference judgments. For instance, Boyan, Freitag, and Joachims (1994) use simulated annealing to improve agreement with "click data," and Bartell, Cottrell and Belew (1994) use conjugate gradient descent to choose parameters for a linear combination of scoring functions, each associated with a different search expert. Typically, such approaches offer few guarantees of efficiency, optimality, or generalization performance.

Another related task is collection fusion. Here, several searches are executed on disjoint subsets of a large collection, and the results are combined. Several approaches to this problem that do not rely on combining ranking scores have been described (Towell, Voorhees, Gupta, & Johnson-Laird, 1995; Voorhees, Gupta, & Johnson-Laird, 1994). However, although the problem is superficially similar to the one presented here, the assumption that the different search engines index disjoint sets of documents actually makes the problem quite different. In particular, since it is impossible for two engines to give different relative orderings to the same pair of documents, combining the rankings can be done relatively easily. Etzioni et al.
(1996) formally considered another aspect of metasearch: the task of optimally combining information sources with associated costs and time delays. Our formal results are disjoint from theirs, as they assume that every query has a single recognizable correct answer, rendering ordering issues unimportant.

There are many other applications in machine learning, reinforcement learning, neural networks, and collaborative filtering that employ rankings and preferences, e.g., (Utgoff & Saxena, 1987; Utgoff & Clouse, 1991; Caruana, Baluja, & Mitchell, 1996; Resnick & Varian, 1997). While our work is not directly relevant to these applications, it might be possible to use the framework suggested in this paper in similar settings. This is one of our future research goals.

Finally, we would like to note that the framework and algorithms presented in this paper can be extended in several ways. Our current research focuses on efficient batch algorithms for combining preference functions, and on using restricted ranking experts for which the problem of finding an optimal total ordering can be solved in polynomial time (Freund, Iyer, Schapire, & Singer, 1998).

7. Conclusions

In many applications, it is desirable to order rather than classify instances. We investigated a two-stage approach to learning to order, in which one first learns a preference function by conventional means, and then orders a new set of instances by finding the total ordering that best approximates the preference function. The preference function that is learned is a binary function PREF(u, v), which returns a measure of confidence reflecting how likely it is that u is preferred to v. This is learned from a set of "experts" which suggest specific orderings, and from user feedback in the form of assertions of the form "u should be preferred to v". We have presented two sets of results on this problem.
First, we presented an online learning algorithm for learning a weighted combination of ranking experts, which is based on an adaptation of Freund and Schapire's Hedge algorithm. Second, we explored the complexity of the problem of finding a total ordering that agrees best with a preference function. We showed that this problem is NP-complete even in a highly restrictive case, namely, for preference predicates that are linear combinations of a certain class of well-behaved "experts" called rank orderings. However, we also showed that for any preference predicate, there is a greedy algorithm that always obtains a total ordering that is within a factor of two of optimal. We also presented an algorithm that first divides the set of instances into strongly connected components and then uses the greedy algorithm (or full enumeration, for small components) to find an approximately good order within large strongly connected components. We found that this approximation algorithm works very well in practice and often finds the best order.

We also presented experimental results in which these algorithms were used to combine the results of a number of "search experts," each of which corresponds to a domain-specific strategy for searching the web. We showed that in two domains, the learned system closely tracks and often exceeds the performance of the best of these search experts. These results hold both for traditional relevance-feedback models of learning and for weaker feedback in the form of simulated "click data." The performance of the learned systems also clearly exceeds the performance of more naive approaches to searching.

Acknowledgments

We would like to thank Noga Alon, Edith Cohen, Dana Ron, and Rick Vohra for numerous helpful discussions.
An extended abstrat of this pap er app eared in A dvan es in Neur al Information Pr o  essing Systems 10 , MIT Press, 1998. Referenes Balabano v  , M., & Shoham, Y. (1997). F AB: Con ten t-based, ollab orativ e reommenda- tion. Communi ations of the A CM , 40 (3), 66{72. Bartell, B., Cottrell, G., & Belew, R. (1994). Automati om bination of m ultiple rank ed retriev al systems. In Sevente enth A nnual International A CM SIGIR Confer en e on R ese ar h and Development in Information R etrieval . Bartholdi, J., T o v ey , C., & T ri k, M. (1989). V oting s hemes for whi h it an b e diÆult to tell who w on the eletions. So ial Choi e and Welfar e , 6 , 157{165. Berger, B., & Shor, P . (1997). Tigh t b ounds for the ayli subgraph problem. Journal of A lgorithms , 25 , 1{18. Bla k, D. (1958). The ory of Committe es and Ele tions . Cam bridge Univ ersit y Press. Bo y an, J., F reitag, D., & Joa hims, T. (1994). A ma hine learning ar hiteture for opti- mizing w eb sear h engines. T e h. rep. WS-96-05, Amerian Asso iation of Artiial In telligene. 268 Learning to Order Things Caruana, R., Baluja, S., & Mit hell, T. (1996). Using the future to `Sort Out' the presen t: Rankprop and m ultitask learning for medial risk ev aluation. In A dvan es in Neur al Information Pr o  essing Systems (NIPS) 8 . Co op er, W. (1968). Exp eted sear h length: A single measure of retriev al eetiv eness based on the w eak ordering ation of retriev al systems. A meri an Do umentation , 19 , 30{41. Do dgson, C. (1876). A metho d for taking votes on mor e than two issues . Clarendon Press, Oxford. Reprin ted with disussion in (Bla k, 1958). Etzioni, O., Hanks, S., Jiang, T., Karp, R. M., Madani, O., & W aarts, O. (1996). EÆien t information gathering on the in ternet. In Pr o  e e dings of the 37th A nnual Symp o- sium on F oundations of Computer Sien e (F OCS-96) Burlington, V ermon t. IEEE Computer So iet y Press. 
Even, G., Naor, J., Rao, S., & Schieber, B. (1996). Divide-and-conquer approximation algorithms via spreading metrics. In 36th Annual Symposium on Foundations of Computer Science (FOCS-96), pp. 62–71, Burlington, Vermont. IEEE Computer Society Press.

Even, G., Naor, J., Schieber, B., & Sudan, M. (1998). Approximating minimum feedback sets and multicuts in directed graphs. Algorithmica, 20(2), 151–174.

Fishburn, P. (1970). Utility Theory for Decision Making. Wiley, New York.

French, S. (1989). Decision Theory: An Introduction to the Mathematics of Rationality. Ellis Horwood Series in Mathematics and Its Applications.

Freund, Y., Iyer, R., Schapire, R., & Singer, Y. (1998). An efficient boosting algorithm for combining preferences. In Machine Learning: Proceedings of the Fifteenth International Conference.

Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.

Galil, Z., & Megiddo, N. (1977). Cyclic ordering is NP-complete. Theoretical Computer Science, 5, 179–182.

Garey, M., & Johnson, D. (1979). Computers and Intractability: A Guide to the Theory of NP-completeness. W. H. Freeman and Company, New York.

Kantor, P. (1994). Decision level data fusion for routing of documents in the TREC3 context: a best case analysis of worst case results. In Proceedings of the Third Text Retrieval Conference (TREC-3).

Kemeny, J. (1959). Mathematics without numbers. Daedalus, 88, 571–591.

Kemeny, J., & Snell, J. (1962). Mathematical Models in the Social Sciences. Blaisdell, New York.

Littlestone, N. (1988). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4).

Littlestone, N., & Warmuth, M. (1994). The weighted majority algorithm.
Information and Computation, 108(2), 212–261.

Lochbaum, K., & Streeter, L. (1989). Comparing and combining the effectiveness of latent semantic indexing and the ordinary vector space model for information retrieval. Information Processing and Management, 25(6), 665–676.

Resnick, P., & Varian, H. (1997). Introduction to special section on recommender systems. Communications of the ACM, 40(3).

Roberts, F. (1979). Measurement Theory with Applications to Decision Making, Utility, and the Social Sciences. Addison-Wesley, Reading, MA.

Salton, G., & McGill, M. (1983). Introduction to Modern Information Retrieval. McGraw-Hill.

Seymour, P. (1995). Packing directed circuits fractionally. Combinatorica, 15, 281–288.

Shakes, J., Langheinrich, M., & Etzioni, O. (1997). Dynamic reference sifting: a case study in the homepage domain. In Proceedings of WWW6.

Shmoys, D. (1997). Cut problems and their application to divide-and-conquer. In Hochbaum, D. (Ed.), Approximation Algorithms for NP-Hard Problems. PWS Publishing Company, New York.

Towell, G., Voorhees, E., Gupta, N., & Johnson-Laird, B. (1995). Learning collection fusion strategies for information retrieval. In Machine Learning: Proceedings of the Twelfth International Conference, Lake Tahoe, California. Morgan Kaufmann.

Utgoff, P., & Clouse, J. (1991). Two kinds of training information for evaluation function learning. In Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI-91), pp. 596–600, Cambridge, MA. AAAI Press/MIT Press.

Utgoff, P., & Saxena, S. (1987). Learning a preference predicate. In Proceedings of the Fourth International Workshop on Machine Learning, pp. 115–121, San Francisco, CA. Morgan Kaufmann.

Voorhees, E., Gupta, N., & Johnson-Laird, B. (1994). The collection fusion problem.
In Seventeenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.

Yao, Y. (1995). Measuring retrieval effectiveness based on user preference of documents. Journal of the American Society for Information Science, 46(2), 133–145.
