Learning to Order Things
Journal of Artificial Intelligence Research 10 (1999) 243-270. Submitted 10/98; published 5/99.

William W. Cohen (wcohen@research.att.com)
Robert E. Schapire (schapire@research.att.com)
Yoram Singer (singer@research.att.com)
AT&T Labs, Shannon Laboratory, 180 Park Avenue, Florham Park, NJ 07932, USA

Abstract

There are many applications in which it is desirable to order rather than classify instances. Here we consider the problem of learning how to order instances given feedback in the form of preference judgments, i.e., statements to the effect that one instance should be ranked ahead of another. We outline a two-stage approach in which one first learns by conventional means a binary preference function indicating whether it is advisable to rank one instance before another. Here we consider an on-line algorithm for learning preference functions that is based on Freund and Schapire's "Hedge" algorithm. In the second stage, new instances are ordered so as to maximize agreement with the learned preference function. We show that the problem of finding the ordering that agrees best with a learned preference function is NP-complete. Nevertheless, we describe simple greedy algorithms that are guaranteed to find a good approximation. Finally, we show how metasearch can be formulated as an ordering problem, and present experimental results on learning a combination of "search experts," each of which is a domain-specific query expansion strategy for a web search engine.

1. Introduction

Work in inductive learning has mostly concentrated on learning to classify. However, there are many applications in which it is desirable to order rather than classify instances. An example might be a personalized email filter that prioritizes unread mail. Here we will consider the problem of learning how to construct such orderings given feedback in the form of preference judgments, i.e., statements that one instance should be ranked ahead of another.
Such orderings could be constructed based on a learned probabilistic classifier or regression model, and in fact often are. For instance, it is common practice in information retrieval to rank documents according to their probability of relevance to a query, as estimated by a learned classifier for the concept "relevant document." An advantage of learning orderings directly is that preference judgments can be much easier to obtain than the labels required for classification learning.

For instance, in the email application mentioned above, one approach might be to rank messages according to their estimated probability of membership in the class of "urgent" messages, or by some numerical estimate of urgency obtained by regression. Suppose, however, that a user is presented with an ordered list of email messages, and elects to read the third message first. Given this election, it is not necessarily the case that message three is urgent, nor is there sufficient information to estimate any numerical urgency measures. However, it seems quite reasonable to infer that message three should have been ranked ahead of the others. Thus, in this setting, obtaining preference information may be easier and more natural than obtaining the labels needed for a classification or regression approach.

(c) 1999 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.

Another application domain that requires ordering instances is collaborative filtering; see, for instance, the papers contained in Resnick and Varian (1997). In a typical collaborative filtering task, a user seeks recommendations, say, on movies that she is likely to enjoy. Such recommendations are usually expressed as ordered lists of recommended movies, produced by combining movie ratings supplied by other users. Notice that each user's movie ratings can be viewed as a set of preference judgments.
In fact, interpreting ratings as preferences is advantageous in several ways: for instance, it is not necessary to assume that a rating of "7" means the same thing to every user.

In the remainder of this paper, we will investigate the following two-stage approach to learning how to order. In stage one, we learn a preference function, a two-argument function PREF(u,v) which returns a numerical measure of how certain it is that u should be ranked before v. In stage two, we use the learned preference function to order a set of new instances X; to accomplish this, we evaluate the learned function PREF(u,v) on all pairs of instances u,v ∈ X, and choose an ordering of X that agrees, as much as possible, with these pairwise preference judgments.

For stage one, we describe a specific algorithm for learning a preference function from a set of "ranking experts." The algorithm is an on-line weight allocation algorithm, much like the weighted majority algorithm (Littlestone & Warmuth, 1994) and Winnow (Littlestone, 1988), and, more directly, Freund and Schapire's (1997) "Hedge" algorithm. For stage two, we show that finding a total order that agrees best with such a preference function is NP-complete. Nevertheless, we show that there are efficient greedy algorithms that always find a good approximation to the best ordering. We then present some experimental results in which these algorithms are used to combine the results of several "search experts," each of which is a domain-specific query expansion strategy for a web search engine. Since our work touches several different fields, we defer the discussion of related work to Sec. 6.

2. Preliminaries

Let X be a set of instances. For simplicity, in this paper, we always assume that X is finite. A preference function PREF is a binary function PREF : X × X → [0,1].
A value of PREF(u,v) which is close to 1 (respectively 0) is interpreted as a strong recommendation that u should be ranked above (respectively, below) v. A value close to 1/2 is interpreted as an abstention from making a recommendation. As noted earlier, the hypothesis of our learning system will be a preference function, and new instances will be ranked so as to agree as much as possible with the preferences predicted by this hypothesis.

In standard classification learning, a hypothesis is constructed by combining primitive features. Similarly, in this paper, a preference function will be a combination of primitive preference functions. In particular, we will typically assume the availability of a set of N primitive preference functions R_1, ..., R_N. These can then be combined in the usual ways, for instance with a boolean or linear combination of their values. We will be especially interested in the latter combination method.

[Figure 1: Left and middle: two ordering functions, f (with f(a)=1, f(b)=2, f(c)=0, f(d)=⊥) and g (with g(a)=0, g(b)=2, g(c)=1, g(d)=2), and their graph representations. Right: the graph representation of the preference function created by the weighted (1/4 and 3/4) combination (1/4)f(·) + (3/4)g(·); its edge weights include 1, 7/8, 3/4, 1/4, and 1/8. Edges with weight 1/2 or 0 are omitted.]

It is convenient to assume that the R_i's are well-formed in certain ways. To this end, we introduce a special kind of preference function called a rank ordering, which is defined by an ordering function. Let S be a totally ordered set; we assume without loss of generality that S ⊆ ℝ. An ordering function into S is any function f : X → S, where we interpret an inequality f(u) > f(v) to mean that u is ranked above v by f. It is sometimes convenient to allow an ordering function to "abstain" and not give a preference for a pair u, v.
We therefore allow S to include a special symbol ⊥ not in ℝ, and we interpret f(u) = ⊥ to mean that u is "unranked." We define the symbol ⊥ to be incomparable to all the elements of S (that is, ⊥ ≮ s and s ≮ ⊥ for all s ∈ S). An ordering function f induces the preference function R_f, defined as

  R_f(u,v) = 1    if f(u) > f(v)
  R_f(u,v) = 0    if f(u) < f(v)
  R_f(u,v) = 1/2  otherwise.

We call R_f a rank ordering for X into S. If R_f(u,v) = 1, then we say that u is preferred to v, or u is ranked higher than v. Note that R_f(u,v) = 1/2 if either u or v (or both) is unranked.

We will sometimes describe and manipulate preference functions as directed weighted graphs. The nodes of the graph correspond to the instances in X. Each pair (u,v) is connected by a directed edge with weight PREF(u,v). Since an ordering function f induces a preference function R_f, we can also describe ordering functions as graphs. In Fig. 1 we give an example of two ordering functions and their corresponding graphs. For brevity, we do not draw edges (u,v) such that PREF(u,v) = 1/2 or PREF(u,v) = 0.

To give a concrete example of rank orderings, imagine learning to order documents based on the words that they contain. To model this, let X be the set of all documents in a repository, and for N words w_1, ..., w_N, let f_i(u) be the number of occurrences of word w_i in document u. Then R_{f_i} will prefer u to v whenever w_i occurs more often in u than in v. As a second example, consider a metasearch application in which the goal is to combine the rankings of several web search engines on some fixed query. For N search engines e_1, ..., e_N, one might define f_i so that R_{f_i} prefers web page u to web page v whenever u is ranked ahead of v in the list L_i produced by the corresponding search engine.
To do this, one could let f_i(u) = -k for the web page u appearing in the k-th position in the list L_i, and let f_i(u) = -M (where M > |L_i|) for any web page u not appearing in L_i.

Feedback from the user will be represented in a similar but more general way. We will assume that feedback is a set of element pairs (u,v), each representing an assertion of the form "u should be preferred to v." This definition of feedback is less restricted than ordering functions. In particular, we will not assume that the feedback is consistent: cycles, such as a > b > a, will be allowed.

3. Learning a Combination of Ordering Functions

In this section, we consider the problem of learning a good linear combination of a set of ordering functions. Specifically, we assume access to a set of ranking experts, each of which generates an ordering function when provided with a set of instances. For instance, in a metasearch problem, each ranking expert might be a function that submits the user's query to a different search engine; the domain of instances might be the set of all web pages returned by any of the ranking experts; and the ordering function associated with each ranking expert might be represented as in the example above (i.e., letting f_i(u) = -k for the k-th web page u returned by the i-th search engine, and letting f_i(u) = -M for any web page u not retrieved by the i-th search engine). The user's feedback will be a set of pairwise preferences between web pages. This feedback may be obtained directly, for example, by asking the user to explicitly rank the URLs returned by the search engines; or the feedback may be obtained indirectly, for example, by measuring the time spent viewing each of the returned pages.
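To make these constructions concrete, here is a small sketch (ours, not from the paper) of the rank ordering R_f induced by an ordering function, together with the position-based ordering function for a single search engine's result list; the function names and the dict-based encoding are illustrative assumptions:

```python
BOT = None  # stands in for the "unranked" symbol

# Rank ordering R_f induced by an ordering function f (Sec. 2): returns 1 if
# f ranks u above v, 0 if below, and 1/2 on a tie or an unranked element.
def rank_ordering(f):
    def R(u, v):
        fu, fv = f(u), f(v)
        if fu is BOT or fv is BOT or fu == fv:
            return 0.5
        return 1.0 if fu > fv else 0.0
    return R

# Position-based ordering function for one search engine's result list L_i:
# the page at position k gets score -k, so earlier pages rank higher; pages
# not in the list get -M with M > |L_i|, ranking them below all listed pages.
def engine_ordering(result_list, M=None):
    if M is None:
        M = len(result_list) + 1
    pos = {url: k for k, url in enumerate(result_list, start=1)}
    return lambda u: -pos.get(u, M)

f = engine_ordering(['u1', 'u2', 'u3'])
R_f = rank_ordering(f)
print(R_f('u1', 'u3'), R_f('u3', 'u1'), R_f('u1', 'u1'))  # -> 1.0 0.0 0.5
```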
We note that for the metasearch problem, an approach that works directly with the numerical scores associated with the different search engines might not be feasible; these numerical scores might not be comparable across different search engines, or might not be provided by all search engines. Another problem is that most web pages will not be indexed by all search engines. This can be easily modeled in our setting: rather than letting f_i(u) = -M for a web page u that is not ranked by search engine i, one could let f_i(u) = ⊥. This corresponds to the assumption that the search engine's preference for u relative to the ranked web pages is unknown.

We now describe a weight allocation algorithm that uses the preference functions R_i to learn a preference function of the form PREF(u,v) = Σ_{i=1}^N w_i R_i(u,v). We adopt the on-line learning framework first studied by Littlestone (1988) in which the weight w_i assigned to each ranking expert i is updated incrementally. Formally, learning is assumed to take place in a sequence of rounds. On each round t, we assume the learning algorithm is provided with a set X^t of instances to be ranked, for which each ranking expert i ∈ {1, ..., N} provides an ordering function f^t_i. (In metasearch, for instance, f^t_i is the ordering function associated with the list L^t_i of web pages returned by the i-th ranking expert for the t-th query, and X^t is the set of all web pages that appear in any of the lists L^t_1, ..., L^t_N.) Each ordering function f^t_i induces a preference function R_{f^t_i}, which we denote for brevity by R^t_i. The learner may compute R^t_i(u,v) for any and all preference functions R^t_i and pairs u,v ∈ X^t before producing a combined preference function PREF^t, which is then used to produce an ordering ρ̂_t of X^t. (Methods for producing an ordering from a preference function will be discussed below.)
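A linear combination of preference functions of this form can be sketched as follows; this is a toy illustration of ours, and the two "experts" here are hypothetical:

```python
# PREF as a weighted combination of primitive preference functions:
# PREF(u, v) = sum_i w_i * R_i(u, v).
def combine(Rs, ws):
    return lambda u, v: sum(w * R(u, v) for R, w in zip(Rs, ws))

alpha   = lambda u, v: 1.0 if u < v else 0.0   # prefers alphabetical order
abstain = lambda u, v: 0.5                     # always abstains
PREF = combine([alpha, abstain], [0.75, 0.25])
print(PREF('a', 'b'), PREF('b', 'a'))  # -> 0.875 0.125
```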
After producing the ordering ρ̂_t, the learner receives feedback from the environment. We assume that the feedback is an arbitrary set of assertions of the form "u should be preferred to v." That is, the feedback on the t-th round is a set F^t of pairs (u,v).

The algorithm we propose for this problem is based on the "weighted majority algorithm" of Littlestone and Warmuth (1994) and, more directly, on Freund and Schapire's (1997) "Hedge" algorithm. We define the loss of a preference function R with respect to the user's feedback F as

  Loss(R; F) = (1/|F|) Σ_{(u,v) ∈ F} (1 - R(u,v)) = 1 - (1/|F|) Σ_{(u,v) ∈ F} R(u,v).   (1)

This loss has a natural probabilistic interpretation. If R is viewed as a randomized prediction algorithm that predicts that u will precede v with probability R(u,v), then Loss(R; F) is the probability of R disagreeing with the feedback on a pair (u,v) chosen uniformly at random from F.

It is worth noting that the assumption on the form of the feedback can be further relaxed by allowing the user to indicate the degree to which she prefers u over v. In this case, the loss should be normalized by the weighted sum of feedback pairs. Since this generalization is rather straightforward, we assume for brevity that the feedback is an unweighted set of assertions over element pairs.

We can now use the Hedge algorithm almost verbatim, as shown in Figure 2. The algorithm maintains a positive weight vector whose value at time t is denoted by w^t = (w^t_1, ..., w^t_N). If there is no prior knowledge about the ranking experts, we set all initial weights to be equal, so that w^1_i = 1/N. On each round t, the weight vector w^t is used to combine the preference functions of the different experts to obtain the preference function PREF^t(u,v) = Σ_{i=1}^N w^t_i R^t_i(u,v). This preference function is next converted into an ordering ρ̂_t on the current set of elements X^t.
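Eq. (1) translates directly into code; the following minimal sketch (ours) checks the probabilistic interpretation on two extreme preference functions:

```python
# Loss(R; F) of Eq. (1): one minus the average vote R gives to the pairs in
# the feedback set F, i.e. the probability that a randomized predictor with
# Pr[u before v] = R(u, v) disagrees with a feedback pair drawn uniformly.
def loss(R, F):
    return 1.0 - sum(R(u, v) for (u, v) in F) / len(F)

F = [('a', 'b'), ('c', 'd')]
certain = lambda u, v: 1.0   # always agrees with the feedback direction
coin    = lambda u, v: 0.5   # always abstains
print(loss(certain, F), loss(coin, F))  # -> 0.0 0.5
```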
For the purposes of this section, the method of producing an ordering is immaterial; in particular, any of the methods described in Sec. 4 could be used here. Based on this ordering, the user provides feedback F^t, and the loss Loss(R^t_i; F^t) of each preference function is evaluated as in Eq. (1). Finally, the weight vector w^t is updated using the multiplicative rule

  w^{t+1}_i = w^t_i β^{Loss(R^t_i; F^t)} / Z_t,

where β ∈ [0,1] is a parameter, and Z_t is a normalization constant chosen so that the weights sum to one after the update. Thus, in each round, the weights of the ranking experts are adjusted so that the weights of experts producing preference functions with relatively large agreement with the feedback are increased.

We now give the theoretical rationale behind this algorithm. Freund and Schapire (1997) prove general results about Hedge which can be applied directly to this loss function. Their results imply almost immediately a bound on the cumulative loss of the preference functions PREF^t in terms of the loss of the best ranking expert, specifically:

  Allocate Weights for Ranking Experts
  Parameters: β ∈ [0,1]; initial weight vector w^1 ∈ [0,1]^N with Σ_{i=1}^N w^1_i = 1;
    N ranking experts; number of rounds T.
  Do for t = 1, 2, ..., T:
  1. Receive a set of elements X^t and ordering functions f^t_1, ..., f^t_N. Let R^t_i
     denote the preference function induced by f^t_i.
  2. Compute a total order ρ̂_t which approximates PREF^t(u,v) = Σ_{i=1}^N w^t_i R^t_i(u,v).
     (Sec. 4 describes several ways of approximating a preference function with a total order.)
  3. Order X^t using ρ̂_t.
  4. Receive feedback F^t from the user.
  5. Evaluate losses Loss(R^t_i; F^t) as defined in Eq. (1).
  6. Set the new weight vector w^{t+1}_i = w^t_i β^{Loss(R^t_i; F^t)} / Z_t, where Z_t is a
     normalization constant chosen so that Σ_{i=1}^N w^{t+1}_i = 1.

  Figure 2: The on-line weight allocation algorithm.
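The multiplicative update of step 6 can be sketched as below; this is a minimal illustration of ours, assuming the per-expert losses have already been computed (the loss values are made up):

```python
# One application of step 6: w_i <- w_i * beta**Loss_i / Z_t. Experts with
# smaller loss keep more weight; Z_t renormalizes the weights to sum to one.
def hedge_update(w, losses, beta):
    unnorm = [wi * beta ** li for wi, li in zip(w, losses)]
    Z = sum(unnorm)   # normalization constant Z_t
    return [wi / Z for wi in unnorm]

# Two experts starting uniform; expert 0 repeatedly suffers the smaller loss,
# so its weight drifts toward 1: after 10 rounds with beta = 0.5,
# w[0]/w[1] = (beta**0.1 / beta**0.9)**10 = beta**(-8) = 256.
w = [0.5, 0.5]
for _ in range(10):
    w = hedge_update(w, losses=[0.1, 0.9], beta=0.5)
print(round(w[0], 3))  # -> 0.996  (i.e. 256/257)
```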
Theorem 1. For the algorithm of Fig. 2,

  Σ_{t=1}^T Loss(PREF^t; F^t) ≤ a_β min_i Σ_{t=1}^T Loss(R^t_i; F^t) + c_β ln N,

where a_β = ln(1/β)/(1-β) and c_β = 1/(1-β).

Note that Σ_t Loss(PREF^t; F^t) is the cumulative loss of the combined preference functions PREF^t, and Σ_t Loss(R^t_i; F^t) is the cumulative loss of the i-th ranking expert. Thus, Theorem 1 states that the cumulative loss of the combined preference functions will not be much worse than that of the best ranking expert.

Proof: Since Σ_i w^t_i = 1, we have that

  Loss(PREF^t; F^t) = 1 - (1/|F^t|) Σ_{(u,v) ∈ F^t} Σ_i w^t_i R^t_i(u,v)
                    = Σ_i w^t_i (1 - (1/|F^t|) Σ_{(u,v) ∈ F^t} R^t_i(u,v))
                    = Σ_i w^t_i Loss(R^t_i; F^t).

Therefore, by Freund and Schapire's (1997) Theorem 2,

  Σ_{t=1}^T Loss(PREF^t; F^t) = Σ_{t=1}^T Σ_i w^t_i Loss(R^t_i; F^t) ≤ a_β min_i Σ_{t=1}^T Loss(R^t_i; F^t) + c_β ln N.  ∎

Of course, we are not interested in the loss of PREF^t (since it is not an ordering), but rather in the performance of the actual ordering ρ̂_t computed by the learning algorithm. Fortunately, the losses of these can be related using a kind of triangle inequality. Let

  DISAGREE(ρ, PREF) = Σ_{u,v: ρ(u) > ρ(v)} (1 - PREF(u,v)).   (2)

Theorem 2. For any PREF, F and total order defined by an ordering function ρ,

  Loss(R_ρ; F) ≤ DISAGREE(ρ, PREF)/|F| + Loss(PREF; F).   (3)

Proof: For x, y ∈ [0,1], let us define d(x,y) = x(1-y) + y(1-x). We first show that d satisfies the triangle inequality. Let x, y and z be in [0,1], and let X, Y and Z be independent Bernoulli ({0,1}-valued) random variables with probability of outcome 1 equal to x, y and z, respectively.
Then

  d(x,z) = Pr[X ≠ Z] = Pr[(X ≠ Y ∧ Y = Z) ∨ (X = Y ∧ Y ≠ Z)]
         ≤ Pr[X ≠ Y ∨ Y ≠ Z] ≤ Pr[X ≠ Y] + Pr[Y ≠ Z] = d(x,y) + d(y,z).

For [0,1]-valued functions f, g defined on X × X, we next define

  D(f, g) = Σ_{u,v: u ≠ v} d(f(u,v), g(u,v)).

Clearly, D also satisfies the triangle inequality. Let χ_F be the characteristic function of F, so that χ_F : X × X → {0,1} and χ_F(u,v) = 1 if and only if (u,v) ∈ F. Then from the definitions of Loss and DISAGREE, we have

  |F| Loss(R_ρ; F) = D(R_ρ, χ_F) ≤ D(R_ρ, PREF) + D(PREF, χ_F) = DISAGREE(ρ, PREF) + |F| Loss(PREF; F).  ∎

Notice that the learning algorithm Hedge minimizes the second term on the right hand side of Eq. (3). Below, we consider the problem of finding an ordering which minimizes the first term, namely, DISAGREE.

4. Ordering Instances with a Preference Function

4.1 Measuring the Quality of an Ordering

We now consider the complexity of finding a total order that agrees best with a learned preference function. To analyze this, we must first quantify the notion of agreement between a preference function PREF and an ordering. One natural notion is the following: Let X be a set, let PREF be a preference function, and let ρ be a total ordering of X, expressed again as an ordering function (i.e., ρ(u) > ρ(v) if and only if u is above v in the order). For the analysis of this section, it is convenient to use the measure AGREE(ρ, PREF), which is defined to be the sum of PREF(u,v) over all pairs u,v such that u is ranked above v by ρ:

  AGREE(ρ, PREF) = Σ_{u,v: ρ(u) > ρ(v)} PREF(u,v).   (4)

Clearly, AGREE is a linear transformation of the measure DISAGREE introduced in Eq. (2), and hence maximizing AGREE is equivalent to minimizing DISAGREE.
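AGREE and DISAGREE, and the fact that for a total order over n instances they sum to the constant n(n-1)/2, can be checked with a short sketch (ours; the toy preference function is an assumption):

```python
from itertools import permutations

# AGREE (Eq. 4) sums PREF(u, v) over pairs ranked u-above-v; DISAGREE (Eq. 2)
# sums 1 - PREF(u, v) over the same pairs, so the two always total the number
# of such pairs, n(n-1)/2, and maximizing one minimizes the other.
def agree(order, pref):       # `order` lists instances best-first
    return sum(pref(u, v) for i, u in enumerate(order) for v in order[i + 1:])

def disagree(order, pref):
    return sum(1 - pref(u, v) for i, u in enumerate(order) for v in order[i + 1:])

pref = lambda u, v: 1.0 if u < v else 0.0    # toy preference: alphabetical
best = max(permutations('cab'), key=lambda o: agree(o, pref))
print(best, agree(best, pref))  # -> ('a', 'b', 'c') 3.0
```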
This definition is also closely related to similarity metrics used in decision theory and information processing (Kemeny & Snell, 1962; Fishburn, 1970; Roberts, 1979; French, 1989; Yao, 1995); see the discussion in Sec. 6.

4.2 Finding an Optimal Ordering is Hard

Ideally, one would like to find a ρ that maximizes AGREE(ρ, PREF). The general optimization problem is of little interest in our setting, since there are many constraints on the preference function that are imposed by the learning algorithm. Using the learning algorithm of Sec. 3, for instance, PREF will always be a linear combination of simpler functions. However, the theorem below shows that this optimization problem is NP-complete even if PREF is restricted to be a linear combination of well-behaved preference functions. In particular, the problem is NP-complete even if all the primitive preference functions used in the linear combination are rank orderings which map into a set S with only three elements, one of which may or may not be ⊥. (Clearly, if S consists of more than three elements then the problem is still hard.)

Theorem 3. The following decision problem is NP-complete for any set S with |S| ≥ 3:

  Input: A rational number κ; a set X; a collection of N ordering functions f_i : X → S; and a preference function PREF defined as

    PREF(u,v) = Σ_{i=1}^N w_i R_{f_i}(u,v)   (5)

  where w = (w_1, ..., w_N) is a rational weight vector in [0,1]^N with Σ_{i=1}^N w_i = 1.

  Question: Does there exist a total order ρ such that AGREE(ρ, PREF) ≥ κ?

Proof: The problem is clearly in NP, since a nondeterministic algorithm can guess a total order and check the weighted number of agreements in polynomial time.
To prove that the problem is NP-hard, we reduce from CYCLIC-ORDERING (Galil & Megiddo, 1977; Garey & Johnson, 1979), defined as follows: "Given a set A and a collection C of ordered triples (a, b, c) of distinct elements from A, is there a one-to-one function f : A → {1, 2, ..., |A|} such that for each (a, b, c) ∈ C we have either f(a) > f(b) > f(c) or f(b) > f(c) > f(a) or f(c) > f(a) > f(b)?"

Without loss of generality, S is either {0, 1, ⊥} or {0, 1, 2}. We first show that the problem of finding an optimal total order is hard when S = {0, 1, ⊥}. Given an instance of CYCLIC-ORDERING, we let X = A. For each triple t = (a, b, c) we introduce three ordering functions f_{t,1}, f_{t,2}, and f_{t,3}, and define them so that f_{t,1}(a) > f_{t,1}(b), f_{t,2}(b) > f_{t,2}(c), and f_{t,3}(c) > f_{t,3}(a). To do this, we let f_{t,1}(a) = f_{t,2}(b) = f_{t,3}(c) = 1, f_{t,1}(b) = f_{t,2}(c) = f_{t,3}(a) = 0, and f_{t,i}(·) = ⊥ in all other cases. We let the weight vector be uniform, so that w_{t,i} = 1/(3|C|). Let

  κ = 5/3 + (|A|(|A|-1)/2 - 3)/2.

Define R_t(u,v) = Σ_{i=1}^3 w_{t,i} R_{f_{t,i}}(u,v), which is the contribution of these three functions to PREF(u,v). Notice that for any triple t = (a, b, c) ∈ C, R_t(a,b) = 2/(3|C|) whereas R_t(b,a) = 1/(3|C|), and similarly for the pairs b, c and c, a. In addition, for any pair u, v ∈ A such that at least one of them does not appear in t, we get that R_t(u,v) = 1/(2|C|). Since a total order ρ can satisfy at most two of the three conditions ρ(a) > ρ(b), ρ(b) > ρ(c), and ρ(c) > ρ(a), the largest possible weighted number of agreements associated with this triple is exactly κ/|C|.
If the number of weighted agreements is at least κ, it must be exactly κ, by the argument above; and if there are exactly κ weighted agreements, then the total order must satisfy exactly 2 of the possible 3 relations for each three elements that form a triple from C. Thus, the constructed rank ordering instance is positive if and only if the original CYCLIC-ORDERING instance is positive.

The case S = {0, 1, 2} uses a similar construction; however, for each triple t = (a, b, c), we define six ordering functions f^j_{t,1}, f^j_{t,2}, and f^j_{t,3}, where j ∈ {0, 1}. The basic idea here is to replace each f_{t,i} with two functions, f^0_{t,i} and f^1_{t,i}, that agree on the single ordering constraint associated with f_{t,i}, but disagree on all other orderings. For instance, we will define these functions so that f^j_{t,1}(a) > f^j_{t,1}(b) for j = 0 and j = 1, but for all other pairs u, v, f^1_{t,1}(u) > f^1_{t,1}(v) iff f^0_{t,1}(v) > f^0_{t,1}(u). Averaging the two orderings f^0_{t,1} and f^1_{t,1} will thus yield the same preference expressed by the original function f_{t,1} (i.e., a preference for a > b only). In more detail, we let f^j_{t,1}(a) = f^j_{t,2}(b) = f^j_{t,3}(c) = 2 - j, f^j_{t,1}(b) = f^j_{t,2}(c) = f^j_{t,3}(a) = 1 - j, and f^j_{t,i}(·) = 2j in all other cases. We again let the weight vector be uniform, so that w^j_{t,i} = 1/(6|C|). Similar to the first case, we define R_t(u,v) = Σ_{i,j} w^j_{t,i} R_{f^j_{t,i}}(u,v). It can be verified that this R_t is identical to the R_t constructed in the first case. Therefore, by the same argument, the constructed rank ordering instance is positive if and only if the original CYCLIC-ORDERING instance is positive.  ∎

Although this problem is hard when |S| ≥ 3, the next theorem shows that it becomes tractable for linear combinations of rank orderings into a set S of size two. Of course, when |S| = 2, the rank orderings are really only binary classifiers.
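Returning to the proof of Theorem 3 for a moment: the claim that the two gadget constructions induce identical contributions R_t can be checked numerically. The following sketch (ours) does so for a single triple over a four-element universe:

```python
from itertools import product

BOT = None  # the unranked symbol

def R(f, u, v):   # rank ordering induced by f, with 1/2 for ties/unranked
    fu, fv = f(u), f(v)
    if fu is BOT or fv is BOT or fu == fv:
        return 0.5
    return 1.0 if fu > fv else 0.0

A = 'abcd'   # universe with one triple t = (a, b, c); here |C| = 1
a, b, c = 'a', 'b', 'c'

# Case S = {0, 1, BOT}: f_{t,1}(a)=1, f_{t,1}(b)=0, BOT elsewhere, etc.
case1 = [{a: 1, b: 0}.get, {b: 1, c: 0}.get, {c: 1, a: 0}.get]

# Case S = {0, 1, 2}: f^j(top)=2-j, f^j(bottom)=1-j, 2j elsewhere, j in {0,1}.
def make_f(top, bottom, j):
    return lambda u: {top: 2 - j, bottom: 1 - j}.get(u, 2 * j)

case2 = [make_f(top, bottom, j)
         for top, bottom in [(a, b), (b, c), (c, a)] for j in (0, 1)]

for u, v in product(A, repeat=2):
    if u != v:
        r1 = sum(R(f, u, v) for f in case1) / 3   # uniform weights 1/(3|C|)
        r2 = sum(R(f, u, v) for f in case2) / 6   # uniform weights 1/(6|C|)
        assert abs(r1 - r2) < 1e-12               # the two gadgets agree

r_ab = sum(R(f, a, b) for f in case1) / 3
print(r_ab)  # 2/3: R_t(a,b) = 2/(3|C|), as claimed in the proof
```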
The fact that this special case is tractable underscores the fact that manipulating orderings (even relatively simple ones) can be computationally more difficult than performing the corresponding operations on binary classifiers.

Theorem 4. The following optimization problem is solvable in linear time:

  Input: A set X; a set S with |S| = 2; a collection of N ordering functions f_i : X → S; and a preference function PREF defined by Eq. (5).

  Output: A total order defined by an ordering function ρ which maximizes AGREE(ρ, PREF).

Proof: Assume without loss of generality that the two-element set S is {0, 1}, and define ρ(u) = Σ_i w_i f_i(u). We now show that any total order consistent with ρ maximizes AGREE(ρ, PREF). Fix a pair u, v ∈ X and let

  q_{b1,b2} = Σ_{i: f_i(u)=b1, f_i(v)=b2} w_i.

We can now rewrite ρ and PREF as

  ρ(u) = q_{1,0} + q_{1,1}    PREF(u,v) = q_{1,0} + (1/2) q_{1,1} + (1/2) q_{0,0}
  ρ(v) = q_{0,1} + q_{1,1}    PREF(v,u) = q_{0,1} + (1/2) q_{1,1} + (1/2) q_{0,0}.

Note that both ρ(u) - ρ(v) and PREF(u,v) - PREF(v,u) are equal to q_{1,0} - q_{0,1}. Hence, if ρ(u) > ρ(v) then PREF(u,v) > PREF(v,u). Therefore, for each pair u, v ∈ X, the order defined by ρ agrees with the pairwise preference defined by PREF. In other words, we have shown that

  AGREE(ρ, PREF) = Σ_{{u,v}} max{PREF(u,v), PREF(v,u)},   (6)

where the sum is over all unordered pairs. Clearly, the right hand side of Eq. (6) maximizes the right hand side of Eq. (4), since at most one of (u,v) or (v,u) can be included in the latter sum.  ∎

4.3 Finding an Approximately Optimal Ordering

Theorem 3 implies that we are unlikely to find an efficient algorithm that finds the optimal total order for a weighted combination of rank orderings. Fortunately, there do exist efficient algorithms for finding an approximately optimal total order.
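The proof's recipe is constructive: sort the instances by ρ(u) = Σ_i w_i f_i(u). A minimal sketch of ours follows; the three binary functions and the weights are made-up examples:

```python
# Theorem 4's construction: when every f_i maps into S = {0, 1}, sorting by
# rho(u) = sum_i w_i * f_i(u) yields a total order maximizing AGREE; no
# search over orderings is needed.
def optimal_order_binary(X, fs, ws):
    rho = {u: sum(w * f(u) for f, w in zip(fs, ws)) for u in X}
    return sorted(X, key=lambda u: -rho[u])   # best first; ties arbitrary

fs = [lambda u: 1 if u in 'ab' else 0,
      lambda u: 1 if u == 'a' else 0,
      lambda u: 1 if u in 'abc' else 0]
ws = [0.5, 0.3, 0.2]
print(optimal_order_binary('dcba', fs, ws))  # -> ['a', 'b', 'c', 'd']
```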
In fact, finding a good total order is closely related to the problem of finding the minimum feedback arc set, for which there exist good approximation algorithms; see, for instance, Shmoys (1997) and the references therein. However, the algorithms that achieve the good approximation results for the minimum feedback arc set problem are based on (or further approximate) a linear-programming relaxation (Seymour, 1995; Even, Naor, Rao, & Schieber, 1996; Berger & Shor, 1997; Even, Naor, Schieber, & Sudan, 1998) which is rather complex to implement and quite slow in practice.

Footnote 1 (to the proof of Theorem 4): Notice that in case of a tie, so that ρ(u) = ρ(v) for distinct u, v, ρ defines only a partial order. The theorem holds for any total order which is consistent with this partial order, i.e., for any ρ' such that ρ(u) > ρ(v) implies ρ'(u) > ρ'(v).

We describe instead a simple greedy algorithm which is very easy to implement. Figure 3 summarizes the greedy algorithm. As we will shortly demonstrate, this algorithm produces a good approximation to the best total order.

  Algorithm Greedy-Order
  Inputs: an instance set X; a preference function PREF
  Output: an approximately optimal ordering function ρ̂
    let V = X
    for each v ∈ V do π(v) = Σ_{u ∈ V} PREF(v,u) - Σ_{u ∈ V} PREF(u,v)
    while V is non-empty do
      let t = arg max_{u ∈ V} π(u)
      let ρ̂(t) = |V|
      V = V - {t}
      for each v ∈ V do π(v) = π(v) + PREF(t,v) - PREF(v,t)
    endwhile

  Figure 3: The greedy ordering algorithm.

The algorithm is easiest to describe by thinking of PREF as a directed weighted graph where, initially, the set of vertices V is equal to the set of instances X, and each edge u → v has weight PREF(u,v). We assign to each vertex v ∈ V a potential value π(v), which is the weighted sum of the outgoing edges minus the weighted sum of the ingoing edges.
That is,

  π(v) = Σ_{u ∈ V} PREF(v,u) - Σ_{u ∈ V} PREF(u,v).

The greedy algorithm then picks some node t that has maximum potential (ties can be broken arbitrarily in case of two or more nodes with the same potential), and assigns it a rank by setting ρ̂(t) = |V|, effectively ordering it ahead of all the remaining nodes. This node, together with all incident edges, is then deleted from the graph, and the potential values of the remaining vertices are updated appropriately. This process is repeated until the graph is empty. Notice that nodes removed in subsequent iterations will have progressively smaller and smaller ranks.

[Figure 4: Behavior of the greedy ordering algorithm. The leftmost graph is the original input (identical to the rightmost graph of Fig. 1). From this graph, node b is assigned maximal rank and deleted, leading to the middle graph; from that graph, node d is deleted, leading to the rightmost graph, in which node c is ranked ahead of node a, giving the total ordering b > d > c > a.]

As an example, consider the preference function defined by the leftmost graph of Fig. 4. (This graph is identical to the weighted combination of the two ordering functions from Fig. 1.) The initial potentials the algorithm assigns are: π(b) = 2, π(d) = 3/2, π(c) = -5/4, and π(a) = -9/4. Hence, b has maximal potential. It is given a rank of 4, and then node b and all incident edges are removed from the graph. The result is the middle graph of Fig. 4. After deleting b, the potentials of the remaining nodes are updated: π(d) = 3/2, π(c) = -1/4, and π(a) = -5/4. Thus, d will be assigned rank |V| = 3 and removed from the graph, resulting in the rightmost graph of Fig. 4. After updating potentials again, π(c) = 1/2 and π(a) = -1/2. Now c will be assigned rank |V| = 2 and removed, leaving a graph containing the single node a.
In the righ tmost graph, no de will b e rank ed ahead of no de a , leading the total ordering b > d > > a . nally b e assigned the rank j V j = 1. The ordering pro dued b y the greedy algorithm is th us b > d > > a . The next theorem sho ws that this greedy algorithm omes within a fator of t w o of optimal. Theorem 5 L et OPT (PREF) b e the weighte d agr e ement ahieve d by an optimal total or der for the pr efer en e funtion PREF , and let APPR O X (PREF) b e the weighte d agr e ement ahieve d by the gr e e dy algorithm. Then APPR O X(PREF ) 1 2 OPT (PREF) : Pro of: Consider the edges that are iniden t on the no de v j whi h is seleted on the j -th rep etition of the while lo op of Figure 3. The ordering pro dued b y the algorithm will agree with all of the outgoing edges of v j and disagree with all of the ingoing edges. Let a j b e the sum of the w eigh ts of the outgoing edges of v j , and d j b e the sum of the w eigh ts of the ingoing edges. Clearly APPR O X(PREF ) P j V j j =1 a j . Ho w ev er, at ev ery rep etition, the total w eigh t of all inoming edges m ust equal the total w eigh t of all outgoing edges. This means that P v 2 V ( v ) = 0, and hene for the no de v ? that has maximal p oten tial, ( v ? ) 0. Th us on ev ery rep etition j , it m ust b e that a j d j , so w e ha v e that OPT (PREF ) j V j X j =1 ( a j + d j ) j V j X j =1 ( a j + a j ) 2 APPR O X (PREF) : The rst inequalit y holds b eause OPT (PREF ) an at b est inlude ev ery edge in the graph, and sine ev ery edge is remo v ed exatly one, ea h edge m ust on tribute to some a j or some d j . 2 254 Learning to Order Things 2k+3 k+2 k+3 k+1 1 2 k k+1 1 2 k k+2 2k+3 Figure 5: An example of a graph (left) for whi h the no de-based greedy algorithm a hiev es an appro ximation fator of 1 2 b y onstruting the partial order on the righ t. In passing, w e note that there are other natural greedy algorithms that do not a hiev e go o d appro ximations. 
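To make the procedure concrete, here is a minimal Python sketch of the greedy algorithm of Figure 3. The representation of PREF as a two-argument function and the dictionary-based bookkeeping are our own choices, not the authors' code:

```python
def greedy_order(X, pref):
    """Greedy-Order (Figure 3): returns a dict mapping each instance to its
    rank; a higher rank means the instance is ordered earlier."""
    V = set(X)
    # potential pi(v): total outgoing weight minus total incoming weight
    pi = {v: sum(pref(v, u) - pref(u, v) for u in V) for v in V}
    rho = {}
    while V:
        t = max(V, key=lambda v: pi[v])  # node of maximum potential
        rho[t] = len(V)                  # rank it ahead of all remaining nodes
        V.remove(t)
        for v in V:                      # update potentials after deleting t
            pi[v] += pref(t, v) - pref(v, t)
    return rho
```

Ties in the arg max are broken by whatever order `max` happens to visit the nodes, which matches the arbitrary tie-breaking allowed by footnote 2.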
Consider, for example, an algorithm that starts from a graph consisting of all the nodes but with no edges, and iteratively adds the highest-weighted edge in the graph while avoiding cycles. It can be shown that this algorithm can produce a very poor partial order given an adversarially chosen graph; there are cases where the optimal total order achieves a multiplicative factor of O(|V|) more weighted agreements than this "edge-based" greedy algorithm.

4.4 Improvements to the Greedy Algorithm

The approximation factor of two given in Theorem 5 is tight. That is, there exist problems for which the greedy algorithm's approximation is worse than the optimal solution by a factor arbitrarily close to two. Consider the graph shown on the left-hand side of Fig. 5. An optimal total order ranks the instances according to their position in the figure, left to right, breaking ties randomly, and achieves OPT(PREF) = 2k + 2 weighted agreements. However, the greedy algorithm picks the node labeled k + 1 first and orders all the remaining nodes randomly, achieving as few as APPROX(PREF) = k + 2 agreements. For large k, the ratio APPROX(PREF)/OPT(PREF) approaches 1/2.

For the graph of Figure 5, there is another simple algorithm which produces an optimal ordering: since the graph is already a partial order, picking any total order consistent with this partial order gives an optimal result. To cope with problems such as the one of Figure 5, we devised an improvement to the greedy algorithm which combines a greedy method with topological sorting. The aim of the improvement is to find better approximations for graphs which are composed of many strongly connected components. As before, the modified algorithm is easiest to describe by thinking of PREF as a weighted directed graph.
Recall that for each pair of nodes u and v, there exist two edges: one from u to v with weight PREF(u,v) and one from v to u with weight PREF(v,u). In the modified greedy algorithm we will pre-process the graph. For each pair of nodes, we remove the edge with the smaller weight and set the weight of the other edge to be

\[ |\mathrm{PREF}(v,u) - \mathrm{PREF}(u,v)|. \]

For the special case where PREF(v,u) = PREF(u,v) = 1/2, we remove both edges. In the reduced graph, there is at most one directed edge between each pair of nodes. Note that the greedy algorithm would behave identically on the transformed graph, since it is based on the weighted differences between the incoming and outgoing edges.

Algorithm SCC-Greedy-Order
  Inputs: an instance set X; a preference function PREF
  Output: an approximately optimal ordering function ρ̂
  Define PREF′(u,v) = max{PREF(u,v) − PREF(v,u), 0}.
  Find the strongly connected components U₁, …, U_k of the graph G = (V, E),
    where V = X and E = {(u,v) | PREF′(u,v) > 0}.
  Order the strongly connected components in any way consistent with the partial order <_s:
    U <_s U′ iff ∃ u ∈ U, u′ ∈ U′ : (u, u′) ∈ E.
  Use algorithm Greedy-Order or full enumeration to order the instances within each
    component U_i according to PREF′.

Figure 6: The improved greedy ordering algorithm.

We next find the strongly connected components³ of the reduced graph, ignoring (for now) the weights. One can now split the edges of the reduced graph into two classes: inter-component edges connect nodes u and v in different strongly connected components, and intra-component edges connect nodes u and v from the same strongly connected component. It is straightforward to verify that any optimal order agrees with all the inter-component edges.
Put another way, if there is an edge from node u to node v of two different connected components in the reduced graph, then ρ(u) > ρ(v) for any optimal total order ρ.

The first step of the improved algorithm is thus to totally order the strongly connected components in some way consistent with the partial order defined by the inter-component edges. More precisely, we pick a total ordering of the components consistent with the partial order <_s, defined as follows: for components U and U′, U <_s U′ iff there is an edge from some node u ∈ U to some node u′ ∈ U′ in the reduced graph. We next order the nodes within each strongly connected component, thus providing a total order of all nodes. Here the greedy algorithm can be used. As an alternative, in cases where a component contains only a few elements (say, at most five), one can find the optimal order of the elements of the component by a brute-force approach, i.e., by full enumeration of all permutations.

3. Two nodes u and v are in the same strongly connected component iff there are directed paths from u to v and from v to u.

Figure 7: An illustration of the approximation algorithm for finding a total order from a weighted combination of ordering functions. The original graph (top left) is reduced by removing at least one edge from each edge pair (u,v) and (v,u) (middle). The strongly connected components are then found (right). Finally, an ordering is found within each strongly connected component, which yields the order b > c > d > a (bottom).

The improved algorithm is summarized in Figure 6 and illustrated in Figure 7. There are four elements in Figure 7, which constitute two strongly connected components in the reduced graph ({b} and {a, c, d}).
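The reduction and component-wise ordering of Figure 6 can be sketched in Python as follows. This is a self-contained illustration rather than the authors' code; in particular, the use of Kosaraju's algorithm to find the strongly connected components in a topological order of the component DAG is our own choice:

```python
def scc_greedy_order(X, pref):
    """Sketch of SCC-Greedy-Order (Figure 6): reduce the graph, order the
    strongly connected components consistently with the inter-component
    edges, then order each component internally with Greedy-Order."""
    nodes = list(X)
    # reduced graph: keep only the heavier direction of each edge pair
    w = lambda u, v: max(pref(u, v) - pref(v, u), 0.0)
    succ = {u: [v for v in nodes if v != u and w(u, v) > 0] for u in nodes}
    pred = {u: [v for v in nodes if v != u and w(v, u) > 0] for u in nodes}

    seen = set()
    def dfs(u, adj, out):
        # iterative depth-first search; appends nodes in postorder
        seen.add(u)
        stack = [(u, iter(adj[u]))]
        while stack:
            node, it = stack[-1]
            for v in it:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(adj[v])))
                    break
            else:
                stack.pop()
                out.append(node)

    # Kosaraju: postorder on the graph, then DFS on the reverse graph
    finish = []
    for u in nodes:
        if u not in seen:
            dfs(u, succ, finish)
    seen = set()
    comps = []                       # emitted in topological order
    for u in reversed(finish):
        if u not in seen:
            comp = []
            dfs(u, pred, comp)
            comps.append(comp)

    def greedy(comp):
        # Greedy-Order (Figure 3) restricted to one component, best first
        V = set(comp)
        pi = {v: sum(pref(v, u) - pref(u, v) for u in V) for v in V}
        out = []
        while V:
            t = max(V, key=lambda v: pi[v])
            out.append(t)
            V.remove(t)
            for v in V:
                pi[v] += pref(t, v) - pref(v, t)
        return out

    rho, rank = {}, len(nodes)
    for comp in comps:
        for v in greedy(comp):
            rho[v] = rank
            rank -= 1
    return rho
```

For small components, the inner call to the greedy routine could be replaced by full enumeration of permutations, as the paper suggests.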
Therefore, b is assigned the top rank and ranked above a, c, and d. If the brute-force algorithm were used to order the components, then we would check all 3! permutations of a, c, and d and output the total order b > c > d > a, which is the optimal order in this toy example.

In the worst case, the reduced graph contains only a single strongly connected component. In this case, the improved algorithm generates the same ordering as the greedy algorithm. However, in the experiments on metasearch problems described in Sec. 5, many of the strongly connected components are small; the average size of a strongly connected component is less than five. In cases such as these, the improved algorithm will often improve on the simple greedy algorithm.

4.5 Experiments with the Ordering Algorithms

Ideally, each algorithm would be evaluated by determining how closely it approximates the optimal ordering on large, realistic problems. Unfortunately, finding the optimal ordering for large graphs is impractical. We thus performed two sets of experiments with the ordering algorithms described above. In the first set of experiments, we evaluated the algorithms on small graphs, specifically graphs for which the optimal ordering could be feasibly found with brute-force enumeration. In these experiments, we measure the "goodness" of the resulting orderings relative to the optimal ordering. In the second set of experiments, we evaluated the algorithms on large graphs for which the optimal orderings are unknown. In these experiments, we compute a "goodness" measure which depends on the total weight of all edges rather than on the optimal ordering.

In addition to the simple greedy algorithm and its improvement, we also considered the following simple randomized algorithm: pick a permutation at random, and then output the better of that permutation and its reverse.
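The randomized baseline just described can be sketched as follows; the function and its `trials` parameter are our own naming, not the paper's:

```python
import random

def random_order(X, pref, trials=1):
    """Randomized baseline: draw random permutations, score each together
    with its reverse, and keep the ordering with the most weighted agreement."""
    def agreement(seq):
        # total PREF weight of the pairs this ordering satisfies (best first)
        return sum(pref(seq[i], seq[j])
                   for i in range(len(seq)) for j in range(i + 1, len(seq)))
    best, best_score = None, float('-inf')
    for _ in range(trials):
        perm = list(X)
        random.shuffle(perm)
        for cand in (perm, perm[::-1]):
            s = agreement(cand)
            if s > best_score:
                best, best_score = cand, s
    return best, best_score
```

Since the agreement of a permutation plus the agreement of its reverse equals the total weight of all edges, the better of the two always satisfies at least half of that total, which is the source of the expected approximation bound mentioned below.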
It can easily be shown that this algorithm achieves the same approximation bound on expected performance as the greedy algorithm. (Briefly, one of the two permutations must agree with at least half of the weighted edges in the graph.) The randomized algorithm can be improved by repeating the process, i.e., examining many random permutations and their reverses, and choosing the permutation that achieves the largest number of weighted agreements.

In a first set of experiments, we compared the performance of the greedy approximation algorithm, the improved algorithm which first finds strongly connected components, and the randomized algorithm on graphs of nine or fewer elements. For each number of elements, we generated 10,000 random graphs by choosing PREF(u,v) uniformly at random and setting PREF(v,u) to 1 − PREF(u,v). For the randomized algorithm, we evaluated 10n random permutations (and their reverses), where n is the number of instances (nodes). To have a fair comparison between the different algorithms on the smaller graphs, we always used the greedy algorithm (rather than a brute-force algorithm) to order the elements of each strongly connected component of a graph.

To evaluate the algorithms, we examined the reduced graph and calculated the average ratio of the weights of the edges chosen by the approximation algorithm to the weights of the edges chosen by the optimal order. More precisely, let ρ be the optimal order and ρ̂ an order chosen by an approximation algorithm. Then for each random graph, we calculated

\[ \frac{\sum_{u,v : \hat{\rho}(u) > \hat{\rho}(v)} \max\{\mathrm{PREF}(u,v) - \mathrm{PREF}(v,u),\, 0\}}{\sum_{u,v : \rho(u) > \rho(v)} \max\{\mathrm{PREF}(u,v) - \mathrm{PREF}(v,u),\, 0\}}. \]

If this measure is 0.9, for instance, then the total weight of the edges in the total order picked by the approximation algorithm is 90% of the corresponding figure for the optimal algorithm.
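This ratio is straightforward to compute once both orders are in hand. A small sketch, with a helper name and a best-first list representation of our own choosing:

```python
def agreement_ratio(approx, opt, pref):
    """Weighted agreement of an approximate order relative to the optimal
    order, measured on the reduced graph; both orders are given best first."""
    def reduced_weight(seq):
        return sum(max(pref(seq[i], seq[j]) - pref(seq[j], seq[i]), 0.0)
                   for i in range(len(seq)) for j in range(i + 1, len(seq)))
    return reduced_weight(approx) / reduced_weight(opt)
```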
We averaged the above ratios over all random graphs of the same size. The results are shown on the left-hand side of Figure 8. On the right-hand side of the figure, we show the average running time for each of the algorithms as a function of the number of elements. When the number of ranked elements is more than five, the greedy algorithms outperform the randomized algorithm, while their running time is much smaller. Thus, if a full enumeration had been used to find the optimal order of small strongly connected components, the approximation would have been consistently better than the randomized algorithm. We note that the greedy algorithm also generally performs better on average than the lower bound given in Theorem 5. In fact, combining the greedy algorithm with pre-partitioning of the graph into strongly connected components often yields the optimal order.

Figure 8: Comparison of the goodness (left) and the running time (right) of the approximations achieved by the greedy algorithms and the randomized algorithm, as a function of the number of ranked elements, for random preference functions with 3 through 9 elements.

In the second set of experiments, we measured performance and running time for larger random graphs. Since for large graphs we cannot find the optimal solution by brute-force enumeration, we use as a "goodness" measure the ratio of the weights of the edges that were left in the reduced graph after applying an approximation algorithm to the total weight of the edges in the graph. That is, for each random graph we calculated

\[ \frac{\sum_{u,v : \hat{\rho}(u) > \hat{\rho}(v)} \max\{\mathrm{PREF}(u,v) - \mathrm{PREF}(v,u),\, 0\}}{\sum_{u,v} \max\{\mathrm{PREF}(u,v) - \mathrm{PREF}(v,u),\, 0\}}. \]

Figure 9: Comparison of the goodness (left) and the running time (right) of the approximations achieved by the greedy algorithms and the randomized algorithm, as a function of the number of ranked elements, for random preference functions with 3 through 30 elements. Note that the curves for Greedy and SCC+Greedy coincide for most of the points.

We ran the three algorithms with the same parameters as above (i.e., 10,000 random graphs). The results are given in Figure 9. The advantage of the greedy algorithms over the randomized algorithm is even more apparent on these larger problems. Note also that for large graphs the performance of the two greedy algorithms is indistinguishable. This is mainly due to the fact that large random graphs are strongly connected with high probability.

To summarize the experiments: when there are six or more elements, the greedy algorithm clearly outperforms the randomized algorithm even if many randomly chosen permutations are examined. Furthermore, the improved algorithm which first finds the strongly connected components outperforms the randomized algorithm for all graph sizes. In practice the improved greedy algorithm achieves very good approximations, coming within about 5 percent of optimal in the cases where the optimal ordering can be feasibly found.

5. Experimental Results for Metasearch

So far, we have described a method for learning a preference function, and a means of converting a preference function into an ordering of new instances.
We will now present some experimental results on learning to order. In particular, we will describe results on learning to combine the orderings of several web "search experts," using the algorithm of Figure 2 to learn a preference function, and the simple greedy algorithm to order instances using the learned preference function. The goals of these experiments are to illustrate the type of problems that can be solved with our method; to empirically evaluate the learning method; to evaluate the ordering algorithm on large, non-random graphs, such as might arise in a realistic application; and to confirm the theoretical results of the preceding sections. We thus restrict ourselves to comparing the learned orderings to individual search experts, as is suggested by Theorem 1, rather than attempt to compare this application of learning to order with previous experimental techniques for metasearch, e.g., (Lochbaum & Streeter, 1989; Kantor, 1994; Boyan, Freitag, & Joachims, 1994; Bartell, Cottrell, & Belew, 1994).

We note that this metasearch problem exhibits several properties that suggest a general approach such as ours. For instance, approaches that learn to combine similarity scores are not applicable, since the similarity scores of web search engines are often unavailable. In the experiments presented here, the learning algorithm was provided with ordered lists for each search engine, without any associated scores. To further demonstrate the merits of our approach, we also describe experiments with partial feedback, that is, with preference judgments that are less informative than the relevance judgments more typically used in improving search engines.
ML Search Experts                       UNIV Search Experts
NAME                                    NAME
"NAME"                                  "NAME"
title:"NAME"                            "NAME" PLACE
NAME +LASTNAME title:"home page"        title:NAME
NAME +LASTNAME title:homepage           title:"NAME"
NAME +LASTNAME machine learning         title:"NAME" PLACE
NAME +LASTNAME "machine learning"       NAME title:"home page"
NAME +LASTNAME case based reasoning     NAME title:"homepage"
NAME +LASTNAME "case based reasoning"   NAME welcome
NAME +LASTNAME PLACE                    NAME url:index.html
NAME +LASTNAME "PLACE"                  NAME url:home.html
NAME +LASTNAME url:index.html           "NAME" title:"home page"
NAME +LASTNAME url:home.html            "NAME" title:"homepage"
NAME +LASTNAME url:~*LASTNAME*          "NAME" welcome
NAME +LASTNAME url:~LASTNAME            "NAME" url:index.html
NAME +LASTNAME url:LASTNAME             "NAME" url:home.html
                                        "NAME" PLACE title:"home page"
                                        "NAME" PLACE title:"homepage"
                                        "NAME" PLACE welcome
                                        "NAME" PLACE url:index.html
                                        "NAME" PLACE url:home.html

Table 1: Search (and ranking) experts used in the metasearch experiments. In the associated queries, NAME is replaced with the person's (or university's) full name, LASTNAME with the person's last name, and PLACE with the person's affiliation (or the university's location). Sequences of words enclosed in quotes must appear as a phrase, and terms prefixed by title: and url: must appear in that part of the web page. Words prefixed by a "+" must appear in the web page; other words may or may not appear.

5.1 Test Problems and Encoding

We chose to simulate the problem of learning a domain-specific search engine, i.e., an engine that searches for pages of a particular, narrow type. Ahoy! (Shakes, Langheinrich, & Etzioni, 1997) is one instance of such a domain-specific search engine. As test cases, we picked two problems: retrieving the home pages of machine learning researchers (ML), and retrieving the home pages of universities (UNIV).
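Each search expert in Table 1 is a query template. A minimal sketch of the substitution step, assuming templates are plain strings containing the NAME, LASTNAME, and PLACE placeholders (the function and example strings below are illustrative, not the authors' code):

```python
def expand(template, name, lastname, place):
    """Fill a search-expert template with one query's fields.
    LASTNAME is substituted first because NAME is one of its substrings."""
    return (template.replace('LASTNAME', lastname)
                    .replace('NAME', name)
                    .replace('PLACE', place))
```

For example, applying the template `"NAME" PLACE title:"home page"` to a hypothetical query would produce a single phrase-and-field query string ready to send to the search engine.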
To obtain sample queries, we obtained a listing of machine learning researchers, identified by name and affiliated institution, together with their home pages,⁴ and a similar list for universities, identified by name and (sometimes) geographical location.⁵ Each entry on a list was viewed as a query, with the associated URL the sole relevant web page.

4. From http://www.aic.nrl.navy.mil/~aha/research/machine-learning.html, a list maintained by David Aha.
5. From Yahoo!

We then constructed a series of special-purpose "search experts" for each domain. These were implemented as query expansion methods which converted a name/affiliation pair (or a name/location pair) to a likely-seeming AltaVista query. For example, one expert for the UNIV domain searched for the university name appearing as a phrase, together with the phrase "home page" in the title; another expert for the ML domain searched for all the words in the person's name plus the words "machine" and "learning," and further enforced a strict requirement that the person's last name appear. Overall, we defined 16 search experts for the ML domain and 22 for the UNIV domain; these are summarized in Table 1. Each search expert returned the top 30 ranked web pages. In the ML domain, there were 210 searches for which at least one search expert returned the named home page; for the UNIV domain, there were 290 such searches. The task of the learning system is to find an appropriate way of combining the outputs of these search experts.

To give a more precise description of the search experts: for each query t, we first constructed the set X_t consisting of all web pages returned by all of the expanded queries defined by the search experts. Next, each search expert i was represented as a preference function R_i^t.
We chose these preference functions to be rank orderings defined, with respect to an ordering function f_i^t, in the natural way: we assigned a rank of f_i^t = 30 to the first listed page, f_i^t = 29 to the second listed page, and so on, finally assigning a rank of f_i^t = 0 to every page not retrieved in the top 30 by the expanded query associated with expert i.

To encode feedback, we considered two schemes. In the first, we simulated complete relevance feedback; that is, for each query, we constructed feedback in which the sole relevant page was preferred to all other pages. In the second, we simulated the sort of feedback that could be collected from "click data," i.e., from observing a user's interactions with a metasearch system. For each query, after presenting a ranked list of pages, we noted the rank of the one relevant web page. We then constructed a feedback ranking in which the relevant page is preferred to all preceding pages. This would correspond to observing which link the user actually followed, and making the assumption that this link was preferred to previous links. It should be emphasized that both of these forms of feedback are simulated, and contain less noise than would be expected from real user data. In reality, some fraction of the relevance feedback would be missing or erroneous, and some fraction of the click data would not satisfy the assumption stated above.

5.2 Evaluation and Results

To evaluate the expected performance of a fully trained system on novel queries in this domain, we employed leave-one-out testing. For each query t, we trained the learning system on all the other queries, and then recorded the rank of the learned system on query t.
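The rank encoding and the two feedback schemes described above can be sketched as follows; the function names and the list-of-pairs representation of preference judgments are our own choices:

```python
def rank_scores(returned, max_rank=30):
    """Ordering function f_i for one expert: 30 for the first page returned,
    29 for the second, and so on; 0 for any page not in the top 30."""
    scores = {page: max_rank - i for i, page in enumerate(returned[:max_rank])}
    return lambda page: scores.get(page, 0)

def full_feedback(pages, relevant):
    """Complete relevance feedback: the relevant page is preferred to all others."""
    return [(relevant, p) for p in pages if p != relevant]

def click_feedback(presented, relevant):
    """Click-data feedback: the relevant page is preferred only to the pages
    ranked above it in the list the user was shown."""
    return [(relevant, p) for p in presented[:presented.index(relevant)]]
```

Each pair (a, b) here stands for the judgment "a should be ranked ahead of b."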
For complete relevance feedback, this rank is invariant to the ordering of the training examples, but for the "click data" feedback it is not: the feedback collected at each stage depends on the behavior of the partially learned system, which in turn depends on the previous training examples. Thus, for click-data training, we trained on 100 randomly chosen permutations of the training data and recorded the median rank for t.

5.2.1 Performance Relative to Individual Experts

The theoretical results provide a guarantee of performance relative to the performance of the best individual search (ranking) expert. It is therefore natural to consider comparing the performance of the learned system to the best of the individual experts. However, for each search expert, only the top 30 ranked web pages for a query are known; if the single relevant page for a query is not among these top 30, then it is impossible to compute any natural measure of performance for this query. This complicates any comparison of the learned system to the individual search experts. However, in spite of the incomplete information about the performance of the search experts, it is usually possible to tell whether the learned system ranks a web page higher than a particular expert does.⁶ Motivated by this, we performed a sign test: we compared the rank of the learning system to the rank given by each search expert, checking to see whether this rank was lower, and discarding queries for which this comparison was impossible. We then used a normal approximation to the binomial distribution to test the following two null hypotheses (where the probability is taken over the distribution from which the queries are drawn):

H1. With probability at least 0.5, the search expert performs better than the learning system (i.e., gives a lower rank to the relevant page than the learning system does).

H2.
With probability at least 0.5, the search expert performs no worse than the learning system (i.e., gives an equal or lower rank to the relevant page).

In training, we explored learning rates β in the range [0.001, 0.999]. For complete feedback in the ML domain, hypothesis H1 can be rejected with high confidence (p > 0.999) for every search expert and every learning rate 0.01 ≤ β ≤ 0.99. The same holds in the UNIV domain for all learning rates 0.02 ≤ β ≤ 0.99. The results for click-data training are nearly as strong, except that 2 of the 22 search experts in the UNIV domain show a greater sensitivity to the learning rate: for these engines, H1 can only be rejected with high confidence for 0.3 ≤ β ≤ 0.6. To summarize: with high confidence, in both domains, the learned ranking system is no worse than any individual search expert for moderate values of β.

Hypothesis H2 is more stringent, since it can be rejected only if we are sure that the learned system is strictly better than the expert. With complete feedback in the ML domain and 0.3 ≤ β ≤ 0.8, hypothesis H2 can be rejected with confidence p > 0.999 for 14 of the 16 search experts. For the remaining two experts the learned system does perform better more often, but the difference is not significant. In the UNIV domain, the results are similar. For 0.2 ≤ β ≤ 0.99, hypothesis H2 can be rejected with confidence p > 0.999 for 21 of the 22 search experts, and the learned engine tends to perform better than the single remaining expert. Again, the results for click-data training are only slightly weaker. In the ML domain, hypothesis H2 can be rejected for all but three experts for all but the most extreme learning rates; in the UNIV domain, hypothesis H2 can be rejected for all but two experts for 0.4 ≤ β ≤ 0.6. For the remaining experts and learning rates the differences are not statistically significant; however, it is not always the case that the learned engine tends to perform better.

6. The only time this cannot be determined is when neither the learned system nor the expert ranks the relevant web page in the top 30, a case of little practical interest.

To summarize the experiments: for moderate values of β, the learned system is, with high confidence, strictly better than most of the search experts in both domains, and never significantly worse than any expert. When trained with full relevance judgments, the learned system performs better on average than any individual expert.

5.2.2 Other Performance Measures

We measured the number of queries for which the correct web page was in the top k ranked pages, for various values of k. These results are shown in Figure 10. Here the lines show the performance of the learned systems (with β = 0.5, a generally favorable learning rate) and the points correspond to the individual experts. In most cases, the learned system closely tracks the performance of the best expert at every value of k. This is especially interesting since no single expert is best at all values of k. The final graph in this figure investigates the sensitivity of this measure to the learning rate β. As a representative illustration, we varied β in the ML domain and plotted the top-k performance of the system learned from complete feedback for three values of k. Note that performance is roughly comparable over a wide range of values for β.

Another plausible measure of performance is the average rank of the (single) relevant web page. We computed an approximation to average rank by artificially assigning a rank of 31 to every page that was either unranked, or ranked above rank 30. (The latter case is to be fair to the learned system, which is the only one for which a rank greater than 30 is possible.)
A summary of these results for β = 0.5 is given in Table 2, together with some additional data on top-k performance. In the table, we give the top-k performance for three values of k, and the average rank, for several ranking systems: the two learned systems; the naive query, i.e., the person's or university's name; and the single search expert that performed best with respect to each performance measure. Note that not all of these experts are distinct, since several experts scored best on more than one measure. The table illustrates the robustness of the learned systems, which are nearly always competitive with the best expert for every performance measure listed. The only exception is that the system trained on click data trails the best expert in top-k performance for small values of k. It is also worth noting that in both domains the naive query (simply the person's or university's name) is not very effective: even with the weaker click-data feedback, the learned system achieves a 36% decrease in average rank over the naive query in the ML domain, and a 46% decrease in the UNIV domain.

To summarize the experiments: on these domains the learned system not only performs much better than naive search strategies, but also consistently performs at least as well as, and perhaps slightly better than, any single domain-specific search expert. This observation holds regardless of the performance metric considered; for nearly every metric we computed, the learned system always equals, and usually exceeds, the performance of the search expert that is best for that metric. Finally, the performance of the learned system is almost as good with the weaker "click data" training as with complete relevance feedback.
Figure 10: Top and middle: performance of the learned system versus individual experts for two different domains (queries answered in the top k, as a function of k, for the ML and UNIV domains). Bottom: the percentage of time the relevant web page was in the top-k list, as a function of the learning rate, for k = 1, 4, and 8.

                       ML Domain                        University Domain
                       Top 1  Top 10  Top 30  Avg Rank  Top 1  Top 10  Top 30  Avg Rank
Learned (Full Feed.)   114    185     198     4.9       111    225     253     7.8
Learned (Click Data)    93    185     198     4.9        87    229     259     7.8
Naive                   89    165     176     7.7        79    157     191    14.4
Best (Top 1)           119    170     184     6.7       112    221     247     8.2
Best (Top 10)          114    182     190     5.3       111    223     249     8.0
Best (Top 30)           97    181     194     5.6       111    223     249     8.0
Best (Avg Rank)        114    182     190     5.3       111    223     249     8.0

Table 2: Comparison of the learned systems and individual search queries.

6. Related Work

Problems that involve ordering and ranking have been investigated in various fields, such as decision theory, the social sciences, information retrieval, and mathematical economics (Black, 1958; Kemeny & Snell, 1962; Cooper, 1968; Fishburn, 1970; Roberts, 1979; Salton & McGill, 1983; French, 1989; Yao, 1995). Among the wealth of literature on the subject, the closest to ours appears to be the work of Kemeny and Snell (1962), which was extended by Yao (1995) and used by Balabanović and Shoham (1997) in their FAB collaborative filtering system. These works use a similar notion of ordering functions and feedback; however, they assume that both the ordering functions and the feedback are complete and transitive.
Hence, it is not possible to leave elements unranked, or to have inconsistent feedback which violates the transitivity requirements. It is therefore difficult to combine and fuse inconsistent and incomplete orderings in the Kemeny and Snell model. There are also several related intractability results. Most of them are concerned with the difficulty of reaching consensus in voting systems based on preference orderings. Specifically, Bartholdi, Tovey and Trick (1989) study the problem of finding a winner in an election when the preferences of all voters are irreflexive, antisymmetric, transitive, and complete. Thus, their setting is more restrictive than ours. They study two similar schemes to decide on a winner of an election. The first was invented by Dodgson (1876) (better known by his pen name, Lewis Carroll) and the second is due to Kemeny (1959). For both models, they show that the problem of finding a winner in an election is NP-hard. Among these two models, the one suggested by Kemeny is the closest to ours. However, as mentioned above, this model is more restrictive as it does not allow voters to abstain (preferences are required to be complete) or to be inconsistent (all preferences are transitive). As illustrated by the experiments, the problem of learning to rank is closely related to the problem of combining the results of different search engines. Many methods for this have been proposed by the information retrieval community, and many of these are adaptive, using relevance judgments to make an appropriate choice of parameters. However, generally, rankings are combined by combining the scores that were used to rank documents (Lochbaum & Streeter, 1989; Kantor, 1994). It is also frequently assumed that other properties of the objects (documents) to be ranked are available, such as word frequencies.
In contrast, in our experiments, instances are atomic entities with no associated properties except for their position in various rank-orderings. Similarly, we make minimal assumptions about the rank-orderings; in particular, we do not assume scores are available. Our methods are thus applicable to a broader class of ranking problems. General optimization methods have also been adopted to adjust parameters of an IR system so as to improve agreement with a set of user-given preference judgments. For instance, Boyan, Freitag, and Joachims (1994) use simulated annealing to improve agreement with "click data," and Bartell, Cottrell and Belew (1994) use conjugate gradient descent to choose parameters for a linear combination of scoring functions, each associated with a different search expert. Typically, such approaches offer few guarantees of efficiency, optimality, or generalization performance. Another related task is collection fusion. Here, several searches are executed on disjoint subsets of a large collection, and the results are combined. Several approaches to this problem that do not rely on combining ranking scores have been described (Towell, Voorhees, Gupta, & Johnson-Laird, 1995; Voorhees, Gupta, & Johnson-Laird, 1994). However, although the problem is superficially similar to the one presented here, the assumption that the different search engines index disjoint sets of documents actually makes the problem quite different. In particular, since it is impossible for two engines to give different relative orderings to the same pair of documents, combining the rankings can be done relatively easily. Etzioni et al. (1996) formally considered another aspect of metasearch: the task of optimally combining information sources with associated costs and time delays.
Our formal results are disjoint from theirs, as they assume that every query has a single recognizable correct answer, rendering ordering issues unimportant. There are many other applications in machine learning, reinforcement learning, neural networks, and collaborative filtering that employ ranking and preferences, e.g., (Utgoff & Saxena, 1987; Utgoff & Clouse, 1991; Caruana, Baluja, & Mitchell, 1996; Resnick & Varian, 1997). While our work is not directly relevant, it might be possible to use the framework suggested in this paper in similar settings. This is one of our future research goals. Finally, we would like to note that the framework and algorithms presented in this paper can be extended in several ways. Our current research focuses on efficient batch algorithms for combining preference functions, and on using restricted ranking experts for which the problem of finding an optimal total ordering can be solved in polynomial time (Freund, Iyer, Schapire, & Singer, 1998).

7. Conclusions

In many applications, it is desirable to order rather than classify instances. We investigated a two-stage approach to learning to order in which one first learns a preference function by conventional means, and then orders a new set of instances by finding the total ordering that best approximates the preference function. The preference function that is learned is a binary function PREF(u, v), which returns a measure of confidence reflecting how likely it is that u is preferred to v. This is learned from a set of "experts" which suggest specific orderings, and from user feedback in the form of assertions of the form "u should be preferred to v". We have presented two sets of results on this problem. First, we presented an online learning algorithm for learning a weighted combination of ranking experts which is based on an adaptation of Freund and Schapire's Hedge algorithm.
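The heart of such a Hedge-style first stage is a multiplicative weight update: each expert's weight is scaled down according to its loss on the latest preference judgment. The sketch below is illustrative only; the paper defines its losses over preference functions rather than over the scalar per-expert losses assumed here:

```python
def hedge_update(weights, losses, beta=0.5):
    """Hedge-style multiplicative update (illustrative sketch).
    Each expert's weight is multiplied by beta**loss, where loss
    in [0, 1] measures how strongly the expert disagreed with the
    latest preference judgment; weights are then renormalized."""
    assert 0.0 < beta < 1.0, "beta must lie strictly between 0 and 1"
    scaled = [w * (beta ** l) for w, l in zip(weights, losses)]
    total = sum(scaled)
    return [w / total for w in scaled]

# Three experts; the third disagreed most with the user's preference,
# so its weight shrinks the most.
w = hedge_update([1/3, 1/3, 1/3], losses=[0.0, 0.2, 1.0], beta=0.5)
```

Because the update is multiplicative, an expert that repeatedly contradicts the feedback has its influence decay geometrically, while experts with zero loss retain relatively more of the probability mass after each renormalization.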
Second, we explored the complexity of the problem of finding a total ordering that agrees best with a preference function. We showed that this problem is NP-complete even in a highly restrictive case, namely, preference predicates that are linear combinations of a certain class of well-behaved "experts" called rank orderings. However, we also showed that for any preference predicate, there is a greedy algorithm that always obtains a total ordering that is within a factor of two of optimal. We also presented an algorithm that first divides the set of instances into strongly connected components and then uses the greedy algorithm (or full enumeration, for small components) to find an approximately good order within large strongly connected components. We found that this approximation algorithm works very well in practice and often finds the best order. We also presented experimental results in which these algorithms were used to combine the results of a number of "search experts," each of which corresponds to a domain-specific strategy for searching the web. We showed that in two domains, the learned system closely tracks and often exceeds the performance of the best of these search experts. These results hold for either traditional relevance feedback models of learning, or for weaker feedback in the form of simulated "click data." The performance of the learned systems also clearly exceeds the performance of more naive approaches to searching.

Acknowledgments

We would like to thank Noga Alon, Edith Cohen, Dana Ron, and Rick Vohra for numerous helpful discussions. An extended abstract of this paper appeared in Advances in Neural Information Processing Systems 10, MIT Press, 1998.

References

Balabanović, M., & Shoham, Y. (1997). FAB: Content-based, collaborative recommendation. Communications of the ACM, 40(3), 66-72.

Bartell, B., Cottrell, G., & Belew, R. (1994). Automatic combination of multiple ranked retrieval systems.
In Seventeenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.

Bartholdi, J., Tovey, C., & Trick, M. (1989). Voting schemes for which it can be difficult to tell who won the election. Social Choice and Welfare, 6, 157-165.

Berger, B., & Shor, P. (1997). Tight bounds for the acyclic subgraph problem. Journal of Algorithms, 25, 1-18.

Black, D. (1958). Theory of Committees and Elections. Cambridge University Press.

Boyan, J., Freitag, D., & Joachims, T. (1994). A machine learning architecture for optimizing web search engines. Tech. rep. WS-96-05, American Association of Artificial Intelligence.

Caruana, R., Baluja, S., & Mitchell, T. (1996). Using the future to 'Sort Out' the present: Rankprop and multitask learning for medical risk evaluation. In Advances in Neural Information Processing Systems (NIPS) 8.

Cooper, W. (1968). Expected search length: A single measure of retrieval effectiveness based on the weak ordering action of retrieval systems. American Documentation, 19, 30-41.

Dodgson, C. (1876). A method for taking votes on more than two issues. Clarendon Press, Oxford. Reprinted with discussion in (Black, 1958).

Etzioni, O., Hanks, S., Jiang, T., Karp, R. M., Madani, O., & Waarts, O. (1996). Efficient information gathering on the internet. In Proceedings of the 37th Annual Symposium on Foundations of Computer Science (FOCS-96), Burlington, Vermont. IEEE Computer Society Press.

Even, G., Naor, J., Rao, S., & Schieber, B. (1996). Divide-and-conquer approximation algorithms via spreading metrics. In 36th Annual Symposium on Foundations of Computer Science, pp. 62-71, Burlington, Vermont. IEEE Computer Society Press.

Even, G., Naor, J., Schieber, B., & Sudan, M. (1998). Approximating minimum feedback sets and multicuts in directed graphs. Algorithmica, 20(2), 151-174.

Fishburn, P. (1970).
Utility Theory for Decision Making. Wiley, New York.

French, S. (1989). Decision Theory: An Introduction to the Mathematics of Rationality. Ellis Horwood Series in Mathematics and Its Applications.

Freund, Y., Iyer, R., Schapire, R., & Singer, Y. (1998). An efficient boosting algorithm for combining preferences. In Machine Learning: Proceedings of the Fifteenth International Conference.

Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139.

Galil, Z., & Megiddo, N. (1977). Cyclic ordering is NP-complete. Theoretical Computer Science, 5, 179-182.

Garey, M., & Johnson, D. (1979). Computers and Intractability: A Guide to the Theory of NP-completeness. W. H. Freeman and Company, New York.

Kantor, P. (1994). Decision level data fusion for routing of documents in the TREC3 context: a best case analysis of worst case results. In Proceedings of the Third Text Retrieval Conference (TREC-3).

Kemeny, J. (1959). Mathematics without numbers. Daedalus, 88, 571-591.

Kemeny, J., & Snell, J. (1962). Mathematical Models in the Social Sciences. Blaisdell, New York.

Littlestone, N. (1988). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4).

Littlestone, N., & Warmuth, M. (1994). The weighted majority algorithm. Information and Computation, 108(2), 212-261.

Lochbaum, K., & Streeter, L. (1989). Comparing and combining the effectiveness of latent semantic indexing and the ordinary vector space model for information retrieval. Information Processing and Management, 25(6), 665-676.

Resnick, P., & Varian, H. (1997). Introduction to special section on Recommender Systems. Communications of the ACM, 40(3).

Roberts, F. (1979). Measurement theory with applications to decision making, utility, and social sciences.
Addison-Wesley, Reading, MA.

Salton, G., & McGill, M. (1983). Introduction to Modern Information Retrieval. McGraw-Hill.

Seymour, P. (1995). Packing directed circuits fractionally. Combinatorica, 15, 281-288.

Shakes, J., Langheinrich, M., & Etzioni, O. (1997). Dynamic reference sifting: a case study in the homepage domain. In Proceedings of WWW6.

Shmoys, D. (1997). Cut problems and their application to divide-and-conquer. In Hochbaum, D. (Ed.), Approximation Algorithms for NP-Hard Problems. PWS Publishing Company, New York.

Towell, G., Voorhees, E., Gupta, N., & Johnson-Laird, B. (1995). Learning collection fusion strategies for information retrieval. In Machine Learning: Proceedings of the Twelfth International Conference, Lake Tahoe, California. Morgan Kaufmann.

Utgoff, P., & Clouse, J. (1991). Two kinds of training information for evaluation function learning. In Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI-91), pp. 596-600, Cambridge, MA. AAAI Press/MIT Press.

Utgoff, P., & Saxena, S. (1987). Learning a preference predicate. In Proceedings of the Fourth International Workshop on Machine Learning, pp. 115-121, San Francisco, CA. Morgan Kaufmann.

Voorhees, E., Gupta, N., & Johnson-Laird, B. (1994). The collection fusion problem. In Seventeenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.

Yao, Y. (1995). Measuring retrieval effectiveness based on user preference of documents. Journal of the American Society for Information Science, 46(2), 133-145.