Analogy perception applied to seven tests of word comprehension
Authors: Peter D. Turney (National Research Council of Canada)
Analogy Perception Applied to Seven Tests of Word Comprehension

Peter D. Turney
Institute for Information Technology
National Research Council of Canada
M-50 Montreal Road
Ottawa, Ontario, Canada K1A 0R6
Phone: (613) 993-8564
Fax: (613) 952-7151
peter.turney@nrc-cnrc.gc.ca

February 23, 2009

Abstract

It has been argued that analogy is the core of cognition. In AI research, algorithms for analogy are often limited by the need for hand-coded high-level representations as input. An alternative approach is to use high-level perception, in which high-level representations are automatically generated from raw data. Analogy perception is the process of recognizing analogies using high-level perception. We present PairClass, an algorithm for analogy perception that recognizes lexical proportional analogies using representations that are automatically generated from a large corpus of raw textual data. A proportional analogy is an analogy of the form A:B::C:D, meaning "A is to B as C is to D". A lexical proportional analogy is a proportional analogy with words, such as carpenter:wood::mason:stone. PairClass represents the semantic relations between two words using a high-dimensional feature vector, in which the elements are based on frequencies of patterns in the corpus. PairClass recognizes analogies by applying standard supervised machine learning techniques to the feature vectors. We show how seven different tests of word comprehension can be framed as problems of analogy perception, and we then apply PairClass to the seven resulting sets of analogy perception problems. We achieve competitive results on all seven tests. This is the first time a uniform approach has handled such a range of tests of word comprehension.
Keywords: analogies, word comprehension, test-based AI, semantic relations, synonyms, antonyms.

1 Introduction

Many AI researchers and cognitive scientists believe that analogy is "the core of cognition" (Hofstadter, 2001):

• "How do we ever understand anything? Almost always, I think, by using one or another kind of analogy." – Marvin Minsky (1986)

• "My thesis is this: what makes humans smart is (1) our exceptional ability to learn by analogy, (2) the possession of symbol systems such as language and mathematics, and (3) a relation of mutual causation between them whereby our analogical prowess is multiplied by the possession of relational language." – Dedre Gentner (2003)

• "We have repeatedly seen how analogies and mappings give rise to secondary meanings that ride on the backs of primary meanings. We have seen that even primary meanings depend on unspoken mappings, and so in the end, we have seen that all meaning is mapping-mediated, which is to say, all meaning comes from analogies." – Douglas Hofstadter (2007)

These quotes connect analogy with understanding, learning, language, and meaning. Our research in natural language processing for word comprehension (lexical semantics) has been guided by this view of the importance of analogy.

The best-known approach to analogy-making is the Structure-Mapping Engine (SME) (Falkenhainer et al., 1989), which is able to process scientific analogies. SME constructs a mapping between two high-level conceptual representations. These kinds of high-level analogies are sometimes called conceptual analogies. For example, SME is able to build a mapping between a high-level representation of Rutherford's model of the atom and a high-level representation of the solar system (Falkenhainer et al., 1989). The input to SME consists of hand-coded high-level representations, written in LISP.
(See Appendix B of Falkenhainer et al. (1989) for examples of the input LISP code.)

The SME approach to analogy-making has been criticized because it assumes that hand-coded representations are available as the basic building blocks for analogy-making (Chalmers et al., 1992). The process of forming high-level conceptual representations from raw data (without hand-coding) is called high-level perception (Chalmers et al., 1992). Turney (2008a) introduced the Latent Relation Mapping Engine (LRME), which combines ideas from SME and Latent Relational Analysis (LRA) (Turney, 2006). LRME is able to construct mappings without hand-coded high-level representations. Using a kind of high-level perception, LRME builds conceptual representations from raw data, consisting of a large corpus of plain text, gathered by a web crawler.

In this paper, we use ideas from LRA and LRME to solve word comprehension tests. We focus on a kind of lower-level analogy, called proportional analogy, which has the form A:B::C:D, meaning "A is to B as C is to D". Each component mapping in a high-level conceptual analogy is essentially a lower-level proportional analogy. For example, in the analogy between the solar system and Rutherford's model of the atom, the component mappings include the proportional analogies sun:planet::nucleus:electron and mass:sun::charge:nucleus (Turney, 2008a).

Proportional analogies are common in psychometric tests, such as the Miller Analogies Test (MAT) and the Graduate Record Examination (GRE). In these tests, the items in the analogies are usually either geometric figures or words. An early AI system for proportional analogies with geometric figures was ANALOGY (Evans, 1964) and an early system for words was Argus (Reitman, 1965).
Both of these systems used hand-coded representations to solve simple proportional analogy questions.

In Section 2, we present an algorithm we call PairClass, designed for recognizing proportional analogies with words. PairClass performs high-level perception (Chalmers et al., 1992), forming conceptual representations of semantic relations between words, by analysis of raw textual data, without hand-coding. The representations are high-dimensional vectors, in which the values of the elements are derived from the frequencies of patterns in textual data. This form of representation is similar to latent semantic analysis (LSA) (Landauer and Dumais, 1997), but vectors in LSA represent the meaning of individual words, whereas vectors in PairClass represent the relations between two words. The use of frequency vectors to represent semantic relations was introduced in Turney et al. (2003).

PairClass uses a standard supervised machine learning algorithm (Platt, 1998; Witten and Frank, 1999) to classify word pairs according to their semantic relations. A proportional analogy such as sun:planet::nucleus:electron asserts that the semantic relations between sun and planet are similar to the semantic relations between nucleus and electron. The planet orbits the sun; the electron orbits the nucleus. The sun's gravity attracts the planet; the nucleus's charge attracts the electron. The task of perceiving this proportional analogy can be framed as the task of learning to classify sun:planet and nucleus:electron into the same class, which we might call orbited:orbiter. Thus our approach to analogy perception is to frame it as a problem of classification of word pairs (hence the name PairClass).

To evaluate PairClass, we use seven word comprehension tests.
This could be seen as a return to the 1960s psychometric test-based approach of ANALOGY (Evans, 1964) and Argus (Reitman, 1965), but the difference is that PairClass achieves human-level scores on the tests without using hand-coded representations. We believe that word comprehension tests serve as an excellent benchmark for evaluating progress in computational linguistics. More generally, we support test-based AI research (Bringsjord and Schimanski, 2003).

In Section 3, we present our experiments with seven tests:

• 374 multiple-choice analogy questions from the SAT college entrance test (Turney et al., 2003),

• 80 multiple-choice synonym questions from the TOEFL (test of English as a foreign language) (Landauer and Dumais, 1997),

• 50 multiple-choice synonym questions from an ESL (English as a second language) practice test (Turney, 2001),

• 136 synonym-antonym questions collected from several ESL practice tests (introduced here),

• 160 synonym-antonym questions from research in computational linguistics (Lin et al., 2003),

• 144 similar-associated-both questions that were used for research in cognitive psychology (Chiarello et al., 1990), and

• 600 noun-modifier relation classification problems from research in computational linguistics (Nastase and Szpakowicz, 2003).

We discuss the results of the experiments in Section 4. For five of the seven tests, there are past results that we can compare with the performance of PairClass. In general, PairClass is competitive, but not the best system. However, the strength of PairClass is that it is able to handle seven different tests. As far as we know, no other system can handle this range of tests. PairClass performs well, although it is competing against specialized algorithms, developed for single tasks.
We believe that this illustrates the power of analogy perception as a unified approach to lexical semantics.

Related work is examined in Section 5. PairClass is similar to past work on semantic relation classification (Rosario and Hearst, 2001; Nastase and Szpakowicz, 2003; Turney and Littman, 2005; Girju et al., 2007). For example, with noun-modifier classification, the task is to classify a noun-modifier pair, such as laser printer, according to the semantic relation between the head noun, printer, and the modifier, laser. In this case, the relation is instrument:agency: the laser is an instrument that is used by the printer. The standard approach to semantic relation classification is to use supervised machine learning techniques to classify feature vectors that represent relations. We demonstrate in this paper that the paradigm of semantic relation classification can be extended beyond the usual relations, such as instrument:agency, to include analogy, synonymy, antonymy, similarity, and association.

Limitations and future work are considered in Section 6. Limitations of PairClass are the need for a large corpus and the time required to run the algorithm. We conclude in Section 7.

PairClass was briefly introduced in Turney (2008b). The current paper describes PairClass in more detail, provides more background information and discussion, and brings the number of tests up from four to seven.

2 Analogy Perception

A lexical analogy, A:B::C:D, asserts that A is to B as C is to D; for example, carpenter:wood::mason:stone asserts that carpenter is to wood as mason is to stone; that is, the semantic relations between carpenter and wood are highly similar to the semantic relations between mason and stone. In this paper, we frame the task of recognizing lexical analogies as a problem of classifying word pairs (see Table 1).
Word pair           Class label
carpenter:wood      artisan:material
mason:stone         artisan:material
potter:clay         artisan:material
glassblower:glass   artisan:material
sun:planet          orbited:orbiter
nucleus:electron    orbited:orbiter
earth:moon          orbited:orbiter
starlet:paparazzo   orbited:orbiter

Table 1: Examples of how the task of recognizing lexical analogies may be viewed as a problem of classifying word pairs.

We approach this task as a standard classification problem for supervised machine learning (Witten and Frank, 1999). PairClass takes as input a training set of word pairs with class labels and a testing set of word pairs without labels. Each word pair is represented as a vector in a feature space and a supervised learning algorithm is used to classify the feature vectors. The elements in the feature vectors are based on the frequencies of automatically defined patterns in a large corpus. The output of the algorithm is an assignment of labels to the word pairs in the testing set. For some of the following experiments, we select a unique label for each word pair; for other experiments, we assign probabilities to each possible label for each word pair.

For a given word pair, such as mason:stone, the first step is to generate morphological variations, such as masons:stones. In the following experiments, we use morpha (morphological analyzer) and morphg (morphological generator) for morphological processing (Minnen et al., 2001).[1] The second step is to search in a large corpus for phrases of the following forms:

• "[0 to 1 words] X [0 to 3 words] Y [0 to 1 words]"

• "[0 to 1 words] Y [0 to 3 words] X [0 to 1 words]"

In these templates, X:Y consists of morphological variations of the given word pair; for example, mason:stone, mason:stones, masons:stones, and so on.
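The first retrieval template can be rendered as a regular expression, as a rough illustration of what the templates match (in the experiments the paper uses the Wumpus search engine over a large corpus, not regexes; the function name and structure here are ours):

```python
import re

def template_regex(x_variants, y_variants):
    """Regex sketch of the template
    "[0 to 1 words] X [0 to 3 words] Y [0 to 1 words]",
    where X and Y range over the morphological variants of the pair."""
    word = r"\S+"
    x = "(?:" + "|".join(map(re.escape, x_variants)) + ")"
    y = "(?:" + "|".join(map(re.escape, y_variants)) + ")"
    return re.compile(
        r"(?:{w} )?\b{x}\b(?: {w}){{0,3}} \b{y}\b(?: {w})?".format(
            w=word, x=x, y=y))

rx = template_regex(["mason", "masons"], ["stone", "stones"])
```

For mason:stone this matches phrases such as "the mason cut the stone with", but not phrases where more than three words separate the pair.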
Typical phrases for mason:stone would be "the mason cut the stone with" and "the stones that the mason used". We then normalize all of the phrases that are found, by using morpha to remove suffixes.

[1] http://www.informatics.susx.ac.uk/research/groups/nlp/carroll/morph.html

The templates we use here are similar to those in Turney (2006), but we have added extra context words before the first variable (X in the first template and Y in the second) and after the second variable. Our morphological processing also differs from Turney (2006). In the following experiments, we search in a corpus of 5 × 10^10 words (about 280 GB of plain text), consisting of web pages gathered by a web crawler.[2] To retrieve phrases from the corpus, we use Wumpus (Büttcher and Clarke, 2005), an efficient search engine for passage retrieval from large corpora.[3]

The next step is to generate patterns from all of the phrases that were found for all of the input word pairs (from both the training and testing sets). To generate patterns from a phrase, we replace the given word pairs with variables, X and Y, and we replace the remaining words with a wild card symbol (an asterisk) or leave them as they are. For example, the phrase "the mason cut the stone with" yields the patterns "the X cut * Y with", "* X * the Y *", and so on. If a phrase contains n words, then it yields 2^(n−2) patterns.

Each pattern corresponds to a feature in the feature vectors that we will generate. Since a typical input set of word pairs yields millions of patterns, we need to use feature selection, to reduce the number of patterns to a manageable quantity. For each pattern, we count the number of input word pairs that generated the pattern. For example, "* X cut * Y *" is generated by both mason:stone and carpenter:wood. We then sort the patterns in descending order of the number of word pairs that generated them.
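The pattern-generation step (replace the pair with X and Y, then replace each remaining word with itself or a wildcard) can be sketched as follows; the function and variable names are ours, not the paper's:

```python
from itertools import product

def generate_patterns(phrase, x, y):
    """Generate all patterns from a phrase: the pair words become the
    variables X and Y, and every other word is either kept or replaced
    by '*'. A phrase of n words yields 2**(n-2) patterns."""
    words = phrase.split()
    template = []
    free_positions = []          # positions that may become wildcards
    for i, w in enumerate(words):
        if w == x:
            template.append("X")
        elif w == y:
            template.append("Y")
        else:
            template.append(w)
            free_positions.append(i)
    patterns = []
    for choices in product([False, True], repeat=len(free_positions)):
        p = template[:]
        for pos, use_wildcard in zip(free_positions, choices):
            if use_wildcard:
                p[pos] = "*"
        patterns.append(" ".join(p))
    return patterns

pats = generate_patterns("the mason cut the stone with", "mason", "stone")
# 6 words, 2 of which are the pair, so 2**4 = 16 patterns
```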
If there are N input word pairs (and thus N feature vectors, including both the training and testing sets), then we select the top kN patterns and drop the remainder. In the following experiments, k is set to 20. The algorithm is not sensitive to the precise value of k. The reasoning behind the feature selection algorithm is that shared patterns make more useful features than rare patterns. The number of features (kN) depends on the number of word pairs (N), because, if we have more feature vectors, then we need more features to distinguish them. Turney (2006) also selects patterns based on the number of pairs that generate them, but the number of selected patterns is a constant (8000), independent of the number of input word pairs.

The next step is to generate feature vectors, one vector for each input word pair. Each of the N feature vectors has kN elements, one element for each selected pattern. The value of an element in a vector is given by the logarithm of the frequency in the corpus of the corresponding pattern for the given word pair. For example, suppose the given pair is mason:stone and the pattern is "* X cut * Y *". We look at the normalized phrases that we collected for mason:stone and we count how many match this pattern. If f phrases match the pattern, then the value of this element in the feature vector is log(f + 1) (we add 1 because log(0) is undefined). Each feature vector is then normalized to unit length. The normalization ensures that features in vectors for high-frequency word pairs are comparable to features in vectors for low-frequency word pairs.

[2] The corpus was collected by Charles Clarke at the University of Waterloo. We can provide copies of the corpus on request.
[3] http://www.wumpus-search.org/
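The feature selection and vector construction just described can be sketched as follows (a minimal illustration with our own function names; the real system counts pattern matches over the retrieved corpus phrases):

```python
import math
from collections import Counter

def select_patterns(pattern_sets, k=20):
    """Rank patterns by how many of the N input word pairs generated
    them, and keep the top k*N (shared patterns over rare ones)."""
    counts = Counter()
    for patterns in pattern_sets:        # one collection per word pair
        counts.update(set(patterns))     # count pairs, not occurrences
    ranked = sorted(counts, key=lambda p: -counts[p])
    return ranked[: k * len(pattern_sets)]

def feature_vector(pattern_freqs, selected):
    """Element value is log(f + 1), where f is the number of phrases
    matching the pattern; the vector is normalized to unit length."""
    v = [math.log(pattern_freqs.get(p, 0) + 1) for p in selected]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v
```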
Table 2 shows the first and last ten features (excluding zero-valued features) and the corresponding feature values for the word pair audacious:boldness, taken from the SAT analogy questions. The features are in descending order of the number of word pairs that generate them; that is, they are ordered from common to rare. Thus the first features typically involve patterns with many wild cards and high-frequency words, and the first feature values are usually nonzero. The last features often have few wild cards and contain low-frequency words, with feature values that are usually zero. The feature vectors are generally highly sparse (i.e., they are mainly zeros; if f = 0, then log(f + 1) = 0).

Now that we have a feature vector for each input word pair, we can apply a standard supervised learning algorithm. In the following experiments, we use a sequential minimal optimization (SMO) support vector machine (SVM) with a radial basis function (RBF) kernel (Platt, 1998), as implemented in Weka (Waikato Environment for Knowledge Analysis) (Witten and Frank, 1999).[4] The algorithm generates probability estimates for each class by fitting logistic regression models to the outputs of the SVM. We disable the normalization option in Weka, since the vectors are already normalized to unit length. We chose the SMO RBF algorithm because it is fast, robust, and it easily handles large numbers of features.

In the following experiments, PairClass is applied to each of the seven tests with no adjustments or tuning of the learning parameters to the specific problems. Some work is required to fit each problem into the general framework of PairClass (analogy perception: supervised classification of word pairs), but the core algorithm is the same in each case.
It might be objected that what PairClass does should not be considered as high-level perception, in the sense given by Chalmers et al. (1992). They define high-level perception as follows:

    Perceptual processes form a spectrum, which for convenience we can divide into two components. ... [We] have low-level perception, which involves the early processing of information from the various sensory modalities. High-level perception, on the other hand, involves taking a more global view of this information, extracting meaning from the raw material by accessing concepts, and making sense of situations at a conceptual level. This ranges from the recognition of objects to the grasping of abstract relations, and on to understanding entire situations as coherent wholes. ... The study of high-level perception leads us directly to the problem of mental representation. Representations are the fruits of perception.

[4] http://www.cs.waikato.ac.nz/ml/weka/

Feature number   Feature (pattern)       Value (normalized log)
1                "* X * * Y *"           0.090
2                "* Y * * X *"           0.150
3                "* X * Y *"             0.198
4                "* Y * X *"             0.221
5                "* X * * * Y *"         0.045
7                "* X Y *"               0.233
8                "* Y X *"               0.167
10               "* Y * the X *"         0.071
12               "* Y and * X *"         0.116
13               "* X and Y *"           0.135
27,591           "define X * Y *"        0.045
28,524           "what Y and X *"        0.045
28,804           "for Y and * X and"     0.045
29,017           "very X and Y *"        0.045
32,028           "s Y and X and"         0.045
34,893           "understand X * Y *"    0.071
35,027           "* X be not * Y but"    0.045
39,410           "* Y and X cause"       0.045
41,303           "* X but Y and"         0.105
43,511           "be X not Y *"          0.105

Table 2: The first and last ten features, excluding zero-valued features, for the pair X:Y = audacious:boldness. (The "s" in the pattern for feature 32,028 is part of a possessive noun. The "be" in the patterns for features 35,027 and 43,511 is the result of normalizing "is" and "was" with morpha.)
Spoken or written language can be converted to electronic text by speech recognition software or optical character recognition software. It seems reasonable to call this low-level perception. PairClass takes electronic text as input and generates high-dimensional feature vectors from the text. These feature vectors represent abstract semantic relations and they can be used to classify semantic relations into various semantic classes. It seems reasonable to call this high-level perception. We do not claim that PairClass has the richness and complexity of human high-level perception, but it is nonetheless a (simple, restricted) form of high-level perception.

3 Experiments

This section presents seven sets of experiments. We explain how each of the seven tests is treated as a problem of analogy perception, we give the experimental results, and we discuss past work with each test.

3.1 SAT Analogies

In this section, we apply PairClass to the task of recognizing lexical analogies. To evaluate the performance, we use a set of 374 multiple-choice questions from the SAT college entrance exam. Table 3 shows a typical question. The target pair is called the stem. The task is to select the choice pair that is most analogous to the stem pair.

Stem:      mason:stone
Choices:   (a) teacher:chalk
           (b) carpenter:wood
           (c) soldier:gun
           (d) photograph:camera
           (e) book:word
Solution:  (b) carpenter:wood

Table 3: An example of a question from the 374 SAT analogy questions.

The problem of recognizing lexical analogies was first attempted with a system called Argus (Reitman, 1965), using a small hand-built semantic network with a spreading activation algorithm. Turney et al. (2003) used a combination of 13 independent modules. Veale (2004) used a spreading activation algorithm with WordNet (in effect, treating WordNet as a semantic network). Turney (2005) used a corpus-based algorithm.
We may view Table 3 as a binary classification problem, in which mason:stone and carpenter:wood are positive examples and the remaining word pairs are negative examples. The difficulty is that the labels of the choice pairs must be hidden from the learning algorithm. That is, the training set consists of one positive example (the stem pair) and the testing set consists of five unlabeled examples (the five choice pairs). To make this task more tractable, we randomly choose a stem pair from one of the 373 other SAT analogy questions, and we assume that this new stem pair is a negative example, as shown in Table 4.

Word pair           Train or test   Class label
mason:stone         train           positive
tutor:pupil         train           negative
teacher:chalk       test            hidden
carpenter:wood      test            hidden
soldier:gun         test            hidden
photograph:camera   test            hidden
book:word           test            hidden

Table 4: How to fit a SAT analogy question into the framework of supervised classification of word pairs. The randomly chosen stem pair is tutor:pupil.

To answer a SAT question, we use PairClass to estimate the probability that each testing example is positive, and we guess the testing example with the highest probability. Learning from a training set with only one positive example and one negative example is difficult, since the learned model can be highly unstable. To increase the stability, we repeat the learning process 10 times, using a different randomly chosen negative training example each time. For each testing word pair, the 10 probability estimates are averaged together. This is a form of bagging (Breiman, 1996). Table 5 shows an example of an analogy that has been correctly solved by PairClass.
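The 10-run averaging just described can be sketched as follows, with `train_and_score` standing in (hypothetically) for training the PairClass classifier on the one positive and one negative pair and returning a probability for each choice pair:

```python
import random

def answer_sat(stem, choices, other_stems, train_and_score,
               runs=10, seed=0):
    """Bagging (Breiman, 1996): repeat training with a different
    randomly chosen negative stem each time, average the probability
    estimates, and guess the choice with the highest average."""
    rng = random.Random(seed)
    totals = [0.0] * len(choices)
    for _ in range(runs):
        negative = rng.choice(other_stems)   # stem from another question
        for i, p in enumerate(train_and_score(stem, negative, choices)):
            totals[i] += p
    averages = [t / runs for t in totals]
    return max(range(len(choices)), key=averages.__getitem__)
```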
Stem:      insubordination:punishment     Probability
Choices:   (a) evening:night              0.236
           (b) earthquake:tornado         0.260
           (c) candor:falsehood           0.391
           (d) heroism:praise             0.757
           (e) fine:penalty               0.265
Solution:  (d) heroism:praise             0.757

Table 5: An example of a correctly solved SAT analogy question.

PairClass attains an accuracy of 52.1% on the 374 SAT analogy questions. The best previous result is an accuracy of 56.1% (Turney, 2005). Random guessing would yield an accuracy of 20% (five choices per question). The average senior high school student achieves 57% correct (Turney, 2006). The ACL Wiki lists 12 previously published results with the 374 SAT analogy questions.[5] Adding PairClass to the list, we have 13 results. PairClass has the third highest accuracy of the 13 systems.

[5] For more information, see SAT Analogy Questions (State of the art) at http://aclweb.org/aclwiki/. There were 12 previous results at the time of writing, but the list is likely to grow.

3.2 TOEFL Synonyms

Now we apply PairClass to the task of recognizing synonyms, using a set of 80 multiple-choice synonym questions from the TOEFL (test of English as a foreign language). A sample question is shown in Table 6. The task is to select the choice word that is most similar in meaning to the stem word.

Stem:      levied
Choices:   (a) imposed
           (b) believed
           (c) requested
           (d) correlated
Solution:  (a) imposed

Table 6: An example of a question from the 80 TOEFL synonym questions.

Synonymy can be viewed as a high degree of semantic similarity. The most common way to measure semantic similarity is to measure the distance between words in WordNet (Resnik, 1995; Jiang and Conrath, 1997; Hirst and St-Onge, 1998; Budanitsky and Hirst, 2001). Corpus-based measures of word similarity are also common (Lesk, 1969; Landauer and Dumais, 1997; Turney, 2001).

We may view Table 6 as a binary classification problem, in which the pair levied:imposed is a positive example of the class synonymous and the other possible pairings are negative examples, as shown in Table 7.
Word pair           Class label
levied:imposed      positive
levied:believed     negative
levied:requested    negative
levied:correlated   negative

Table 7: How to fit a TOEFL synonym question into the framework of supervised classification of word pairs.

The 80 TOEFL questions yield 320 (80 × 4) word pairs, 80 labeled positive and 240 labeled negative. We apply PairClass to the word pairs using ten-fold cross-validation. In each random fold, 90% of the pairs are used for training and 10% are used for testing. For each fold, we use the learned model to assign probabilities to the testing pairs. Our guess for each TOEFL question is the choice that has the highest probability of being positive, when paired with the corresponding stem. Table 8 gives an example of a correctly solved question.

Stem:      prominent          Probability
Choices:   (a) battered       0.005
           (b) ancient        0.114
           (c) mysterious     0.010
           (d) conspicuous    0.998
Solution:  (d) conspicuous    0.998

Table 8: An example of a correctly solved TOEFL synonym question.

PairClass attains an accuracy of 76.2%. For comparison, the ACL Wiki lists 15 previously published results with the 80 TOEFL synonym questions.[6] Adding PairClass to the list, we have 16 algorithms. PairClass has the ninth highest accuracy of the 16 systems. The best previous result is an accuracy of 97.5% (Turney et al., 2003), obtained using a hybrid of four different algorithms. Random guessing would yield an accuracy of 25% (four choices per question). The average foreign applicant to a US university achieves 64.5% correct (Landauer and Dumais, 1997).

[6] See TOEFL Synonym Questions (State of the art) at http://aclweb.org/aclwiki/. There were 15 systems at the time of writing, but the list is likely to grow.
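The reframing of a multiple-choice synonym question as labeled word pairs is mechanical; a minimal sketch (the function name is ours):

```python
def question_to_pairs(stem, choices, answer):
    """Turn a multiple-choice synonym question into labeled word pairs:
    stem:answer is positive, each stem:distractor pair is negative."""
    return [((stem, choice), "positive" if choice == answer else "negative")
            for choice in choices]

pairs = question_to_pairs(
    "levied", ["imposed", "believed", "requested", "correlated"], "imposed")
```

Applied to all 80 TOEFL questions this yields the 320 labeled pairs used for cross-validation.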
3.3 ESL Synonyms

The 50 ESL synonym questions are similar to the TOEFL synonym questions, except that each question includes a sentence that shows the stem word in context. Table 9 gives an example. In our experiments, we ignore the sentence context and treat the ESL synonym questions the same way as we treated the TOEFL synonym questions (see Table 10).

The 50 ESL questions yield 200 (50 × 4) word pairs, 50 labeled positive and 150 labeled negative. We apply PairClass to the word pairs using ten-fold cross-validation. Our guess for each question is the choice word that has the highest probability of being positive, when paired with the corresponding stem word.

Stem:      "A rusty nail is not as strong as a clean, new one."
Choices:   (a) corroded
           (b) black
           (c) dirty
           (d) painted
Solution:  (a) corroded

Table 9: An example of a question from the 50 ESL synonym questions.

Word pair        Class label
rusty:corroded   positive
rusty:black      negative
rusty:dirty      negative
rusty:painted    negative

Table 10: How to fit an ESL synonym question into the framework of supervised classification of word pairs.

PairClass attains an accuracy of 78.0%. The best previous result is 82.0% (Jarmasz and Szpakowicz, 2003). The ACL Wiki lists 8 previously published results for the 50 ESL synonym questions.[7] Adding PairClass to the list, we have 9 algorithms. PairClass has the third highest accuracy of the 9 systems. The average human score is unknown. Random guessing would yield an accuracy of 25% (four choices per question).

[7] See ESL Synonym Questions (State of the art) at http://aclweb.org/aclwiki/. There were 8 systems at the time of writing, but the list is likely to grow.

3.4 ESL Synonyms and Antonyms

The task of classifying word pairs as either synonyms or antonyms readily fits into the framework of supervised classification of word pairs.
Table 11 shows some examples from a set of 136 ESL (English as a second language) practice questions that we collected from various ESL websites.

Word pair                   Class label
galling:irksome             synonyms
yield:bend                  synonyms
naive:callow                synonyms
advise:suggest              synonyms
dissimilarity:resemblance   antonyms
commend:denounce            antonyms
expose:camouflage           antonyms
unveil:veil                 antonyms

Table 11: Examples of synonyms and antonyms from 136 ESL practice questions.

Hatzivassiloglou and McKeown (1997) propose that antonyms and synonyms can be distinguished by their semantic orientation. A word that suggests praise has a positive semantic orientation, whereas a word that suggests criticism has a negative semantic orientation. Antonyms tend to have opposite semantic orientations (fast:slow is positive:negative) and synonyms tend to have the same semantic orientation (fast:quick is positive:positive). However, this proposal has not been evaluated, and it is not difficult to find counter-examples (simple:simplistic is positive:negative, yet the words are synonyms, rather than antonyms).

Lin et al. (2003) distinguish synonyms from antonyms using two patterns, "from X to Y" and "either X or Y". When X and Y are antonyms, they occasionally appear in a large corpus in one of these two patterns, but it is very rare for synonyms to appear in these patterns. Our approach is similar to Lin et al. (2003), but we do not rely on hand-coded patterns; instead, PairClass patterns are generated automatically.

Using ten-fold cross-validation, PairClass attains an accuracy of 75.0%. Always guessing the majority class would result in an accuracy of 65.4%. The average human score is unknown and there are no previous results for comparison.
3.5 CL Synonyms and Antonyms

To compare PairClass with the algorithm of Lin et al. (2003), this experiment uses their set of 160 word pairs, 80 labeled synonyms and 80 labeled antonyms. These 160 pairs were chosen by Lin et al. (2003) for their high frequency; thus they are somewhat easier to classify than the 136 ESL practice questions. Some examples are given in Table 12.

Lin et al. (2003) report their performance using precision (86.4%) and recall (95.0%), instead of accuracy, but an accuracy of 90.0% can be derived from their figures, with some minor algebraic manipulation. Using ten-fold cross-validation, PairClass has an accuracy of 81.9%. Random guessing would yield an accuracy of 50%. The average human score is unknown.

Word pair                    Class label
audit:review                 synonyms
education:tuition            synonyms
location:position            synonyms
material:stuff               synonyms
ability:inability            antonyms
balance:imbalance            antonyms
exaggeration:understatement  antonyms
inferiority:superiority      antonyms

Table 12: Examples of synonyms and antonyms from 160 labeled pairs for experiments in computational linguistics (CL).

3.6 Similar, Associated, and Both

A common criticism of corpus-based measures of word similarity (as opposed to lexicon-based measures) is that they are merely detecting associations (co-occurrences), rather than actual semantic similarity (Lund et al., 1995). To address this criticism, Lund et al. (1995) evaluated their algorithm for measuring word similarity with word pairs that were labeled similar, associated, or both. These labeled pairs were originally created for cognitive psychology experiments with human subjects (Chiarello et al., 1990). Table 13 shows some examples from this collection of 144 word pairs (48 pairs in each of the three classes).
Word pair     Class label
table:bed     similar
music:art     similar
hair:fur      similar
house:cabin   similar
cradle:baby   associated
mug:beer      associated
camel:hump    associated
cheese:mouse  associated
ale:beer      both
uncle:aunt    both
pepper:salt   both
frown:smile   both

Table 13: Examples of word pairs labeled similar, associated, or both.

Lund et al. (1995) did not measure the accuracy of their algorithm on this three-class classification problem. Instead, following standard practice in cognitive psychology, they showed that their algorithm's similarity scores for the 144 word pairs were correlated with the response times of human subjects in priming tests. In a typical priming test, a human subject reads a priming word (cradle) and is then asked to complete a partial word (complete bab as baby) or to distinguish a word (baby) from a non-word (baol). The time required to perform the task is taken to indicate the strength of the cognitive link between the two words (cradle and baby).

Using ten-fold cross-validation, PairClass attains an accuracy of 77.1% on the 144 word pairs. Since the three classes are of equal size, guessing the majority class and random guessing both yield an accuracy of 33.3%. The average human score is unknown and there are no previous results for comparison.

3.7 Noun-Modifier Relations

A noun-modifier expression is a compound of two (or more) words, a head noun and a modifier of the head. The modifier is usually a noun or adjective. For example, in the noun-modifier expression student discount, the head noun discount is modified by the noun student.

Noun-modifier expressions are very common in English. There is wide variation in the types of semantic relations between heads and modifiers. A challenging task for natural language processing is to classify noun-modifier pairs according to their semantic relations.
For example, in the noun-modifier expression electron microscope, the relation might be theme:tool (a microscope for electrons; perhaps for viewing electrons), instrument:agency (a microscope that uses electrons), or material:artifact (a microscope made out of electrons).[8] There are many potential applications for algorithms that can automatically classify noun-modifier pairs according to their semantic relations.

Nastase and Szpakowicz (2003) collected 600 noun-modifier pairs and hand-labeled them with 30 different classes of semantic relations. The 30 classes were organized into five groups: causality, temporality, spatial, participant, and quality. Due to the difficulty of distinguishing 30 classes, most researchers prefer to treat this as a five-class classification problem. Table 14 shows some examples of noun-modifier pairs with the five-class labels.

The design of the PairClass algorithm is closely related to past work on the problem of classifying noun-modifier semantic relations, so we will examine this past work in more detail than in our discussions of related work for the other six tests. Section 5 will focus on the relation between PairClass and past work on semantic relation classification.

[8] The correct answer is instrument:agency.

Word pair            Class label
cold:virus           causality
onion:tear           causality
morning:frost        temporality
late:supper          temporality
aquatic:mammal       spatial
west:coast           spatial
dream:analysis       participant
police:intervention  participant
copper:coin          quality
rice:paper           quality

Table 14: Examples of noun-modifier word pairs labeled with five semantic relations.

Using ten-fold cross-validation, PairClass achieves an accuracy of 58.0% on the task of classifying the 600 noun-modifier pairs into five classes. The best previous result was also 58.0% (Turney, 2006).
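The accuracies reported here and throughout Section 3 come from ten-fold cross-validation over labeled feature vectors. The protocol can be sketched as follows; synthetic vectors and a trivial nearest-centroid classifier stand in for the corpus-derived vectors and the SVM actually used by PairClass:

```python
# Sketch: ten-fold cross-validation over labeled feature vectors.
# Synthetic data and a nearest-centroid classifier stand in for the
# corpus-derived vectors and SVM used by PairClass.

import random

random.seed(0)

def make_vector(label):
    # Two noisy clusters in 5 dimensions, one per class (toy data).
    center = 1.0 if label == "positive" else -1.0
    return [center + random.gauss(0, 0.5) for _ in range(5)]

data = [(make_vector(lab), lab)
        for lab in ["positive"] * 50 + ["negative"] * 150]
random.shuffle(data)

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dist2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest_centroid(train, x):
    cents = {lab: centroid([v for v, l in train if l == lab])
             for lab in {l for _, l in train}}
    return min(cents, key=lambda lab: dist2(cents[lab], x))

# Ten folds: train on nine, test on the held-out tenth.
k = 10
folds = [data[i::k] for i in range(k)]
correct = total = 0
for i in range(k):
    train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
    for x, lab in folds[i]:
        correct += (nearest_centroid(train, x) == lab)
        total += 1
accuracy = correct / total
```

Every example is thus tested exactly once, on a model that never saw it during training.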
The ACL Wiki lists 5 previously published results with the 600 noun-modifier pairs.[9] Adding PairClass to the list, we have 6 algorithms. PairClass ties for first place in the set of 6 systems. Guessing the majority class would result in an accuracy of 43.3%. The average human score is unknown.

[9] See Noun-Modifier Semantic Relations (State of the art) at http://aclweb.org/aclwiki/. There were 5 systems at the time of writing, but the list is likely to grow.

4 Discussion

The seven experiments are summarized in Tables 15 and 16. For the five experiments for which there are previous results, PairClass is not the best, but it performs competitively. For the other two experiments, PairClass performs significantly above the baselines. However, the strength of this approach is not its performance on any one task, but the range of tasks it can handle. No other algorithm has been applied to this range of lexical semantic problems.

Experiment                     Vectors  Features  Classes
SAT Analogies                  2,244    44,880    374
TOEFL Synonyms                 320      6,400     2
ESL Synonyms                   200      4,000     2
ESL Synonyms and Antonyms      136      2,720     2
CL Synonyms and Antonyms       160      3,200     2
Similar, Associated, and Both  144      2,880     3
Noun-Modifier Relations        600      12,000    5

Table 15: Summary of the seven tasks. See Section 3 for explanations.

Of the seven tests we use here, as far as we know, only the noun-modifier relations have been approached using a standard supervised learning algorithm. For the other six tests, PairClass is the first attempt to apply supervised learning.[10]

[10] Turney et al. (2003) apply something like supervised learning to the SAT analogies and TOEFL synonyms, but it would be more accurate to call it reinforcement learning, rather than standard supervised learning.

The advantage of being able to cast these six problems in the framework of standard
supervised learning problems is that we can now exploit the huge literature on supervised learning. Past work on these problems has required implicitly coding our knowledge of the nature of the task into the structure of the algorithm. For example, the structure of the algorithm for latent semantic analysis (LSA) implicitly contains a theory of synonymy (Landauer and Dumais, 1997). The problem with this approach is that it can be very difficult to work out how to modify the algorithm if it does not behave the way we want. On the other hand, with a supervised learning algorithm, we can put our knowledge into the labeling of the feature vectors, instead of putting it directly into the algorithm. This makes it easier to guide the system to the desired behaviour.

The number of features is 20 times the number of vectors, as mentioned in Section 2. For SAT Analogies, the number of vectors is 374 × 6. For TOEFL Synonyms, the number of vectors is 80 × 4. For ESL Synonyms, the number of vectors is 50 × 4.

Experiment                     Accuracy  Best previous  Baseline  Rank
SAT Analogies                  52.1%     56.1%          20.0%     3 of 13
TOEFL Synonyms                 76.2%     97.5%          25.0%     9 of 16
ESL Synonyms                   78.0%     82.0%          25.0%     3 of 9
ESL Synonyms and Antonyms      75.0%     -              65.4%     -
CL Synonyms and Antonyms       81.9%     90.0%          50.0%     2 of 2
Similar, Associated, and Both  77.1%     -              33.3%     -
Noun-Modifier Relations        58.0%     58.0%          43.3%     1 of 6

Table 16: Summary of experimental results. See Section 3 for explanations. For the Noun-Modifier Relations, PairClass is tied for first place.

Humans are able to make analogies without supervised learning. It might be argued that the requirement for supervision is a major limitation of PairClass.
However, with our approach to the SAT analogy questions (see Section 3.1), we are blurring the line between supervised and unsupervised learning, since the training set for a given SAT question consists of a single real positive example (and a single "virtual" or "simulated" negative example). In effect, a single example (such as mason:stone in Table 4) becomes sui generis; it constitutes a class of its own. It may be possible to apply the machinery of supervised learning to other problems that apparently call for unsupervised learning (for example, clustering or measuring similarity), by using this sui generis device.

5 Related Work

One of the first papers using supervised machine learning to classify word pairs was Rosario and Hearst's (2001) paper on classifying noun-modifier pairs in the medical domain. For example, the noun-modifier expression brain biopsy was classified as Procedure. Rosario and Hearst (2001) constructed feature vectors for each noun-modifier pair using MeSH (Medical Subject Headings) and UMLS (Unified Medical Language System) as lexical resources. They then trained a neural network to distinguish 13 classes of semantic relations, such as Cause, Location, Measure, and Instrument. Nastase and Szpakowicz (2003) explored a similar approach to classifying general-domain noun-modifier pairs, using WordNet and Roget's Thesaurus as lexical resources.

Turney and Littman (2005) used corpus-based features for classifying noun-modifier pairs. Their features were based on 128 hand-coded patterns. They used a nearest-neighbour learning algorithm to classify general-domain noun-modifier pairs into 30 different classes of semantic relations. Turney (2005; 2006) later addressed the same problem using 8,000 automatically generated patterns.
One of the tasks in SemEval 2007 was the classification of semantic relations between nominals (Girju et al., 2007).[11] The problem is to classify semantic relations between nominals (nouns and noun compounds) in the context of a sentence. The task attracted 14 teams who created 15 systems, all of which used supervised machine learning with features that were lexicon-based, corpus-based, or both.

PairClass is most similar to the algorithm of Turney (2006), but it differs in the following ways:

• PairClass does not use a lexicon to find synonyms for the input word pairs. One of our goals in this paper is to show that a pure corpus-based algorithm can handle synonyms without a lexicon. This considerably simplifies the algorithm.

• PairClass uses a support vector machine (SVM) instead of a nearest-neighbour (NN) learning algorithm.

• PairClass does not use the singular value decomposition (SVD) to smooth the feature vectors. It has been our experience that SVD is not necessary with SVMs.

• PairClass generates probability estimates, whereas Turney (2006) uses a cosine measure of similarity. Probability estimates can be readily used in further downstream processing, but cosines are less useful.

• The automatically generated patterns in PairClass are slightly more general than the patterns of Turney (2006), as mentioned in Section 2.

• The morphological processing in PairClass (Minnen et al., 2001) is more sophisticated than in Turney (2006).

[11] SemEval 2007 was the Fourth International Workshop on Semantic Evaluations. More information on Task 4, the classification of semantic relations between nominals, is available at http://purl.org/net/semeval/task4.
However, we believe that the main contribution of this paper is not PairClass itself, but the extension of supervised word pair classification beyond the classification of noun-modifier pairs and semantic relations between nominals, to analogies, synonyms, antonyms, and associations. As far as we know, this has not been done before.

6 Limitations and Future Work

The main limitation of PairClass is the need for a large corpus. Phrases that contain a pair of words tend to be rarer than phrases that contain either member of the pair, so a large corpus is needed to ensure that sufficient numbers of phrases are found for each input word pair. The size of the corpus has a cost in terms of disk space and processing time. In the future, as hardware improves, this will become less of an issue, but there may be ways to improve the algorithm so that a smaller corpus is sufficient.

Human language can be creatively extended as needed. Given a newly-defined word, a human would be able to use it immediately in an analogy. Since PairClass requires a large number of phrases for each pair of words, it would be unable to handle a newly-defined word. A problem for future work is the extension of PairClass so that it is able to work with definitions of words. One approach is a hybrid algorithm that combines a corpus-based algorithm with a lexicon-based algorithm. For example, Turney et al. (2003) describe an algorithm that combines 13 different modules for solving proportional analogies with words.

7 Conclusion

The PairClass algorithm classifies word pairs according to their semantic relations, using features generated from a large corpus of text. We describe PairClass as performing analogy perception, because it recognizes lexical proportional analogies using a form of high-level perception (Chalmers et al., 1992).
For given input training and testing sets of word pairs, it automatically generates patterns and constructs its own representations of the word pairs as high-dimensional feature vectors. No hand-coding of representations is involved.

We believe that analogy perception provides a unified approach to natural language processing for a wide variety of lexical semantic tasks. We support this by applying PairClass to seven different tests of word comprehension. It achieves competitive performance on the tests, although it is competing with algorithms that were developed for single tasks. More significant is the range of tasks that can be framed as problems of analogy perception. The idea of subsuming a broad range of semantic phenomena under analogies has been suggested by many researchers (Minsky, 1986; Gentner, 2003; Hofstadter, 2007). In computational linguistics, analogical algorithms have been applied to machine translation (Lepage and Denoual, 2005), morphology (Lepage, 1998), and semantic relations (Turney and Littman, 2005). Analogy provides a framework that has the potential to unify the field of semantics. This paper is a small step towards that goal.

In this paper, we have used tests from educational testing (SAT analogies and TOEFL synonyms), second language practice (ESL synonyms and ESL synonyms and antonyms), computational linguistics (CL synonyms and antonyms and noun-modifiers), and cognitive psychology (similar, associated, and both). Six of the tests have been used in previous research and four of the tests have associated performance results and bibliographies in the ACL Wiki. Shared tests make it possible for researchers to compare their algorithms and assess the progress of the field. Applying human tests to machines is a natural way to evaluate progress in AI. Five of the seven tests were originally developed for humans.
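The comparisons with human scores that follow rest on exact binomial confidence intervals. A minimal standard-library sketch, assuming the usual Clopper-Pearson construction (which "Binomial Exact test" presumably denotes), computed by bisection on the binomial CDF:

```python
# Sketch: an exact binomial (Clopper-Pearson) confidence interval,
# computed by bisection on the binomial CDF, stdlib only. Assumes the
# "Binomial Exact test" refers to this standard construction.

from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion."""
    def bisect(f, target):
        # f is increasing in p on [0, 1]; find p with f(p) = target.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    lower = 0.0 if k == 0 else bisect(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    upper = 1.0 if k == n else bisect(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

# 52.1% accuracy on the 374 SAT questions is roughly 195 correct answers;
# the resulting interval is close to the 46.9%-57.3% reported in the text.
lo, hi = clopper_pearson(195, 374)
```

In practice one would reach for a library routine (e.g., an exact binomial interval from a statistics package) rather than hand-rolling the bisection.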
For the SAT and TOEFL tests, the average human scores are available. On the SAT test, PairClass has an accuracy of 52.1%, with a 95% confidence interval ranging from 46.9% to 57.3% (using the Binomial Exact test). The average senior high school student applying to a US university achieves 57% (Turney, 2006), which is within the 95% confidence interval for PairClass. On the TOEFL synonym test, PairClass has an accuracy of 76.2%, with a 95% confidence interval ranging from 65.4% to 85.1% (using the Binomial Exact test). The average foreign applicant to a US university achieves 64.5% (Landauer and Dumais, 1997), which is below the 95% confidence interval for PairClass. Thus PairClass performance on SAT is not significantly different from average human performance, and PairClass performance on TOEFL is significantly better than average human performance.

One criticism of AI as a field is that its success stories are limited to narrow domains, such as chess. Human intelligence has a generality and flexibility that AI currently lacks. This paper is a tiny step towards the goal of performing competitively on a wide range of tests, rather than performing very well on a single test.

Acknowledgements

Thanks to Michael Littman for the 374 SAT analogy questions, Thomas Landauer for the 80 TOEFL synonym questions, Donna Tatsuki for the 50 ESL synonym questions, Dekang Lin for the 160 synonym-antonym questions, Christine Chiarello for the 144 similar-associated-both questions, and Vivi Nastase and Stan Szpakowicz for the 600 labeled noun-modifiers. Thanks to Charles Clarke for the corpus, Stefan Büttcher for Wumpus, Guido Minnen, John Carroll, and Darren Pearce for morpha and morphb, and Ian Witten and Eibe Frank for Weka. Thanks to Selmer Bringsjord for inviting me to contribute to the special issue of JETAI on Test-Based AI.
Thanks to Joel Martin for comments on an earlier version of this paper. I am especially thankful to Michael Littman for initiating my interest in analogies in 2001, by suggesting that a statistical approach might be able to solve multiple-choice analogy questions.

References

(Breiman, 1996) Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.

(Bringsjord and Schimanski, 2003) Selmer Bringsjord and Bettina Schimanski. What is artificial intelligence? Psychometric AI as an answer. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI-03), pages 887–893, Acapulco, Mexico, 2003.

(Budanitsky and Hirst, 2001) Alexander Budanitsky and Graeme Hirst. Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures. In Proceedings of the Workshop on WordNet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2001), pages 29–34, Pittsburgh, PA, 2001.

(Büttcher and Clarke, 2005) Stefan Büttcher and Charles Clarke. Efficiency vs. effectiveness in terabyte-scale information retrieval. In Proceedings of the 14th Text REtrieval Conference (TREC 2005), Gaithersburg, MD, 2005.

(Chalmers et al., 1992) David J. Chalmers, Robert M. French, and Douglas R. Hofstadter. High-level perception, representation, and analogy: A critique of artificial intelligence methodology. Journal of Experimental & Theoretical Artificial Intelligence, 4(3):185–211, 1992.

(Chiarello et al., 1990) Christine Chiarello, Curt Burgess, Lorie Richards, and Alma Pollock. Semantic and associative priming in the cerebral hemispheres: Some words do, some words don't . . . sometimes, some places. Brain and Language, 38:75–104, 1990.

(Evans, 1964) Thomas Evans. A heuristic program to solve geometric-analogy problems.
In Proceedings of the Spring Joint Computer Conference, pages 327–338, 1964.

(Falkenhainer et al., 1989) Brian Falkenhainer, Kenneth D. Forbus, and Dedre Gentner. The structure-mapping engine: Algorithm and examples. Artificial Intelligence, 41(1):1–63, 1989.

(Gentner, 2003) Dedre Gentner. Why we're so smart. In Dedre Gentner and Susan Goldin-Meadow, editors, Language in Mind: Advances in the Study of Language and Thought, pages 195–235. MIT Press, 2003.

(Girju et al., 2007) Roxana Girju, Preslav Nakov, Vivi Nastase, Stan Szpakowicz, Peter Turney, and Deniz Yuret. SemEval-2007 task 04: Classification of semantic relations between nominals. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval 2007), pages 13–18, Prague, Czech Republic, 2007.

(Hatzivassiloglou and McKeown, 1997) Vasileios Hatzivassiloglou and Kathleen R. McKeown. Predicting the semantic orientation of adjectives. In Proceedings of the 35th Annual Meeting of the ACL and the 8th Conference of the European Chapter of the ACL (ACL/EACL-1997), pages 174–181, 1997.

(Hirst and St-Onge, 1998) Graeme Hirst and David St-Onge. Lexical chains as representations of context for the detection and correction of malapropisms. In Christiane Fellbaum, editor, WordNet: An Electronic Lexical Database, pages 305–332. MIT Press, 1998.

(Hofstadter, 2001) Douglas Hofstadter. Epilogue: Analogy as the core of cognition. In Dedre Gentner, Keith J. Holyoak, and Boicho N. Kokinov, editors, The Analogical Mind: Perspectives from Cognitive Science, pages 499–538. MIT Press, 2001.

(Hofstadter, 2007) Douglas Hofstadter. I Am a Strange Loop. Basic Books, 2007.

(Jarmasz and Szpakowicz, 2003) Mario Jarmasz and Stan Szpakowicz. Roget's thesaurus and semantic similarity.
In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), pages 212–219, Borovets, Bulgaria, 2003.

(Jiang and Conrath, 1997) Jay J. Jiang and David W. Conrath. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics (ROCLING X), pages 19–33, Taipei, Taiwan, 1997.

(Landauer and Dumais, 1997) Thomas K. Landauer and Susan T. Dumais. A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240, 1997.

(Lepage and Denoual, 2005) Yves Lepage and Etienne Denoual. Purest ever example-based machine translation: Detailed presentation and assessment. Machine Translation, 19(3):251–282, 2005.

(Lepage, 1998) Yves Lepage. Solving analogies on words: An algorithm. In Proceedings of the 36th Annual Conference of the Association for Computational Linguistics, pages 728–735, 1998.

(Lesk, 1969) Michael E. Lesk. Word-word associations in document retrieval systems. American Documentation, 20(1):27–38, 1969.

(Lin et al., 2003) Dekang Lin, Shaojun Zhao, Lijuan Qin, and Ming Zhou. Identifying synonyms among distributionally similar words. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI-2003), pages 1492–1493, 2003.

(Lund et al., 1995) Kevin Lund, Curt Burgess, and Ruth Ann Atchley. Semantic and associative priming in high-dimensional semantic space. In Proceedings of the 17th Annual Conference of the Cognitive Science Society, pages 660–665, 1995.

(Minnen et al., 2001) Guido Minnen, John Carroll, and Darren Pearce. Applied morphological processing of English. Natural Language Engineering, 7(3):207–223, 2001.

(Minsky, 1986) Marvin Minsky.
The Society of Mind. Simon & Schuster, New York, NY, 1986.

(Nastase and Szpakowicz, 2003) Vivi Nastase and Stan Szpakowicz. Exploring noun-modifier semantic relations. In Fifth International Workshop on Computational Semantics (IWCS-5), pages 285–301, Tilburg, The Netherlands, 2003.

(Platt, 1998) John C. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods: Support Vector Learning, pages 185–208. MIT Press, Cambridge, MA, USA, 1998.

(Reitman, 1965) Walter R. Reitman. Cognition and Thought: An Information Processing Approach. John Wiley and Sons, New York, NY, 1965.

(Resnik, 1995) Philip Resnik. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), pages 448–453, San Mateo, CA, 1995. Morgan Kaufmann.

(Rosario and Hearst, 2001) Barbara Rosario and Marti Hearst. Classifying the semantic relations in noun-compounds via a domain-specific lexical hierarchy. In Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing (EMNLP-01), pages 82–90, 2001.

(Turney and Littman, 2005) Peter D. Turney and Michael L. Littman. Corpus-based learning of analogies and semantic relations. Machine Learning, 60(1–3):251–278, 2005.

(Turney et al., 2003) Peter D. Turney, Michael L. Littman, Jeffrey Bigham, and Victor Shnayder. Combining independent modules to solve multiple-choice synonym and analogy problems. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), pages 482–489, Borovets, Bulgaria, 2003.

(Turney, 2001) Peter D. Turney. Mining the Web for synonyms: PMI-IR versus LSA on TOEFL.
In Proceedings of the Twelfth European Conference on Machine Learning (ECML-01), pages 491–502, Freiburg, Germany, 2001.

(Turney, 2005) Peter D. Turney. Measuring semantic similarity by latent relational analysis. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), pages 1136–1141, Edinburgh, Scotland, 2005.

(Turney, 2006) Peter D. Turney. Similarity of semantic relations. Computational Linguistics, 32(3):379–416, 2006.

(Turney, 2008a) Peter D. Turney. The latent relation mapping engine: Algorithm and experiments. Journal of Artificial Intelligence Research, 33:615–655, 2008.

(Turney, 2008b) Peter D. Turney. A uniform approach to analogies, synonyms, antonyms, and associations. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 905–912, Manchester, UK, 2008.

(Veale, 2004) Tony Veale. WordNet sits the SAT: A knowledge-based approach to lexical analogy. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004), pages 606–612, Valencia, Spain, 2004.

(Witten and Frank, 1999) Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, 1999.