How to realize "a sense of humour" in computers?

I. M. Suslov
P. L. Kapitza Institute for Physical Problems, 119337 Moscow, Russia
E-mail: suslov@kapitza.ras.ru

Abstract

The computer model of a "sense of humour" suggested previously [1-3] is raised to the level of a realistic algorithm.

1. Introduction

In the previous papers of the present author [1, 2, 3], a general scheme of information processing was suggested which naturally leads to a possible realization in computers of the simplest human emotions and, in particular, of a "sense of humour". The aim of the present paper is to develop this general scheme to the level of a realistic algorithm.^1

Briefly, the previously formulated model [1] consists in the following. Let a succession of symbols (or "words") A_1, A_2, A_3, ... enter the input of a processor. Each word A_n is associated with a set of images {B_n}. The problem consists in choosing from each set {B_n} the single image B_n^{i_n} that is implied in the given context. We consider the text "understood" if a succession of images B_1^{i_1}, B_2^{i_2}, B_3^{i_3}, ... is put in correspondence with the sequence of symbols A_1, A_2, A_3, ...; the former can be considered as a certain "trajectory" (Fig. 1). In principle, the algorithm consists in the following: (a) all possible trajectories are composed; (b) a certain probability is ascribed to each trajectory; (c) the most probable trajectory is chosen. Only step (b) is nontrivial, i.e. the algorithm for estimating the probability of a given trajectory. Such an algorithm should be based on the correlation of images, which can be studied in the process of "learning" on a sufficiently long "deciphered" text, i.e. a text written in images and not in words.
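The three steps (a)-(c) can be illustrated by a minimal sketch; the `score` function standing in for step (b) is a hypothetical placeholder (the paper is devoted to constructing it), and the image names below are purely illustrative:

```python
from itertools import product

def most_probable_trajectory(image_sets, score):
    """Steps (a)-(c): compose all possible trajectories, ascribe a
    probability-like value to each via score(), choose the most probable.

    image_sets : list of lists -- the set {B_n} for each input word A_n.
    score      : callable mapping a trajectory (tuple of images) to a number.
    """
    best = max(product(*image_sets), key=score)
    return list(best)

# Toy example: two ambiguous words; the score rewards known image pairings.
sets = [["bank_river", "bank_money"], ["fish", "loan"]]
pair_bonus = {("bank_river", "fish"), ("bank_money", "loan")}
score = lambda t: sum((a, b) in pair_bonus for a in t for b in t)
print(most_probable_trajectory(sets, score))  # ['bank_river', 'fish']
```

As the text notes, the enumeration of `product(*image_sets)` grows exponentially with the number of words, which motivates the windowed procedure described next.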
Any specific realization of such an algorithm needs a number of operations growing exponentially with the length of the text, so the algorithm can treat only a text containing no more than a certain number (L) of symbols. To deal with longer texts, one can suggest the following procedure.

^1 In this work, we have in mind the simplest samples of humour related to the primary processing of information. The higher levels of information processing can be treated similarly [3] but require more complicated constructions.

Figure 1: The scheme of information processing: each symbol A_n is associated with a set of images {B_n}, from which a single image B_n^{i_n} should be chosen; the succession B_1^{i_1}, B_2^{i_2}, B_3^{i_3}, ... can be considered as a certain "trajectory".

During processing of the first L words, one remembers not one but several (M) most probable trajectories. As the next step, the fragment of the text between the second and the (L+1)-th word is considered, and all possible continuations are composed for each of the M trajectories. Then again the M most probable of them are remembered, and so on. In general, the process looks as follows (Fig. 2,a): the trajectory branches strongly near its front edge A, the branching ends at a certain point B, and the deciphered part CD is transmitted to the output of the processor (with a certain delay AC). At first sight, point C should always go behind point B or coincide with it. However, for a biological system the delay AC should have an upper bound on the time scale, since information processing is carried out in the subconsciousness and no information appears in consciousness before a deciphered trajectory CD has reached it. If the distance AB is sufficiently large, point C begins to outrun point B [1], i.e.
the most probable trajectory is transmitted to consciousness, though the competing versions are still conserved in the operative memory. If, in the following, the probability of the transmitted trajectory becomes smaller than that of one of the competing versions, a characteristic malfunction occurs, which can be identified with "a humorous effect" on psychological grounds.

It is easy to see that to endow a computer with a "sense of humour" one should be able to solve a "linguistic problem", i.e. the recognition of a succession of polysemantic images, which also arises in machine translation research [4, 5].^2

^2 In spite of the similarity with the problem of machine translation, our aim is somewhat different; this difference is the main subject of the subsequent discussion. The usual strategy of machine translation contains three stages (analysis of the source text, transfer to another language, and generation of the target text); we are interested only in the first of them, but developed as far as possible. In existing programs this stage is not very advanced (see Fig. 6.1 in [4]).

Figure 2: A visual representation of information processing: thin lines are the trajectories contained in the operative memory (or subconsciousness), A is the front edge, B is the point where branching is over, CD is the portion of a trajectory transmitted to the output of the processor (or consciousness).

This general problem can be divided into several more specific problems: (1) one should compose a list of images that are actual for the population; (2) each word of a given language should be associated with a certain set of images; (3) one needs a sufficiently long text for learning, i.e.
a text written in images and not in words; (4) one should formulate the educational algorithm, reducing to the study of correlations between images; (5) on the basis of this algorithm, a rule for estimating the probability of a finite succession of images should be worked out.

In principle, problems 1-3 are trivial, but they need an enormous amount of qualified work; it is difficult to imagine that such work could be done specially for the realization of a "sense of humour". Below we consider the possibility of solving these problems in an automatic regime, having in mind the linguistic information gathered in systems like ABBYY Lingvo. Problems 4 and 5 are more complicated: they cannot be solved purely theoretically and require a long period of experimentation by the trial-and-error method. We suggest here only a preliminary variant of their solution, having in mind to emphasize some essential points. The practical algorithm can be formulated on the basis of approaches worked out in the field of machine translation [4, 5].

2. Ideal language as a limiting case.

Suppose that there exists some language (let it be Latin, for definiteness) which in a certain approximation can be considered as ideal. We define the ideal language as a language whose words are in one-to-one correspondence with images. Then problems 1-3 are solved trivially. To compose a list of actual images (problem 1), it is sufficient to write down all Latin words; they can be numerated in alphabetical order. If we want to recognize texts in English, then problem 2 is solved with the help of the English-Latin dictionary: it is sufficient to write down all variants for the translation of a given English word into Latin. As a text for learning (problem 3), one can take any literary Latin text.

Now let us discuss problems 4 and 5.

Learning algorithm.
The simplest algorithm for learning consists in the construction of the correlation matrix A_ij, where the indices i and j run over all images in alphabetical order. We accept that A_ij ≡ 0 before education. One step of education consists in the analysis of a separate sentence. Each sentence by definition expresses a closed thought, and hence all words in it are inter-related (words are equivalent to images in the ideal language). We increase by unity the element A_ij of the correlation matrix for any pair (i, j) of words entering this sentence (Fig. 3),

    \Delta A_{ij} = 1 ,    (1)

including the case i = j (see below).

Figure 3: A change of the correlation matrix A_ij as a result of processing the sentence 4-2-1.

Figure 4: Syntactic bonds in a sentence have a tree structure: a Subject (S) is related to a Predicate (P), which can be related to Objects (O_1, O_2); the former and the latter can have Attributes (A_1 - A_5).

Of course, such a learning rule leads to inevitable errors. Indeed, syntactic bonds in a sentence have a tree character (Fig. 4), and the existence of associative links can be guaranteed only for syntactically connected images. However, absolutely uncorrelated images can appear in one sentence only with probability ~ (n/N)^2 (n is the number of words in a sentence, N is the number of words in the language), which should be compared with the probability ~ k(n/N) for associatively related images (k is the correlation coefficient). Evidently, such a procedure allows one to reveal correlations effectively for a wide range of k values (1 ≳ k ≳ 10^{-3} - 10^{-4}). There is practically no alternative to this algorithm, since syntactic analysis can be made by a computer only with a large percentage of errors [4].
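A minimal sketch of this learning step, with the correlation matrix stored sparsely as a dictionary; giving the sentence directly as a list of image indices is a simplifying assumption:

```python
from collections import defaultdict

def learn_sentence(A, sentence):
    """One step of education, rule (1): increase A_ij by unity for every
    pair (i, j) of images entering the sentence, including the case i == j."""
    for i in sentence:
        for j in sentence:
            A[(i, j)] += 1

A = defaultdict(int)          # A_ij = 0 before education
learn_sentence(A, [4, 2, 1])  # the sentence 4-2-1 of Fig. 3
print(A[(4, 2)], A[(1, 1)])   # 1 1
```

Note that the rule is symmetric (both A_ij and A_ji are incremented) and that the diagonal terms i = j are deliberately kept, for the reasons discussed below.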
On the other hand, an educational algorithm based only on syntactic connections is also not very good: a sentence may contain a description of an oft-repeating situation (e.g. "A herdsman drives a herd"), where all images are associatively related. As a last argument, we can note that syntactic analysis plays no essential role in human learning: children and poorly educated people are sufficiently good at conversational language, though they have no idea of syntax.

It is evident from the given estimates that short sentences are preferable for learning: the error is smaller for small n, and the whole process is more effective. Indeed, treatment of two sentences of 10 words each requires consideration of 10^2 · 2 = 200 pair bonds (most of which are ineffective), while the analysis of one sentence containing 20 words deals with 20^2 = 400 pair bonds. As for avoiding long sentences, it involves no essential waste of time or resources.

Probability of trajectory.

When the correlation matrix A_ij has been formed as a result of learning on a sufficiently long text, the probability of a finite succession of images can be defined as^3

    p = \sum_{i,j \, (i \neq j)} A_{ij} ,    (2)

where i and j run over the images contained in this succession. The algorithm is based on the pure analog principle, so combinations of images appear to be more probable if they frequently occur in the learning text. We have excluded the terms with i = j from the sum in Eq. (2), since the probability we are interested in should characterize the degree of connectedness of a given trajectory. At first sight, the analog character of the algorithm requires the introduction of the condition i ≠ j also in the learning rule (1).
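Rule (2) then reduces to a sum over all ordered pairs of distinct images in the trajectory. A sketch, assuming the correlation matrix is stored as a dictionary keyed by image pairs:

```python
def trajectory_probability(A, images):
    """Eq. (2): unnormalized probability of a succession of images --
    the sum of A_ij over all pairs in the succession with i != j."""
    return sum(A.get((i, j), 0) for i in images for j in images if i != j)

# Toy matrix: images 1 and 2 co-occurred three times during learning.
A = {(1, 2): 3, (2, 1): 3}
print(trajectory_probability(A, [1, 2, 7]))  # 6
```

As the text remarks, the normalization is arbitrary: only the comparison of values between competing trajectories matters.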
In fact, it is not so, since learning and recognition are carried out under somewhat different conditions: we use only closed sentences for learning, while any fragment of text can be given for recognition, independently of the bounds of sentences. If the condition i ≠ j were introduced in (1), then the self-correlation of images would appear to be practically zero: repeated use of the same word is usually considered a stylistic mistake^4 and practically never occurs in a literary text. It is evident from general principles that the correlation of an image with itself should be maximal: this is provided by the inclusion of the case i = j in the learning rule (1). In the course of recognition, the self-correlation of images plays an essential role: a connected fragment of text usually contains some kind of "main hero", whose image is present in almost every sentence. As a result, the presence of the same image in two neighbouring sentences is rather typical, and its self-correlation is important for an adequate estimation of the connectedness of the text.

3. Real language: nobody wanted "to make it worse".

Of course, any real language is very far from ideal: almost any word is associated with a lot of images, while any image can be described by different words. It looks as if some evil spirit had interfered in human life and specially spoiled all existing languages.^5 In fact, nobody wanted "to make it worse" and nobody was specially entangling the situation: the ambiguity of real languages is a natural consequence of their evolution. The first words of the ancient man were formed according to the principle "I sing what I see":^6 e.g. the words "fox", "wolf", "bear" arose from the shouts of hunters warning about the appearance of the corresponding animal. These words had clear associations and did not possess any ambiguity (Fig. 5,a).
When inter-relations between people became more complicated, some interest arose in the problems of social behavior: words like "manner", "character" came to life. Combination of these words with already existing ones led to the appearance of complex images: "fox manner", "wolf manner", "bear manner"; these notions appeared to be useful and were denoted by special words: "cunning", "cruelty", "clumsiness" (Fig. 5,b). One can see that the language reacted adequately to the change of the situation: newly arisen images gave birth to new words, and the total number of words remained in correspondence with the number of images (Fig. 5,b). The entanglement of language appears already at this primitive stage: on the one hand, synonym ambiguity arises (one can say "cunning" or "fox manner"); on the other hand, old words acquire new meanings (the word "fox" now denotes not only "a red thing with a big tail" but also "cunning aunt Mary"). We see that the ambiguity of language is an unavoidable consequence of its development: initially new images are explained by old words, but later on special names are invented for them; however, nobody is able to abolish the associative relations with the old words.

Fortunately, the entanglement of language related to its development is easily removable. It is possible to distinguish the main meaning of each word (solid arrows in Fig. 5,b and below) and its secondary meanings (dotted arrows). If each word is ascribed to the image associated with its main meaning, then the one-to-one correspondence between words and images is restored (Fig. 5,b).

Unfortunately, there are other reasons for arising ambiguity, which are external from the viewpoint of language. If in two provinces the same image was named by different words, then the unification of these provinces in one state makes both words admissible. As a result, irreducible synonyms arise (Fig. 6,a), under which we understand words with the same main meaning; they are opposite to reducible synonyms (Fig. 6,b), whose main meanings are different. The same effect is produced by the social segregation of society: for example, sexual objects are associated with slang words by poorly educated people, while neutral names are given to them in aristocratic circles, and scientific terms of Latin origin are suggested by the medical society. The same mechanism can work in the opposite direction: if one word was used as a name for completely different images (e.g. the word "like" in English), then homonyms arise, i.e. words with several main meanings. Irreducible synonyms and homonyms are in fact defects of language: they arise for occasional reasons and there is no convincing motivation for their existence.

^3 Normalization of the probability remains arbitrary, but it is not essential for comparison of trajectories.
^4 If it is not an article or a technical word.
^5 The legend of the Tower of Babel probably arose from such an impression.
^6 A tradition of some Asian tribes.

Figure 5: (a) The first stage in the evolution of language: words and images are in one-to-one correspondence. (b) The next stage of evolution: the main mechanism of arising ambiguity is shown. For clarity, images are given in square frames.

4. Real language instead of ideal.

The analysis given in Sec. 3 allows one to understand what kind of corrections should be introduced in the algorithm if we want to use a real language (let it be German) instead of an ideal one (Latin in Sec. 2). In order to obtain the list of images, it is sufficient (in the first approximation) to write down all German words and associate them with their main meanings (i.e.
a word is considered as a symbolic name for the image that corresponds to its main meaning). In fact, the whole algorithm of Sec. 2 remains unchanged in the first approximation: the set of images associated with an English word is obtained with the help of the English-German dictionary, the learning is carried out on German texts, etc.

Figure 6: (a) Words A and B are irreducible synonyms, since their main meanings correspond to the same image γ. (b) Words A and B are synonyms with respect to the image β, but they are reducible, since their main meanings are different. (c) Perfect synonyms have both main and secondary meanings coinciding.

The main error intrinsic to such a procedure consists in the fact that correlations between images are replaced by correlations between German words. However, it looks possible to ignore this error, because a man acts in the same manner. In human practice it is not customary to compose lists of images, while lists of words (dictionaries) are known to everybody. No long texts written in images are available, though long texts written in words (books) are well known. As a result, the replacement of images by the corresponding words becomes inevitable in human education. The comparatively small error of such education is related to the following:

(a) For most words, their "main meanings" are indeed main, i.e. the words are used in this sense with overwhelming probability;

(b) The secondary meanings of a word are logically related to the main one (see Fig. 5,b), and correlations of the whole conglomerate of meanings with other words are close to the correlations for the main meaning;

(c) It is customary in human practice that learning is carried out on a standard set of "classical" texts.
It is assumed that the "classic" writers express their thoughts more clearly than other people, and the implied image is usually denoted by the word that has this image as its main meaning. In any case, learning on the same texts leads to the same education for different people; so people are able to understand each other, even if such education is far from ideal.

Let us now discuss what kind of corrections can actually be made to take into account the non-ideality of language.

Irreducible synonyms.

The existence of irreducible synonyms is displayed in the fact that some images in our list will be repeated several times. To regulate the situation, it is convenient to accept the viewpoint that perfect synonyms (Fig. 6,c) practically do not exist: if, at some moment, words A and B were equivalent (coinciding both in main and secondary meanings), then their equivalence is spoiled with the flow of time: they become differently overgrown with new meanings (Fig. 6,a) and begin to be used in different contexts.^7 From this point of view, irreducible synonyms mark small variations of the main meaning and correspond to close but different images.

Correlations of these slightly different objects with other images are described adequately by the matrix A_ij obtained in accordance with the learning rule (1) (since in fact the correlations are studied between words). Some problems arise with respect to correlations between the synonyms themselves; they are analogous to the problems related to the self-correlation of images (Sec. 2). The use of two synonyms in one short sentence is undesirable in the same sense as the repeated use of the same word. As a result, the learning procedure of Sec. 2 will lead to a practical absence of correlations between close images, while such correlation should be large on general principles.
As a model for irreducible synonyms, one can accept that the appearing image is denoted by word A with probability p_A, by word B with probability p_B, by word C with probability p_C, etc.^8 Then it is easy to show (see Appendix) that the block of the correlation matrix corresponding to the synonyms A, B, C, ... should have the following structure:

    \begin{pmatrix}
    S_{AA} & \sqrt{S_{AA} S_{BB}} & \sqrt{S_{AA} S_{CC}} & \dots \\
    \sqrt{S_{BB} S_{AA}} & S_{BB} & \sqrt{S_{BB} S_{CC}} & \dots \\
    \sqrt{S_{CC} S_{AA}} & \sqrt{S_{CC} S_{BB}} & S_{CC} & \dots \\
    \dots & \dots & \dots & \dots
    \end{pmatrix}    (3)

In the learning process according to rule (1), the diagonal elements S_AA, S_BB, S_CC, ... are determined correctly, while the off-diagonal elements appear to be practically zero; however, they can be corrected artificially in accordance with the matrix (3). We see that the "teaching of synonyms" is carried out separately, as is customary in human practice.

^7 For the situation in Fig. 6,a: if both images α and γ can appear in some context, then image γ should be denoted by word B, while the use of word A leads to real ambiguities.

Homonyms.

From the linguistic point of view, homonyms are considered as different words, and this fact is clearly marked in dictionaries (usually by Roman numerals, e.g. like I, like II, etc.). Therefore, no problems arise with homonyms when a list of images is composed; they are naturally registered as different objects. However, in written and conversational text homonyms are indistinguishable, and problems arise in the learning process. For the given situation, it means that the "teaching of homonyms" should be carried out "by hand": if the computer meets one of the registered homonyms in the learning text, it should ask the operator which of them is implied in the given sentence. However, such a "by hand" stage may not be very long: once minimal statistics are obtained for the correlation matrix A_ij, the identification of homonyms can be trusted to the computer.
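The artificial correction of a synonym block according to matrix (3) can be sketched as follows; the diagonal elements are assumed to come from learning, and the word names are illustrative:

```python
import math

def correct_synonym_block(A, synonyms):
    """Fill the off-diagonal elements of a block of irreducible synonyms
    according to matrix (3): A_ab = sqrt(S_aa * S_bb) for a != b, keeping
    the diagonal elements S_aa produced by the learning rule (1)."""
    for a in synonyms:
        for b in synonyms:
            if a != b:
                A[(a, b)] = math.sqrt(A[(a, a)] * A[(b, b)])

# Diagonal elements determined by learning; off-diagonal ones were ~0.
A = {("A", "A"): 9.0, ("B", "B"): 4.0}
correct_synonym_block(A, ["A", "B"])
print(A[("A", "B")], A[("B", "A")])  # 6.0 6.0
```

The geometric-mean form matches the correspondence between matrices (3) and (4): the diagonal p_A^2 ↔ S_AA fixes p_A ∝ √S_AA, so the off-diagonal p_A p_B ∝ √(S_AA S_BB).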
As a rule, homonyms are used in entirely different contexts, and their associative links are clearly different.

Absence of images.

Our list of images may be somewhat incomplete: certain images may be absent from it if no special words have been invented for them; as a rule, this concerns new or not very widespread images. The latter means that such an image is not sufficiently actual for the population and a lot of people have no idea of it; so we can forgive a computer if it does not know such an image. Of course, such images can be introduced into our list "by hand", if we ascribe them to some groups of words. Unfortunately, learning should then also be done by hand: the computer should ask whether the found combination of words corresponds to the given image, or whether this combination occurred in the sentence accidentally.

^8 Such a model implies that the difference between close images has a symbolic character and no attention is given to it in human practice.

5. Conclusion

In the previous sections we suggested a possible variant of the solution of problems 1-5, formulated in the Introduction. Considering the first three problems, we have in mind the linguistic information contained in systems like ABBYY Lingvo. Such systems clearly distinguish the main meaning of a word (it is referred to there as "the first meaning") and its secondary meanings; a list of synonyms is also attached. Of course, the latter should be tested for reducibility, but an algorithm for such a test is evident from the discussion. The ABBYY Lingvo system also contains many combinations of words, some of which correspond to original images; unfortunately, their separation requires additional work.
We have also suggested an analog algorithm for learning and recognition (problems 4 and 5), which should be considered as preliminary: it can be developed to a practical level in the course of experiments based on existing programs for machine translation [4, 5]. We hope that the given analysis provides sufficient material for the beginning of such experiments, and the realization of a "sense of humour" in computers may occur in the nearest future.

Appendix. Correlation of irreducible synonyms

It is clear from the text that irreducible synonyms are described by a model according to which the appearing image is denoted by word A with probability p_A, by word B with probability p_B, by word C with probability p_C, etc. The repeated appearance of the same image in a short sentence is rather improbable, and a typical situation corresponds to the existence of two coinciding images in neighbouring sentences. The probabilities of the configurations AA, AB, ... are equal to p_A^2, p_A p_B, ... correspondingly, and are associated with the correlation matrix

    \begin{pmatrix}
    p_A^2 & p_A p_B & p_A p_C & \dots \\
    p_B p_A & p_B^2 & p_B p_C & \dots \\
    p_C p_A & p_C p_B & p_C^2 & \dots \\
    \dots & \dots & \dots & \dots
    \end{pmatrix} ,    (4)

which differs by a constant factor from the matrix (3), due to the arbitrariness in the normalization of the latter. The diagonal elements S_AA, S_BB, S_CC, ... of the matrix (3) can be considered as known, since they are correctly determined by the learning rule (1). The off-diagonal elements can be established from the correspondence of (3) and (4).

References

[1] I. M. Suslov, Biofizika SSSR 37, 318 (1992) [Biophysics 37, 242 (1992)]; arXiv:0711.2058.
[2] I. M. Suslov, Biofizika SSSR 37, 325 (1992) [Biophysics 37, 249 (1992)]; arXiv:0711.2061.
[3] I. M. Suslov, Computer Chronicle (Moscow), 1994, issue 1; arXiv:0711.2270.
[4] W. J. Hutchins, H. L. Somers. An Introduction to Machine Translation, London, Academic Press, 1992.
[5] A detailed bibliography on machine translation can be found on the web-site of John Hutchins, www.hutchinsweb.me.uk.