Lexical growth, entropy and the benefits of networking

If each node of an idealized network has an equal capacity to efficiently exchange benefits, then the network's capacity to use energy is scaled by the average amount of energy required to connect any two of its nodes. The scaling factor equals \text…

Authors: Robert Shour

Robert Shour
Toronto, Canada

Abstract

If each node of an idealized network has an equal capacity to efficiently exchange benefits, then the network's capacity to use energy is scaled by the average amount of energy required to connect any two of its nodes. The scaling factor equals $e$, and the network's entropy is $\ln(n)$. Networking emerges in consequence of nodes minimizing the ratio of their energy use to the benefits obtained for such use, and their connectability. Networking leads to nested hierarchical clustering, which multiplies a network's capacity to use its energy to benefit its nodes. Network entropy multiplies a node's capacity. For a real network in which the nodes have the capacity to exchange benefits, network entropy may be estimated as $C \log_L(n)$, where the base of the log is the path length $L$, and $C$ is the clustering coefficient. Since $n$, $L$ and $C$ can be calculated for real networks, network entropy for real networks can be calculated and can reveal aspects of emergence and also of economic, biological, conceptual and other networks, such as the relationship between rates of lexical growth and divergence, and the economic benefit of adding customers to a commercial communications network. Entropy dating can help estimate the age of network processes, such as the growth of hierarchical society and of language.

PACS numbers: 89.70.-a, 89.70.Cf, 89.75.Da

Keywords: emergence, energy scaling, entropy, glottochronology, lexical growth, Metcalfe's Law, networks.

What are the benefits of networking for a person and for a lexicon? More precisely, and more generally, how much is a process's nodal rate multiplied by networking? Language growth raises the above questions.
To grow a language, a society must perform three problem-solving processes:

1. Devise sounds (phonemes) and choose which ones will be used and clustered for coding.
2. Identify, conceptualize and choose which perceptions, and which abstractions arising from them, should be coded.
3. Decide, using the chosen coding sounds, how to code and cluster chosen perceptions and abstractions.

Language memorializes the products of these three problem-solving processes. As a society ages, the lexicon grows, the emergent product of problem solving. Consistent with this perspective, the English lexicon grew an average of about 3.4% per decade from 1657 to 1989 [1], and average IQs, an approximate measure of society's problem solving skill, grew at about the same rate in the U.S.A. from 1947 to 2002 [2]. Since lexicon formation and growth is an emergent process in a society, the similarity in the rates of growth is consistent with increasing average IQs also being an emergent process. This implies that the rate at which society emergently improves its capacity to solve problems, language being a particular accomplishment of society's problem solving capacity, may possibly be measured indirectly by measuring lexical growth.

As the number of people in society grows, society's capacity to invent words increases: the lexical growth rate increases. If we could quantify how much society on average multiplies an individual's capacity for lexical growth, we might then be able to calculate an average basic lexical growth rate, and use that rate as a clock to estimate when language began.

What is the benefit of networking? Consider the position of a child dependent on society for information. A child receives information directly from $L$ sources, including parents and close friends.
Parents in turn receive information from the child's four grandparents, eight great-grandparents and so on. Each of the $L$ direct sources receives information from (as a simplification) the same average number of $L$ sources. This provisionally suggests that the multiplicative benefit of networking for the child equals a log function, $\log_L(n)$, where $n$ is the number of people in the society. Only decreasing $L$ is consistent with both increasing the value of the log and increasing the network benefit: to increase the network's information benefit, the base of the log must decrease. Hence, $L$ must be proportional to an average time or distance to the information sources, because reducing the average time or distance to connect to information sources would increase the rate of information transmitted to a recipient.

If the network benefit $H$ is $\log_L(n) = \eta$, with $L$ as the base of the log (equivalently, $L^{\eta} = n$), one may infer that the network is hierarchical (from $L^{\eta}$), that, likewise, $L$ is the scaling factor, that the nodes must all have equal (or the same average) attributes since the formula for network benefit does not distinguish between nodes, and that the hierarchy must be flat, since $L^{\eta} = n$ requires that lower levels in the hierarchy contain the same $n$ nodes.

The average distance (or time) between nodes is non-commensurable with the number of nodes. We seek $L$ commensurable with a parameter of the entire network. If energy is proportional to the distance a signal has to traverse, then the average energy for the reception of information from another node in the network scales the (commensurable) energy of the network. Adjacency plays a critical role, since a node cannot connect to non-adjacent nodes without first connecting with an adjacent node.
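As an illustration of the provisional claim above, the following minimal sketch evaluates $\log_L(n)$ for a fixed $n$ and several values of $L$. The function name `network_benefit` and the society size are hypothetical choices made for this example only; they are not taken from the paper.

```python
import math


def network_benefit(n: int, L: float) -> float:
    """Return log base L of n, the provisional multiplicative benefit.

    Both n and L must exceed 1 for the logarithm to be positive and defined.
    """
    if n <= 1 or L <= 1:
        raise ValueError("need n > 1 and L > 1")
    return math.log(n) / math.log(L)


n = 1_000_000  # hypothetical society size, for illustration only
for L in (10.0, 5.0, math.e, 2.0):
    # A smaller base L gives a larger value of log_L(n).
    print(f"L = {L:.3f}: log_L(n) = {network_benefit(n, L):.2f}")
```

For fixed $n$, the printed values grow as $L$ shrinks, which is the sense in which only a decreasing $L$ is consistent with an increasing network benefit.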
The foregoing observations guide the modeling of an ideal network $N$ with the following attributes.

1. $N$ exists. Its nodes require energy, are connectable and can transmit and receive benefits. $N$ has $n$ different but otherwise indistinguishable nodes, where $n$ is greater than one and finite. Each node has the capacity to transmit and receive benefits. Each node in $N$ has some adjacent nodes, and all other nodes are non-adjacent. A pair of nodes are adjacent if they are connectable in one step, and are non-adjacent if they are only connectable in multiples of one step. Each one step connection only needs to be created once. Each creation of a one step connection has the same finite energy cost. Each node's energy is continuously supplied, at a finite rate, solely by its environment. At every point in time, energy units are defined so that one unit of energy per unit of time transmits one unit of benefit one step. For a succession of network equilibrium states, the energy units and time units are adjusted if necessary to maintain their one-to-one proportion to each other.
2. Every node in $N$ can respond to its environment and will minimize its use of energy for the acquisition of each unit of benefit received from the environment or from another node in $N$, and will maximize the benefits it receives for each unit of energy it expends. This attribute may be called nodal self-interest.

The next 11 propositions follow from the preceding attributes:

1. Nodes have the capacity to transmit multiple benefits: because energy is continuously supplied to them.
2. Nodes connect. When the benefit received is greater than the energy cost, a node connects to another node, because of nodal connectability and self-interest.
   Even if the energy cost of a connection exceeds the value of the benefit, the benefit of receiving multiple transmissions will at some point exceed the cost of connecting, because connecting has a one-time finite cost.
3. Adjacent nodes connect in one step because nodes maximize the benefit per unit of energy, and the benefit of connecting to adjacent nodes per unit of energy is higher than the benefit of connecting to non-adjacent nodes per unit of energy. There is an energy advantage to adjacency and nearer proximity.
4. Nodes connect bi-directionally when possible: because one bi-directional connection costs less energy to create than two single direction connections, and nodes can both transmit and receive; all single step connections have the same energy cost.
5. An average number of steps $L$ between pairs of nodes in $N$ exists: because the number of nodes and, therefore, the number of steps between nodes, are each finite. Since non-adjacent nodes exist for every node, $L$ is a positive number greater than 1. $L$ plays a critical role in the emergence of a network, as discussed later in these propositions and later in this paper.
6. The average number of energy units to transmit one unit of benefit from one node to another is $L$: because one unit of energy transmits one unit of benefit one step in one unit of time, and $L$ is the average distance in steps between pairs of nodes. Therefore, at every point in time, $L$, in addition to being the average number of steps between nodes, is also the average number of energy units per unit of time required by a node to transmit a benefit to another node. $L$ is proportional to the average energy required to connect two nodes.
7. $N$ uses $n$ units of energy per unit of time, at each point of time: because of the way energy units are defined at each point in time.
8. $N$'s energy use is scaled by $L$ at each point of time. Suppose, to simplify calculation, an external energy source continuously transmits energy to each node in $N$ at a constant rate, so the ratio of the rate of energy units transmitted to the benefit per step is constant. Suppose the energy source, the zeroth energy generation, is one step away [3] from $N$ and transmits energy benefits to a single path of $L$ steps, thereby reaching $L$ first generation nodes.

   Suppose further that each recipient first generation node retransmits energy benefits along single paths, since every recipient node can also transmit. By the connectability of nodes and the existence of $L$, each node on the initial path of $L$ steps can transmit to $L$ nodes using $L$ energy units. The $L^2$ second generation nodes receiving transmissions from the $L$ first generation nodes receive $L^2$ energy benefits. Each path of $L$ nodes can connect to $L$ times as many nodes, until the energy benefits reach all $L^{\eta} = n$ nodes in $N$, for some $\eta$.

   Now, instead, suppose the external source has the capacity to transmit energy benefits to all single paths of $L$ distinct nodes in $N$ (that is, $n/L$ such paths), and those first generation nodes have the capacity to transmit in turn as above, which is possible because all nodes are equally capable of receiving and transmitting. The number of all $\eta$th generation nodes cannot total more than $n$ distinct nodes. In view of $L$ being the average distance in steps between nodes, the $\eta$th generation nodes cannot be more than an average of $L$ steps away from $N$'s other nodes, even though $\eta$ can be larger than $L$. Nodes are also contained in generations preceding those in the $\eta$th generation.
   The only way this capacity for clustering can be accomplished is if the $n$ nodes in each generation are the same $n$ nodes as those in every $\eta$th generation of nodes, and if $k$th generation clusters are nested in $(k-1)$st generation clusters. It must be that every node that is a member of a cluster is also contained in a cluster closer to the energy source that is $L$ times larger, up to the zeroth energy generation. The $L$ first generation clusters have size $L^{\eta-1}$, the $L^2$ second generation clusters have size $L^{\eta-2}$, and so on. The exponent in the exponential formula for the capacity to transmit benefits must be the same as the base of the log in the formula for the capacity to receive benefits, for $N$ to be scaled by $L$.

   At every point in time, $N$'s receipt of $n$ energy units per unit of time is scaled by the average number of energy units used to traverse the average number of steps $L$ between nodes. Since connections are bi-directional, if a node can transmit benefits to $\eta$ cluster generations, it can also receive benefits from $\eta$ cluster generations.

   As a model of $N$'s capacity for scaling, consider 27 nodes in a single line forming a flattened hierarchy scaled by 3. Differently scaled clusters are bracketed using differently shaped brackets:

   [(...)(...)(...)][(...)(...)(...)][(...)(...)(...)]

   In the model's 27 nodes, single nodes scale up to clusters of 3. Clusters of 3 scale up to clusters of 9. Clusters of 9 scale up to a cluster of 27. I infer that differently sized clusters have different emergent processes. The 27 nodes have three hierarchical cluster generations, though we can observe only one row of 27 nodes. The English language is structured in a similar way.
   For example, in its alphabetical representation, the English letters s, h, and t can combine to form sh and th, two-letter clusters that sound different from their component letters. Differently sized letter clusters can join to form word roots, prefixes, suffixes, and larger clusters we call words. Words, together with spaces, can form noun, verb and object clusters, called phrases. Word clusters from different cluster generations can form sentences. But an alphabetically represented sentence just consists of a single string of individual letters and spaces. Similar observations apply to a sentence coded by a string of sounds. Society, using grammar, emergently and hierarchically organizes language [4]. If so, the 1950s hypothesis, still current, that there is a grammar module in the brain (a 'language instinct'), is unnecessary. Grammar emerges through the adaptive network efficiency of a society using a lexicon.

   Reductionism applied to a conceptual problem involves the application of problem solving (energy) to a conceptual cluster that is a part of the larger problem.

   The design of a place-holding numeration system enables the combination of different cluster generations to describe numbers. The number 528, for example, combines and contains five second generation clusters (100s), two first generation clusters (10s) and 8 zeroth generation singletons. Each cluster size is a separate concept.

   Social networks, music [5], laws [6], and athletic moves are also clustered hierarchically.
9. $N$ is self-similar, since efficiency considerations applicable to adjacent nodes apply also to adjacent clusters. A cluster is part of a cluster $L$ times larger. For a node or cluster to efficiently maximize the benefits it receives, reception of benefits from a cluster smaller than $N$ may be sufficient.
   For a recipient node or cluster to be efficient, it must not call upon more of the network's clustered energy resources (or energy capacity) than is minimally necessary for it to obtain the benefits it needs in particular circumstances, but it has the capacity to call on any of those clusters. For $N$ to be efficient, it must not use more of its energy resources to benefit a node or cluster than is necessary, but it has the capacity to transmit all of its resources to a node or cluster. Energy clustering enables efficient allocation of $N$'s energy resources. For $N$ to be everywhere self-similar, there can be no local variations in the path length; the energy requirements per node must be everywhere equal.
10. The benefit of a network emerges: $\eta$ multiplies the capacity of a single recipient node. Since each of the $\eta$ generations in the hierarchy of clusters contains all $n$ nodes in $N$, $N$ has the capacity to communicate $\eta$ times the capacity of one cluster generation to a node. (When $\eta = 0$, no energy units are transmitted because there is no energy to transmit. The capacity of a cluster of size $L$ is itself some multiple of the capacity of an individual node.) So if the measure of the capacity of a single generation of clusters containing all $n$ nodes to benefit a node is $A(L)$, the measure of the capacity of all $\eta$ cluster generations to benefit a node is $\eta A(L)$. But the capacity of all $\eta$ cluster generations is the capacity of the network, $A(n)$. Hence $A(n) = A(L^{\eta}) = \eta A(L)$, and $A$ must be a logarithmic function with base $L$. If $H_L$ represents the capacity of a network of $n$ nodes to benefit a node, $H_L(n) = \log_L(n) = \log_L(L^{\eta}) = \eta$.

   The capacity of $N$ to multiply the effect of its energy resources depends on clustering.
   A node can benefit only by receiving transmissions from a cluster, and a cluster can increase its capacity only if nodes and clusters transmit to it. Without the capacity for both reception and transmission [7], this could not occur, or at least would not necessarily occur in an efficient way, contrary to the assumption that nodes are benefit maximizers. A network can also temporally network with recorded earlier editions of itself, like a person proofreading their own earlier work. Nodal self-interest, combined with connectability, leads to the emergence of a network that benefits the network's nodes.

   Transmission by a social network of a social benefit to a recipient is an indirect transfer of the network's logarithmically compressed energy. A lexical network (language) logarithmically compresses the transfer of energy (the energy used to solve the problem of compressing perceptions into concepts) between and among the members of society who use the language. By receiving information expressed in words, a recipient can receive and share in benefits arising from the previous expenditure of energy by other members of society, past and present, which energy was used in one or more of the three problem solving processes involved in creating language. Because of networks, a member of society need not be adjacently connected to receive such benefits from a remotely connected other member of society, past or present.
11. For an idealized network, $L = e$. For every node in the first generation, the number of nodes it has transmitted to increases from generation to generation at the rate $L$, until it has reached all the nodes of the $\eta$th generation.
   For continuous functions, if a function is its own derivative, so that $y' = y$, then $y = f(x) = e^x$, to which the behavior of the $L$-sized clusters is similar. The self-similarity of $N$ in all generations therefore implies that $L$, the base of the log, is the base of the natural logarithm, namely $e$, about 2.71828. In that case, a network's benefit is $\ln(n)$. The optimal path length for a one-way broadcasting node is 1 (but such a node would require more energy than average to broadcast to the network). If nodes did not all have equal capacity for transmitting and receiving, then $N$ would not necessarily be self-similar in all cluster generations.

To determine the network benefit for a node in an ideal network, the attributes above seem sufficient; microstates of the nodes and clusters are not of interest because the scaling factor is an average.

In their seminal 1998 article [8], Watts and Strogatz use three parameters, $n$, $L$ and $C$, to characterize a kind of real network they call 'a small world network'. The first parameter, $n$, is the number of nodes. $L$ is the path length, the smallest number of steps between nodes, averaged over all pairs of nodes. $C$ is the clustering coefficient, the fraction of allowable edges, connecting to a vertex in a graph of the network, that actually exist, averaged over all nodes. The clustering coefficient can also be defined using the notion of adjacency. Suppose we calculate, for every node, the proportion of its adjacent nodes that are connected to it. The clustering coefficient, $C$, is the average of those proportions for $N$'s nodes. For a real network, the number of steps between nodes and the proportion of connected adjacent nodes are measured for all, or a representative sample, of the network's nodes, and the results are averaged to obtain $L$ and $C$.
Long distance connections between clusters result in the 'small world effect', sometimes described as 'six degrees of separation'.

For a real network, the clustering coefficient is between zero and one, which differs from an ideal network, which implicitly assumes $C$ is 1. Thus for a real network, only a proportion $C$ of the benefit of the network reaches a node, and for $n$, $L$ and $C$ at a given point in time,

$$H_L(n) = C \log_L(n). \qquad (1)$$

In a real network, nodes might be unequal in capacities, energy requirements, and the number of steps between nodes. An average number of steps $L$ exists, however, because, whether for topological, physiological or other reasons, when the number of nodes is large, they cannot all bi-directionally connect to all other nodes in one step. In a real network, the fraction per step (energy/benefit) may differ from 1.

For a network, $e$ is a benchmark. Suppose that for a real network, the per step fraction (energy/benefits) < 1, with $n$ and $C$ unchanged. Either the benefits per average step are higher, or energy per average step is lower, compared to an ideal network. If the relative benefits per step increase, the relative benefit of the network increases. For $n$ and $C$ unchanged, the only way the network benefit can increase is if $L$ is smaller than $e$. An analogous argument implies that when (energy/benefits) > 1, $L$ is greater than $e$. For example, in social networking, the 'six' in six degrees of separation may reflect the greater amount of energy required to connect to remotely located people, and the smaller social benefits received from remotely located people, compared to those closer.
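To make the three parameters and Equation (1) concrete, here is a minimal sketch that computes $L$ and $C$ for a small undirected graph and then evaluates $C \log_L(n)$. It uses the standard Watts and Strogatz definition of the clustering coefficient (the fraction of possible edges among a node's neighbours that exist). The ring lattice of 8 nodes, and all function names, are assumptions chosen for illustration; they do not come from the paper.

```python
import math
from collections import deque
from itertools import combinations


def average_path_length(adj):
    """Mean shortest-path length, in steps, over all pairs of distinct nodes.

    Uses breadth-first search from every node; assumes a connected graph.
    """
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def clustering_coefficient(adj):
    """Average, over nodes, of the fraction of neighbour pairs that are linked."""
    values = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            values.append(0.0)  # no neighbour pairs to test
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        values.append(2 * links / (k * (k - 1)))
    return sum(values) / len(values)


def network_entropy(n, L, C):
    """Equation (1): C * log base L of n."""
    return C * math.log(n) / math.log(L)


# Hypothetical example: ring of 8 nodes, each linked to its 2 nearest
# neighbours on either side.
n = 8
adj = {i: {(i + d) % n for d in (-2, -1, 1, 2)} for i in range(n)}

L = average_path_length(adj)
C = clustering_coefficient(adj)
print(f"n = {n}, L = {L:.4f}, C = {C:.4f}, H = {network_entropy(n, L, C):.4f}")
```

For a real network, $n$, $L$ and $C$ would come from measurement or sampling rather than from a constructed lattice, as the text describes.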
Though energy scaling leads to a flattened hierarchy for an ideal network, it may be possible that a physically observable energy hierarchy indirectly manifests itself in real networks of cells in organisms, buildings in a city, or stars in a galaxy.

Equation (1) has a form similar to that for entropy used in information theory, and so may be called, by analogy, the entropy of a network. In 1948, C. E. Shannon derived an equation for the entropy of a set of probabilities [9],

$$H_r(S) = -K \sum_{i=1}^{n} p_i \log_r p_i, \qquad (2)$$

to analyze strings of symbols. He called $H$ (the Greek letter eta) in Equation (2) entropy because it has the same form as that used for entropy in statistical mechanics. The $r$ is an arbitrary base of the log, $S$ is the symbol source, $K$ is an arbitrary positive constant, and $p_i$ is the probability of the $i$th symbol.

In Shannon's derivation, probability and information are related. If the probability of an event occurring or not occurring is 100%, no new information is acquired after its occurrence. Only resolution of uncertainty adds information. In Equation (2), the base of the log is usually 2 because Equation (2) is mostly used in connection with digital communication. $K$ is usually set to 1.

Like Equation (2), the formula for an ideal network's entropy can be derived using probability. Equality of nodal capacities implies that the average probability that a node in $N$ is an information source is $1/n$. When $p_i = 1/n$, Equation (2) reduces to $K \log_r(n)$, with the base of the log $L$ and the constant $K$ the clustering coefficient, for the reasons stated above.

Weighted probabilities and energy scaling both lead to the same formula for network entropy. Each derivation likely implies the other: weighted probability paths imply scaling when $p_i = 1/n$, and scaling implies weighted probability paths.
Each describes a different aspect of entropy. An ideal network has maximal uncertainty (or equality) $p_i = 1/n$ for all nodal sources. The resulting equality of nodal capacities leads to energy scaling, maximally efficient and maximally uncertain or equal. In information theory, the joint entropy of a joint event is less than or equal to the sum of the component entropies.

In information theory, entropy is maximal [10] for a network of $n$ nodes when $p_i = 1/n$. Equivalently, network entropy is maximal if we suppose the energy requirements of $N$'s nodes are equal, or if we scale $N$'s energy by $L$.

Why $L$ scales $N$'s energy gives some insight into the operation of a network. Suppose a given signal can be propagated from a proper subset of $N$ consisting of $n/(L^{\eta})$ nodes. This is efficient for $N$, because $N$ does not have to use all its nodes' energy [11] any time a signal is to be sent to all or part of $N$. If the speed of the signal is less than $L$ steps per $L$ time units, the signal cannot reach the whole of the network within $L$ time units; the signaling nodes in the subset are using less than the average amount of energy per node, and the entropy of $N$ is therefore less than optimal. On the other hand, if the speed of the signal is greater than $L$ steps per $L$ time units, the signaling nodes in the subset are using more than the average amount of energy per node, and the entropy of $N$ will also be less than optimal because $N$'s other nodes will have less than the average capacity to transmit. To optimize network entropy, a conservative approach is to structure $N$ so that $N$'s nodes have equal capacity to access $N$'s energy, because potentially each node has an equal capacity to benefit $N$.
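The reduction of Equation (2) to $K \log_r(n)$ when $p_i = 1/n$, and the statement that entropy is maximal in that case, can be checked numerically. The following is a minimal sketch; the function name `shannon_entropy` and the skewed distribution are hypothetical and chosen only for illustration.

```python
import math


def shannon_entropy(probs, base=2.0, K=1.0):
    """Shannon's H = -K * sum(p * log_base(p)).

    Terms with p == 0 are skipped, since they contribute nothing to the sum.
    """
    return -K * sum(p * math.log(p, base) for p in probs if p > 0)


n = 8
uniform = [1 / n] * n
# Hypothetical unequal distribution over the same 8 sources (sums to 1).
skewed = [0.5, 0.2, 0.1, 0.05, 0.05, 0.05, 0.03, 0.02]

h_uniform = shannon_entropy(uniform, base=math.e)
h_skewed = shannon_entropy(skewed, base=math.e)

print(f"uniform: {h_uniform:.4f}   ln(n): {math.log(n):.4f}")
print(f"skewed:  {h_skewed:.4f}")
```

With equal probabilities the result coincides with $\ln(n)$ (for $K = 1$ and base $e$), and the unequal distribution gives a smaller value.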
The distribution of equal capacity may occur in some networks naturally due to the randomness of energy distribution.

While nodal self-interest would result in a node tending to accumulate as much energy to itself as possible, networking leads to the emergence of a network benefit, which benefits nodes individually and collectively, and therefore restrains the accumulation of energy by individual nodes. $L$ equaling $e$ reconciles self-interest and the benefit of networking. Since a network is self-similar, the conflict between nodal self-interest (leading to unrestrained accumulation of energy) and network benefit (leading to equal distribution of energy) would arise in cluster generations as well.

An ideal network maximizes efficiency as a consequence of its assumed attributes. A real network maximizes its energy efficiency by its continual adaptation to its environment. Since both the ideal and real networks are maximally efficient, the ideal by assumption, and the actual by adaptation, an ideal network may be a reasonable model of a real network with similar attributes.

If the assumptions of an ideal network apply to economic actors, a communication system, bodies that are mutually gravitationally attractive, or a group of molecules, the network will be maximally efficient when the capacities and energy of the network are equally distributed among its nodes. This inference omits consideration of the impact that the network may have on its environment (externalities), and the effect of changes in the environment on $N$.

Shannon also observed [12] that, for symbols,

$$H' = mH. \qquad (3)$$

This applies, analogously, to networks. If $H'$ is the rate of a network process, and $H$ is the network's entropy for that process, then $m$ is the process rate when $\eta = 1$, that is, when hierarchical structure and networking began.
If the process grows exponentially (which scaling suggests can occur), we can calculate the average rate at which the number of nodes grows, if their number at the beginning of the process (time $t_1$) and at its end (time $t_2$) are known, by solving for $m$ in $n(t_2) = n(t_1) e^{mt}$, where $t = t_2 - t_1$. If the entropy $H$ of a system $S$ at $t_2$ and of its ancestral system at $t_1$ are both known, and $t = t_2 - t_1$, solving for $m$ in

$$H(S(t_2)) = H(S(t_1)) e^{mt} \qquad (4)$$

may give an estimated average rate of growth for the entropy itself. The productivity rate of society when $\eta$ was 1 measures society's capacity to use energy before that capacity was multiplied by clustering in a scaled way (i.e. by entropy). With that, knowing the rate of change permits one to date a beginning of a process, because the ending and starting rates of energy utilization, and the degree of energy clustering, are all indirectly known.

We can use the average rate of growth in the number of nodes or in the size of entropy to estimate when a network's entropy growth began: that is, when $\eta$ was 1. Suppose entropy and the average rate of growth in the number of nodes at a process's beginning $t_0$ and their number at the process's end $t_2$ are all known. Then we estimate the duration of the process by solving for $t$ in $e^{m \eta t} = n(t_2)$, with $t = t_2 - t_0$. Similarly, the average rate of growth in entropy can be used to estimate when $\eta$ was 1. For example, the finding of the age of mitochondrial Eve using DNA may be finding the age of the cluster generation for $\eta = 1$ for diverging mitochondrial DNA; thus Eve would be a representative individual from that cluster generation, not necessarily a single person as appears to be sometimes inferred.
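The two solving steps described above are simple logarithmic inversions. The sketch below writes them out; the function names and all numerical inputs are hypothetical, chosen only to show the algebra, and are not data from the paper.

```python
import math


def average_growth_rate(start: float, end: float, elapsed: float) -> float:
    """Solve end = start * exp(m * elapsed) for m.

    Works for node counts, n(t2) = n(t1) e^{mt}, and for entropies as in
    Equation (4), H(S(t2)) = H(S(t1)) e^{mt}.
    """
    return math.log(end / start) / elapsed


def process_duration(n_end: float, m: float, eta: float) -> float:
    """Solve exp(m * eta * t) = n_end for t."""
    return math.log(n_end) / (m * eta)


# Hypothetical inputs, for illustration only.
m = average_growth_rate(start=1_000, end=8_000, elapsed=30.0)
t = process_duration(n_end=8_000, m=0.01, eta=3.0)

print(f"m = {m:.5f} per time unit")
print(f"t = {t:.1f} time units")
```

As the text cautions, a duration obtained this way is only as reliable as the assumption that the average rate held throughout the period.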
Entropy dating is accurate only if the calculated average rate prevailed for the entire period preceding the earlier of the two dates used for calculation. For example, if neuronal physiology since language began has not changed, then neuronal energy use per step has not changed, and $m$ for lexical growth may have been unchanged during language's development. On the other hand, over millions of years neuronal physiology and the rate of energy supplied by the environment may have varied, and using $m$ for a long period preceding the time for which its average value was determined may yield uncertain results.

The following observation about conceptual networks applies to the lexical growth example below. Each person in a society possesses networks of ideas; living individuals network with inherited ideas. Suppose that, on average, each person possesses the capacity to access the same concepts. To calculate the entropy of concepts promulgated by the society for a given era, multiply the entropy of that society times the entropy of the concepts that are held in common. The network of ideas common to each average member of a society is like an infrastructure (in a mathematical derivation, a constant). Infrastructures include realized ideas such as roads, buildings, and technologies.

To apply Equation (1) to a real network, the real network's attributes must be similar to those of an ideal network. Then only $n$, $L$ and $C$, which provide statistics about the macrostate of the real network, are needed. Even though nodes permute among clusters for some real networks, the averaging used to calculate $L$ and $C$ for a real network in effect assigns to clusters distinct nodes of equal average capacities.
Researchers' calculations have enabled them to estimate the path length for real networks, such as, for example, a human brain (2.49) [13], the nervous system of the worm C. elegans (2.65) [14], and the English lexicon (2.67) [15]. For these examples, $L$ is close to $e$, 2.71828. Perhaps in these examples the conflict between nodal self-interest and the benefit of networking has been efficiently reconciled.

We now estimate the effect of adding nodes to a network. Let $H'_1$ be the rate for a network process for a network of $n_1$ nodes. Let $H'_2$ be the rate for a larger number of nodes $n_2 = n_1 + A$. Assume $L$, $C$ and $m$ do not change as the network grows. Then the increase in $H'_1$ due to $A$ additional nodes is

$$H'_2 - H'_1 = mC\log_L(n_2) - mC\log_L(n_1) = mC\log_L(n_2/n_1) = mC\log_L(1 + A/n_1). \tag{5}$$

If $A = 1$, Equation (5) represents the difference that the presence or absence of an individual makes to a group. If $n_1$ is small, likely $C$ is closer to 1 and $L$ smaller than for a large group, and an individual makes a larger difference to the entropy of the group. A related issue arises with the estimate proposed in the early 1980s, dubbed Metcalfe's law, that the profitability of a commercial communication network grows with the square of its size. Equation (5) may apply instead [16]. Since the entropy of a large network changes slowly with $n$, much of the commercial benefit of adding customers to a large network likely results from economies of scale. For merging related existing networks, the joint entropy is less than the sum of the component entropies if the processes of the two are not independent, as may be the case, for example, for fixed line and cellular telephone networks.

As an example of entropy dating, suppose that humans' lineal ancestor had one third as many neurons 3 million years ago.
Then $H(\text{early brain})$ would be 14.077, compared to $H(\text{modern brain}) = 14.71$ [17]. The average growth rate in neuronal entropy over 3 million years would be .01478... per million years. At that rate, it would take 995 million years for neuronal entropy to evolve from 1 to 14.71, or from the first connected neurons to $10^{11}$ neurons. This manner of estimation requires that the energy requirements, the energy supply, and the capacity of neurons were on average the same over the whole period of their development, probably unlikely given the number of years involved, though if networked neurons optimized their $L$ and $C$ early in their development, the values of $L$ and $C$ may have changed only slightly over those years.

The estimated 1989 entropy of 350 million English speakers (a social network) [18] is 12 and of an English lexicon (a conceptual network) [19] of 616,000 words [20], 5.93. The entropy of English lexical growth is the product of the two entropies. We now wish to estimate the average basal rate of lexical growth, the rate of lexical growth without the multiplier effect. The estimated 1657 entropy [21] of 5,281,347 English speakers [22] is 9.445. The entropy of the 1657 English lexicon of 200,000 words is 5.431. The product of the average population entropy of 1657 and 1989 times the average lexicon entropy for 1657 and 1989 is 10.72 × 5.68 = 60.94. This is the average value of the multiplier for the period from 1657 to 1989. Using this multiplier, the basal lexical growth rate from 1657 to 1989 is about 5.6% per thousand years.

An independent means of checking the 5.6% per thousand year rate involves glottochronology [23]. Glottochronology uses the rate at which two related languages diverge to date their common ancestral language.
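The neuronal and lexical entropies quoted above can be reproduced from $C\log_L(n)$ with the $L$ and $C$ values given in the notes. The sketch below does so in Python. One step is an assumption of this sketch rather than something the text states: the basal rate is taken to be the gross exponential growth rate of the lexicon between 1657 and 1989 divided by the average multiplier. That reading does give a value close to the quoted 5.6% per thousand years.

```python
import math


def network_entropy(n: float, path_length: float, clustering: float) -> float:
    """Estimate network entropy as C * log_L(n)."""
    return clustering * math.log(n) / math.log(path_length)


# Neuronal example: L = 2.49, C = 0.53, n = 10**11 neurons (notes 13 and 17).
h_modern_brain = network_entropy(1e11, 2.49, 0.53)
h_early_brain = network_entropy(1e11 / 3, 2.49, 0.53)
# Average exponential growth rate of neuronal entropy, per million years.
brain_rate = math.log(h_modern_brain / h_early_brain) / 3.0

# Social network statistics L = 3.65, C = 0.79 (notes 18 and 21).
pop_1657 = network_entropy(5_281_347, 3.65, 0.79)
pop_1989 = network_entropy(350_000_000, 3.65, 0.79)
# Lexicon statistics L = 2.67, C = 0.437 (notes 15 and 19).
lex_1657 = network_entropy(200_000, 2.67, 0.437)
lex_1989 = network_entropy(616_000, 2.67, 0.437)

# Average multiplier: mean population entropy times mean lexicon entropy.
multiplier = ((pop_1657 + pop_1989) / 2) * ((lex_1657 + lex_1989) / 2)

# ASSUMPTION: gross lexical growth is exponential between the two word counts,
# and the basal rate is that gross rate divided by the multiplier.
gross_rate_per_kyr = math.log(616_000 / 200_000) / (1989 - 1657) * 1000
basal_rate_per_kyr = gross_rate_per_kyr / multiplier

print(f"brain entropies: {h_early_brain:.3f} -> {h_modern_brain:.3f}")
print(f"multiplier: {multiplier:.2f}")
print(f"basal rate: {100 * basal_rate_per_kyr:.2f}% per thousand years")
```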
In the 1960s, Morris Swadesh determined that after 1,000 years, two related Indo-European languages shared on average 86% of the words on a Basic List he compiled (i.e. a 14% divergence after a thousand years) [24]. The divergence between two related languages after a thousand years, if now adjusted by recent work by Gray and Atkinson [25], is about 11.32% per thousand years [26]. If each of the two daughter languages diverges from the mother language at the same average rate, then the average rate of divergence per daughter language is one half of 11.32% per thousand years, which is 5.66% per thousand years, very close to the 5.6% per thousand years found using the entropies of the English speaking population and English lexicon.

If we assume that the English lexical growth rate is representative of lexical growth rates and that human lexical growth is a stable capacity, we can use the 5.66% per thousand years basal lexical growth rate to estimate when language began. We assume that ancestral societies, consisting of 50 individuals [27] using 100 different call signals, immediately preceded language's beginning, and had modern values for $L$ and $C$ for their society. It would take about 154,000 years for the lexicon to grow from 100 words to the 616,500 words of the OED in 1989 at the rate of 5.66% per thousand years.

In addition to the three problems confronted in growing a language is a fourth problem: choosing, from the menu of concepts and opportunities that a society has stored up in all cluster generations of its language, culture, and economy, which ones best apply to the immediate circumstances. What we regard as individual intelligence may consist to a large extent of learning the conceptual menu created by societies over thousands of years, as seems to be suggested by the multiplicative effect of network entropy.
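The glottochronology check and the dating estimate earlier in this passage are short enough to verify directly. The Python sketch below repeats the arithmetic. It assumes plain exponential growth of the lexicon at the basal rate, which is this sketch's reading of the calculation (it reproduces the quoted figure of about 154,000 years); the function name is hypothetical.

```python
import math

# Swadesh: 14% divergence between two related languages per thousand years.
swadesh_divergence = 0.14
# Adjustment described in note 25: scale by 7037 / 8700.
adjusted_divergence = swadesh_divergence * 7037 / 8700
# Each daughter language is assumed to diverge at half the pairwise rate.
per_daughter_rate = adjusted_divergence / 2


def kyr_to_grow(start_words: float, end_words: float, rate_per_kyr: float) -> float:
    """Thousands of years for a lexicon to grow from start_words to end_words.

    ASSUMPTION: growth is exponential, end = start * exp(rate * t).
    """
    return math.log(end_words / start_words) / rate_per_kyr


# From 100 call signals to the 616,500 words of the 1989 OED at 5.66% per kyr.
duration_kyr = kyr_to_grow(100, 616_500, 0.0566)

print(f"adjusted divergence: {100 * adjusted_divergence:.2f}% per thousand years")
print(f"per daughter language: {100 * per_daughter_rate:.2f}% per thousand years")
print(f"estimated duration: {duration_kyr:.0f} thousand years")
```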
Some concepts and theorems in information theory may be adaptable to the entropy of a network. Being able to calculate entropy may assist in the analysis of economic [28], biological, communication, conceptual, and social networks. If the entropy of a network has these uses, then statistical information about real networks of interest will be helpful.

Notes

[1] The Early Modern English Dictionaries Database (EMEDD) at the University of Toronto, www.chass.utoronto.ca/english/emed/#dic at October 15, 1999 had about 200,000 word-entries at 1657. The Oxford English Dictionary (OED) at 1989 had about 616,000 word-entries. Word-counts vary among dictionaries. I assume that lexical criteria are similar for these sources. Because of the recency of historical dictionary projects, lexical growth as a metric of language appears not to have been previously considered.

[2] J. R. Flynn, Journal of Educational Measurement, 21(3), (1984), 283; Psychological Bulletin, 101(2) 171 (1987); What is Intelligence? Cambridge University Press, 2007.

[3] So no node has a preferred role as a transmitter. All n nodes are potential recipients.

[4] V. Fromkin and R. Rodman, An Introduction to Language (6th ed.) (Harcourt Brace, New York, 1998), p. 77, 111; A. Radford, M. Atkinson, D. Britain, H. Clahsen, A. Spencer, Linguistics - An Introduction (Cambridge University Press, Cambridge U.K., 1999), p. 88.

[5] D. J. Levitin, This is Your Brain on Music - The Science of a Human Obsession (Dutton, New York, 2006).

[6] H. Kelsen, Pure Theory of Law (The Lawbook Exchange, Clark, New Jersey, 2005).

[7] Zipf discusses the efficiency conflict in language between speakers and hearers. G. K. Zipf, Human Behavior and the Principle of Least Effort (Hafner Publishing Company, New York, 1949, 1972 reprint).

[8] D. J. Watts and S. H. Strogatz, Nature (London), 393, 440 (1998).

[9] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication. (University of Illinois, Chicago, 1949).

[10] C. E. Shannon, p. 51; A. Ya. Khinchin, Mathematical Foundations of Information Theory (Dover, New York, 1957), p. 41.

[11] On a network's efficiency: V. Latora and M. Marchiori, Phys. Rev. Lett. 87, 198701-2 (2001); R. Ferrer i Cancho and R. V. Solé, PNAS 100: 788 (2003).

[12] C. E. Shannon, p. 53.

[13] S. R. Achard, R. Salvador, B. Whitcher, J. Suckling, and E. Bullmore, The J. of Neuroscience 26(1), 63 (2006). They found C = .53.

[14] D. J. Watts and S. H. Strogatz.

[15] R. Ferrer i Cancho and R. V. Solé, Proc. R. Soc. B, 268, 2261 (2001). L = 2.67, C = .437 based on 3/4 of the million different words of the British National Corpus (about 70 million words). A study of the English lexicon based on words in an online thesaurus, likely less representative of English usage, is: A. Motter, A. de Moura, Y. Lai, and P. Dasgupta, Phys. Rev. E 65, 065102(R) (2002). They obtain L = 3.16, C = .53, which would give η = 6.14.

[16] A. Odlyzko and B. Tilly of the University of Minnesota, http://www.dtc.umn.edu/~odlyzko/doc/metcalfe.pdf (2005) in A refutation of Metcalfe's Law and a better estimate for the value of networks and network interconnections; B. Briscoe, A. Odlyzko, and B. Tilly, IEEE Spectrum, July 2006, 26. They estimate that the value of a communication network of size n grows like n log(n).

[17] Using L and C from S. R. Achard, R. Salvador, B. Whitcher, J. Suckling, and E. Bullmore for η(modern), and for η(earlier brain), and assuming the earlier brain had one third the neurons, where n is the number of neurons. Assuming n = 10^11 neurons, from J. G. Nicholls, A. R. Martin, B. G. Wallace, and P. A. Fuchs, From Neuron to Brain (4th ed.) (Sinauer, Sunderland, Mass., 2001), p. 480.

[18] Using L = 3.65, C = .79 for 225,226 actors from Watts and Strogatz.

[19] Using L and C for English from Ferrer i Cancho and Solé (2001).

[20] Oxford English Dictionary (OED) at 1989.

[21] Again using L = 3.65, C = .79 for 225,226 actors from Watts and Strogatz.

[22] E. A. Wrigley, R. Schofield & R. D. Lee, The population history of England, 1541-1871: a reconstruction (Cambridge University Press, 1989), Table 7.8, following p. 207, for the year 1656.

[23] What is Glottochronology, p. 271, in M. Swadesh, The Origin and Diversification of Language. (Aldine-Atherton, Chicago, 1971).

[24] M. Swadesh, p. 276.

[25] R. D. Gray and Q. D. Atkinson, Nature (London) 426, 435 (2003), estimate Indo-European at 8,700 years ago. Swadesh, 37 years before Gray and Atkinson, estimated Indo-European beginning at least 7,000 years ago (p. 84). I assume Gray and Atkinson's estimate is an improvement on Swadesh's, and so multiply 14% by 7037/8700 to obtain 11.32%.

[26] See note 25.

[27] R. Dunbar, Grooming, Gossip and Language. (Harvard University Press, Cambridge, Massachusetts, 1997), p. 120 - 123.

[28] If one dollar is a claim on (or a proxy for) one unit of energy, then to maximize the entropy of the economy, the members of society should maximize the efficiency of each dollar used to acquire benefits from society. This requires the economy to permit network adaptation (and therefore, nodal and cluster adaptability) that maintains, for equilibrium states, the equal ratio of one dollar to a unit of benefit.
