Lexical growth, entropy and the benefits of networking

Robert Shour
Toronto, Canada

Abstract

If each node of an idealized network has an equal capacity to efficiently exchange benefits, then the network’s capacity to use energy is scaled by the average amount of energy required to connect any two of its nodes. The scaling factor equals e, and the network’s entropy is ln(n). Networking emerges in consequence of nodes minimizing the ratio of their energy use to the benefits obtained for such use, and their connectability. Networking leads to nested hierarchical clustering, which multiplies a network’s capacity to use its energy to benefit its nodes. Network entropy multiplies a node’s capacity. For a real network in which the nodes have the capacity to exchange benefits, network entropy may be estimated as $C \log_L(n)$, where the base of the log is the path length L, and C is the clustering coefficient. Since n, L and C can be calculated for real networks, network entropy for real networks can be calculated and can reveal aspects of emergence and also of economic, biological, conceptual and other networks, such as the relationship between rates of lexical growth and divergence, and the economic benefit of adding customers to a commercial communications network. Entropy dating can help estimate the age of network processes, such as the growth of hierarchical society and of language.

PACS numbers: 89.70.-a, 89.70.Cf, 89.75.Da

Keywords: emergence, energy scaling, entropy, glottochronology, lexical growth, Metcalfe’s Law, networks.

What are the benefits of networking for a person and for a lexicon? More precisely, and more generally, how much is a process’s nodal rate multiplied by networking? Language growth raises the above questions.
To grow a language, a society must perform three problem-solving processes:

1. Devise sounds (phonemes) and choose which ones will be used and clustered for coding.
2. Identify, conceptualize and choose which perceptions, and which abstractions arising from them, should be coded.
3. Decide, using the chosen coding sounds, how to code and cluster chosen perceptions and abstractions.

Language memorializes the products of these three problem-solving processes. As a society ages, the lexicon grows, the emergent product of problem solving. Consistent with this perspective, the English lexicon grew an average of about 3.4% per decade from 1657 to 1989 [1], and average IQs, an approximate measure of society’s problem solving skill, grew at about the same rate in the U.S.A. from 1947 to 2002 [2]. Since lexicon formation and growth is an emergent process in a society, the similarity in the rates of growth is consistent with increasing average IQs also being an emergent process. This implies that the rate at which society emergently improves its capacity to solve problems, language being a particular accomplishment of society’s problem solving capacity, may possibly be measured indirectly by measuring lexical growth.

As the number of people in society grows, society’s capacity to invent words increases: the lexical growth rate increases. If we could quantify how much society on average multiplies an individual’s capacity for lexical growth, we might then be able to calculate an average basic lexical growth rate, and use that rate as a clock to estimate when language began.

What is the benefit of networking? Consider the position of a child dependent on society for information. A child receives information directly from L sources, including parents and close friends.
Parents in turn receive information from the child’s four grandparents, eight great-grandparents and so on. Each of the L direct sources receives information from (as a simplification) the same average number of L sources. This provisionally suggests that the multiplicative benefit of networking for the child equals a log function, $\log_L(n)$, where n is the number of people in the society. Only decreasing L is consistent with both increasing the value of the log and increasing the network benefit: to increase the network’s information benefit, the base of the log must decrease. Hence, L must be proportional to an average time or distance to the information sources because reducing the average time or distance to connect to information sources would increase the rate of information transmitted to a recipient.

If the network benefit H is $\log_L(n) = \eta$, with L as the base of the log (equivalently, $L^\eta = n$), one may infer that the network is hierarchical (from $L^\eta$), that, likewise, L is the scaling factor, that the nodes must all have equal (or the same average) attributes since the formula for network benefit does not distinguish between nodes, and that the hierarchy must be flat, since $L^\eta = n$ requires that lower levels in the hierarchy contain the same n nodes.

The average distance (or time) between nodes is non-commensurable with the number of nodes. We seek L commensurable with a parameter of the entire network. If energy is proportional to the distance a signal has to traverse, then the average energy for the reception of information from another node in the network scales the (commensurable) energy of the network. Adjacency plays a critical role since a node cannot connect to non-adjacent nodes without first connecting with an adjacent node.
The foregoing observations guide the modeling of an ideal network N with the following attributes.

1. N exists. Its nodes require energy, are connectable and can transmit and receive benefits. N has n different but otherwise indistinguishable nodes, where n is greater than one and finite. Each node has the capacity to transmit and receive benefits. Each node in N has some adjacent nodes, and all other nodes are non-adjacent. A pair of nodes are adjacent if they are connectable in one step, and are non-adjacent if they are only connectable in multiples of one step. Each one step connection only needs to be created once. Each creation of a one step connection has the same finite energy cost. Each node’s energy is continuously supplied, at a finite rate, solely by its environment. At every point in time, energy units are defined so that one unit of energy per unit of time transmits one unit of benefit one step. For a succession of network equilibrium states, the energy units and time units are adjusted if necessary to maintain their one-to-one proportion to each other.
2. Every node in N can respond to its environment and will minimize its use of energy for the acquisition of each unit of benefit received from the environment or from another node in N, and will maximize the benefits it receives for each unit of energy it expends. This attribute may be called nodal self-interest.

The next 11 propositions follow from the preceding attributes:

1. Nodes have the capacity to transmit multiple benefits: because energy is continuously supplied to them.
2. Nodes connect. When the benefit received is greater than the energy cost, a node connects to another node, because of nodal connectability and self-interest.
Even if the energy cost of a connection exceeds the value of the benefit, the benefit of receiving multiple transmissions will at some point exceed the cost of connecting, because connecting has a one-time finite cost.
3. Adjacent nodes connect in one step because nodes maximize the benefit per unit of energy, and the benefit of connecting to adjacent nodes per unit of energy is higher than the benefit of connecting to non-adjacent nodes per unit of energy. There is an energy advantage to adjacency and nearer proximity.
4. Nodes connect bi-directionally when possible: because one bi-directional connection costs less energy to create than two single direction connections, and nodes can both transmit and receive; all single step connections have the same energy cost.
5. An average number of steps L between pairs of nodes in N exists: because the number of nodes and, therefore, the number of steps between nodes, are each finite. Since non-adjacent nodes exist for every node, L is a positive number greater than 1. L plays a critical role in the emergence of a network, as discussed later in these propositions and later in this paper.
6. The average number of energy units to transmit one unit of benefit from one node to another is L: because one unit of energy transmits one unit of benefit one step in one unit of time, and L is the average distance in steps between pairs of nodes. Therefore, at every point in time, L, in addition to being the average number of steps between nodes, is also the average number of energy units per unit of time required by a node to transmit a benefit to another node. L is proportional to the average energy required to connect two nodes.
7. N uses n units of energy per unit of time, at each point of time: because of the way energy units are defined at each point in time.
8. N’s energy use is scaled by L at each point of time. Suppose, to simplify calculation, an external energy source continuously transmits energy to each node in N at a constant rate, so the ratio of the rate of energy units transmitted to the benefit per step is constant. Suppose the energy source, the zeroth energy generation, is one step away [3] from N and transmits energy benefits to a single path of L steps, thereby reaching L first generation nodes.

Suppose further that each recipient first generation node retransmits energy benefits along single paths, since every recipient node can also transmit. By the connectability of nodes and the existence of L, each node on the initial path of L steps can transmit to L nodes using L energy units. $L^2$ second generation nodes receiving transmissions from L first generation nodes receive $L^2$ energy benefits. Each path of L nodes can connect to L times as many nodes, until the energy benefits reach all $L^\eta = n$ nodes in N, for some η.

Now, instead, suppose the external source has the capacity to transmit energy benefits to all single paths of L distinct nodes in N (that is, n/L such paths), and those first generation nodes have the capacity to transmit in turn as above, which is possible because all nodes are equally capable of receiving and transmitting. The number of all ηth generation nodes cannot total more than n distinct nodes. In view of L being the average distance in steps between nodes, the ηth generation nodes cannot be more than an average of L steps away from N’s other nodes, even though η can be larger than L. Nodes are also contained in generations preceding those in the ηth generation.
The only way this capacity for clustering can be accomplished is if the n nodes in each generation are the same n nodes as those in every ηth generation of nodes, and if kth generation clusters are nested in (k − 1)st generation clusters. It must be that every node that is a member of a cluster is also contained in a cluster closer to the energy source that is L times larger, up to the zeroth energy generation. The L first generation clusters have size $L^{\eta-1}$, the $L^2$ second generation clusters have size $L^{\eta-2}$, and so on.

The base of the exponential formula for the capacity to transmit benefits must be the same as the base of the log in the formula for the capacity to receive benefits, for N to be scaled by L. At every point in time, N’s receipt of n energy units per unit of time is scaled by the average number of energy units used to traverse the average number of steps L between nodes. Since connections are bi-directional, if a node can transmit benefits to η cluster generations, it can also receive benefits from η cluster generations.

As a model of N’s capacity for scaling, consider 27 nodes in a single line forming a flattened hierarchy scaled by 3. Differently scaled clusters are bracketed using differently shaped brackets:

[(...)(...)(...)][(...)(...)(...)][(...)(...)(...)].

In the model’s 27 nodes, single nodes scale up to clusters of 3. Clusters of 3 scale up to clusters of 9. Clusters of 9 scale up to a cluster of 27. I infer that differently sized clusters have different emergent processes. The 27 nodes have three hierarchical cluster generations, though we can observe only one row of 27 nodes.

The English language is structured in a similar way.
For example, in its alphabetical representation, the English letters s, h, and t can combine to form sh and th, two-letter clusters that sound different from their component letters. Differently sized letter clusters can join to form word roots, prefixes, suffixes, and larger clusters we call words. Words, together with spaces, can form noun, verb and object clusters, called phrases. Word clusters from different cluster generations can form sentences. But an alphabetically represented sentence just consists of a single string of individual letters and spaces. Similar observations apply to a sentence coded by a string of sounds. Society, using grammar, emergently and hierarchically organizes language [4]. If so, the 1950s hypothesis, still current, that there is a grammar module in the brain (a ‘language instinct’), is unnecessary. Grammar emerges through the adaptive network efficiency of a society using a lexicon.

Reductionism applied to a conceptual problem involves the application of problem solving (energy) to a conceptual cluster that is a part of the larger problem.

The design of a place holding numeration system enables the combination of different cluster generations to describe numbers. The number 528, for example, combines and contains five second generation clusters (100s), two first generation clusters (10s) and 8 zeroth generation singletons. Each cluster size is a separate concept.

Social networks, music [5], laws [6], and athletic moves are also clustered hierarchically.

9. N is self-similar, since efficiency considerations applicable to adjacent nodes apply also to adjacent clusters. A cluster is part of a cluster L times larger. For a node or cluster to efficiently maximize the benefits it receives, reception of benefits from a cluster smaller than N may be sufficient.
For a recipient node or cluster to be efficient, it must not call upon more of the network’s clustered energy resources (or energy capacity) than is minimally necessary for it to obtain the benefits it needs in particular circumstances, but it has the capacity to call on any of those clusters. For N to be efficient, it must not use more of its energy resources to benefit a node or cluster than is necessary, but it has the capacity to transmit all of its resources to a node or cluster. Energy clustering enables efficient allocation of N’s energy resources. For N to be everywhere self-similar, there can be no local variations in the path length; the energy requirements per node must be everywhere equal.
10. The benefit of a network emerges: η multiplies the capacity of a single recipient node. Since each of the η generations in the hierarchy of clusters contains all n nodes in N, N has the capacity to communicate η times the capacity of one cluster generation to a node. (When η = 0, no energy units are transmitted because there is no energy to transmit. The capacity of a cluster of size L is itself some multiple of the capacity of an individual node.) So if the measure of the capacity of a single generation of clusters containing all n nodes to benefit a node is A(L), the measure of the capacity of all η cluster generations to benefit a node is ηA(L). But the capacity of all η cluster generations is the capacity of the network, A(n). Hence $A(n) = A(L^\eta) = \eta A(L)$, and A must be a logarithmic function with base L. If $H_L$ represents the capacity of a network of n nodes to benefit a node, $H_L(n) = \log_L(n) = \log_L(L^\eta) = \eta$.

The capacity of N to multiply the effect of its energy resources depends on clustering.
A node can benefit only by receiving transmissions from a cluster, and a cluster can increase its capacity only if nodes and clusters transmit to it. Without the capacity for both reception and transmission [7], this could not occur, or at least would not necessarily occur in an efficient way, contrary to the assumption that nodes are benefit maximizers. A network can also temporally network with recorded earlier editions of itself, like a person proof-reading their own earlier work.

Nodal self-interest, combined with connectability, leads to the emergence of a network that benefits the network’s nodes. Transmission by a social network of a social benefit to a recipient is an indirect transfer of the network’s logarithmically compressed energy.

A lexical network (language) logarithmically compresses the transfer of energy (the energy used to solve the problem of compressing perceptions into concepts) between and among the members of society who use the language. By receiving information expressed in words, a recipient can receive and share in benefits arising from the previous expenditure of energy by other members of society, past and present, which energy was used in one or more of the three problem solving processes involved in creating language. Because of networks, a member of society need not be adjacently connected to receive such benefits from a remotely connected other member of society, past or present.
11. For an idealized network, L = e. For every node in the first generation, the number of nodes it has transmitted to increases from generation to generation at the rate L, until it has reached all the nodes of the ηth generation.
For continuous functions, if a function is its own derivative, so that $y' = y$, then $y = f(x) = e^x$, to which the behavior of the L-sized clusters is similar. The self-similarity of N in all generations therefore implies that L, the base of the log, is the base of the natural logarithm, namely e, about 2.71828. In that case, a network’s benefit is ln(n). The optimal path length for a one-way broadcasting node is 1 (but such a node would require more energy than average to broadcast to the network). If nodes did not all have equal capacity for transmitting and receiving, then N would not necessarily be self-similar in all cluster generations.

To determine the network benefit for a node in an ideal network, the attributes above seem sufficient; microstates of the nodes and clusters are not of interest because the scaling factor is an average.

In their seminal 1998 article [8], Watts and Strogatz use three parameters, n, L and C, to characterize a kind of real network they call ‘a small world network’. The first parameter, n, is the number of nodes. L is the path length, the smallest number, averaged over all pairs of nodes, of steps between nodes. C is the clustering coefficient, the fraction of allowable edges, connecting to a vertex in a graph of the network, that actually exist, averaged over all nodes.

The clustering coefficient can also be defined using the notion of adjacency. Suppose we calculate, for every node, the proportion of its adjacent nodes that are connected to it. The clustering coefficient, C, is the average of those proportions for N’s nodes. For a real network, the number of steps between nodes and the proportion of connected adjacent nodes are measured for all, or a representative sample, of the network’s nodes, and the results are averaged to obtain L and C.
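The averaging that yields L and C can be sketched for a small undirected graph. This is a minimal illustration, not the procedure used in the cited studies: the function names, the adjacency-dictionary representation and the four-node example graph are all assumptions, and C follows the usual Watts–Strogatz convention (the fraction of possible links among a node’s neighbours that actually exist).

```python
from collections import deque
from itertools import combinations


def path_length(adj):
    """Mean shortest-path length, in steps, over all pairs of connected nodes."""
    total_steps, pair_count = 0, 0
    for source in adj:
        # Breadth-first search gives the fewest steps from source to each node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total_steps += sum(dist.values())
        pair_count += len(dist) - 1  # exclude the source itself
    return total_steps / pair_count


def clustering_coefficient(adj):
    """Average, over nodes, of the fraction of neighbour pairs that are linked."""
    fractions = []
    for neighbours in adj.values():
        k = len(neighbours)
        if k < 2:
            fractions.append(0.0)  # convention: no neighbour pairs to link
            continue
        linked = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        fractions.append(linked / (k * (k - 1) / 2))
    return sum(fractions) / len(fractions)


# Hypothetical example: a triangle (0, 1, 2) with a pendant node 3 attached to 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
L = path_length(graph)
C = clustering_coefficient(graph)
print(f"L = {L:.4f}, C = {C:.4f}")
```

For large real networks a representative sample of source nodes could replace the loop over every node, as the text suggests.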
Long distance connections between clusters result in the ‘small world effect’, sometimes described as ‘six degrees of separation’.

For a real network, the clustering coefficient is between zero and one, which differs from an ideal network which implicitly assumes C is 1. Thus for a real network, only a proportion C of the benefit of the network reaches a node, and for n, L and C at a given point in time,

$$H_L(n) = C \log_L(n). \tag{1}$$

In a real network, nodes might be unequal in capacities, energy requirements, and the number of steps between nodes. An average number of steps L exists, however, because, whether for topological, physiological or other reasons, when the number of nodes is large, they cannot all bi-directionally connect to all other nodes in one step. In a real network, the fraction per step (energy/benefit) may differ from 1.

For a network, e is a benchmark. Suppose that for a real network, the per step fraction (energy/benefits) < 1, with n and C unchanged. Either the benefits per average step are higher, or energy per average step is lower, compared to an ideal network. If the relative benefits per step increase, the relative benefit of the network increases. For n and C unchanged, the only way the network benefit can increase is if L is smaller than e. An analogous argument implies that when (energy/benefits) > 1, L is greater than e. For example, in social networking, the ‘six’ in six degrees of separation may reflect the greater amount of energy required to connect to remotely located people, and the smaller social benefits received from remotely located people, compared to those closer.
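Equation (1) and the benchmark role of e can be checked numerically. The sketch below is illustrative only: the function name `network_entropy` and the sample values of n, L and C are assumptions, not figures from the text.

```python
import math


def network_entropy(n, L, C=1.0):
    """Equation (1): H_L(n) = C * log_L(n), for n > 1, L > 1 and 0 <= C <= 1."""
    if n <= 1 or L <= 1 or not 0.0 <= C <= 1.0:
        raise ValueError("need n > 1, L > 1 and 0 <= C <= 1")
    return C * math.log(n) / math.log(L)


n = 1000  # hypothetical network size
ideal = network_entropy(n, math.e)      # ideal network: L = e, C = 1, so H = ln(n)
shorter = network_entropy(n, 2.5)       # path length below e: larger benefit
longer = network_entropy(n, 6.0)        # path length above e: smaller benefit
partial = network_entropy(n, math.e, C=0.5)  # only half the benefit reaches a node

print(ideal, shorter, longer, partial)
```

With n and C held fixed, the entropy rises as L falls below e and drops as L rises above it, which is the comparison the paragraph above makes.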
Though energy scaling leads to a flattened hierarchy for an ideal network, it may be possible that a physically observable energy hierarchy indirectly manifests itself in real networks of cells in organisms, buildings in a city, or stars in a galaxy.

Equation (1) has a form similar to that for entropy used in information theory, and so may be called, by analogy, the entropy of a network. In 1948, C. E. Shannon derived an equation for the entropy of a set of probabilities [9],

$$H_r(S) = -K \sum_{i=1}^{n} p_i \log_r p_i, \tag{2}$$

to analyze strings of symbols. He called H (the Greek letter eta) in Equation (2) entropy because it has the same form as that used for entropy in statistical mechanics. The r is an arbitrary base of the log, S is the symbol source, K is an arbitrary positive constant, and $p_i$ is the probability of the ith symbol.

In Shannon’s derivation, probability and information are related. If the probability of an event occurring or not occurring is 100%, no new information is acquired after its occurrence. Only resolution of uncertainty adds information. In Equation (2), the base of the log is usually 2 because Equation (2) is mostly used in connection with digital communication. K is usually set to 1.

Like Equation (2), the formula for an ideal network’s entropy can be derived using probability. Equality of nodal capacities implies that the average probability that a node in N is an information source is 1/n. When $p_i = 1/n$, Equation (2) reduces to $K \log_r(n)$, with the base of the log L and the constant K the clustering coefficient, for the reasons stated above.

Weighted probabilities and energy scaling both lead to the same formula for network entropy. Each derivation likely implies the other: weighted probability paths imply scaling when $p_i = 1/n$, and scaling implies weighted probability paths.
Each describes a different aspect of entropy. An ideal network has maximal uncertainty (or equality) $p_i = 1/n$ for all nodal sources. The resulting equality of nodal capacities leads to energy scaling, maximally efficient and maximally uncertain or equal. In information theory, the joint entropy of a joint event is less than or equal to the sum of the component entropies.

In information theory, entropy is maximal [10] for a network of n nodes when $p_i = 1/n$. Equivalently, network entropy is maximal if we suppose the energy requirements of N’s nodes are equal, or if we scale N’s energy by L.

Why L scales N’s energy gives some insight into the operation of a network. Suppose a given signal can be propagated from a proper subset of N consisting of $n/(L^\eta)$ nodes. This is efficient for N, because N does not have to use all its nodes’ energy [11] any time a signal is to be sent to all or part of N. If the speed of the signal is less than L steps per L time units the signal cannot reach the whole of the network within L time units; the signaling nodes in the subset are using less than the average amount of energy per node, and the entropy of N is therefore less than optimal. On the other hand, if the speed of the signal is greater than L steps per L time units, the signaling nodes in the subset are using more than the average amount of energy per node, and the entropy of N will also be less than optimal because N’s other nodes will have less than the average capacity to transmit.

To optimize network entropy a conservative approach is to structure N so that N’s nodes have equal capacity to access N’s energy, because potentially each node has an equal capacity to benefit N. The distribution of equal capacity may occur in some networks naturally due to the randomness of energy distribution.
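The claim that entropy is maximal when every $p_i = 1/n$, and that the sum over probabilities then reduces to $K \log_r(n)$, can be illustrated with a short calculation. This is a sketch using the standard sign convention for Shannon entropy; the function name and the two sample distributions are assumptions chosen for illustration.

```python
import math


def shannon_entropy(probs, base=2.0, K=1.0):
    """H = -K * sum(p_i * log_base(p_i)); terms with p_i = 0 contribute nothing."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -K * sum(p * math.log(p, base) for p in probs if p > 0)


n = 8
uniform = [1.0 / n] * n                      # equal nodal capacities
skewed = [0.5, 0.2, 0.1, 0.05, 0.05, 0.04, 0.03, 0.03]  # hypothetical unequal sources

h_uniform = shannon_entropy(uniform)         # reduces to log_2(8)
h_skewed = shannon_entropy(skewed)           # strictly smaller than the uniform case

print(h_uniform, h_skewed)
```

Replacing `base` with a path length L and `K` with a clustering coefficient C gives, for the uniform case, the same value as $C \log_L(n)$.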
While nodal self-interest would result in a node tending to accumulate as much energy to itself as possible, networking leads to the emergence of a network benefit, which benefits nodes individually and collectively, and therefore restrains the accumulation of energy by individual nodes. L equaling e reconciles self-interest and the benefit of networking. Since a network is self-similar, the conflict between nodal self-interest (leading to unrestrained accumulation of energy) and network benefit (leading to equal distribution of energy) would arise in cluster generations as well.

An ideal network maximizes efficiency as a consequence of its assumed attributes. A real network maximizes its energy efficiency by its continual adaptation to its environment. Since both the ideal and real networks are maximally efficient, the ideal by assumption, and the actual by adaptation, an ideal network may be a reasonable model of a real network with similar attributes.

If the assumptions of an ideal network apply to economic actors, a communication system, bodies that are mutually gravitationally attractive, or a group of molecules, the network will be maximally efficient when the capacities and energy of the network are equally distributed among its nodes. This inference omits consideration of the impact that the network may have on its environment (externalities), and the effect of changes in the environment on N.

Shannon also observed [12] that, for symbols,

$$H' = mH. \tag{3}$$

This applies, analogously, to networks. If $H'$ is the rate of a network process, and H is the network’s entropy for that process, then m is the process rate when η = 1, that is, when hierarchical structure and networking began.
If the process grows exponentially (which scaling suggests can occur), we can calculate the average rate at which the number of nodes grows, if their number at the beginning of the process (time $t_1$) and at its end (time $t_2$) are known, by solving for m in $n(t_2) = n(t_1) e^{mt}$, where $t = t_2 - t_1$. If the entropy H of a system S at $t_2$ and of its ancestral system at $t_1$ are both known, and $t = t_2 - t_1$, solving for m in

$$H(S(t_2)) = H(S(t_1))\, e^{mt} \tag{4}$$

may give an estimated average rate of growth for the entropy itself. The productivity rate of society when η was 1 measures society’s capacity to use energy before that capacity was multiplied by clustering in a scaled way (i.e. by entropy). With that, knowing the rate of change permits one to date a beginning of a process, because the ending and starting rates of energy utilization, and the degree of energy clustering, are all indirectly known.

We can use the average rate of growth in the number of nodes or in the size of entropy to estimate when a network’s entropy growth began: that is, when η was 1. Suppose entropy and the average rate of growth in the number of nodes at a process’s beginning $t_0$ and their number at the process’s end $t_2$ are all known. Then we estimate the duration of the process by solving for t in $e^{m\eta t} = n(t_2)$, with $t = t_2 - t_0$. Similarly, the average rate of growth in entropy can be used to estimate when η was 1. For example, the finding of the age of mitochondrial Eve using DNA may be finding the age of the cluster generation for η = 1 for diverging mitochondrial DNA; thus Eve would be a representative individual from that cluster generation, not necessarily a single person as appears to be sometimes inferred.
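Solving either exponential relation for m amounts to taking a logarithm: $m = \ln(\text{end}/\text{start})/t$. The sketch below applies that to two sets of figures quoted elsewhere in the text (the lexicon sizes for 1657 and 1989, and the rounded neuronal entropies 14.077 and 14.71). The function name `average_growth_rate` is an assumption, and because the inputs are rounded the results only approximate the rates the text reports.

```python
import math


def average_growth_rate(start, end, elapsed):
    """Solve end = start * exp(m * elapsed) for m (continuous average rate)."""
    if start <= 0 or end <= 0 or elapsed <= 0:
        raise ValueError("start, end and elapsed must all be positive")
    return math.log(end / start) / elapsed


# Node count: English lexicon of 200,000 words (1657) and 616,000 words (1989).
decades = (1989 - 1657) / 10
lexical_rate = average_growth_rate(200_000, 616_000, decades)
print(f"lexical growth: {100 * lexical_rate:.2f}% per decade")

# Entropy, Equation (4): neuronal entropy 14.077 rising to 14.71 over 3 million years.
neuronal_rate = average_growth_rate(14.077, 14.71, 3.0)
print(f"neuronal entropy growth: {neuronal_rate:.4f} per million years")
```

The first result is close to the roughly 3.4% per decade cited for the English lexicon; the second comes out near 0.0147 per million years with these rounded entropies.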
Entropy dating is accurate only if the calculated average rate prevailed for the entire period preceding the earlier of the two dates used for calculation. For example, if neuronal physiology since language began has not changed, then neuronal energy use per step has not changed, and m for lexical growth may have been unchanged during language’s development. On the other hand, over millions of years neuronal physiology and the rate of energy supplied by the environment may have varied, and using m for a long period preceding the time for which its average value was determined may yield uncertain results.

The following observation about conceptual networks applies to the lexical growth example below. Each person in a society possesses networks of ideas; living individuals network with inherited ideas. Suppose that, on average, each person possesses the capacity to access the same concepts. To calculate the entropy of concepts promulgated by the society for a given era, multiply the entropy of that society times the entropy of the concepts that are held in common. The network of ideas common to each average member of a society is like an infrastructure (in a mathematical derivation, a constant). Infrastructures include realized ideas such as roads, buildings, and technologies.

To apply Equation (1) to a real network, the real network’s attributes must be similar to those of an ideal network. Then only n, L and C, which provide statistics about the macrostate of the real network, are needed. Even though nodes permute among clusters for some real networks, the averaging used to calculate L and C for a real network in effect assigns to clusters distinct nodes of equal average capacities.
Researchers' calculations have enabled them to estimate the path length for real networks, such as, for example, a human brain (2.49) [13], the nervous system of the worm C. elegans (2.65) [14], and the English lexicon (2.67) [15]. For these examples, $L$ is close to $e$, 2.71828. Perhaps in these examples the conflict between nodal self-interest and the benefit of networking has been efficiently reconciled.

We now estimate the effect of adding nodes to a network. Let $H'_1$ be the rate for a network process for a network of $n_1$ nodes. Let $H'_2$ be the rate for a larger number of nodes $n_2 = n_1 + A$. Assume $L$, $C$ and $m$ do not change as the network grows. Then the increase in $H'_1$ due to $A$ additional nodes is

$$H'_2 - H'_1 = mC\log_L(n_2) - mC\log_L(n_1) = mC\log_L(n_2/n_1) = mC\log_L(1 + A/n_1). \qquad (5)$$

If $A = 1$, Equation (5) represents the difference that the presence or absence of an individual makes to a group. If $n_1$ is small, likely $C$ is closer to 1 and $L$ smaller than for a large group, and an individual makes a larger difference to the entropy of the group. A related issue arises with the estimate proposed in the early 1980s, dubbed Metcalfe's law, that the profitability of a commercial communication network grows with the square of its size. Equation (5) may apply instead [16]. Since the entropy of a large network changes slowly with $n$, much of the commercial benefit of adding customers to a large network likely results from economies of scale. For merging related existing networks, the joint entropy is less than the sum of the component entropies if the processes of the two are not independent, as may be the case, for example, for fixed line and cellular telephone networks.

As an example of entropy dating, suppose that humans' lineal ancestor had one third as many neurons 3 million years ago.
Then $H(\text{early brain})$ would be 14.077, compared to $H(\text{modern brain}) = 14.71$ [17]. The average growth rate in neuronal entropy over 3 million years would be .01478... per million years. At that rate, it would take 995 million years for neuronal entropy to evolve from 1 to 14.71, or from the first connected neurons to $10^{11}$ neurons. This manner of estimation requires that the energy requirements, the energy supply, and the capacity of neurons were on average the same over the whole period of their development, probably unlikely given the number of years involved, though if networked neurons optimized their $L$ and $C$ early in their development, the values of $L$ and $C$ may have changed only slightly over those years.

The estimated 1989 entropy of 350 million English speakers (a social network) [18] is 12 and of an English lexicon (a conceptual network) [19] of 616,000 words [20], 5.93. The entropy of English lexical growth is the product of the two entropies. We now wish to estimate the average basal rate of lexical growth, the rate of lexical growth without the multiplier effect. The estimated 1657 entropy [21] of 5,281,347 English speakers [22] is 9.445. The entropy of the 1657 English lexicon of 200,000 words is 5.431. The product of the average population entropy of 1657 and 1989 times the average lexicon entropy for 1657 and 1989 is $10.72 \times 5.68 = 60.94$. This is the average value of the multiplier for the period from 1657 to 1989. Using this multiplier, the basal lexical growth rate from 1657 to 1989 is about 5.6% per thousand years.

An independent means of checking the 5.6% per thousand year rate involves glottochronology [23]. Glottochronology uses the rate at which two related languages diverge to date their common ancestral language.
In the 1960s, Morris Swadesh determined that after 1,000 years, two related Indo-European languages shared on average 86% of the words on a Basic List he compiled (i.e. a 14% divergence after a thousand years) [24]. The divergence between two related languages after a thousand years, if now adjusted by recent work by Gray and Atkinson [25], is about 11.32% per thousand years [26]. If each of the two daughter languages diverges from the mother language at the same average rate, then the average rate of divergence per daughter language is one half of 11.32% per thousand years, which is 5.66% per thousand years, very close to the 5.6% per thousand years found using the entropies of the English speaking population and English lexicon.

If we assume that the English lexical growth rate is representative of lexical growth rates and that human lexical growth is a stable capacity, we can use the 5.66% per thousand years basal lexical growth rate to estimate when language began. We assume that ancestral societies, consisting of 50 individuals [27] using 100 different call signals, immediately preceded language's beginning, and had modern values for $L$ and $C$ for their society. It would take about 154,000 years for the lexicon to grow from 100 words to the 616,500 words of the OED in 1989 at the rate of 5.66% per thousand years.

In addition to the three problems confronted in growing a language is a fourth problem: choosing, from the menu of concepts and opportunities that a society has stored up in all cluster generations of its language, culture, and economy, which ones best apply to the immediate circumstances. What we regard as individual intelligence may consist to a large extent of learning the conceptual menu created by societies over thousands of years, as seems to be suggested by the multiplicative effect of network entropy.
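The lexical growth arithmetic of the preceding paragraphs can be retraced step by step. This is a sketch under stated assumptions rather than the author's procedure: it assumes the basal rate is the observed continuous growth rate divided by the average multiplier, and that 5.66% per thousand years is applied as a continuous rate. Both assumptions reproduce the paper's figures to within rounding. All variable names are illustrative.

```python
import math

def network_entropy(n, path_length, clustering):
    """Estimate network entropy as C * log_L(n)."""
    return clustering * math.log(n) / math.log(path_length)

# Entropies of English speakers (L = 3.65, C = 0.79) and of the lexicon
# (L = 2.67, C = 0.437), using the node counts given in the text.
pop_1657 = network_entropy(5_281_347, 3.65, 0.79)    # about 9.445
pop_1989 = network_entropy(350e6, 3.65, 0.79)        # about 12.0
lex_1657 = network_entropy(200_000, 2.67, 0.437)     # about 5.431
lex_1989 = network_entropy(616_000, 2.67, 0.437)     # about 5.93

# Average multiplier for 1657 to 1989: product of the two average entropies.
multiplier = ((pop_1657 + pop_1989) / 2) * ((lex_1657 + lex_1989) / 2)

# Observed continuous lexical growth rate per year over the same period.
observed_per_year = math.log(616_000 / 200_000) / (1989 - 1657)

# ASSUMPTION: basal rate = observed rate / multiplier, per thousand years.
basal_per_millennium = 1000 * observed_per_year / multiplier  # about 0.056

# Glottochronology check: Swadesh's 14% scaled by 7037/8700 (notes 25, 26),
# then halved to get the divergence of each daughter language.
adjusted_divergence = 0.14 * 7037 / 8700       # about 0.1132
per_daughter = adjusted_divergence / 2         # about 0.0566

# ASSUMPTION: continuous growth from 100 call signals to 616,500 words.
years_to_grow = 1000 * math.log(616_500 / 100) / per_daughter  # about 154,000
```

Changing the assumed starting lexicon of 100 signals shifts the estimated date only logarithmically, whereas it scales inversely with the basal rate, so the result is far more sensitive to the rate than to the starting size.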
Some concepts and theorems in information theory may be adaptable to the entropy of a network. Being able to calculate entropy may assist in the analysis of economic [28], biological, communication, conceptual, and social networks. If the entropy of a network has these uses, then statistical information about real networks of interest will be helpful.

Notes

1 The Early Modern English Dictionaries Database (EMEDD) at the University of Toronto, www.chass.utoronto.ca/english/emed/#dic at October 15, 1999 had about 200,000 word-entries at 1657. The Oxford English Dictionary (OED) at 1989 had about 616,000 word-entries. Word-counts vary among dictionaries. I assume that lexical criteria are similar for these sources. Because of the recency of historical dictionary projects, lexical growth as a metric of language appears not to have been previously considered.

2 J. R. Flynn, Journal of Educational Measurement, 21 (3), (1984), 283; Psychological Bulletin, 101 (2) 171 (1987); What is Intelligence? Cambridge University Press, 2007.

3 So no node has a preferred role as a transmitter. All n nodes are potential recipients.

4 V. Fromkin and R. Rodman, An Introduction to Language (6th ed.) (Harcourt Brace, New York, 1998), p. 77, 111; A. Radford, M. Atkinson, D. Britain, H. Clahsen, A. Spencer, Linguistics - An Introduction (Cambridge University Press, Cambridge U.K., 1999), p. 88.

5 D. J. Levitin, This is Your Brain on Music - The Science of a Human Obsession (Dutton, New York, 2006).

6 H. Kelsen, Pure Theory of Law (The Lawbook Exchange, Clark, New Jersey, 2005).

7 Zipf discusses the efficiency conflict in language between speakers and hearers. G. K. Zipf, Human Behavior and the Principle of Least Effort (Hafner Publishing Company, New York, 1949, 1972 reprint).

8 D. J. Watts and S. H. Strogatz, Nature (London), 393, 440 (1998).
9 C. E. Shannon and W. Weaver, The Mathematical Theory of Communication (University of Illinois, Chicago, 1949).

10 C. E. Shannon, p. 51; A. Ya. Khinchin, Mathematical Foundations of Information Theory (Dover, New York, 1957), p. 41.

11 On a network's efficiency: V. Latora and M. Marchiori, Phys. Rev. Lett. 87, 198701-2 (2001); R. Ferrer i Cancho and R. V. Solé, PNAS 100: 788 (2003).

12 C. E. Shannon, p. 53.

13 S. R. Achard, R. Salvador, B. Whitcher, J. Suckling, and E. Bullmore, The J. of Neuroscience 26(1), 63 (2006). They found C = .53.

14 D. J. Watts and S. H. Strogatz.

15 R. Ferrer i Cancho and R. V. Solé, Proc. R. Soc. B, 268, 2261 (2001). L = 2.67, C = .437 based on 3/4 of the million different words of the British National Corpus (about 70 million words). A study of the English lexicon based on words in an online thesaurus, likely less representative of English usage, is: A. Motter, A. de Moura, Y. Lai, and P. Dasgupta, Phys. Rev. E 65, 065102(R) (2002). They obtain L = 3.16, C = .53, which would give η = 6.14.

16 A. Odlyzko and B. Tilly of the University of Minnesota, http://www.dtc.umn.edu/~odlyzko/doc/metcalfe.pdf (2005) in A refutation of Metcalfe's Law and a better estimate for the value of networks and network interconnections; B. Briscoe, A. Odlyzko, and B. Tilly, IEEE Spectrum, July 2006, 26. They estimate that the value of a communication network of size n grows like n log(n).

17 Using L and C from S. R. Achard, R. Salvador, B. Whitcher, J. Suckling, and E. Bullmore for η(modern), and for η(earlier brain), and assuming the earlier brain had one third the neurons, where n is the number of neurons. Assuming n = 10^11 neurons, from J. G. Nicholls, A. R. Martin, B. G. Wallace, and P. A. Fuchs, From Neuron to Brain (4th ed.) (Sinauer, Sunderland, Mass., 2001), p. 480.

18 Using L = 3.65, C = .79 for 225,226 actors from Watts and Strogatz.
19 Using L and C for English from Ferrer i Cancho and Solé (2001).

20 Oxford English Dictionary (OED) at 1989.

21 Again using L = 3.65, C = .79 for 225,226 actors from Watts and Strogatz.

22 E. A. Wrigley, R. Schofield & R. D. Lee, The population history of England, 1541-1871: a reconstruction, Cambridge University Press, 1989, Table 7.8, following p. 207, for the year 1656.

23 What is Glottochronology, p. 271, in M. Swadesh, The Origin and Diversification of Language (Aldine-Atherton, Chicago, 1971).

24 M. Swadesh, p. 276.

25 R. D. Gray and Q. D. Atkinson, Nature (London) 426, 435 (2003), estimate Indo-European at 8,700 years ago. Swadesh, 37 years before Gray and Atkinson, estimated Indo-European beginning at least 7,000 years ago (p. 84). I assume Gray and Atkinson's estimate is an improvement on Swadesh's, and so multiply 14% by 7037/8700 to obtain 11.32%.

26 R. D. Gray and Q. D. Atkinson, Nature (London) 426, 435 (2003), estimate Indo-European at 8,700 years ago. Swadesh, 37 years before Gray and Atkinson, estimated Indo-European beginning at least 7,000 years ago (p. 84). I assume Gray and Atkinson's estimate is an improvement on Swadesh's, and so multiply 14% by 7037/8700 to obtain 11.32%.

27 R. Dunbar, Grooming, Gossip and Language (Harvard University Press, Cambridge, Massachusetts, 1997), p. 120-123.

28 If one dollar is a claim on (or a proxy for) one unit of energy, then to maximize the entropy of the economy, the members of society should maximize the efficiency of each dollar used to acquire benefits from society. This requires the economy to permit network adaptation (and therefore, nodal and cluster adaptability) that maintains, for equilibrium states, the equal ratio of one dollar to a unit of benefit.
