You Share, I Share: Network Effects and Economic Incentives in P2P File-Sharing Systems

Mahyar Salek∗, Shahin Shayandeh†, David Kempe‡

∗ Department of Computer Science, University of Southern California, CA 90089-0781, USA. E-mail: salek@usc.edu
† Microsoft Corporation, One Microsoft Way, Redmond, WA, 98052-6399, USA. E-mail: shahins@microsoft.com. Work done while the author was at the University of Southern California.
‡ Department of Computer Science, University of Southern California, CA 90089-0781, USA. E-mail: clkempe@usc.edu. Work supported in part by NSF CAREER Award 0545855 and an ONR Young Investigator Award.

Abstract

We study the interaction between network effects and external incentives on file sharing behavior in Peer-to-Peer (P2P) networks. Many current or envisioned P2P networks reward individuals for sharing files, via financial incentives or social recognition. Peers weigh this reward against the cost of sharing incurred when others download the shared file. As a result, if other nearby nodes share files as well, the cost to an individual node decreases. Such positive network sharing effects can be expected to increase the rate of peers who share files.

In this paper, we formulate a natural model for the network effects of sharing behavior, which we term the "demand model." We prove that the model has desirable diminishing returns properties, meaning that the network benefit of increasing payments decreases when the payments are already high. This result holds quite generally, for submodular objective functions on the part of the network operator. In fact, we show a stronger result: the demand model leads to a "coverage process," meaning that there is a distribution over graphs such that reachability under this distribution exactly captures the joint distribution of nodes which end up sharing. The existence of such distributions has advantages in simulating and estimating the performance of the system. We establish this result via a general theorem characterizing which types of models lead to coverage processes, and also show that all coverage processes possess the desirable submodular properties.

We complement our theoretical results with experiments on several real-world P2P topologies. We compare our model quantitatively against more naïve models ignoring network effects. A main outcome of the experiments is that a good incentive scheme should make the reward dependent on a node's degree in the network.

1 Introduction

Peer-to-Peer (P2P) file sharing systems have become an important platform for the dissemination of files, music, and other content. The basic idea is very simple: individuals make files available for download from their own machine. Other users can search for files they desire and download them from a peer who has made the file available. Naturally, designing systems such that the search and download of files are efficient poses many research challenges, which have received a lot of attention in the literature [2, 22].

A second, and somewhat orthogonal, issue is how to ensure sufficient participation and sharing of files. Unless enough content is provided by individuals, the utility of membership will be very small. If free-riding [9] is too prevalent, the system may exhibit a quick decrease in membership common to public-goods type economic settings [23].

Thus, the P2P system must be designed with incentives in mind to encourage file sharing. These incentives can take the form of monetary payments or redeemable "points" [11], download privileges, or simply recognition. From the system designer's perspective, these payments should be "small," while ensuring enough participation.
On the other hand, from a peer's perspective, the payments need to be weighed against the cost incurred by sharing a file. In this paper, we assume that the content is shared legally and the system is designed with security in mind: hence, the main cost to an individual is the upload bandwidth which will be used whenever another peer downloads a file from this node. Nodes will in general choose to download from nearby peers (in terms of bandwidth or latency). Therefore, as additional nearby peers share the same files, the load will get distributed among more nodes, and the cost to each individual node will decrease. Thus, not only will we expect cascading effects of sharing based on social dynamics [12], but we would also expect these cascading effects to be based on a network structure determined by point-to-point latencies and bandwidths.

Our contribution in this paper is the definition and analysis (both theoretical and experimental) of a natural model for peers' sharing behavior in P2P systems, in the presence of network effects and economic incentives. In our model, we focus only on sharing one file; in practice, the model can be applied separately for each file of interest. The basic premise of the model is that each node has a certain demand for the file. Furthermore, the network determines which percentage of the demand will be met by downloading from each peer sharing the file.¹ The crucial implication of this model is that the more nearby peers are sharing a file, the more evenly the demand will be distributed among them. The upload bandwidth cost is compensated by a payment to the peers who make the file available. Again, our model is agnostic about whether these payments are monetary, recognition, or take other forms.
In our model, the payments can be explicitly based on the network degree of peers, since high-degree nodes presumably serve a key role in propagating sharing behavior.

We argue that this model captures the essential dynamics of P2P systems in which a peer can join the network and download files without sharing; hence, availability of files is not the only incentive for sharing. The FastTrack P2P protocol, used by KaZaA, Grokster, and iMesh, is an example where this assumption holds; hence, our model should be a reasonable approximation for these services in terms of its incentives.

The network operator is interested in maximizing a social welfare function $W$, which grows monotonically as a function of the set of nodes that share the file. This function could be the total number of sharing nodes, the number of nodes with at least one uploading neighbor, or the total download bandwidth available to peers under various natural models of downloading.

After defining this model formally (in Section 2), we prove strong and general diminishing returns properties about it (in Section 3). In particular, we show that whenever $W$ is monotone and submodular, the network's social welfare as a function of the payments offered to the peers is monotone, i.e., increasing payments will always increase social welfare. However, the rate of increase decreases when payments are already high. We call the latter property diminishing returns.

To prove this result, we consider a slightly different model, wherein payments are combined with giving the network operator the ability to "force" some set $S$ of peers to share. By first proving certain local submodularity properties for this modified model, the desired diminishing returns properties are implied by the general result of Mossel and Roch [18].
However, we derive a similar result to [18] for a broad subclass of submodular functions which we call coverage functions. It consists of the functions for which, in the underlying process, the distribution of nodes sharing the file is equivalent to the distribution of nodes reachable from $S$ in an appropriately defined random graph model. We establish this equivalence via a general and non-trivial theorem characterizing all functions that can be obtained by counting reachable nodes under random graph models. As a corollary, our approach provides a much simpler proof of the main result from [18] for coverage processes. Moreover, the fact that the propagation of sharing behavior is a coverage process is useful for the purpose of simulating the process and estimating the parameters of the system, allowing more efficient algorithms for simulations. Finally, our characterization can be of independent interest in the study of submodular set functions.

While the bulk of our paper focuses on a theoretical analysis of the demand model, we complement the theoretical results by an experimental evaluation of our model (in Section 4), using two network topologies derived from real-world data sets [13, 20, 21], and a regular two-dimensional grid topology. We first show that network effects are significant by comparing our demand model with one in which peers are not aware of changes in load due to nearby sharing peers. We then evaluate different payment schemes, in particular regarding their dependence on nodes' degrees. We evaluate these both in terms of the fraction of peers that end up sharing, and the amount paid by the network operator per sharing node.

¹ In practice, we could expect these percentages to correlate strongly with network latency or available bandwidth, but our model is agnostic about the derivation.
1.1 Related Work

There is a large body of work on incentive mechanisms in P2P file-sharing systems. (See [8] for a thorough overview and [27] for a recent generalized analysis framework.) Incentive mechanisms can be classified into three categories: barter-based mechanisms, reputation-based mechanisms, and currency-based mechanisms.

Barter-based methods [1] enforce repeated transactions among peers by matching each peer to only a small subset of the network, hence raising the survival chance for strategies based on reciprocation. This method only works when we have a small and popular set of files. For instance, the BitTorrent protocol [6] is a popular P2P file-sharing protocol using this method.

Reputation-based mechanisms have an excellent track record at facilitating cooperation in very diverse settings, from evolutionary biology to marketplaces like eBay. These systems keep a tally of the contribution of each peer; the past contributions determine which peers obtain more of the system's resources in the future. However, the availability of cheap pseudonyms in P2P systems makes reputation systems vulnerable to Sybil and whitewashing attacks [9], leading to ongoing work on designing sybilproof reputation mechanisms [5]. Moreover, reputation systems may be vulnerable to coordinated gaming strategies due to distributed rating systems [24].

Inspired by markets, a P2P system can also deploy a currency scheme to facilitate resource contributions by rational peers. Generally, peers earn currency by contributing resources to the system, and spend the currency to obtain resources from the system. Karma [25] is one example of this kind. Currency-based systems may also suffer from Sybil and whitewashing attacks, depending on their policies toward newcomers.
If newcomers are endowed with a positive balance, then the system is vulnerable to these attacks; otherwise, there might not be enough incentive for newcomers to join the network. Balance control could also be troublesome, as the system might need to deal with negative balances.

Lai et al. [16] introduced the concept of "private" history vs. "shared" history as a way to combine barter-based and reputation-based mechanisms in the context of an evolutionary prisoner's dilemma. Shared history is a pool that records peers' past behavior and services them according to their reputation.

In [9], file sharing is modeled as a social phenomenon, akin to those discussed by Schelling [23]. Users consider whether or not to contribute files based on the number of other users who contribute. Our model is different in that it explicitly models the costs incurred by contributing nodes, rather than simply positing an intrinsic generosity parameter for each user.

2 Models and Preliminaries

We consider a peer-to-peer network with $n$ servers (or nodes or peers), and focus on the behavior of sharing one particular file. Thus, each peer $v$ may either choose to share the file or to not share it. We also call sharing peers active, and the other ones inactive. The set of all peers who share is denoted by $V^+$.

2.1 The Demand Model

Each peer has a local demand $d_v$ for the file: this demand will originate from individual users on the server $v$ (who themselves might not possess the file or be in a position to make it available). The demand $d_v$ should be served by downloading the file from other servers $u \in V^+$. The quality of the connection between $v$ and $u$ is captured by a matrix $P$: the larger $p_{v,u}$, the larger a fraction of $v$'s demand will be served by $u$ (assuming that $u$ shares the file). Specifically, the demand that $u \in V^+$ will see from $v$ is

$$ d_v \cdot \frac{p_{v,u}}{\sum_{w \in V^+} p_{v,w}}. $$
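The demand split above can be illustrated with a short Python sketch. The function name `demand_seen` and the handling of nodes that have no connection to any sharing peer (they are assumed to send no demand) are illustrative assumptions and not part of the model's definition.

```python
import numpy as np

def demand_seen(d, P, active, u):
    """Demand that sharing node u receives from each node v.

    d      : length-n sequence, d[v] is the local demand of node v
    P      : n x n array, P[v, w] is the connection quality from v to w
    active : iterable of indices of the nodes sharing the file (V+)
    u      : index of a node in `active`
    """
    d = np.asarray(d, dtype=float)
    P = np.asarray(P, dtype=float)
    active = sorted(set(active))
    if u not in active:
        raise ValueError("u must be a sharing node")
    # Denominator of the formula: sum of p_{v,w} over all sharing nodes w.
    totals = P[:, active].sum(axis=1)
    shares = np.zeros(len(d))
    # Assumption: a node with no link to any sharer contributes no demand.
    reachable = totals > 0
    shares[reachable] = d[reachable] * P[reachable, u] / totals[reachable]
    return shares
```

With three nodes of unit demand and all connection qualities equal, a single sharing node sees the full demand of every node, while each of two sharing nodes sees half of it. This is the load-spreading effect the model is built around.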
The matrix $P$ will in practice depend on network latencies or bandwidth, as well as explicit download agreements. It need not be symmetric. For the purpose of the general model, we are agnostic to the derivation of $P$; in Section 4, we will derive $P$ from measured network latencies by positing a latency threshold which individuals are willing to tolerate.

A node $u \in V^+$ sharing the file will incur a cost of $c_u$ per unit of demand that it serves; this cost is the result of using upload bandwidth, machine processing time, or similar resources. To encourage peers to share the file despite this cost, the P2P network administrator offers payments $\pi_u$ to the nodes $u \in V^+$. These payments need not be the same for all nodes, and can be derived from the network structure, e.g., a node's degree.

Different nodes may have different (and unknown) tradeoffs between money and upload bandwidth. We model this fact by assuming that each node $u$ has a tradeoff factor $\lambda_u$, drawn independently and uniformly at random from $[0, 1]$, which captures how many units of bandwidth one unit of money is worth to the node. Thus, the sharing utility of an active node $u \in V^+$ is

$$ U(u) = \lambda_u \pi_u - c_u \sum_v d_v \frac{p_{v,u}}{\sum_{w \in V^+} p_{v,w}}, $$

while the sharing utility of non-sharing nodes is 0. (A non-sharing node does not get paid and incurs no upload costs.) We assume that agents are rational, and thus choose whether to share or not to share so as to maximize their own utility.

2.2 Other Models

As we discussed in Section 1, one of our main contributions is the observation that file sharing behavior should be subject to positive network externalities, i.e., that the presence of other sharing peers makes sharing less costly. To quantify the size of such network effects, we define two alternative models with no or limited effects; we will compare these two models experimentally with the demand model in Section 4.

1. In the No-Network Model, the peers completely ignore other sharing peers. Thus, a node $u$ assumes that if it shares the file, then it will see a fraction $p_{v,u}$ of the demand originating with each node $v$. Hence, the perceived utility of node $u$ when sharing is
$$ U(u) = \lambda_u \pi_u - c_u \cdot \sum_v d_v \, p_{v,u}. $$

2. In the One-Hop Model, the peers are aware of network effects in a very limited way: node $u$ assumes that any node $v$ sharing the file will contribute toward serving both $v$'s and $u$'s demand, but not toward serving the demand of any other node $w \neq u, v$. Thus, the perceived utility of node $u \in V^+$ in the One-Hop Model is
$$ U(u) = \lambda_u \pi_u - c_u \cdot d_u \frac{p_{u,u}}{\sum_{w \in V^+} p_{u,w}} - c_u \cdot \sum_{v \neq u} d_v \frac{p_{v,u}}{\sum_{w \in V^+ \cap \{u,v\}} p_{v,w}}. $$

2.3 Payment Schemes, Sharing Process, and Administrator's Objective

The network administrator's choice is how to set the payment offers $\pi_u$. In doing so, the administrator balances two competing goals: low overall payments and high utility for the participants in the system. In this paper, we study the impact of payment schemes on these objectives.

In order to provide enough incentives for sharing, the network administrator should always ensure that $\pi_u \geq C_u := c_u \cdot \sum_v d_v$. Otherwise, even a node $u$ with $\lambda_u = 1$ (i.e., the highest possible utility for money) would have no incentive to share the file if no other peers are sharing the file.

The full model is thus as follows: after the administrator decides on the payments $\pi_u$ for all nodes $u$, the random tradeoffs $\lambda_u$ between money and bandwidth are determined independently for all nodes $u$. Subsequently, the process proceeds in iterations. In each iteration, all peers simultaneously decide whether to share the file or not, based on the payments, costs, and previous decisions of all other peers. The process continues until an equilibrium is reached.
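The iteration just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the authors' simulation code: the name `sharing_process` is hypothetical, ties (utility exactly 0) are broken in favor of sharing, nodes with no connection to any sharer send no demand, and a node that has started sharing is kept active in later rounds.

```python
import numpy as np

def sharing_process(d, P, c, pi, lam):
    """Run the synchronous sharing process of the demand model.

    d, c, pi, lam : length-n sequences (demands, unit costs, payments,
                    money/bandwidth tradeoffs lambda_u)
    P             : n x n array of connection qualities p_{v,w}
    Returns the set of nodes sharing at equilibrium.
    """
    d, c, pi, lam = (np.asarray(x, dtype=float) for x in (d, c, pi, lam))
    P = np.asarray(P, dtype=float)
    n = len(d)
    active = np.zeros(n, dtype=bool)
    while True:
        new_active = active.copy()
        for u in range(n):
            if active[u]:
                continue  # assumption: sharing nodes stay active
            # Load u would see if it joined the previous round's sharers.
            trial = active.copy()
            trial[u] = True
            totals = P[:, trial].sum(axis=1)
            ok = totals > 0
            load = np.sum(d[ok] * P[ok, u] / totals[ok])
            # Utility U(u) = lambda_u * pi_u - c_u * load; share if >= 0.
            if lam[u] * pi[u] - c[u] * load >= 0:
                new_active[u] = True
        if np.array_equal(new_active, active):
            return {int(i) for i in np.flatnonzero(active)}
        active = new_active
```

For example, with three nodes, unit demands and costs, all connection qualities equal to 1, payments $\pi_u = C_u = 3$, and tradeoffs $\lambda = (1, 0.6, 0.4)$, only node 0 shares in the first round; its presence halves the load node 1 would face, so node 1 joins next, and node 2 joins after that. If node 0's tradeoff is lowered to 0.9, no node ever starts sharing.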
Notice that because the cost to a peer is monotone decreasing in the set $V^+$ of currently sharing peers, the set of sharing peers can only become larger from iteration to iteration. In particular, this implies that the process will eventually terminate with some set $V^+$ of active peers. We call this the sharing process or activation process.

The network administrator is in general interested in increasing access to the file while keeping the payments low. This general objective may be captured using various metrics. In general, we allow for any overall social welfare function $W$ which increases monotonically in the set $S$ of sharing nodes. Notice that since the set $S$ itself is the result of a random process, the administrator's goal will be to maximize $E[W(S)]$, where $S$ is derived from the random activation process in the demand model. Several social welfare functions $W$ suggest themselves naturally:

1. The number of active peers is a natural measure of participation. It is the measure frequently studied in the context of the diffusion of innovations or behaviors in social networks [10, 12, 14, 15, 17, 18]. While the objective is similar, the precise dynamics are different between those models and the demand model.

2. The total number of serviced nodes, i.e., nodes $v$ with at least one active node $u$ with $p_{u,v} > 0$. This model is appropriate if we only care about how many peers can download the file, but not about the quality of the connection. It implicitly assumes that each peer has a constant utility of 1 for downloading.

3. Each node $u$ gets a utility of $\sum_{v \in V^+} p_{u,v}$, and the social welfare is the sum of all these utilities. This model is based on the assumption that $u$'s demand is served by all of its neighbors (including possibly $u$) simultaneously, and that $u$'s utility is the total "download bandwidth" available in this sense. We call this the sum-welfare function.

4. Each node $u$ gets a utility of $\max_{v \in V^+} p_{u,v}$, and the social welfare is the sum of all these utilities. This is based on the assumption that $u$'s demand is served by its active neighbor with the best connection, corresponding to a situation where parallel download from multiple sources is not possible. We call this the max-welfare function.

Notice that the social welfare function $W$ may also include the utilities of the sharing nodes.

3 Theoretical Analysis of the Model

The main analytical contribution of this paper is based on coverage processes², defined formally in Definition 5. Informally, a coverage process is a random process such that the distribution over sets of ultimately active nodes is also the distribution of reachable nodes under a suitably chosen distribution of random graphs. Our results on coverage processes are twofold: (1) We give a general characterization of coverage processes, and show that the activation process for P2P systems is a coverage process. (2) We give a significantly simplified proof (compared to the general result of [18]) showing that under coverage processes, the expected social welfare as a function of the payments has diminishing returns in the sense of Definition 1 so long as the social welfare is a submodular function of the active nodes.

Recall that a function $f$ defined on sets is submodular if $f(S + v) - f(S) \geq f(T + v) - f(T)$ whenever $S \subseteq T$, i.e., if the addition of an element to a larger set causes a smaller increase in the function value than to a smaller set.
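As a small illustration of these definitions, the following Python sketch implements the sum-welfare and max-welfare functions and checks the submodularity inequality by brute force over all pairs $S \subseteq T$. The function names are hypothetical, the empty set is assumed to have welfare 0, and the exhaustive check is only practical for a handful of nodes.

```python
from itertools import combinations
import numpy as np

def sum_welfare(P, S):
    """Sum over all nodes u of sum_{v in S} p_{u,v}; 0 for the empty set."""
    S = list(S)
    return float(P[:, S].sum()) if S else 0.0

def max_welfare(P, S):
    """Sum over all nodes u of max_{v in S} p_{u,v}; 0 for the empty set."""
    S = list(S)
    return float(P[:, S].max(axis=1).sum()) if S else 0.0

def is_submodular(f, n, tol=1e-12):
    """Check f(S+v) - f(S) >= f(T+v) - f(T) for all S subset of T, v not in T."""
    nodes = range(n)
    subsets = [frozenset(c) for r in range(n + 1)
               for c in combinations(nodes, r)]
    for S in subsets:
        for T in subsets:
            if not S <= T:
                continue
            for v in nodes:
                if v in T:
                    continue
                gain_small = f(S | {v}) - f(S)
                gain_large = f(T | {v}) - f(T)
                if gain_small < gain_large - tol:
                    return False
    return True
```

For any non-negative matrix $P$, both welfare functions pass this check (the sum-welfare function is in fact additive), whereas a function such as $f(S) = |S|^2$ fails it.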
Submodularity is thus the discrete analogue of concavity, and intuitively corresponds to "diminishing returns." An easy inductive proof (on the size of $X$) shows that submodularity is equivalent to the condition that for all sets $X$,

$$ f(S \cup X) - f(S) \geq f(T \cup X) - f(T) \quad \text{whenever } S \subseteq T. \tag{1} $$

² We thank Bobby Kleinberg for this naming suggestion, and also note here that Theorem 8 was derived independently by him.

Definition 1 A function $g : \mathbb{R}^n \to \mathbb{R}$ has diminishing returns if for every pair $i, j$ and all vectors $x$, it satisfies

$$ \frac{\partial^2 g(x_1, x_2, \ldots, x_n)}{\partial x_i \, \partial x_j} \leq 0. $$

Remark 2 The notion of "diminishing returns" is strictly weaker than concavity; it corresponds to concavity only along positive coordinate axes.³

The two main contributions of our paper together imply the following theorem as a corollary:

Theorem 3 Let $\mathcal{W}(\pi_1, \ldots, \pi_n) = E[W(S)]$ be the expected social welfare when set $S$ is obtained from the sharing process of the demand model with payments $\pi_1, \ldots, \pi_n$. If $W(S)$ is submodular, then $\mathcal{W}(\pi_1, \ldots, \pi_n)$ is monotone and has diminishing returns with respect to the payments $\pi_1, \ldots, \pi_n$.

For the social welfare function, the diminishing returns property intuitively means that the additional benefit in social welfare that can be derived from increasing the payment to a peer $u$ decreases as the peers' current payments increase.

The proof of Theorem 3 is based on analyzing the following Seed Set Model, which we define mainly for the purpose of analysis.

Definition 4 (Seed Set Model) For each node, the payment offered is $\pi_u = C_u$. Besides payments, we have a seed set $S$ of peers that will always share regardless of the payments. Subsequently, the process unfolds exactly according to the sharing process.
The main technical step is to show that the Seed Set Model is a coverage process, in the following sense.

Definition 5 (Coverage Process) Let $\varphi(S)$ be the random variable describing the set of nodes active at the end of a process starting from the set $S$ of nodes active. The process is called a coverage process if there exists a distribution $D$ over graphs $G$ such that for each set $T$ of nodes, $\mathrm{Prob}[\varphi(S) = T]$ equals the probability that exactly $T$ is reachable starting from $S$ in $G$ if $G$ is drawn from the distribution $D$.

Remark 6 Without using our nomenclature, [14] showed submodularity for the Cascade and Threshold models of innovation diffusion [10, 12] by establishing that both gave rise to coverage processes. Subsequently, [15] showed that there are natural diffusion processes which are not coverage processes, yet have a submodular function $E[|\varphi(S)|]$.

We prove that the Seed Set Model is a coverage process in two steps. First, in Section 3.1, we give a general and complete characterization of coverage processes. This characterization may be of interest in its own right, as coverage processes have a practical advantage: they can be simulated easily and efficiently, by first generating a random graph according to $D$, and then simply finding the set of reachable nodes. Then, in Section 3.2, we show that the Seed Set Process satisfies the conditions established in Section 3.1. Finally, in Section 3.3, we give a simple proof that for any coverage process and any submodular social welfare function, the expected social welfare under the process is also submodular. This implies diminishing returns with respect to the payments.

Remark 7 The fact that the tradeoffs $\lambda_u$ between money and bandwidth are uniformly random in $[0, 1]$ is important to ensure the submodularity and diminishing returns properties.
If the $\lambda_u$ are not random but fixed, then the diminishing returns and submodularity properties cease to hold. Furthermore, in the Seed Set Model, the optimization problem of finding the best seed set $S$ of at most $k$ nodes becomes very hard, as we show in the appendix.

³ We thank Shaddin Dughmi for pointing out this ambiguity in an earlier version of the paper.

3.1 Characterization of Coverage Processes

In this section, we characterize exactly which random processes are coverage processes. This theorem may be of interest in its own right, when analyzing different processes.

Our setting is exactly as in the paper by Mossel and Roch [18]: each node $u$ has an activation function $f^u$, which is monotone non-decreasing and satisfies $f^u(\emptyset) = 0$. Each node independently chooses a threshold $\theta_u \in [0, 1]$ uniformly at random, and becomes active when $f^u(S) \geq \theta_u$, where $S$ is the previously active set of nodes. This process is repeated until no more changes occur.

In order to express our results concisely, we use the following discrete equivalent of a derivative (see, e.g., [26]). For a function $f$ defined on sets, we define inductively:

$$ f_\emptyset(S) = f(S), \qquad f_{R \cup \{v\}}(S) = f_R(S \cup \{v\}) - f_R(S). $$

It is not difficult to verify that this notion is well-defined, i.e., independent of which element $v$ is chosen at which stage.

Theorem 8 The following conditions are necessary and sufficient for the process to be a coverage process. (Here, $\overline{T}$ denotes the complement of $T$ in the set of all nodes.)

• For all sets $T$ of odd cardinality $|T|$, as well as for $T = \emptyset$, and each node $u$, we have $f^u_T(\overline{T}) \geq 0$.
• For all sets $T$ of positive even cardinality $|T|$, and each node $u$, we have $f^u_T(\overline{T}) \leq 0$.
• $f^u(\emptyset) = 0$ for all $u$.

To prove this theorem, we begin with the following reasoning. Focus on one node $u$, and its activation function $f^u$.
If there were an equivalent graph distribution $D$, then it would have to define a probability $q_u(T)$ for the presence of edges from exactly the vertex set $T$ to $u$. These probabilities need to satisfy the following property: if a set $S$ of nodes is active, then the probability of $u$ having at least one incoming edge from $S$ must equal $f^u(S)$. Thus, a necessary and sufficient condition for being a coverage function is that for each node $u$, there exists a distribution $q_u(T)$ over sets $T$ such that

$$ f^u(S) = \sum_{T : T \cap S \neq \emptyset} q_u(T). \tag{2} $$

We can express this requirement more compactly using matrix notation. Let $f^u$ be the $(2^n - 1)$-dimensional vector consisting of all entries of $f^u(S)$ for $S \neq \emptyset$. Similarly, let $q_u$ be the $(2^n - 1)$-dimensional vector of all $q_u(S)$ for $S \neq \emptyset$. Let $A$ be the $((2^n - 1) \times (2^n - 1))$-dimensional matrix indexed by non-empty subsets such that $A_{S,T} = 1$ if and only if $S \cap T \neq \emptyset$, and $A_{S,T} = 0$ otherwise. ($A$ is called an incidence matrix [4].) Then, Equation (2) can be rewritten as the requirement that for each node $u$, there exists a distribution $q_u$ such that $A \cdot q_u = f^u$.

For the analysis, we fix a canonical ordering of subsets. Specifically, if the current (sub-)universe consists of $k$ nodes indexed $\{1, 2, \ldots, k\}$, their canonical ordering is defined recursively as first containing all subsets of $\{1, 2, \ldots, k-1\}$ in canonical order, then the set $\{k\}$, followed by the sets $T \cup \{k\}$, where the sets $T \subseteq \{1, 2, \ldots, k-1\}$ appear in canonical order.

In order to find out when the distribution $q_u$ exists, we want to solve the equation $A \cdot q_u = f^u$, or $q_u = A^{-1} \cdot f^u$. While the inverses of some incidence matrices have been studied before (see, e.g., [3]), we are not aware of any source explicitly giving the inverse of the matrix $A$.
Hence, we establish here:

Lemma 9 The inverse of $A$ is the matrix $B$ defined by

$$ b_{S,T} := \begin{cases} 0 & \text{if } S \cup T \neq \{1, \ldots, n\}, \\ (-1)^{|S \cap T| + 1} & \text{otherwise.} \end{cases} $$

Proof. The key insight is that under the canonical ordering of sets defined above, the matrices $A$ and $B$ can be defined recursively via matrices $A_k$ and $B_k$. We use $\mathbf{0}$ to denote the $(2^k - 1) \times (2^k - 1)$ matrix of all zeroes (or the all-zeroes vector, where a vector is required), $\mathbf{1}$ for the (row) vector of all ones, and $\hat{u}$ for the $(2^k - 1)$-dimensional unit (row) vector with 1 in its last coordinate and 0 everywhere else. Let $A_1 = 1$, and

$$ A_{k+1} = \begin{pmatrix} A_k & \mathbf{0}^T & A_k \\ \mathbf{0} & 1 & \mathbf{1} \\ A_k & \mathbf{1}^T & \mathbf{1}^T \mathbf{1} \end{pmatrix}, $$

where the middle row and column correspond to the set $\{k+1\}$, and $\mathbf{1}^T \mathbf{1}$ is the all-ones block. Similarly, let $B_1 = 1$, and

$$ B_{k+1} = \begin{pmatrix} \mathbf{0} & -\hat{u}^T & B_k \\ -\hat{u} & 0 & \hat{u} \\ B_k & \hat{u}^T & -B_k \end{pmatrix}. $$

The fact that $A = A_n$ and $B = B_n$ can be observed directly from the definition and the canonical ordering.

To prove the lemma, we show by induction on $k$ that $A_k \cdot B_k = I_k$ for all $k$, where $I_k$ is the $(2^k - 1) \times (2^k - 1)$ identity matrix. The base case $k = 1$ is obvious. For the inductive step to $k + 1$, consider the $(i, j)$ entry $(A_{k+1} \cdot B_{k+1})_{i,j}$. We distinguish 7 different cases, based on the $(i, j)$ indices.

1. If $i, j < 2^k$, then the entry is $(A_k \cdot \mathbf{0})_{i,j} + 0 + (A_k \cdot B_k)_{i,j} = (I_k)_{i,j}$ by induction hypothesis.

2. If $i > 2^k$, $j < 2^k$, then (writing $i' = i - 2^k$), the entry is $(A_k \cdot \mathbf{0})_{i',j} - \hat{u}_j + (\mathbf{1} \cdot B_k)_j = 0$ using Lemma 10(a) below.

3. If $i < 2^k$, $j > 2^k$, then (writing $j' = j - 2^k$), the entry is $(A_k \cdot B_k)_{i,j'} + 0 - (A_k \cdot B_k)_{i,j'} = 0$.

4. If $i, j > 2^k$, then (writing $i' = i - 2^k$, $j' = j - 2^k$), the entry is $(A_k \cdot B_k)_{i',j'} + \hat{u}_{j'} - (\mathbf{1} \cdot B_k)_{j'} = (I_k)_{i',j'}$, again using Lemma 10(a).

5. If $i = j = 2^k$, a straightforward calculation shows that the entry is 1.

6. If $i = 2^k$, $j < 2^k$, then the entry is $(\mathbf{1} \cdot B_k)_j - \hat{u}_j = 0$ by Lemma 10(a). Similarly, for $i = 2^k$, $j > 2^k$, writing $j' = j - 2^k$, the entry is $\hat{u}_{j'} - (\mathbf{1} \cdot B_k)_{j'} = 0$ by Lemma 10(a).

7. Finally, for $j = 2^k$, $i < 2^k$, the entry is $-(A_k \cdot \hat{u}^T)_i + (A_k \cdot \hat{u}^T)_i = 0$, whereas for $j = 2^k$, $i > 2^k$, writing $i' = i - 2^k$, the entry is $-(A_k \cdot \hat{u}^T)_{i'} + 1 = 0$ by Lemma 10(b).

This proves that $A_{k+1} \cdot B_{k+1} = I_{k+1}$.

Lemma 10 Let $\mathbf{1}$ be the vector of all 1's, and $\hat{u}$ defined as in the proof of Lemma 9. Then, (a) $\mathbf{1} \cdot B_k = \hat{u}$, and (b) $A_k \cdot \hat{u}^T = \mathbf{1}^T$.

Proof. For part (a), we show that all column sums of $B_k$ are zero, except for the last column, whose sum is one. (Since $B_k$ is symmetric, the same holds for the row sums.) The proof is by induction. The base case $B_1 = 1$ is clear. For the inductive step from $k$ to $k + 1$, first notice that the columns $j < 2^k - 1$ sum to zero by induction hypothesis. For column $2^k - 1$, the column sum of $B_k$ contributes 1 by induction hypothesis, from which 1 is subtracted because of the entry in the middle row. Column $2^k$ adds up to 0 explicitly, and columns $j = 2^k + 1, \ldots, 2^{k+1} - 2$ have terms of $B_k$ and $-B_k$ canceling out. Finally, for the last column, the entries of $B_k$ and $-B_k$ cancel out, leaving the entry 1 from the middle row.

For part (b), simply notice that using part (a) and the induction hypothesis of Lemma 9 (for $k$), we get that

$$ A_k \cdot \hat{u}^T = A_k \cdot B_k^T \cdot \mathbf{1}^T = I_k \cdot \mathbf{1}^T = \mathbf{1}^T. $$

Here, we used that $B_k$ is symmetric.

The next lemma shows that so long as all $q_u(S)$ are non-negative, by setting $q_u(\emptyset)$ appropriately, we can always obtain a probability distribution.

Lemma 11 With $q_u(S)$ defined as $q_u = B \cdot f^u$, we have $\sum_S q_u(S) \leq 1$.

Proof. Let $\mathbf{1}$ denote the all-ones vector as before. We can rewrite

$$ \sum_S q_u(S) = \mathbf{1} \cdot (B \cdot f^u) = (\mathbf{1} \cdot B) \cdot f^u. $$

Using Lemma 10(a), the sum is exactly equal to $f^u(\{1, \ldots, n\}) \leq 1$, completing the proof.
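Lemmas 9 to 11 can be sanity-checked numerically for small $n$. The Python sketch below builds the incidence matrix $A$ and the claimed inverse $B$ directly from their definitions. The helper names are hypothetical, nodes are numbered from 0, and subsets are ordered by size rather than in the canonical order used in the proof; this is harmless because the identity $A \cdot B = I$ does not depend on the ordering as long as rows and columns use the same one, and the full set is still last.

```python
from itertools import combinations
import numpy as np

def nonempty_subsets(n):
    """All non-empty subsets of {0, ..., n-1}, ordered by size."""
    return [frozenset(c) for r in range(1, n + 1)
            for c in combinations(range(n), r)]

def incidence_matrix(subsets):
    """A[S, T] = 1 if S and T intersect, 0 otherwise."""
    return np.array([[1 if S & T else 0 for T in subsets] for S in subsets])

def claimed_inverse(subsets, n):
    """B[S, T] = (-1)^(|S & T| + 1) if S | T is the full set, else 0."""
    full = frozenset(range(n))
    return np.array([[(-1) ** (len(S & T) + 1) if (S | T) == full else 0
                      for T in subsets] for S in subsets])

def edge_set_distribution(f_vec, subsets, n):
    """Recover q_u = B . f^u from the vector of activation values f^u(S)."""
    return claimed_inverse(subsets, n) @ np.asarray(f_vec, dtype=float)
```

A typical use is to pick a distribution $q$ over non-empty edge sets, form $f = A \cdot q$ as in Equation (2), and confirm that `edge_set_distribution` returns $q$ again and that the entries of $q$ sum to the last entry of $f$, i.e., $f(\{0, \ldots, n-1\})$, as Lemma 11 states.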
By Lemma 9, we know that $q_u = B \cdot f_u$. And by Lemma 11, the entries sum up to at most 1. Thus, it remains to show that the entries of $q_u$ are non-negative if and only if $f_u$ satisfies the conditions of Theorem 8. To relate these formulations, we prove the following non-recursive characterization of discrete derivatives.

Lemma 12 For all sets $T$, we have that
\[ f_T(W) = \sum_{S \subseteq T} (-1)^{|T| - |S|} f(W \cup S). \]

Proof. The proof is by induction on $|T|$. For $T = \emptyset$, the claim is trivial. Now, consider a set $T_{k+1} = T_k \cup \{t\}$ of size $k + 1$. By definition of the discrete derivative and induction hypothesis,
\begin{align*}
f_{T_{k+1}}(W) &= f_{T_k}(W \cup \{t\}) - f_{T_k}(W) \\
&= \sum_{S \subseteq T_k} (-1)^{k - |S|} f(W \cup S \cup \{t\}) - \sum_{S \subseteq T_k} (-1)^{k - |S|} f(W \cup S) \\
&= \sum_{S \subseteq T_{k+1} : t \in S} (-1)^{k + 1 - |S|} f(W \cup S) + \sum_{S \subseteq T_{k+1} : t \notin S} (-1)^{k + 1 - |S|} f(W \cup S) \\
&= \sum_{S \subseteq T_{k+1}} (-1)^{k + 1 - |S|} f(W \cup S),
\end{align*}
which completes the inductive proof.

Proof of Theorem 8. Fix any node $u$, and define $q_u = B \cdot f_u$. Let $\overline{T}$ denote the complement of $T$. By Lemma 12, we can write the discrete derivative of $f^u$ at $\overline{T}$ as
\[ f^u_T(\overline{T}) = \sum_{S \subseteq T} (-1)^{|T| - |S|} f^u(\overline{T} \cup S). \]
Now, if $|T|$ is odd, then $(-1)^{|T| - |S|} = (-1)^{|S| + 1}$, so we can rewrite the above as
\[ \sum_{S \subseteq T} (-1)^{|S| + 1} f^u(\overline{T} \cup S) = \sum_{W \supseteq \overline{T}} (-1)^{|W \cap T| + 1} f^u(W) = q_u(T). \]
Similarly, if $|T|$ is even, then $(-1)^{|T| - |S|} = (-1)^{|S|}$, so we can rewrite the discrete derivative as
\[ \sum_{S \subseteq T} (-1)^{|S|} f^u(\overline{T} \cup S) = \sum_{W \supseteq \overline{T}} (-1)^{|W \cap T|} f^u(W) = -q_u(T). \]
Thus, the $q_u(T)$ are all non-negative (and the probability distribution thus well-defined) if and only if $f^u_T(\overline{T}) \geq 0$ for $|T|$ odd, and $f^u_T(\overline{T}) \leq 0$ for $|T| > 0$ even.

3.2 Coverage Property of the Seed Set Process

In this section, we establish the following theorem.

Theorem 13 The Seed Set Process is a coverage process.

Proof. In order to prove this theorem, we want to apply Theorem 8.
To do so, we need to show that the local decisions of nodes about sharing can be cast in terms of submodular threshold functions. Specifically, we define
\[ f_u(S) := 1 - \frac{1}{C_u} \cdot c_u \cdot \sum_v d_v \frac{p_{v,u}}{\sum_{w \in S \cup \{u\}} p_{v,w}} \]
and let $\theta_u = 1 - \frac{\lambda_u \pi_u}{C_u}$. (Recall from Section 2.3 that $C_u = c_u \cdot \sum_v d_v$.) A node $u$ becomes active if doing so has positive utility, i.e., if
\[ \lambda_u \pi_u > c_u \cdot \sum_v d_v \frac{p_{v,u}}{\sum_{w \in S \cup \{u\}} p_{v,w}}. \]
Dividing both sides by $C_u$, and subtracting from 1 shows that this is equivalent to saying that
\[ 1 - \frac{\lambda_u \pi_u}{C_u} < 1 - \frac{1}{C_u} \cdot c_u \cdot \sum_v d_v \frac{p_{v,u}}{\sum_{w \in S \cup \{u\}} p_{v,w}}. \]
Since $\lambda_u \pi_u$ is uniformly random in $[0, C_u]$ by the definition of $\pi_u$ in the Seed Set Model, this condition is equivalent to saying that $\theta_u < f_u(S)$. Thus, we have shown that the activation process can be equivalently recast in terms of threshold activation functions.

Finally, we need to show that for every node $u$, all derivatives $f^u_T(S)$ are non-negative when $|T|$ is odd and non-positive when $|T| > 0$ is even. (The fact that $f_u(S) = f^u_\emptyset(S)$ is non-negative follows directly by definition.) Let
\[ \hat{f}_u(x_1, \ldots, x_n) = 1 - \frac{1}{C_u} \cdot c_u \cdot \sum_v d_v \frac{p_{v,u}}{\sum_{v_i \in V} p_{v,v_i} x_i} \]
be the continuous equivalent of the local influence function $f_u$. For a set $S$, let $y^{(S)}$ denote the $n$-dimensional vector with $y^{(S)}_i = 1$ if $v_i \in S \cup \{u\}$ and $y^{(S)}_i = 0$ otherwise. Then, $f_u(S) = \hat{f}_u(y^{(S)})$. Notice that by definition, there is no division by zero. Writing $dY_T = dy_{i_1} \, dy_{i_2} \cdots dy_{i_{|T|}}$, where $T = \{i_1, i_2, \ldots, i_{|T|}\}$, an easy inductive proof first shows that
\[ f^u_T(S) = \underbrace{\int_0^1 \cdots \int_0^1}_{|T|} \frac{d \hat{f}_u(y^{(S)})}{dY_T} \, dY_T. \]
It remains to show that each term inside the integration is non-negative for odd $|T|$ and non-positive for even $|T|$. We accomplish this by showing that
\[ \frac{d \hat{f}_u(y^{(S)})}{dY_T} = (-1)^{|T| + 1} \, |T|! \, \frac{c_u}{C_u} \sum_v d_v \frac{p_{v,u} \prod_{t \in T} p_{v,t}}{\left( \sum_{v_i \in V} p_{v,v_i} y^{(S)}_i \right)^{|T| + 1}}. \]
The proof is by induction on $|T|$. The base case $|T| = 1$ can be verified easily. Assume that the claim holds for the set $T$, and differentiate once more with respect to $y_i$ for some $v_i \notin T$. We have
\begin{align*}
\frac{d \hat{f}_u(y^{(S)})}{dY_T \, dy_i}
&= \frac{d}{dy_i} \left[ (-1)^{|T| + 1} \, |T|! \, \frac{c_u}{C_u} \sum_v d_v \frac{p_{v,u} \prod_{t \in T} p_{v,t}}{\left( \sum_{v_i \in V} p_{v,v_i} y_i \right)^{|T| + 1}} \right] \\
&= (-1)(-1)^{|T| + 1} \, |T|! \, \frac{c_u}{C_u} \cdot \sum_v \frac{(|T| + 1) \, p_{v,v_i} \, d_v \, p_{v,u} \prod_{t \in T} p_{v,t} \left( \sum_{v_i \in V} p_{v,v_i} y_i \right)^{|T|}}{\left( \sum_{v_i \in V} p_{v,v_i} y_i \right)^{2|T| + 2}} \\
&= (-1)^{|T| + 2} \, (|T| + 1)! \, \frac{c_u}{C_u} \sum_v d_v \frac{p_{v,u} \prod_{t \in T \cup \{v_i\}} p_{v,t}}{\left( \sum_{v_i \in V} p_{v,v_i} y_i \right)^{|T| + 2}}.
\end{align*}
This completes the inductive proof, and thus the proof of Theorem 13.

While we defined the Seed Set Process primarily as a tool for analysis, we remark here that Theorem 13 has a direct consequence for the optimization problem of maximizing the expected total number of active nodes at the end of the process, subject to a size constraint on the seed set $S$. A theorem of Nemhauser et al. [7, 19] states that if $f$ is any non-negative, monotone, and submodular function on sets, then the greedy algorithm is a polynomial-time $(1 - 1/e)$-approximation for maximizing $f$ subject to such a size constraint (where $e$ is the base of the natural logarithm). Since we can approximate the expected number of active nodes under the Seed Set Process arbitrarily closely by simulating the activation process (see [14] for an in-depth discussion of the greedy algorithm), we obtain the following corollary:

Corollary 14 The best starting set $S$ for the Seed Set Process can be approximated within $(1 - 1/e - \epsilon)$ in polynomial time, for any $\epsilon > 0$.

3.3 Diminishing Returns of Expected Social Welfare

Finally, we use the machinery of coverage processes to show diminishing returns of social welfare. Consider an arbitrary coverage process.
When the coverage process starts with the set $T$, let $\phi(T)$ be a random variable describing the set of nodes active at the end of the process. Thus, the distribution of $\phi(T)$ for all $T$ precisely characterizes the coverage process. Our main theorem is now the following:

Theorem 15 Let $h(S)$ be any monotone submodular function of $S$. Then, $E[h(\phi(T))]$ is a monotone submodular function of $T$, where the expectation is taken over the randomness in $\phi(T)$.

This theorem follows from the general result of [18], since all coverage processes are locally submodular, and our utility function is submodular with respect to the set of sharing neighbors. However, below we give a very simple proof based on reachability in graphs, using the fact that $\phi$ is a coverage process. This is useful for the purpose of simulating the process and estimating $\phi$. It means that instead of generating random thresholds and simulating a dynamic process, we can generate a random graph and then simply use BFS to find the number of reachable nodes.

Proof. Because $\phi$ is a coverage process, by Theorem 8, there is a distribution $\Pr[\cdot]$ over graphs $H$ such that for any set $T$, the set of nodes reachable in $H$ from $T$ has the same distribution as $\phi(T)$. Let $\phi_H(T)$ denote the set of nodes reachable from $T$ in $H$. Then,
\[ E[h(\phi(T))] = \sum_H \Pr[H] \cdot h(\phi_H(T)). \]
Fix some graph $H$ and let $S \subseteq T$ and $x \notin T$. Then,
\begin{align*}
h(\phi_H(T + x)) - h(\phi_H(T)) &= h(\phi_H(T) \cup \phi_H(\{x\})) - h(\phi_H(T)) \\
&\leq h(\phi_H(S) \cup \phi_H(\{x\})) - h(\phi_H(S)) \\
&= h(\phi_H(S + x)) - h(\phi_H(S)),
\end{align*}
where the inequality followed from Inequality (1), and the equalities from the definitions of reachability in a graph. Thus, for any fixed graph $H$, the function $h(\phi_H(T))$ is monotone and submodular in $T$.
Because the $\Pr[H]$ are probabilities, $E[h(\phi(T))]$ is a non-negative linear combination of monotone submodular functions, and thus also monotone and submodular.

The final piece of the proof of Theorem 3 is the following lemma, showing that monotonicity and submodularity of the Seed Set Model imply diminishing returns for the original model.

Lemma 16 Let $f$ be a non-negative, monotone, submodular function on sets. Consider the function $g$ defined as follows: Each element $u$ is included in $S$ independently with probability $q_u(\pi_u)$, where $q_u$ is an increasing and concave function of $\pi_u$. Define $g(\pi) = E[f(S)]$. Then, $g$ is monotone and satisfies the diminishing returns property as defined in Definition 1.

Proof. First, notice that
\[ g(\pi) = \sum_{S' \subseteq V} f(S') \prod_{u \in S'} q_u(\pi_u) \prod_{u \notin S'} (1 - q_u(\pi_u)). \]
In order to show the diminishing returns property, it is enough to show that $\frac{\partial g(\pi)}{\partial \pi_i} \geq 0$ and $\frac{\partial^2 g(\pi)}{\partial \pi_i \partial \pi_j} \leq 0$ for all $i, j \in V$. Using the definition of $g$, we have:
\begin{align*}
\frac{\partial g(\pi)}{\partial \pi_i}
&= \sum_{S \subseteq V, i \in S} f(S) \cdot \frac{d q_i(\pi_i)}{d \pi_i} \cdot \prod_{u \in S, u \neq i} q_u(\pi_u) \cdot \prod_{u \notin S} (1 - q_u(\pi_u)) \\
&\quad - \sum_{S \subseteq V, i \notin S} f(S) \cdot \frac{d q_i(\pi_i)}{d \pi_i} \cdot \prod_{u \in S} q_u(\pi_u) \cdot \prod_{u \notin S, u \neq i} (1 - q_u(\pi_u)) \\
&= \sum_{S \subseteq V, i \in S} (f(S) - f(S - i)) \cdot \frac{d q_i(\pi_i)}{d \pi_i} \cdot \prod_{u \in S, u \neq i} q_u(\pi_u) \cdot \prod_{u \notin S} (1 - q_u(\pi_u)) \\
&\geq 0.
\end{align*}
The last inequality holds because $\frac{d q_i(\pi_i)}{d \pi_i} \geq 0$ and $f$ is monotone. Next we need to show that $\frac{\partial^2 g(\pi)}{\partial \pi_i \partial \pi_j} \leq 0$ for all $i, j \in V$. For $i = j$, a calculation similar to the one above shows that
\[ \frac{\partial^2 g(\pi)}{\partial \pi_i^2} = \sum_{S \subseteq V, i \in S} (f(S) - f(S - i)) \cdot \frac{d^2 q_i(\pi_i)}{d \pi_i^2} \cdot \prod_{u \in S, u \neq i} q_u(\pi_u) \cdot \prod_{u \notin S} (1 - q_u(\pi_u)), \]
which is non-positive because $f$ is monotone and $q_i$ is concave. Finally, suppose that $i \neq j$.
Using a calculation similar to the one above, we can rewrite $\frac{\partial^2 g(\pi)}{\partial \pi_i \partial \pi_j}$ as
\[ \sum_{S \subseteq V \setminus \{i,j\}} (f(S + i + j) - f(S + i) - f(S + j) + f(S)) \cdot \frac{d q_i(\pi_i)}{d \pi_i} \cdot \frac{d q_j(\pi_j)}{d \pi_j} \cdot \prod_{u \in S} q_u(\pi_u) \cdot \prod_{u \notin S, u \neq i,j} (1 - q_u(\pi_u)), \]
which is non-positive because $f$ is submodular and $q_i, q_j$ are increasing (so that both derivatives are non-negative).

With Theorem 13 and Lemma 16, we can now complete the proof of Theorem 3.

Proof of Theorem 3. Consider one node $u$. The probability that it becomes active initially is
\[ p^0_u = \mathrm{Prob}[\lambda_u \pi_u \geq C_u] = 1 - \frac{C_u}{\pi_u}. \]
Recall that $C_u = c_u \cdot \sum_v d_v$, and $\pi_u \geq C_u$ in our model, so this number is always non-negative. Clearly, $p^0_u$ is also a monotone increasing function of $\pi_u$. To verify concavity, we simply take two derivatives: the second derivative is $-\frac{2 C_u}{(\pi_u)^3}$, and thus non-positive, so $p^0_u$ is concave.

Now, consider all the nodes $u$ which did not initially become active. This is equivalent to saying that $\lambda_u \pi_u \leq C_u$. But subject to this bound, $\lambda_u \pi_u$ is uniformly random, so we are in the situation of having an initially active set $S$, and for each remaining node $u$, the payment is independently and uniformly random in $[0, C_u]$. By Theorems 13 and 15, the expected social welfare $W(S)$ is a monotone and submodular function of the seed set $S$, so long as $W$ is submodular in the set of active nodes. We can therefore apply Lemma 16 to $E[h(\phi(T))]$, which implies that $W(\pi_1, \ldots, \pi_n)$ has the diminishing returns property.

Each of the social welfare functions listed in Section 2 can be shown to be monotone and submodular in the set of active nodes by simple calculations. Thus, for all of these objective functions, the total social welfare is a monotone function of the payments with diminishing returns properties.
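The statement of Lemma 16 can be illustrated numerically on a tiny instance. The sketch below is an illustration under stated assumptions rather than part of the proof: it takes $f$ to be a set-coverage function (which is non-negative, monotone, and submodular), uses the initial activation probability $q_u(\pi_u) = 1 - C_u / \pi_u$ from the proof of Theorem 3, computes $g(\pi) = E[f(S)]$ exactly by enumerating all subsets, and checks monotonicity and diminishing returns via finite differences. The instance, the step size, and all function names are hypothetical.

```python
from itertools import combinations


def subsets(elements):
    """All subsets of the given collection, as frozensets."""
    elements = list(elements)
    for r in range(len(elements) + 1):
        for combo in combinations(elements, r):
            yield frozenset(combo)


def coverage(covers, chosen):
    """f(S): number of items covered by the nodes in S."""
    covered = set()
    for u in chosen:
        covered |= covers[u]
    return len(covered)


def q(pi_u, c_total):
    """Initial activation probability 1 - C_u / pi_u (assumes pi_u >= C_u)."""
    return 1.0 - c_total / pi_u


def g(pi, covers, c_total):
    """Exact E[f(S)] when each u joins S independently with probability q."""
    nodes = sorted(covers)
    total = 0.0
    for s in subsets(nodes):
        prob = 1.0
        for u in nodes:
            qu = q(pi[u], c_total[u])
            prob *= qu if u in s else 1.0 - qu
        total += prob * coverage(covers, s)
    return total


def raised(pi, u, delta):
    """Copy of the payment vector with node u's payment increased by delta."""
    out = dict(pi)
    out[u] += delta
    return out


def check_diminishing_returns(pi, covers, c_total, delta=0.25, tol=1e-12):
    """Finite-difference check: first differences >= 0, second differences <= 0."""
    base = g(pi, covers, c_total)
    for i in covers:
        gi = g(raised(pi, i, delta), covers, c_total)
        if gi - base < -tol:
            return False
        for j in covers:
            gj = g(raised(pi, j, delta), covers, c_total)
            gij = g(raised(raised(pi, i, delta), j, delta), covers, c_total)
            if gij - gi - gj + base > tol:
                return False
    return True


if __name__ == "__main__":
    covers = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}   # hypothetical instance
    c_total = {"a": 1.0, "b": 1.0, "c": 1.0}
    pi = {"a": 1.5, "b": 2.0, "c": 3.0}
    print(check_diminishing_returns(pi, covers, c_total))
```

Exhaustive enumeration is exponential in the number of nodes, so this is only meant for instances with a handful of nodes; for larger networks one would estimate the expectation by sampling instead.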
4 Experimental evaluation

In this section, we summarize our observations based on simulations both on synthetic and real-world P2P networks. We have developed a simulator for the three models described in Section 2.

4.1 Simulation model

Given a payment scheme $\pi$, we generate random $\lambda_u$ and compute the number of active (sharing) nodes. We also compute the value of the social welfare according to the utility functions in Section 2. In addition, we calculate the total payments, and the average payment per active and per serviced node. These numbers are averaged over 1000 iterations, each with different random $\lambda$.

Network topology. For our evaluation, we consider different network topologies, including two network topologies derived from real-world data sets [13, 20, 21], and a regular two-dimensional grid topology. The real-world data sets are based on measured end-to-end latencies between pairs of servers deployed in the Internet [13]. The MIT King data set [21] is symmetric and measures RTT between each pair among 1740 servers, while the Harvard King data set [20] provides asymmetric median latencies between each pair among 1895 servers. In addition to networks derived from these two data sets, we also consider a regular two-dimensional grid.

We derive the download percentage matrix $P$ from the latencies by setting $p_{v,u} = \max(0, 1 - \frac{\Delta_{u \to v}}{\Gamma})$, where $\Delta_{u \to v}$ is the latency from $u$ to $v$, and $\Gamma$ is a hard threshold for tolerable latencies. This models the fact that users prefer to download from peers to which they have fast connections, and have a threshold beyond which latency may not be tolerable any more. By varying $\Gamma$, we can obtain denser or sparser download network topologies. We will refer to the networks derived from the MIT King data set as MIT networks, and those derived from the Harvard King data set as Harvard networks.
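The construction of $P$ from a latency matrix is simple enough to sketch directly. The following is a minimal illustration, not the simulator used for the experiments: the function names and the three-node latency matrix are made up, and `average_degree` counts positive off-diagonal entries of $P$ as edges, which is one plausible reading of "the network defined by the $p_{u,v}$ values".

```python
def download_matrix(latency, gamma):
    """Return P with P[v][u] = max(0, 1 - latency[u][v] / gamma).

    latency[u][v] is the latency from u to v (it may be asymmetric);
    gamma is the hard threshold for tolerable latencies.
    """
    n = len(latency)
    return [[max(0.0, 1.0 - latency[u][v] / gamma) for u in range(n)]
            for v in range(n)]


def average_degree(p):
    """Average number of positive off-diagonal entries per node (assumed edge definition)."""
    n = len(p)
    edges = sum(1 for v in range(n) for u in range(n) if u != v and p[v][u] > 0)
    return edges / n


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds between three servers.
    latency = [
        [0.0, 1.0, 6.0],
        [2.0, 0.0, 3.0],
        [6.0, 1.0, 0.0],
    ]
    for gamma in (4.0, 8.0):
        p = download_matrix(latency, gamma)
        print(gamma, p, average_degree(p))
```

Raising `gamma` can only turn zero entries into positive ones, which is why a larger threshold yields a denser download network.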
We do not report all results for all topologies here. Unless stated otherwise, our observed trends apply to all of these topologies.

Payment schemes and non-sharing peers. In our experiments, we consider different payment schemes $\pi$, to study the impact of payments on the propagation of sharing behavior. We parameterize the schemes with two parameters $\alpha, \beta$, and set $\pi_u = \alpha \cdot d_u^\beta$, where $d_u$ is the degree of node $u$ in the network defined by the $p_{u,v}$ values. Thus, the financial utilities are chosen uniformly at random from the interval $[0, \alpha \cdot d_u^\beta]$.

We also consider the impact of peers who cannot (or do not want to) share the file at all, regardless of the payment offered. Such peers may still be interested in downloading the file. Their presence can be expected to decrease the sharing behavior in networks, as they will place load on other peers without contributing. We call such nodes "Empty" nodes, and consider the impact of different percentages of Empty nodes on the overall sharing percentage.

[Figure 1: Comparison of different models (Demand, No-net, One-hop), using the Harvard network with no Empty nodes. (a) Percentage of active nodes as a function of $\alpha$. (b) Active nodes per unit of payment as a function of $\alpha$.]

4.2 Results

Comparison of different models. We begin by estimating the size of network effects, by comparing the Demand model with the No-Network and One-Hop models. Figure 1 (a) compares the participation rates under the three models, with the same payment scheme and same network (Harvard). We keep $\beta = 1$ constant in the payment scheme, and vary $\alpha$.
Thus, payments are proportional to nodes' degrees. The figure shows that by ignoring network effects, we would underestimate the number of sharing nodes by about 15% on average, and as much as 25% (for $\alpha = 1.2$). The same trends hold for the fraction of serviced nodes (not shown here): the number of serviced nodes is underestimated by about 10% if ignoring network effects.

Figure 1 (b) compares the number of active nodes per unit of payment spent by the network administrator. This is an interesting metric as it captures the tradeoff between participation and payments. Compared to the number of active nodes, the choice of model seems to have remarkably little impact on the estimate of this quantity. For small values of $\alpha$, the network effects lead to slightly higher payments per active node, as the network effects lead to an activation of more high-degree nodes, which have higher payments. This effect disappears as $\alpha$ increases, and more nodes are activated in the No-Network model as well. The same trends hold for the number of serviced nodes per unit of payment spent (not shown here).

The results reported here stay essentially the same both for the MIT and grid topologies. In particular, the underestimate of the number of active nodes by the No-Network model is essentially the same in these topologies. In the grid topology, the No-Network model in fact overestimates the cost per active node by about 10%, as the dependence on the degree disappears, and network effects lead to an activation of more nodes with smaller payments.

Different Social Welfare Functions. We evaluate our theoretical results against the number of serviced nodes and the two social welfare functions sum-welfare and max-welfare, as defined in Section 2. All three are plotted in Figure 2.
Although each social welfare function differs from the others in terms of the degree of submodularity (for example, sum-welfare can be shown to be completely modular in the number of active nodes), the curvatures of the plots as a function of payments are more or less the same. Thus, the concavity (diminishing returns) appears to be dominated by the submodularity of the activation process.

[Figure 2: Sum-Welfare, Max-Welfare and the number of serviced nodes as a function of $\alpha$, Harvard network.]

Different payment schemes. For a network administrator, it is particularly interesting how the choice of payments will affect sharing behavior, and the cost-effectiveness of achieving a certain participation rate. Our next set of experiments therefore shows the percentage of active nodes, and the number of active nodes per unit of payment, when the parameters $\alpha$ and $\beta$ in the payments $\pi_u = \alpha \cdot d_u^\beta$ are varied.

Figure 3 (a) shows the percentage of active nodes in the Harvard network with 50% Empty nodes, as a function of $\alpha$ and $\beta$. Figure 3 (b) shows the number of active nodes per unit of payment under the same setting. The cost effectiveness is maximized for very small values of $\alpha$ and $\beta$, specifically $\beta = 0$ and $\alpha = 1.2$. However, this comes at a steep price, in that almost no nodes (only about 4.4% of the network) share in this case. Clearly, there is no single point at which the network should operate. Rather, a network administrator who wants to achieve a certain participation rate can use these plots to find the most cost-effective payment scheme to achieve this rate. For instance, if the goal is to achieve 30% sharing, this can be achieved by setting $\alpha = 1.6$ and $\beta = 1.5$, or $\alpha = 1.8$ and $\beta = 1$.
Of these, the first scheme spends about 30 units per active node, while the second scheme spends about 7 units per active node. Thus, a judicious choice of payments can lead to significant savings while ensuring the same level of participation. In general, the plot suggests that $\beta \in [0.5, 1]$ tends to lead to good tradeoffs between participation and cost: for smaller values of $\beta$, participation tends to be too low, while for higher values, the cost per active node increases significantly.

The observed trends are fairly independent of the network topologies. In particular, the plots for both the grid and MIT network also suggest that $\beta \in [0.5, 1]$ gives the best cost efficiency for a given fraction of participating nodes.

Different thresholds ($\Gamma$). Finally, we investigate the impact of different latency tolerance thresholds $\Gamma$ on the activation process. Recall that the larger $\Gamma$, the more peers $u$ may serve $v$. For instance, with $\Gamma = 2$ms, the average degree of nodes in the Harvard network is 4.58, while with $\Gamma = 5$ms, the average degree increases to 14.93. In the resulting denser graph, we would expect less degree imbalance, and overall higher network effects; however, the payments will need to compensate for more downloads from any individual node.

The experiments, conducted on the Harvard network with no Empty nodes, confirm this intuition. When $\beta = 0$, Figure 4 (a) shows that the number of nodes serviced is smaller in Harvard$_{\Gamma = 5\mathrm{ms}}$ than in Harvard$_{\Gamma = 2\mathrm{ms}}$. The reason is that the payments do not increase with the degree, so it is costlier for nodes in Harvard$_{\Gamma = 5\mathrm{ms}}$ to become active. As $\beta$ increases, and high degrees result in higher compensation, more nodes are serviced in Harvard$_{\Gamma = 5\mathrm{ms}}$. With $\beta > 0$, payments increase in the node degree, and nodes in Harvard$_{\Gamma = 5\mathrm{ms}}$ receive more payments because of their higher average degree.
Thus, more nodes are activated, and as a result, more nodes can be serviced.

The increased activation comes at a price, as seen in Figure 4 (b). The higher average degree in Harvard$_{\Gamma = 5\mathrm{ms}}$, combined with the dependence of payments on the degrees, leads to somewhat higher payments per active (or serviced) node. Thus, in the Demand model, the increased participation in denser networks is not only a result of network effects, but also of higher payments.

[Figure 3: Comparison of different payment schemes, Harvard network with 50% Empty nodes, as a function of $\alpha$ and $\beta$. (a) Percentage of nodes active. (b) Active nodes per unit of payment. Notice that for readability, the directions of the axis labels for $\alpha, \beta$ are different in the two figures.]

[Figure 4: Comparison of different thresholds (2ms and 5ms), Harvard network with no Empty nodes, as a function of $\alpha$ and $\beta$. (a) Fraction of serviced nodes. (b) Serviced nodes per unit of payment.]

Therefore, in order to investigate the effectiveness of density itself on the participation or service rate, we make the following comparison. Fix the payment per active node for both Harvard$_{\Gamma = 5\mathrm{ms}}$ and Harvard$_{\Gamma = 2\mathrm{ms}}$ to an arbitrary number by choosing the appropriate payment schemes for each graph. For example, in order to get a payment of 23 per active node, a payment scheme for Harvard$_{\Gamma = 5\mathrm{ms}}$ would be $\alpha = 2.7$ and $\beta = 1$, and for Harvard$_{\Gamma = 2\mathrm{ms}}$ would be $\alpha = 1$ and $\beta = 1.5$. It turns out that the denser network (Harvard$_{\Gamma = 5\mathrm{ms}}$) gives a significantly higher rate for both participation and service.
For a payment of 23 units per active node, for instance, the fraction of participating nodes for Harvard$_{\Gamma = 5\mathrm{ms}}$ is 86%, while the same fraction goes down to 39% in Harvard$_{\Gamma = 2\mathrm{ms}}$.

Based on the simulations, the following were our main observations:

1. How different are the predictions in sharing behavior between the Demand, the No-Network, and the One-Hop models? Our results show a significant difference between the models in their prediction of sharing: while the fraction of sharing nodes is qualitatively similar, the predictions ignoring network effects can be off by about 15%–25%. This results in an underestimate of up to 10% in the number of serviced peers.
2. How does the participation depend on the network topology and density? We observe that the denser the network, the higher the rate of participation, given fixed incentives. This holds across grid and realistic Internet topologies.
3. How does the payment scheme affect the number of sharing and serviced nodes, and the price paid per node? Our experiments suggest that the payments $\pi_u$ for realistic topologies should be proportional to $u$'s degree to give high overall participation at low cost. In other words, given a network topology, there exists a choice of parameters for payments proportional to node degrees that maximizes the overall "bang per buck". We derive these parameters for each network topology experimentally.

5 Conclusions

There are several natural directions for future work. A very interesting question arises when taking payments by "reputation" or download priorities into account. While monetary payments can (in principle) be increased arbitrarily, reputation is inherently constant-sum: if some peers are recognized as outstanding sharers, then others will receive less recognition, and might find the reduced recognition not enough incentive to keep sharing.
Similarly, download priorities come at the expense of other peers, and can thus not be arbitrarily increased for all members of the network. As a result, the process of sharing will not necessarily be monotone: peers may choose to stop sharing once too many other peers are active. A first question is then whether stable (equilibrium) states even exist. If so, it would be interesting what fraction of the peers will be sharing, what the social welfare is, and how these quantities will depend on the network structure.

From a more practical viewpoint, it would be desirable to evaluate how accurately our model (or a variation thereof) captures the actual behavior of participants in a P2P system. This would likely be a difficult experiment to perform, as many of the parameters, such as file demands and latency, are inherently transient, and in a realistic system, payments cannot be changed constantly to evaluate the impact of such changes.

In the bigger picture, the network designer also has to be concerned about manipulation by peers. For instance, colluding peers could artificially inflate the perceived "degree" of a peer (by claiming a download preference), and thus the payments to that peer. A more thorough investigation of mechanisms taking these and other concerns into account is an exciting direction for future work.

Finally, our work lies among various applications in economics for which there are positive or negative externalities among agents in a neighborhood. Our results suggest that in order to study different economic metrics such as revenue or social welfare, we should always consider the cascading effect of agents' strategies over the network.

References

[1] Kostas G. Anagnostakis and Michael B. Greenwald. Exchange-based incentive mechanisms for peer-to-peer file sharing. In Proc. 24th Intl. Conf. on Distributed Computing Systems, pages 524–533, 2004.
[2] Christina Aperjis, Michael J. Freedman, and Ramesh Johari. Peer-assisted content distribution with prices. In Proc. 4th Intl. Conf. on emerging Networking Experiments and Technologies (CONEXT), 2008.
[3] Ravindra B. Bapat. Moore-Penrose inverse of set inclusion matrices. Linear Algebra and its Applications, 318(1):35–44, 2000.
[4] Richard A. Brualdi and Herbert J. Ryser. Combinatorial Matrix Theory. Cambridge University Press, 1991.
[5] Alice Cheng and Eric Friedman. Sybilproof reputation mechanisms. In Proc. 3rd Workshop on the Economics of Peer-to-Peer Systems (P2PECON), 2005.
[6] Bram Cohen. Incentives build robustness in bittorrent. In Proc. 1st Workshop on Economics of Peer-to-Peer Systems, 2003.
[7] Gérard Cornuéjols, Marshall L. Fisher, and George L. Nemhauser. Location of bank accounts to optimize float. Management Science, 23:789–810, 1977.
[8] Michal Feldman and John Chuang. Overcoming free-riding behavior in peer-to-peer systems. SIGecom Exchanges, 5(4):41–50, 2005.
[9] Michal Feldman, Christos Papadimitriou, John Chuang, and Ion Stoica. Free-riding and whitewashing in peer-to-peer systems. IEEE Journal on Selected Areas in Communications, 24(5):1010–1019, 2006.
[10] Jacob Goldenberg, Barak Libai, and Eitan Muller. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Letters, 12:211–223, 2001.
[11] Philippe Golle, Kevin Leyton-Brown, Ilya Mironov, and Mark Lillibridge. Incentives for sharing in peer-to-peer networks. In Proc. 2nd Intl. Workshop on Electronic Commerce (WELCOM), pages 75–87, 2001.
[12] Mark Granovetter. Threshold models of collective behavior. American Journal of Sociology, 83:1420–1443, 1978.
[13] Krishna P. Gummadi, Stefan Saroiu, and Steven D. Gribble. King: Estimating latency between arbitrary internet end hosts. In Proc. 2nd Usenix/ACM SIGCOMM Internet Measurement Workshop (IMW), 2002.
[14] David Kempe, Jon Kleinberg, and Eva Tardos. Maximizing the spread of influence in a social network. In Proc. 9th Intl. Conf. on Knowledge Discovery and Data Mining, pages 137–146, 2003.
[15] David Kempe, Jon Kleinberg, and Eva Tardos. Influential nodes in a diffusion model for social networks. In Proc. 32nd Intl. Colloq. on Automata, Languages and Programming, pages 1127–1138, 2005.
[16] Kevin Lai, Michal Feldman, Ion Stoica, and John Chuang. Incentives for cooperation in peer-to-peer networks. In 1st Workshop on Economics of Peer-to-Peer Systems, 2003.
[17] Stephen Morris. Contagion. Review of Economic Studies, 67:57–78, 2000.
[18] Elchanan Mossel and Sebastien Roch. On the submodularity of influence in social networks. In Proc. 38th ACM Symp. on Theory of Computing, pages 128–134, 2007.
[19] George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. An analysis of the approximations for maximizing submodular set functions. Mathematical Programming, 14:265–294, 1978.
[20] Network Coordinate Research at Harvard.
[21] Parallel & Distributed Operating Systems Group at MIT.
[22] Stefan Saroiu, P. Krishna Gummadi, and Steven D. Gribble. A measurement study of peer-to-peer file sharing systems. In Proc. SPIE/ACM Conf. on Multimedia Computing and Networking (MMCN), 2002.
[23] Thomas Schelling. Micromotives and Macrobehavior. Norton, 1978.
[24] Jeffrey Shneidman and David C. Parkes. Using redundancy to improve robustness of distributed mechanism implementations. In Proc. 5th ACM Conf. on Electronic Commerce, pages 276–277, 2003.
[25] Vivek Vishnumurthy, Sangeeth Chandrakumar, and Emin Gün Sirer. Karma: A secure economic framework for peer-to-peer resource sharing. In 1st Workshop on Economics of Peer-to-Peer Systems, 2003.
[26] Jan Vondrák. Optimal approximation for the submodular welfare problem in the value oracle model. In Proc. 39th ACM Symp. on Theory of Computing, pages 67–74, 2008.
[27] Ben Q. Zhao, John C. S. Lui, and Dah-Ming Chiu. Analysis of adaptive protocols for p2p networks. In Proc. 28th IEEE INFOCOM Conference, pages 325–333, 2009.

A Hardness of Approximation under the Seed Set Model

Here, we prove that finding a seed set $S$ to (even approximately) maximize the eventual number of active nodes is hard under the Sharing Process. Let Best Seed be the optimization problem of finding the seed set $S$ of at most $k$ nodes that maximizes the total number of sharing nodes, given $n$ servers $u_1, \ldots, u_n$ and the corresponding parameters $c_u, d_u, \lambda_u, \lambda_u \pi_u, p_{v,u}$. (Notice that when all of the $\lambda_u$ are given, the process is deterministic.)

Proposition 17 It is hard to approximate Best Seed within $n^{1 - \epsilon}$ for any $\epsilon > 0$ unless P = NP.

Proof. We reduce from the Vertex Cover problem. Recall that the Vertex Cover problem is formulated as follows: Given a graph $G = (V, E)$, a vertex cover is a set $S \subseteq V$ of nodes such that each edge $e \in E$ has at least one endpoint in $S$. In the Vertex Cover decision problem, the input is a pair $(G, k)$: the question is whether there is a vertex cover of size at most $k$. We assume without loss of generality that $G$ contains no isolated vertices.

Given an arbitrary Vertex Cover instance with $N = |V|$ nodes and $M = |E| \geq N/2$ edges, we construct an instance of Best Seed as follows: For each node $u \in V$, we have a node $w_u$. For each edge $e \in E$, we create two nodes $x_e, x'_e$. Finally, setting $r = 1/\epsilon$, we create $M^r$ "bulk" nodes $y_1, \ldots, y_{M^r}$. We set $p_{x'_e, y_i} = 1$ for all $y_i$ and all $e$. For all $e$, $p_{x'_e, x_e} = 1$.
Finally, whenever $e$ is incident on $u$, we have $p_{x_e, w_u} = 1$. All other values of $p$ are 0.

We visualize the construction above in 4 layers. The "node layer" consists of all nodes $w_u$ for all $u$. The "primary layer" consists of all $x_e$. The "secondary layer" consists of all $x'_e$. Finally, the "bulk layer" consists of all $y_i$. Next, we define payments and demands:

\[
\lambda_v \pi_v =
\begin{cases}
0 & \text{if } v = w_u \text{ for some } u \in V \text{ (node layer)} \\
3.5 & \text{if } v = x_e \text{ for some } e \in E \text{ (primary layer)} \\
0 & \text{if } v = x'_e \text{ for some } e \in E \text{ (secondary layer)} \\
M + 0.5 & \text{otherwise (bulk layer)}
\end{cases}
\]

\[
d_v =
\begin{cases}
0 & \text{if } v = w_u \text{ for some } u \in V \text{ (node layer)} \\
2 & \text{if } v = x_e \text{ for some } e \in E \text{ (primary layer)} \\
2 & \text{if } v = x'_e \text{ for some } e \in E \text{ (secondary layer)} \\
0 & \text{otherwise (bulk layer)}
\end{cases}
\]

First, let $T$ be a vertex cover of size at most $k$. Consider the effect of starting with the nodes $w_u$, $u \in T$, as a seed set. Because $T$ is a vertex cover, each primary node $x_e$ now has an active node $w_u$ with $p_{x_e, w_u} = 1$, so that its demand of 2 is split between itself and (at least) one node $w_u$. Thus, upon activation, it would face a demand of at most 2 from $x'_e$ and 1 from itself, whereas its payment is 3.5. Hence, each primary node will become active in the second round. Once the primary node $x_e$ is active, $x'_e$ will split its demand evenly between $x_e$ and all active bulk nodes. Hence, each bulk node $y_i$ will see demand at most 1 from each $x'_e$, for a total of at most $M$. Since its payment offer is larger, $y_i$ will become active. Hence, all bulk nodes will be active by round 3, and the total number of active nodes is at least $M^r + M + k$.

Conversely, suppose that strictly more than $M + N$ nodes are active. Because none of the secondary nodes ever become active (since they have a payment offer of 0), this means that at least one bulk node must be active. Let $y_i$ be the first bulk node to become active, breaking ties arbitrarily.
Because no other bulk nodes are active at this time, $y_i$ must see demand at least 1 from each secondary node $x'_e$. And because its payment offer is only $M + 0.5$, this means that it cannot see demand 2 from any secondary node; otherwise, the total demand would exceed the payment. This means that for each secondary node $x'_e$, the corresponding primary node $x_e$ must already be active. Without loss of generality, the seed set contained no primary nodes; otherwise, the node $x_e$ could be replaced by $w_u$ (where $u$ is an endpoint of $e$), which would next activate $x_e$. Thus, $x_e$ must have become activated at some point of the process, which can only happen when its total demand is smaller than its payment. Since at that point, only $x_e$ can serve the demand of $x'_e$, this in turn means that $x_e$'s own demand must be split between itself and one or more active nodes $w_u$. Thus, if $S$ is the set of initially active nodes in the node layer, then the corresponding vertices of $G$ must form a vertex cover.

In summary, if there is a vertex cover of size at most $k$, then there is a seed set of size at most $k$ activating at least $M^r + M + k$ nodes, whereas otherwise, no seed set of size at most $k$ can activate more than $M + N \leq 3M$ nodes. Thus, no approximation better than $\Omega(M^{r-1})$ is possible. Since the total number of nodes is $n = M^r + 2M + N \leq 2M^r$ (for $r$ large enough), this proves an approximation hardness of $\Omega(n^{1-1/r}) = \Omega(n^{1-\epsilon})$, unless P = NP.
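The gap created by the reduction can be checked on a toy instance. The following Python sketch is illustrative only and is not part of the original construction: the function names (`build_instance`, `demand_faced`, `run`) are hypothetical, and the dynamics are an assumption read off the proof above rather than the full model definition. Specifically, it assumes that a node's demand is split evenly among the active nodes that can serve it (including the node itself once active), that an inactive node joins when the demand it would face is strictly smaller than its payment, that updates happen in synchronous rounds, and that active nodes stay active. Only seed sets inside the node layer are tried, as in the proof.

```python
def build_instance(vertices, edges, r):
    """Build the four-layer Best Seed instance for a Vertex Cover graph.

    Returns (payment, demand, servers), where servers[z] is the set of
    nodes u with p_{z,u} = 1, i.e. the nodes that can serve z's demand.
    """
    M = len(edges)
    payment, demand, servers = {}, {}, {}

    def add(node, pay, dem):
        payment[node], demand[node], servers[node] = pay, dem, set()

    for u in vertices:
        add(("w", u), 0.0, 0.0)        # node layer
    for e in edges:
        add(("x", e), 3.5, 2.0)        # primary layer
        add(("x'", e), 0.0, 2.0)       # secondary layer
    for i in range(M ** r):
        add(("y", i), M + 0.5, 0.0)    # bulk layer

    for e in edges:
        servers[("x'", e)].add(("x", e))                        # p_{x'_e, x_e} = 1
        servers[("x'", e)].update(("y", i) for i in range(M ** r))  # p_{x'_e, y_i} = 1
        servers[("x", e)].update(("w", u) for u in e)           # p_{x_e, w_u} = 1
    return payment, demand, servers


def demand_faced(v, active, demand, servers):
    """Demand that v would serve if it joined the current active set."""
    total = 0.0
    for z, d in demand.items():
        if d == 0 or (z != v and v not in servers[z]):
            continue  # z has no demand, or v cannot serve z
        sharing = {u for u in servers[z] if u in active}
        if z in active:
            sharing.add(z)  # assumption: an active node serves its own demand
        sharing.add(v)
        total += d / len(sharing)  # assumption: even split among servers
    return total


def run(seed, payment, demand, servers):
    """Synchronous rounds; nodes join while payment exceeds demand faced."""
    active = set(seed)
    while True:
        joining = {v for v in payment
                   if v not in active
                   and demand_faced(v, active, demand, servers) < payment[v]}
        if not joining:
            return active
        active |= joining


# Path graph 0 - 1 - 2: N = 3, M = 2, and {1} is a vertex cover of size k = 1.
instance = build_instance([0, 1, 2], [(0, 1), (1, 2)], r=2)
with_cover = run({("w", 1)}, *instance)     # seed is a vertex cover
without_cover = run({("w", 0)}, *instance)  # seed misses edge (1, 2)
print(len(with_cover), len(without_cover))  # prints "7 2"
```

With the vertex cover as the seed, all $M^r = 4$ bulk nodes join, for $M^r + M + k = 7$ active nodes. With a seed that leaves edge $(1, 2)$ uncovered, no bulk node joins and the active set stays below $M + N = 5$.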
