Universal Fingerprinting: Capacity and Random-Coding Exponents
Authors: Pierre Moulin
October 30, 2018

Abstract

This paper studies fingerprinting (traitor tracing) games in which the number of colluders and the collusion channel are unknown. The fingerprints are embedded into host sequences representing signals to be protected and provide the receiver with the capability to trace back pirated copies to the colluders. The colluders and the fingerprint embedder are subject to signal fidelity constraints. Our problem setup unifies the signal-distortion and Boneh-Shaw formulations of fingerprinting. The fundamental tradeoffs between fingerprint codelength, number of users, number of colluders, fidelity constraints, and decoding reliability are then determined. Several bounds on fingerprinting capacity have been presented in recent literature. This paper derives exact capacity formulas and presents a new randomized fingerprinting scheme with the following properties: (1) the encoder and receiver assume a nominal coalition size but do not need to know the actual coalition size and the collusion channel; (2) a tunable parameter $\Delta$ trades off false-positive and false-negative error exponents; (3) the receiver provides a reliability metric for its decision; and (4) the scheme is capacity-achieving when the false-positive exponent $\Delta$ tends to zero and the nominal coalition size coincides with the actual coalition size. A fundamental component of the new scheme is the use of a "time-sharing" randomized sequence. The decoder is a maximum penalized mutual information decoder, where the significance of each candidate coalition is assessed relative to a threshold, and the penalty is proportional to the coalition size. A much simpler threshold decoder that satisfies properties (1)-(3) above but not (4) is also given.

Index Terms.
Fingerprinting, traitor tracing, watermarking, data hiding, randomized codes, universal codes, method of types, maximum mutual information decoder, minimum equivocation decoder, channel coding with side information, capacity, strong converse, error exponents, multiple access channels, model order selection.

* The author is with the ECE Department, the Coordinated Science Laboratory, and the Beckman Institute at the University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Email: moulin@ifp.uiuc.edu. This work was supported by NSF under grants CCR 03-25924, CCF 06-35137 and CCF 07-29061. A 5-page version of this paper was presented at ISIT in Toronto, July 2008. The current manuscript was submitted for publication on January 24, 2008 and revised on December 9, 2008, June 9, 2009, January 24, 2010, December 10, 2010, and May 24, 2011.

1 Introduction

Digital fingerprinting (a.k.a. traitor tracing) is essentially a multiuser version of watermarking. A covertext (such as an image, video, audio, text, or software) is to be distributed to many users. Prior to distribution, each user is assigned a fingerprint that is embedded into the covertext. In a collusion attack, a coalition of users combine their marked copies, creating a pirated copy that contains only weak traces of their fingerprints. The pirated copy is subject to a fidelity requirement relative to the coalition's copies. The fidelity requirement may take the form of a distortion constraint, which is a natural model for media fingerprinting applications [1-7]; or it may take the form of Boneh and Shaw's marking assumption, which is a popular model for software fingerprinting [8-10]. To trace the forgery back to the coalition members, one needs a fingerprinting scheme that can reliably identify the colluders' fingerprints from the pirated copy.
The fingerprinting problem presents two key challenges.

1. The number of colluders may be large, which makes it easier for the colluders to mount a strong attack. The difficulty of the decoding problem is compounded by the fact that the number of colluders and the collusion channel are unknown to the encoder and decoder.

2. There are two fundamental types of error events, namely false positives, by which innocent users are wrongly accused, and false negatives, by which one or more colluders escape detection. For legal reasons, a maximum admissible value for the false-positive error probability should be specified.

This paper proposes a mathematical model that satisfies these requirements and derives the corresponding information-theoretic performance limits. Prior art on related formulations of the fingerprinting problem is reviewed below. The basic performance metric is capacity, which is defined with respect to a class of collusion channels.

A multiuser data hiding problem was analyzed by Moulin and O'Sullivan [3, Sec. 8], and capacity expressions were obtained assuming a compound class of memoryless channels, expected-distortion constraints for the distributor and the coalition, and noncooperating, single-user decoders. Despite clear mathematical similarities, this setup is quite different from the one adopted in more recent fingerprinting papers. Somekh-Baruch and Merhav [4, 5] studied a fingerprinting problem with a known number of colluders and explored connections with the problem of coding for the multiple-access channel (MAC). The notion of false positives does not appear in their problem formulation. Lower bounds on capacity were obtained assuming almost-sure distortion constraints between the pirated copy and one [4] or all [5] of the coalition's copies.
The lower bounds on capacity correspond to a restrictive encoding strategy, namely random constant-composition codes without time-sharing. Other bounds on capacity and connections between MACs and fingerprinting under the Boneh-Shaw assumption have been recently studied by Anthapadmanabhan et al. [10]. The covertext is degenerate, and side information does not appear in the information-theoretic formulation of this problem.

In order to cope with unknown collusion channels and an unknown number of colluders, a special kind of universal decoder should be designed, where universality holds not only with respect to some set of channels, but also with respect to an unknown number of inputs. An early version of this idea in the context of the so-called random MAC was introduced by Plotnik and Satt [11]. In the context of fingerprinting, a tunable parameter should trade off the two fundamental types of error probability. When the number of colluders is unknown, two extreme instances of this tradeoff are to accuse all users or none of them.

While fingerprinting capacity is a fundamental measure of the ability of any scheme to resist colluders, it only guarantees that the error probabilities vanish if the codes are "long enough". Error exponents provide a finer description of system performance. They provide estimates of the necessary length of a fingerprinting code that can withstand a specified number of colluders, given target false-positive and false-negative error probabilities. This is especially valuable in any legal system where the reliability of accusations should be assessed. Besides capacity and error-exponent formulas, the information-theoretic analysis sheds light on the structure of optimal codes.
Particularly relevant in this respect is a random coding scheme by Tardos [9], which uses an auxiliary random sequence for encoding fingerprints. While his scheme is presented at an algorithmic level (and no optimization was involved in its construction), in our game-theoretic setting the auxiliary random variable appears fundamentally as part of a randomized strategy in an information-theoretic game whose payoff function is nonconcave with respect to the maximizing variable (the fingerprint distribution).

Another issue that can be resolved in our game-theoretic setting is the optimality of coalition strategies that are invariant to permutations of the colluders. While one may heuristically expect that such strategies are optimal, a proof of this property is established in this paper. The approach used in previous papers was to assume that coalitions employ such strategies, but often no performance guarantee is given if the colluders employ asymmetric strategies.

Finally, in the aforementioned paper by Tardos [9] and in the signal processing literature, several simple algorithms have been proposed to detect colluders, involving computing some correlation score between the pirated copy and users' fingerprints, and setting up a detection threshold. We study the limits of such strategies and compare them with joint decoding strategies.

1.1 Organization of This Paper

As indicated by the bibliographic references, probabilistic analyses of digital fingerprinting have been reported both in the information theory literature and in the theoretical computer science literature. While the results derived in this paper are put in the context of related information-theoretic work, especially multiple-access channels, this paper is nevertheless intended to be accessible to a broader community of readers that are trained in probability theory and statistics.
The main tools used in our derivations are the method of types [12, 13] for analyzing random-coding schemes, Fano's lemma for deriving upper bounds on capacity, sphere-packing methods, and elementary properties of information-theoretic functionals.

A mathematical statement of our generic fingerprinting problem is given in Sec. 2, together with the definitions of codes, collusion channels, error probabilities, capacity, and error exponents. Our first main results are fingerprinting capacity theorems. They are stated in Sec. 3. The next two sections present the new random coding scheme and the resulting error exponents. Sec. 4 presents a simple but suboptimal decoder that compares empirical mutual information scores between received data and individual fingerprints, and outputs a guilty decision whenever the score exceeds a certain tunable threshold. This suboptimal decoder is closely related to strategies used in the signal processing literature and in [9]. For simplicity of the exposition, the scheme and results are presented in the setup with degenerate side information, which is directly applicable to the Boneh-Shaw problem. Sec. 5 introduces and analyzes a more elaborate joint decoder that assigns a penalized empirical equivocation score to candidate coalitions and selects the coalition with the lowest score. The penalty is proportional to coalition size. The joint decoder is capacity-achieving. Sec. 6 outlines an extension to the problem where the collusion channel is memoryless. The proofs of the main results appear in Secs. 7-10, and the paper concludes in Sec. 11.

1.2 Notation

We use uppercase letters for random variables, lowercase letters for their individual values, calligraphic letters for finite alphabets, and boldface letters for sequences.
Given an integer $K$, we use the special symbol $\mathcal{K}$ for the set $\{1, 2, \cdots, K\}$. We denote by $\mathcal{M}^\star$ the set of sequences of arbitrary length (including 0) whose elements are in $\mathcal{M}$. The probability mass function (p.m.f.) of a random variable $X \in \mathcal{X}$ is denoted by $p_X = \{p_X(x),\ x \in \mathcal{X}\}$. The variational distance between two p.m.f.'s $p$ and $q$ over $\mathcal{X}$ is denoted by $d_V(p, q) = \sum_{x \in \mathcal{X}} |p(x) - q(x)|$. The entropy of a random variable $X$ is denoted by $H(X)$, and the mutual information between two random variables $X$ and $Y$ is denoted by $I(X; Y) = H(X) - H(X|Y)$. Should the dependency on the underlying p.m.f.'s be explicit, we write the p.m.f.'s as subscripts, e.g., $H_{p_X}(X)$ and $I_{p_X p_{Y|X}}(X; Y)$. The Kullback-Leibler divergence between two p.m.f.'s $p$ and $q$ is denoted by $D(p \,\|\, q)$, and the conditional Kullback-Leibler divergence of $p_{Y|X}$ and $q_{Y|X}$ given $p_X$ is denoted by $D(p_{Y|X} \,\|\, q_{Y|X} \,|\, p_X) = D(p_{Y|X}\, p_X \,\|\, q_{Y|X}\, p_X)$. All logarithms are in base 2 unless specified otherwise.

Given a sequence $\mathbf{x} \in \mathcal{X}^N$, denote by $p_{\mathbf{x}}$ its type, or empirical p.m.f., over the finite alphabet $\mathcal{X}$. Denote by $T_{\mathbf{x}}$ the type class associated with $p_{\mathbf{x}}$, i.e., the set of all sequences of type $p_{\mathbf{x}}$. Likewise, $p_{\mathbf{x}\mathbf{y}}$ denotes the joint type of a pair of sequences $(\mathbf{x}, \mathbf{y}) \in \mathcal{X}^N \times \mathcal{Y}^N$, and $T_{\mathbf{x}\mathbf{y}}$ the associated joint type class. The conditional type $p_{\mathbf{y}|\mathbf{x}}$ of a pair of sequences $(\mathbf{x}, \mathbf{y})$ is defined by $p_{\mathbf{x}\mathbf{y}}(x, y)/p_{\mathbf{x}}(x)$ for all $x \in \mathcal{X}$ such that $p_{\mathbf{x}}(x) > 0$. The conditional type class $T_{\mathbf{y}|\mathbf{x}}$ given $\mathbf{x}$ is the set of all sequences $\tilde{\mathbf{y}}$ such that $(\mathbf{x}, \tilde{\mathbf{y}}) \in T_{\mathbf{x}\mathbf{y}}$. We denote by $H(\mathbf{x})$ the empirical entropy of the p.m.f. $p_{\mathbf{x}}$, by $H(\mathbf{y}|\mathbf{x})$ the empirical conditional entropy, and by $I(\mathbf{x}; \mathbf{y})$ the empirical mutual information for the joint p.m.f. $p_{\mathbf{x}\mathbf{y}}$.
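As a concrete illustration of these definitions, the following sketch (an illustrative Python snippet, not part of the paper's formal development) computes the empirical type, the empirical entropy, and the exact type-class size of a short binary sequence, and checks them against the standard type-counting bound $(N+1)^{-|\mathcal{X}|}\, 2^{NH(\mathbf{x})} \le |T_{\mathbf{x}}| \le 2^{NH(\mathbf{x})}$ recalled next.

```python
import math
from collections import Counter

def empirical_type(x, alphabet):
    """Empirical p.m.f. p_x of the sequence x over the given alphabet."""
    counts = Counter(x)
    return {a: counts.get(a, 0) / len(x) for a in alphabet}

def empirical_entropy(x, alphabet):
    """Empirical entropy H(x) in bits (base-2 logarithm)."""
    p = empirical_type(x, alphabet)
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def type_class_size(x, alphabet):
    """|T_x|: a multinomial coefficient counting all sequences of the same type."""
    counts = Counter(x)
    size = math.factorial(len(x))
    for a in alphabet:
        size //= math.factorial(counts.get(a, 0))
    return size

x = (0, 1, 1, 0, 1, 0, 0, 1, 1, 1)            # N = 10, type (0.4, 0.6)
N, alphabet = len(x), (0, 1)
H = empirical_entropy(x, alphabet)
size = type_class_size(x, alphabet)           # here C(10, 4) = 210
lower = (N + 1) ** (-len(alphabet)) * 2 ** (N * H)
upper = 2 ** (N * H)
assert lower <= size <= upper                 # the type-counting bound holds
```

The exponential sandwich is loose for small $N$ (here roughly $7 \le 210 \le 838$), but it tightens on the exponential scale as $N$ grows, which is all the method of types requires.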
Recall that the number of types and conditional types is polynomial in $N$ and that [12]
$$(N+1)^{-|\mathcal{X}|}\, 2^{N H(\mathbf{x})} \le |T_{\mathbf{x}}| \le 2^{N H(\mathbf{x})}, \qquad (1.1)$$
$$(N+1)^{-|\mathcal{X}||\mathcal{Y}|}\, 2^{N H(\mathbf{y}|\mathbf{x})} \le |T_{\mathbf{y}|\mathbf{x}}| \le 2^{N H(\mathbf{y}|\mathbf{x})}. \qquad (1.2)$$

We use the calligraphic fonts $\mathcal{P}_X$ and $\mathcal{P}_X^{[N]}$ to represent the set of all p.m.f.'s and all empirical p.m.f.'s for length-$N$ sequences, respectively, on the alphabet $\mathcal{X}$. Likewise, $\mathcal{P}_{Y|X}$ and $\mathcal{P}_{Y|X}^{[N]}$ denote the sets of all conditional p.m.f.'s and all empirical conditional p.m.f.'s on the alphabet $\mathcal{Y}$. The special symbol $\mathcal{W}_K$ will be used to denote the feasible set of collusion channels $p_{Y|X_1, \cdots, X_K}$ that can be selected by a size-$K$ coalition. Mathematical expectation is denoted by the symbol $\mathbb{E}$. The shorthands $a_N \doteq b_N$ and $a_N \mathrel{\dot\le} b_N$ denote asymptotic relations on the exponential scale, respectively $\lim_{N\to\infty} \frac{1}{N} \log \frac{a_N}{b_N} = 0$ and $\limsup_{N\to\infty} \frac{1}{N} \log \frac{a_N}{b_N} \le 0$. We define $|t|^+ \triangleq \max(t, 0)$ and $\exp_2(t) \triangleq 2^t$. The indicator function of a set $\mathcal{A}$ is denoted by $\mathbb{1}\{x \in \mathcal{A}\}$. The symbol $\mathcal{A} \setminus \mathcal{B}$ is used to denote the relative complement (or set-theoretic difference) of set $\mathcal{B}$ in set $\mathcal{A}$. (Note that $\mathcal{B}$ is generally not a subset of $\mathcal{A}$.) Finally, we adopt the notational convention that the minimum of a function over an empty set is $+\infty$, and the maximum is 0.

2 Problem Statement and Basic Definitions

2.1 Overview

Figure 1: Model for the fingerprinting game, using randomized code $(f_N, g_N)$. (The figure shows the host sequence $\mathbf{S}$ and secret key $V$ feeding the encoder $f_N$, which produces $2^{NR}$ fingerprinted copies; the coalition's copies $\mathbf{X}_{m_1}, \ldots, \mathbf{X}_{m_K}$ pass through a collusion channel $p_{\mathbf{Y}|\mathbf{X}_{m_1}, \ldots, \mathbf{X}_{m_K}}$ in the class $\mathcal{W}_K$ to produce the pirated copy $\mathbf{Y}$, from which the decoder $g_N$ outputs decoded fingerprints.) In the Boneh-Shaw setup, the host sequence $\mathbf{S}$ is degenerate and there is no distortion constraint ($D_1$). The class $\mathcal{W}_K$ characterizes the fidelity constraint on the collusion channel.
The encoder and decoder know neither $K$ nor the collusion channel.

Our model for digital fingerprinting is diagrammed in Fig. 1. Let $\mathcal{S}$, $\mathcal{X}$, and $\mathcal{Y}$ be three finite alphabets. The covertext sequence $\mathbf{S} = (S_1, \cdots, S_N) \in \mathcal{S}^N$ consists of $N$ independent and identically distributed (i.i.d.) samples drawn from a p.m.f. $p_S(s)$, $s \in \mathcal{S}$. A secret key $V$ taking values in an alphabet $\mathcal{V}_N$, whose cardinality potentially grows with $N$, is shared between encoder and decoder, and not publicly revealed. The key $V$ is a random variable independent of $\mathbf{S}$. There are $2^{NR}$ users, each of which receives a fingerprinted copy:
$$\mathbf{X}_m = f_N(\mathbf{S}, V, m), \qquad 1 \le m \le 2^{NR}, \qquad (2.3)$$
where $f_N : \mathcal{S}^N \times \mathcal{V}_N \times \{1, \cdots, 2^{NR}\} \to \mathcal{X}^N$ is the encoding function, and $m$ is the index of the user.

The fidelity requirement between $\mathbf{S}$ and $\mathbf{X}_m$ is expressed via a distortion constraint. Let $d : \mathcal{S} \times \mathcal{X} \to \mathbb{R}^+$ be the distortion measure and $d^N(\mathbf{s}, \mathbf{x}) = \frac{1}{N} \sum_{i=1}^N d(s_i, x_i)$ the extension of this measure to length-$N$ sequences. The code $f_N$ is subject to the distortion constraint
$$d^N(\mathbf{s}, \mathbf{x}_m) \le D_1, \qquad 1 \le m \le 2^{NR}. \qquad (2.4)$$

Let $\mathcal{K} \triangleq \{m_1, m_2, \cdots, m_K\}$ be a coalition of $K$ users; no constraints are imposed on the formation of coalitions. The coalition uses its copies $\mathbf{X}_{\mathcal{K}} \triangleq \{\mathbf{X}_m,\ m \in \mathcal{K}\}$ to produce a pirated copy $\mathbf{Y} \in \mathcal{Y}^N$. Without loss of generality, we assume that $\mathbf{Y}$ is generated stochastically according to a conditional p.m.f. $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ called the collusion channel. This includes deterministic mappings as a special case. A fidelity constraint is imposed on $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ to ensure that $\mathbf{Y}$ is "close" to the fingerprinted copies $\mathbf{X}_m$, $m \in \mathcal{K}$. This constraint may take the form of a distortion constraint (analogously to (2.4)), or alternatively, a constraint that will be referred to as the Boneh-Shaw constraint.
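To make the embedding constraint (2.4) concrete, here is a small illustrative sketch (not from the paper; the Hamming distortion measure, the sequences, and the budget $D_1$ are arbitrary choices made only for this example) that computes the per-letter distortion $d^N(\mathbf{s}, \mathbf{x}_m)$ and checks it against $D_1$.

```python
def hamming(s_i, x_i):
    """Per-letter distortion d(s, x); Hamming distortion is one common choice."""
    return 0.0 if s_i == x_i else 1.0

def d_N(s, x, d=hamming):
    """Length-N extension d^N(s, x) = (1/N) * sum_i d(s_i, x_i), as in (2.4)."""
    assert len(s) == len(x)
    return sum(d(si, xi) for si, xi in zip(s, x)) / len(s)

s   = [0, 1, 1, 0, 1, 0, 1, 1]   # host sequence S (toy values)
x_m = [0, 1, 0, 0, 1, 0, 1, 0]   # fingerprinted copy for user m (toy values)
D1  = 0.30                       # embedding-distortion budget (arbitrary)
assert d_N(s, x_m) <= D1         # the code f_N must satisfy this for every m
```

The two sequences differ in 2 of 8 positions, so $d^N = 0.25$, within the budget; a valid encoder must guarantee this for all $2^{NR}$ users simultaneously.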
The formulation of these constraints is detailed below and results in the definition of a feasible set $\mathcal{W}_K(p_{\mathbf{x}_{\mathcal{K}}})$ for the conditional type $p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}$. The encoder and decoder assume a nominal coalition size $K_{\mathrm{nom}}$ but know neither $K$ nor the $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ selected by the $K$ colluders.¹ The decoder has access to the pirated copy $\mathbf{Y}$, the host $\mathbf{S}$, and the secret key $V$. It produces an estimate
$$\hat{\mathcal{K}} = g_N(\mathbf{Y}, \mathbf{S}, V) \qquad (2.5)$$
of the coalition. Success can be defined as catching one colluder or catching all colluders, the latter task being seemingly much more difficult. An admissible decoder output is the empty set, $\hat{\mathcal{K}} = \emptyset$, reflecting the possibility that the signal submitted to the decoder is unrelated to the fingerprints. If this possibility were not allowed, an innocent user would be accused. Another good reason to allow $\hat{\mathcal{K}} = \emptyset$ is simply that reliable detection is impossible when there are too many colluders, and the constraint on the probability of false positives would be violated if $\hat{\mathcal{K}} = \emptyset$ were not an option.

2.2 Randomized Fingerprinting Codes

The formal definition of a fingerprinting code is as follows.

Definition 2.1 A randomized rate-$R$ length-$N$ fingerprinting code $(f_N, g_N)$ with embedding distortion $D_1$ is a pair of encoder mapping $f_N : \mathcal{S}^N \times \mathcal{V}_N \times \{1, 2, \cdots, \lceil 2^{NR} \rceil\} \to \mathcal{X}^N$ and decoder mapping $g_N : \mathcal{Y}^N \times \mathcal{S}^N \times \mathcal{V}_N \to \{1, 2, \cdots, \lceil 2^{NR} \rceil\}^\star$.

Many kinds of randomization are possible. In the most general setting, the key space $\mathcal{V}_N$ can grow superexponentially with $N$. For fingerprinting, three kinds of randomization seem to be fundamental, each serving a different purpose. All three kinds can be combined. The first one is randomized permutation of the letters $\{1, 2, \cdots, N\}$, to cope with channels with arbitrary memory, similarly to [14].
Definition 2.2 A randomly modulated (RM) fingerprinting code is a randomized fingerprinting code defined via permutations of a prototype $(\tilde{f}_N, \tilde{g}_N)$. The code is of the form
$$\mathbf{x}_m = \tilde{f}_N^\pi(\mathbf{s}, w, m) \triangleq \pi^{-1} \tilde{f}_N(\pi \mathbf{s}, w, m), \qquad \tilde{g}_N^\pi(\mathbf{y}, \mathbf{s}, w) \triangleq \tilde{g}_N(\pi \mathbf{y}, \pi \mathbf{s}, w) \qquad (2.6)$$
where $\pi$ is chosen uniformly from the set of all $N!$ permutations of the letters $\{1, 2, \cdots, N\}$ and is not revealed publicly. The sequence $\pi \mathbf{x}_m$ is obtained by applying $\pi$ to the elements of $\mathbf{x}_m$. The secret key is $V = (\pi, W)$, where $W$ is independent of $\pi$.

The second kind of randomization is uniform permutation of the $2^{NR}$ fingerprint assignments, to equalize error probabilities over all possible coalitions [7, 10].

¹ If $K_{\mathrm{nom}} = K$, our random coding scheme of Sec. 5 is capacity-achieving.

Definition 2.3 A randomly permuted (RP) fingerprinting code is a randomized fingerprinting code defined via permutations of a prototype $(\tilde{f}_N, \tilde{g}_N)$. The code is of the form
$$\mathbf{x}_m = \tilde{f}_N^\pi(\mathbf{s}, w, m) \triangleq \tilde{f}_N(\mathbf{s}, w, \pi^{-1}(m)), \qquad \tilde{g}_N^\pi(\mathbf{y}, \mathbf{s}, w) \triangleq \pi(\tilde{g}_N(\mathbf{y}, \mathbf{s}, w)) \qquad (2.7)$$
where $\pi$ is chosen uniformly from the set of all $2^{NR}!$ permutations of the user indices $\{1, 2, \cdots, 2^{NR}\}$ and is not revealed publicly. The secret key is $V = (\pi, W)$, where $W$ is independent of $\pi$. In (2.7), we have used the shorthand $\pi(\hat{\mathcal{K}}) \triangleq \{\pi(m),\ m \in \hat{\mathcal{K}}\}$.

The third kind of randomization arises via an auxiliary "time-sharing" random sequence. This strategy was not used in [4, 5, 10], but a remarkable example was developed by Tardos [9]. For binary alphabets $\mathcal{S}$, $\mathcal{X}$, and $\mathcal{Y}$, i.i.d. random variables $W_i \in (0, 1)$, $1 \le i \le N$, are generated, and next the fingerprint letters $X_i(m)$ are generated as independent Bernoulli($W_i$) random variables. Here $V = \{W_i,\ 1 \le i \le N\}$ is the secret key shared by encoder and decoder.
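The time-sharing idea can be sketched as follows. This is an illustrative Python snippet, not the scheme of [9]: Tardos draws each $W_i$ from a specific arcsine-like density, whereas this sketch uses a uniform draw purely for simplicity.

```python
import random

def time_sharing_fingerprints(num_users, N, rng):
    """Third kind of randomization: draw a secret time-sharing sequence
    W = (W_1, ..., W_N), then assign each user independent Bernoulli(W_i)
    fingerprint bits.  W plays the role of the secret key V shared by
    encoder and decoder."""
    W = [rng.random() for _ in range(N)]            # secret biases W_i
    X = {m: [1 if rng.random() < W[i] else 0 for i in range(N)]
         for m in range(num_users)}                 # X_i(m) ~ Bernoulli(W_i)
    return W, X

rng = random.Random(2011)                           # fixed seed for repeatability
W, X = time_sharing_fingerprints(num_users=4, N=16, rng=rng)
assert len(W) == 16 and all(0.0 <= w < 1.0 for w in W)
assert all(len(x) == 16 and set(x) <= {0, 1} for x in X.values())
```

Conditioned on $W$, users' fingerprints are i.i.d., yet across positions the shared biases correlate all users' marks, which is what gives the decoder statistical leverage against a coalition.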
Given an embedding distortion $D_1$ and a size-$K$ coalition using a collusion channel from class $\mathcal{W}_K$, there corresponds a capacity $C(D_1, \mathcal{W}_K)$, which is the supremum over $(f_N, g_N)$ of all achievable $R$, under a prescribed error criterion.

2.3 Collusion Channels

First we define some basic terminology for MACs with $K$ inputs, common input alphabet $\mathcal{X}$, and output alphabet $\mathcal{Y}$. Recall that $\mathcal{K} = \{1, 2, \cdots, K\}$ and let $X_{\mathcal{K}} = \{X_1, \cdots, X_K\}$. Given a conditional p.m.f. $p_{Y|X_{\mathcal{K}}}$, consider the permuted conditional p.m.f.
$$p_{Y|X_{\pi(\mathcal{K})}}(y \,|\, x_1, \cdots, x_K) \triangleq p_{Y|X_{\mathcal{K}}}(y \,|\, x_{\pi(1)}, \cdots, x_{\pi(K)}) \qquad (2.8)$$
where $\pi$ is any permutation of the $K$ inputs. We say that $p_{Y|X_{\mathcal{K}}}$ is permutation-invariant if $p_{Y|X_{\pi(\mathcal{K})}} = p_{Y|X_{\mathcal{K}}}$, $\forall \pi$. A subset $\mathcal{W}_K$ of $\mathcal{P}_{Y|X_{\mathcal{K}}}$ is said to be permutation-invariant if $p_{Y|X_{\mathcal{K}}} \in \mathcal{W}_K \Rightarrow p_{Y|X_{\pi(\mathcal{K})}} \in \mathcal{W}_K$, $\forall \pi$. In general, not all elements of such a $\mathcal{W}_K$ are permutation-invariant. The subset of a permutation-invariant $\mathcal{W}_K$ that consists of permutation-invariant conditional p.m.f.'s will be denoted by
$$\mathcal{W}_K^{\mathrm{fair}} = \left\{ p_{Y|X_{\mathcal{K}}} \in \mathcal{W}_K \,:\, p_{Y|X_{\pi(\mathcal{K})}} = p_{Y|X_{\mathcal{K}}},\ \forall \pi \right\}. \qquad (2.9)$$
Finally, if $\mathcal{W}_K$ is permutation-invariant and convex, the permutation-averaged conditional p.m.f. $\frac{1}{K!} \sum_\pi p_{Y|X_{\pi(\mathcal{K})}}$ is also in $\mathcal{W}_K$ and is permutation-invariant by construction.

In the fingerprinting problem, the conditional type $p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \in \mathcal{P}_{Y|X_{\mathcal{K}}}^{[N]}$ is a random variable whose conditional distribution given $\mathbf{x}_{\mathcal{K}}$ depends on the collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$. Our fidelity constraint on the coalition is of the general form
$$\Pr\left[\, p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \in \mathcal{W}_K(p_{\mathbf{x}_{\mathcal{K}}}) \,\right] = 1, \qquad (2.10)$$
where for each $p_{\mathbf{x}_{\mathcal{K}}}$, $\mathcal{W}_K(p_{\mathbf{x}_{\mathcal{K}}})$ is a convex, permutation-invariant subset of $\mathcal{P}_{Y|X_{\mathcal{K}}}$. That is, the empirical conditional p.m.f. of the pirated copy given the marked copies is restricted.
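The permutation-averaging step $\frac{1}{K!} \sum_\pi p_{Y|X_{\pi(\mathcal{K})}}$ can be sketched numerically. This is an illustrative Python snippet; representing the channel as a dictionary mapping input $K$-tuples to output p.m.f.'s is an encoding chosen only for this sketch.

```python
from itertools import permutations

def permutation_average(channel, K):
    """Average a K-input channel over all K! permutations of its inputs,
    producing the permutation-invariant p.m.f. (1/K!) * sum_pi p_{Y|X_pi(K)}."""
    perms = list(permutations(range(K)))
    outputs = {y for pmf in channel.values() for y in pmf}
    return {x: {y: sum(channel[tuple(x[i] for i in pi)].get(y, 0.0)
                       for pi in perms) / len(perms)
                for y in outputs}
            for x in channel}

# "Copy colluder 1" channel: p(y | x1, x2) = 1{y = x1}; not permutation-invariant.
X = (0, 1)
copy_first = {(x1, x2): {x1: 1.0} for x1 in X for x2 in X}
fair = permutation_average(copy_first, K=2)
assert fair[(0, 1)] == {0: 0.5, 1: 0.5}   # averaged: either colluder's copy, 50/50
assert fair[(1, 0)] == {0: 0.5, 1: 0.5}
```

Averaging the one-sided "copy colluder 1" channel yields the symmetric mixture that picks either colluder's sample with equal probability, illustrating how the averaged channel becomes permutation-invariant by construction.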
The choice of the feasible set $\mathcal{W}_K(p_{\mathbf{x}_{\mathcal{K}}})$ depends on the application, as elaborated below. The explicit dependency of $\mathcal{W}_K$ on $p_{\mathbf{x}_{\mathcal{K}}}$ will sometimes be omitted to simplify notation. Note that assuming $\mathcal{W}_K$ is permutation-invariant does not imply that the $p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}$ actually selected by the coalition is permutation-invariant. Finally, it is assumed that the set-valued mapping $\mathcal{W}_K(p)$ is defined for $p \in \mathcal{P}_{X_{\mathcal{K}}}$ and is uniformly continuous in the variational distance, in the sense that for every $\epsilon > 0$, there exists $\delta > 0$ such that
$$\forall p_{X_{\mathcal{K}}}, p'_{X_{\mathcal{K}}} \in \mathcal{P}_{X_{\mathcal{K}}}\ \mathrm{s.t.}\ d_V(p_{X_{\mathcal{K}}}, p'_{X_{\mathcal{K}}}) < \delta : \quad \max_{p_{Y|X_{\mathcal{K}}} \in \mathcal{W}_K(p_{X_{\mathcal{K}}})}\ \min_{p'_{Y|X_{\mathcal{K}}} \in \mathcal{W}_K(p'_{X_{\mathcal{K}}})} d_V\!\left(p_{Y|X_{\mathcal{K}}}\, p_{X_{\mathcal{K}}},\ p'_{Y|X_{\mathcal{K}}}\, p'_{X_{\mathcal{K}}}\right) < \epsilon. \qquad (2.11)$$

The model (2.10) can be used to impose hard distortion constraints on the coalition or to enforce the Boneh-Shaw marking assumption when $\mathcal{X} = \mathcal{Y}$.

1. Distortion Constraints. Consider the following variation on the constraints used in [3-5]. Define a permutation-invariant estimator $f : \mathcal{X}^K \to \mathcal{S}$ which produces an estimate $\hat{S} = f(X_{\mathcal{K}})$ of the host signal sample based on the corresponding marked samples.² The estimator could be, e.g., a maximum-likelihood estimator. Then
$$\mathcal{W}_K(p_{\mathbf{x}_{\mathcal{K}}}) = \left\{ p_{Y|X_{\mathcal{K}}} : \sum_{x_{\mathcal{K}}, y} p_{\mathbf{x}_{\mathcal{K}}}(x_{\mathcal{K}})\, p_{Y|X_{\mathcal{K}}}(y \,|\, x_{\mathcal{K}})\, d_2(f(x_{\mathcal{K}}), y) \le D_2 \right\} \qquad (2.12)$$
where $d_2 : \mathcal{S} \times \mathcal{Y} \to \mathbb{R}^+$ is the coalition's distortion function, and $D_2$ is the maximum allowed distortion. The constraint (2.10) may be equivalently written as
$$\Pr\left[\, d_2^N(f(\mathbf{x}_{\mathcal{K}}), \mathbf{y}) = \frac{1}{N} \sum_{t=1}^N d_2(f(\mathbf{x}_{\mathcal{K},t}), y_t) \le D_2 \,\right] = 1. \qquad (2.13)$$

2. Interleaving Attack. Here each colluder contributes $N/K$ samples, taken at arbitrary positions, to the forgery. The class $\mathcal{W}_K$ is a singleton:
$$p_{Y|X_{\mathcal{K}}}(y \,|\, x_{\mathcal{K}}) = \frac{1}{K} \sum_{k \in \mathcal{K}} \mathbb{1}\{y = x_k\}. \qquad (2.14)$$

3. Boneh-Shaw Marking Assumption. Assume $\mathcal{X} = \mathcal{Y}$ and $\mathcal{W}_K$ is the set of conditional p.m.f.'s that satisfy
$$x_1 = \cdots = x_K \ \Rightarrow\ y = x_1.$$
(2.15)
Then the constraint (2.10) enforces the Boneh-Shaw marking assumption: the colluders are not allowed to modify their samples at any location where these samples agree. Thus $y_t = x_{m_1,t}$ at any position $1 \le t \le N$ such that $x_{m_1,t} = \cdots = x_{m_K,t}$. Note that $\mathcal{W}_K$ does not depend on $p_{\mathbf{x}_{\mathcal{K}}}$ and that the interleaving attack (2.14) satisfies the Boneh-Shaw condition.

² A permutation-invariant estimator depends on the samples $\{X_k,\ k \in \mathcal{K}\}$ only via their empirical distribution on $\mathcal{X}$.

2.4 Strongly Exchangeable Collusion Channels

Recall the definition of RM codes in (2.6); a dual notion applies to collusion channels. For any $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ and permutation $\pi$ of $\{1, 2, \cdots, N\}$, define the permuted channel $p^\pi_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y} \,|\, \mathbf{x}_{\mathcal{K}}) \triangleq p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\pi \mathbf{y} \,|\, \pi \mathbf{x}_{\mathcal{K}})$. Then we have

Definition 2.4 [4] A strongly exchangeable collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ is a channel such that $p^\pi_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y} \,|\, \mathbf{x}_{\mathcal{K}})$ is independent of $\pi$, for every $(\mathbf{x}_{\mathcal{K}}, \mathbf{y})$.

A strongly exchangeable collusion channel is defined by a probability assignment $\Pr[T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}]$ on the conditional type classes. The distribution of $\mathbf{Y}$ conditioned on $\mathbf{Y} \in T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}$ is uniform:
$$p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\tilde{\mathbf{y}} \,|\, \mathbf{x}_{\mathcal{K}}) = \frac{\Pr[T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}]}{|T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}|}, \qquad \forall \tilde{\mathbf{y}} \in T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}. \qquad (2.16)$$

In Sec. 2.6 we show that for RM codes $(f_N, g_N)$, it is sufficient to consider strongly exchangeable collusion channels to derive worst-case error probabilities. Moreover, in the error probability calculations for random codes it will be sufficient to use the trivial upper bound
$$\Pr[T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}] \le \mathbb{1}\{\, p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \in \mathcal{W}_K(p_{\mathbf{x}_{\mathcal{K}}}) \,\}. \qquad (2.17)$$

2.5 Fair Coalitions

Two notions of fairness for coalitions will be useful. Denote by $\pi$ a permutation of $\{1, 2, \cdots, K\}$.

Definition 2.5 The collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ is permutation-invariant if
$$p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y} \,|\, \mathbf{x}_{m_1}, \cdots, \mathbf{x}_{m_K}) = p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y} \,|\, \mathbf{x}_{\pi(m_1)}, \cdots, \mathbf{x}_{\pi(m_K)}), \qquad \forall \pi.$$
(2.18)
For instance, if $\mathcal{X} = \mathcal{Y}$ and $K = 2$, the collusion channel
$$p_{\mathbf{Y}|\mathbf{X}_1 \mathbf{X}_2}(\mathbf{y} \,|\, \mathbf{x}_1, \mathbf{x}_2) = \frac{1}{2}\left[ \mathbb{1}\{\mathbf{y} = \mathbf{x}_1\} + \mathbb{1}\{\mathbf{y} = \mathbf{x}_2\} \right] \qquad (2.19)$$
is permutation-invariant. Given $\mathbf{x}_1, \mathbf{x}_2$, there are two equally likely choices for the pirated copy, namely $\mathbf{y} = \mathbf{x}_1$ and $\mathbf{y} = \mathbf{x}_2$. Note that one colluder carries full risk and the other one zero risk.

A stronger definition of fairness (which will not be needed in this paper) would require some kind of ergodic behavior of the inputs and output of the collusion channel.

Definition 2.6 The collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ is first-order fair if $\Pr[\, p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \in \mathcal{W}_K^{\mathrm{fair}}(p_{\mathbf{x}_{\mathcal{K}}}) \,] = 1$.

For any first-order fair collusion channel, the conditional type $p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}$ is invariant to permutations of the colluders, with probability 1. For instance, if $\mathcal{X} = \mathcal{Y}$ and $K = 2$, any collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ resulting in the conditional type $p_{\mathbf{y}|\mathbf{x}_1 \mathbf{x}_2}(y \,|\, x_1, x_2) = \frac{1}{2}[\mathbb{1}\{y = x_1\} + \mathbb{1}\{y = x_2\}]$ is first-order fair. This is an interleaving attack in which each colluder contributes exactly $N/2$ samples (in any order) to the pirated copy.

A first-order fair collusion channel is not necessarily permutation-invariant, and vice versa. Further, if a collusion channel is first-order fair and strongly exchangeable, then it is also permutation-invariant. However, the converse is not true. For instance, the collusion channel of (2.19) is permutation-invariant and strongly exchangeable but not first-order fair, because the conditional type $p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}(y \,|\, x_1, x_2)$ is given by either $\mathbb{1}\{y = x_1\}$ or $\mathbb{1}\{y = x_2\}$, neither of which is permutation-invariant.

2.6 Error Probabilities

Let $\mathcal{K}$ be the coalition and $\hat{\mathcal{K}} = g_N(\mathbf{Y}, \mathbf{S}, V)$ the decoder's output.
There are several error probabilities of interest: the probability of false positives (one or more innocent users are accused),
$$P_{FP}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \Pr[\hat{\mathcal{K}} \setminus \mathcal{K} \ne \emptyset]; \qquad (2.20)$$
the probability of missed detection for a specific coalition member $m \in \mathcal{K}$,
$$P_{e,m}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \Pr[m \notin \hat{\mathcal{K}}];$$
the probability of failing to catch a single colluder,
$$P_e^{one}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \Pr[\hat{\mathcal{K}} \cap \mathcal{K} = \emptyset]; \qquad (2.21)$$
and the probability of failing to catch the full coalition,
$$P_e^{all}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \Pr[\mathcal{K} \not\subseteq \hat{\mathcal{K}}]. \qquad (2.22)$$
The error criteria (2.21) and (2.22) will be referred to as the detect-one and detect-all criteria, respectively. The above error probabilities may be written in the explicit form
$$P_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \sum_{v, \mathbf{s}, \mathbf{x}_{\mathcal{K}}, \mathbf{y}} p_V(v)\, p_S^N(\mathbf{s}) \left( \prod_{m \in \mathcal{K}} \mathbb{1}\{\mathbf{x}_m = f_N(\mathbf{s}, v, m)\} \right) p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y} \,|\, \mathbf{x}_{\mathcal{K}})\, \mathbb{1}\{\mathcal{E}\} \qquad (2.23)$$
where the error event $\mathcal{E}$ is given by $\mathcal{E}_{FP} = \{g_N(\mathbf{y}, \mathbf{s}, v) \setminus \mathcal{K} \ne \emptyset\}$, or $\mathcal{E}_{one} = \{g_N(\mathbf{y}, \mathbf{s}, v) \cap \mathcal{K} = \emptyset\}$, or $\mathcal{E}_{all} = \{\mathcal{K} \not\subseteq g_N(\mathbf{y}, \mathbf{s}, v)\}$, when $P_e$ is given by (2.20), (2.21), and (2.22), respectively. The worst-case probability is given by
$$P_e(f_N, g_N, \mathcal{W}_K) = \max_{p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}} P_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}})$$
where the maximum is over all feasible collusion channels, i.e., such that (2.10) holds.

Maximum vs. average error probability. The error probabilities (2.20)-(2.22) generally depend on $\mathcal{K}$. Prop. 2.1 below states that (a) in order to make them independent of $\mathcal{K}$ and provide guarantees on error probability for any coalition, one may use RP codes, and (b) random permutations of fingerprint assignments cannot increase the average error probability of any code. Let $(\tilde{f}_N, \tilde{g}_N)$ be an arbitrary code and $(f_N, g_N)$ the RP code of (2.7), obtained using $(\tilde{f}_N, \tilde{g}_N)$ as a prototype. Let $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ be an arbitrary collusion channel when coalition $\mathcal{K}$ is in effect.
Given any other coalition $\mathcal{K}' = \pi(\mathcal{K})$ of the same size, let $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}'}}$ be the corresponding collusion channel, obtained by applying (2.8), where $\pi$ is now a permutation of $\{1, \cdots, 2^{NR}\}$.

Proposition 2.1 For any code $(\tilde{f}_N, \tilde{g}_N)$ and collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$, we have
$$\forall \mathcal{K}' : \quad P_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}'}}) = P_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) \le \max_{\mathcal{K}'} P_e(\tilde{f}_N, \tilde{g}_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}'}}) \qquad (2.24)$$
where $(f_N, g_N)$ is the RP code of (2.7), and $P_e$ denotes any of the error probability criteria (2.20), (2.21), and (2.22).

Proof. First consider the detect-one error criterion of (2.21): an error arises if $g_N(\mathbf{Y}, \mathbf{S}, V) \cap \mathcal{K} = \emptyset$. Given a RP fingerprinting code with prototype $(\tilde{f}_N, \tilde{g}_N)$ and permutation parameter $\pi$, the detect-one error probability when coalition $\mathcal{K}$ is in effect is given by
$$\begin{aligned}
P_e^{one}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}})
&= \Pr[g_N(\mathbf{Y}, \mathbf{S}, V) \cap \mathcal{K} = \emptyset] \\
&= \Pr[\tilde{g}_N^\pi(\mathbf{Y}, \mathbf{S}, W) \cap \mathcal{K} = \emptyset] \\
&= \Pr[\pi(\tilde{g}_N(\mathbf{Y}, \mathbf{S}, W)) \cap \mathcal{K} = \emptyset] \\
&= \Pr[\tilde{g}_N(\mathbf{Y}, \mathbf{S}, W) \cap \pi^{-1}(\mathcal{K}) = \emptyset] \\
&= \mathbb{E}_{\mathbf{Y}, \mathbf{S}, W}\, \underbrace{\frac{1}{2^{NR}!} \sum_\pi \mathbb{1}\{\tilde{g}_N(\mathbf{Y}, \mathbf{S}, W) \cap \pi^{-1}(\mathcal{K}) = \emptyset\}}_{\text{independent of } \mathcal{K}}
\end{aligned} \qquad (2.25)$$
which is independent of $\mathcal{K}$, by virtue of the uniform distribution on $\pi$. The derivation for the detect-all and the false-positive error probabilities is analogous to (2.25). This establishes the first equality in (2.24). The inequality is proved similarly. ∎

False Positives vs. False Negatives. The tradeoff between false positives and false negatives is central to statistical detection theory (the Neyman-Pearson problem) and list decoding [15]. Note that in the classical formulation of list decoding [16, p. 166], an error is declared only if the message sent does not appear on the decoder's output list. The false-negative error exponent increases with list size and approaches the sphere-packing exponent if the list size is allowed to grow subexponentially with $N$.
This classical formulation does not include a cost for "false positives."

2.7 Strongly Exchangeable Collusion Channels

Prop. 2.2 below states that randomly modulated codes (Def. 2.2) and strongly exchangeable channels (Def. 2.4) satisfy a certain equilibrium property: neither the fingerprint embedder nor the coalition has an interest in deviating from those strategies. Let $(\tilde{f}_N, \tilde{g}_N)$ be an arbitrary code and $(f_N, g_N)$ the RM code of (2.6), obtained using $(\tilde{f}_N, \tilde{g}_N)$ as a prototype. Given any feasible collusion channel $p_{Y|X_\mathcal{K}}$, denote by
$$\bar{p}_{Y|X_\mathcal{K}}(\mathbf{y}|\mathbf{x}_\mathcal{K}) = \frac{1}{N!} \sum_\pi p_{Y|X_\mathcal{K}}(\pi\mathbf{y} \,|\, \pi\mathbf{x}_\mathcal{K}) \quad (2.26)$$
the permutation-averaged channel, which is feasible and strongly exchangeable.

Proposition 2.2 For any code $(\tilde{f}_N, \tilde{g}_N)$ and collusion channel $p_{Y|X_\mathcal{K}}$, we have
$$P_e(f_N, g_N, p_{Y|X_\mathcal{K}}) = P_e(f_N, g_N, \bar{p}_{Y|X_\mathcal{K}}) = P_e(\tilde{f}_N, \tilde{g}_N, \bar{p}_{Y|X_\mathcal{K}}) \le \max_\pi P_e(\tilde{f}_N, \tilde{g}_N, p^\pi_{Y|X_\mathcal{K}}) \quad (2.27)$$
where $(f_N, g_N)$ is the RM code of (2.6) and $P_e$ denotes any of the error probability criteria (2.20), (2.21), and (2.22).

Proof. First consider the detect-one error criterion of (2.21): an error arises if $\tilde{g}_N(\mathbf{Y}, \mathbf{S}, V) \cap \mathcal{K} = \emptyset$. For any fixed $\mathcal{K}$, the detect-one error probability is an average over all possible permutations $\pi$ and the other random variables $V, \mathbf{S}, \mathbf{Y}$:
$$P_e^{one}(f_N, g_N, p_{Y|X_\mathcal{K}}) \stackrel{(a)}{=} \frac{1}{N!} \sum_\pi \sum_{w, \mathbf{s}, \mathbf{x}_\mathcal{K}, \mathbf{y}} p_W(w)\, p_S^N(\mathbf{s}) \left( \prod_{m \in \mathcal{K}} \mathbb{1}\{\pi\mathbf{x}_m = \tilde{f}_N(\pi\mathbf{s}, w, m)\} \right) p_{Y|X_\mathcal{K}}(\mathbf{y}|\mathbf{x}_\mathcal{K})\, \mathbb{1}\{\tilde{g}_N(\pi\mathbf{y}, \pi\mathbf{s}, w) \cap \mathcal{K} = \emptyset\}$$
$$\stackrel{(b)}{=} \frac{1}{N!} \sum_\pi \sum_{w, \mathbf{s}, \mathbf{x}_\mathcal{K}, \mathbf{y}} p_W(w)\, p_S^N(\pi^{-1}\mathbf{s}) \left( \prod_{m \in \mathcal{K}} \mathbb{1}\{\mathbf{x}_m = \tilde{f}_N(\mathbf{s}, w, m)\} \right) p_{Y|X_\mathcal{K}}(\pi^{-1}\mathbf{y}|\pi^{-1}\mathbf{x}_\mathcal{K})\, \mathbb{1}\{\tilde{g}_N(\mathbf{y}, \mathbf{s}, w) \cap \mathcal{K} = \emptyset\}$$
$$\stackrel{(c)}{=} \sum_{w, \mathbf{s}, \mathbf{x}_\mathcal{K}, \mathbf{y}} p_W(w)\, p_S^N(\mathbf{s}) \left( \prod_{m \in \mathcal{K}} \mathbb{1}\{\mathbf{x}_m = \tilde{f}_N(\mathbf{s}, w, m)\} \right) \left( \frac{1}{N!} \sum_\pi p_{Y|X_\mathcal{K}}(\pi^{-1}\mathbf{y}|\pi^{-1}\mathbf{x}_\mathcal{K}) \right) \mathbb{1}\{\tilde{g}_N(\mathbf{y}, \mathbf{s}, w) \cap \mathcal{K} = \emptyset\}$$
$$= \sum_{w, \mathbf{s}, \mathbf{x}_\mathcal{K}, \mathbf{y}} p_W(w)\, p_S^N(\mathbf{s}) \left( \prod_{m \in \mathcal{K}} \mathbb{1}\{\mathbf{x}_m = \tilde{f}_N(\mathbf{s}, w, m)\} \right) \left( \frac{1}{N!} \sum_\pi p^\pi_{Y|X_\mathcal{K}}(\mathbf{y}|\mathbf{x}_\mathcal{K}) \right) \mathbb{1}\{\tilde{g}_N(\mathbf{y}, \mathbf{s}, w) \cap \mathcal{K} = \emptyset\}$$
$$= \sum_{w, \mathbf{s}, \mathbf{x}_\mathcal{K}, \mathbf{y}} p_W(w)\, p_S^N(\mathbf{s}) \left( \prod_{m \in \mathcal{K}} \mathbb{1}\{\mathbf{x}_m = \tilde{f}_N(\mathbf{s}, w, m)\} \right) \bar{p}_{Y|X_\mathcal{K}}(\mathbf{y}|\mathbf{x}_\mathcal{K})\, \mathbb{1}\{\tilde{g}_N(\mathbf{y}, \mathbf{s}, w) \cap \mathcal{K} = \emptyset\} = P_e^{one}(f_N, g_N, \bar{p}_{Y|X_\mathcal{K}}) \quad (2.28)$$
where (a) holds by definition of the RM code, (b) is obtained by applying the change of variables $\mathbf{z} \leftarrow \pi\mathbf{z}$ to the sequences $\mathbf{s}, \mathbf{x}_\mathcal{K}, \mathbf{y}$, and (c) uses the fact that $p_S^N(\mathbf{s}) = p_S^N(\pi\mathbf{s})$. The derivation for the detect-all and the false-positive error probabilities is analogous to (2.28). This establishes the first equality in (2.27). The second equality and the inequality are proved similarly. ✷

2.8 Risk for Fair Coalitions

The maximum and the minimum of the error probabilities $P_{e,m}(\mathcal{K})$, $m \in \mathcal{K}$, will be useful. The maximum value,
$$\overline{P}_e(f_N, g_N, p_{Y|X_\mathcal{K}}) = \max_{m \in \mathcal{K}} P_{e,m}(f_N, g_N, p_{Y|X_\mathcal{K}}), \quad (2.29)$$
is the conventional error criterion for information transmission. However, the minimum value,
$$\underline{P}_e(f_N, g_N, p_{Y|X_\mathcal{K}}) = \min_{m \in \mathcal{K}} P_{e,m}(f_N, g_N, p_{Y|X_\mathcal{K}}), \quad (2.30)$$
is more relevant to the coalition because it represents the risk of its most vulnerable member. Note that
$$P_e^{one}(f_N, g_N, p_{Y|X_\mathcal{K}}) \le \underline{P}_e(f_N, g_N, p_{Y|X_\mathcal{K}}) \le \overline{P}_e(f_N, g_N, p_{Y|X_\mathcal{K}}) \le P_e^{all}(f_N, g_N, p_{Y|X_\mathcal{K}}).$$
While it is conceivable that some colluders could be tricked or coerced into taking a higher risk than others, such a strategy is not secure because the whole coalition would be at risk if some of its members, especially the vulnerable ones, are caught. The proof of the following proposition is elementary.

Proposition 2.3 For randomly permuted codes (Def.
2.3), if the collusion channel $p_{Y|X_\mathcal{K}}$ is permutation-invariant, then all colluders incur the same risk:
$$\underline{P}_e(f_N, g_N, p_{Y|X_\mathcal{K}}) = \overline{P}_e(f_N, g_N, p_{Y|X_\mathcal{K}}).$$
The proof of the following proposition is omitted because it is similar to that of Prop. 2.2. Assuming the fingerprint distributor uses RP codes, it follows from Prop. 2.4 that permutation-invariant collusion channels are optimal for the colluders under the detect-one error criterion.

Proposition 2.4 For randomly permuted codes, the maximum of the error probability criteria (2.20) and (2.21) is achieved by a permutation-invariant collusion channel ((2.18)) under the detect-one criterion.

Taken together with Prop. 2.1 on the optimality of randomly permuted fingerprinting codes, Prop. 2.4 implies an equilibrium property: neither the fingerprint embedder nor the coalition has an interest in deviating from these symmetric strategies under the detect-one criterion.

2.9 Capacity

Having defined the detect-one and detect-all error criteria and feasible classes of codes and collusion channels, we may now define the corresponding notions of fingerprinting capacity.

Definition 2.7 A rate $R$ is achievable for embedding distortion $D_1$, collusion class $\mathcal{W}_\mathcal{K}$, and the detect-one criterion if there exists a sequence of $(N, \lceil 2^{NR} \rceil)$ randomized codes $(f_N, g_N)$ with maximum embedding distortion $D_1$, such that both $P_e^{one}(f_N, g_N, \mathcal{W}_\mathcal{K})$ and $P^{FP}(f_N, g_N, \mathcal{W}_\mathcal{K})$ vanish as $N \to \infty$.

Definition 2.8 A rate $R$ is achievable for embedding distortion $D_1$, collusion class $\mathcal{W}_\mathcal{K}$, and the detect-all criterion if there exists a sequence of $(N, \lceil 2^{NR} \rceil)$ randomized codes $(f_N, g_N)$ with maximum embedding distortion $D_1$, such that both $P_e^{all}(f_N, g_N, \mathcal{W}_\mathcal{K})$ and $P^{FP}(f_N, g_N, \mathcal{W}_\mathcal{K})$ vanish as $N \to \infty$.
Definition 2.9 Fingerprinting capacities $C^{one}(D_1, \mathcal{W}_\mathcal{K})$ and $C^{all}(D_1, \mathcal{W}_\mathcal{K})$ are the suprema of all achievable rates with respect to the detect-one and detect-all criteria, respectively.

We have $C^{all}(D_1, \mathcal{W}_\mathcal{K}) \le C^{one}(D_1, \mathcal{W}_\mathcal{K})$ because an error event for the detect-one problem is also an error event for the detect-all problem.

2.10 Random-Coding Exponents

For a sequence of randomized codes $(f_N, g_N)$, the error exponents are defined as
$$E(R, D_1, \mathcal{W}_\mathcal{K}) = \liminf_{N \to \infty} -\frac{1}{N} \log P_e(f_N, g_N, \mathcal{W}_\mathcal{K})$$
where $E$ represents the random-coding exponent $E^{FP}$, $E^{one}$, or $E^{all}$. Moreover, $E^{all}(R, D_1, \mathcal{W}_\mathcal{K}) \le E^{one}(R, D_1, \mathcal{W}_\mathcal{K})$ because an error event for the detect-one problem is also an error event for the detect-all problem. We have $E^{all} = 0$ if the class $\mathcal{W}_\mathcal{K}$ includes channels in which one colluder can "stay out," i.e., not contribute to the pirated copy.

Fig. 2 gives a preview of $E^{one}$ and $E^{FP}$ for our random-coding scheme, viewed as functions of the number $K$ of colluders. The false-positive exponent $E^{FP}$ is equal to $\Delta$, for any value of $K$. The false-negative exponent $E^{one}$ decreases with $K$, up to some maximum value $K_{R,\Delta}$ where it becomes zero. The decoder outputs $\hat{\mathcal{K}} = \emptyset$ with high probability, and therefore reliable decoding of any colluder is impossible, for any $K \ge K_{R,\Delta}$. Fig. 3 illustrates the maximum rate $R(K, \Delta)$ that can be accommodated by the random-coding scheme, for fixed $\Delta$. This rate decreases with $K$ and becomes zero for $K \ge K_\Delta$. If $\Delta \downarrow 0$, the rate curve $R(K, \Delta)$ tends to the capacity function $C(K)$. Note that $C(K)$ vanishes as $K \to \infty$ but is generally positive for any finite $K$; in this case, $\lim_{\Delta \to 0} K_\Delta = \infty$.

Figure 2: False-positive and false-negative error exponents, as a function of coalition size $K$, for fixed values of $R$ and $\Delta$.
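To make the $\liminf$ definition of the error exponent above concrete, the following sketch (not from the paper) uses a toy setting in which $P_e$ is computable exactly: a length-$N$ majority vote over a binary symmetric channel with crossover probability $p$, whose limiting exponent is the Chernoff/Sanov divergence $D(\tfrac{1}{2}\|p)$ in bits. The empirical quantity $-\frac{1}{N}\log_2 P_e$ approaches this limit as $N$ grows.

```python
# Sketch (not from the paper): the exponent definition
#   E = liminf_N -(1/N) log2 P_e
# evaluated in a toy example with a closed-form error probability:
# a length-N majority vote over a BSC with crossover probability p.
from math import comb, log2

def majority_error(N, p):
    """Exact P[Binomial(N, p) >= N/2] -- the majority-vote error probability."""
    return sum(comb(N, k) * p**k * (1 - p)**(N - k)
               for k in range((N + 1) // 2, N + 1))

def empirical_exponent(N, p):
    """-(1/N) log2 P_e for blocklength N."""
    return -log2(majority_error(N, p)) / N

def binary_divergence(a, p):
    """D(a || p) in bits: the limiting exponent for threshold a."""
    return a * log2(a / p) + (1 - a) * log2((1 - a) / (1 - p))

p = 0.1
print(empirical_exponent(400, p), binary_divergence(0.5, p))
```

By the Chernoff bound, the empirical exponent always lies above the limit $D(\tfrac{1}{2}\|p)$ and converges to it at rate $O(\log N / N)$.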
Figure 3: Capacity $C$ and achievable rate $R$ (for false-positive error exponent equal to $\Delta$), as a function of coalition size $K$.

2.11 Memoryless Collusion Channels

As an alternative to collusion channels subject to the hard constraint $\Pr[p_{\mathbf{y}|\mathbf{x}_\mathcal{K}} \in \mathcal{W}_\mathcal{K}(p_{\mathbf{x}_\mathcal{K}})] = 1$, we may consider memoryless collusion channels:
$$p_{Y|X_\mathcal{K}}(\mathbf{y}|\mathbf{x}_\mathcal{K}) = \prod_{t=1}^N p_{Y|X_\mathcal{K}}(y_t \,|\, x_{\mathcal{K},t}) \quad (2.31)$$
where $p_{Y|X_\mathcal{K}} \in \mathcal{W}_\mathcal{K}(p_{\mathbf{x}_\mathcal{K}})$, viewed as a compound class of channels [12]. As we shall see, there is a strong link between the two problems in the form of Lemma 3.3, which is used to establish our converse theorems; also see Sec. 6.

3 Fingerprinting Capacity

This section presents fingerprinting capacity formulas under the detect-one and detect-all error criteria. To put these results in context, let us first recall related results for MACs. In the absence of side information, the capacity region of the MAC was determined by Ahlswede [17] and Liao [18]. This region is also achievable for the random MAC [11]. For the MAC with common side information at the transmitter and receiver, some very general capacity formulas were derived by Das and Narayan [19] under the assumption that $S$ is an ergodic process. In some special cases these formulas can be single-letterized. For fingerprinting with i.i.d. $S$ and coalition size equal to 2, bounds on capacity were derived in [4, 5]. Thus the presence of the side information $S$ causes difficulties in deriving single-letter capacity formulas for both the MAC and fingerprinting problems.

The proof of the converse under the detect-all criterion is based on the standard Fano inequality. Surprisingly, Fano's inequality does not seem to be the right tool to prove the converse under the detect-one criterion [20].
A strong converse was presented in [10], but the resulting upper bound on capacity is loose. The direction we have pursued is based on explicit sphere-packing arguments, specifically the fact that typical sets for $\mathbf{Y}$ given the embedded fingerprints cannot have too much statistical overlap, otherwise reliable decoding is impossible. The tools used here are different from those used for classical problems such as the single-user discrete memoryless channel [16, pp. 173-176] and the MAC [21]. The use of a detect-one criterion requires a different machinery. A simple technique is used to deal with codeword pairs whose self-information score is well above average, and suffices to show that the error probability cannot vanish for rates above capacity. We conjecture that a strong converse holds, namely: for any rate above capacity,
$$\lim_{N \to \infty} \min_{f_N, g_N} \max\{ P_e^{one}(f_N, g_N, \mathcal{W}_\mathcal{K}),\, P^{FP}(f_N, g_N, \mathcal{W}_\mathcal{K}) \} = 1.$$
However, establishing this stronger result may require the use of elaborate wringing techniques [21]. Our lower bound on error probability does not tend to 1 as $N \to \infty$ because the bound (8.50) is likely loose.

3.1 Mutual-Information Games

The following lemma relates to Han's inequalities [22] and will be useful throughout this paper. Its proof appears in Appendix A.

Lemma 3.1 Let $\mathcal{K} = \{1, 2, \cdots, K\}$ and assume the distribution of $(X_\mathcal{K}, Z)$ is invariant to permutations of $\mathcal{K}$. Then for any nested sets $\mathcal{A} \subseteq \mathcal{B} \subseteq \mathcal{K}$, we have
$$\frac{1}{|\mathcal{A}|} H(X_\mathcal{A} \,|\, Z X_{\mathcal{K} \setminus \mathcal{A}}) \le \frac{1}{|\mathcal{B}|} H(X_\mathcal{B} \,|\, Z X_{\mathcal{K} \setminus \mathcal{B}}), \quad (3.1)$$
$$\frac{1}{|\mathcal{A}|} H(X_\mathcal{A} \,|\, Z) \ge \frac{1}{|\mathcal{B}|} H(X_\mathcal{B} \,|\, Z). \quad (3.2)$$
Both inequalities hold with equality if $X_k$, $k \in \mathcal{K}$, are conditionally independent given $Z$.

We will derive two simple formulas by application of this lemma.
First, applying (3.1) with $Z = (Y, S, W)$ and (3.2) with $Z = (S, W)$ and subtracting the first inequality from the second, we obtain
$$\frac{1}{|\mathcal{A}|} I(X_\mathcal{A}; Y X_{\mathcal{K} \setminus \mathcal{A}} \,|\, S W) \ge \frac{1}{|\mathcal{B}|} I(X_\mathcal{B}; Y X_{\mathcal{K} \setminus \mathcal{B}} \,|\, S W), \quad \forall \mathcal{A} \subseteq \mathcal{B} \subseteq \mathcal{K} \quad (3.3)$$
with equality if $X_k$, $k \in \mathcal{K}$, are conditionally independent given $Z$. Second, for $X_k$, $k \in \mathcal{K}$, conditionally i.i.d. given $(S, W)$, we have
$$I(X_1; Y | S, W) = H(X_1 | S, W) - H(X_1 | Y, S, W) = \frac{1}{K} H(X_\mathcal{K} | S, W) - H(X_1 | Y, S, W) \le \frac{1}{K} H(X_\mathcal{K} | S, W) - \frac{1}{K} H(X_\mathcal{K} | Y, S, W) = \frac{1}{K} I(X_\mathcal{K}; Y | S, W) \quad (3.4)$$
where the inequality follows from (3.2) with $Z = (Y, S, W)$.

Now consider an auxiliary random variable $W$ defined over an alphabet $\mathcal{W} = \{1, 2, \cdots, L\}$ and independent of $S$. Define the set of conditional p.m.f.'s
$$\mathscr{P}_{X_\mathcal{K} W | S}(p_S, L, D_1) \triangleq \left\{ p_{X_\mathcal{K} W | S} = p_W \prod_{k \in \mathcal{K}} p_{X_k | S W} \,:\, p_{X_1 | S W} = \cdots = p_{X_K | S W},\ E\, d(S, X_1) \le D_1 \right\} \quad (3.5)$$
and the functions
$$C_L^{one}(D_1, \mathcal{W}_\mathcal{K}) = \max_{p_{X_\mathcal{K} W | S} \in \mathscr{P}_{X_\mathcal{K} W | S}(p_S, L, D_1)} \ \min_{p_{Y | X_\mathcal{K}} \in \mathcal{W}_\mathcal{K}^{fair}(p_{X_\mathcal{K}})} \ \frac{1}{K} I(X_\mathcal{K}; Y | S, W) \quad (3.6)$$
$$C_L^{all}(D_1, \mathcal{W}_\mathcal{K}) = \max_{p_{X_\mathcal{K} W | S} \in \mathscr{P}_{X_\mathcal{K} W | S}(p_S, L, D_1)} \ \min_{p_{Y | X_\mathcal{K}} \in \mathcal{W}_\mathcal{K}(p_{X_\mathcal{K}})} \ \min_{\mathcal{A} \subseteq \mathcal{K}} \ \frac{1}{|\mathcal{A}|} I(X_\mathcal{A}; Y | S, X_{\mathcal{K} \setminus \mathcal{A}}, W). \quad (3.7)$$
Using the same derivation as in Lemma 2.1 of [14], it is easily shown that $C_L^{one}(D_1, \mathcal{W}_\mathcal{K})$ and $C_L^{all}(D_1, \mathcal{W}_\mathcal{K})$ are nondecreasing functions of $L$ and converge to finite limits:
$$\tilde{C}^{one}(D_1, \mathcal{W}_\mathcal{K}) \triangleq \lim_{L \to \infty} C_L^{one}(D_1, \mathcal{W}_\mathcal{K}) \quad (3.8)$$
$$\tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K}) \triangleq \lim_{L \to \infty} C_L^{all}(D_1, \mathcal{W}_\mathcal{K}). \quad (3.9)$$
Moreover, the gap to each limit may be bounded by a polynomial function of $L$; see [14, Sec. 3.5] for a similar derivation. The basic idea is to discretize each $\mathcal{W}_\mathcal{K}(p_{X_\mathcal{K}})$ to a fine grid of $\tilde{L}$ collusion channels. By application of Caratheodory's theorem, the supremum of $C_L$ over $L$ is achieved by $L \le |\mathcal{S}||\mathcal{X}| + \tilde{L}$.
The gap between the minimum of the cost function over $\mathcal{W}_\mathcal{K}(p_{X_\mathcal{K}})$ and over its discrete approximation can be bounded by $c\, \tilde{L}^{-|\mathcal{Y}|^{-1}|\mathcal{X}|^{-K}}$, where $c$ is a constant. Since $\mathcal{W}_\mathcal{K}^{fair}(p_{X_\mathcal{K}}) \subseteq \mathcal{W}_\mathcal{K}(p_{X_\mathcal{K}})$, we have $\tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K}^{fair}) \ge \tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K})$. In fact the right side is zero if $\mathcal{W}_\mathcal{K}(p_{X_\mathcal{K}})$ contains conditional p.m.f.'s $p_{Y|X_\mathcal{K}}$ such that $Y$ is independent of one of the inputs $X_k$, $k \in \mathcal{K}$.

Lemma 3.2 For any $D_1$ and $\mathcal{W}_\mathcal{K}$ we have
$$\tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K}) \le \tilde{C}^{one}(D_1, \mathcal{W}_\mathcal{K}). \quad (3.10)$$
Equality holds for any class of fair collusion channels ($\mathcal{W}_\mathcal{K} = \mathcal{W}_\mathcal{K}^{fair}$).

Proof: Property (3.10) follows from (3.6)-(3.9) and the fact that $\mathcal{W}_\mathcal{K}^{fair}(p_{X_\mathcal{K}}) \subseteq \mathcal{W}_\mathcal{K}(p_{X_\mathcal{K}})$. Now consider $\mathcal{W}_\mathcal{K} = \mathcal{W}_\mathcal{K}^{fair}$. Application of Property (3.3) to any fair collusion channel yields
$$\frac{1}{|\mathcal{K}|} I(X_\mathcal{K}; Y | S, W) \le \frac{1}{|\mathcal{A}|} I(X_\mathcal{A}; Y | X_{\mathcal{K} \setminus \mathcal{A}}, S, W), \quad \forall \mathcal{A} \subseteq \mathcal{K}.$$
Hence the inner minimum in (3.7) is achieved by $\mathcal{A} = \mathcal{K}$, and equality holds in (3.10). ✷

3.2 Capacity Theorems

The following lemma will be used to prove Theorems 3.5 and 3.7 below. Its proof is given in Appendix B and borrows ideas from [14, Theorem 3.7].

Lemma 3.3 Consider the compound family $\mathcal{W}_\mathcal{K}(p_{\mathbf{x}_\mathcal{K}})$ of memoryless channels in (2.31). Under both the detect-one and detect-all criteria, the compound capacity for this problem is an upper bound on the capacity for the main problem of (2.10), in which $p_{\mathbf{y}|\mathbf{x}_\mathcal{K}} \in \mathcal{W}_\mathcal{K}(p_{\mathbf{x}_\mathcal{K}})$ with probability 1.

We now give a direct coding theorem 3.4 and two converse theorems 3.5 and 3.7 pertaining to the detect-all and the detect-one criteria, respectively. These theorems, combined with Lemma 3.2, establish the capacity theorem 3.8.

Theorem 3.4 Under the continuity assumption (2.11), all fingerprinting code rates below $\tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K})$ and $\tilde{C}^{one}(D_1, \mathcal{W}_\mathcal{K})$ are achievable under the detect-all and the detect-one criteria, respectively.
Theorem 3.4 is a direct consequence of Theorem 5.2(vi), stated and proved later in this paper.

Theorem 3.5 When $\mathcal{W}_\mathcal{K}$ is independent of $p_{\mathbf{x}_\mathcal{K}}$, no fingerprinting code rate $R$ exceeding $\tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K})$ is achievable under the detect-all criterion. The same holds for the compound memoryless class of (2.31).

Corollary 3.6 Under the continuity assumption (2.11), when $\mathcal{W}_\mathcal{K}$ depends on $p_{\mathbf{x}_\mathcal{K}}$, the following holds. If the colluders are constrained to select a fair collusion channel, then $\mathcal{W}_\mathcal{K}(p_{\mathbf{x}_\mathcal{K}}) = \mathcal{W}_\mathcal{K}^{fair}(p_{\mathbf{x}_\mathcal{K}})$, and no rate above $\tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K}^{fair})$ is achievable under the detect-all criterion.

The proof of Theorem 3.5 and Corollary 3.6 is given in Sec. 7.

Theorem 3.7 When $\mathcal{W}_\mathcal{K}$ is independent of $p_{\mathbf{x}_\mathcal{K}}$, no fingerprinting code rate $R$ exceeding
$$\tilde{C}^{one}(D_1, \mathcal{W}_\mathcal{K}^{fair}) = \tilde{C}^{one}(D_1, \mathcal{W}_\mathcal{K}) \quad (3.11)$$
is achievable under the detect-one criterion. The same holds for the compound memoryless class of (2.31).

The proof of Theorem 3.7 is given in Sec. 8.

Theorem 3.8 Consider fingerprinting for coalitions of size at most $K$. Let $\mathcal{W}_\mathcal{K}$ be the set of all conditional distributions $p_{Y|X_\mathcal{K}}$ (collusion attacks) that can be selected by the coalition.
(a) Detect-all case. Fingerprinting capacity is lower-bounded by $\tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K})$ given by (3.9). If in addition one of the following holds:
(i) the set $\mathcal{W}_\mathcal{K}$ of attacks available to the coalition is independent of the joint type of the fingerprints $p_{\mathbf{x}_\mathcal{K}}$ assigned to the coalition; or
(ii) for every $p_{\mathbf{x}_\mathcal{K}}$, the set $\mathcal{W}_\mathcal{K}(p_{\mathbf{x}_\mathcal{K}})$ of attacks given the joint type $p_{\mathbf{x}_\mathcal{K}}$ contains only permutation-invariant attacks ($\mathcal{W}_\mathcal{K}(p_{\mathbf{x}_\mathcal{K}}) = \mathcal{W}_\mathcal{K}^{fair}(p_{\mathbf{x}_\mathcal{K}})$),
then fingerprinting capacity under the detect-all criterion is equal to $\tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K})$.
(b) Detect-one case. Fingerprinting capacity is lower-bounded by $\tilde{C}^{one}(D_1, \mathcal{W}_\mathcal{K})$ given by (3.8).
If in addition the set $\mathcal{W}_\mathcal{K}$ of attacks available to the coalition is independent of the joint type of the fingerprints $p_{\mathbf{x}_\mathcal{K}}$ assigned to the coalition, then $\tilde{C}^{one}(D_1, \mathcal{W}_\mathcal{K}) = \tilde{C}^{one}(D_1, \mathcal{W}_\mathcal{K}^{fair}) = \tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K}^{fair})$, and fingerprinting capacity under the detect-one criterion is equal to this common value.

The lower bounds on fingerprinting capacity derived in [4, 5] are of the form (3.6) with $L = 1$, i.e., the auxiliary random variable $W$ is degenerate. Since the payoff function $I_{p_S p_{X|S}^K p_{Y|X_\mathcal{K}}}(X_\mathcal{K}; Y | S)$ is generally nonconcave with respect to $p_{X|S}$, a randomized strategy in which the variable $p_{X|S}$ is randomized will generally outperform a deterministic strategy in which $p_{X|S}$ is fixed. The auxiliary random variable $W$ plays the role of a selector of $p_{X|S}$ in this mutual-information game. Apparently the benefits of this randomization can be dramatic for large $K$. For the Boneh-Shaw problem, the value of the maxmin of (3.6) with $L = 1$ is $C_1^{one}(D_1, \mathcal{W}_\mathcal{K}) = K^{-1} 2^{-(K-1)}$. However, Tardos' scheme [9] uses $\mathcal{W} = [0, 1]$ and achieves a rate $O(K^{-2})$, which is therefore much larger than $C_1^{one}(D_1, \mathcal{W}_\mathcal{K})$ for large $K$. The rate of his code is necessarily a lower bound on $C^{one}(D_1, \mathcal{W}_\mathcal{K})$.

4 Simple Fingerprint Decoder

This section introduces our random-coding scheme and a simple decoder that tests candidate fingerprints one by one. This decoder is closely related to the correlation decoders that have been used in Tardos' paper [9] and in the signal processing literature. (Such decoders evaluate a measure of correlation between the received sequence and the individual fingerprints, and retain the fingerprints whose correlation score is above a certain threshold.) We derive error exponents for this scheme and establish maximum rates for reliable decoding.
These rates fall short of the fingerprinting capacities $C^{all}(D_1, \mathcal{W}_\mathcal{K})$ and $C^{one}(D_1, \mathcal{W}_\mathcal{K})$ given by Theorems 3.5 and 3.7. The derivations are given for the case without side information ($S = \emptyset$) or a distortion constraint ($D_1$) for the fingerprint distributor. This setup is directly applicable to the Boneh-Shaw model, and the derivations are much easier to follow. This setup also contains several key ingredients of the error analysis for the more elaborate joint fingerprint decoder of Sec. 5. In particular, the false-negative error exponents are determined by the worst conditional type $T_{\mathbf{y}\mathbf{x}_\mathcal{K}|\mathbf{w}}$.

4.1 Codebook

The scheme is designed to achieve a false-positive error exponent equal to $\Delta$ and assumes a nominal value $K_{nom}$ for the coalition size. (Reliable decoding will generally be possible for $K > K_{nom}$, though.) These parameters are used to identify a joint type class $T^*_{wx}$ defined below (9.4). An arbitrarily large $L$ is selected, defining an alphabet $\mathcal{W} = \{1, 2, \cdots, L\}$. A random constant-composition code $\mathcal{C}(\mathbf{w}) = \{\mathbf{x}_m, 1 \le m \le 2^{NR}\}$ is generated for each $\mathbf{w} \in T^*_w$ by drawing $2^{NR}$ sequences independently and uniformly from the conditional type class $T^*_{x|w}$.

4.2 Encoding Scheme

A sequence $\mathbf{W}$ is drawn uniformly from the type class $T^*_w$ and shared with the receiver. User $m$ is assigned codeword $\mathbf{x}_m$ from $\mathcal{C}(\mathbf{W})$, for $1 \le m \le 2^{NR}$.

4.3 Decoding Scheme

The receiver makes an innocent/guilty decision on each user independently of the other users; therein lie both the simplicity and the suboptimality of this decoder. Specifically, the estimated coalition $\hat{\mathcal{K}}$ is the collection of all $m$ such that
$$I(\mathbf{x}_m; \mathbf{y} \,|\, \mathbf{w}) > R + \Delta. \quad (4.1)$$
If no such $m$ exists, the receiver outputs $\hat{\mathcal{K}} = \emptyset$. The users whose empirical mutual information score exceeds the threshold $R + \Delta$ are declared guilty.
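A minimal sketch of the threshold rule (4.1) follows; it is not the paper's implementation. The empirical mutual information is computed from the joint type of $(\mathbf{x}_m, \mathbf{y})$, and the conditioning on the time-sharing sequence $\mathbf{w}$ is omitted here for brevity. The codebook and parameter values in the usage example are hypothetical.

```python
# Sketch (not from the paper): one-by-one threshold decoding, rule (4.1).
# Each user's empirical mutual information score I(x_m; y), computed from
# the joint type of (x_m, y), is compared with R + Delta.
from collections import Counter
from math import log2

def empirical_mi(x, y):
    """Mutual information (bits/symbol) of the joint type of sequences x, y."""
    n = len(x)
    pxy = Counter(zip(x, y))            # joint type counts
    px, py = Counter(x), Counter(y)     # marginal type counts
    return sum(c / n * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def threshold_decoder(codebook, y, R, Delta):
    """Accuse every user whose score exceeds R + Delta."""
    return {m for m, x_m in codebook.items()
            if empirical_mi(x_m, y) > R + Delta}

# Hypothetical toy codebook: user 1's fingerprint matches the pirated copy y.
y = [0, 1] * 8
print(threshold_decoder({1: [0, 1] * 8, 2: [0] * 16}, y, R=0.5, Delta=0.1))
```

Because each user is tested in isolation, the rule runs in time linear in the number of users, which is exactly the simplicity (and suboptimality) noted above.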
4.4 Error Exponents

Theorem 4.1 below gives the false-positive and false-negative error exponents for this coding scheme. These exponents are given in terms of the functions defined below. Define the set of conditional p.m.f.'s for $X_\mathcal{K}$ given $W$ whose conditional marginals are the same for all components of $X_\mathcal{K}$:
$$\mathscr{M}(p_{X|W}) = \{ p_{X_\mathcal{K}|W} : p_{X_m|W} = p_{X|W},\ \forall m \in \mathcal{K} \}.$$
Denote by $\mathscr{P}_{XW}(L)$ the set of p.m.f.'s $p_{XW}$ defined over $\mathcal{X} \times \mathcal{W}$. Define for each $m \in \mathcal{K}$ the set of conditional p.m.f.'s
$$\mathscr{P}_{Y X_\mathcal{K}|W}(p_{XW}, \mathcal{W}_\mathcal{K}, R, L, m) \triangleq \left\{ \tilde{p}_{Y X_\mathcal{K}|W} : \tilde{p}_{X_\mathcal{K}|W} \in \mathscr{M}(p_{X|W}),\ \tilde{p}_{Y|X_\mathcal{K}} \in \mathcal{W}_\mathcal{K}(\tilde{p}_{X_\mathcal{K}}),\ I_{\tilde{p}_{Y X_\mathcal{K}|W}\, p_W}(X_m; Y | W) \le R \right\} \quad (4.2)$$
and the pseudo sphere-packing exponent
$$\tilde{E}_{psp,m}(R, L, p_{XW}, \mathcal{W}_\mathcal{K}) = \min_{\tilde{p}_{Y X_\mathcal{K}|W} \in \mathscr{P}_{Y X_\mathcal{K}|W}(p_{XW}, \mathcal{W}_\mathcal{K}, R, L, m)} D(\tilde{p}_{Y X_\mathcal{K}|W} \,\|\, \tilde{p}_{Y|X_\mathcal{K}}\, p_{X|W}^K \,|\, p_W). \quad (4.3)$$
The terminology pseudo sphere-packing exponent is used because, despite its superficial resemblance to a sphere-packing exponent [12], (4.3) does not provide a fundamental asymptotic lower bound on error probability.

Taking the maximum and minimum of $\tilde{E}_{psp,m}$ above over $m \in \mathcal{K}$, we respectively define
$$\tilde{\overline{E}}_{psp}(R, L, p_{XW}, \mathcal{W}_\mathcal{K}) = \max_{m \in \mathcal{K}} \tilde{E}_{psp,m}(R, L, p_{XW}, \mathcal{W}_\mathcal{K}), \quad (4.4)$$
$$\tilde{\underline{E}}_{psp}(R, L, p_{XW}, \mathcal{W}_\mathcal{K}) = \min_{m \in \mathcal{K}} \tilde{E}_{psp,m}(R, L, p_{XW}, \mathcal{W}_\mathcal{K}). \quad (4.5)$$
If these expressions are evaluated for the set $\mathcal{W}_\mathcal{K}^{fair}$, which is permutation-invariant, then (4.2) and (4.3) are independent of $m \in \mathcal{K}$, and the expressions (4.4) and (4.5) coincide. Define
$$E_{psp}(R, L, \mathcal{W}_\mathcal{K}) = \max_{p_{XW} \in \mathscr{P}_{XW}(L)} \tilde{E}_{psp,1}(R, L, p_{XW}, \mathcal{W}_{K_{nom}}^{fair}). \quad (4.6)$$
Denote by $p^*_{XW}$ the maximizer in (4.6), which depends on $R$ and $\mathcal{W}_{K_{nom}}^{fair}$. Finally, define
$$\overline{E}_{psp}(R, L, \mathcal{W}_\mathcal{K}) = \tilde{\overline{E}}_{psp}(R, L, p^*_{XW}, \mathcal{W}_\mathcal{K}), \quad (4.7)$$
$$\underline{E}_{psp}(R, L, \mathcal{W}_\mathcal{K}) = \tilde{\underline{E}}_{psp}(R, L, p^*_{XW}, \mathcal{W}_\mathcal{K}), \quad (4.8)$$
where no fairness requirement is imposed on $\mathcal{W}_\mathcal{K}$.
Theorem 4.1 The threshold decision rule (4.1) yields the following error exponents.
(i) The false-positive error exponent is
$$E^{FP}(R, L, \mathcal{W}_\mathcal{K}, \Delta) = \Delta. \quad (4.9)$$
(ii) The detect-one error exponent is
$$E^{one}(R, L, \mathcal{W}_\mathcal{K}, \Delta) = \overline{E}_{psp}(R + \Delta, L, \mathcal{W}_\mathcal{K}). \quad (4.10)$$
(iii) The detect-all error exponent is
$$E^{all}(R, L, \mathcal{W}_\mathcal{K}, \Delta) = \underline{E}_{psp}(R + \Delta, L, \mathcal{W}_\mathcal{K}). \quad (4.11)$$
(iv) A fair collusion strategy is optimal under the detect-one error criterion: $E^{one}(R, L, \mathcal{W}_\mathcal{K}, \Delta) = E^{one}(R, L, \mathcal{W}_\mathcal{K}^{fair}, \Delta)$.
(v) The detect-one and detect-all error exponents are the same when the colluders restrict their choice to fair strategies: $E^{one}(R, L, \mathcal{W}_\mathcal{K}^{fair}, \Delta) = E^{all}(R, L, \mathcal{W}_\mathcal{K}^{fair}, \Delta)$.
(vi) For $K = K_{nom}$, the supremum of all rates for which the detect-one error exponent of (4.10) is positive is given by
$$C^{simple}(\mathcal{W}_\mathcal{K}) = C^{simple}(\mathcal{W}_\mathcal{K}^{fair}) = \lim_{L \to \infty} \max_{p_{XW} \in \mathscr{P}_{XW}(L)} \min_{p_{Y|X_\mathcal{K}} \in \mathcal{W}_\mathcal{K}^{fair}(p_{X_\mathcal{K}})} I_{p_W p_{X|W}^K p_{Y|X_\mathcal{K}}}(X_1; Y | W) \quad (4.12)$$
and is achieved by letting $\Delta \to 0$ and $L \to \infty$.

Note. Applying (3.4) with $S = \emptyset$, we have $I(X_1; Y | W) \le \frac{1}{K} I(X_\mathcal{K}; Y | W)$ for any permutation-invariant $p_{Y|X_\mathcal{K}}$. Since this inequality is generally strict, $C^{simple}(\mathcal{W}_\mathcal{K})$ is generally lower than the fingerprinting capacity $C^{one}(\mathcal{W}_\mathcal{K})$ of (3.8). Hence the simple thresholding rule (4.1) is generally not capacity-achieving.

5 Joint Fingerprint Decoder

The encoder and joint decoder are presented in this section, and the performance of the new scheme is analyzed. As in the previous section, the encoder ensures a false-positive error exponent $\Delta$ and assumes a nominal value $K_{nom}$ for the coalition size. An arbitrarily large $L$ is selected, defining an alphabet $\mathcal{W} = \{1, 2, \cdots, L\}$.
A random constant-composition code $\mathcal{C}(\mathbf{s}, \mathbf{w}) = \{\mathbf{x}_m, 1 \le m \le 2^{NR}\}$ is generated for each $\mathbf{s} \in \mathcal{S}^N$ and $\mathbf{w} \in T^*_w$ by drawing $2^{NR}$ sequences independently and uniformly from a conditional type class $T^*_{x|sw}$. Both $T^*_w$ and $T^*_{x|sw}$ depend on $\Delta$ and $K_{nom}$, as defined below (10.6). Prior to encoding, a sequence $\mathbf{W} \in \mathcal{W}^N$ is drawn independently of $\mathbf{S}$ and uniformly from $T^*_w$, and shared with the receiver. Next, user $m$ is assigned codeword $\mathbf{x}_m \in \mathcal{C}(\mathbf{S}, \mathbf{W})$, for $1 \le m \le 2^{NR}$.

In terms of decoding, the fundamental improvement over the simple strategy of Sec. 4 resides in the use of a joint decoding rule. Specifically, the decoder maximizes a penalized empirical mutual information score over all possible coalitions of any size. The penalty is proportional to the size of the coalition.

5.1 Mutual Information of k Random Variables

Our fingerprint decoding scheme is based on the notion of mutual information between $k$ random variables $X_1, \cdots, X_k$. For $k = 3$, this mutual information is defined as [12, p. 57], [23, p. 378]
$$\overset{\circ}{I}(X_1; X_2; X_3) = H(X_1) + H(X_2) + H(X_3) - H(X_1, X_2, X_3).$$
We use the symbol $\overset{\circ}{I}$ to distinguish it from the symbol $I$ for standard mutual information between two random variables. Note the chain rule
$$\overset{\circ}{I}(X_1; X_2; X_3) = I(X_1; X_2 X_3) + I(X_2; X_3).$$
The mutual information between $k$ random variables $X_1, \cdots, X_k$ is similarly defined as the sum of their individual entropies minus their joint entropy [12, p. 57] or, equivalently, the divergence between their joint distribution and the product of their marginals:
$$\overset{\circ}{I}(X_1; \cdots; X_k) = H(X_1) + \cdots + H(X_k) - H(X_1, \cdots, X_k) = D(p_{X_1 \cdots X_k} \,\|\, p_{X_1} \cdots p_{X_k}). \quad (5.1)$$
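The definition (5.1) and the $k = 3$ chain rule above can be checked numerically; the following sketch (not from the paper) evaluates both sides on an arbitrary joint p.m.f. over three binary variables.

```python
# Numerical check (not from the paper): the definition
#   oI(X1;X2;X3) = H(X1) + H(X2) + H(X3) - H(X1,X2,X3)
# and the chain rule oI(X1;X2;X3) = I(X1; X2X3) + I(X2; X3),
# evaluated on a randomly generated joint p.m.f. over three binary variables.
import itertools, random
from math import log2

def H(p):
    """Entropy in bits of a p.m.f. given as a dict {outcome: probability}."""
    return -sum(v * log2(v) for v in p.values() if v > 0)

def marginal(pjoint, axes):
    """Marginalize a joint p.m.f. over 3-tuples onto the given coordinate axes."""
    out = {}
    for xs, v in pjoint.items():
        key = tuple(xs[i] for i in axes)
        out[key] = out.get(key, 0.0) + v
    return out

random.seed(0)
w = {xs: random.random() for xs in itertools.product([0, 1], repeat=3)}
z = sum(w.values())
p = {xs: v / z for xs, v in w.items()}                    # random joint p.m.f.

oI = sum(H(marginal(p, (i,))) for i in range(3)) - H(p)   # definition (5.1)
# chain rule: I(X1; X2X3) + I(X2; X3), each I written as H + H - H(joint)
chain = (H(marginal(p, (0,))) + H(marginal(p, (1, 2))) - H(p)) \
      + (H(marginal(p, (1,))) + H(marginal(p, (2,))) - H(marginal(p, (1, 2))))
assert abs(oI - chain) < 1e-12
```

The two sides agree identically because both telescope to the same sum of entropies; the divergence form of (5.1) also shows $\overset{\circ}{I} \ge 0$.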
Note the following properties, including the chain rules (P3) and (P4):
(P1) the mutual information (5.1) is symmetric in its arguments;
(P2) $\overset{\circ}{I}(X_1; X_2) = I(X_1; X_2)$;
(P3) $\overset{\circ}{I}(X_1; \cdots; X_k) = I(X_1; X_2 \cdots X_k) + \overset{\circ}{I}(X_2; \cdots; X_k) = \sum_{i=1}^{k-1} I(X_i; X_{i+1} \cdots X_k)$;
(P4) $\overset{\circ}{I}(X_1; \cdots; X_k) = \overset{\circ}{I}(X_1; \cdots; X_i; X_{i+1} \cdots X_k) + \overset{\circ}{I}(X_{i+1}; \cdots; X_k)$ for any $i \in \{1, 2, \cdots, k-2\}$;
(P5) $\overset{\circ}{I}(X_1; \cdots; X_k) = \sum_{i=1}^{k-1} H(X_i) - H(X_1 \cdots X_{k-1} | X_k)$.

Similarly to (5.1), we define the empirical mutual information $\overset{\circ}{I}(\mathbf{x}_1; \cdots; \mathbf{x}_k)$ between $k$ sequences $\mathbf{x}_1, \cdots, \mathbf{x}_k$ as the mutual information with respect to the joint type of $\mathbf{x}_1, \cdots, \mathbf{x}_k$. Analogously to Property (P5), we have
$$\overset{\circ}{I}(\mathbf{x}_1; \cdots; \mathbf{x}_k; \mathbf{y}) = \sum_{i=1}^k H(\mathbf{x}_i) - H(\mathbf{x}_1 \cdots \mathbf{x}_k | \mathbf{y}). \quad (5.2)$$
This leads to the following alternative interpretation of the minimum-equivocation decoder of Liu and Hughes [23]. If $\mathbf{x}_1, \cdots, \mathbf{x}_k$ are codewords from a constant-composition code $\mathcal{C}$, then $H(\mathbf{x}_i)$ is the same for all $i$, and the minimum-equivocation decoder is equivalent to a maximum-mutual-information decoder:
$$\min_{\mathbf{x}_1 \cdots \mathbf{x}_k \in \mathcal{C}} H(\mathbf{x}_1 \cdots \mathbf{x}_k | \mathbf{y}) \iff \max_{\mathbf{x}_1 \cdots \mathbf{x}_k \in \mathcal{C}} \overset{\circ}{I}(\mathbf{x}_1; \cdots; \mathbf{x}_k; \mathbf{y}). \quad (5.3)$$
There is no similar interpretation when the ordinary mutual information $I(\mathbf{x}_1 \cdots \mathbf{x}_k; \mathbf{y})$ is used [23]. Liu and Hughes showed that the minimum-equivocation decoder outperforms the ordinary maximum-mutual-information decoder in terms of random-coding exponent.

5.2 MPMI Criterion

The restriction of $\mathbf{x}_\mathcal{M}$ to a subset $\mathcal{A}$ of $\mathcal{M}$ will be denoted by $\mathbf{x}_\mathcal{A} = \{\mathbf{x}_m, m \in \mathcal{A}\}$.
For disjoint sets $\mathcal{A} = \{m_1, \cdots, m_{|\mathcal{A}|}\}$ and $\mathcal{B} = \{m_{|\mathcal{A}|+1}, \cdots, m_{|\mathcal{A}|+|\mathcal{B}|}\}$, we use the shorthand
$$\overset{\circ}{I}(\mathbf{x}_\mathcal{A}; \mathbf{y}\mathbf{x}_\mathcal{B} \,|\, \mathbf{s}\mathbf{w}) \triangleq \overset{\circ}{I}(\mathbf{x}_{m_1}; \cdots; \mathbf{x}_{m_{|\mathcal{A}|}}; \mathbf{y}\mathbf{x}_\mathcal{B} \,|\, \mathbf{s}\mathbf{w}) \quad (5.4)$$
for the mutual information between the $|\mathcal{A}| + 1$ random variables $\mathbf{x}_{m_1}, \cdots, \mathbf{x}_{m_{|\mathcal{A}|}}$ and $(\mathbf{y}, \mathbf{x}_\mathcal{B})$, conditioned on $(\mathbf{s}, \mathbf{w})$. Define the function
$$MPMI(k) = \begin{cases} 0 & \text{if } k = 0 \\ \displaystyle\max_{\mathbf{x}_\mathcal{K} \in \mathcal{C}^k(\mathbf{s}, \mathbf{w})} \overset{\circ}{I}(\mathbf{x}_\mathcal{K}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) - k(R + \Delta) & \text{if } k = 1, 2, \cdots \end{cases} \quad (5.5)$$
where $k = |\mathcal{K}|$ and
$$\overset{\circ}{I}(\mathbf{x}_\mathcal{K}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) = \overset{\circ}{I}(\mathbf{x}_1; \cdots; \mathbf{x}_k; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) = k H(\mathbf{x} \,|\, \mathbf{s}\mathbf{w}) - H(\mathbf{x}_\mathcal{K} \,|\, \mathbf{y}\mathbf{s}\mathbf{w}) \quad (5.6)$$
is the mutual information between the $k+1$ sequences $\mathbf{x}_1, \cdots, \mathbf{x}_k, \mathbf{y}$, conditioned on $(\mathbf{s}, \mathbf{w})$, as defined in (5.4). Again we stress that $\overset{\circ}{I}(\mathbf{x}_1; \cdots; \mathbf{x}_k; \mathbf{y} \,|\, \mathbf{s}\mathbf{w})$ should not be confused with the ordinary mutual information $I(\mathbf{x}_1 \cdots \mathbf{x}_k; \mathbf{y} \,|\, \mathbf{s}\mathbf{w})$ between the $k$-tuple $(\mathbf{x}_1, \cdots, \mathbf{x}_k)$ and $\mathbf{y}$, conditioned on $(\mathbf{s}, \mathbf{w})$. Our joint fingerprint decoder is a Maximum Penalized Mutual Information (MPMI) decoder:
$$\max_{k \ge 0} MPMI(k). \quad (5.7)$$
In case of a tie, the largest value of $k$ is retained. The decoder seeks the coalition size $k$ and the codewords $\{\mathbf{x}_m, m \in \hat{\mathcal{K}}\}$ in $\mathcal{C}(\mathbf{s}, \mathbf{w})$ that achieve the MPMI criterion above. The indices of these codewords form the decoded coalition $\hat{\mathcal{K}}$. If the maximizing $k$ in (5.7) is zero, the receiver outputs $\hat{\mathcal{K}} = \emptyset$. Similarly to (5.3), the MPMI decoder may equivalently be interpreted as a Minimum Penalized Equivocation criterion.

5.3 Properties

The following lemma shows that 1) each subset of the estimated coalition is significant, and 2) any extension of the estimated coalition would fail a significance test.

Lemma 5.1 Let $\hat{\mathcal{K}}$ achieve the maximum in (5.5), (5.7). Then
$$\forall \mathcal{A} \subseteq \hat{\mathcal{K}}: \quad \overset{\circ}{I}(\mathbf{x}_\mathcal{A}; \mathbf{y}\mathbf{x}_{\hat{\mathcal{K}} \setminus \mathcal{A}} \,|\, \mathbf{s}\mathbf{w}) > |\mathcal{A}|(R + \Delta). \quad (5.8)$$
Moreover, for every $\mathcal{A}$ disjoint from $\hat{\mathcal{K}}$,
$$\overset{\circ}{I}(\mathbf{x}_\mathcal{A}; \mathbf{y}\mathbf{x}_{\hat{\mathcal{K}}} \,|\, \mathbf{s}\mathbf{w}) \le |\mathcal{A}|(R + \Delta). \quad (5.9)$$
Proof.
For any $\mathcal{A} \subseteq \hat{\mathcal{K}}$, we have
$$\overset{\circ}{I}(\mathbf{x}_\mathcal{A}; \mathbf{y}\mathbf{x}_{\hat{\mathcal{K}} \setminus \mathcal{A}} \,|\, \mathbf{s}\mathbf{w}) - |\mathcal{A}|(R + \Delta) \stackrel{(a)}{=} \left[\overset{\circ}{I}(\mathbf{x}_{\hat{\mathcal{K}}}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) - |\hat{\mathcal{K}}|(R + \Delta)\right] - \left[\overset{\circ}{I}(\mathbf{x}_{\hat{\mathcal{K}} \setminus \mathcal{A}}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) - (|\hat{\mathcal{K}}| - |\mathcal{A}|)(R + \Delta)\right]$$
$$\stackrel{(b)}{=} MPMI(|\hat{\mathcal{K}}|) - \left[\overset{\circ}{I}(\mathbf{x}_{\hat{\mathcal{K}} \setminus \mathcal{A}}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) - (|\hat{\mathcal{K}}| - |\mathcal{A}|)(R + \Delta)\right] \ge MPMI(|\hat{\mathcal{K}}|) - MPMI(|\hat{\mathcal{K}}| - |\mathcal{A}|) \stackrel{(c)}{\ge} 0$$
where (a) follows from the chain rule for $\overset{\circ}{I}$, (b) holds because $\hat{\mathcal{K}}$ achieves the maximum in (5.5), and (c) because $\hat{\mathcal{K}}$ achieves the maximum in (5.7). This proves (5.8). To prove (5.9), consider any $\mathcal{A}$ disjoint from $\hat{\mathcal{K}}$ and let $\mathcal{K}' = \hat{\mathcal{K}} \cup \mathcal{A}$. We have
$$\overset{\circ}{I}(\mathbf{x}_\mathcal{A}; \mathbf{y}\mathbf{x}_{\hat{\mathcal{K}}} \,|\, \mathbf{s}\mathbf{w}) - |\mathcal{A}|(R + \Delta) \stackrel{(a)}{=} \left[\overset{\circ}{I}(\mathbf{x}_{\mathcal{K}'}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) - |\mathcal{K}'|(R + \Delta)\right] - \left[\overset{\circ}{I}(\mathbf{x}_{\hat{\mathcal{K}}}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) - |\hat{\mathcal{K}}|(R + \Delta)\right]$$
$$\stackrel{(b)}{=} \left[\overset{\circ}{I}(\mathbf{x}_{\mathcal{K}'}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) - |\mathcal{K}'|(R + \Delta)\right] - MPMI(|\hat{\mathcal{K}}|) \le MPMI(|\mathcal{K}'|) - MPMI(|\hat{\mathcal{K}}|) \stackrel{(c)}{\le} 0,$$
where (a), (b), (c) are justified in the same way as above. This proves (5.9). ✷

Reliability metric. The score $\overset{\circ}{I}(\mathbf{x}_{\hat{\mathcal{K}}}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) - |\hat{\mathcal{K}}| R > |\hat{\mathcal{K}}| \Delta$ represents a guilt index for the estimated coalition $\hat{\mathcal{K}}$. The larger this quantity is, the stronger the evidence that the members of $\hat{\mathcal{K}}$ are guilty. Likewise, $\overset{\circ}{I}(\mathbf{x}_m; \mathbf{y}\mathbf{x}_{\hat{\mathcal{K}} \setminus \{m\}} \,|\, \mathbf{s}\mathbf{w}) - R > \Delta$ is a guilt index for accused user $m \in \hat{\mathcal{K}}$, and $\overset{\circ}{I}(\mathbf{x}_m; \mathbf{y}\mathbf{x}_{\hat{\mathcal{K}}} \,|\, \mathbf{s}\mathbf{w}) - R \le \Delta$ is a guilt index for user $m \notin \hat{\mathcal{K}}$. The smaller this index is, the stronger the evidence that $m$ is innocent.

5.4 Error Exponents

Theorem 5.2 below gives the false-positive and false-negative error exponents for our coding scheme. These exponents are given in terms of the functions defined below. Recall $\mathscr{P}_{X_\mathcal{K} W|S}(p_S, L, D_1)$ defined in (3.5). We similarly define
$$\mathscr{P}_{X_\mathcal{K}|SW}(p_{SW}, L, D_1) \triangleq \left\{ p_{X_\mathcal{K}|SW} = \prod_{k \in \mathcal{K}} p_{X_k|SW} : p_{X_1|SW} = \cdots = p_{X_K|SW},\ E\, d(S, X_1) \le D_1 \right\}.$$
Define now the following set of conditional p.m.f.'s for $X_\mathcal{K}$ given $S, W$ whose conditional marginal p.m.f.
$p_{X|SW}$ is the same for each $X_m$, $m \in \mathcal{K}$:
$$\mathcal{M}(p_{X|SW}) = \{ p_{X_K|SW} : p_{X_m|SW} = p_{X|SW}, \; \forall m \in \mathcal{K} \}.$$
Define for each $A \subseteq \mathcal{K}$ the set of conditional p.m.f.'s
$$\mathcal{P}_{YX_K|SW}(p_W, \tilde{p}_{S|W}, p_{X|SW}, \mathcal{W}_K, R, L, A) \triangleq \Big\{ \tilde{p}_{YX_K|SW} : \; \tilde{p}_{X_K|SW} \in \mathcal{M}(p_{X|SW}), \; \tilde{p}_{Y|X_K} \in \mathcal{W}_K(\tilde{p}_{X_K}), \; \frac{1}{|A|} \mathring{I}_{p_W \tilde{p}_{S|W} \tilde{p}_{YX_K|SW}}(X_A; Y X_{\mathcal{K} \setminus A} \,|\, S, W) \le R \Big\} \qquad (5.10)$$
and the pseudo sphere packing exponent
$$\tilde{E}_{psp,A}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, \mathcal{W}_K) = \min_{\tilde{p}_{YX_K|SW} \in \mathcal{P}_{YX_K|SW}(p_W, \tilde{p}_{S|W}, p_{X|SW}, \mathcal{W}_K, R, L, A)} D(\tilde{p}_{YX_K|SW}\, \tilde{p}_{S|W} \,\|\, \tilde{p}_{Y|X_K}\, p^K_{X|SW}\, p_S \,|\, p_W). \qquad (5.11)$$
Taking the maximum³ and the minimum of $\tilde{E}_{psp,A}$ above over all subsets $A \subseteq \mathcal{K}$, we define
$$\overline{\tilde{E}}_{psp}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, \mathcal{W}_K) = \tilde{E}_{psp,\mathcal{K}}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, \mathcal{W}_K), \qquad (5.12)$$
$$\underline{\tilde{E}}_{psp}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, \mathcal{W}_K) = \min_{A \subseteq \mathcal{K}} \tilde{E}_{psp,A}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, \mathcal{W}_K). \qquad (5.13)$$
Now define
$$E_{psp}(R, L, D_1, \mathcal{W}_K) = \max_{p_W \in \mathcal{P}_W} \min_{\tilde{p}_{S|W} \in \mathcal{P}_{S|W}} \max_{p_{X|SW} \in \mathcal{P}_{X|SW}(p_W, \tilde{p}_{S|W}, L, D_1)} \tilde{E}_{psp,\mathcal{K}}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, \mathcal{W}^{fair}_{K_{nom}}). \qquad (5.14)$$
Denote by $p^*_W$ and $p^*_{X|SW}$ the maximizers in (5.14), where the latter is to be viewed as a function of $\tilde{p}_{S|W}$. Also note that both $p^*_W$ and $p^*_{X|SW}$ implicitly depend on $R$ and $\mathcal{W}^{fair}_{K_{nom}}$. Finally, define
$$\overline{E}_{psp}(R, L, D_1, \mathcal{W}_K) = \min_{\tilde{p}_{S|W} \in \mathcal{P}_{S|W}} \overline{\tilde{E}}_{psp}(R, L, p^*_W, \tilde{p}_{S|W}, p^*_{X|SW}, \mathcal{W}_K), \qquad (5.15)$$
$$\underline{E}_{psp}(R, L, D_1, \mathcal{W}_K) = \min_{\tilde{p}_{S|W} \in \mathcal{P}_{S|W}} \underline{\tilde{E}}_{psp}(R, L, p^*_W, \tilde{p}_{S|W}, p^*_{X|SW}, \mathcal{W}_K). \qquad (5.16)$$

³ The property that $\mathcal{K}$ achieves $\max_{A \subseteq \mathcal{K}} \tilde{E}_{psp,A}$ is derived in the proof of Theorem 5.2, Part (iv).

Theorem 5.2 The decision rule (5.7) yields the following error exponents.
(i) The false-positive error exponent is
$$E_{FP}(R, D_1, \mathcal{W}_K, \Delta) = \Delta.$$
$$\qquad (5.17)$$
(ii) The error exponent for the (false negative) probability that the decoder fails to catch all colluders (misses some of them) is
$$E^{all}(R, L, D_1, \mathcal{W}_K, \Delta) = \underline{E}_{psp}(R + \Delta, L, D_1, \mathcal{W}_K). \qquad (5.18)$$
(iii) The error exponent for the (false negative) probability that the decoder fails to catch even one colluder (misses every single colluder) is
$$E^{one}(R, L, D_1, \mathcal{W}_K, \Delta) = \overline{E}_{psp}(R + \Delta, L, D_1, \mathcal{W}_K). \qquad (5.19)$$
(iv) $E^{one}(R, L, D_1, \mathcal{W}_K, \Delta) = E^{one}(R, L, D_1, \mathcal{W}^{fair}_K, \Delta)$.
(v) $E^{all}(R, L, D_1, \mathcal{W}^{fair}_K, \Delta) = E^{one}(R, L, D_1, \mathcal{W}^{fair}_K, \Delta)$.
(vi) If $K = K_{nom}$, the suprema of all rates for which the error exponents of (5.18) and (5.19) are positive are $C^{all}(D_1, \mathcal{W}_K)$ and $C^{one}(D_1, \mathcal{W}_K)$ of (3.9) and (3.8), respectively.

Note. The expressions (5.18) and (5.19) for the false-negative error exponents may be viewed as sequences indexed by $L$. As discussed below (3.7) and in [14, Sec. 3.5], one may show that these sequences are nondecreasing and converge to finite limits at a polynomial rate.

6 Error Exponents for Memoryless Collusion Channels

Consider the compound class (2.31) of memoryless channels. The theorems of Sec. 3 showed that compound capacity is the same as for the main problem of (2.10). We now outline how the derivation of the error exponents is modified. Retracing the steps of the proof of Theorem 5.2, it may be seen that the expressions (5.17), (5.18) and (5.19) for the error exponents remain valid, with two modifications. First, in (5.10), the constraint $\tilde{p}_{Y|X_K} \in \mathcal{W}_K$ is removed, and so the resulting set $\mathcal{P}^{memoryless}_{YX_K|SW}$ is larger than $\mathcal{P}_{YX_K|SW}$ of (5.10).
Second, the divergence cost function
$$D(\tilde{p}_{YX_K|SW}\, \tilde{p}_{S|W} \,\|\, \tilde{p}_{Y|X_K}\, p^K_{X|SW}\, p_S \,|\, p_W) \qquad (6.1)$$
in the expression (5.11) for the pseudo sphere packing exponent $\tilde{E}_{psp,A}$ is replaced by⁴
$$\min_{p_{Y|X_K} \in \mathcal{W}_K} D(\tilde{p}_{YX_K|SW}\, \tilde{p}_{S|W} \,\|\, p_{Y|X_K}\, p^K_{X|SW}\, p_S \,|\, p_W); \qquad (6.2)$$
denote by $\tilde{E}^{memoryless}_{psp,A}$ the corresponding pseudo sphere packing exponent.

⁴ This can be traced back to (10.15), where $p_{y|x_K}$ is now replaced with $p_{Y|X_K}$ in the asymptotic expression for the probability of the conditional type class $T_{y x_K | s w}$.

The divergences in (6.1) and (6.2) coincide when $p_{Y|X_K} = \tilde{p}_{Y|X_K}$, thus (6.2) is upper-bounded by (6.1). Since $p_{Y|X_K} = \tilde{p}_{Y|X_K}$ is feasible for $\mathcal{P}_{YX_K|SW}$ of (5.10), we conclude that $\tilde{E}^{memoryless}_{psp,A} \le \tilde{E}_{psp,A}$ of (5.11). Hence the false-negative error exponents in the memoryless case are upper-bounded by those of Theorem 5.2. This phenomenon is similar to results in [14]: due to the use of RM codes, the colluders' optimal strategy is a nearly-memoryless strategy, but they are precluded from using a truly memoryless strategy because that would violate the hard constraint $p_{y|x_K} \in \mathcal{W}_K$. In the memoryless case, the worst conditional type (which determines the false-negative error exponents) might be such that $p_{y|x_K} \notin \mathcal{W}_K$.

7 Proof of Converse Under Detect-All Criterion

7.1 Proof of Theorem 3.5

The encoder generates marked copies $x_m = f_N(s, v, m)$ for $1 \le m \le 2^{NR}$ and the decoder outputs an estimated coalition $g_N(y, s, v) \in \{1, \cdots, 2^{NR}\}^\star$. By Lemma 3.3, it suffices to prove the claim for the compound class of memoryless channels $\mathcal{W}_K$ of (2.31). Let $K$ be the size of the coalition and $(f_N, g_N)$ a sequence of length-$N$, rate-$R$ codes.
We show that for any such sequence of codes, reliable decoding of the fingerprints is possible only if $R \le \tilde{C}^{all}(D_1, \mathcal{W}_K)$ under the detect-all criterion.

Step 1. A lower bound on error probability is obtained when a helper provides some information to the decoder. Here the helper informs the decoder that the coalition size is $K$. There are $\binom{2^{NR}}{K} \le 2^{KNR}$ possible coalitions of size $K$. We represent a coalition as $M_{\mathcal{K}} \triangleq \{M_1, \cdots, M_K\}$, where $M_k$, $k \in \mathcal{K} = \{1, 2, \cdots, K\}$, are assumed to be drawn i.i.d. uniformly⁵ from $\{1, \cdots, 2^{NR}\}$. We similarly write $X_k \triangleq x_{M_k}$, $k \in \mathcal{K}$, and $X_{\mathcal{K}} \triangleq \{X_1, \cdots, X_K\}$. The component of $X_{\mathcal{K}}$ at position $t \in \{1, \cdots, N\}$ is denoted by $X_{\mathcal{K},t} \triangleq \{X_{1t}, \cdots, X_{Kt}\}$. Assuming memoryless collusion channel $p_{Y|X_K} \in \mathcal{W}_K$ is in effect, the joint p.m.f. of $(M_{\mathcal{K}}, S, V, X_{\mathcal{K}}, Y)$ is given by
$$p_{M_{\mathcal{K}} S V X_{\mathcal{K}} Y} = p^N_S\, p_V \prod_{k \in \mathcal{K}} \big( p_{M_k}\, 1\{X_k = f_N(S, V, M_k)\} \big)\, p^N_{Y|X_K}. \qquad (7.1)$$
Define the random variables $Q_t = \{V, S_j, j \ne t\} \in \mathcal{V}^N \times \mathcal{S}^{N-1}$ for $1 \le t \le N$. By assumption, $S_t$ and $Q_t$ are independent, and $X_{kt}$, $k \in \mathcal{K}$, are conditionally i.i.d. given $(S_t, Q_t) = (S, V)$. However, note that $X_{kt}$, $1 \le k \le K$, are generally conditionally dependent given $(S_t, V)$ alone. The joint p.m.f. of $(S_t, Q_t, X_{\mathcal{K},t}, Y_t)$ is
$$p_{S_t}\, p_{Q_t} \prod_{1 \le k \le K} p_{X_{kt}|S_t Q_t}\; p_{Y|X_K}, \qquad 1 \le t \le N \qquad (7.2)$$
where the conditional p.m.f. $p_{X_{kt}|S_t Q_t}$ is the same for all $k \in \mathcal{K}$. Now define a time-sharing random variable $T$, uniformly distributed over $\{1, \cdots, N\}$, and independent of the other random variables.

⁵ Capacity could be higher if there were constraints on the formation of coalitions, for instance if the users form social networks [25].

Let
$$X_{\mathcal{K}} \triangleq X_{\mathcal{K},T} \in \mathcal{X}^K, \quad Y \triangleq Y_T \in \mathcal{Y}, \quad S \triangleq S_T \in \mathcal{S}, \quad W \triangleq (Q_T, T) \in \mathcal{W} \triangleq \mathcal{V}^N \times \mathcal{S}^{N-1} \times \{1, \cdots, N\}.$$
(7.3)
By (7.2) and (7.3), the code $f_N$ and the random variables $S, V, M_{\mathcal{K}}$ induce an empirical p.m.f. $p_{X_K}$ which can be viewed as a function of $f_N$. The joint p.m.f. of $(S, W, X_{\mathcal{K}}, Y)$ is
$$p_S\, p_W \Big( \prod_{k \in \mathcal{K}} p_{X_k|SW} \Big) p_{Y|X_K} \qquad (7.4)$$
where the conditional p.m.f. $p_{X_k|SW}$ is the same for all $k \in \mathcal{K}$. Moreover
$$D_1 \ge E\Big[ \frac{1}{N} \sum_{t=1}^N d(S_t, X_{kt}) \Big] = E\, d(S, X_k), \qquad k \in \mathcal{K}.$$
Hence $p_{X_K W|S}$ belongs to the set $\mathcal{P}_{X_K W|S}(p_S, L, D_1)$ of (3.5), with $L = |\mathcal{W}| = N \times |\mathcal{V}|^N \times |\mathcal{S}|^{N-1}$.

Step 2. Our single-letter expressions are derived from the following inequality, which is valid for all $A \subseteq \mathcal{K}$ and $p_{Y|X_K} \in \mathcal{W}_K$:
$$I(M_A; Y \,|\, S, V) \overset{(a)}{=} I(X_A; Y \,|\, S, V)$$
$$= I(X_A; Y \,|\, X_{\mathcal{K} \setminus A}, S, V) + \underbrace{I(X_A; X_{\mathcal{K} \setminus A} \,|\, S, V)}_{=0} - I(X_A; X_{\mathcal{K} \setminus A} \,|\, Y, S, V)$$
$$\overset{(b)}{\le} I(X_A; Y \,|\, X_{\mathcal{K} \setminus A}, S, V) = H(Y \,|\, X_{\mathcal{K} \setminus A}, S, V) - H(Y \,|\, X_{\mathcal{K}}, S, V)$$
$$\overset{(c)}{=} H(Y \,|\, X_{\mathcal{K} \setminus A}, S, V) - H(Y \,|\, X_{\mathcal{K}})$$
$$\overset{(d)}{=} \sum_{t=1}^N H(Y_t \,|\, Y^{t-1}, X_{\mathcal{K} \setminus A}, S, V) - \sum_{t=1}^N H(Y_t \,|\, X_{\mathcal{K},t})$$
$$\overset{(e)}{\le} \sum_{t=1}^N H(Y_t \,|\, X_{\mathcal{K} \setminus A, t}, S, V) - \sum_{t=1}^N H(Y_t \,|\, X_{\mathcal{K},t})$$
$$\overset{(f)}{=} \sum_{t=1}^N H(Y_t \,|\, X_{\mathcal{K} \setminus A, t}, S_t, Q_t) - \sum_{t=1}^N H(Y_t \,|\, X_{\mathcal{K},t}, S_t, Q_t)$$
$$= \sum_{t=1}^N I(X_{A,t}; Y_t \,|\, X_{\mathcal{K} \setminus A, t}, S_t, Q_t) = N\, I(X_A; Y \,|\, X_{\mathcal{K} \setminus A}, S, W) \qquad (7.5)$$
where (a) is due to the data processing inequality and the fact that $X_A$ is a function of $(M_A, S, V)$, (b) holds because the codewords $\{X_k, 1 \le k \le K\}$ are mutually independent given $(S, V)$, (c) because $(S, V) \to X_{\mathcal{K}} \to Y$ forms a Markov chain, (d) is obtained using the chain rule for entropy and the fact that the collusion channel is memoryless, (e) holds because conditioning reduces entropy, and (f) because $(S, V) = (S_t, Q_t) \to X_{\mathcal{K},t} \to Y_t$ forms a Markov chain.

Step 3. Under collusion channel $p_{Y|X_K} \in \mathcal{W}_K$, let $P^{all}_e(p_{Y|X_K}) = Pr[\hat{\mathcal{K}} \ne \mathcal{K}]$ be the decoding error probability of the detect-all decoder.
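Step 3 will feed $P^{all}_e$ into a Fano-type bound of the form $|A|NR \le 1 + P_e \cdot KNR + I(M_A; Y \,|\, S, V)$. As a hedged numerical illustration of how such a bound forces a rate limit once the error probability is small, a minimal sketch (all numerical values are hypothetical stand-ins, not quantities from the paper):

```python
def fano_rate_bound(I_bits, N, A_size, K_size, P_e):
    """Rearrange |A|*N*R <= 1 + P_e*K*N*R + I into a bound on R.
    Valid whenever |A| > P_e * K; as P_e -> 0 the bound tends to
    I / (N * |A|), the single-letter rate bound of the converse."""
    assert A_size > P_e * K_size, "bound is vacuous otherwise"
    return (1.0 + I_bits) / (N * (A_size - P_e * K_size))

# Hypothetical numbers: blocklength 1000, |A| = 2, K = 3, P_e = 1%.
r_max = fano_rate_bound(I_bits=500.0, N=1000, A_size=2, K_size=3, P_e=0.01)
r_lim = fano_rate_bound(I_bits=500.0, N=1000, A_size=2, K_size=3, P_e=0.0)
```

Shrinking $P_e$ tightens the bound toward $I/(N|A|)$, which is exactly the normalization appearing in (7.7).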
The following inequalities hold for every subset $A$ of $\mathcal{K}$ and for every $p_{Y|X_K}$:
$$|A| N R \overset{(a)}{=} H(M_A) \overset{(b)}{=} H(M_A \,|\, S, V) = H(M_A \,|\, Y, S, V) + I(M_A; Y \,|\, S, V)$$
$$\le H(M_{\mathcal{K}} \,|\, Y, S, V) + I(M_A; Y \,|\, S, V)$$
$$\overset{(c)}{\le} 1 + P^{all}_e(p_{Y|X_K}) \cdot K N R + I(M_A; Y \,|\, S, V) \qquad (7.6)$$
where (a) holds because $M_A$ is uniformly distributed over $\{1, \cdots, 2^{|A|NR}\}$, (b) because $M_A$ and $(S, V)$ are independent, and (c) because of Fano's inequality. For the error probability $P^{all}_e(p_{Y|X_K})$ to vanish for each $p_{Y|X_K} \in \mathcal{W}_K$, we need
$$R \le \liminf_{N \to \infty} \min_{p_{Y|X_K} \in \mathcal{W}_K} \min_{A \subseteq \mathcal{K}} \frac{1}{N|A|} I(M_A; Y \,|\, S, V). \qquad (7.7)$$
We have
$$\min_{p_{Y|X_K} \in \mathcal{W}_K} \min_{A \subseteq \mathcal{K}} \frac{1}{N|A|} I(M_A; Y \,|\, S, V) \overset{(a)}{\le} \min_{p_{Y|X_K} \in \mathcal{W}_K} \min_{A \subseteq \mathcal{K}} \frac{1}{|A|} I(X_A; Y \,|\, X_{\mathcal{K} \setminus A}, S, W)$$
$$\overset{(b)}{\le} \max_{p_{X_K W|S} \in \mathcal{P}_{X_K W|S}(p_S, L(N), D_1)} \min_{p_{Y|X_K} \in \mathcal{W}_K} \min_{A \subseteq \mathcal{K}} \frac{1}{|A|} I(X_A; Y \,|\, X_{\mathcal{K} \setminus A}, S, W)$$
$$\le \sup_{L} \max_{p_{X_K W|S} \in \mathcal{P}_{X_K W|S}(p_S, L, D_1)} \min_{p_{Y|X_K} \in \mathcal{W}_K} \min_{A \subseteq \mathcal{K}} \frac{1}{|A|} I(X_A; Y \,|\, X_{\mathcal{K} \setminus A}, S, W)$$
$$\overset{(c)}{\le} \lim_{L \to \infty} \max_{p_{X_K W|S} \in \mathcal{P}_{X_K W|S}(p_S, L, D_1)} \min_{p_{Y|X_K} \in \mathcal{W}_K} \min_{A \subseteq \mathcal{K}} \frac{1}{|A|} I(X_A; Y \,|\, X_{\mathcal{K} \setminus A}, S, W) \qquad (7.8)$$
where (a) is due to (7.5), (b) to the fact that $p_{X_K W|S}$ given in (7.4) belongs to the set $\mathcal{P}_{X_K W|S}(p_S, L, D_1)$ defined in (3.5), with $L = L(N) = N \times |\mathcal{V}|^N \times |\mathcal{S}|^{N-1}$, and (c) because the supremand is nondecreasing in $L$. Combining (7.7) and (7.8), we obtain
$$R \le \lim_{L \to \infty} \max_{p_{X_K W|S} \in \mathcal{P}_{X_K W|S}(p_S, L, D_1)} \min_{p_{Y|X_K} \in \mathcal{W}_K(p_{X_K})} \min_{A \subseteq \mathcal{K}} \frac{1}{|A|} I(X_A; Y \,|\, X_{\mathcal{K} \setminus A}, S, W) = \lim_{L \to \infty} C^{all}_L(D_1, \mathcal{W}_K) = \tilde{C}^{all}(D_1, \mathcal{W}_K) \qquad (7.9)$$
which concludes the proof of Theorem 3.5. ✷

7.2 Proof of Corollary 3.6

By assumption, here the coalition is fair and $\mathcal{W}_K = \mathcal{W}^{fair}_K$ depends on the joint type $p_{x_K}$ of the colluders' fingerprinted sequences.
We denote this joint type by $Z \in \mathcal{Z} = \mathcal{P}^{[N]}_{X_K}$ to make the notation more compact. Note that $Z$ is a function of $(S, V, \mathcal{K})$ and that the cardinality of $\mathcal{Z}$ is at most $(N+1)^{|\mathcal{X}|^K}$. Since the channel $p_{Y|X_K}$ selected by the coalition may depend on $Z$, we indicate this dependency explicitly by representing the channel as $p_{Y|X_K Z}$ and the set of feasible channels as
$$\widetilde{\mathcal{W}}^{fair}_K = \{ p_{Y|X_K Z} : \; p_{Y|X_K, Z=z} \in \mathcal{W}^{fair}_K(z), \; \forall z \in \mathcal{Z} \}. \qquad (7.10)$$
By Lemma 3.3, it suffices to prove the claim for the compound class of memoryless channels $\mathcal{W}^{fair}_K(p_{x_K})$. Define the set
$$\mathcal{P}_{X_K W S}(p_S, L, D_1) \triangleq \{ p_S\, p_{X_K W|S} : p_{X_K W|S} \in \mathcal{P}_{X_K W|S}(p_S, L, D_1) \}$$
and slice it into the following disjoint collection of sets:
$$\forall z \in \mathcal{Z}: \quad \mathcal{P}_{X_K W S}(p_S, L, D_1, z) \triangleq \{ p_{X_K W S} \in \mathcal{P}_{X_K W S}(p_S, L, D_1) : p_{X_K} = z \}. \qquad (7.11)$$
The error probability of the decoder is not increased if a helper reveals the joint type $Z$. The entropy of $Z$ is at most $\log |\mathcal{Z}| \le |\mathcal{X}|^K \log(N+1)$. Fano's inequality (7.6) applied to $A = \mathcal{K}$ becomes
$$K N R = H(M_{\mathcal{K}} \,|\, S, V) \le H(M_{\mathcal{K}}, Z \,|\, S, V) \le |\mathcal{X}|^K \log(N+1) + H(M_{\mathcal{K}} \,|\, S, V, Z)$$
$$\le |\mathcal{X}|^K \log(N+1) + 1 + P^{all}_e(p_{Y|X_K Z}) \cdot K N R + I(M_{\mathcal{K}}; Y \,|\, S, V, Z).$$
(7.12)
Analogously to (7.5), the following single-letter expression holds for every $z \in \mathcal{Z}$ and $p_{Y|X_K} \in \mathcal{W}_K(z)$:
$$I(M_{\mathcal{K}}; Y \,|\, S, V, Z=z) = I(X_{\mathcal{K}}; Y \,|\, S, V, Z=z) = H(Y \,|\, S, V, Z=z) - H(Y \,|\, X_{\mathcal{K}}, S, V, Z=z)$$
$$\overset{(a)}{=} H(Y \,|\, S, V, Z=z) - H(Y \,|\, X_{\mathcal{K}}, Z=z)$$
$$\overset{(b)}{=} \sum_{t=1}^N H(Y_t \,|\, Y^{t-1}, S, V, Z=z) - \sum_{t=1}^N H(Y_t \,|\, X_{\mathcal{K},t}, Z=z)$$
$$\le \sum_{t=1}^N H(Y_t \,|\, S, V, Z=z) - \sum_{t=1}^N H(Y_t \,|\, X_{\mathcal{K},t}, Z=z)$$
$$\overset{(c)}{=} \sum_{t=1}^N H(Y_t \,|\, S_t, Q_t, Z=z) - \sum_{t=1}^N H(Y_t \,|\, X_{\mathcal{K},t}, S_t, Q_t, Z=z)$$
$$= \sum_{t=1}^N I(X_{\mathcal{K},t}; Y_t \,|\, S_t, Q_t, Z=z) = N\, I(X_{\mathcal{K}}; Y \,|\, S, W, Z=z) \qquad (7.13)$$
$$= N\, I_{p_{X_K W S|Z=z}\, p_{Y|X_K}}(X_{\mathcal{K}}; Y \,|\, S, W)$$
where (a) holds because $(S, V) \to (X_{\mathcal{K}}, Z) \to Y$ forms a Markov chain, (b) because the collusion channel remains memoryless even when conditioned on $Z$, and (c) because $(S, V) = (S_t, Q_t) \to (X_{\mathcal{K},t}, Z) \to Y_t$ forms a Markov chain for each $1 \le t \le N$. For the error probability $P^{all}_e(p_{Y|X_K Z})$ to vanish for each $p_{Y|X_K Z} \in \widetilde{\mathcal{W}}^{fair}_K$, we need
$$R \overset{(a)}{\le} \liminf_{N \to \infty} \min_{p_{Y|X_K Z} \in \widetilde{\mathcal{W}}^{fair}_K} \frac{1}{NK} I(M_{\mathcal{K}}; Y \,|\, S, V, Z)$$
$$\overset{(b)}{\le} \liminf_{N \to \infty} \min_{p_{Y|X_K Z} \in \widetilde{\mathcal{W}}^{fair}_K} \frac{1}{K} I_{p_{SWX_K Z}\, p_{Y|X_K Z}}(X_{\mathcal{K}}; Y \,|\, S, W, Z)$$
$$\le \lim_{N \to \infty} \max_{p_Z \in \mathcal{P}_{\mathcal{Z}}} \max_{\{p_{X_K W S|Z=z} \in \mathcal{P}_{X_K W S}(p_S, L(N), D_1, z)\}_{z \in \mathcal{Z}}} \min_{\{p_{Y|X_K} \in \mathcal{W}^{fair}_K(z)\}_{z \in \mathcal{Z}}} \frac{1}{K} \sum_{z \in \mathcal{Z}} p_Z(z)\, I_{p_{X_K W S|Z=z}\, p_{Y|X_K}}(X_{\mathcal{K}}; Y \,|\, S, W)$$
$$\overset{(c)}{=} \lim_{N \to \infty} \max_{z \in \mathcal{Z}} \max_{p_{X_K W S|Z=z} \in \mathcal{P}_{X_K W S}(p_S, L(N), D_1, z)} \min_{p_{Y|X_K} \in \mathcal{W}^{fair}_K(z)} \frac{1}{K} I_{p_{X_K W S|Z=z}\, p_{Y|X_K}}(X_{\mathcal{K}}; Y \,|\, S, W)$$
$$= \lim_{N \to \infty} \max_{p_{X_K W S} \in \mathcal{P}_{X_K W S}(p_S, L(N), D_1)} \min_{p_{Y|X_K} \in \mathcal{W}^{fair}_K(p_{X_K})} \frac{1}{K} I_{p_{X_K W S}\, p_{Y|X_K}}(X_{\mathcal{K}}; Y \,|\, S, W)$$
$$\le \lim_{L \to \infty} \max_{p_{X_K W S} \in \mathcal{P}_{X_K W S}(p_S, L, D_1)} \min_{p_{Y|X_K} \in \mathcal{W}^{fair}_K(p_{X_K})} \frac{1}{K} I(X_{\mathcal{K}}; Y \,|\, S, W)$$
$$= \lim_{L \to \infty} \max_{p_{X_K W|S} \in \mathcal{P}_{X_K W|S}(p_S, L, D_1)} \min_{p_{Y|X_K} \in \mathcal{W}^{fair}_K(p_{X_K})} \frac{1}{K} I(X_{\mathcal{K}}; Y \,|\, S, W) = \tilde{C}^{all}(D_1, \mathcal{W}^{fair}_K) \qquad (7.14)$$
where (a) follows from (7.12), (b) from (7.13), and (c) from the fact that in a game in which $Z$ is a variable chosen by the first player (here the embedder) but known to all players (embedder, colluders, receiver), there can be no advantage in randomizing $Z$, i.e., a deterministic choice of $Z$ suffices to achieve the value of the maxmin game. More formally, equality (c) is a direct consequence of the following simple lemma, using the fingerprint distributor's feasible set $\mathcal{P}_{X_K W S}(p_S, L(N), D_1, z)$ in place of $\mathcal{F}(z)$, the colluders' feasible set $\mathcal{W}^{fair}_K(z)$ in place of $\mathcal{G}(z)$, and the conditional mutual information $I(X_{\mathcal{K}}; Y \,|\, S, W)$ as the payoff function $\phi$. This concludes the proof. ✷

Lemma 7.1 Consider a discrete set $\mathcal{Z}$ and two families of sets $\mathcal{F}(z), z \in \mathcal{Z}$ and $\mathcal{G}(z), z \in \mathcal{Z}$ indexed by the elements of $\mathcal{Z}$. Then the following game with payoff function $\phi$:
$$V = \max_{p \in \mathcal{P}_{\mathcal{Z}}} \max_{\{f_z \in \mathcal{F}(z)\}_{z \in \mathcal{Z}}} \min_{\{g_z \in \mathcal{G}(z)\}_{z \in \mathcal{Z}}} \sum_{z \in \mathcal{Z}} p(z)\, \phi(f_z, g_z) \qquad (7.15)$$
admits a pure-strategy solution, i.e., the maximum over the p.m.f. $p \in \mathcal{P}_{\mathcal{Z}}$ is achieved by a deterministic $p$.

Proof: Write $f = \{f_z\}_{z \in \mathcal{Z}}$ and $g = \{g_z\}_{z \in \mathcal{Z}}$ where each $f_z \in \mathcal{F}(z)$ and $g_z \in \mathcal{G}(z)$. For each $(p, f)$, let $g^*(p, f)$ achieve the minimum over $g$ of the function $\sum_{z \in \mathcal{Z}} p(z)\, \phi(f_z, g_z)$. For each $p$, let $f^*(p)$ achieve the maximum over $f$ of the function $\sum_{z \in \mathcal{Z}} p(z)\, \phi(f_z, g^*_z(p, f))$. By inspection of (7.15), the following elementary properties hold for each $p \in \mathcal{P}_{\mathcal{Z}}$ and $z \in \mathcal{Z}$:

• The minimizing $g^*_z$ depends on $(p, f)$ via $f_z$ only, and we denote this limited dependency explicitly by $g^*_z(f_z)$. The minimizer satisfies
$$\phi(f_z, g^*_z(f_z)) = \min_{g_z \in \mathcal{G}(z)} \phi(f_z, g_z).$$
(7.16)
• The maximizing $f^*_z$ does not depend on $p$ and satisfies
$$\phi(f^*_z, g^*_z(f^*_z)) = \max_{f_z \in \mathcal{F}(z)} \min_{g_z \in \mathcal{G}(z)} \phi(f_z, g_z). \qquad (7.17)$$
Substituting (7.17) into (7.15), we obtain
$$V = \max_{p \in \mathcal{P}_{\mathcal{Z}}} \sum_{z \in \mathcal{Z}} p(z)\, \phi(f^*_z, g^*_z(f^*_z)) = \max_{z \in \mathcal{Z}} \phi(f^*_z, g^*_z(f^*_z)) = \max_{z \in \mathcal{Z}} \max_{f_z \in \mathcal{F}(z)} \min_{g_z \in \mathcal{G}(z)} \phi(f_z, g_z)$$
which proves the claim. ✷

8 Proof of Theorem 3.7: Converse Under Detect-One Criterion

By Lemma 3.3, it suffices to prove the claim for the compound class of memoryless channels $\mathcal{W}_K$. Let $\mathcal{M}_N = \{1, 2, \cdots, 2^{NR}\}$. For notational simplicity, assume two colluders ($K = 2$). The proof extends straightforwardly to larger coalitions. For the detect-one criterion, it is sufficient to consider decoding rules that return exactly one user index, i.e., the decoding rule is a mapping
$$g_N: \mathcal{Y}^N \times \mathcal{S}^N \times \mathcal{V}^N \to \mathcal{M}_N. \qquad (8.1)$$
Indeed, consider momentarily a more general decoder that returns a list of accused users. By definition of the detect-one and false-positive error criteria, correct decoding occurs if and only if the list size $L \ge 1$ and all users on the output list are guilty. One can then construct a new decoder of the form (8.1) that returns an arbitrary user if $L = 0$ and an arbitrary element of the original size-$L$ list if $L \ge 1$. The correct-decoding event for the original decoder is also a correct-decoding event for the new decoder, and so the new decoder has at least the same probability of correct decoding as the original decoder.⁶ In the following, we only consider decoding rules of the form (8.1). Denote by $\mathcal{D}_i(s, v)$ the decoding region for user $i$, i.e.,
$$y \in \mathcal{D}_i(s, v) \Leftrightarrow g_N(y, s, v) = i, \qquad \forall i \in \mathcal{M}_N.$$
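Lemma 7.1 above can be checked by brute force on small finite sets. A minimal sketch (all sets and payoffs hypothetical): it exploits the decoupling across $z$ used in the proof — the per-$z$ max/min values are computed first, and randomizing over $z$ never beats putting all mass on the best $z$.

```python
def inner_value(F, G, phi):
    """Per-z value v(z) = max_{f in F[z]} min_{g in G[z]} phi(f, g); the
    max and min decouple across z because f_z, g_z range over per-z sets."""
    return {z: max(min(phi(f, g) for g in G[z]) for f in F[z]) for z in F}

def randomized_vs_pure(F, G, phi, pmfs):
    """Value of the game over a list of candidate p.m.f.s on Z, versus the
    deterministic (pure-strategy) value max_z v(z)."""
    v = inner_value(F, G, phi)
    mixed = max(sum(p[z] * v[z] for z in v) for p in pmfs)
    return mixed, max(v.values())

# Hypothetical two-element Z with tiny strategy sets and a toy payoff.
F = {'a': [1, 2], 'b': [3]}
G = {'a': [0.5, 1.0], 'b': [2.0]}
phi = lambda f, g: f * g
pmfs = [{'a': 0.0, 'b': 1.0}, {'a': 0.5, 'b': 0.5}, {'a': 1.0, 'b': 0.0}]
mixed, pure = randomized_vs_pure(F, G, phi, pmfs)
```

In this toy game $v(a) = 1$ and $v(b) = 6$, and no p.m.f. on $\{a, b\}$ improves on the deterministic choice $z = b$, as the lemma asserts.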
⁶ The new decoder performs better than the original one in the event that the list has size $L \ge 2$ and consists of a mix of guilty and innocent users (an error is then declared for the original decoder), and the list member selected by the new decoder is guilty (a correct decision is made).

The decoding regions form a partition of $\mathcal{Y}^N$. The average probability of correct decoding is given by
$$P_c(f_N, g_N, p_{Y|X_1 X_2}) = Pr[g_N(Y, S, V) \in \mathcal{K}]$$
$$= \frac{1}{2^{2NR}} \sum_{i,j \in \mathcal{M}_N} \sum_{s \in \mathcal{S}^N} p^N_S(s) \sum_{v \in \mathcal{V}^N} p_V(v) \sum_{y \in \mathcal{D}_i(s,v) \cup \mathcal{D}_j(s,v)} p^N_{Y|X_1 X_2}(y \,|\, x_i(s,v), x_j(s,v)). \qquad (8.2)$$
Without loss of optimality we assume that randomly modulated codes (Def. 2.2, Prop. 2.2) are used. The proof is organized along thirteen steps. An arbitrarily small parameter $\delta > 0$ is chosen. Step 1 defines for each $(s, v)$ a set of bad codewords that have exponentially many neighbors within Hamming balls of radius $N\delta$ centered at these codewords. The remaining codewords constitute the so-called good set. Step 2 introduces a dense, nested family $\mathcal{W}^{fair}_{K,\delta}$ of subsets of $\mathcal{W}^{fair}_K$ indexed by $\delta$ and consisting of "nice channels". An equivalence is given between Hamming distance of two codewords and statistical distinguishability of the output of any $p_{Y|X_1 X_2} \in \mathcal{W}^{fair}_{K,\delta}$. For clarity of the exposition we initially derive error probabilities assuming that the good set is large and that both colluders are assigned codewords in the good set; these assumptions are subsequently relaxed in Steps 10 and 11. All the error probabilities up to that point are conditioned on $S, V$. Step 3 introduces the basic random variables used in the proof.
Step 4 does three things: (a) define a reference product conditional p.m.f. for $Y$ given $S, V$; (b) associate a conditional self-information to each pair of codewords; and (c) define a large set of codeword pairs whose conditional self-information is within $\delta^2$ of their average value. Step 5 defines a typical set for $Y$ given $S$, $V$, and $\mathcal{K}$. Step 6 shows that typical sets for good codeword pairs have weak overlap. Step 7 defines a collection of refined typical sets for $Y$ with bounded overlap. Step 8 defines a typical set for the host sequence $S$. Step 9 upper bounds the conditional probability of correct decoding in terms of a mutual information. Step 10 derives an analogous result conditioned on the event that both colluders are assigned codewords from the bad set. Step 11 combines the bounds for good and bad codewords into a single bound. Step 12 removes the conditioning on $S, V$ and upper bounds the unconditional probability of correct decoding (8.2) in terms of a mutual information. Step 13 derives an upper bound on that mutual information and shows that any achievable rate $R$ must be less than half of the upper bound. The proof is completed by letting $\delta \downarrow 0$.

Step 1. Denote by
$$d_H(x, x') = \sum_{t=1}^N 1\{x_t \ne x'_t\}$$
the Hamming distance between two sequences $x$ and $x'$ in $\mathcal{X}^N$, and by
$$\mathcal{M}_j(s, v, \delta) = \{ k \in \mathcal{M}_N : d_H(x_j(s,v), x_k(s,v)) \le N\delta \}, \qquad j \in \mathcal{M}_N, \; s \in \mathcal{S}^N, \; v \in \mathcal{V}^N, \; 0 \le \delta \le 1 \qquad (8.3)$$
the set of indices $k$ for the codewords $x_k(s,v)$ that are within Hamming distance $N\delta$ of codeword $x_j(s,v)$, and by $M_j(s,v,\delta) = |\mathcal{M}_j(s,v,\delta)|$ the cardinality of this set. The function $M_j(s,v,\cdot) - 1$ is akin to a cumulative distance distribution. It is nondecreasing, with $M_j(s,v,0) \ge 1$ and $M_j(s,v,1) = 2^{NR}$.
Note that for $S = \emptyset$ and random codes over $\mathcal{X} = \{0, 1\}$, $M_j(V, \delta) - 1$ is a random variable whose expectation vanishes as $N \to \infty$ for $\delta < \delta_{GV}(R)$, the Gilbert-Varshamov distance at rate $R$ [24]. Denote by
$$\mathcal{M}^{good}_N(s, v, \delta) = \{ j \in \mathcal{M}_N : |\mathcal{M}_j(s, v, \delta)| \le 2^{N\delta^{1/3}} \}, \qquad v \in \mathcal{V}^N, \; 0 \le \delta \le 1 \qquad (8.4)$$
a set of "good" indices $j$ (there are at most $2^{N\delta^{1/3}}$ codewords within Hamming distance $N\delta$ of codeword $x_j(s,v)$), and by
$$\mathcal{M}^{bad}_N(s, v, \delta) = \mathcal{M}_N \setminus \mathcal{M}^{good}_N(s, v, \delta) = \{ j \in \mathcal{M}_N : |\mathcal{M}_j(s, v, \delta)| > 2^{N\delta^{1/3}} \} \qquad (8.5)$$
the complementary set of "bad" indices. Note that any code with normalized minimum distance $\delta_{min} > 0$ satisfies $M_j(s, v, \delta) \equiv 1$ and thus $\mathcal{M}^{good}_N(s, v, \delta) \equiv \mathcal{M}_N$ for all $0 < \delta < \delta_{min}$. However, the derivations in Steps 2–8 of the proof make no assumption on the size of the sets $\mathcal{M}^{good}_N(s, v, \delta)$. Finally, for the RM codes considered here, the sets (8.3), (8.4), and (8.5) depend on the host sequence $s$ only via its type $p_s$.

Step 2. Channels $p_{Y|X_1 X_2}$ that satisfy $p_{Y|X_1 X_2}(y|x_1, x_2) = 0$ for some $y, x_1, x_2$, or $p_{Y|X_1 X_2}(\cdot|x_1, x_2) \equiv p_{Y|X_1 X_2}(\cdot|x'_1, x'_2)$ for some $(x_1, x_2) \ne (x'_1, x'_2)$, require special handling. To this end, we define the following nested family of subsets of $\mathcal{W}^{fair}_K$, indexed by $0 < \delta \le 1/|\mathcal{Y}|$:
$$\mathcal{W}^{fair}_{K,\delta} = \Big\{ p_{Y|X_1 X_2} \in \mathcal{W}^{fair}_K : \; p_{Y|X_1 X_2}(y|x_1, x_2) \ge \delta \; \forall y, x_1, x_2; \quad \delta \le D(p_{Y|X_1 = x_1, X_2 = x_2} \,\|\, p_{Y|X_1 = x'_1, X_2 = x'_2}) \le \log \delta^{-1} \; \forall (x_1, x_2) \ne (x'_1, x'_2) \Big\} \qquad (8.6)$$
where the upper bound on divergence is implied by the lower bound on $p_{Y|X_1 X_2}$. By continuity of the correct-decoding probability functional (8.2) and by the definition (8.6), we have $\tilde{C}^{one}(D_1, \mathcal{W}^{fair}_{K,\delta}) \downarrow \tilde{C}^{one}(D_1, \mathcal{W}^{fair}_K)$ as $\delta \downarrow 0$.
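The good/bad codeword split of Step 1, (8.3)–(8.5), amounts to counting Hamming-ball neighbors in the codebook. A minimal sketch with a toy codebook; the integer `threshold` is a hypothetical stand-in for the exponential bound $2^{N\delta^{1/3}}$:

```python
def hamming(x, y):
    """Hamming distance d_H between two equal-length sequences."""
    return sum(a != b for a, b in zip(x, y))

def good_bad_split(codewords, delta, threshold):
    """Indices j whose Hamming ball of radius N*delta contains at most
    `threshold` codewords (the codeword itself included, as in M_j) are
    'good'; the rest are 'bad' (cf. (8.4)-(8.5))."""
    N = len(codewords[0])
    good, bad = [], []
    for j, xj in enumerate(codewords):
        M_j = sum(1 for xk in codewords if hamming(xj, xk) <= N * delta)
        (good if M_j <= threshold else bad).append(j)
    return good, bad

# Toy codebook of length N = 4 over {0,1}; ball radius N*delta = 1.
code = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1)]
good, bad = good_bad_split(code, delta=0.25, threshold=1)
# The first two codewords are within distance 1 of each other, so each has
# two codewords in its ball and lands in the bad set; the third is isolated.
```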
Denote by
$$D_{ij,k} \triangleq \frac{1}{N} \sum_{t=1}^N D\big( p_{Y|X_1 = x_{it}(s,v), X_2 = x_{jt}(s,v)} \,\big\|\, p_{Y|X_1 = x_{it}(s,v), X_2 = x_{kt}(s,v)} \big) \qquad (8.7)$$
the normalized conditional Kullback-Leibler divergence (given $s, v$) between the distributions on $Y$ induced by codeword pairs $(i, j)$ and $(i, k)$, respectively. It follows from (8.6) that for each $p_{Y|X_1 X_2} \in \mathcal{W}^{fair}_{K,\delta}$,
$$d_H(x_j(s,v), x_k(s,v)) \le N\delta \;\Rightarrow\; D_{ij,k} \le \delta \log \delta^{-1} \qquad (8.8)$$
and
$$d_H(x_j(s,v), x_k(s,v)) > N\delta \;\Rightarrow\; D_{ij,k} > \delta^2.$$
Conversely,
$$D_{ij,k} \le \delta^2 \;\Rightarrow\; d_H(x_j(s,v), x_k(s,v)) \le N\delta. \qquad (8.9)$$
When $D_{ij,k}$ is small, we say that the codewords $x_j(s,v)$ and $x_k(s,v)$ are nearly indistinguishable at the channel output. For any $(s, v, i, j, k)$ and $p_{Y|X_1 X_2} \in \mathcal{W}^{fair}_{K,\delta}$, (8.8) and (8.9) describe an equivalence between statistical distinguishability of two codewords and Hamming distance.

Step 3. To analyze the probability of correct decoding conditioned on the event $\mathcal{K} \in (\mathcal{M}^{good}_N(S, V, \delta))^2$ that both colluders are assigned good codewords, we define the following random variables. Define $Q_t = \{V, S_j, j \ne t\}$ over the alphabet $\mathcal{Q}_N \triangleq \mathcal{V}^N \times \mathcal{S}^{N-1}$. We have $(S_t, Q_t) = (S, V)$ for each $1 \le t \le N$. Since the host sequence type $p_s$ together with any $q_t$, $1 \le t \le N$, uniquely determines $s_t$ and thus the pair $(s, v)$ (and vice-versa), we may also use $(p_s, q)$ as an equivalent representation of the pair $(s, v)$. Define a time-sharing random variable $T$ uniformly distributed over $\{1, 2, \cdots, N\}$ and independent of the other random variables. Let $S = S_T$, $Q = Q_T$, $Y = Y_T$, and $X_i = x_{i,T}(S, V)$, $\forall i \in \mathcal{M}_N$. Define the random variable $X$ drawn uniformly from $\{X_i, i \in \mathcal{M}^{good}_N(S, V, \delta)\}$.
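At each position $t$, the law of the uniformly drawn $X$ is just the empirical distribution of the good codewords' $t$-th symbols. A minimal sketch (toy codebook; the good set is a hypothetical input, not computed here):

```python
from collections import Counter

def p_x_at_t(codewords, good, t):
    """Empirical p.m.f. of the t-th symbol when the codeword index is
    drawn uniformly from the good set (cf. the definition of X above)."""
    counts = Counter(codewords[i][t] for i in good)
    return {x: c / len(good) for x, c in counts.items()}

code = [(0, 1, 1), (0, 0, 1), (1, 0, 1)]
pmf = p_x_at_t(code, good=[0, 1, 2], t=0)   # {0: 2/3, 1: 1/3}
```

Averaging this construction over $t$ via the time-sharing variable $T$ is what produces the single-letter conditional p.m.f. used in the sequel.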
The conditional p.m.f. of $X$ given $S, V, T$ is given by
$$p_{X|SVT}(x|s, v, t) = \frac{1}{|\mathcal{M}^{good}_N(s, v, \delta)|} \sum_{i \in \mathcal{M}^{good}_N(s, v, \delta)} 1\{x_{it}(s, v) = x\}, \qquad \forall x, s, v, t. \qquad (8.10)$$
Given $s, v, t$, the conditional distribution of $(X_i, X_j, Y)$ is $p_{X_i X_j|S=s, V=v, T=t}\; p_{Y|X_1 X_2}$ where
$$p_{X_i X_j|SVT}(x_1, x_2|s, v, t) = 1\{x_{it}(s, v) = x_1,\, x_{jt}(s, v) = x_2\}, \qquad x_1, x_2 \in \mathcal{X}, \; i, j \in \mathcal{M}^{good}_N(s, v, \delta), \; 1 \le t \le N. \qquad (8.11)$$
By (8.10), the average of (8.11) over $i, j \in \mathcal{M}^{good}_N(s, v, \delta)$ is the product conditional p.m.f.
$$\frac{1}{|\mathcal{M}^{good}_N(s, v, \delta)|^2} \sum_{i,j \in \mathcal{M}^{good}_N(s, v, \delta)} p_{X_i X_j|SVT}(x_1, x_2|s, v, t) = p_{X|SVT}(x_1|s, v, t)\, p_{X|SVT}(x_2|s, v, t). \qquad (8.12)$$

Step 4. The conditional distribution of each $Y_t$, $1 \le t \le N$, given $(S, V)$ and $\mathcal{K} \in (\mathcal{M}^{good}_N(S, V, \delta))^2$, is given by
$$p_{Y_t|SV}(y|s, v) = p_{Y|SVT}(y|s, v, t) = \frac{1}{|\mathcal{M}^{good}_N(s, v, \delta)|^2} \sum_{i,j \in \mathcal{M}^{good}_N(s, v, \delta)} p_{Y|X_1 X_2}(y|x_{it}(s, v), x_{jt}(s, v)) \qquad (8.13)$$
$$= \frac{1}{|\mathcal{M}^{good}_N(s, v, \delta)|^2} \sum_{i,j \in \mathcal{M}^{good}_N(s, v, \delta)} \sum_{x_1, x_2 \in \mathcal{X}} p_{X_i X_j|SVT}(x_1, x_2|s, v, t)\, p_{Y|X_1 X_2}(y|x_1, x_2)$$
$$= \sum_{x_1, x_2 \in \mathcal{X}} p_{X|SVT}(x_1|s, v, t)\, p_{X|SVT}(x_2|s, v, t)\, p_{Y|X_1 X_2}(y|x_1, x_2). \qquad (8.14)$$
For any permutation $\pi$ of $\{1, 2, \cdots, N\}$ we have $p_{Y_{\pi(t)}|SV}(y|\pi(s), v) = p_{Y_t|SV}(y|s, v)$ for all RM codes. The product conditional distribution
$$r(y|s, v) \triangleq \prod_{t=1}^N p_{Y_t|SV}(y_t|s, v) \qquad (8.15)$$
is strongly exchangeable for each $v \in \mathcal{V}^N$ and will be used as a reference conditional p.m.f. for $Y$ given $S, V$ in the sequel. We also define the following conditional self-informations (i.e., mutual information for coalition $(i, j)$ averaged over $Y_t$ (resp.
$Y$) and conditioned on $S, V$):
$$\theta_{ij,t}(s, v) \triangleq \sum_{y_t \in \mathcal{Y}} p_{Y|X_1 X_2}(y_t|x_{it}(s, v), x_{jt}(s, v)) \log \frac{p_{Y|X_1 X_2}(y_t|x_{it}(s, v), x_{jt}(s, v))}{p_{Y_t|SV}(y_t|s, v)} = D\big( p_{Y|X_1 = x_{it}(s,v), X_2 = x_{jt}(s,v)} \,\big\|\, p_{Y_t|S=s, V=v} \big) \qquad (8.16)$$
$$\theta_{ij}(s, v) \triangleq \frac{1}{N} \sum_{t=1}^N \theta_{ij,t}(s, v) = \frac{1}{N} \sum_{t=1}^N \sum_{y_t \in \mathcal{Y}} p_{Y|X_1 X_2}(y_t|x_{it}(s, v), x_{jt}(s, v)) \log \frac{p_{Y|X_1 X_2}(y_t|x_{it}(s, v), x_{jt}(s, v))}{p_{Y_t|SV}(y_t|s, v)}$$
$$= \frac{1}{N} \sum_{t=1}^N \sum_{x_1, x_2, y} 1\{x_{it}(s, v) = x_1,\, x_{jt}(s, v) = x_2\}\, p_{Y|X_1 X_2}(y|x_1, x_2) \log \frac{p_{Y|X_1 X_2}(y|x_1, x_2)}{p_{Y_t|SV}(y|s, v)}$$
$$= \sum_{t, x_1, x_2, y} p_T(t)\, p_{X_i X_j|SVT}(x_1, x_2|s, v, t)\, p_{Y|X_1 X_2}(y|x_1, x_2) \log \frac{p_{Y|X_1 X_2}(y|x_1, x_2)}{p_{Y|SVT}(y|s, v, t)}. \qquad (8.17)$$
Since $p_{Y|X_1 X_2}$ is symmetric, the expressions (8.16) and (8.17) are symmetric in $i$ and $j$. The average of $\theta_{ij}(s, v)$ over all $(i, j) \in (\mathcal{M}^{good}_N(s, v, \delta))^2$ is the conditional mutual information
$$I(s, v) \triangleq \frac{1}{|\mathcal{M}^{good}_N(s, v, \delta)|^2} \sum_{i,j \in \mathcal{M}^{good}_N(s, v, \delta)} \theta_{ij}(s, v) \qquad (8.18)$$
$$= \sum_{t, x_1, x_2, y} p_T(t)\, p_{X|SVT}(x_1|s, v, t)\, p_{X|SVT}(x_2|s, v, t)\, p_{Y|X_1 X_2}(y|x_1, x_2) \log \frac{p_{Y|X_1 X_2}(y|x_1, x_2)}{p_{Y|SVT}(y|s, v, t)}$$
$$= I_{p_T\, p^2_{X|SVT}\, p_{Y|X_1 X_2}}(X_1 X_2; Y \,|\, S = s, V = v, T). \qquad (8.19)$$
For RM codes, both $\theta_{ij}(s, v)$ and $I(s, v)$ depend on $s$ only via its type $p_s$. Since the average value of $\theta_{ij}(s, v)$ is $I(s, v)$, there cannot be too many pairs $(i, j)$ for which $\theta_{ij}(s, v)$ is well above the mean.
More precisely, there exists a symmetric subset $\tilde{\mathcal{A}}(s, v, \delta) \subseteq (\mathcal{M}^{good}_N(s, v, \delta))^2$ of size
$$|\tilde{\mathcal{A}}(s, v, \delta)| \ge \frac{\delta^2}{\delta^2 + I(s, v)} |\mathcal{M}^{good}_N(s, v, \delta)|^2 \ge \frac{\delta^2}{\delta^2 + \log |\mathcal{Y}|} |\mathcal{M}^{good}_N(s, v, \delta)|^2$$
such that $\tilde{\mathcal{A}}(s, v, \delta)$ depends on $s$ only via $p_s$ and
$$(i, j) \in \tilde{\mathcal{A}}(s, v, \delta) \;\Rightarrow\; \theta_{ij}(s, v) \le I(s, v) + \delta^2. \qquad (8.20)$$
This claim is seen to hold by contraposition. If there existed a subset $\tilde{\mathcal{A}}^c(s, v, \delta)$ of size $\frac{I(s,v)}{\delta^2 + I(s,v)} |\mathcal{M}^{good}_N(s, v, \delta)|^2$ or larger such that
$$\forall (i, j) \in \tilde{\mathcal{A}}^c(s, v, \delta): \; \theta_{ij}(s, v) > I(s, v) + \delta^2$$
we would have
$$\sum_{i,j \in \mathcal{M}^{good}_N(s, v, \delta)} \theta_{ij}(s, v) > (I(s, v) + \delta^2)\, |\tilde{\mathcal{A}}^c(s, v, \delta)| \ge |\mathcal{M}^{good}_N(s, v, \delta)|^2\, I(s, v)$$
which would contradict (8.18).⁷

⁷ As mentioned by a reviewer, the claim could alternatively be proven by application of Markov's inequality.

Moreover, the interval $[0, \log |\mathcal{Y}|]$ is covered by the finite collection of intervals
$$\Theta_l \triangleq \Big[ \frac{l \delta^2}{2}, \frac{(l+1)\delta^2}{2} \Big), \qquad l = 0, 1, \cdots, \frac{2 \log |\mathcal{Y}|}{\delta^2} \triangleq l_{max}$$
of width $\delta^2/2$, and at least one of these intervals must contain many $\theta_{ij}(s, v)$. Specifically, for some integer $0 \le l < l_{max}$ there must exist a subset $\mathcal{A}(s, v, \delta) \subseteq \tilde{\mathcal{A}}(s, v, \delta)$ with the following properties:
$$(i, j) \in \mathcal{A}(s, v, \delta) \;\Rightarrow\; \theta_{ij}(s, v) \in \Theta_l \;\Rightarrow\; |\theta_{ij}(s, v) - \bar{I}(s, v)| \le \frac{\delta^2}{4}, \qquad (8.21)$$
$$\bar{I}(s, v) \triangleq \Big( l + \frac{1}{2} \Big) \frac{\delta^2}{2}, \qquad \bar{I}(s, v) \le I(s, v) \le \log |\mathcal{Y}|, \qquad (8.22)$$
$\mathcal{A}(s, v, \delta)$ is symmetric with size at least equal to
$$|\mathcal{A}(s, v, \delta)| \ge \frac{\delta^2}{2 \log |\mathcal{Y}|} |\tilde{\mathcal{A}}(s, v, \delta)| \ge \frac{\delta^4}{2 \log |\mathcal{Y}| (\delta^2 + \log |\mathcal{Y}|)} |\mathcal{M}^{good}_N(s, v, \delta)|^2 \ge \frac{\delta^4}{4 \log^2 |\mathcal{Y}|} |\mathcal{M}^{good}_N(s, v, \delta)|^2, \qquad (8.23)$$
and $\mathcal{A}(s, v, \delta)$ depends on $s$ only via $p_s$. To summarize, the subset $\mathcal{A}(s, v, \delta) \subseteq (\mathcal{M}^{good}_N(s, v, \delta))^2$ has size nearly equal to $|\mathcal{M}^{good}_N(s, v, \delta)|^2$ and consists of the indices of the codeword pairs whose conditional self-information $\theta_{ij}(s, v)$ is close to some $\bar{I}(s, v) \le I(s, v)$.
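The identity behind (8.18)–(8.19) — the average of the pairwise divergences $\theta_{ij}$ over independent uniform pairs equals a mutual information — can be checked numerically in a single-letter toy model. A minimal sketch, assuming a hypothetical two-user channel and $X_1, X_2$ i.i.d. uniform (playing the role of the uniform draw from the good set):

```python
import math

def theta_avg_and_mi(channel, symbols):
    """For X1, X2 i.i.d. uniform on `symbols` and channel p(y|x1,x2):
    average of theta_ab = D(p(.|a,b) || p_Y) over all pairs, versus
    I(X1 X2; Y) = H(Y) - H(Y|X1 X2) computed independently via entropies."""
    n = len(symbols)
    ys = sorted({y for d in channel.values() for y in d})
    p_y = {y: sum(channel[(a, b)].get(y, 0.0) for a in symbols
                  for b in symbols) / n**2 for y in ys}
    thetas = [sum(p * math.log(p / p_y[y]) for y, p in channel[(a, b)].items() if p > 0)
              for a in symbols for b in symbols]
    h_y = -sum(p * math.log(p) for p in p_y.values() if p > 0)
    h_y_given_x = -sum(p * math.log(p) for d in channel.values()
                       for p in d.values() if p > 0) / n**2
    return sum(thetas) / len(thetas), h_y - h_y_given_x

# Hypothetical symmetric collusion channel on binary inputs/outputs.
channel = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
           (1, 0): {0: 0.5, 1: 0.5}, (1, 1): {0: 0.1, 1: 0.9}}
avg_theta, mi = theta_avg_and_mi(channel, [0, 1])
```

The two quantities agree to floating-point precision, mirroring the fact that $I(s,v)$ in (8.18) is exactly the conditional mutual information (8.19).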
Recalling (8.19) and the equivalence of the representations $(S, V)$ and $(S, Q)$, we define
$$I(p'_S, q) \triangleq I_{p_T\, p'_S\, p^2_{X|SQT}\, p_{Y|X_1 X_2}}(X_1, X_2; Y \,|\, S, Q = q, T) \qquad \forall p'_S \in \mathcal{P}_S, \; q \in \mathcal{Q}_N \qquad (8.24)$$
which is a linear functional of $p'_S$ and coincides with $I(s, v)$ in (8.19) when $p'_S = p_s$.

Step 5. Define the following subset of $\mathcal{Y}^N$:
$$\tilde{\mathcal{T}}_\delta(s, v, i, j) \triangleq \Big\{ y \in \mathcal{Y}^N : \Big| \frac{1}{N} \sum_{t=1}^N \log \frac{p_{Y|X_1 X_2}(y_t|x_{it}(s, v), x_{jt}(s, v))}{p_{Y_t|SV}(y_t|s, v)} - \theta_{ij}(s, v) \Big| \le \frac{\delta^2}{8} \Big\} \qquad (8.25)$$
which satisfies the symmetry property $\tilde{\mathcal{T}}_\delta(s, v, i, j) = \tilde{\mathcal{T}}_\delta(s, v, j, i)$ and the letter permutation-invariance property (for RM codes)
$$y \in \tilde{\mathcal{T}}_\delta(s, v, i, j) \;\Rightarrow\; \pi(y) \in \tilde{\mathcal{T}}_\delta(\pi(s), v, i, j)$$
for any permutation $\pi$ of $\{1, 2, \cdots, N\}$. We show that $\tilde{\mathcal{T}}_\delta(s, v, i, j)$ is a typical set for $Y$ conditioned on $S = s$, $V = v$, and $\mathcal{K} = \{i, j\}$, in the following sense:
$$Pr[Y \notin \tilde{\mathcal{T}}_\delta(s, v, i, j) \,|\, S = s, V = v, \mathcal{K} = \{i, j\}] \le \frac{64 \log^2 \delta}{N \delta^4}, \qquad \forall s, v, i, j \qquad (8.26)$$
which vanishes as $N \to \infty$. Indeed we may rewrite (8.26) as
$$Pr[Y \notin \tilde{\mathcal{T}}_\delta(s, v, i, j) \,|\, S = s, V = v, \mathcal{K} = \{i, j\}] = Pr\Big[ |\hat{\theta}_{ij}(s, v) - \theta_{ij}(s, v)| \ge \frac{\delta^2}{8} \,\Big|\, S = s, V = v, \mathcal{K} = \{i, j\} \Big] \qquad (8.27)$$
where
$$\hat{\theta}_{ij}(s, v) \triangleq \frac{1}{N} \sum_{t=1}^N \log \frac{p_{Y|X_1 X_2}(Y_t|x_{it}(s, v), x_{jt}(s, v))}{p_{Y_t|SV}(Y_t|s, v)}. \qquad (8.28)$$
Since $Y_t$, $1 \le t \le N$, are conditionally independent given $S, V, \mathcal{K}$, $\hat{\theta}_{ij}(s, v)$ is the average of $N$ random variables that are conditionally independent given $S = s$, $V = v$, $\mathcal{K} = \{i, j\}$.
Recalling (8.16), the conditional expectation of these random variables is given by
$$E_{Y_t|\mathbf{S}V\mathcal{K}} \left[ \log \frac{p_{Y|X_1X_2}(Y_t \,|\, x_{it}(\mathbf{s},v), x_{jt}(\mathbf{s},v))}{p_{Y_t|SV}(Y_t \,|\, \mathbf{s}, v)} \right] = \theta_{ij,t}(\mathbf{s},v), \qquad 1 \le t \le N, \tag{8.29}$$
and averaging (8.29) over $t$ yields $E_{\mathbf{Y}|\mathbf{S}V\mathcal{K}}(\hat{\theta}_{ij}(\mathbf{s},v)) = \theta_{ij}(\mathbf{s},v)$. The conditional variances of these random variables are
$$\zeta_t(\mathbf{s},v,i,j) \,\triangleq\, \mathrm{var}_{Y_t|\mathbf{S}V\mathcal{K}} \left[ \log \frac{p_{Y|X_1X_2}(Y_t \,|\, x_{it}(\mathbf{s},v), x_{jt}(\mathbf{s},v))}{p_{Y_t|SV}(Y_t \,|\, \mathbf{s}, v)} \right], \qquad 1 \le t \le N. \tag{8.30}$$
By our assumption (8.6) that $p_{Y|X_1X_2}(y|x_1,x_2) \ge \delta$ for every $y, x_1, x_2$, the argument of the log above is in the range $[\delta, 1/\delta]$. Hence $\zeta_t(\mathbf{s},v,i,j) \le \log^2 \delta$, and
$$\mathrm{var}_{\mathbf{Y}|\mathbf{S}V\mathcal{K}}(\hat{\theta}_{ij}(\mathbf{s},v)) = \frac{1}{N^2} \sum_{t=1}^N \zeta_t(\mathbf{s},v,i,j) \le \frac{\log^2 \delta}{N}.$$
By Chebyshev's inequality, the probability in (8.27) is upper-bounded by
$$\frac{E_{\mathbf{Y}|\mathbf{S}V\mathcal{K}}[(\hat{\theta}_{ij}(\mathbf{s},v) - \theta_{ij}(\mathbf{s},v))^2]}{(\delta^2/8)^2} = \frac{\mathrm{var}_{\mathbf{Y}|\mathbf{S}V\mathcal{K}}(\hat{\theta}_{ij}(\mathbf{s},v))}{(\delta^2/8)^2} \le \frac{64 \log^2 \delta}{N \delta^4},$$
which establishes (8.26).

Step 6. Define the following subsets of $\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)$, indexed by $i \in \mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)$:
$$\mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta) \,\triangleq\, \{ j \in \mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta) : (i,j) \in \mathcal{A}(\mathbf{s},v,\delta) \}, \tag{8.31}$$
which depend on $\mathbf{s}$ only via $p_{\mathbf{s}}$. We show that the typical sets $\tilde{T}_\delta(\mathbf{s},v,i,j)$, $j \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)$, have weak overlap for any fixed $\mathbf{s}, v, i$. Define the overlap factor of the good sets at $\mathbf{Y} = \mathbf{y}$:
$$M_\delta(\mathbf{y}, \mathbf{s}, v, i) \,\triangleq\, \sum_{k \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \mathbb{1}\{ \mathbf{y} \in \tilde{T}_\delta(\mathbf{s},v,i,k) \}. \tag{8.32}$$
We show there exists $\delta^* > 0$ such that
$$Pr[M_\delta(\mathbf{Y}, \mathbf{s}, v, i) > 2^{3N\sqrt{\delta}} \,|\, \mathbf{S} = \mathbf{s}, V = v, \mathcal{K} = \{i,j\}, \mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j)] < \frac{1}{N}, \qquad \forall\, N > \delta^{-8},\; \delta < \delta^*. \tag{8.33}$$
To do so, define the normalized loglikelihood ratio
$$\hat{D}_{ijk}(\mathbf{Y}) = \frac{1}{N} \log \frac{p^N_{Y|X_1X_2}(\mathbf{Y} \,|\, \mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v))}{p^N_{Y|X_1X_2}(\mathbf{Y} \,|\, \mathbf{x}_i(\mathbf{s},v), \mathbf{x}_k(\mathbf{s},v))}.$$
(8.34) If $\mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j) \cap \tilde{T}_\delta(\mathbf{s},v,i,k)$ for some $j, k \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)$, then
$$\hat{D}_{ijk}(\mathbf{Y}) \le |\hat{D}_{ijk}(\mathbf{Y})| \overset{(a)}{\le} |\theta_{ij}(\mathbf{s},v) - \theta_{ik}(\mathbf{s},v)| + 2 \times \frac{\delta^2}{8} \overset{(b)}{\le} \frac{3\delta^2}{4} \tag{8.35}$$
where inequality (a) follows from (8.25) and (b) from (8.21) and the fact that both $(i,j)$ and $(i,k)$ are in $\mathcal{A}(\mathbf{s},v,\delta)$. If $j \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)$ and $\mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j)$, it follows from (8.32) and (8.35) that
$$M_\delta(\mathbf{Y}, \mathbf{s}, v, i) = \sum_{k \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \mathbb{1}\{ \mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j) \cap \tilde{T}_\delta(\mathbf{s},v,i,k) \} \,\le\, \hat{\zeta}(\mathbf{Y})\, \mathbb{1}\{ \mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j) \} \tag{8.36}$$
where we have defined the random variable
$$\hat{\zeta}(\mathbf{Y}) \,\triangleq\, \sum_{k \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \mathbb{1}\left\{ \hat{D}_{ijk}(\mathbf{Y}) \le \frac{3\delta^2}{4} \right\}. \tag{8.37}$$
Now recalling the definition of the normalized divergence $D_{ijk}$ in (8.7), define
$$\zeta \,\triangleq\, \sum_{k \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \mathbb{1}\{ D_{ijk} \le \delta^2 \} \overset{(a)}{\le} \sum_{k \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \mathbb{1}\{ d_H(\mathbf{x}_j(\mathbf{s},v), \mathbf{x}_k(\mathbf{s},v)) \le N\delta \} \le \sum_{k \in \mathcal{M}_N} \mathbb{1}\{ d_H(\mathbf{x}_j(\mathbf{s},v), \mathbf{x}_k(\mathbf{s},v)) \le N\delta \} \overset{(b)}{=} |\mathcal{M}_j(\mathbf{s},v,\delta)| \overset{(c)}{\le} 2^{3N\sqrt{\delta}} \tag{8.38}$$
where inequality (a) follows from (8.9), (b) from (8.3), and (c) from (8.4). In Appendix C, we show that $\hat{\zeta}(\mathbf{Y}) \le \zeta$ with probability approaching 1 as $N \to \infty$; more specifically,
$$Pr[\hat{\zeta}(\mathbf{Y}) > \zeta \,|\, \mathbf{S} = \mathbf{s}, V = v, \mathcal{K} = \{i,j\}] < |\mathcal{X}|^3 |\mathcal{Y}|\, (N+1)^{|\mathcal{X}|^3}\, 2^{-N\delta^7} \tag{8.39}$$
for all $\delta$ smaller than some $\delta^{**} > 0$.
Then there exists some $\delta^* \in (0, \delta^{**})$ such that
$$\begin{aligned}
Pr[M_\delta(\mathbf{Y},\mathbf{s},v,i) > 2^{3N\sqrt{\delta}} \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}, \mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j)]
&\overset{(a)}{\le} Pr[\hat{\zeta}(\mathbf{Y}) > 2^{3N\sqrt{\delta}} \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}, \mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j)] \\
&\le \frac{Pr[\hat{\zeta}(\mathbf{Y}) > 2^{3N\sqrt{\delta}} \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}]}{Pr[\mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j) \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}]} \\
&\overset{(b)}{\le} \frac{Pr[\hat{\zeta}(\mathbf{Y}) > 2^{3N\sqrt{\delta}} \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}]}{1 - \frac{64\log^2\delta}{N\delta^4}} \\
&\overset{(c)}{\le} \frac{Pr[\hat{\zeta}(\mathbf{Y}) > \zeta \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}]}{1 - \frac{64\log^2\delta}{N\delta^4}} \\
&\overset{(d)}{<} \frac{1}{N}, \qquad \forall\, N > \delta^{-8},\; \delta < \delta^*,
\end{aligned}$$
where (a) follows from (8.36), (b) from (8.26), (c) from (8.38), and (d) from (8.39). This establishes (8.33).

Step 7. We now prune the typical sets $\tilde{T}_\delta(\mathbf{s},v,i,j)$ to exclude the points $\mathbf{y}$ that are covered by more than $2^{3N\sqrt{\delta}}$ of the typical sets. For each $\mathbf{s}, v, i, j$, define the pruned typical set
$$T_\delta(\mathbf{s},v,i,j) \,\triangleq\, \{ \mathbf{y} \in \tilde{T}_\delta(\mathbf{s},v,i,j) : M_\delta(\mathbf{y},\mathbf{s},v,i) \le 2^{3N\sqrt{\delta}} \}. \tag{8.40}$$
It follows from (8.40) and (8.32) that
$$\forall\, \mathbf{y}, \mathbf{s}, v, i: \quad \sum_{j \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \mathbb{1}\{ \mathbf{y} \in T_\delta(\mathbf{s},v,i,j) \} = M_\delta(\mathbf{y},\mathbf{s},v,i) \le 2^{3N\sqrt{\delta}}. \tag{8.41}$$
The pruned set $T_\delta(\mathbf{s},v,i,j)$ is still typical for $\mathbf{Y}$ conditioned on $\mathbf{S} = \mathbf{s}$, $V = v$, and $\mathcal{K} = \{i,j\}$, because
$$\begin{aligned}
Pr[\mathbf{Y} \notin T_\delta(\mathbf{s},v,i,j) \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}] &\overset{(a)}{\le} Pr[\mathbf{Y} \notin \tilde{T}_\delta(\mathbf{s},v,i,j) \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}] \\
&\quad + Pr[M_\delta(\mathbf{Y},\mathbf{s},v,i) > 2^{3N\sqrt{\delta}} \,|\, \mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j), \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}] \\
&\overset{(b)}{\le} \frac{64\log^2\delta}{N\delta^4} + \frac{1}{N} \overset{(c)}{\le} \frac{72\log^2\delta}{N\delta^4}, \qquad \forall\, \mathbf{s}, v, i, j,\; N > \delta^{-8},\; \delta < \delta^*, \tag{8.42}
\end{aligned}$$
where (a) follows from the definition (8.40), (b) from the inequalities (8.26) and (8.33), and (c) holds because $\delta < \frac{1}{2}$.

Step 8. Define the typical set for $\mathbf{S}$ in the variational-distance sense:
$$T_\delta \,\triangleq\, \{ \mathbf{s} : d_V(p_{\mathbf{s}}, p_S) \le \delta \}.$$
(8.43) We have the inequality
$$\begin{aligned}
Pr[\mathbf{S} \notin T_\delta] = \sum_{T_{\mathbf{s}}:\, d_V(p_{\mathbf{s}}, p_S) > \delta} P^N_S(T_{\mathbf{s}})
&\overset{(a)}{\le} \sum_{p_{\mathbf{s}}:\, d_V(p_{\mathbf{s}}, p_S) > \delta} 2^{-N D(p_{\mathbf{s}} \| p_S)} \\
&\overset{(b)}{\le} (N+1)^{|\mathcal{S}|} \max_{p_{\mathbf{s}}:\, d_V(p_{\mathbf{s}}, p_S) > \delta} 2^{-N D(p_{\mathbf{s}} \| p_S)} \\
&\overset{(c)}{\le} (N+1)^{|\mathcal{S}|} \max_{p_{\mathbf{s}}:\, D(p_{\mathbf{s}} \| p_S) > \delta^2/\ln 4} 2^{-N D(p_{\mathbf{s}} \| p_S)} \,\le\, (N+1)^{|\mathcal{S}|}\, 2^{-N \delta^2 / \ln 4}, \tag{8.44}
\end{aligned}$$
where in (a) we have used the upper bound of [12, p. 32] on the probability of a type class, in (b) the fact that the number of type classes $T_{\mathbf{s}}$ is at most $(N+1)^{|\mathcal{S}|}$ [12, p. 29], and in (c) Pinsker's inequality $D(p\|q) \ge d_V^2(p,q)/\ln 4$ [12, p. 58]. Applying successively (8.24) and (8.43), we have
$$|I(p_{\mathbf{s}}, \mathbf{q}) - I(p_S, \mathbf{q})| = \left| \sum_{s \in \mathcal{S}} (p_{\mathbf{s}}(s) - p_S(s))\, I(X_1, X_2; Y \,|\, S = s, Q = \mathbf{q}, T) \right| \le \delta \max_{s \in \mathcal{S}} I(X_1, X_2; Y \,|\, S = s, Q = \mathbf{q}, T) \le \delta \log |\mathcal{Y}|, \qquad \forall\, \mathbf{s} \in T_\delta,\; \mathbf{q} \in \mathcal{Q}^N. \tag{8.45}$$

Step 9. Given $f_N, g_N, p_{Y|X_1X_2}, \mathbf{s}, v$, we will be interested in several conditional probabilities that correct decoding occurs in conjunction with the typical event $\mathbf{Y} \in T_\delta(\mathbf{s},v,\mathcal{K})$. Define the following shorthands:
$$P_c(i,j \,|\, \mathbf{s}, v) = Pr[\text{correct decoding and } \mathbf{Y} \in T_\delta(\mathbf{s},v,i,j) \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}] = \sum_{\mathbf{y} \in T_\delta(\mathbf{s},v,i,j) \cap (\mathcal{D}_i(\mathbf{s},v) \cup \mathcal{D}_j(\mathbf{s},v))} p^N_{Y|X_1X_2}(\mathbf{y} \,|\, \mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v)), \tag{8.46}$$
$$P_c^{\mathrm{good}}(\mathbf{s},v) = Pr[\text{correct decoding and } \mathbf{Y} \in T_\delta(\mathbf{s},v,\mathcal{K}) \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K} \in (\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta))^2] = \frac{1}{|\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)|^2} \sum_{i,j \in \mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)} P_c(i,j \,|\, \mathbf{s}, v). \tag{8.47}$$
Note that $P_c(i,j \,|\, \mathbf{s}, v)$ depends on $\mathbf{s}$ only via its type $p_{\mathbf{s}}$ (because RM codes are used).
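Step (c) of (8.44) relies on Pinsker's inequality in bits, $D(p\|q) \ge d_V^2(p,q)/\ln 4$ with $d_V$ the $L_1$ distance. The following sketch (illustrative only; the random test distributions are hypothetical) checks the inequality numerically.

```python
import math
import random

# Numerical check of Pinsker's inequality as used in (8.44):
# D(p||q) >= d_V(p,q)^2 / ln 4, with D in bits and d_V the L1 distance.
def D_bits(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def d_V(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def rand_pmf(k, rng):
    v = [rng.random() + 1e-3 for _ in range(k)]  # bounded away from 0
    s = sum(v)
    return [x / s for x in v]

rng = random.Random(2)
for _ in range(1000):
    k = rng.randint(2, 6)
    p, q = rand_pmf(k, rng), rand_pmf(k, rng)
    assert D_bits(p, q) >= d_V(p, q)**2 / math.log(4) - 1e-12
print("Pinsker verified on 1000 random pairs")
```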
The conditional probability of correct decoding given $(\mathbf{s},v)$ and the event that both colluders are assigned good codewords is
$$\begin{aligned}
\bar{P}_c^{\mathrm{good}}(\mathbf{s},v) = Pr[\text{correct decoding} \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K} \in (\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta))^2]
&\overset{(a)}{\le} P_c^{\mathrm{good}}(\mathbf{s},v) + Pr[\mathbf{Y} \notin T_\delta(\mathbf{s},v,\mathcal{K}) \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K} \in (\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta))^2] \\
&\overset{(b)}{\le} P_c^{\mathrm{good}}(\mathbf{s},v) + \frac{72\log^2\delta}{N\delta^4}, \qquad \forall\, N > \delta^{-8},\; \delta < \delta^*, \tag{8.48}
\end{aligned}$$
where (a) and (b) follow from (8.47) and (8.42), respectively. For any subset $\mathcal{B} \subseteq (\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta))^2$, possibly dependent on $\mathbf{s}, v$, we also define
$$P_c(\mathcal{B} \,|\, \mathbf{s}, v) \,\triangleq\, \frac{1}{|\mathcal{B}|} \sum_{(i,j) \in \mathcal{B}} P_c(i,j \,|\, \mathbf{s}, v). \tag{8.49}$$
Combining (8.47) and (8.49), we have
$$P_c^{\mathrm{good}}(\mathbf{s},v) = \frac{|\mathcal{B}|}{|\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)|^2} P_c(\mathcal{B} \,|\, \mathbf{s}, v) + \left(1 - \frac{|\mathcal{B}|}{|\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)|^2}\right) \underbrace{P_c((\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta))^2 \setminus \mathcal{B} \,|\, \mathbf{s}, v)}_{\le 1} \;\le\; 1 - \frac{|\mathcal{B}|}{|\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)|^2} \left(1 - P_c(\mathcal{B} \,|\, \mathbf{s}, v)\right). \tag{8.50}$$
Applying this inequality to $\mathcal{B} = \mathcal{A}(\mathbf{s},v,\delta)$ and using the cardinality bound (8.23) yields
$$P_c^{\mathrm{good}}(\mathbf{s},v) \le 1 - \frac{\delta^4}{4\log^2|\mathcal{Y}|} \left(1 - P_c(\mathcal{A}(\mathbf{s},v,\delta) \,|\, \mathbf{s}, v)\right). \tag{8.51}$$
Until this point, no assumption has been made on the size of the set $\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)$. We now assume (this assumption will be relaxed in Steps 10 and 11 of the proof) that
$$|\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)| \ge 2^{N[R - \delta^2/3]}. \tag{8.52}$$
Hence (8.23) implies $|\mathcal{A}(\mathbf{s},v,\delta)| > 2^{N(2R - \delta^2)}$ for $N$ larger than some $N_0(\delta)$.
Then we obtain the sphere-packing inequality
$$\begin{aligned}
P_c(\mathcal{A}(\mathbf{s},v,\delta) \,|\, \mathbf{s}, v)
&\overset{(a)}{=} \frac{1}{|\mathcal{A}(\mathbf{s},v,\delta)|} \sum_{(i,j) \in \mathcal{A}(\mathbf{s},v,\delta)} P_c(i,j \,|\, \mathbf{s}, v) \\
&\overset{(b)}{=} \frac{1}{|\mathcal{A}(\mathbf{s},v,\delta)|} \sum_{(i,j) \in \mathcal{A}(\mathbf{s},v,\delta)} \sum_{\mathbf{y} \in T_\delta(\mathbf{s},v,i,j) \cap (\mathcal{D}_i(\mathbf{s},v) \cup \mathcal{D}_j(\mathbf{s},v))} p^N_{Y|X_1X_2}(\mathbf{y} \,|\, \mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v)) \\
&\overset{(c)}{=} \frac{2}{|\mathcal{A}(\mathbf{s},v,\delta)|} \sum_{(i,j) \in \mathcal{A}(\mathbf{s},v,\delta)} \sum_{\mathbf{y} \in T_\delta(\mathbf{s},v,i,j) \cap \mathcal{D}_i(\mathbf{s},v)} p^N_{Y|X_1X_2}(\mathbf{y} \,|\, \mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v)) \\
&< 2^{-N(2R - \delta^2) + 1} \sum_{(i,j) \in \mathcal{A}(\mathbf{s},v,\delta)} \sum_{\mathbf{y} \in T_\delta(\mathbf{s},v,i,j) \cap \mathcal{D}_i(\mathbf{s},v)} p^N_{Y|X_1X_2}(\mathbf{y} \,|\, \mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v)) \\
&\overset{(d)}{\le} 2^{-N(2R - \delta^2) + 1} \sum_{i=1}^{2^{NR}} \sum_{j \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \sum_{\mathbf{y} \in T_\delta(\mathbf{s},v,i,j) \cap \mathcal{D}_i(\mathbf{s},v)} p^N_{Y|X_1X_2}(\mathbf{y} \,|\, \mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v)) \\
&\overset{(e)}{\le} 2^{-N(2R - \delta^2) + 1} \sum_{i=1}^{2^{NR}} \sum_{j \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} 2^{N[\theta_{ij}(\mathbf{s},v) + \delta^2/8]} \sum_{\mathbf{y} \in T_\delta(\mathbf{s},v,i,j) \cap \mathcal{D}_i(\mathbf{s},v)} r(\mathbf{y} \,|\, \mathbf{s}, v) \\
&\overset{(f)}{\le} 2^{-N(2R - \delta^2 - I(\mathbf{s},v) - \delta^2 - \delta^2/8) + 1} \sum_{i=1}^{2^{NR}} \sum_{j \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \sum_{\mathbf{y} \in T_\delta(\mathbf{s},v,i,j) \cap \mathcal{D}_i(\mathbf{s},v)} r(\mathbf{y} \,|\, \mathbf{s}, v) \\
&= 2^{-N(2R - I(\mathbf{s},v) - 17\delta^2/8) + 1} \sum_{i=1}^{2^{NR}} \sum_{j \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \sum_{\mathbf{y} \in \mathcal{D}_i(\mathbf{s},v)} \mathbb{1}\{\mathbf{y} \in T_\delta(\mathbf{s},v,i,j)\}\, r(\mathbf{y} \,|\, \mathbf{s}, v) \\
&\overset{(g)}{\le} 2^{-N(2R - I(\mathbf{s},v) - 17\delta^2/8) + 1}\, \underbrace{\sum_{i=1}^{2^{NR}} \sum_{\mathbf{y} \in \mathcal{D}_i(\mathbf{s},v)} r(\mathbf{y} \,|\, \mathbf{s}, v)}_{=1}\; 2^{3N\sqrt{\delta}} \\
&< 2^{-N(2R - I(\mathbf{s},v) - 3\delta^2 - 3\sqrt{\delta})}, \qquad \forall\, N > \frac{8}{7\delta^2}, \tag{8.53}
\end{aligned}$$
where (a) follows from (8.49), (b) from (8.46), (c) holds because the decoding sets $\mathcal{D}_i(\mathbf{s},v)$ are disjoint and because of the symmetry of $p_{Y|X_1X_2}$, $\mathcal{A}(\mathbf{s},v,\delta)$, and $T_\delta(\mathbf{s},v,i,j)$; (d) holds because $\mathcal{A}(\mathbf{s},v,\delta) \subseteq \{(i,j) : j \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)\}$; (e) follows from (8.25) and (8.15); and (f) and (g) follow from (8.20) and (8.41), respectively. Combining (8.48), (8.51), and (8.53) yields
$$\bar{P}_c^{\mathrm{good}}(\mathbf{s},v) \le 1 - \frac{\delta^4}{4\log^2|\mathcal{Y}|} \left(1 - 2^{-N(2R - I(\mathbf{s},v) - 3\delta^2 - 3\sqrt{\delta})}\right) + \frac{72\log^2\delta}{N\delta^4}, \qquad \forall\, N > \delta^{-8},\; \delta < \delta^*.$$
(8.54) Observe that for all $2R > I(\mathbf{s},v) + \delta\log|\mathcal{Y}| + 3\delta^2 + 3\sqrt{\delta}$, the conditional correct-decoding probability
$$\bar{P}_c^{\mathrm{good}}(\mathbf{s},v) \,\lesssim\, 1 - \frac{\delta^4}{4\log^2|\mathcal{Y}|}$$
is bounded away from 1 as $N \to \infty$, under the assumption that $|\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)| \ge 2^{N[R - \delta^2/3]}$.

Step 10. We now relax the assumption (8.52). If (8.52) does not hold, then $|\mathcal{M}_N^{\mathrm{bad}}(\mathbf{s},v,\delta)| \ge 2^{NR}(1 - 2^{-N\delta^2/3})$. Further assume both colluders are assigned bad codewords. (This last assumption is relaxed in Step 11.) Analogously to (8.48), we define the conditional probability of correct decoding,
$$\bar{P}_c^{\mathrm{bad}}(\mathbf{s},v) \,\triangleq\, Pr[\text{correct decoding} \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K} \in (\mathcal{M}_N^{\mathrm{bad}}(\mathbf{s},v,\delta))^2]. \tag{8.55}$$
We show that this probability vanishes as $N \to \infty$, for any rate $R > 0$ and any channel $p_{Y|X_1X_2}$. In particular, for $\delta < \frac{1}{4000}$ we have
$$\bar{P}_c^{\mathrm{bad}}(\mathbf{s},v) \le \frac{2}{N\delta^2}, \qquad \forall\, \mathbf{s}, v. \tag{8.56}$$
The proof of (8.56) uses the same techniques as in Steps 3, 4, 5, 9, and is given in Appendix D.

Step 11. We finally consider the most general scenario in which a mix of good codewords and bad codewords is used, and the mix depends on $(\mathbf{s},v)$. By application of the inequality $Pr[A] \le Pr[A \cap B] + Pr[B^c]$ for any two events $A$ and $B$, we obtain the following two upper bounds on the correct-decoding probability, conditioned on $\mathbf{S} = \mathbf{s}$ and $V = v$:
$$P_c(\mathbf{s},v) = Pr[\text{correct decoding} \,|\, \mathbf{S}=\mathbf{s}, V=v] \le \min\left\{ Pr[\mathcal{K} \notin (\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta))^2] + \bar{P}_c^{\mathrm{good}}(\mathbf{s},v),\; Pr[\mathcal{K} \notin (\mathcal{M}_N^{\mathrm{bad}}(\mathbf{s},v,\delta))^2] + \bar{P}_c^{\mathrm{bad}}(\mathbf{s},v) \right\}. \tag{8.57}$$
Let $\beta_N(\mathbf{s},v) \triangleq 2^{-NR} |\mathcal{M}_N^{\mathrm{bad}}(\mathbf{s},v,\delta)| \in [0,1]$ be the fraction of bad codewords. Substituting (8.56) into (8.57) yields
$$P_c(\mathbf{s},v) \le \min\left\{ 1 - (1 - \beta_N(\mathbf{s},v))^2 + \bar{P}_c^{\mathrm{good}}(\mathbf{s},v),\; 1 - \beta_N^2(\mathbf{s},v) \right\} + \frac{2}{N\delta^2}.$$
The first argument of $\min\{\cdot,\cdot\}$ increases with $\beta_N$ and the second decreases.
The value of $\beta_N(\mathbf{s},v)$ that maximizes the expression above is the equalizer, $\frac{1}{2}[1 - \bar{P}_c^{\mathrm{good}}(\mathbf{s},v)]$, and thus we obtain
$$P_c(\mathbf{s},v) \le 1 - \frac{[1 - \bar{P}_c^{\mathrm{good}}(\mathbf{s},v)]^2}{4} + \frac{2}{N\delta^2}, \qquad \forall\, \beta_N(\mathbf{s},v) \in [0,1]. \tag{8.58}$$
Hence if $\bar{P}_c^{\mathrm{good}}(\mathbf{s},v)$ is bounded away from 1, so is $P_c(\mathbf{s},v)$. In particular, if (8.52) does not hold, then $\beta_N(\mathbf{s},v) \ge 1 - 2^{-N\delta^2/3}$, and
$$P_c(\mathbf{s},v) \le 1 - (1 - 2^{-N\delta^2/3})^2 + \frac{2}{N\delta^2},$$
which vanishes as $N \to \infty$ for all $R > 0$. Conversely, if (8.52) holds, then so does the upper bound (8.54) on $\bar{P}_c^{\mathrm{good}}(\mathbf{s},v)$. Using this upper bound together with the inequality $(b-a)^2 \ge b^2 - 2ac$, which is valid for $0 < a < b < c$, applied with
$$c = \frac{\delta^4}{4\log^2|\mathcal{Y}|}, \qquad b = c\left(1 - 2^{-N(2R - I(\mathbf{s},v) - 3\delta^2 - 3\sqrt{\delta})}\right), \qquad a = \frac{72\log^2\delta}{N\delta^4},$$
we obtain
$$[1 - \bar{P}_c^{\mathrm{good}}(\mathbf{s},v)]^2 \ge (b - a)^2 \ge b^2 - 2ac = \frac{\delta^8}{16\log^4|\mathcal{Y}|}\left(1 - 2^{-N(2R - I(\mathbf{s},v) - 3\delta^2 - 3\sqrt{\delta})}\right)^2 - \frac{36\log^2\delta}{N\log^2|\mathcal{Y}|}, \qquad \forall\, N > \delta^{-8},\; \delta < \delta^*.$$
Combining this inequality with (8.58) yields
$$P_c(\mathbf{s},v) \le 1 - \frac{\delta^8}{64\log^4|\mathcal{Y}|}\left(1 - 2^{-N(2R - I(\mathbf{s},v) - 3\delta^2 - 3\sqrt{\delta})}\right)^2 + \frac{9\log^2\delta}{N\log^2|\mathcal{Y}|} + \frac{2}{N\delta^2}, \qquad \forall\, N > \delta^{-8},\; \delta < \delta^*. \tag{8.59}$$

Step 12. We shall maximize the upper bound of (8.59) over $\mathbf{s} \in T_\delta$ and $v \in \mathcal{V}^N$, which amounts to maximizing $I(\mathbf{s},v)$ in the exponent. In view of the equivalence of the representations $(\mathbf{s},v)$ and $(p_{\mathbf{s}}, \mathbf{q})$, and recalling (8.24), we have
$$\max_{\mathbf{s} \in T_\delta} \max_{v \in \mathcal{V}^N} I(\mathbf{s},v) = \max_{p_{\mathbf{s}}:\, d_V(p_{\mathbf{s}}, p_S) \le \delta} \,\max_{\mathbf{q} \in \mathcal{Q}^N} I(p_{\mathbf{s}}, \mathbf{q}) \le \max_{\mathbf{q} \in \mathcal{Q}^N} I(p_S, \mathbf{q}) + \delta\log|\mathcal{Y}|, \tag{8.60}$$
where the inequality follows from (8.45).
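The equalizer step leading to (8.58) can be checked numerically: for the min of an increasing and a decreasing function of $\beta$, the maximum is attained where the two arms meet. A minimal sketch (illustrative only):

```python
# Check of the equalizer step leading to (8.58): for a fixed P in [0,1)
# (standing in for the correct-decoding probability bound), the function
#   f(beta) = min(1 - (1-beta)^2 + P, 1 - beta^2)
# over beta in [0,1] is maximized at beta* = (1-P)/2, where both arms are
# equal, giving the value 1 - (1-P)^2/4 used in (8.58).
def f(beta, P):
    return min(1 - (1 - beta)**2 + P, 1 - beta**2)

for P in (0.0, 0.3, 0.7, 0.95):
    beta_star = (1 - P) / 2
    grid_best = max(f(b / 10000, P) for b in range(10001))
    assert abs(f(beta_star, P) - (1 - (1 - P)**2 / 4)) < 1e-12
    assert grid_best <= f(beta_star, P) + 1e-12   # beta* attains the maximum
    assert grid_best >= f(beta_star, P) - 1e-3    # up to grid resolution
print("equalizer beta* = (1-P)/2 verified")
```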
The probability of correct decoding satisfies
$$\begin{aligned}
P_c(f_N, g_N, p_{Y|X_1X_2}) &= \sum_{\mathbf{s} \in \mathcal{S}^N} p^N_S(\mathbf{s}) \sum_{v \in \mathcal{V}^N} p_V(v)\, P_c(\mathbf{s},v) \,\le\, Pr[\mathbf{S} \notin T_\delta] + \max_{\mathbf{s} \in T_\delta} \max_{v \in \mathcal{V}^N} P_c(\mathbf{s},v) \\
&\overset{(a)}{\le} (N+1)^{|\mathcal{S}|} 2^{-N\delta^2/\ln 4} + \frac{9\log^2\delta}{N\log^2|\mathcal{Y}|} + \frac{2}{N\delta^2} + 1 - \frac{\delta^8}{64\log^4|\mathcal{Y}|}\left(1 - \max_{\mathbf{s} \in T_\delta} \max_{v \in \mathcal{V}^N} 2^{-N(2R - I(\mathbf{s},v) - 3\delta^2 - 3\sqrt{\delta})}\right)^2 \\
&\overset{(b)}{\le} (N+1)^{|\mathcal{S}|} 2^{-N\delta^2/\ln 4} + \frac{9\log^2\delta}{N\log^2|\mathcal{Y}|} + \frac{2}{N\delta^2} + 1 - \frac{\delta^8}{64\log^4|\mathcal{Y}|}\left(1 - 2^{-N(2R - \max_{\mathbf{q}} I(p_S,\mathbf{q}) - \delta\log|\mathcal{Y}| - 3\delta^2 - 3\sqrt{\delta})}\right)^2, \quad \forall\, N > \delta^{-8},\; \delta < \delta^*, \tag{8.61}
\end{aligned}$$
where (a) follows from (8.44) and (8.59) and (b) from (8.60). Thus for all $2R > \max_{\mathbf{q}} I(p_S, \mathbf{q}) + \delta\log|\mathcal{Y}| + 3\delta^2 + 3\sqrt{\delta}$,
$$P_c(f_N, g_N, p_{Y|X_1X_2}) \,\lesssim\, 1 - \frac{\delta^8}{64\log^4|\mathcal{Y}|} \quad \text{as } N \to \infty \tag{8.62}$$
is bounded away from 1.

Step 13. We now bound $\max_{\mathbf{q}} I(p_S, \mathbf{q})$ in (8.61) by a quantity that does not depend on $N$. Since
$$I_{p_Q\, p_T\, p_S\, p^2_{X|SQT}\, p_{Y|X_1X_2}}(X_1, X_2; Y \,|\, S, Q, T) = \sum_{\mathbf{q} \in \mathcal{Q}^N} p_Q(\mathbf{q})\, I(p_S, \mathbf{q}),$$
we have
$$\max_{\mathbf{q} \in \mathcal{Q}^N} I(p_S, \mathbf{q}) = \max_{p_Q \in \mathcal{P}_Q} I_{p_Q\, p_T\, p_S\, p^2_{X|SQT}\, p_{Y|X_1X_2}}(X_1, X_2; Y \,|\, S, Q, T) \overset{(a)}{\le} \max_{p_{QT} \in \mathcal{P}_{QT}} I_{p_{QT}\, p_S\, p^2_{X|SQT}\, p_{Y|X_1X_2}}(X_1, X_2; Y \,|\, S, Q, T) \overset{(b)}{=} \max_{p_W \in \mathcal{P}_W} I_{p_W\, p_S\, p^2_{X|SW}\, p_{Y|X_1X_2}}(X_1, X_2; Y \,|\, S, W), \tag{8.63}$$
where (a) holds because the maximization is over a larger domain ($p_{QT}$ is now unconstrained over $\mathcal{W}_N \triangleq \mathcal{Q}^N \times \{1, 2, \cdots, N\}$), and (b) is obtained by defining the random variable $W = (Q, T) \in \mathcal{W}_N$. Moreover
$$\max_{p_W \in \mathcal{P}_W} I(X_1, X_2; Y \,|\, S, W) \le \sup_{L} \max_{p_W \in \mathcal{P}_W} I(X_1, X_2; Y \,|\, S, W) = \lim_{L \to \infty} \max_{p_W \in \mathcal{P}_W} I(X_1, X_2; Y \,|\, S, W), \tag{8.64}$$
where the alphabet for $W$ on the right side is $\{1, 2, \cdots, L\}$, and the supremum and the limit are equal because the supremand is nondecreasing in $L$.
Combining (8.61), (8.63), and (8.64), we conclude that
$$P_c^*(f_N, g_N, \mathcal{W}^{\mathrm{fair}}_{K,\delta}) \,\triangleq\, \min_{p_{Y|X_1X_2} \in \mathcal{W}^{\mathrm{fair}}_{K,\delta}} P_c(f_N, g_N, p_{Y|X_1X_2}) \tag{8.65}$$
is bounded away from 1 as $N \to \infty$ for all $\delta \in (0, \delta^*)$ and all sequences of codes $(f_N, g_N)$ of rate
$$R > \frac{1}{2} \left[ \min_{p_{Y|X_1X_2} \in \mathcal{W}^{\mathrm{fair}}_{K,\delta}} \lim_{L \to \infty} \max_{p_W \in \mathcal{P}_W} I(X_1, X_2; Y \,|\, S, W) + \delta\log|\mathcal{Y}| + 3\delta^2 + 3\sqrt{\delta} \right].$$
Letting $\delta \downarrow 0$, we conclude that reliable decoding is possible only if
$$R \le \min_{p_{Y|X_1X_2} \in \mathcal{W}^{\mathrm{fair}}_K} \lim_{L \to \infty} \max_{p_W \in \mathcal{P}_W} \frac{1}{2} I(X_1, X_2; Y \,|\, S, W) = \lim_{L \to \infty} \min_{p_{Y|X_1X_2} \in \mathcal{W}^{\mathrm{fair}}_K} \max_{p_W \in \mathcal{P}_W} \frac{1}{2} I(X_1, X_2; Y \,|\, S, W) = \lim_{L \to \infty} \max_{p_W \in \mathcal{P}_W} \min_{p_{Y|X_1X_2} \in \mathcal{W}^{\mathrm{fair}}_K} \frac{1}{2} I(X_1, X_2; Y \,|\, S, W),$$
where the second equality holds by application of the minimax theorem: the mutual-information functional is linear (hence concave) in $p_W$ and convex in $p_{Y|X_1X_2}$, and the domains of $p_W$ and $p_{Y|X_1X_2}$ are convex. Since the above inequality holds for all feasible $p_{X|SW}$, we obtain
$$R \le \lim_{L \to \infty} \max_{p_{X_1X_2W|S} \in \mathcal{P}_{X_1X_2W|S}(p_S, L, D_1)} \min_{p_{Y|X_1X_2} \in \mathcal{W}^{\mathrm{fair}}_K} \frac{1}{2} I(X_1, X_2; Y \,|\, S, W) = \tilde{C}^{\mathrm{one}}(D_1, \mathcal{W}^{\mathrm{fair}}_K).$$
This concludes the proof.$^8$ $\Box$

9 Proof of Theorem 4.1

We derive the error exponents for the threshold decision rule (4.1). By symmetry of the codebook construction, the error probabilities will be independent of $\mathcal{K}$. Without loss of optimality, we assume that $\mathcal{K} = \{1, 2, \cdots, K\}$. Recalling that $\mathcal{W} = \{1, 2, \cdots, L\}$, denote by $\mathcal{P}^{[N]}_{XW}(L)$ the set of joint types over $\mathcal{X} \times \mathcal{W}$.
Define
$$\mathcal{P}^{[N]}_{YX_K|W}(p_{xw}, \mathcal{W}_K, R, L, m) = \left\{ p_{yx_K|w} : p_{x_K|w} \in \mathcal{M}(p_{x|w}),\; p_{y|x_K} \in \mathcal{W}_K(p_{x_K}),\; I(x_m; y|w) \le R \right\},$$
$$\tilde{E}_{\mathrm{psp},m,N}(R, L, p_{xw}, \mathcal{W}_K) = \min_{p_{yx_K|w} \in \mathcal{P}^{[N]}_{YX_K|W}(p_{xw}, \mathcal{W}_K, R, L, m)} D(p_{yx_K|w} \,\|\, p_{y|x_K}\, p^K_{x|w} \,|\, p_w), \tag{9.1}$$
$$\tilde{E}_{\mathrm{psp},N}(R, L, p_{xw}, \mathcal{W}_K) = \max_{m \in K} \tilde{E}_{\mathrm{psp},m,N}(R, L, p_{xw}, \mathcal{W}_K), \tag{9.2}$$
$$\underline{\tilde{E}}_{\mathrm{psp},N}(R, L, p_{xw}, \mathcal{W}_K) = \min_{m \in K} \tilde{E}_{\mathrm{psp},m,N}(R, L, p_{xw}, \mathcal{W}_K), \tag{9.3}$$
and consider the maximization
$$\max_{p_{xw} \in \mathcal{P}^{[N]}_{XW}(L)} \tilde{E}_{\mathrm{psp},1,N}(R, L, p_{xw}, \mathcal{W}^{\mathrm{fair}}_{K_{\mathrm{nom}}}). \tag{9.4}$$
Denote by $p^*_{xw}$ the maximizer above (which implicitly depends on $R$) and by $T^*_{xw}$ the corresponding type class. Let
$$E_{\mathrm{psp},N}(R, L, \mathcal{W}_K) = \tilde{E}_{\mathrm{psp},N}(R, L, p^*_{xw}, \mathcal{W}_K), \tag{9.5}$$
$$\underline{E}_{\mathrm{psp},N}(R, L, \mathcal{W}_K) = \underline{\tilde{E}}_{\mathrm{psp},N}(R, L, p^*_{xw}, \mathcal{W}_K). \tag{9.6}$$
The expressions (9.1)-(9.6) differ from (4.3)-(4.8) in that the optimizations are performed over types instead of general p.m.f.'s. We have
$$\lim_{N \to \infty} E_{\mathrm{psp},N}(R, L, \mathcal{W}_K) = E_{\mathrm{psp}}(R, L, \mathcal{W}_K), \tag{9.7}$$
$$\lim_{N \to \infty} \underline{E}_{\mathrm{psp},N}(R, L, \mathcal{W}_K) = \underline{E}_{\mathrm{psp}}(R, L, \mathcal{W}_K) \tag{9.8}$$
by (2.11) and continuity of the divergence and mutual-information functionals. With the joint type class $T^*_{xw}$ specified below (9.4), we now restate the coding and decoding scheme.

Footnote 8: The case of more than two colluders would be treated as follows. Say there are three colluders.
The definition of the restricted class of channels (8.6) would be extended as follows:
$$\mathcal{W}^{\mathrm{fair}}_{K,\delta} = \Big\{ p_{Y|X_1X_2X_3} \in \mathcal{W}^{\mathrm{fair}}_K :\; p_{Y|X_1X_2X_3}(y|x_1,x_2,x_3) \ge \delta \;\;\forall\, y, x_1, x_2, x_3, \;\text{ and }\; \delta \le D\big(p_{Y|X_1=x_1,X_2=x_2,X_3=x_3} \,\big\|\, p_{Y|X_1=x'_1,X_2=x'_2,X_3=x'_3}\big) \le \log \delta^{-1} \;\;\forall\, (x_1,x_2,x_3) \ne (x'_1,x'_2,x'_3) \Big\}.$$
Then the notions of equivalence of Hamming distance and statistical indistinguishability of two codewords apply similarly to the case of two colluders, as does the key property of bounded overlap of the typical sets, and the derivation of the sphere-packing inequality in Step 9.

Codebook. A random constant-composition code $\mathcal{C}(\mathbf{w}) = \{\mathbf{x}_m, 1 \le m \le 2^{NR}\}$ is generated for each $\mathbf{w} \in T^*_{w}$ by drawing $2^{NR}$ sequences independently and uniformly from the conditional type class $T^*_{x|w}$.

Encoder. A sequence $\mathbf{w}$ is drawn uniformly from $T^*_{w}$ and shared with the receiver. User $m$ is assigned codeword $\mathbf{x}_m$ from $\mathcal{C}(\mathbf{w})$, for $1 \le m \le 2^{NR}$.

Decoder. Given $(\mathbf{y}, \mathbf{w})$, the decoder places user $m$ on the guilty list if $I(\mathbf{x}_m; \mathbf{y} \,|\, \mathbf{w}) > R + \Delta$.

Collusion Channel. The random code described above is a RM code. By Prop. 2.2, it is sufficient to restrict our attention to strongly exchangeable collusion channels for the error-probability analysis. Recall from (2.16) and (2.17) that for such channels,
$$p_{Y|X_K}(\tilde{\mathbf{y}} \,|\, \mathbf{x}_K) = \frac{Pr[T_{y|x_K}]}{|T_{y|x_K}|} \le \frac{1}{|T_{y|x_K}|}\, \mathbb{1}\{ p_{y|x_K} \in \mathcal{W}_K(p_{x_K}) \}, \qquad \forall\, \tilde{\mathbf{y}} \in T_{y|x_K}. \tag{9.9}$$

Error Exponents. The derivation is based on the following two asymptotic equalities, which are special cases of (10.12) and (10.16) proven later.

1) Fix $\mathbf{w}$ and $\mathbf{y}$ and draw $\mathbf{x}$ uniformly from a fixed conditional type class $T^*_{x|w}$, independently of $\mathbf{y}$. Then for any $\nu \ge 0$,
$$Pr[I(\mathbf{x}; \mathbf{y} \,|\, \mathbf{w}) \ge \nu] \le 2^{-N\nu}. \tag{9.10}$$

2) Fix $\mathbf{w}$, draw $\mathbf{x}_m$, $m \in K$, i.i.d.
uniformly from a fixed conditional type class $T_{x|w}$, and then draw $\mathbf{Y}$ uniformly from the type class $T_{y|x_K}$. For any strongly exchangeable collusion channel, for any $m \in K$ and $\nu \ge 0$, we have
$$Pr[I(\mathbf{x}_m; \mathbf{y} \,|\, \mathbf{w}) \le \nu] \,\doteq\, \exp_2\{ -N \tilde{E}_{\mathrm{psp},m,N}(\nu, L, p_{xw}, \mathcal{W}_K) \}. \tag{9.11}$$

(i). False Positives. A false positive occurs if
$$\exists\, m \notin K : I(\mathbf{x}_m; \mathbf{y} \,|\, \mathbf{w}) > R + \Delta. \tag{9.12}$$
By construction of the codebook, $\mathbf{x}_m$ is conditionally independent of $\mathbf{y}$ given $\mathbf{w}$, for each $m \notin K$. There are at most $2^{NR} - K$ possible values for $m$ in (9.12). Hence the probability of false positives, conditioned on the joint type class $T_{yx_Kw}$, is
$$P_{FP}(T_{yx_Kw}, \mathcal{W}_K) = Pr[\exists\, m \notin K : I(\mathbf{x}_m; \mathbf{y}|\mathbf{w}) > R + \Delta] \overset{(a)}{\le} (2^{NR} - K)\, Pr_X[I(\mathbf{x}; \mathbf{y}|\mathbf{w}) > R + \Delta] \overset{(b)}{\le} 2^{NR}\, 2^{-N(R+\Delta)} = 2^{-N\Delta}, \tag{9.13}$$
where (a) follows from the union bound, and (b) from (9.10) with $\nu = R + \Delta$. Averaging over all type classes $T_{yx_Kw}$, we obtain $P_{FP} \le 2^{-N\Delta}$, from which (4.9) follows.

(ii). Detect-One Error Criterion. (Miss all colluders.) We first derive the error exponent for the event that the decoder misses a specific colluder $m \in K$. Any coalition $\hat{K}$ that contains $m$ fails the test (4.1), i.e., for any such $\hat{K}$,
$$I(\mathbf{x}_m; \mathbf{y} \,|\, \mathbf{w}) \le R + \Delta. \tag{9.14}$$
The probability of the miss-$m$ event, given the joint type $p^*_{xw}$, is therefore upper-bounded by the probability of the event (9.14). From (9.11) we obtain
$$p_{\mathrm{miss}\text{-}m}(p^*_{xw}, \mathcal{W}_K) \le Pr[I(\mathbf{x}_m; \mathbf{y}|\mathbf{w}) \le R + \Delta] \overset{(a)}{\le} \exp_2\{ -N \tilde{E}_{\mathrm{psp},m,N}(R + \Delta, L, p^*_{xw}, \mathcal{W}_K) \}. \tag{9.15}$$
The miss-all event is the intersection of the miss-$m$ events over $m \in K$. Its probability is
$$p_{\mathrm{miss\text{-}all}}(p^*_{xw}, \mathcal{W}_K) = Pr\left[ \bigcap_{m \in K} \{\text{miss } m \,|\, p^*_{xw}\} \right] \le \min_{m \in K} p_{\mathrm{miss}\text{-}m}(p^*_{xw}, \mathcal{W}_K) \overset{(a)}{\doteq} \min_{m \in K} \exp_2\{ -N \tilde{E}_{\mathrm{psp},m,N}(R + \Delta, L, p^*_{xw}, \mathcal{W}_K) \} \overset{(b)}{=} \exp_2\{ -N E_{\mathrm{psp},N}(R + \Delta, L, \mathcal{W}_K) \}$$
$$\overset{(c)}{\doteq} \exp_2\{ -N E_{\mathrm{psp}}(R + \Delta, L, \mathcal{W}_K) \},$$
where (a) follows from (9.15), (b) from (9.2) and (9.5), and (c) from (9.7).

(iii). Detect-All Error Criterion. (Miss Some Colluders.) The miss-some event is the union of the miss-$m$ events over $m \in K$. Its probability is
$$p_{\mathrm{miss\text{-}some}}(p^*_{xw}, \mathcal{W}_K) = Pr\left[ \bigcup_{m \in K} \{\text{miss } m \,|\, p^*_{xw}\} \right] \le \sum_{m \in K} p_{\mathrm{miss}\text{-}m}(p^*_{xw}, \mathcal{W}_K) \,\doteq\, \max_{m \in K} \exp_2\{ -N \tilde{E}_{\mathrm{psp},m,N}(R + \Delta, L, p^*_{xw}, \mathcal{W}_K) \} \overset{(a)}{=} \exp_2\{ -N \underline{E}_{\mathrm{psp},N}(R + \Delta, L, \mathcal{W}_K) \} \overset{(b)}{\doteq} \exp_2\{ -N \underline{E}_{\mathrm{psp}}(R + \Delta, L, \mathcal{W}_K) \},$$
where (a) follows from (9.3) and (9.6), and (b) from (9.8).

(iv). Fair Collusion Channels. Recall (4.2), restated here for convenience:
$$\mathcal{P}_{YX_K|W}(p_{XW}, \mathcal{W}_K, R, L, m) \,\triangleq\, \left\{ \tilde{p}_{YX_K|W} : \tilde{p}_{X_K|W} \in \mathcal{M}(p_{X|W}),\; \tilde{p}_{Y|X_K} \in \mathcal{W}_K(\tilde{p}_{X_K}),\; I_{\tilde{p}_{YX_K|W}\, p_W}(X_m; Y|W) \le R \right\}, \qquad m \in K.$$
The intersection of these sets over $m$,
$$\mathcal{P}^*(\mathcal{W}_K) \,\triangleq\, \bigcap_{m \in K} \mathcal{P}_{YX_K|W}(p_{XW}, \mathcal{W}_K, R, L, m), \tag{9.16}$$
is convex and permutation-invariant because so is $\mathcal{W}_K$, by assumption. Combining (9.16), (4.2), and (4.3), we may write (4.4) as
$$\tilde{E}_{\mathrm{psp}}(R, L, p_{XW}, \mathcal{W}_K) = \min_{\tilde{p}_{YX_K|W} \in \mathcal{P}^*(\mathcal{W}_K)} D(\tilde{p}_{YX_K|W} \,\|\, \tilde{p}_{Y|X_K}\, p^K_{X|W} \,|\, p_W). \tag{9.17}$$
For any $\tilde{p}_{YX_K|W} \in \mathcal{P}^*(\mathcal{W}_K)$ and permutation $\pi$ of $K$, define the permuted conditional p.m.f. $\tilde{p}^\pi_{YX_K|W}(y, x_K|w) = \tilde{p}_{YX_K|W}(y, x_{\pi(K)}|w)$ and the permutation-averaged p.m.f. $\tilde{p}^{\mathrm{fair}}_{YX_K|W} = \frac{1}{K!} \sum_\pi \tilde{p}^\pi_{YX_K|W}$, which also belongs to the convex set $\mathcal{P}^*(\mathcal{W}_K)$. We similarly define $\tilde{p}^\pi_{Y|X_K}$ and $\tilde{p}^{\mathrm{fair}}_{Y|X_K}$. Observe that $D(\tilde{p}^\pi_{YX_K|W} \,\|\, \tilde{p}^\pi_{Y|X_K}\, p^K_{X|W} \,|\, p_W)$ is independent of $\pi$. By convexity of the Kullback-Leibler divergence, this implies
$$D(\tilde{p}^{\mathrm{fair}}_{YX_K|W} \,\|\, \tilde{p}^{\mathrm{fair}}_{Y|X_K}\, p^K_{X|W} \,|\, p_W) \le \frac{1}{K!} \sum_\pi D(\tilde{p}^\pi_{YX_K|W} \,\|\, \tilde{p}^\pi_{Y|X_K}\, p^K_{X|W} \,|\, p_W) = D(\tilde{p}_{YX_K|W} \,\|\, \tilde{p}_{Y|X_K}\, p^K_{X|W} \,|\, p_W).$$
(9.18) Therefore the minimum in (9.17) is achieved by a permutation-invariant $\tilde{p}_{YX_K|W} = \tilde{p}^{\mathrm{fair}}_{YX_K|W}$, and the same minimum would have been obtained if $\mathcal{W}_K$ had been replaced with $\mathcal{W}^{\mathrm{fair}}_K$. Hence
$$\tilde{E}_{\mathrm{psp}}(R, L, p_{XW}, \mathcal{W}_K) = \tilde{E}_{\mathrm{psp}}(R, L, p_{XW}, \mathcal{W}^{\mathrm{fair}}_K).$$
Substituting into (4.7) and (4.10), we obtain
$$E^{\mathrm{one}}(R, L, \mathcal{W}_K, \Delta) = E^{\mathrm{one}}(R, L, \mathcal{W}^{\mathrm{fair}}_K, \Delta).$$

(v). The equality $E^{\mathrm{one}}(R, L, \mathcal{W}^{\mathrm{fair}}_K, \Delta) = E^{\mathrm{all}}(R, L, \mathcal{W}^{\mathrm{fair}}_K, \Delta)$ is straightforward because $\tilde{E}_{\mathrm{psp},m}(R, L, p_{XW}, \mathcal{W}^{\mathrm{fair}}_K)$ in (4.3) is the same for all $m \in K$, and thus $E_{\mathrm{psp}}(R, L, \mathcal{W}^{\mathrm{fair}}_K) = \underline{E}_{\mathrm{psp}}(R, L, \mathcal{W}^{\mathrm{fair}}_K)$.

(vi). Positive Error Exponents. From Part (v) above, we may restrict our attention to $\mathcal{W}_K = \mathcal{W}^{\mathrm{fair}}_K$. Consider any $\mathcal{W} = \{1, \cdots, L\}$ and $p_W$ that is positive over its support set (if it is not, reduce the value of $L$ accordingly). For any $m \in K$, the minimand in the expression (4.3) for $\tilde{E}_{\mathrm{psp},m}(R, L, p_{XW}, \mathcal{W}^{\mathrm{fair}}_K)$ is zero if and only if $\tilde{p}_{YX_K|W} = \tilde{p}_{Y|X_K}\, p^K_{X|W}$, with $\tilde{p}_{Y|X_K} \in \mathcal{W}^{\mathrm{fair}}_K(\tilde{p}_{X_K})$. Such $\tilde{p}_{YX_K|W}$ is feasible for (4.2) if and only if $(p_{XW}, \tilde{p}_{Y|X_K})$ is such that $I(X_m; Y|W) \le R$. It is not feasible, and thus a positive exponent $E^{\mathrm{one}}$ is guaranteed, if $R < I(X_1; Y|W)$. The supremum of all such $R$ is given by (4.12) and is achieved by letting $\Delta \to 0$ and $L \to \infty$. $\Box$

10 Proof of Theorem 5.2

We derive the error exponents for the MPMI decision rule (5.7). Again by symmetry of the codebook construction, the error probabilities will be independent of $\mathcal{K}$. Without loss of optimality, we assume that $\mathcal{K} = \{1, 2, \cdots, K\}$. We have also defined $\mathcal{W} = \{1, 2, \cdots, L\}$.
Define, for all $A \subseteq K$,
$$\mathcal{P}^{[N]}_{YX_K|SW}(p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K, R, L, A) = \left\{ p_{yx_K|sw} : p_{x_K|sw} \in \mathcal{M}(p_{x|sw}),\; p_{y|x_K} \in \mathcal{W}_K(p_{x_K}),\; \overset{\circ}{I}(x_A;\, y x_{K \setminus A} \,|\, sw) \le |A| R \right\}, \tag{10.1}$$
$$\breve{E}_{\mathrm{psp},A,N}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K) = \min_{p_{yx_K|sw} \in \mathcal{P}^{[N]}_{YX_K|SW}(p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K, R, L, A)} D(p_{yx_K|sw} \,\|\, p_{y|x_K}\, p^K_{x|sw} \,|\, p_{sw}), \tag{10.2}$$
$$\hat{E}_{\mathrm{psp},A,N}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K) = D(p_{s|w} \,\|\, p_S \,|\, p_w) + \breve{E}_{\mathrm{psp},A,N}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K) = \min_{p_{yx_K|sw} \in \mathcal{P}^{[N]}_{YX_K|SW}(p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K, R, L, A)} D(p_{yx_K|sw}\, p_{s|w} \,\|\, p_{y|x_K}\, p^K_{x|sw}\, p_S \,|\, p_w), \tag{10.3}$$
$$\hat{E}_{\mathrm{psp},N}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K) = \hat{E}_{\mathrm{psp},K,N}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K), \tag{10.4}$$
$$\underline{\hat{E}}_{\mathrm{psp},N}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K) = \min_{A \subseteq K} \hat{E}_{\mathrm{psp},A,N}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K), \tag{10.5}$$
$$E_{\mathrm{psp},N}(R, L, D_1, \mathcal{W}_K) = \max_{p_w \in \mathcal{P}^{[N]}_W} \min_{p_{s|w} \in \mathcal{P}^{[N]}_{S|W}} \max_{p_{x|sw} \in \mathcal{P}^{[N]}_{X|SW}(p_{sw}, L, D_1)} \hat{E}_{\mathrm{psp},K,N}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathcal{W}^{\mathrm{fair}}_{K_{\mathrm{nom}}}), \tag{10.6}$$
where the second equality in (10.3) is obtained by application of the chain rule for divergence. Denote by $p^*_w$ and $p^*_{x|sw}$ the maximizers in (10.6), the latter viewed as a function of $p_{s|w}$. Moreover, both $p^*_w$ and $p^*_{x|sw}$ implicitly depend on $R$ and $\mathcal{W}^{\mathrm{fair}}_{K_{\mathrm{nom}}}$. Denote by $T^*_w$ and $T^*_{x|sw}$ the corresponding type and conditional type classes. Let
$$E_{\mathrm{psp},N}(R, L, D_1, \mathcal{W}_K) = \min_{p_{s|w}} \hat{E}_{\mathrm{psp},N}(R, L, p^*_w, p_{s|w}, p^*_{x|sw}, \mathcal{W}_K), \tag{10.7}$$
$$\underline{E}_{\mathrm{psp},N}(R, L, D_1, \mathcal{W}_K) = \min_{p_{s|w}} \underline{\hat{E}}_{\mathrm{psp},N}(R, L, p^*_w, p_{s|w}, p^*_{x|sw}, \mathcal{W}_K). \tag{10.8}$$
The exponents (10.3)-(10.8) differ from (5.11)-(5.16) in that the optimizations are performed over conditional types instead of general conditional p.m.f.'s.
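The passage from types to general p.m.f.'s rests on two standard method-of-types facts: there are only polynomially many types, and each type class has probability governed exactly by a divergence exponent. A small self-contained check for a binary alphabet (the source parameter $q$ below is hypothetical):

```python
import math
from math import comb

# Two standard method-of-types facts (binary alphabet, hypothetical q):
# 1) there are only N+1 types of length-N binary sequences, vs 2^N sequences;
# 2) Pr_q[T_p] <= 2^{-N D(p||q)} exactly, for every type p.
def D_binary(p, q):
    out = 0.0
    for pi, qi in ((p, q), (1 - p, 1 - q)):
        if pi > 0:
            out += pi * math.log2(pi / qi)
    return out

q = 0.3
for N in (10, 20, 40):
    assert N + 1 < 2**N                    # polynomially many types
    total = 0.0
    for k in range(N + 1):                 # type p = k/N
        prob_type_class = comb(N, k) * q**k * (1 - q)**(N - k)
        total += prob_type_class
        assert prob_type_class <= 2.0**(-N * D_binary(k / N, q)) + 1e-12
    assert abs(total - 1.0) < 1e-9         # type classes partition {0,1}^N
print("type-class probability bound verified")
```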
We have
$$\lim_{N \to \infty} E_{\mathrm{psp},N}(R, L, D_1, \mathcal{W}_K) = E_{\mathrm{psp}}(R, L, D_1, \mathcal{W}_K), \tag{10.9}$$
$$\lim_{N \to \infty} \underline{E}_{\mathrm{psp},N}(R, L, D_1, \mathcal{W}_K) = \underline{E}_{\mathrm{psp}}(R, L, D_1, \mathcal{W}_K) \tag{10.10}$$
by (2.11) and continuity of the divergence and mutual-information functionals.

Codebook. For each $\mathbf{w} \in T^*_w$ and $\mathbf{s} \in \mathcal{S}^N$, a codebook $\mathcal{C}(\mathbf{s}, \mathbf{w}) = \{\mathbf{x}_m, 1 \le m \le 2^{NR}\}$ is generated by drawing $2^{NR}$ random vectors independently and uniformly from $T^*_{x|sw}$.

Encoder. A sequence $\mathbf{w}$ is drawn uniformly from $T^*_w$ and shared with the decoder. Given $\mathbf{s}$ and $\mathbf{w}$, user $m$ is assigned codeword $\mathbf{x}_m \in \mathcal{C}(\mathbf{s}, \mathbf{w})$.

Decoder. The decoding rule is the MPMI rule of (5.7).

Collusion Channel. This random code is a RM code, hence by application of Prop. 2.2, it is sufficient to restrict our attention to strongly exchangeable collusion channels.

Error Probability Analysis. To analyze the error probability for our random-coding scheme under strongly exchangeable collusion channels, we will again use the bound (9.9) as well as the following three properties, which originate from the basic inequalities (1.1) and (1.2).

1) Fix $(\mathbf{s}, \mathbf{w})$ and $\mathbf{z} \in \mathcal{Z}^N$, and draw $\mathbf{x}_K = \{\mathbf{x}_m, m \in K\}$ i.i.d. uniformly from a conditional type class $T_{x|sw}$, independently of $\mathbf{z}$. We have the asymptotic equality
$$Pr[T_{x_K|zsw}] = \frac{|T_{x_K|zsw}|}{|T_{x|sw}|^K} \,\doteq\, 2^{-N[K H(x|sw) - H(x_K|zsw)]} = 2^{-N \overset{\circ}{I}(x_K; z|sw)}, \tag{10.11}$$
where the last equality is due to (5.2). Then
$$Pr[\overset{\circ}{I}(x_K; z|sw) \ge \nu] = \sum_{T_{x_K|zsw}} Pr[T_{x_K|zsw}]\, \mathbb{1}\{\overset{\circ}{I}(x_K; z|sw) \ge \nu\} \,\doteq\, \sum_{T_{x_K|zsw}} 2^{-N\overset{\circ}{I}(x_K; z|sw)}\, \mathbb{1}\{\overset{\circ}{I}(x_K; z|sw) \ge \nu\} \,\doteq\, \max_{T_{x_K|zsw}} 2^{-N\overset{\circ}{I}(x_K; z|sw)}\, \mathbb{1}\{\overset{\circ}{I}(x_K; z|sw) \ge \nu\} \,\le\, 2^{-N\nu}. \tag{10.12}$$

2) Fix $\mathbf{w}$ and draw $\mathbf{s}$ i.i.d. $\sim p_S$. We have [12]
$$Pr[T_{s|w}] \,\doteq\, 2^{-N D(p_{s|w} \| p_S | p_w)}. \tag{10.13}$$

3) Fix $(\mathbf{s}, \mathbf{w})$, draw $\mathbf{x}_k$, $k \in K$, i.i.d.
uniformly from a conditional type class $T_{x|sw}$, and then draw $\mathbf{Y}$ uniformly from a single conditional type class $T_{y|x_K}$. We have
$$Pr[T_{yx_K|sw}] = Pr[T_{y|x_Ksw}]\, Pr[T_{x_K|sw}] = \frac{|T_{y|x_Ksw}|}{|T_{y|x_K}|} \cdot \frac{|T_{x_K|sw}|}{|T_{x|sw}|^K} \,\doteq\, 2^{-N[H(y|x_K) - H(y|x_Ksw)]}\; 2^{-N[K H(x|sw) - H(x_K|sw)]} = \exp_2\left\{ -N\left[ I(y; sw|x_K) + \overset{\circ}{I}(x_1; \cdots; x_K|sw) \right] \right\}. \tag{10.14}$$
Consider the two terms in brackets above. The first one may be written as
$$I(y; sw|x_K) = D(p_{ysw|x_K} \,\|\, p_{y|x_K}\, p_{sw|x_K} \,|\, p_{x_K}) = D(p_{yswx_K} \,\|\, p_{y|x_K}\, p_{swx_K}) = D(p_{yx_K|sw} \,\|\, p_{y|x_K}\, p_{x_K|sw} \,|\, p_{sw})$$
and the second one as
$$\overset{\circ}{I}(x_1; \cdots; x_K|sw) = D(p_{x_K|sw} \,\|\, p^K_{x|sw} \,|\, p_{sw}).$$
By application of the chain rule for divergence, the sum of these two terms is $D(p_{yx_K|sw} \,\|\, p_{y|x_K}\, p^K_{x|sw} \,|\, p_{sw})$. Substituting into (10.14), we obtain
$$Pr[T_{yx_K|sw}] \,\doteq\, \exp_2\left\{ -N D(p_{yx_K|sw} \,\|\, p_{y|x_K}\, p^K_{x|sw} \,|\, p_{sw}) \right\}. \tag{10.15}$$
In the derivation below we use the shorthand $e(p_{yx_K|sw})$ to represent the exponential above, and fix $T_{x|sw} = T^*_{x|sw}$. For any feasible, strongly exchangeable collusion channel, for any $A \subseteq K$ and $\nu > 0$, conditioning on $\mathbf{w} \in T^*_w$ and $\mathbf{s} \in \mathcal{S}^N$, we have
$$Pr\left[ \overset{\circ}{I}(x_A; y x_{K \setminus A}|sw) \le |A|\nu \right] \overset{(a)}{\le} \sum_{\text{feasible } T_{yx_K|sw}} Pr[T_{yx_K|sw}]\, \mathbb{1}\{\overset{\circ}{I}(x_A; y x_{K \setminus A}|sw) \le |A|\nu\} \overset{(b)}{\doteq} \sum_{\text{feasible } p_{yx_K|sw}} e(p_{yx_K|sw})\, \mathbb{1}\{\overset{\circ}{I}(x_A; y x_{K \setminus A}|sw) \le |A|\nu\}$$
$$\overset{(c)}{\doteq} \max_{\text{feasible } p_{yx_K|sw}} e(p_{yx_K|sw})\, \mathbb{1}\{\overset{\circ}{I}(x_A; y x_{K \setminus A}|sw) \le |A|\nu\} = \max_{\substack{p_{yx_K|sw}:\; p_{x_K|sw} \in \mathcal{M}(p^*_{x|sw}),\\ p_{y|x_K} \in \mathcal{W}_K}} e(p_{yx_K|sw})\, \mathbb{1}\{\overset{\circ}{I}(x_A; y x_{K \setminus A}|sw) \le |A|\nu\} = \max_{\substack{p_{yx_K|sw}:\; p_{x_K|sw} \in \mathcal{M}(p^*_{x|sw}),\\ p_{y|x_K} \in \mathcal{W}_K,\; \overset{\circ}{I}(x_A; yx_{K\setminus A}|sw) \le |A|\nu}} e(p_{yx_K|sw}) \overset{(d)}{=} \max_{p_{yx_K|sw} \in \mathcal{P}^{[N]}_{YX_K|SW}(p^*_w, p_{s|w}, p^*_{x|sw}, \mathcal{W}_K, \nu, L, A)} e(p_{yx_K|sw}) \overset{(e)}{=} \exp_2\left\{ -N \breve{E}_{\mathrm{psp},A,N}(\nu, L, p^*_w, p_{s|w}, p^*_{x|sw}, \mathcal{W}_K) \right\}, \tag{10.16}$$
where (a) follows from (9.9), (b) from (10.15), (c) from the fact that the number of conditional types is polynomial in $N$, (d) from (10.1), and (e) from (10.2).

(i). False Positives. A false positive occurs if $\hat{\mathcal{K}} \setminus K \ne \emptyset$. By application of (5.8), we have
$$\forall\, \mathcal{A} \subseteq \hat{\mathcal{K}} : \overset{\circ}{I}(x_{\mathcal{A}}; y x_{\hat{\mathcal{K}} \setminus \mathcal{A}}|sw) > |\mathcal{A}|(R + \Delta). \tag{10.17}$$
Denote by $\mathcal{B}$ the set of colluder indices $m \in K$ that are correctly identified by the decoder, and by $\mathcal{A} \triangleq \hat{\mathcal{K}} \setminus \mathcal{B}$ the complement set, which is comprised of all incorrectly accused users and has cardinality $|\mathcal{A}| \ge 1$. By construction of the codebook, $x_{\mathcal{A}}$ is independent of $y$ and $x_{\mathcal{B}}$. The probability of the event (10.17) is upper-bounded by the probability of the larger event
$$\exists\, \mathcal{A} \not\subseteq K,\; \exists\, \mathcal{B} \subseteq K : \overset{\circ}{I}(x_{\mathcal{A}}; y x_{\mathcal{B}}|sw) > |\mathcal{A}|(R + \Delta). \tag{10.18}$$
Hence the probability of false positives, conditioned on $T_{yx_Ksw}$, satisfies
$$\begin{aligned}
P_{FP}(T_{yx_Ksw}, \mathcal{W}_K) &\le Pr\left[ \bigcup_{|\mathcal{A}| \ge 1} \bigcup_{\mathcal{B} \subseteq K} \left\{ \exists\, \mathcal{A} \not\subseteq K : \overset{\circ}{I}(x_{\mathcal{A}}; y x_{\mathcal{B}}|sw) > |\mathcal{A}|(R + \Delta) \right\} \right] \\
&= Pr\left[ \bigcup_{|\mathcal{A}| \ge 1} \left\{ \exists\, \mathcal{A} \not\subseteq K : \max_{\mathcal{B} \subseteq K} \overset{\circ}{I}(x_{\mathcal{A}}; y x_{\mathcal{B}}|sw) > |\mathcal{A}|(R + \Delta) \right\} \right] \\
&= Pr\left[ \bigcup_{|\mathcal{A}| \ge 1} \left\{ \exists\, \mathcal{A} \not\subseteq K : \overset{\circ}{I}(x_{\mathcal{A}}; y x_K|sw) > |\mathcal{A}|(R + \Delta) \right\} \right] \\
&\overset{(a)}{\le} \sum_{|\mathcal{A}| \ge 1} 2^{N|\mathcal{A}|R}\, Pr\left[ \overset{\circ}{I}(x_{\mathcal{A}}; y x_K|sw) > |\mathcal{A}|(R + \Delta) \right] \overset{(b)}{\doteq} \sum_{|\mathcal{A}| \ge 1} 2^{N|\mathcal{A}|R}\; 2^{-N|\mathcal{A}|(R + \Delta)} = \sum_{|\mathcal{A}| \ge 1} 2^{-N|\mathcal{A}|\Delta} \,\doteq\, 2^{-N\Delta}, \tag{10.19}
\end{aligned}$$
where (a) follows from the union bound, and (b) from (10.12) with $x_{\mathcal{A}}$ and $y x_K$ in place of $x_K$ and $z$, respectively.
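The last step of (10.19) is pure bookkeeping: the sum over coalition sizes is a geometric series dominated by its first term. A quick numeric check (parameters hypothetical):

```python
import math

# Bookkeeping behind the last step of (10.19): after the union bound, the
# false-positive probability is at most sum_{a>=1} 2^{-N a Delta}, a geometric
# series dominated by its first term, so P_FP <= 2^{-N Delta} to first
# exponential order.
def fp_sum(N, Delta, K):
    return sum(2.0**(-N * a * Delta) for a in range(1, K + 1))

Delta, K = 0.1, 5
for N in (50, 100, 200):
    s = fp_sum(N, Delta, K)
    assert 2.0**(-N * Delta) <= s <= 2 * 2.0**(-N * Delta)   # within a factor 2
rate = -math.log2(fp_sum(200, Delta, K)) / 200
assert abs(rate - Delta) < 0.01          # exponent matches Delta
print("false-positive exponent ~", rate)
```

This is why the tunable parameter $\Delta$ directly controls the false-positive exponent, independently of the rate $R$.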
Averaging over all joint type classes $T_{yx_K sw}$, we obtain $P_{FP} \le 2^{-N\Delta}$, from which (5.17) follows.

(ii) Detect-All Error Criterion (Miss Some Colluders). Under the detect-all error event, any coalition $\tilde{K}$ that contains $K$ fails the test. By (5.8), this implies that
$$\exists A \subseteq \tilde{K}:\quad \mathring{I}(x_A; yx_{\tilde{K}\setminus A}|sw) \le |A|(R+\Delta). \qquad (10.20)$$
In particular, for $\tilde{K} = K$ we have
$$\exists A \subseteq K:\quad \mathring{I}(x_A; yx_{K\setminus A}|sw) \le |A|(R+\Delta). \qquad (10.21)$$
The probability of the miss-some event, conditioned on $(s,w)$, is therefore upper bounded by the probability of the event (10.21):
$$\begin{aligned}
p_{\text{miss-some}}(p^*_w, p_{s|w}, p^*_{x|sw}, W_K)
&\le \Pr\Bigl[\bigcup_{A\subseteq K} \mathring{I}(x_A; yx_{K\setminus A}|sw) \le |A|(R+\Delta)\Bigr] \\
&\le \sum_{A\subseteq K} \Pr\bigl[\mathring{I}(x_A; yx_{K\setminus A}|sw) \le |A|(R+\Delta)\bigr] \\
&\overset{(a)}{\le} \sum_{A\subseteq K} \exp_2\bigl\{-N\,\breve{E}_{psp,A,N}(R+\Delta, L, p^*_w, p_{s|w}, p^*_{x|sw}, W_K)\bigr\} \\
&\doteq \max_{A\subseteq K} \exp_2\bigl\{-N\,\breve{E}_{psp,A,N}(R+\Delta, L, p^*_w, p_{s|w}, p^*_{x|sw}, W_K)\bigr\} \\
&= \exp_2\Bigl\{-N \min_{A\subseteq K}\breve{E}_{psp,A,N}(R+\Delta, L, p^*_w, p_{s|w}, p^*_{x|sw}, W_K)\Bigr\} \qquad (10.22)
\end{aligned}$$
where (a) follows from (10.16) with $\nu = R+\Delta$. Averaging over $S$, we obtain
$$\begin{aligned}
p_{\text{miss-some}}(W_K)
&= \sum_{p_{s|w}} \Pr[T_{s|w}]\; p_{\text{miss-some}}(p^*_w, p_{s|w}, p^*_{x|sw}, W_K) \\
&\overset{(a)}{\doteq} \max_{p_{s|w}} \exp_2\Bigl\{-N\Bigl[D(p_{s|w}\|p_S\,|\,p^*_w) + \min_{A\subseteq K}\breve{E}_{psp,A,N}(R+\Delta, L, p^*_w, p_{s|w}, p^*_{x|sw}, W_K)\Bigr]\Bigr\} \\
&\overset{(b)}{=} \max_{p_{s|w}} \exp_2\bigl\{-N\,\hat{E}_{psp,N}(R+\Delta, L, p^*_w, p_{s|w}, p^*_{x|sw}, W_K)\bigr\} \\
&\overset{(c)}{=} \exp_2\bigl\{-N\,E_{psp,N}(R+\Delta, L, D_1, W_K)\bigr\} \\
&\overset{(d)}{\doteq} \exp_2\bigl\{-N\,E_{psp}(R+\Delta, L, D_1, W_K)\bigr\}
\end{aligned}$$
which proves (5.18). Here (a) follows from (10.13) and (10.22), (b) from the definitions (10.5) and (10.3), (c) from (10.8), and (d) from the limit property (10.10).

(iii) Detect-One Criterion (Miss All Colluders). Under the detect-one error event, either the estimated coalition $\hat{K}$ is empty, or it is a set $I$ of innocent users (disjoint from $K$).
Hence $P^{\text{one}}_e \le \Pr[\hat{K}=\emptyset] + \Pr[\hat{K}=I]$. The first probability, conditioned on $(s,w)$, is bounded as⁹
$$\begin{aligned}
\Pr[\hat{K}=\emptyset] &= \Pr[\forall K':\ MPMI(K') \le 0] \le \Pr[MPMI(K) \le 0] = \Pr\bigl[\mathring{I}(x_K; y|sw) \le K(R+\Delta)\bigr] \qquad (10.23) \\
&\doteq \exp_2\bigl\{-N\,\breve{E}_{psp,K,N}(R+\Delta, L, p^*_w, p_{s|w}, p^*_{x|sw}, W_K)\bigr\}.
\end{aligned}$$
⁹ Using the bound $\min_{K'\subseteq K}\Pr[MPMI(K')\le 0]$ would not strengthen the inequality in (10.23).

To bound the second probability, we use property (5.9) with $\hat{K}=I$ and $A=K$. We obtain
$$\mathring{I}(x_K; yx_I|sw) \le K(R+\Delta).$$
Since
$$\mathring{I}(x_K; yx_I|sw) = \mathring{I}(x_K; y|sw) + I(x_K; x_I|ysw) \ge \mathring{I}(x_K; y|sw),$$
combining the two inequalities above yields $\mathring{I}(x_K; y|sw) \le K(R+\Delta)$. The probability of this event is again given by (10.23); we conclude that
$$p_{\text{miss-all}}(p^*_w, p_{s|w}, p^*_{x|sw}, W_K) \doteq \exp_2\bigl\{-N\,\breve{E}_{psp,K,N}(R+\Delta, L, p^*_w, p_{s|w}, p^*_{x|sw}, W_K)\bigr\}.$$
Averaging over $S$ and proceeding as in Part (ii) above, we obtain
$$p_{\text{miss-all}}(W_K) \le \sum_{p_{s|w}} \Pr[T_{s|w}]\; p_{\text{miss-all}}(p^*_w, p_{s|w}, p^*_{x|sw}, W_K) \doteq \exp_2\bigl\{-N\,E_{psp}(R+\Delta, L, D_1, K, W_K)\bigr\},$$
which establishes (5.19).

(iv) Fair Collusion Channels. The proof parallels that of Theorem 4.1, Part (iv). Define
$$P^*(W_K) \triangleq \mathcal{P}_{YX_K|SW}(p_W, \tilde{p}_{S|W}, p_{X|SW}, W_K, R, L, K), \qquad (10.24)$$
which is convex and permutation-invariant. Then write (5.12) as
$$\tilde{E}_{psp}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, W_K) = \min_{\tilde{p}_{YX_K|SW} \in P^*(W_K)} D(\tilde{p}_{YX_K|SW}\,\|\,\tilde{p}_{Y|X_K}\,p^K_{X|SW}\,|\,\tilde{p}_{S|W}\,p_W). \qquad (10.25)$$
For any $\tilde{p}_{YX_K|SW} \in P^*(W_K)$ and permutation $\pi$ of $K$, define the permuted conditional p.m.f. $\tilde{p}^\pi_{YX_K|SW}$ and the permutation-averaged p.m.f. $\tilde{p}^{\text{fair}}_{YX_K|SW} = \frac{1}{K!}\sum_\pi \tilde{p}^\pi_{YX_K|SW}$, which also belongs to the convex set $P^*(W_K)$. We similarly define $\tilde{p}^\pi_{Y|X_K}$ and $\tilde{p}^{\text{fair}}_{Y|X_K}$.
The conditional divergence $D(\tilde{p}^\pi_{YX_K|SW}\,\|\,\tilde{p}^\pi_{Y|X_K}\,p^K_{X|SW}\,|\,\tilde{p}_{S|W}\,p_W)$ is independent of $\pi$. By convexity, we obtain
$$D(\tilde{p}^{\text{fair}}_{YX_K|SW}\,\|\,\tilde{p}^{\text{fair}}_{Y|X_K}\,p^K_{X|SW}\,|\,\tilde{p}_{S|W}\,p_W) \le D(\tilde{p}_{YX_K|SW}\,\|\,\tilde{p}_{Y|X_K}\,p^K_{X|SW}\,|\,\tilde{p}_{S|W}\,p_W). \qquad (10.26)$$
Therefore the minimum in (10.25) is achieved by a permutation-invariant $\tilde{p}_{YX_K|SW} = \tilde{p}^{\text{fair}}_{YX_K|SW}$, and the same minimum would have been obtained if $W_K$ had been replaced with $W^{\text{fair}}_K$. Hence
$$\tilde{E}_{psp}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, W_K) = \tilde{E}_{psp}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, W^{\text{fair}}_K).$$
Substituting into (5.15) and (5.19), we obtain
$$E^{\text{one}}(R, L, D_1, W_K, \Delta) = E^{\text{one}}(R, L, D_1, W^{\text{fair}}_K, \Delta).$$

(v) Detect-All Error Exponent for Fair Collusion Channels. Using (5.10) and (5.11), observe that $\tilde{E}_{psp}$ in (5.13) may be written as
$$\tilde{E}_{psp}(R, L, p_W, p_{S|W}, p_{X|SW}, W_K) = \min_{\tilde{p}_{YX_K|SW} \in \bar{P}^*(W_K)} D(\tilde{p}_{YX_K|SW}\,\tilde{p}_{S|W}\,\|\,\tilde{p}_{Y|X_K}\,p^K_{X|SW}\,p_S\,|\,p_W) \qquad (10.27)$$
where
$$\bar{P}^*(W_K) \triangleq \Bigl\{\tilde{p}_{YX_K|SW}:\ \tilde{p}_{X_K|SW} \in M(p_{X|SW}),\ \tilde{p}_{Y|X_K} \in W_K(\tilde{p}_{X_K}),\ \min_{A\subseteq K}\frac{1}{|A|}\mathring{I}(X_A; YX_{K\setminus A}|SW) \le R\Bigr\}.$$
Similarly to the discussion below (10.25), when $W_K = W^{\text{fair}}_K$ the minimum over $\tilde{p}_{YX_K|SW}$ in (10.27) is achieved by a permutation-invariant conditional p.m.f. Next we show that $A = K$ minimizes $\frac{1}{|A|}\mathring{I}(X_A; YX_{K\setminus A}|SW)$ over $A \subseteq K$. Indeed
$$\begin{aligned}
\frac{1}{|A|}\mathring{I}(X_A; YX_{K\setminus A}|SW)
&= \frac{1}{|A|}\Bigl[\sum_{m\in A} H(X_m|SW) + H(YX_{K\setminus A}|SW) - H(YX_K|SW)\Bigr] \\
&= H(X|SW) - \frac{1}{|A|}H(X_A|YX_{K\setminus A}SW) \\
&\overset{(a)}{\ge} H(X|SW) - \frac{1}{|K|}H(X_K|YSW) = \frac{1}{|K|}\mathring{I}(X_K; Y|SW) \qquad (10.28)
\end{aligned}$$
where (a) follows from (3.2) with $Z = (Y, S, W)$. Using (10.28) and (10.24), we obtain $\bar{P}^*(W^{\text{fair}}_K) = P^*(W^{\text{fair}}_K)$.
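Step (a) of (10.28) rests on the entropy inequality (3.2) of Lemma 3.1: for a permutation-invariant joint p.m.f. and $K = 2$, it reduces to $H(X_1\,|\,X_2, Z) \le \frac{1}{2}H(X_1, X_2\,|\,Z)$. A small numeric sanity check under an assumed exchangeable input pair and input-symmetric channel (all distributions below are made up for illustration):

```python
import math

def H(joint, idx, base=2.0):
    """Entropy of the coordinates in idx, from a dict mapping tuples -> prob."""
    marg = {}
    for t, p in joint.items():
        key = tuple(t[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log(p, base) for p in marg.values() if p > 0)

def Hcond(joint, a, b):
    """Conditional entropy H(A|B) = H(A,B) - H(B)."""
    return H(joint, a + b) - H(joint, b)

# Exchangeable p(x1,x2) and a channel p(y|x1,x2) symmetric in (x1,x2),
# so the joint p.m.f. of (X1, X2, Y) is invariant under swapping X1, X2.
p_x = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.3}
p_y1 = {0: 0.1, 1: 0.5, 2: 0.9}          # Pr[Y = 1 | x1 + x2]
joint = {}
for (x1, x2), p in p_x.items():
    q = p_y1[x1 + x2]
    joint[(x1, x2, 1)] = p * q
    joint[(x1, x2, 0)] = p * (1 - q)

# Lemma 3.1 (3.2), specialized to K = 2 with Z = Y:
lhs = Hcond(joint, (0,), (1, 2))          # H(X1 | X2, Y)
rhs = 0.5 * Hcond(joint, (0, 1), (2,))    # (1/2) H(X1, X2 | Y)
print(lhs <= rhs + 1e-12)                 # prints True
```

The check passes precisely because the joint law is permutation-invariant; breaking the symmetry of `p_x` or of the channel voids the guarantee, which is why Part (iv) restricts to fair collusion channels before invoking (10.28).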
Hence, for $W_K = W^{\text{fair}}_K$ the constraint sets in (10.27) and (10.25) coincide, the two minimizations yield the same value, and therefore
$$E^{\text{all}}(R, L, D_1, W^{\text{fair}}_K, \Delta) = E^{\text{one}}(R, L, D_1, W^{\text{fair}}_K, \Delta).$$

(vi) Positive Error Exponents. Consider any $\mathcal{W} = \{1, \cdots, L\}$ and $p_W$ that is positive over its support set (if it is not, reduce the value of $L$ accordingly). For any $A \subseteq K$, the divergence to be minimized in the expression (5.11) for $\tilde{E}_{psp,A}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, W_K)$ is zero if and only if $\tilde{p}_{YX_K|SW} = \tilde{p}_{Y|X_K}\,p^K_{X|SW}$ and $\tilde{p}_{S|W} = p_S$. These p.m.f.'s are feasible for (5.10) if and only if the resulting $I(X_A; YX_{K\setminus A}|SW) \le |A|R$. They are infeasible, and thus positive error exponents are guaranteed, if
$$R < \min_{A\subseteq K} \frac{1}{|A|} I(X_A; YX_{K\setminus A}|SW).$$
From Part (iv) above, we may restrict our attention to $W_K = W^{\text{fair}}_K$ under the detect-one criterion. Since the p.m.f. of $(S, W, X_K, Y)$ is permutation-invariant, by application of (3.3) we have
$$\min_{A\subseteq K}\frac{1}{|A|}I(X_A; YX_{K\setminus A}|SW) = \frac{1}{K}I(X_K; Y|SW). \qquad (10.29)$$
Hence the supremum of all $R$ for which error exponents are positive is given by $\tilde{C}^{\text{one}}(D_1, W_K)$ in (3.8) and is obtained by letting $\Delta \to 0$ and $L \to \infty$.

For any $W_K$, under the detect-all criterion, the supremum of all $R$ for which error exponents are positive is given by $\tilde{C}^{\text{all}}(D_1, W_K)$ in (3.9) and is obtained by letting $\Delta \to 0$ and $L \to \infty$. Since the optimal p.m.f. is not necessarily permutation-invariant, (10.29) does not hold in general. However, if $W_K = W^{\text{fair}}_K$, the same capacity is obtained for the detect-one and detect-all problems.
✷

11 Conclusion

We have derived exact fingerprinting capacity formulas, as opposed to the bounds derived in recent papers [4, 5, 10], and constructed a universal fingerprinting scheme. A distinguishing feature of this new scheme is the use of an auxiliary "time-sharing" randomized sequence $W$. The analysis shows that optimal coalitions are fair and that capacity and random-coding exponents are the same whether the problem is formulated as catching one colluder or all of them.

Our study also allows us to reexamine previous fingerprinting system designs from a new angle. First, randomization of the encoder via $W$ is generally needed because the payoff function in the mutual-information game is nonconcave with respect to $p_{X|S}$. Thus capacity is obtained as the value of a mutual-information game with $p_{XW|S}$ as the maximizing variable. This has motivated the construction of our randomized fingerprinting scheme, which may also be thought of as a generalization of Tardos' design [9]. Two other randomization methods are also fundamental: randomized permutation of user indices, to ensure that maximum error probability (over all possible coalitions) equals average error probability; and randomized permutation of the letters $\{1, 2, \cdots, N\}$, to cope with collusion channels with arbitrary memory.

Second, single-user decoders are simple but suboptimal. Such decoders assign a score to each user based on his individual fingerprint and the received data, and declare guilty those users whose score exceeds some threshold. While this is a reasonable approach, performance can be improved by making joint decisions about the coalition. Similarly, the fingerprinting schemes proposed in [9] and in much of the signal-processing literature might be improved by adopting a joint-decision principle, at the expense of increased decoding complexity.
Finally, several information-theoretic approaches to fingerprinting have been studied in the two years since this paper was submitted for publication, including work on spherical fingerprinting by the author [25] and his coworkers Wang [26] and Jourdas [27], on blind fingerprinting [6, 28], on binary fingerprinting under the Boneh–Shaw model by Amiri and Tardos [29], Huang and Moulin [30, 31], and Furon and Pérez-Freire [32], as well as research on two-level fingerprinting codes by Anthapadmanabhan and Barg [33]. Particularly noteworthy is [29], which presents a random coding scheme closely related to ours, with a joint decoder (improving on Tardos' earlier work [9]) that maximizes a penalized empirical mutual information criterion, similarly to Plotnik and Satt's universal decoder for the random MAC [11]. Amiri and Tardos use ordinary empirical mutual information instead of our empirical mutual information $\mathring{I}$ of $K$ variables. While both choices are capacity-achieving, ours is geared towards obtaining better error exponents, as is the case for the classical MAC decoding problem [23]. The paper [29] also outlines the proof of a converse theorem for the so-called weak fingerprinting model, in which a helper discloses all colluders except one to the decoder.

Acknowledgments. The author is very grateful to Dr. Ying Wang for reading several drafts of this paper and making comments and suggestions that have improved it. He also thanks Yen-Wei Huang, Dr. Prasanth Anthapadmanabhan, Profs. Barg and Tardos, and the anonymous reviewers for helpful comments and corrections; and Prof. Raymond Yeung and an anonymous reviewer of [28] for bringing references [22] and [11], respectively, to our attention.

A Proof of Lemma 3.1

Due to the permutation-invariant assumption on the joint p.m.f.
of $(X_K, Z)$, it suffices to establish (3.1) for $A = \{1, \cdots, k-1\}$ and $B = \{1, \cdots, k\}$, where $2 \le k \le K$. The claim then follows by induction over $k$. Let $Z_k = (Z, X^N_{k+1})$, hence $Z_{k-1} = (Z_k, X_k)$. Then (3.1) takes the form
$$\frac{1}{k-1}H(X^{k-1}_1\,|\,Z, X^N_k) \le \frac{1}{k}H(X^k_1\,|\,Z, X^N_{k+1})$$
or equivalently
$$(k-1)\,H(X^k_1\,|\,Z_k) \ge k\,H(X^{k-1}_1\,|\,Z_k X_k), \qquad 2 \le k \le K. \qquad (A.1)$$
And indeed the difference between the left and right sides of (A.1) satisfies
$$\begin{aligned}
(k-1)H(X^k_1|Z_k) - k\,H(X^{k-1}_1|Z_k X_k)
&= (k-1)\bigl[H(X_k|Z_k) + H(X^{k-1}_1|Z_k X_k)\bigr] - k\,H(X^{k-1}_1|Z_k X_k) \\
&= (k-1)H(X_k|Z_k) - H(X^{k-1}_1|Z_k X_k) \\
&\overset{(a)}{=} \sum_{i=1}^{k-1} H(X_i|Z_k) - H(X^{k-1}_1|Z_k X_k) \\
&\overset{(b)}{\ge} H(X^{k-1}_1|Z_k) - H(X^{k-1}_1|Z_k X_k) = I(X^{k-1}_1; X_k|Z_k) \overset{(c)}{\ge} 0
\end{aligned}$$
where (a) holds because the conditional p.m.f.'s $p_{X_i|Z_k}$, $1 \le i \le k$, are identical due to the permutation-invariance assumption. Inequalities (b) and (c) hold with equality when $X_i$, $1 \le i \le k$, are conditionally independent given $Z_k$.

Similarly, to establish (3.2), it suffices to prove that
$$(k-1)\,H(X^k_1\,|\,Z) \le k\,H(X^{k-1}_1\,|\,Z). \qquad (A.2)$$
We have
$$\begin{aligned}
(k-1)H(X^k_1|Z) - k\,H(X^{k-1}_1|Z)
&= (k-1)\bigl[H(X^{k-1}_1|Z) + H(X_k|Z, X^{k-1}_1)\bigr] - k\,H(X^{k-1}_1|Z) \\
&= (k-1)H(X_k|Z, X^{k-1}_1) - H(X^{k-1}_1|Z) \\
&\overset{(a)}{=} \sum_{i=1}^{k-1} H(X_i|Z, X^{i-1}_1, X^k_{i+1}) - H(X^{k-1}_1|Z) \\
&\overset{(b)}{=} \sum_{i=1}^{k-1} H(X_i|Z, X^{i-1}_1, X^k_{i+1}) - \sum_{i=1}^{k-1} H(X_i|Z, X^{i-1}_1) \\
&= -\sum_{i=1}^{k-1} I(X_i; X^k_{i+1}\,|\,Z, X^{i-1}_1) \le 0
\end{aligned}$$
where in (a) we have used the permutation invariance of the distribution of $X^k_1$, and in (b) the chain rule for entropy. ✷

B Proof of Lemma 3.3

The derivation below is given in terms of the detect-one criterion but applies straightforwardly to the detect-all criterion as well.
Denote by $C^{\text{one}}_{\text{memoryless}}(D_1, W_K)$ the compound capacity under the detect-one criterion. To prove the claim
$$C^{\text{one}}(D_1, W_K) \le C^{\text{one}}_{\text{memoryless}}(D_1, W_K), \qquad (B.1)$$
it suffices to identify a family of collusion channels satisfying the almost-sure fidelity constraint (2.10) and for which reliable decoding is impossible at rates above $C^{\text{one}}_{\text{memoryless}}(D_1, W_K)$. For any $\mathbf{x}_K$, consider the class
$$W^\epsilon_K(p_{\mathbf{x}_K}) \triangleq \Bigl\{\tilde{p}_{Y|X_K} \in \mathcal{P}_{Y|X_K}:\ \min_{p_{Y|X_K}\in W_K(p_{\mathbf{x}_K})}\ \max_{x_K, y}\,|\tilde{p}_{Y|X_K}(y|x_K) - p_{Y|X_K}(y|x_K)| \le \epsilon\Bigr\}, \quad \epsilon \ge 0, \qquad (B.2)$$
which is slightly larger than $W_K(p_{\mathbf{x}_K})$ but shrinks towards $W_K(p_{\mathbf{x}_K})$ as $\epsilon \downarrow 0$. Continuity of mutual information and of the mapping $W_K(\cdot)$ with respect to variational distance (per (2.11)) implies that
$$C^{\text{one}}(D_1, W^\epsilon_K) \uparrow C^{\text{one}}(D_1, W_K) \quad\text{as } \epsilon \downarrow 0. \qquad (B.3)$$
We now claim that if the coalition selects a memoryless channel $p_{Y|X_K} \in W_K(p_{\mathbf{x}_K})$, the constraint $p_{\mathbf{y}|\mathbf{x}_K} \in W^\epsilon_K(p_{\mathbf{x}_K})$ is satisfied with probability approaching 1 as $N \to \infty$:
$$\forall\,\epsilon > 0\ \exists\,N_0(\epsilon):\quad \Pr[p_{\mathbf{y}|\mathbf{x}_K} \in W^\epsilon_K(p_{\mathbf{x}_K})] \ge 1 - \epsilon \quad \forall N > N_0(\epsilon). \qquad (B.4)$$
To show this, define the set
$$E = \Bigl\{\mathbf{x}_K:\ \min_{x_K \in \mathcal{X}^K} p_{\mathbf{x}_K}(x_K) \ge \epsilon\,|\mathcal{X}|^{-K}\Bigr\}.$$
Without loss of generality¹⁰, assume $f_N$ is such that
$$\Pr[\mathbf{x}_K \in E] \ge 1 - \epsilon/2 \qquad (B.5)$$
where the probability is taken with respect to $M_K, S, V$. For any $\mathbf{x}_K \in E$, $x_K \in \mathcal{X}^K$, $y \in \mathcal{Y}$, if $\mathbf{y}$ is generated conditionally i.i.d. $p_{Y|X_K}$, the random variable $p_{\mathbf{y}|\mathbf{x}_K}(y|x_K)$ converges in probability to $p_{Y|X_K}(y|x_K)$ as $N \to \infty$. Hence
$$\Pr_{Y|\mathbf{X}_K=\mathbf{x}_K}\Bigl[\max_{x_K, y}\,|p_{\mathbf{y}|\mathbf{x}_K}(y|x_K) - p_{Y|X_K}(y|x_K)| \le \epsilon\Bigr] \ge 1 - \epsilon/2, \quad \forall\,\mathbf{x}_K \in E \qquad (B.6)$$
for any $N > N_0(\epsilon)$. Combining (B.5) and (B.6), we obtain (B.4).

A lower bound on error probability is obtained when a helper provides some information to the decoder.
Assume the constraint on the coalition is slightly relaxed so that they are allowed to produce pirated copies that violate the constraint $p_{\mathbf{y}|\mathbf{x}_K} \in W^\epsilon_K(p_{\mathbf{x}_K})$ with probability at most $\epsilon$, as in (B.4). In this event, the helper reveals the entire coalition to the decoder. This contributes at most $\epsilon K N R$ bits of information to the decoder and does not increase the decoder's error probability. Hence
$$C^{\text{one}}(D_1, W^\epsilon_K) + \epsilon K \le C^{\text{one}}_{\text{memoryless}}(D_1, W_K).$$
Combining this inequality with (B.3) establishes (B.1). ✷

¹⁰ One may always "fill in" each codeword $\mathbf{x}_m$ with $2\epsilon|\mathcal{X}|^{-K}N$ dummy symbols drawn from the uniform p.m.f. on $\mathcal{X}$ to ensure that (B.5) holds. The rate loss due to the "fill-in" symbols vanishes as $\epsilon \to 0$.

C Proof of (8.39)

The quantities $\hat{\zeta}(\mathbf{Y})$ and $\zeta$ are defined in (8.37) and (8.38), respectively. We first analyze
$$\begin{aligned}
\Pr\Bigl[\mathbf{1}\{\hat{D}_{ijk}(\mathbf{Y}) \le \tfrac{3\delta^2}{4}\} > \mathbf{1}\{D_{ijk} \le \delta^2\}\ \Big|\ S=\mathbf{s}, V=v, K=\{i,j\}\Bigr]
&= \Pr\Bigl[\hat{D}_{ijk}(\mathbf{Y}) \le \tfrac{3\delta^2}{4} \text{ and } D_{ijk} > \delta^2\ \Big|\ S=\mathbf{s}, V=v, K=\{i,j\}\Bigr] \\
&\le \Pr\Bigl[\hat{D}_{ijk}(\mathbf{Y}) < D_{ijk} - \tfrac{\delta^2}{4}\ \Big|\ S=\mathbf{s}, V=v, K=\{i,j\}\Bigr] \qquad (C.1)
\end{aligned}$$
for any $k \in M^{A\text{-good}}_N(\mathbf{s}, v, i, \delta)$. The shorthand $\Pr$ denotes the probability distribution on $\hat{D}_{ijk}(\mathbf{Y})$ induced by the conditional distribution $p^N_{Y|X_1X_2}(\mathbf{Y}\,|\,\mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v))$. Conditioned on $S=\mathbf{s}$, $V=v$, $K=\{i,j\}$, the normalized loglikelihood $\hat{D}_{ijk}(\mathbf{Y})$ of (8.34) is the average of $N$ independent random variables. We show that $\hat{D}_{ijk}(\mathbf{Y})$ converges in probability (and exponentially with $N$) to its expectation $D_{ijk}$ of (8.7). We may write $\hat{D}_{ijk}(\mathbf{Y})$ as a function of the joint type $p_{\mathbf{Y}\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}$ of the quadruple $(\mathbf{Y}, \mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v), \mathbf{x}_k(\mathbf{s},v))$:
$$\hat{D}_{ijk}(\mathbf{Y}) = D(p_{\mathbf{Y}\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}) \triangleq \sum_{y, x_1, x_2, x'_2} p_{\mathbf{Y}\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(y, x_1, x_2, x'_2)\,\log\frac{p_{Y|X_1X_2}(y|x_1,x_2)}{p_{Y|X_1X_2}(y|x_1,x'_2)}.$$
(C.2)

Similarly, from (8.7) we obtain
$$D_{ijk} = D(p_{\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}) \triangleq \sum_{y, x_1, x_2, x'_2} p_{\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(x_1, x_2, x'_2)\,p_{Y|X_1X_2}(y|x_1,x_2)\,\log\frac{p_{Y|X_1X_2}(y|x_1,x_2)}{p_{Y|X_1X_2}(y|x_1,x'_2)}. \qquad (C.3)$$
Subtracting (C.3) from (C.2) yields
$$\begin{aligned}
\hat{D}_{ijk}(\mathbf{Y}) - D_{ijk}
&= \sum_{y, x_1, x_2, x'_2} p_{\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(x_1, x_2, x'_2)\,\bigl[p_{\mathbf{Y}|\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(y|x_1,x_2,x'_2) - p_{Y|X_1X_2}(y|x_1,x_2)\bigr]\times\log\frac{p_{Y|X_1X_2}(y|x_1,x_2)}{p_{Y|X_1X_2}(y|x_1,x'_2)} \\
&= \sum_{y, x_1, x_2, x'_2} U(y, x_1, x_2, x'_2)\,p_{Y|X_1X_2}(y|x_1,x_2)\,\log\frac{p_{Y|X_1X_2}(y|x_1,x_2)}{p_{Y|X_1X_2}(y|x_1,x'_2)}
\end{aligned}$$
which is a linear combination of the random variables
$$U(y, x_1, x_2, x'_2) \triangleq p_{\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(x_1, x_2, x'_2)\,\Bigl[\frac{p_{\mathbf{Y}|\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(y|x_1,x_2,x'_2)}{p_{Y|X_1X_2}(y|x_1,x_2)} - 1\Bigr].$$
Note that for each $x_1, x_2, x'_2$, the minimum of $U(y, x_1, x_2, x'_2)$ over $y \in \mathcal{Y}$ is nonpositive. Owing to (8.6), we also have
$$\sum_y p_{Y|X_1X_2}(y|x_1,x_2)\,\log\frac{p_{Y|X_1X_2}(y|x_1,x_2)}{p_{Y|X_1X_2}(y|x_1,x'_2)} = D(p_{Y|X_1=x_1,X_2=x_2}\,\|\,p_{Y|X_1=x_1,X_2=x'_2}) \in [\delta, \log\delta^{-1}].$$
Hence
$$\hat{D}_{ijk}(\mathbf{Y}) - D_{ijk} \ge (\log\delta^{-1})\min_{y, x_1, x_2, x'_2} U(y, x_1, x_2, x'_2). \qquad (C.4)$$
In the sequel we omit the conditioning on $\mathbf{s}, v, i, j$ for conciseness of notation. We bound (C.1) by
$$\Pr\Bigl[\hat{D}_{ijk}(\mathbf{Y}) < D_{ijk} - \frac{\delta^2}{4}\Bigr]
\overset{(a)}{\le} \Pr\Bigl[\min_{y,x_1,x_2,x'_2} U(y,x_1,x_2,x'_2) < -\frac{\delta^2}{4\log\delta^{-1}}\Bigr]
\overset{(b)}{\le} |\mathcal{X}|^3|\mathcal{Y}|\max_{y,x_1,x_2,x'_2}\Pr[U(y,x_1,x_2,x'_2) < -\epsilon] \qquad (C.5)$$
where (a) follows from (C.4); and in (b) we have used the union bound and the shorthand $\epsilon = \frac{\delta^2}{4\log\delta^{-1}}$. Denote by
$$D_b(\alpha\,\|\,p) = \alpha\ln\frac{\alpha}{p} + (1-\alpha)\ln\frac{1-\alpha}{1-p}, \qquad 0 < \alpha < 1,$$
the large-deviations function for the Bernoulli random variable with probability $p$.
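Step (a) of the upcoming bound (C.8) is the standard Chernoff bound for the lower binomial tail, $\Pr[\mathrm{Bi}(n,p) < \alpha n] \le e^{-n D_b(\alpha\|p)}$ for $\alpha < p$ (with $D_b$ in nats, matching the $\ln$ in the definition above). A quick numeric check against the exact tail, with illustrative parameters:

```python
import math
from math import comb

def D_b(alpha, p):
    """Bernoulli large-deviations (binary KL) function D_b(alpha || p), in nats."""
    return alpha * math.log(alpha / p) + (1 - alpha) * math.log((1 - alpha) / (1 - p))

def binom_tail_below(n, p, k):
    """Exact lower tail Pr[Bi(n, p) < k]."""
    return sum(comb(n, t) * p**t * (1 - p)**(n - t) for t in range(k))

# Chernoff bound: Pr[Bi(n,p) < alpha*n] <= exp(-n * D_b(alpha || p)) for alpha < p.
n, p, alpha = 200, 0.4, 0.3          # made-up parameters for illustration
exact = binom_tail_below(n, p, math.ceil(n * alpha))
bound = math.exp(-n * D_b(alpha, p))
print(exact <= bound)                # prints True
```

The bound is crude for small $n$ but exponentially tight, which is all the proof needs: (C.8) only uses that the exponent $f(\delta)$ stays above $\delta^7$.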
Note that
$$D_b((1-\epsilon)p\,\|\,p) \sim \frac{p^2\epsilon^2}{2(1-p)} \quad\text{as } \epsilon \downarrow 0$$
and that $\frac{p^2}{1-p} > \frac{\delta^2}{1-\delta}$ for all $\delta < p < 1-\delta$. Define $f(\delta) = \min_{\delta\le p\le 1-\delta} D_b((1-\epsilon)p\,\|\,p)$. We have
$$f(\delta) \sim D_b((1-\epsilon)\delta\,\|\,\delta) \sim \frac{\epsilon^2\delta^2}{2(1-\delta)} = \frac{\delta^6}{32(1-\delta)\log^2\delta^{-1}} \gg \delta^7 \quad\text{as } \delta \downarrow 0, \qquad (C.6)$$
hence there exists $\delta^* > 0$ such that $f(\delta) > \delta^7$ for all $0 < \delta < \delta^*$.

Define the shorthand $\beta = p_{\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(x_1, x_2, x'_2) \in [0,1]$. For each $i, j, k$ and each $y, x_1, x_2, x'_2$, the count
$$\beta N\,p_{\mathbf{Y}|\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(y|x_1,x_2,x'_2) = N\,p_{\mathbf{Y}\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(y,x_1,x_2,x'_2) = \sum_{t=1}^N \mathbf{1}\{Y_t = y,\ x_{it} = x_1,\ x_{jt} = x_2,\ x_{kt} = x'_2\}$$
is a binomial random variable with $\beta N$ trials and probability $p \triangleq p_{Y|X_1X_2}(y|x_1,x_2) \in [\delta, 1-\delta]$; by (8.6), we indeed have $\delta \le p \le 1-\delta$. Next,
$$\Pr[U(y,x_1,x_2,x'_2) < -\epsilon] = \Pr\Bigl[\beta\Bigl(\frac{p_{\mathbf{Y}|\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(y|x_1,x_2,x'_2)}{p_{Y|X_1X_2}(y|x_1,x_2)} - 1\Bigr) < -\epsilon\Bigr] = \Pr\Bigl[\frac{\mathrm{Bi}(\beta N, p)}{Np} - \beta < -\epsilon\Bigr].$$
For $\beta \le \epsilon$, this probability is zero. For $\epsilon < \beta \le 1$, we have
$$\begin{aligned}
\Pr[U(y,x_1,x_2,x'_2) < -\epsilon]
&= \Pr\bigl[\mathrm{Bi}(\beta N, p) < \beta N(1 - \epsilon/\beta)p\bigr] \qquad (C.7) \\
&\overset{(a)}{\le} 2^{-N\beta D_b((1-\epsilon/\beta)p\|p)}
\overset{(b)}{\le} 2^{-N D_b((1-\epsilon)p\|p)}
\overset{(c)}{\le} 2^{-N f(\delta)}
\overset{(d)}{<} 2^{-N\delta^7} \quad \forall\,\delta < \delta^* \qquad (C.8)
\end{aligned}$$
where (a) holds by definition of the large-deviations function $D_b$; (b) holds by convexity of the function $D_b(\cdot\,\|\,p)$: for all $\epsilon' = \epsilon/\beta \in [\epsilon, 1)$, we have $D_b((1-\epsilon')p\,\|\,p) \ge (\epsilon'/\epsilon)\,D_b((1-\epsilon)p\,\|\,p)$, with equality if $\epsilon' = \epsilon$, i.e., $\beta = 1$; (c) holds by (C.6); and (d) holds by the lower bound on $f(\delta)$.

Combining (C.5) and (C.8), we conclude that (C.1) is upper-bounded by an exponentially vanishing function of $N$ for each $\delta < \delta^*$:
$$\forall\,i,j,k:\quad \Pr\Bigl[\mathbf{1}\{\hat{D}_{ijk}(\mathbf{Y}) \le \tfrac{3\delta^2}{4}\} > \mathbf{1}\{D_{ijk} \le \delta^2\}\ \Big|\ S=\mathbf{s}, V=v, K=\{i,j\}\Bigr] \le 2^{-N\delta^7}.$$
(C.9)

This does not immediately imply that $\hat{\zeta}(\mathbf{Y}) \le \zeta$ with probability approaching 1, because the definition of $\hat{\zeta}(\mathbf{Y})$ in (8.37) involves potentially exponentially many terms $\hat{D}_{ijk}(\mathbf{Y})$. However,
$$\begin{aligned}
\Pr[\hat{\zeta}(\mathbf{Y}) > \zeta\,|\,S=\mathbf{s}, V=v, K=\{i,j\}]
&= \Pr\Bigl[\sum_k \mathbf{1}\{\hat{D}_{ijk}(\mathbf{Y}) \le \tfrac{3\delta^2}{4}\} > \sum_k \mathbf{1}\{D_{ijk} \le \delta^2\}\ \Big|\ S=\mathbf{s}, V=v, K=\{i,j\}\Bigr] \\
&\le \Pr\Bigl[\exists\,p_{\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}:\ \mathbf{1}\{\hat{D}_{ijk}(\mathbf{Y}) \le \tfrac{3\delta^2}{4}\} > \mathbf{1}\{D_{ijk} \le \delta^2\}\ \Big|\ S=\mathbf{s}, V=v, K=\{i,j\}\Bigr] \\
&\overset{(a)}{\le} (N+1)^{|\mathcal{X}|^3}\max_{p_{\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}}\Pr\Bigl[\mathbf{1}\{\hat{D}_{ijk}(\mathbf{Y}) \le \tfrac{3\delta^2}{4}\} > \mathbf{1}\{D_{ijk} \le \delta^2\}\ \Big|\ S=\mathbf{s}, V=v, K=\{i,j\}\Bigr] \\
&\overset{(b)}{\le} |\mathcal{X}|^3|\mathcal{Y}|\,(N+1)^{|\mathcal{X}|^3}\,2^{-N\delta^7} \to 0 \quad\text{as } N\to\infty
\end{aligned}$$
where (a) follows from the union bound and the fact that the number of joint types $p_{\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}$ is at most $(N+1)^{|\mathcal{X}|^3}$, and (b) from (C.1) and (C.9). This establishes (8.39). ✷

D Proof of (8.56)

Lemma D.1 There exists a partition $\{\widetilde{M}_i\}_{i\in\mathcal{I}}$ of $M^{\text{bad}}_N(\mathbf{s}, v, \delta)$ with the following properties:

(P1) $\forall\,i \in \mathcal{I}$, $\forall\,j \in \widetilde{M}_i$: $d_H(\mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v)) \le 2N\delta$;

(P2) $\forall\,i \in \mathcal{I}$: $|\widetilde{M}_i| \ge 2^{3N\sqrt{\delta}}$.

Proof. By assumption, $|M^{\text{bad}}_N(\mathbf{s},v,\delta)| \ge 2^{NR}(1 - 2^{-N\delta^2/3})$. The index set $\mathcal{I}$ and the sets $\{\widetilde{M}_i\}_{i\in\mathcal{I}}$ are constructed iteratively as follows. Denote by $i$ the smallest index in $M^{\text{bad}}_N(\mathbf{s},v,\delta)$ and initialize $\mathcal{I} = \{i\}$ and $\widetilde{M}_i = M_i(\mathbf{s},v,\delta)$. By the definition (8.3), $\widetilde{M}_i$ satisfies $d_H(\mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v)) \le N\delta$ for all $j \in \widetilde{M}_i$, hence Property (P1) holds. Also, owing to (8.5), Property (P2) holds as well. Next, find the smallest $i \in M^{\text{bad}}_N(\mathbf{s},v,\delta)$ such that $d_H(\mathbf{x}_j(\mathbf{s},v), \mathbf{x}_i(\mathbf{s},v)) > 2N\delta$ for all $j \in \mathcal{I}$, and update $\mathcal{I} \leftarrow \mathcal{I}\cup\{i\}$. By the triangle inequality, the sets $\{\widetilde{M}_i\}_{i\in\mathcal{I}}$ are disjoint. Repeat this operation until no such $i$ can be found.
At this point, the set $\mathcal{I}$ is fixed, and each remaining codeword index $j \notin \bigcup_{i\in\mathcal{I}}\widetilde{M}_i$ satisfies $d_H(\mathbf{x}_j(\mathbf{s},v), \mathbf{x}_i(\mathbf{s},v)) \le 2N\delta$ for some $i \in \mathcal{I}$. Assign the index $j$ of this codeword to $\widetilde{M}_i$; ties can be broken arbitrarily. Properties (P1) and (P2) of the set $\widetilde{M}_i$ are preserved. Repeat this operation until all the codeword indices in $M^{\text{bad}}_N(\mathbf{s},v,\delta)$ are exhausted. Upon completion of this process, the sets $\{\widetilde{M}_i\}_{i\in\mathcal{I}}$ form a partition of $M^{\text{bad}}_N(\mathbf{s},v,\delta)$ and satisfy (P1) and (P2). ✷

Assume that $K = \{i,j\} \in (M^{\text{bad}}_N(\mathbf{s},v,\delta))^2$. Consider the partition of Lemma D.1 and a genie (helper) that reveals the two "clusters" of indices $\widetilde{M}_{i^*}$ and $\widetilde{M}_{j^*}$ ($i^*, j^* \in \mathcal{I}$) to which $i$ and $j$ respectively belong. Thanks to the genie, we can enlarge the decoding regions, obtaining regions $\{D'_m(\mathbf{s},v),\ m \in \widetilde{M}_{i^*}\cup\widetilde{M}_{j^*}\}$ that contain the original decoding regions ($D_m(\mathbf{s},v) \subseteq D'_m(\mathbf{s},v)$) and form a partition of $\mathcal{Y}^N$. The conditional probabilities that $\mathbf{Y}$ is typical and that correct decoding occurs (given $\mathbf{s}, v, i \in \widetilde{M}_{i^*}, j \in \widetilde{M}_{j^*}$) for the original decoder and for the genie-aided decoder are respectively given by $P_c(i,j|\mathbf{s},v)$ in (8.46) and by
$$P'_c(i,j|\mathbf{s},v,i^*,j^*) \triangleq \sum_{\mathbf{y}\in T_\delta(\mathbf{s},v,i,j)\,\cap\,(D'_i(\mathbf{s},v)\cup D'_j(\mathbf{s},v))} p^N_{Y|X_1X_2}(\mathbf{y}\,|\,\mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v)).$$
Since $D_m(\mathbf{s},v) \subseteq D'_m(\mathbf{s},v)$ for all $m$, we have
$$P_c(i,j|\mathbf{s},v) \le P'_c(i,j|\mathbf{s},v,i^*,j^*), \qquad \forall\,i\in\widetilde{M}_{i^*},\ j\in\widetilde{M}_{j^*}. \qquad (D.1)$$
The average of the right side over $i\in\widetilde{M}_{i^*}$ and $j\in\widetilde{M}_{j^*}$ is denoted by
$$P'_c(\mathbf{s},v,i^*,j^*) \triangleq \frac{1}{|\widetilde{M}_{i^*}|\,|\widetilde{M}_{j^*}|}\sum_{i\in\widetilde{M}_{i^*}}\sum_{j\in\widetilde{M}_{j^*}} P'_c(i,j|\mathbf{s},v,i^*,j^*). \qquad (D.2)$$
Let $(i^{**}, j^{**})$ achieve the maximum of $P'_c(\mathbf{s},v,i^*,j^*)$ over $(i^*,j^*)$, and denote by $C_1 = \widetilde{M}_{i^{**}}$ and $C_2 = \widetilde{M}_{j^{**}}$ the corresponding clusters of indices.
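The iterative construction in the proof of Lemma D.1 is a greedy covering by Hamming balls: centers are kept pairwise farther apart than the threshold, and every remaining index is assigned to a center within the threshold. A simplified single-radius sketch over toy binary codewords (function names and data are illustrative, not from the paper):

```python
def hamming(a, b):
    """Hamming distance between two equal-length tuples."""
    return sum(x != y for x, y in zip(a, b))

def greedy_clusters(codewords, radius):
    """Greedy partition: centers are pairwise > radius apart; every codeword
    is then assigned to the first center within radius (ties broken by order)."""
    centers = []
    for i, c in enumerate(codewords):
        if all(hamming(c, codewords[j]) > radius for j in centers):
            centers.append(i)
    clusters = {i: [] for i in centers}
    for i, c in enumerate(codewords):
        for j in centers:
            if hamming(c, codewords[j]) <= radius:
                clusters[j].append(i)
                break
    return clusters

words = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1), (1, 1, 1, 0), (0, 0, 1, 1)]
print(greedy_clusters(words, 1))  # -> {0: [0, 1], 2: [2, 3], 4: [4]}
```

As in the lemma, every index lands in exactly one cluster, and each cluster's members sit within the stated Hamming radius of its center; the lemma additionally tracks a cluster-size lower bound (P2), which the greedy assignment preserves because it only ever adds members.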
Analogously to Step 3, define the random variables $X_i = x_{iT}(S, V)$, $i \in M_N$, where $T$ is uniformly distributed over $\{1, 2, \cdots, N\}$ and independent of all other random variables. Define $X$ and $X'$ drawn uniformly and independently from the sets $\{X_i,\ i\in C_1\}$ and $\{X_j,\ j\in C_2\}$, respectively. The definitions and derivations in Steps 3 and 4 carry over, with $C_1\times C_2$ in place of $(M^{\text{good}}_N(\mathbf{s},v,\delta))^2$. In particular, (8.13) becomes
$$p_{Y_t|SV}(y|\mathbf{s},v) = \frac{1}{|C_1||C_2|}\sum_{i\in C_1}\sum_{j\in C_2} p_{Y|X_1X_2}(y\,|\,x_{it}(\mathbf{s},v), x_{jt}(\mathbf{s},v)). \qquad (D.3)$$
We again use the reference conditional distribution (8.15), repeated below for convenience:
$$r(\mathbf{y}|\mathbf{s},v) \triangleq \prod_{t=1}^N p_{Y_t|SV}(y_t|\mathbf{s},v). \qquad (D.4)$$
For each $i,k \in C_1$ and $j,l \in C_2$, it follows from the triangle inequality that
$$d_H(\mathbf{x}_i(\mathbf{s},v), \mathbf{x}_k(\mathbf{s},v)) \le 4N\delta \quad\text{and}\quad d_H(\mathbf{x}_j(\mathbf{s},v), \mathbf{x}_l(\mathbf{s},v)) \le 4N\delta.$$
Hence there are at most $8N\delta$ positions $t$ at which $(x_{it}(\mathbf{s},v), x_{jt}(\mathbf{s},v)) \ne (x_{kt}(\mathbf{s},v), x_{lt}(\mathbf{s},v))$. Therefore, owing to (8.6), the Kullback–Leibler divergence between the distributions of $\mathbf{Y}$ conditioned on codeword pairs $(i,j)$ and $(k,l)$, respectively, satisfies
$$D_{ijkl} \triangleq \frac{1}{N}\sum_{t=1}^N D(p_{Y|X_1=x_{it}(\mathbf{s},v),X_2=x_{jt}(\mathbf{s},v)}\,\|\,p_{Y|X_1=x_{kt}(\mathbf{s},v),X_2=x_{lt}(\mathbf{s},v)}) \le 8\delta\log\delta^{-1}. \qquad (D.5)$$
Hence the conditional self-information of (8.17) satisfies
$$\theta_{ij}(\mathbf{s},v) \triangleq \frac{1}{N}\sum_{t=1}^N D(p_{Y|X_1=x_{it}(\mathbf{s},v),X_2=x_{jt}(\mathbf{s},v)}\,\|\,p_{Y_t|S=\mathbf{s},V=v}) \overset{(a)}{\le} \frac{1}{|C_1||C_2|}\sum_{(k,l)\in C_1\times C_2} D_{ijkl} \overset{(b)}{\le} 8\delta\log\delta^{-1}$$
where (a) holds by (D.3) and convexity of the Kullback–Leibler divergence, and (b) follows from (D.5).
The average of the self-information $\theta_{ij}(\mathbf{s},v)$ over all $(i,j) \in C_1\times C_2$ is the conditional mutual information
$$I_{p_T\,p_{X_1|SVT}\,p_{X_2|SVT}\,p_{Y|X_1X_2}}(X_1X_2; Y\,|\,S=\mathbf{s}, V=v, T) = I(\mathbf{s},v) \triangleq \frac{1}{|C_1||C_2|}\sum_{(i,j)\in C_1\times C_2}\theta_{ij}(\mathbf{s},v) \le 8\delta\log\delta^{-1}.$$
Analogously to Step 5, define the typical sets
$$T_\delta(\mathbf{s},v,i,j) \triangleq \Bigl\{\mathbf{y}\in\mathcal{Y}^N:\ \underbrace{\frac{1}{N}\sum_{t=1}^N \log\frac{p_{Y|X_1X_2}(y_t\,|\,x_{it}(\mathbf{s},v), x_{jt}(\mathbf{s},v))}{p_{Y_t|SV}(y_t|\mathbf{s},v)}}_{\hat{\theta}_{ij}(\mathbf{s},v)} < 9\delta\log\delta^{-1}\Bigr\}. \qquad (D.6)$$
The random variable $\hat{\theta}_{ij}(\mathbf{s},v)$ above is the average of $N$ conditionally independent random variables (given $\mathbf{s}, v$) and converges in probability to its mean $\theta_{ij}(\mathbf{s},v) \le 8\delta\log\delta^{-1}$. Similarly to (8.26), we have
$$\Pr[\mathbf{Y}\notin T_\delta(\mathbf{s},v,i,j)\,|\,S=\mathbf{s}, V=v, K=\{i,j\}] \le \frac{1}{N\delta^2}, \qquad \forall\,\mathbf{s},v,i,j, \qquad (D.7)$$
which vanishes as $N\to\infty$. Analogously to (8.47), we define
$$P^{\text{bad}}_c(\mathbf{s},v) \triangleq \Pr[\text{correct decoding and } \mathbf{Y}\in T_\delta(\mathbf{s},v,K)\,|\,S=\mathbf{s}, V=v, K\in(M^{\text{bad}}_N(\mathbf{s},v,\delta))^2].$$
(D.8)

We have
$$\begin{aligned}
P^{\text{bad}}_c(\mathbf{s},v)
&= \frac{1}{|M^{\text{bad}}_N(\mathbf{s},v,\delta)|^2}\sum_{i,j\in M^{\text{bad}}_N(\mathbf{s},v,\delta)} P_c(i,j|\mathbf{s},v) \\
&\overset{(a)}{=} \frac{1}{|M^{\text{bad}}_N(\mathbf{s},v,\delta)|^2}\sum_{i^*,j^*\in\mathcal{I}}\ \sum_{i\in\widetilde{M}_{i^*}}\sum_{j\in\widetilde{M}_{j^*}} P_c(i,j|\mathbf{s},v) \\
&\overset{(b)}{\le} \frac{1}{|M^{\text{bad}}_N(\mathbf{s},v,\delta)|^2}\sum_{i^*,j^*\in\mathcal{I}}\ \sum_{i\in\widetilde{M}_{i^*}}\sum_{j\in\widetilde{M}_{j^*}} P'_c(i,j|\mathbf{s},v,i^*,j^*) \\
&\overset{(c)}{=} \frac{1}{|M^{\text{bad}}_N(\mathbf{s},v,\delta)|^2}\sum_{i^*,j^*\in\mathcal{I}} |\widetilde{M}_{i^*}|\,|\widetilde{M}_{j^*}|\; P'_c(\mathbf{s},v,i^*,j^*)
\overset{(d)}{\le} \max_{i^*,j^*\in\mathcal{I}} P'_c(\mathbf{s},v,i^*,j^*) \\
&= \frac{1}{|C_1||C_2|}\sum_{i\in C_1}\sum_{j\in C_2}\ \sum_{\mathbf{y}\in T_\delta(\mathbf{s},v,i,j)\,\cap\,(D'_i(\mathbf{s},v)\cup D'_j(\mathbf{s},v))} p^N_{Y|X_1X_2}(\mathbf{y}\,|\,\mathbf{x}_i(\mathbf{s},v),\mathbf{x}_j(\mathbf{s},v)) \\
&\overset{(e)}{\le} \frac{2^{N\cdot 9\delta\log\delta^{-1}}}{|C_1||C_2|}\sum_{i\in C_1}\sum_{j\in C_2}\ \sum_{\mathbf{y}\in T_\delta(\mathbf{s},v,i,j)\,\cap\,(D'_i(\mathbf{s},v)\cup D'_j(\mathbf{s},v))} r(\mathbf{y}|\mathbf{s},v) \\
&\overset{(f)}{\le} \frac{2^{N\cdot 9\delta\log\delta^{-1}}}{|C_1||C_2|}\sum_{i\in C_1}\sum_{j\in C_2}\ \sum_{\mathbf{y}\in D'_i(\mathbf{s},v)\cup D'_j(\mathbf{s},v)} r(\mathbf{y}|\mathbf{s},v) \\
&\overset{(g)}{=} \frac{2^{N\cdot 9\delta\log\delta^{-1}}}{|C_1||C_2|}\sum_{i\in C_1}\sum_{j\in C_2}\Bigl[\sum_{\mathbf{y}\in D'_i(\mathbf{s},v)} r(\mathbf{y}|\mathbf{s},v) + \sum_{\mathbf{y}\in D'_j(\mathbf{s},v)} r(\mathbf{y}|\mathbf{s},v)\Bigr]
\le \frac{2^{N\cdot 9\delta\log\delta^{-1}}}{|C_1||C_2|}\bigl(|C_2| + |C_1|\bigr) \\
&\overset{(h)}{\le} 2^{N[9\delta\log\delta^{-1} - 3\sqrt{\delta}]+1} \le 2^{-N\sqrt{\delta}+1}, \qquad \forall\,\delta < \tfrac{1}{4000} \qquad (D.9)
\end{aligned}$$
where (a) and (d) hold because $\{\widetilde{M}_i\}_{i\in\mathcal{I}}$ form a partition of $M^{\text{bad}}_N(\mathbf{s},v,\delta)$, (b) because of (D.1), (c) because of (D.2), (e) follows from (D.4) and (D.6), (f) is obtained by dropping the restriction $\mathbf{y}\in T_\delta(\mathbf{s},v,i,j)$, (g) holds because the decoding regions $D'_i(\mathbf{s},v)$ and $D'_j(\mathbf{s},v)$ are disjoint (so each family of sums of $r$ totals at most 1), and (h) because $|C_1|, |C_2| \ge 2^{3N\sqrt{\delta}}$.

Thus the probability of correct decoding (without the typicality restriction) satisfies
$$\overset{(a)}{\le} \Pr[\mathbf{Y}\notin T_\delta(\mathbf{s},v,i,j)\,|\,S=\mathbf{s},V=v,K=\{i,j\}] + P^{\text{bad}}_c(\mathbf{s},v) \overset{(b)}{\le} \frac{1}{N\delta^2} + 2^{-N\sqrt{\delta}+1} \qquad (D.10)$$
where (a) follows from (8.55) and (D.8), and (b) from (D.7) and (D.9). Hence it vanishes for all $R > 0$, all $(\mathbf{s},v)$, and all $p_{Y|X_1X_2}$. Moreover it is less than $\frac{2}{N\delta^2}$ for all $\delta < 1/4000$, and this establishes (8.56). ✷

References

[1] P. Moulin and A.
Briassouli, "The Gaussian Fingerprinting Game," Proc. Conf. Information Sciences and Systems, Princeton, NJ, March 2002.

[2] P. Moulin and J. A. O'Sullivan, "Optimal Key Design for Information-Embedding Systems," Proc. Conf. Information Sciences and Systems, Princeton, NJ, March 2002.

[3] P. Moulin and J. A. O'Sullivan, "Information-theoretic analysis of information hiding," IEEE Trans. Information Theory, Vol. 49, No. 3, pp. 563–593, March 2003.

[4] A. Somekh-Baruch and N. Merhav, "On the capacity game of private fingerprinting systems under collusion attacks," IEEE Trans. Information Theory, Vol. 51, No. 3, pp. 884–899, March 2005.

[5] A. Somekh-Baruch and N. Merhav, "Achievable error exponents for the private fingerprinting game," IEEE Trans. Information Theory, Vol. 53, No. 5, pp. 1827–1838, May 2007.

[6] Y. Wang and P. Moulin, "Capacity and Random-Coding Error Exponent for Public Fingerprinting Game," Proc. Int. Symp. on Information Theory, Seattle, WA, July 2006.

[7] P. Moulin and N. Kiyavash, "Expurgated Gaussian Fingerprinting Codes," Proc. IEEE Int. Symp. on Information Theory, Nice, France, June 2007.

[8] D. Boneh and J. Shaw, "Collusion-Secure Fingerprinting for Digital Data," in Advances in Cryptology: Proc. CRYPTO '95, Springer-Verlag, New York, 1995.

[9] G. Tardos, "Optimal Probabilistic Fingerprinting Codes," ACM Symp. on Theory of Computing, San Diego, CA, 2003.

[10] N. P. Anthapadmanabhan, A. Barg and I. Dumer, "On the Fingerprinting Capacity Under the Marking Assumption," IEEE Trans. Information Theory, Vol. 54, No. 6, pp. 2678–2689, June 2008.

[11] E. Plotnik and A. Satt, "Decoding Rule and Error Exponent for the Random Multiple-Access Channel," Proc. Int. Symp. Information Theory, p. 216, Budapest, Hungary, 1991.

[12] I. Csiszár and J.
Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, NY, 1981.

[13] I. Csiszár, "The Method of Types," IEEE Trans. Information Theory, Vol. 44, No. 6, pp. 2505–2523, Oct. 1998.

[14] P. Moulin and Y. Wang, "Capacity and Random-Coding Exponents for Channel Coding with Side Information," IEEE Trans. Information Theory, Vol. 53, No. 4, pp. 1326–1347, Apr. 2007.

[15] G. D. Forney, Jr., "Exponential Error Bounds for Erasure, List, and Decision Feedback Schemes," IEEE Trans. Information Theory, Vol. 14, No. 2, pp. 206–220, 1968.

[16] R. G. Gallager, Information Theory and Reliable Communication, Wiley, New York, 1968.

[17] R. Ahlswede, "Multiway Communication Channels," Proc. IEEE Int. Symp. on Information Theory, pp. 23–52, Tsahkadsor, Armenia, 1971.

[18] H. Liao, "Multiple Access Channels," Ph.D. dissertation, EE Department, U. of Hawaii, 1972.

[19] A. Das and P. Narayan, "Capacities of Time-Varying Multiple-Access Channels With Side Information," IEEE Trans. Information Theory, Vol. 48, No. 1, pp. 4–25, Jan. 2002.

[20] A. Barg, personal communication, Jan. 2008.

[21] R. Ahlswede, "An Elementary Proof of the Strong Converse Theorem for the Multiple-Access Channel," J. Combinatorics, Information and System Sci., Vol. 7, No. 3, pp. 216–230, 1982.

[22] T. S. Han, "Nonnegative Entropy Measures of Multivariate Symmetric Correlations," Information and Control, Vol. 36, No. 2, pp. 133–156, 1978.

[23] Y.-S. Liu and B. L. Hughes, "A new universal random coding bound for the multiple-access channel," IEEE Trans. Information Theory, Vol. 42, No. 2, pp. 376–386, Mar. 1996.

[24] A. Barg and G. D. Forney, "Random Codes: Minimum Distances and Error Exponents," IEEE Trans. Information Theory, Vol. 48, No. 9, pp. 2568–2573, Sep. 2002.

[25] P.
Moulin, "Optimal Gaussian Fingerprint Decoders," Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Taipei, Taiwan, Apr. 2009.

[26] P. Moulin and Y. Wang, "Information-Theoretic Analysis of Spherical Fingerprinting," Proc. Symp. on Information Theory and Applications, San Diego, CA, Feb. 2009.

[27] J.-F. Jourdas and P. Moulin, "High-Rate Random-Like Spherical Fingerprinting Codes with Linear Decoding Complexity," IEEE Transactions on Information Forensics and Security, Vol. 4, No. 4, pp. 768–780, Dec. 2009.

[28] Y. Wang and P. Moulin, "Blind Fingerprinting," submitted to IEEE Trans. Information Theory, Feb. 2008. Available from arXiv:0803.0265 [cs.IT].

[29] E. Amiri and G. Tardos, "High Rate Fingerprinting Codes and the Fingerprinting Capacity," Proc. 20th Annual ACM-SIAM Symposium on Discrete Algorithms, New York, NY, Jan. 2009.

[30] Y.-W. Huang and P. Moulin, "Saddle-Point Solution of the Fingerprinting Capacity Game Under the Marking Assumption," Proc. IEEE Int. Symp. on Information Theory, Seoul, Korea, July 2009.

[31] Y.-W. Huang and P. Moulin, "Capacity-Achieving Fingerprint Decoding," Proc. 1st IEEE Workshop on Information Forensics and Security, London, UK, Dec. 2009.

[32] T. Furon and L. Pérez-Freire, "Worst Case Attacks Against Binary Probabilistic Traitor Tracing Codes," Proc. 1st IEEE Workshop on Information Forensics and Security, London, UK, Dec. 2009.

[33] N. P. Anthapadmanabhan and A. Barg, "Two-Level Fingerprinting Codes," Proc. IEEE Int. Symp. on Information Theory, Seoul, Korea, July 2009.