Universal Fingerprinting: Capacity and Random-Coding Exponents
Authors: Pierre Moulin
October 30, 2018

Abstract

This paper studies fingerprinting (traitor tracing) games in which the number of colluders and the collusion channel are unknown. The fingerprints are embedded into host sequences representing signals to be protected and provide the receiver with the capability to trace back pirated copies to the colluders. The colluders and the fingerprint embedder are subject to signal fidelity constraints. Our problem setup unifies the signal-distortion and Boneh-Shaw formulations of fingerprinting. The fundamental tradeoffs between fingerprint codelength, number of users, number of colluders, fidelity constraints, and decoding reliability are then determined. Several bounds on fingerprinting capacity have been presented in recent literature. This paper derives exact capacity formulas and presents a new randomized fingerprinting scheme with the following properties: (1) the encoder and receiver assume a nominal coalition size but do not need to know the actual coalition size and the collusion channel; (2) a tunable parameter $\Delta$ trades off false-positive and false-negative error exponents; (3) the receiver provides a reliability metric for its decision; and (4) the scheme is capacity-achieving when the false-positive exponent $\Delta$ tends to zero and the nominal coalition size coincides with the actual coalition size. A fundamental component of the new scheme is the use of a "time-sharing" randomized sequence. The decoder is a maximum penalized mutual information decoder, where the significance of each candidate coalition is assessed relative to a threshold, and the penalty is proportional to the coalition size. A much simpler threshold decoder that satisfies properties (1)-(3) above but not (4) is also given.

Index Terms.
Fingerprinting, traitor tracing, watermarking, data hiding, randomized codes, universal codes, method of types, maximum mutual information decoder, minimum equivocation decoder, channel coding with side information, capacity, strong converse, error exponents, multiple access channels, model order selection.

* The author is with the ECE Department, the Coordinated Science Laboratory, and the Beckman Institute at the University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Email: moulin@ifp.uiuc.edu. This work was supported by NSF under grants CCR 03-25924, CCF 06-35137 and CCF 07-29061. A 5-page version of this paper was presented at ISIT in Toronto, July 2008. The current manuscript was submitted for publication on January 24, 2008 and revised on December 9, 2008, June 9, 2009, January 24, 2010, December 10, 2010, and May 24, 2011.

1 Introduction

Digital fingerprinting (a.k.a. traitor tracing) is essentially a multiuser version of watermarking. A covertext (such as an image, video, audio, text, or software) is to be distributed to many users. Prior to distribution, each user is assigned a fingerprint that is embedded into the covertext. In a collusion attack, a coalition of users combine their marked copies, creating a pirated copy that contains only weak traces of their fingerprints. The pirated copy is subject to a fidelity requirement relative to the coalition's copies. The fidelity requirement may take the form of a distortion constraint, which is a natural model for media fingerprinting applications [1-7]; or it may take the form of Boneh and Shaw's marking assumption, which is a popular model for software fingerprinting [8-10]. To trace the forgery back to the coalition members, one needs a fingerprinting scheme that can reliably identify the colluders' fingerprints from the pirated copy.
The fingerprinting problem presents two key challenges.

1. The number of colluders may be large, which makes it easier for the colluders to mount a strong attack. The difficulty of the decoding problem is compounded by the fact that the number of colluders and the collusion channel are unknown to the encoder and decoder.

2. There are two fundamental types of error events, namely false positives, by which innocent users are wrongly accused, and false negatives, by which one or more colluders escape detection. For legal reasons, a maximum admissible value for the false-positive error probability should be specified.

This paper proposes a mathematical model that satisfies these requirements and derives the corresponding information-theoretic performance limits. Prior art on related formulations of the fingerprinting problem is reviewed below. The basic performance metric is capacity, which is defined with respect to a class of collusion channels.

A multiuser data hiding problem was analyzed by Moulin and O'Sullivan [3, Sec. 8], and capacity expressions were obtained assuming a compound class of memoryless channels, expected-distortion constraints for the distributor and the coalition, and noncooperating, single-user decoders. Despite clear mathematical similarities, this setup is quite different from the one adopted in more recent fingerprinting papers. Somekh-Baruch and Merhav [4, 5] studied a fingerprinting problem with a known number of colluders and explored connections with the problem of coding for the multiple-access channel (MAC). The notion of false positives does not appear in their problem formulation. Lower bounds on capacity were obtained assuming almost-sure distortion constraints between the pirated copy and one [4] or all [5] of the coalition's copies.
The lower bounds on capacity correspond to a restrictive encoding strategy, namely random constant-composition codes without time-sharing. Other bounds on capacity and connections between MACs and fingerprinting under the Boneh-Shaw assumption have been recently studied by Anthapadmanabhan et al. [10]. The covertext is degenerate, and side information does not appear in the information-theoretic formulation of this problem.

In order to cope with unknown collusion channels and an unknown number of colluders, a special kind of universal decoder should be designed, where universality holds not only with respect to some set of channels, but also with respect to an unknown number of inputs. An early version of this idea in the context of the so-called random MAC was introduced by Plotnik and Satt [11]. In the context of fingerprinting, a tunable parameter should trade off the two fundamental types of error probability. When the number of colluders is unknown, two extreme instances of this tradeoff are to accuse all users or none of them.

While fingerprinting capacity is a fundamental measure of the ability of any scheme to resist colluders, it only guarantees that the error probabilities vanish if the codes are "long enough". Error exponents provide a finer description of system performance. They provide estimates of the necessary length of a fingerprinting code that can withstand a specified number of colluders, given target false-positive and false-negative error probabilities. This is especially valuable in any legal system where the reliability of accusations should be assessed. Besides capacity and error-exponent formulas, the information-theoretic analysis sheds light on the structure of optimal codes.
Particularly relevant in this respect is a random coding scheme by Tardos [9], which uses an auxiliary random sequence for encoding fingerprints. While his scheme is presented at an algorithmic level (and no optimization was involved in its construction), in our game-theoretic setting the auxiliary random variable appears fundamentally as part of a randomized strategy in an information-theoretic game whose payoff function is nonconcave with respect to the maximizing variable (the fingerprint distribution).

Another issue that can be resolved in our game-theoretic setting is the optimality of coalition strategies that are invariant to permutations of the colluders. While one may heuristically expect that such strategies are optimal, a proof of this property is established in this paper. The approach used in previous papers was to assume that coalitions employ such strategies, but often no performance guarantee is given if the colluders employ asymmetric strategies.

Finally, in the aforementioned paper by Tardos [9] and in the signal processing literature, several simple algorithms have been proposed to detect colluders, involving computing some correlation score between the pirated copy and users' fingerprints, and setting up a detection threshold. We study the limits of such strategies and compare them with joint decoding strategies.

1.1 Organization of This Paper

As indicated by the bibliographic references, probabilistic analyses of digital fingerprinting have been reported both in the information theory literature and in the theoretical computer science literature. While the results derived in this paper are put in the context of related information-theoretic work, especially multiple-access channels, this paper is nevertheless intended to be accessible to a broader community of readers that are trained in probability theory and statistics.
The main tools used in our derivations are the method of types [12, 13] for analyzing random-coding schemes, Fano's lemma for deriving upper bounds on capacity, sphere-packing methods, and elementary properties of information-theoretic functionals.

A mathematical statement of our generic fingerprinting problem is given in Sec. 2, together with the definitions of codes, collusion channels, error probabilities, capacity, and error exponents. Our first main results are fingerprinting capacity theorems. They are stated in Sec. 3. The next two sections present the new random coding scheme and the resulting error exponents. Sec. 4 presents a simple but suboptimal decoder that compares empirical mutual information scores between received data and individual fingerprints, and outputs a guilty decision whenever the score exceeds a certain tunable threshold. This suboptimal decoder is closely related to strategies used in the signal processing literature and in [9]. For simplicity of the exposition, the scheme and results are presented in the setup with degenerate side information, which is directly applicable to the Boneh-Shaw problem. Sec. 5 introduces and analyzes a more elaborate joint decoder that assigns a penalized empirical equivocation score to candidate coalitions and selects the coalition with the lowest score. The penalty is proportional to coalition size. The joint decoder is capacity-achieving. Sec. 6 outlines an extension to the problem where the collusion channel is memoryless. The proofs of the main results appear in Secs. 7-10, and the paper concludes in Sec. 11.

1.2 Notation

We use uppercase letters for random variables, lowercase letters for their individual values, calligraphic letters for finite alphabets, and boldface letters for sequences.
Given an integer $K$, we use the special symbol $\mathcal{K}$ for the set $\{1, 2, \cdots, K\}$. We denote by $\mathcal{M}^\star$ the set of sequences of arbitrary length (including 0) whose elements are in $\mathcal{M}$. The probability mass function (p.m.f.) of a random variable $X \in \mathcal{X}$ is denoted by $p_X = \{p_X(x),\ x \in \mathcal{X}\}$. The variational distance between two p.m.f.'s $p$ and $q$ over $\mathcal{X}$ is denoted by $d_V(p, q) = \sum_{x \in \mathcal{X}} |p(x) - q(x)|$. The entropy of a random variable $X$ is denoted by $H(X)$, and the mutual information between two random variables $X$ and $Y$ is denoted by $I(X; Y) = H(X) - H(X|Y)$. Should the dependency on the underlying p.m.f.'s be explicit, we write the p.m.f.'s as subscripts, e.g., $H_{p_X}(X)$ and $I_{p_X p_{Y|X}}(X; Y)$. The Kullback-Leibler divergence between two p.m.f.'s $p$ and $q$ is denoted by $D(p \,\|\, q)$, and the conditional Kullback-Leibler divergence of $p_{Y|X}$ and $q_{Y|X}$ given $p_X$ is denoted by $D(p_{Y|X} \,\|\, q_{Y|X} \,|\, p_X) = D(p_{Y|X}\, p_X \,\|\, q_{Y|X}\, p_X)$. All logarithms are in base 2 unless specified otherwise.

Given a sequence $\mathbf{x} \in \mathcal{X}^N$, denote by $p_{\mathbf{x}}$ its type, or empirical p.m.f., over the finite alphabet $\mathcal{X}$. Denote by $T_{\mathbf{x}}$ the type class associated with $p_{\mathbf{x}}$, i.e., the set of all sequences of type $p_{\mathbf{x}}$. Likewise, $p_{\mathbf{x}\mathbf{y}}$ denotes the joint type of a pair of sequences $(\mathbf{x}, \mathbf{y}) \in \mathcal{X}^N \times \mathcal{Y}^N$, and $T_{\mathbf{x}\mathbf{y}}$ the associated joint type class. The conditional type $p_{\mathbf{y}|\mathbf{x}}$ of a pair of sequences $(\mathbf{x}, \mathbf{y})$ is defined by $p_{\mathbf{x}\mathbf{y}}(x, y)/p_{\mathbf{x}}(x)$ for all $x \in \mathcal{X}$ such that $p_{\mathbf{x}}(x) > 0$. The conditional type class $T_{\mathbf{y}|\mathbf{x}}$ given $\mathbf{x}$ is the set of all sequences $\tilde{\mathbf{y}}$ such that $(\mathbf{x}, \tilde{\mathbf{y}}) \in T_{\mathbf{x}\mathbf{y}}$. We denote by $H(\mathbf{x})$ the empirical entropy of the p.m.f. $p_{\mathbf{x}}$, by $H(\mathbf{y}|\mathbf{x})$ the empirical conditional entropy, and by $I(\mathbf{x}; \mathbf{y})$ the empirical mutual information for the joint p.m.f. $p_{\mathbf{x}\mathbf{y}}$.
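As a concrete illustration of these definitions, the following sketch (an illustrative Python snippet, not part of the paper's formal development) computes the empirical type, the empirical entropy, and the exact type-class size of a short binary sequence, and checks them against the standard type-counting bound $(N+1)^{-|\mathcal{X}|}\, 2^{NH(\mathbf{x})} \le |T_{\mathbf{x}}| \le 2^{NH(\mathbf{x})}$ recalled next.

```python
import math
from collections import Counter

def empirical_type(x, alphabet):
    """Empirical p.m.f. p_x of the sequence x over the given alphabet."""
    counts = Counter(x)
    return {a: counts.get(a, 0) / len(x) for a in alphabet}

def empirical_entropy(x, alphabet):
    """Empirical entropy H(x) in bits (base-2 logarithm)."""
    p = empirical_type(x, alphabet)
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def type_class_size(x, alphabet):
    """|T_x|: a multinomial coefficient counting all sequences of the same type."""
    counts = Counter(x)
    size = math.factorial(len(x))
    for a in alphabet:
        size //= math.factorial(counts.get(a, 0))
    return size

x = (0, 1, 1, 0, 1, 0, 0, 1, 1, 1)            # N = 10, type (0.4, 0.6)
N, alphabet = len(x), (0, 1)
H = empirical_entropy(x, alphabet)
size = type_class_size(x, alphabet)           # here C(10, 4) = 210
lower = (N + 1) ** (-len(alphabet)) * 2 ** (N * H)
upper = 2 ** (N * H)
assert lower <= size <= upper                 # the type-counting bound holds
```

The exponential sandwich is loose for small $N$ (here roughly $7 \le 210 \le 838$), but it tightens on the exponential scale as $N$ grows, which is all the method of types requires.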
Recall that the number of types and conditional types is polynomial in $N$ and that [12]
$$(N+1)^{-|\mathcal{X}|}\, 2^{N H(\mathbf{x})} \le |T_{\mathbf{x}}| \le 2^{N H(\mathbf{x})}, \qquad (1.1)$$
$$(N+1)^{-|\mathcal{X}||\mathcal{Y}|}\, 2^{N H(\mathbf{y}|\mathbf{x})} \le |T_{\mathbf{y}|\mathbf{x}}| \le 2^{N H(\mathbf{y}|\mathbf{x})}. \qquad (1.2)$$

We use the calligraphic fonts $\mathcal{P}_X$ and $\mathcal{P}_X^{[N]}$ to represent the set of all p.m.f.'s and all empirical p.m.f.'s for length-$N$ sequences, respectively, on the alphabet $\mathcal{X}$. Likewise, $\mathcal{P}_{Y|X}$ and $\mathcal{P}_{Y|X}^{[N]}$ denote the sets of all conditional p.m.f.'s and all empirical conditional p.m.f.'s on the alphabet $\mathcal{Y}$. The special symbol $\mathcal{W}_K$ will be used to denote the feasible set of collusion channels $p_{Y|X_1, \cdots, X_K}$ that can be selected by a size-$K$ coalition. Mathematical expectation is denoted by the symbol $\mathbb{E}$. The shorthands $a_N \doteq b_N$ and $a_N \mathrel{\dot\le} b_N$ denote asymptotic relations on the exponential scale, respectively $\lim_{N\to\infty} \frac{1}{N} \log \frac{a_N}{b_N} = 0$ and $\limsup_{N\to\infty} \frac{1}{N} \log \frac{a_N}{b_N} \le 0$. We define $|t|^+ \triangleq \max(t, 0)$ and $\exp_2(t) \triangleq 2^t$. The indicator function of a set $\mathcal{A}$ is denoted by $\mathbb{1}\{x \in \mathcal{A}\}$. The symbol $\mathcal{A} \setminus \mathcal{B}$ is used to denote the relative complement (or set-theoretic difference) of set $\mathcal{B}$ in set $\mathcal{A}$. (Note that $\mathcal{B}$ is generally not a subset of $\mathcal{A}$.) Finally, we adopt the notational convention that the minimum of a function over an empty set is $+\infty$, and the maximum is 0.

2 Problem Statement and Basic Definitions

2.1 Overview

Figure 1: Model for the fingerprinting game, using randomized code $(f_N, g_N)$. (The figure shows the host sequence $\mathbf{S}$ and secret key $V$ feeding the encoder $f_N$, which produces $2^{NR}$ fingerprinted copies; the coalition's copies $\mathbf{X}_{m_1}, \ldots, \mathbf{X}_{m_K}$ pass through a collusion channel $p_{\mathbf{Y}|\mathbf{X}_{m_1}, \ldots, \mathbf{X}_{m_K}}$ in the class $\mathcal{W}_K$ to produce the pirated copy $\mathbf{Y}$, from which the decoder $g_N$ outputs decoded fingerprints.) In the Boneh-Shaw setup, the host sequence $\mathbf{S}$ is degenerate and there is no distortion constraint ($D_1$). The class $\mathcal{W}_K$ characterizes the fidelity constraint on the collusion channel.
The encoder and decoder know neither $K$ nor the collusion channel.

Our model for digital fingerprinting is diagrammed in Fig. 1. Let $\mathcal{S}$, $\mathcal{X}$, and $\mathcal{Y}$ be three finite alphabets. The covertext sequence $\mathbf{S} = (S_1, \cdots, S_N) \in \mathcal{S}^N$ consists of $N$ independent and identically distributed (i.i.d.) samples drawn from a p.m.f. $p_S(s)$, $s \in \mathcal{S}$. A secret key $V$ taking values in an alphabet $\mathcal{V}_N$, whose cardinality potentially grows with $N$, is shared between encoder and decoder, and not publicly revealed. The key $V$ is a random variable independent of $\mathbf{S}$. There are $2^{NR}$ users, each of which receives a fingerprinted copy:
$$\mathbf{X}_m = f_N(\mathbf{S}, V, m), \qquad 1 \le m \le 2^{NR}, \qquad (2.3)$$
where $f_N : \mathcal{S}^N \times \mathcal{V}_N \times \{1, \cdots, 2^{NR}\} \to \mathcal{X}^N$ is the encoding function, and $m$ is the index of the user.

The fidelity requirement between $\mathbf{S}$ and $\mathbf{X}_m$ is expressed via a distortion constraint. Let $d : \mathcal{S} \times \mathcal{X} \to \mathbb{R}^+$ be the distortion measure and $d^N(\mathbf{s}, \mathbf{x}) = \frac{1}{N} \sum_{i=1}^N d(s_i, x_i)$ the extension of this measure to length-$N$ sequences. The code $f_N$ is subject to the distortion constraint
$$d^N(\mathbf{s}, \mathbf{x}_m) \le D_1, \qquad 1 \le m \le 2^{NR}. \qquad (2.4)$$

Let $\mathcal{K} \triangleq \{m_1, m_2, \cdots, m_K\}$ be a coalition of $K$ users; no constraints are imposed on the formation of coalitions. The coalition uses its copies $\mathbf{X}_{\mathcal{K}} \triangleq \{\mathbf{X}_m,\ m \in \mathcal{K}\}$ to produce a pirated copy $\mathbf{Y} \in \mathcal{Y}^N$. Without loss of generality, we assume that $\mathbf{Y}$ is generated stochastically according to a conditional p.m.f. $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ called the collusion channel. This includes deterministic mappings as a special case. A fidelity constraint is imposed on $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ to ensure that $\mathbf{Y}$ is "close" to the fingerprinted copies $\mathbf{X}_m$, $m \in \mathcal{K}$. This constraint may take the form of a distortion constraint (analogously to (2.4)), or alternatively, a constraint that will be referred to as the Boneh-Shaw constraint.
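To make the embedding constraint (2.4) concrete, here is a small illustrative sketch (not from the paper; the Hamming distortion measure, the sequences, and the budget $D_1$ are arbitrary choices made only for this example) that computes the per-letter distortion $d^N(\mathbf{s}, \mathbf{x}_m)$ and checks it against $D_1$.

```python
def hamming(s_i, x_i):
    """Per-letter distortion d(s, x); Hamming distortion is one common choice."""
    return 0.0 if s_i == x_i else 1.0

def d_N(s, x, d=hamming):
    """Length-N extension d^N(s, x) = (1/N) * sum_i d(s_i, x_i), as in (2.4)."""
    assert len(s) == len(x)
    return sum(d(si, xi) for si, xi in zip(s, x)) / len(s)

s   = [0, 1, 1, 0, 1, 0, 1, 1]   # host sequence S (toy values)
x_m = [0, 1, 0, 0, 1, 0, 1, 0]   # fingerprinted copy for user m (toy values)
D1  = 0.30                       # embedding-distortion budget (arbitrary)
assert d_N(s, x_m) <= D1         # the code f_N must satisfy this for every m
```

The two sequences differ in 2 of 8 positions, so $d^N = 0.25$, within the budget; a valid encoder must guarantee this for all $2^{NR}$ users simultaneously.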
The formulation of these constraints is detailed below and results in the definition of a feasible set $\mathcal{W}_K(p_{\mathbf{x}_{\mathcal{K}}})$ for the conditional type $p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}$. The encoder and decoder assume a nominal coalition size $K_{\mathrm{nom}}$ but know neither $K$ nor the $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ selected by the $K$ colluders.¹ The decoder has access to the pirated copy $\mathbf{Y}$, the host $\mathbf{S}$, and the secret key $V$. It produces an estimate
$$\hat{\mathcal{K}} = g_N(\mathbf{Y}, \mathbf{S}, V) \qquad (2.5)$$
of the coalition. Success can be defined as catching one colluder or catching all colluders, the latter task being seemingly much more difficult. An admissible decoder output is the empty set, $\hat{\mathcal{K}} = \emptyset$, reflecting the possibility that the signal submitted to the decoder is unrelated to the fingerprints. If this possibility were not allowed, an innocent user would be accused. Another good reason to allow $\hat{\mathcal{K}} = \emptyset$ is simply that reliable detection is impossible when there are too many colluders, and the constraint on the probability of false positives would be violated if $\hat{\mathcal{K}} = \emptyset$ were not an option.

2.2 Randomized Fingerprinting Codes

The formal definition of a fingerprinting code is as follows.

Definition 2.1 A randomized rate-$R$ length-$N$ fingerprinting code $(f_N, g_N)$ with embedding distortion $D_1$ is a pair of encoder mapping $f_N : \mathcal{S}^N \times \mathcal{V}_N \times \{1, 2, \cdots, \lceil 2^{NR} \rceil\} \to \mathcal{X}^N$ and decoder mapping $g_N : \mathcal{Y}^N \times \mathcal{S}^N \times \mathcal{V}_N \to \{1, 2, \cdots, \lceil 2^{NR} \rceil\}^\star$.

Many kinds of randomization are possible. In the most general setting, the key space $\mathcal{V}_N$ can grow superexponentially with $N$. For fingerprinting, three kinds of randomization seem to be fundamental, each serving a different purpose. All three kinds can be combined. The first one is randomized permutation of the letters $\{1, 2, \cdots, N\}$, to cope with channels with arbitrary memory, similarly to [14].
Definition 2.2 A randomly modulated (RM) fingerprinting code is a randomized fingerprinting code defined via permutations of a prototype $(\tilde{f}_N, \tilde{g}_N)$. The code is of the form
$$\mathbf{x}_m = \tilde{f}_N^\pi(\mathbf{s}, w, m) \triangleq \pi^{-1} \tilde{f}_N(\pi \mathbf{s}, w, m), \qquad \tilde{g}_N^\pi(\mathbf{y}, \mathbf{s}, w) \triangleq \tilde{g}_N(\pi \mathbf{y}, \pi \mathbf{s}, w) \qquad (2.6)$$
where $\pi$ is chosen uniformly from the set of all $N!$ permutations of the letters $\{1, 2, \cdots, N\}$ and is not revealed publicly. The sequence $\pi \mathbf{x}_m$ is obtained by applying $\pi$ to the elements of $\mathbf{x}_m$. The secret key is $V = (\pi, W)$, where $W$ is independent of $\pi$.

The second kind of randomization is uniform permutation of the $2^{NR}$ fingerprint assignments, to equalize error probabilities over all possible coalitions [7, 10].

¹ If $K_{\mathrm{nom}} = K$, our random coding scheme of Sec. 5 is capacity-achieving.

Definition 2.3 A randomly permuted (RP) fingerprinting code is a randomized fingerprinting code defined via permutations of a prototype $(\tilde{f}_N, \tilde{g}_N)$. The code is of the form
$$\mathbf{x}_m = \tilde{f}_N^\pi(\mathbf{s}, w, m) \triangleq \tilde{f}_N(\mathbf{s}, w, \pi^{-1}(m)), \qquad \tilde{g}_N^\pi(\mathbf{y}, \mathbf{s}, w) \triangleq \pi(\tilde{g}_N(\mathbf{y}, \mathbf{s}, w)) \qquad (2.7)$$
where $\pi$ is chosen uniformly from the set of all $2^{NR}!$ permutations of the user indices $\{1, 2, \cdots, 2^{NR}\}$ and is not revealed publicly. The secret key is $V = (\pi, W)$, where $W$ is independent of $\pi$. In (2.7), we have used the shorthand $\pi(\hat{\mathcal{K}}) \triangleq \{\pi(m),\ m \in \hat{\mathcal{K}}\}$.

The third kind of randomization arises via an auxiliary "time-sharing" random sequence. This strategy was not used in [4, 5, 10], but a remarkable example was developed by Tardos [9]. For binary alphabets $\mathcal{S}$, $\mathcal{X}$, and $\mathcal{Y}$, i.i.d. random variables $W_i \in (0, 1)$, $1 \le i \le N$, are generated, and next the fingerprint letters $X_i(m)$ are generated as independent Bernoulli($W_i$) random variables. Here $V = \{W_i,\ 1 \le i \le N\}$ is the secret key shared by encoder and decoder.
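The time-sharing idea can be sketched as follows. This is an illustrative Python snippet, not the scheme of [9]: Tardos draws each $W_i$ from a specific arcsine-like density, whereas this sketch uses a uniform draw purely for simplicity.

```python
import random

def time_sharing_fingerprints(num_users, N, rng):
    """Third kind of randomization: draw a secret time-sharing sequence
    W = (W_1, ..., W_N), then assign each user independent Bernoulli(W_i)
    fingerprint bits.  W plays the role of the secret key V shared by
    encoder and decoder."""
    W = [rng.random() for _ in range(N)]            # secret biases W_i
    X = {m: [1 if rng.random() < W[i] else 0 for i in range(N)]
         for m in range(num_users)}                 # X_i(m) ~ Bernoulli(W_i)
    return W, X

rng = random.Random(2011)                           # fixed seed for repeatability
W, X = time_sharing_fingerprints(num_users=4, N=16, rng=rng)
assert len(W) == 16 and all(0.0 <= w < 1.0 for w in W)
assert all(len(x) == 16 and set(x) <= {0, 1} for x in X.values())
```

Conditioned on $W$, users' fingerprints are i.i.d., yet across positions the shared biases correlate all users' marks, which is what gives the decoder statistical leverage against a coalition.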
Given an embedding distortion $D_1$ and a size-$K$ coalition using a collusion channel from class $\mathcal{W}_K$, there corresponds a capacity $C(D_1, \mathcal{W}_K)$, which is the supremum over $(f_N, g_N)$ of all achievable $R$, under a prescribed error criterion.

2.3 Collusion Channels

First we define some basic terminology for MACs with $K$ inputs, common input alphabet $\mathcal{X}$, and output alphabet $\mathcal{Y}$. Recall that $\mathcal{K} = \{1, 2, \cdots, K\}$ and let $X_{\mathcal{K}} = \{X_1, \cdots, X_K\}$. Given a conditional p.m.f. $p_{Y|X_{\mathcal{K}}}$, consider the permuted conditional p.m.f.
$$p_{Y|X_{\pi(\mathcal{K})}}(y \,|\, x_1, \cdots, x_K) \triangleq p_{Y|X_{\mathcal{K}}}(y \,|\, x_{\pi(1)}, \cdots, x_{\pi(K)}) \qquad (2.8)$$
where $\pi$ is any permutation of the $K$ inputs. We say that $p_{Y|X_{\mathcal{K}}}$ is permutation-invariant if $p_{Y|X_{\pi(\mathcal{K})}} = p_{Y|X_{\mathcal{K}}}$, $\forall \pi$. A subset $\mathcal{W}_K$ of $\mathcal{P}_{Y|X_{\mathcal{K}}}$ is said to be permutation-invariant if $p_{Y|X_{\mathcal{K}}} \in \mathcal{W}_K \Rightarrow p_{Y|X_{\pi(\mathcal{K})}} \in \mathcal{W}_K$, $\forall \pi$. In general, not all elements of such a $\mathcal{W}_K$ are permutation-invariant. The subset of a permutation-invariant $\mathcal{W}_K$ that consists of permutation-invariant conditional p.m.f.'s will be denoted by
$$\mathcal{W}_K^{\mathrm{fair}} = \left\{ p_{Y|X_{\mathcal{K}}} \in \mathcal{W}_K \,:\, p_{Y|X_{\pi(\mathcal{K})}} = p_{Y|X_{\mathcal{K}}},\ \forall \pi \right\}. \qquad (2.9)$$
Finally, if $\mathcal{W}_K$ is permutation-invariant and convex, the permutation-averaged conditional p.m.f. $\frac{1}{K!} \sum_\pi p_{Y|X_{\pi(\mathcal{K})}}$ is also in $\mathcal{W}_K$ and is permutation-invariant by construction.

In the fingerprinting problem, the conditional type $p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \in \mathcal{P}_{Y|X_{\mathcal{K}}}^{[N]}$ is a random variable whose conditional distribution given $\mathbf{x}_{\mathcal{K}}$ depends on the collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$. Our fidelity constraint on the coalition is of the general form
$$\Pr\left[\, p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \in \mathcal{W}_K(p_{\mathbf{x}_{\mathcal{K}}}) \,\right] = 1, \qquad (2.10)$$
where for each $p_{\mathbf{x}_{\mathcal{K}}}$, $\mathcal{W}_K(p_{\mathbf{x}_{\mathcal{K}}})$ is a convex, permutation-invariant subset of $\mathcal{P}_{Y|X_{\mathcal{K}}}$. That is, the empirical conditional p.m.f. of the pirated copy given the marked copies is restricted.
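The permutation-averaging step $\frac{1}{K!} \sum_\pi p_{Y|X_{\pi(\mathcal{K})}}$ can be sketched numerically. This is an illustrative Python snippet; representing the channel as a dictionary mapping input $K$-tuples to output p.m.f.'s is an encoding chosen only for this sketch.

```python
from itertools import permutations

def permutation_average(channel, K):
    """Average a K-input channel over all K! permutations of its inputs,
    producing the permutation-invariant p.m.f. (1/K!) * sum_pi p_{Y|X_pi(K)}."""
    perms = list(permutations(range(K)))
    outputs = {y for pmf in channel.values() for y in pmf}
    return {x: {y: sum(channel[tuple(x[i] for i in pi)].get(y, 0.0)
                       for pi in perms) / len(perms)
                for y in outputs}
            for x in channel}

# "Copy colluder 1" channel: p(y | x1, x2) = 1{y = x1}; not permutation-invariant.
X = (0, 1)
copy_first = {(x1, x2): {x1: 1.0} for x1 in X for x2 in X}
fair = permutation_average(copy_first, K=2)
assert fair[(0, 1)] == {0: 0.5, 1: 0.5}   # averaged: either colluder's copy, 50/50
assert fair[(1, 0)] == {0: 0.5, 1: 0.5}
```

Averaging the one-sided "copy colluder 1" channel yields the symmetric mixture that picks either colluder's sample with equal probability, illustrating how the averaged channel becomes permutation-invariant by construction.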
The choice of the feasible set $\mathcal{W}_K(p_{\mathbf{x}_{\mathcal{K}}})$ depends on the application, as elaborated below. The explicit dependency of $\mathcal{W}_K$ on $p_{\mathbf{x}_{\mathcal{K}}}$ will sometimes be omitted to simplify notation. Note that assuming $\mathcal{W}_K$ is permutation-invariant does not imply that the $p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}$ actually selected by the coalition is permutation-invariant. Finally, it is assumed that the set-valued mapping $\mathcal{W}_K(p)$ is defined for $p \in \mathcal{P}_{X_{\mathcal{K}}}$ and is uniformly continuous in the variational distance, in the sense that for every $\epsilon > 0$, there exists $\delta > 0$ such that
$$\forall p_{X_{\mathcal{K}}}, p'_{X_{\mathcal{K}}} \in \mathcal{P}_{X_{\mathcal{K}}}\ \mathrm{s.t.}\ d_V(p_{X_{\mathcal{K}}}, p'_{X_{\mathcal{K}}}) < \delta : \quad \max_{p_{Y|X_{\mathcal{K}}} \in \mathcal{W}_K(p_{X_{\mathcal{K}}})}\ \min_{p'_{Y|X_{\mathcal{K}}} \in \mathcal{W}_K(p'_{X_{\mathcal{K}}})} d_V\!\left(p_{Y|X_{\mathcal{K}}}\, p_{X_{\mathcal{K}}},\ p'_{Y|X_{\mathcal{K}}}\, p'_{X_{\mathcal{K}}}\right) < \epsilon. \qquad (2.11)$$

The model (2.10) can be used to impose hard distortion constraints on the coalition or to enforce the Boneh-Shaw marking assumption when $\mathcal{X} = \mathcal{Y}$.

1. Distortion Constraints. Consider the following variation on the constraints used in [3-5]. Define a permutation-invariant estimator $f : \mathcal{X}^K \to \mathcal{S}$ which produces an estimate $\hat{S} = f(X_{\mathcal{K}})$ of the host signal sample based on the corresponding marked samples.² The estimator could be, e.g., a maximum-likelihood estimator. Then
$$\mathcal{W}_K(p_{\mathbf{x}_{\mathcal{K}}}) = \left\{ p_{Y|X_{\mathcal{K}}} : \sum_{x_{\mathcal{K}}, y} p_{\mathbf{x}_{\mathcal{K}}}(x_{\mathcal{K}})\, p_{Y|X_{\mathcal{K}}}(y \,|\, x_{\mathcal{K}})\, d_2(f(x_{\mathcal{K}}), y) \le D_2 \right\} \qquad (2.12)$$
where $d_2 : \mathcal{S} \times \mathcal{Y} \to \mathbb{R}^+$ is the coalition's distortion function, and $D_2$ is the maximum allowed distortion. The constraint (2.10) may be equivalently written as
$$\Pr\left[\, d_2^N(f(\mathbf{x}_{\mathcal{K}}), \mathbf{y}) = \frac{1}{N} \sum_{t=1}^N d_2(f(\mathbf{x}_{\mathcal{K},t}), y_t) \le D_2 \,\right] = 1. \qquad (2.13)$$

2. Interleaving Attack. Here each colluder contributes $N/K$ samples, taken at arbitrary positions, to the forgery. The class $\mathcal{W}_K$ is a singleton:
$$p_{Y|X_{\mathcal{K}}}(y \,|\, x_{\mathcal{K}}) = \frac{1}{K} \sum_{k \in \mathcal{K}} \mathbb{1}\{y = x_k\}. \qquad (2.14)$$

3. Boneh-Shaw Marking Assumption. Assume $\mathcal{X} = \mathcal{Y}$ and $\mathcal{W}_K$ is the set of conditional p.m.f.'s that satisfy
$$x_1 = \cdots = x_K \ \Rightarrow\ y = x_1.$$
(2.15)
Then the constraint (2.10) enforces the Boneh-Shaw marking assumption: the colluders are not allowed to modify their samples at any location where these samples agree. Thus $y_t = x_{m_1,t}$ at any position $1 \le t \le N$ such that $x_{m_1,t} = \cdots = x_{m_K,t}$. Note that $\mathcal{W}_K$ does not depend on $p_{\mathbf{x}_{\mathcal{K}}}$ and that the interleaving attack (2.14) satisfies the Boneh-Shaw condition.

² A permutation-invariant estimator depends on the samples $\{X_k,\ k \in \mathcal{K}\}$ only via their empirical distribution on $\mathcal{X}$.

2.4 Strongly Exchangeable Collusion Channels

Recall the definition of RM codes in (2.6); a dual notion applies to collusion channels. For any $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ and permutation $\pi$ of $\{1, 2, \cdots, N\}$, define the permuted channel $p^\pi_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y} \,|\, \mathbf{x}_{\mathcal{K}}) \triangleq p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\pi \mathbf{y} \,|\, \pi \mathbf{x}_{\mathcal{K}})$. Then we have

Definition 2.4 [4] A strongly exchangeable collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ is a channel such that $p^\pi_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y} \,|\, \mathbf{x}_{\mathcal{K}})$ is independent of $\pi$, for every $(\mathbf{x}_{\mathcal{K}}, \mathbf{y})$.

A strongly exchangeable collusion channel is defined by a probability assignment $\Pr[T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}]$ on the conditional type classes. The distribution of $\mathbf{Y}$ conditioned on $\mathbf{Y} \in T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}$ is uniform:
$$p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\tilde{\mathbf{y}} \,|\, \mathbf{x}_{\mathcal{K}}) = \frac{\Pr[T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}]}{|T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}|}, \qquad \forall \tilde{\mathbf{y}} \in T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}. \qquad (2.16)$$

In Sec. 2.6 we show that for RM codes $(f_N, g_N)$, it is sufficient to consider strongly exchangeable collusion channels to derive worst-case error probabilities. Moreover, in the error probability calculations for random codes it will be sufficient to use the trivial upper bound
$$\Pr[T_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}] \le \mathbb{1}\{\, p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \in \mathcal{W}_K(p_{\mathbf{x}_{\mathcal{K}}}) \,\}. \qquad (2.17)$$

2.5 Fair Coalitions

Two notions of fairness for coalitions will be useful. Denote by $\pi$ a permutation of $\{1, 2, \cdots, K\}$.

Definition 2.5 The collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ is permutation-invariant if
$$p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y} \,|\, \mathbf{x}_{m_1}, \cdots, \mathbf{x}_{m_K}) = p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y} \,|\, \mathbf{x}_{\pi(m_1)}, \cdots, \mathbf{x}_{\pi(m_K)}), \qquad \forall \pi.$$
(2.18)
For instance, if $\mathcal{X} = \mathcal{Y}$ and $K = 2$, the collusion channel
$$p_{\mathbf{Y}|\mathbf{X}_1 \mathbf{X}_2}(\mathbf{y} \,|\, \mathbf{x}_1, \mathbf{x}_2) = \frac{1}{2}\left[ \mathbb{1}\{\mathbf{y} = \mathbf{x}_1\} + \mathbb{1}\{\mathbf{y} = \mathbf{x}_2\} \right] \qquad (2.19)$$
is permutation-invariant. Given $\mathbf{x}_1, \mathbf{x}_2$, there are two equally likely choices for the pirated copy, namely $\mathbf{y} = \mathbf{x}_1$ and $\mathbf{y} = \mathbf{x}_2$. Note that one colluder carries full risk and the other one zero risk.

A stronger definition of fairness (which will not be needed in this paper) would require some kind of ergodic behavior of the inputs and output of the collusion channel.

Definition 2.6 The collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ is first-order fair if $\Pr[\, p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}} \in \mathcal{W}_K^{\mathrm{fair}}(p_{\mathbf{x}_{\mathcal{K}}}) \,] = 1$.

For any first-order fair collusion channel, the conditional type $p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}$ is invariant to permutations of the colluders, with probability 1. For instance, if $\mathcal{X} = \mathcal{Y}$ and $K = 2$, any collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ resulting in the conditional type $p_{\mathbf{y}|\mathbf{x}_1 \mathbf{x}_2}(y \,|\, x_1, x_2) = \frac{1}{2}[\mathbb{1}\{y = x_1\} + \mathbb{1}\{y = x_2\}]$ is first-order fair. This is an interleaving attack in which each colluder contributes exactly $N/2$ samples (in any order) to the pirated copy.

A first-order fair collusion channel is not necessarily permutation-invariant, and vice versa. Further, if a collusion channel is first-order fair and strongly exchangeable, then it is also permutation-invariant. However, the converse is not true. For instance, the collusion channel of (2.19) is permutation-invariant and strongly exchangeable but not first-order fair, because the conditional type $p_{\mathbf{y}|\mathbf{x}_{\mathcal{K}}}(y \,|\, x_1, x_2)$ is given by either $\mathbb{1}\{y = x_1\}$ or $\mathbb{1}\{y = x_2\}$, neither of which is permutation-invariant.

2.6 Error Probabilities

Let $\mathcal{K}$ be the coalition and $\hat{\mathcal{K}} = g_N(\mathbf{Y}, \mathbf{S}, V)$ the decoder's output.
There are several error probabilities of interest: the probability of false positives (one or more innocent users are accused),
$$P_{FP}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \Pr[\hat{\mathcal{K}} \setminus \mathcal{K} \ne \emptyset]; \qquad (2.20)$$
the probability of missed detection for a specific coalition member $m \in \mathcal{K}$,
$$P_{e,m}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \Pr[m \notin \hat{\mathcal{K}}];$$
the probability of failing to catch a single colluder,
$$P_e^{one}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \Pr[\hat{\mathcal{K}} \cap \mathcal{K} = \emptyset]; \qquad (2.21)$$
and the probability of failing to catch the full coalition,
$$P_e^{all}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \Pr[\mathcal{K} \not\subseteq \hat{\mathcal{K}}]. \qquad (2.22)$$
The error criteria (2.21) and (2.22) will be referred to as the detect-one and detect-all criteria, respectively. The above error probabilities may be written in the explicit form
$$P_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) = \sum_{v, \mathbf{s}, \mathbf{x}_{\mathcal{K}}, \mathbf{y}} p_V(v)\, p_S^N(\mathbf{s}) \left( \prod_{m \in \mathcal{K}} \mathbb{1}\{\mathbf{x}_m = f_N(\mathbf{s}, v, m)\} \right) p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}(\mathbf{y} \,|\, \mathbf{x}_{\mathcal{K}})\, \mathbb{1}\{\mathcal{E}\} \qquad (2.23)$$
where the error event $\mathcal{E}$ is given by $\mathcal{E}_{FP} = \{g_N(\mathbf{y}, \mathbf{s}, v) \setminus \mathcal{K} \ne \emptyset\}$, or $\mathcal{E}_{one} = \{g_N(\mathbf{y}, \mathbf{s}, v) \cap \mathcal{K} = \emptyset\}$, or $\mathcal{E}_{all} = \{\mathcal{K} \not\subseteq g_N(\mathbf{y}, \mathbf{s}, v)\}$, when $P_e$ is given by (2.20), (2.21), and (2.22), respectively. The worst-case probability is given by
$$P_e(f_N, g_N, \mathcal{W}_K) = \max_{p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}} P_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}})$$
where the maximum is over all feasible collusion channels, i.e., such that (2.10) holds.

Maximum vs. average error probability. The error probabilities (2.20)-(2.22) generally depend on $\mathcal{K}$. Prop. 2.1 below states that (a) in order to make them independent of $\mathcal{K}$ and provide guarantees on error probability for any coalition, one may use RP codes, and (b) random permutations of fingerprint assignments cannot increase the average error probability of any code. Let $(\tilde{f}_N, \tilde{g}_N)$ be an arbitrary code and $(f_N, g_N)$ the RP code of (2.7), obtained using $(\tilde{f}_N, \tilde{g}_N)$ as a prototype. Let $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$ be an arbitrary collusion channel when coalition $\mathcal{K}$ is in effect.
Given any other coalition $\mathcal{K}' = \pi(\mathcal{K})$ of the same size, let $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}'}}$ be the corresponding collusion channel, obtained by applying (2.8), where $\pi$ is now a permutation of $\{1, \cdots, 2^{NR}\}$.

Proposition 2.1 For any code $(\tilde{f}_N, \tilde{g}_N)$ and collusion channel $p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}$, we have
$$\forall \mathcal{K}' : \quad P_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}'}}) = P_e(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}}) \le \max_{\mathcal{K}'} P_e(\tilde{f}_N, \tilde{g}_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}'}}) \qquad (2.24)$$
where $(f_N, g_N)$ is the RP code of (2.7), and $P_e$ denotes any of the error probability criteria (2.20), (2.21), and (2.22).

Proof. First consider the detect-one error criterion of (2.21): an error arises if $g_N(\mathbf{Y}, \mathbf{S}, V) \cap \mathcal{K} = \emptyset$. Given a RP fingerprinting code with prototype $(\tilde{f}_N, \tilde{g}_N)$ and permutation parameter $\pi$, the detect-one error probability when coalition $\mathcal{K}$ is in effect is given by
$$\begin{aligned}
P_e^{one}(f_N, g_N, p_{\mathbf{Y}|\mathbf{X}_{\mathcal{K}}})
&= \Pr[g_N(\mathbf{Y}, \mathbf{S}, V) \cap \mathcal{K} = \emptyset] \\
&= \Pr[\tilde{g}_N^\pi(\mathbf{Y}, \mathbf{S}, W) \cap \mathcal{K} = \emptyset] \\
&= \Pr[\pi(\tilde{g}_N(\mathbf{Y}, \mathbf{S}, W)) \cap \mathcal{K} = \emptyset] \\
&= \Pr[\tilde{g}_N(\mathbf{Y}, \mathbf{S}, W) \cap \pi^{-1}(\mathcal{K}) = \emptyset] \\
&= \mathbb{E}_{\mathbf{Y}, \mathbf{S}, W}\, \underbrace{\frac{1}{2^{NR}!} \sum_\pi \mathbb{1}\{\tilde{g}_N(\mathbf{Y}, \mathbf{S}, W) \cap \pi^{-1}(\mathcal{K}) = \emptyset\}}_{\text{independent of } \mathcal{K}}
\end{aligned} \qquad (2.25)$$
which is independent of $\mathcal{K}$, by virtue of the uniform distribution on $\pi$. The derivation for the detect-all and the false-positive error probabilities is analogous to (2.25). This establishes the first equality in (2.24). The inequality is proved similarly. ∎

False Positives vs. False Negatives. The tradeoff between false positives and false negatives is central to statistical detection theory (the Neyman-Pearson problem) and list decoding [15]. Note that in the classical formulation of list decoding [16, p. 166], an error is declared only if the message sent does not appear on the decoder's output list. The false-negative error exponent increases with list size and approaches the sphere-packing exponent if the list size is allowed to grow subexponentially with $N$.
This classical formulation does not include a cost for "false positives."

2.7 Strongly Exchangeable Collusion Channels

Prop. 2.2 below states that randomly modulated codes (Def. 2.2) and strongly exchangeable channels (Def. 2.4) satisfy a certain equilibrium property: neither the fingerprint embedder nor the coalition has an interest in deviating from those strategies. Let $(\tilde{f}_N, \tilde{g}_N)$ be an arbitrary code and $(f_N, g_N)$ the RM code of (2.6), obtained using $(\tilde{f}_N, \tilde{g}_N)$ as a prototype. Given any feasible collusion channel $p_{Y|X_\mathcal{K}}$, denote by
$$\bar{p}_{Y|X_\mathcal{K}}(\mathbf{y}|\mathbf{x}_\mathcal{K}) = \frac{1}{N!} \sum_\pi p_{Y|X_\mathcal{K}}(\pi\mathbf{y} \,|\, \pi\mathbf{x}_\mathcal{K}) \quad (2.26)$$
the permutation-averaged channel, which is feasible and strongly exchangeable.

Proposition 2.2 For any code $(\tilde{f}_N, \tilde{g}_N)$ and collusion channel $p_{Y|X_\mathcal{K}}$, we have
$$P_e(f_N, g_N, p_{Y|X_\mathcal{K}}) = P_e(f_N, g_N, \bar{p}_{Y|X_\mathcal{K}}) = P_e(\tilde{f}_N, \tilde{g}_N, \bar{p}_{Y|X_\mathcal{K}}) \le \max_\pi P_e(\tilde{f}_N, \tilde{g}_N, p^\pi_{Y|X_\mathcal{K}}) \quad (2.27)$$
where $(f_N, g_N)$ is the RM code of (2.6) and $P_e$ denotes any of the error probability criteria (2.20), (2.21), and (2.22).

Proof. First consider the detect-one error criterion of (2.21): an error arises if $\tilde{g}_N(\mathbf{Y}, \mathbf{S}, V) \cap \mathcal{K} = \emptyset$. For any fixed $\mathcal{K}$, the detect-one error probability is an average over all possible permutations $\pi$ and the other random variables $V, \mathbf{S}, \mathbf{Y}$:
$$P_e^{one}(f_N, g_N, p_{Y|X_\mathcal{K}}) \stackrel{(a)}{=} \frac{1}{N!} \sum_\pi \sum_{w, \mathbf{s}, \mathbf{x}_\mathcal{K}, \mathbf{y}} p_W(w)\, p_S^N(\mathbf{s}) \left( \prod_{m \in \mathcal{K}} \mathbb{1}\{\pi\mathbf{x}_m = \tilde{f}_N(\pi\mathbf{s}, w, m)\} \right) p_{Y|X_\mathcal{K}}(\mathbf{y}|\mathbf{x}_\mathcal{K})\, \mathbb{1}\{\tilde{g}_N(\pi\mathbf{y}, \pi\mathbf{s}, w) \cap \mathcal{K} = \emptyset\}$$
$$\stackrel{(b)}{=} \frac{1}{N!} \sum_\pi \sum_{w, \mathbf{s}, \mathbf{x}_\mathcal{K}, \mathbf{y}} p_W(w)\, p_S^N(\pi^{-1}\mathbf{s}) \left( \prod_{m \in \mathcal{K}} \mathbb{1}\{\mathbf{x}_m = \tilde{f}_N(\mathbf{s}, w, m)\} \right) p_{Y|X_\mathcal{K}}(\pi^{-1}\mathbf{y}|\pi^{-1}\mathbf{x}_\mathcal{K})\, \mathbb{1}\{\tilde{g}_N(\mathbf{y}, \mathbf{s}, w) \cap \mathcal{K} = \emptyset\}$$
$$\stackrel{(c)}{=} \sum_{w, \mathbf{s}, \mathbf{x}_\mathcal{K}, \mathbf{y}} p_W(w)\, p_S^N(\mathbf{s}) \left( \prod_{m \in \mathcal{K}} \mathbb{1}\{\mathbf{x}_m = \tilde{f}_N(\mathbf{s}, w, m)\} \right) \left( \frac{1}{N!} \sum_\pi p_{Y|X_\mathcal{K}}(\pi^{-1}\mathbf{y}|\pi^{-1}\mathbf{x}_\mathcal{K}) \right) \mathbb{1}\{\tilde{g}_N(\mathbf{y}, \mathbf{s}, w) \cap \mathcal{K} = \emptyset\}$$
$$= \sum_{w, \mathbf{s}, \mathbf{x}_\mathcal{K}, \mathbf{y}} p_W(w)\, p_S^N(\mathbf{s}) \left( \prod_{m \in \mathcal{K}} \mathbb{1}\{\mathbf{x}_m = \tilde{f}_N(\mathbf{s}, w, m)\} \right) \left( \frac{1}{N!} \sum_\pi p^\pi_{Y|X_\mathcal{K}}(\mathbf{y}|\mathbf{x}_\mathcal{K}) \right) \mathbb{1}\{\tilde{g}_N(\mathbf{y}, \mathbf{s}, w) \cap \mathcal{K} = \emptyset\}$$
$$= \sum_{w, \mathbf{s}, \mathbf{x}_\mathcal{K}, \mathbf{y}} p_W(w)\, p_S^N(\mathbf{s}) \left( \prod_{m \in \mathcal{K}} \mathbb{1}\{\mathbf{x}_m = \tilde{f}_N(\mathbf{s}, w, m)\} \right) \bar{p}_{Y|X_\mathcal{K}}(\mathbf{y}|\mathbf{x}_\mathcal{K})\, \mathbb{1}\{\tilde{g}_N(\mathbf{y}, \mathbf{s}, w) \cap \mathcal{K} = \emptyset\} = P_e^{one}(f_N, g_N, \bar{p}_{Y|X_\mathcal{K}}) \quad (2.28)$$
where (a) holds by definition of the RM code, (b) is obtained by applying the change of variables $\mathbf{z} \leftarrow \pi\mathbf{z}$ to the sequences $\mathbf{s}, \mathbf{x}_\mathcal{K}, \mathbf{y}$, and (c) uses the fact that $p_S^N(\mathbf{s}) = p_S^N(\pi\mathbf{s})$. The derivation for the detect-all and the false-positive error probabilities is analogous to (2.28). This establishes the first equality in (2.27). The second equality and the inequality are proved similarly. ✷

2.8 Risk for Fair Coalitions

The maximum and the minimum of the error probabilities $P_{e,m}(\mathcal{K})$, $m \in \mathcal{K}$, will be useful. The maximum value,
$$\overline{P}_e(f_N, g_N, p_{Y|X_\mathcal{K}}) = \max_{m \in \mathcal{K}} P_{e,m}(f_N, g_N, p_{Y|X_\mathcal{K}}), \quad (2.29)$$
is the conventional error criterion for information transmission. However, the minimum value,
$$\underline{P}_e(f_N, g_N, p_{Y|X_\mathcal{K}}) = \min_{m \in \mathcal{K}} P_{e,m}(f_N, g_N, p_{Y|X_\mathcal{K}}), \quad (2.30)$$
is more relevant to the coalition because it represents the risk of its most vulnerable member. Note that
$$P_e^{one}(f_N, g_N, p_{Y|X_\mathcal{K}}) \le \underline{P}_e(f_N, g_N, p_{Y|X_\mathcal{K}}) \le \overline{P}_e(f_N, g_N, p_{Y|X_\mathcal{K}}) \le P_e^{all}(f_N, g_N, p_{Y|X_\mathcal{K}}).$$
While it is conceivable that some colluders could be tricked or coerced into taking a higher risk than others, such a strategy is not secure because the whole coalition would be at risk if some of its members, especially the vulnerable ones, are caught. The proof of the following proposition is elementary.

Proposition 2.3 For randomly permuted codes (Def.
2.3), if the collusion channel $p_{Y|X_\mathcal{K}}$ is permutation-invariant, then all colluders incur the same risk:
$$\underline{P}_e(f_N, g_N, p_{Y|X_\mathcal{K}}) = \overline{P}_e(f_N, g_N, p_{Y|X_\mathcal{K}}).$$
The proof of the following proposition is omitted because it is similar to that of Prop. 2.2. Assuming the fingerprint distributor uses RP codes, it follows from Prop. 2.4 that permutation-invariant collusion channels are optimal for the colluders under the detect-one error criterion.

Proposition 2.4 For randomly permuted codes, the maximum of the error probability criteria (2.20) and (2.21) is achieved by a permutation-invariant collusion channel ((2.18)) under the detect-one criterion.

Taken together with Prop. 2.1 on the optimality of randomly permuted fingerprinting codes, Prop. 2.4 implies an equilibrium property: neither the fingerprint embedder nor the coalition has an interest in deviating from these symmetric strategies under the detect-one criterion.

2.9 Capacity

Having defined the detect-one and detect-all error criteria and feasible classes of codes and collusion channels, we may now define the corresponding notions of fingerprinting capacity.

Definition 2.7 A rate $R$ is achievable for embedding distortion $D_1$, collusion class $\mathcal{W}_\mathcal{K}$, and the detect-one criterion if there exists a sequence of $(N, \lceil 2^{NR} \rceil)$ randomized codes $(f_N, g_N)$ with maximum embedding distortion $D_1$, such that both $P_e^{one}(f_N, g_N, \mathcal{W}_\mathcal{K})$ and $P^{FP}(f_N, g_N, \mathcal{W}_\mathcal{K})$ vanish as $N \to \infty$.

Definition 2.8 A rate $R$ is achievable for embedding distortion $D_1$, collusion class $\mathcal{W}_\mathcal{K}$, and the detect-all criterion if there exists a sequence of $(N, \lceil 2^{NR} \rceil)$ randomized codes $(f_N, g_N)$ with maximum embedding distortion $D_1$, such that both $P_e^{all}(f_N, g_N, \mathcal{W}_\mathcal{K})$ and $P^{FP}(f_N, g_N, \mathcal{W}_\mathcal{K})$ vanish as $N \to \infty$.
Definition 2.9 Fingerprinting capacities $C^{one}(D_1, \mathcal{W}_\mathcal{K})$ and $C^{all}(D_1, \mathcal{W}_\mathcal{K})$ are the suprema of all achievable rates with respect to the detect-one and detect-all criteria, respectively.

We have $C^{all}(D_1, \mathcal{W}_\mathcal{K}) \le C^{one}(D_1, \mathcal{W}_\mathcal{K})$ because an error event for the detect-one problem is also an error event for the detect-all problem.

2.10 Random-Coding Exponents

For a sequence of randomized codes $(f_N, g_N)$, the error exponents are defined as
$$E(R, D_1, \mathcal{W}_\mathcal{K}) = \liminf_{N \to \infty} -\frac{1}{N} \log P_e(f_N, g_N, \mathcal{W}_\mathcal{K})$$
where $E$ represents the random-coding exponent $E^{FP}$, $E^{one}$, or $E^{all}$. Moreover, $E^{all}(R, D_1, \mathcal{W}_\mathcal{K}) \le E^{one}(R, D_1, \mathcal{W}_\mathcal{K})$ because an error event for the detect-one problem is also an error event for the detect-all problem. We have $E^{all} = 0$ if the class $\mathcal{W}_\mathcal{K}$ includes channels in which one colluder can "stay out," i.e., not contribute to the pirated copy.

Fig. 2 gives a preview of $E^{one}$ and $E^{FP}$ for our random-coding scheme, viewed as functions of the number $K$ of colluders. The false-positive exponent $E^{FP}$ is equal to $\Delta$, for any value of $K$. The false-negative exponent $E^{one}$ decreases with $K$, up to some maximum value $K_{R,\Delta}$ where it becomes zero. The decoder outputs $\hat{\mathcal{K}} = \emptyset$ with high probability, and therefore reliable decoding of any colluder is impossible, for any $K \ge K_{R,\Delta}$. Fig. 3 illustrates the maximum rate $R(K, \Delta)$ that can be accommodated by the random-coding scheme, for fixed $\Delta$. This rate decreases with $K$ and becomes zero for $K \ge K_\Delta$. If $\Delta \downarrow 0$, the rate curve $R(K, \Delta)$ tends to the capacity function $C(K)$. Note that $C(K)$ vanishes as $K \to \infty$ but is generally positive for any finite $K$; in this case, $\lim_{\Delta \to 0} K_\Delta = \infty$.

Figure 2: False-positive and false-negative error exponents, as a function of coalition size $K$, for fixed values of $R$ and $\Delta$.
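To make the $\liminf$ definition of the error exponent above concrete, the following sketch (not from the paper) uses a toy setting in which $P_e$ is computable exactly: a length-$N$ majority vote over a binary symmetric channel with crossover probability $p$, whose limiting exponent is the Chernoff/Sanov divergence $D(\tfrac{1}{2}\|p)$ in bits. The empirical quantity $-\frac{1}{N}\log_2 P_e$ approaches this limit as $N$ grows.

```python
# Sketch (not from the paper): the exponent definition
#   E = liminf_N -(1/N) log2 P_e
# evaluated in a toy example with a closed-form error probability:
# a length-N majority vote over a BSC with crossover probability p.
from math import comb, log2

def majority_error(N, p):
    """Exact P[Binomial(N, p) >= N/2] -- the majority-vote error probability."""
    return sum(comb(N, k) * p**k * (1 - p)**(N - k)
               for k in range((N + 1) // 2, N + 1))

def empirical_exponent(N, p):
    """-(1/N) log2 P_e for blocklength N."""
    return -log2(majority_error(N, p)) / N

def binary_divergence(a, p):
    """D(a || p) in bits: the limiting exponent for threshold a."""
    return a * log2(a / p) + (1 - a) * log2((1 - a) / (1 - p))

p = 0.1
print(empirical_exponent(400, p), binary_divergence(0.5, p))
```

By the Chernoff bound, the empirical exponent always lies above the limit $D(\tfrac{1}{2}\|p)$ and converges to it at rate $O(\log N / N)$.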
Figure 3: Capacity $C$ and achievable rate $R$ (for false-positive error exponent equal to $\Delta$), as a function of coalition size $K$.

2.11 Memoryless Collusion Channels

As an alternative to collusion channels subject to the hard constraint $\Pr[p_{\mathbf{y}|\mathbf{x}_\mathcal{K}} \in \mathcal{W}_\mathcal{K}(p_{\mathbf{x}_\mathcal{K}})] = 1$, we may consider memoryless collusion channels:
$$p_{Y|X_\mathcal{K}}(\mathbf{y}|\mathbf{x}_\mathcal{K}) = \prod_{t=1}^N p_{Y|X_\mathcal{K}}(y_t \,|\, x_{\mathcal{K},t}) \quad (2.31)$$
where $p_{Y|X_\mathcal{K}} \in \mathcal{W}_\mathcal{K}(p_{\mathbf{x}_\mathcal{K}})$, viewed as a compound class of channels [12]. As we shall see, there is a strong link between the two problems in the form of Lemma 3.3, which is used to establish our converse theorems; also see Sec. 6.

3 Fingerprinting Capacity

This section presents fingerprinting capacity formulas under the detect-one and detect-all error criteria. To put these results in context, let us first recall related results for MACs. In the absence of side information, the capacity region of the MAC was determined by Ahlswede [17] and Liao [18]. This region is also achievable for the random MAC [11]. For the MAC with common side information at the transmitter and receiver, some very general capacity formulas were derived by Das and Narayan [19] under the assumption that $S$ is an ergodic process. In some special cases these formulas can be single-letterized. For fingerprinting with i.i.d. $S$ and coalition size equal to 2, bounds on capacity were derived in [4, 5]. Thus the presence of the side information $S$ causes difficulties in deriving single-letter capacity formulas for both the MAC and fingerprinting problems.

The proof of the converse under the detect-all criterion is based on the standard Fano inequality. Surprisingly, Fano's inequality does not seem to be the right tool to prove the converse under the detect-one criterion [20].
A strong converse was presented in [10], but the resulting upper bound on capacity is loose. The direction we have pursued is based on explicit sphere-packing arguments, specifically the fact that typical sets for $\mathbf{Y}$ given the embedded fingerprints cannot have too much statistical overlap, otherwise reliable decoding is impossible. The tools used here are different from those used for classical problems such as the single-user discrete memoryless channel [16, pp. 173-176] and the MAC [21]. The use of a detect-one criterion requires a different machinery. A simple technique is used to deal with codeword pairs whose self-information score is well above average, and suffices to show that the error probability cannot vanish for rates above capacity. We conjecture that a strong converse holds, namely: for any rate above capacity,
$$\lim_{N \to \infty} \min_{f_N, g_N} \max\{ P_e^{one}(f_N, g_N, \mathcal{W}_\mathcal{K}),\, P^{FP}(f_N, g_N, \mathcal{W}_\mathcal{K}) \} = 1.$$
However, establishing this stronger result may require the use of elaborate wringing techniques [21]. Our lower bound on error probability does not tend to 1 as $N \to \infty$ because the bound (8.50) is likely loose.

3.1 Mutual-Information Games

The following lemma relates to Han's inequalities [22] and will be useful throughout this paper. Its proof appears in Appendix A.

Lemma 3.1 Let $\mathcal{K} = \{1, 2, \cdots, K\}$ and assume the distribution of $(X_\mathcal{K}, Z)$ is invariant to permutations of $\mathcal{K}$. Then for any nested sets $\mathcal{A} \subseteq \mathcal{B} \subseteq \mathcal{K}$, we have
$$\frac{1}{|\mathcal{A}|} H(X_\mathcal{A} \,|\, Z X_{\mathcal{K} \setminus \mathcal{A}}) \le \frac{1}{|\mathcal{B}|} H(X_\mathcal{B} \,|\, Z X_{\mathcal{K} \setminus \mathcal{B}}), \quad (3.1)$$
$$\frac{1}{|\mathcal{A}|} H(X_\mathcal{A} \,|\, Z) \ge \frac{1}{|\mathcal{B}|} H(X_\mathcal{B} \,|\, Z). \quad (3.2)$$
Both inequalities hold with equality if $X_k$, $k \in \mathcal{K}$, are conditionally independent given $Z$.

We will derive two simple formulas by application of this lemma.
First, applying (3.1) with $Z = (Y, S, W)$ and (3.2) with $Z = (S, W)$ and subtracting the first inequality from the second, we obtain
$$\frac{1}{|\mathcal{A}|} I(X_\mathcal{A}; Y X_{\mathcal{K} \setminus \mathcal{A}} \,|\, S W) \ge \frac{1}{|\mathcal{B}|} I(X_\mathcal{B}; Y X_{\mathcal{K} \setminus \mathcal{B}} \,|\, S W), \quad \forall \mathcal{A} \subseteq \mathcal{B} \subseteq \mathcal{K} \quad (3.3)$$
with equality if $X_k$, $k \in \mathcal{K}$, are conditionally independent given $Z$. Second, for $X_k$, $k \in \mathcal{K}$, conditionally i.i.d. given $(S, W)$, we have
$$I(X_1; Y | S, W) = H(X_1 | S, W) - H(X_1 | Y, S, W) = \frac{1}{K} H(X_\mathcal{K} | S, W) - H(X_1 | Y, S, W) \le \frac{1}{K} H(X_\mathcal{K} | S, W) - \frac{1}{K} H(X_\mathcal{K} | Y, S, W) = \frac{1}{K} I(X_\mathcal{K}; Y | S, W) \quad (3.4)$$
where the inequality follows from (3.2) with $Z = (Y, S, W)$.

Now consider an auxiliary random variable $W$ defined over an alphabet $\mathcal{W} = \{1, 2, \cdots, L\}$ and independent of $S$. Define the set of conditional p.m.f.'s
$$\mathscr{P}_{X_\mathcal{K} W | S}(p_S, L, D_1) \triangleq \left\{ p_{X_\mathcal{K} W | S} = p_W \prod_{k \in \mathcal{K}} p_{X_k | S W} \,:\, p_{X_1 | S W} = \cdots = p_{X_K | S W},\ E\, d(S, X_1) \le D_1 \right\} \quad (3.5)$$
and the functions
$$C_L^{one}(D_1, \mathcal{W}_\mathcal{K}) = \max_{p_{X_\mathcal{K} W | S} \in \mathscr{P}_{X_\mathcal{K} W | S}(p_S, L, D_1)} \ \min_{p_{Y | X_\mathcal{K}} \in \mathcal{W}_\mathcal{K}^{fair}(p_{X_\mathcal{K}})} \ \frac{1}{K} I(X_\mathcal{K}; Y | S, W) \quad (3.6)$$
$$C_L^{all}(D_1, \mathcal{W}_\mathcal{K}) = \max_{p_{X_\mathcal{K} W | S} \in \mathscr{P}_{X_\mathcal{K} W | S}(p_S, L, D_1)} \ \min_{p_{Y | X_\mathcal{K}} \in \mathcal{W}_\mathcal{K}(p_{X_\mathcal{K}})} \ \min_{\mathcal{A} \subseteq \mathcal{K}} \ \frac{1}{|\mathcal{A}|} I(X_\mathcal{A}; Y | S, X_{\mathcal{K} \setminus \mathcal{A}}, W). \quad (3.7)$$
Using the same derivation as in Lemma 2.1 of [14], it is easily shown that $C_L^{one}(D_1, \mathcal{W}_\mathcal{K})$ and $C_L^{all}(D_1, \mathcal{W}_\mathcal{K})$ are nondecreasing functions of $L$ and converge to finite limits:
$$\tilde{C}^{one}(D_1, \mathcal{W}_\mathcal{K}) \triangleq \lim_{L \to \infty} C_L^{one}(D_1, \mathcal{W}_\mathcal{K}) \quad (3.8)$$
$$\tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K}) \triangleq \lim_{L \to \infty} C_L^{all}(D_1, \mathcal{W}_\mathcal{K}). \quad (3.9)$$
Moreover, the gap to each limit may be bounded by a polynomial function of $L$; see [14, Sec. 3.5] for a similar derivation. The basic idea is to discretize each $\mathcal{W}_\mathcal{K}(p_{X_\mathcal{K}})$ to a fine grid of $\tilde{L}$ collusion channels. By application of Caratheodory's theorem, the supremum of $C_L$ over $L$ is achieved by $L \le |\mathcal{S}||\mathcal{X}| + \tilde{L}$.
The gap between the minimum of the cost function over $\mathcal{W}_\mathcal{K}(p_{X_\mathcal{K}})$ and over its discrete approximation can be bounded by $c\, \tilde{L}^{-|\mathcal{Y}|^{-1}|\mathcal{X}|^{-K}}$, where $c$ is a constant. Since $\mathcal{W}_\mathcal{K}^{fair}(p_{X_\mathcal{K}}) \subseteq \mathcal{W}_\mathcal{K}(p_{X_\mathcal{K}})$, we have $\tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K}^{fair}) \ge \tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K})$. In fact the right side is zero if $\mathcal{W}_\mathcal{K}(p_{X_\mathcal{K}})$ contains conditional p.m.f.'s $p_{Y|X_\mathcal{K}}$ such that $Y$ is independent of one of the inputs $X_k$, $k \in \mathcal{K}$.

Lemma 3.2 For any $D_1$ and $\mathcal{W}_\mathcal{K}$ we have
$$\tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K}) \le \tilde{C}^{one}(D_1, \mathcal{W}_\mathcal{K}). \quad (3.10)$$
Equality holds for any class of fair collusion channels ($\mathcal{W}_\mathcal{K} = \mathcal{W}_\mathcal{K}^{fair}$).

Proof: Property (3.10) follows from (3.6)-(3.9) and the fact that $\mathcal{W}_\mathcal{K}^{fair}(p_{X_\mathcal{K}}) \subseteq \mathcal{W}_\mathcal{K}(p_{X_\mathcal{K}})$. Now consider $\mathcal{W}_\mathcal{K} = \mathcal{W}_\mathcal{K}^{fair}$. Application of Property (3.3) to any fair collusion channel yields
$$\frac{1}{|\mathcal{K}|} I(X_\mathcal{K}; Y | S, W) \le \frac{1}{|\mathcal{A}|} I(X_\mathcal{A}; Y | X_{\mathcal{K} \setminus \mathcal{A}}, S, W), \quad \forall \mathcal{A} \subseteq \mathcal{K}.$$
Hence the inner minimum in (3.7) is achieved by $\mathcal{A} = \mathcal{K}$, and equality holds in (3.10). ✷

3.2 Capacity Theorems

The following lemma will be used to prove Theorems 3.5 and 3.7 below. Its proof is given in Appendix B and borrows ideas from [14, Theorem 3.7].

Lemma 3.3 Consider the compound family $\mathcal{W}_\mathcal{K}(p_{\mathbf{x}_\mathcal{K}})$ of memoryless channels in (2.31). Under both the detect-one and detect-all criteria, the compound capacity for this problem is an upper bound on the capacity for the main problem of (2.10), in which $p_{\mathbf{y}|\mathbf{x}_\mathcal{K}} \in \mathcal{W}_\mathcal{K}(p_{\mathbf{x}_\mathcal{K}})$ with probability 1.

We now give a direct coding theorem 3.4 and two converse theorems 3.5 and 3.7 pertaining to the detect-all and the detect-one criteria, respectively. These theorems, combined with Lemma 3.2, establish the capacity theorem 3.8.

Theorem 3.4 Under the continuity assumption (2.11), all fingerprinting code rates below $\tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K})$ and $\tilde{C}^{one}(D_1, \mathcal{W}_\mathcal{K})$ are achievable under the detect-all and the detect-one criteria, respectively.
Theorem 3.4 is a direct consequence of Theorem 5.2(vi), stated and proved later in this paper.

Theorem 3.5 When $\mathcal{W}_\mathcal{K}$ is independent of $p_{\mathbf{x}_\mathcal{K}}$, no fingerprinting code rate $R$ exceeding $\tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K})$ is achievable under the detect-all criterion. The same holds for the compound memoryless class of (2.31).

Corollary 3.6 Under the continuity assumption (2.11), when $\mathcal{W}_\mathcal{K}$ depends on $p_{\mathbf{x}_\mathcal{K}}$, the following holds. If the colluders are constrained to select a fair collusion channel, then $\mathcal{W}_\mathcal{K}(p_{\mathbf{x}_\mathcal{K}}) = \mathcal{W}_\mathcal{K}^{fair}(p_{\mathbf{x}_\mathcal{K}})$, and no rate above $\tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K}^{fair})$ is achievable under the detect-all criterion.

The proof of Theorem 3.5 and Corollary 3.6 is given in Sec. 7.

Theorem 3.7 When $\mathcal{W}_\mathcal{K}$ is independent of $p_{\mathbf{x}_\mathcal{K}}$, no fingerprinting code rate $R$ exceeding
$$\tilde{C}^{one}(D_1, \mathcal{W}_\mathcal{K}^{fair}) = \tilde{C}^{one}(D_1, \mathcal{W}_\mathcal{K}) \quad (3.11)$$
is achievable under the detect-one criterion. The same holds for the compound memoryless class of (2.31).

The proof of Theorem 3.7 is given in Sec. 8.

Theorem 3.8 Consider fingerprinting for coalitions of size at most $K$. Let $\mathcal{W}_\mathcal{K}$ be the set of all conditional distributions $p_{Y|X_\mathcal{K}}$ (collusion attacks) that can be selected by the coalition.
(a) Detect-all case. Fingerprinting capacity is lower-bounded by $\tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K})$ given by (3.9). If in addition one of the following holds:
(i) the set $\mathcal{W}_\mathcal{K}$ of attacks available to the coalition is independent of the joint type of the fingerprints $p_{\mathbf{x}_\mathcal{K}}$ assigned to the coalition; or
(ii) for every $p_{\mathbf{x}_\mathcal{K}}$, the set $\mathcal{W}_\mathcal{K}(p_{\mathbf{x}_\mathcal{K}})$ of attacks given the joint type $p_{\mathbf{x}_\mathcal{K}}$ contains only permutation-invariant attacks ($\mathcal{W}_\mathcal{K}(p_{\mathbf{x}_\mathcal{K}}) = \mathcal{W}_\mathcal{K}^{fair}(p_{\mathbf{x}_\mathcal{K}})$),
then fingerprinting capacity under the detect-all criterion is equal to $\tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K})$.
(b) Detect-one case. Fingerprinting capacity is lower-bounded by $\tilde{C}^{one}(D_1, \mathcal{W}_\mathcal{K})$ given by (3.8).
If in addition the set $\mathcal{W}_\mathcal{K}$ of attacks available to the coalition is independent of the joint type of the fingerprints $p_{\mathbf{x}_\mathcal{K}}$ assigned to the coalition, then $\tilde{C}^{one}(D_1, \mathcal{W}_\mathcal{K}) = \tilde{C}^{one}(D_1, \mathcal{W}_\mathcal{K}^{fair}) = \tilde{C}^{all}(D_1, \mathcal{W}_\mathcal{K}^{fair})$, and fingerprinting capacity under the detect-one criterion is equal to this common value.

The lower bounds on fingerprinting capacity derived in [4, 5] are of the form (3.6) with $L = 1$, i.e., the auxiliary random variable $W$ is degenerate. Since the payoff function $I_{p_S p_{X|S}^K p_{Y|X_\mathcal{K}}}(X_\mathcal{K}; Y | S)$ is generally nonconcave with respect to $p_{X|S}$, a randomized strategy in which the variable $p_{X|S}$ is randomized will generally outperform a deterministic strategy in which $p_{X|S}$ is fixed. The auxiliary random variable $W$ plays the role of a selector of $p_{X|S}$ in this mutual-information game. Apparently the benefits of this randomization can be dramatic for large $K$. For the Boneh-Shaw problem, the value of the maxmin of (3.6) with $L = 1$ is $C_1^{one}(D_1, \mathcal{W}_\mathcal{K}) = K^{-1} 2^{-(K-1)}$. However, Tardos' scheme [9] uses $\mathcal{W} = [0, 1]$ and achieves a rate $O(K^{-2})$, which is therefore much larger than $C_1^{one}(D_1, \mathcal{W}_\mathcal{K})$ for large $K$. The rate of his code is necessarily a lower bound on $C^{one}(D_1, \mathcal{W}_\mathcal{K})$.

4 Simple Fingerprint Decoder

This section introduces our random-coding scheme and a simple decoder that tests candidate fingerprints one by one. This decoder is closely related to the correlation decoders that have been used in Tardos' paper [9] and in the signal processing literature. (Such decoders evaluate a measure of correlation between the received sequence and the individual fingerprints, and retain the fingerprints whose correlation score is above a certain threshold.) We derive error exponents for this scheme and establish maximum rates for reliable decoding.
These rates fall short of the fingerprinting capacities $C^{all}(D_1, \mathcal{W}_\mathcal{K})$ and $C^{one}(D_1, \mathcal{W}_\mathcal{K})$ given by Theorems 3.5 and 3.7. The derivations are given for the case without side information ($S = \emptyset$) or a distortion constraint ($D_1$) for the fingerprint distributor. This setup is directly applicable to the Boneh-Shaw model, and the derivations are much easier to follow. This setup also contains several key ingredients of the error analysis for the more elaborate joint fingerprint decoder of Sec. 5. In particular, the false-negative error exponents are determined by the worst conditional type $T_{\mathbf{y}\mathbf{x}_\mathcal{K}|\mathbf{w}}$.

4.1 Codebook

The scheme is designed to achieve a false-positive error exponent equal to $\Delta$ and assumes a nominal value $K_{nom}$ for the coalition size. (Reliable decoding will generally be possible for $K > K_{nom}$, though.) These parameters are used to identify a joint type class $T^*_{wx}$ defined below (9.4). An arbitrarily large $L$ is selected, defining an alphabet $\mathcal{W} = \{1, 2, \cdots, L\}$. A random constant-composition code $\mathcal{C}(\mathbf{w}) = \{\mathbf{x}_m, 1 \le m \le 2^{NR}\}$ is generated for each $\mathbf{w} \in T^*_w$ by drawing $2^{NR}$ sequences independently and uniformly from the conditional type class $T^*_{x|w}$.

4.2 Encoding Scheme

A sequence $\mathbf{W}$ is drawn uniformly from the type class $T^*_w$ and shared with the receiver. User $m$ is assigned codeword $\mathbf{x}_m$ from $\mathcal{C}(\mathbf{W})$, for $1 \le m \le 2^{NR}$.

4.3 Decoding Scheme

The receiver makes an innocent/guilty decision on each user independently of the other users; therein lie both the simplicity and the suboptimality of this decoder. Specifically, the estimated coalition $\hat{\mathcal{K}}$ is the collection of all $m$ such that
$$I(\mathbf{x}_m; \mathbf{y} \,|\, \mathbf{w}) > R + \Delta. \quad (4.1)$$
If no such $m$ exists, the receiver outputs $\hat{\mathcal{K}} = \emptyset$. The users whose empirical mutual information score exceeds the threshold $R + \Delta$ are declared guilty.
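A minimal sketch of the threshold rule (4.1) follows; it is not the paper's implementation. The empirical mutual information is computed from the joint type of $(\mathbf{x}_m, \mathbf{y})$, and the conditioning on the time-sharing sequence $\mathbf{w}$ is omitted here for brevity. The codebook and parameter values in the usage example are hypothetical.

```python
# Sketch (not from the paper): one-by-one threshold decoding, rule (4.1).
# Each user's empirical mutual information score I(x_m; y), computed from
# the joint type of (x_m, y), is compared with R + Delta.
from collections import Counter
from math import log2

def empirical_mi(x, y):
    """Mutual information (bits/symbol) of the joint type of sequences x, y."""
    n = len(x)
    pxy = Counter(zip(x, y))            # joint type counts
    px, py = Counter(x), Counter(y)     # marginal type counts
    return sum(c / n * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def threshold_decoder(codebook, y, R, Delta):
    """Accuse every user whose score exceeds R + Delta."""
    return {m for m, x_m in codebook.items()
            if empirical_mi(x_m, y) > R + Delta}

# Hypothetical toy codebook: user 1's fingerprint matches the pirated copy y.
y = [0, 1] * 8
print(threshold_decoder({1: [0, 1] * 8, 2: [0] * 16}, y, R=0.5, Delta=0.1))
```

Because each user is tested in isolation, the rule runs in time linear in the number of users, which is exactly the simplicity (and suboptimality) noted above.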
4.4 Error Exponents

Theorem 4.1 below gives the false-positive and false-negative error exponents for this coding scheme. These exponents are given in terms of the functions defined below. Define the set of conditional p.m.f.'s for $X_\mathcal{K}$ given $W$ whose conditional marginals are the same for all components of $X_\mathcal{K}$:
$$\mathscr{M}(p_{X|W}) = \{ p_{X_\mathcal{K}|W} : p_{X_m|W} = p_{X|W},\ \forall m \in \mathcal{K} \}.$$
Denote by $\mathscr{P}_{XW}(L)$ the set of p.m.f.'s $p_{XW}$ defined over $\mathcal{X} \times \mathcal{W}$. Define for each $m \in \mathcal{K}$ the set of conditional p.m.f.'s
$$\mathscr{P}_{Y X_\mathcal{K}|W}(p_{XW}, \mathcal{W}_\mathcal{K}, R, L, m) \triangleq \left\{ \tilde{p}_{Y X_\mathcal{K}|W} : \tilde{p}_{X_\mathcal{K}|W} \in \mathscr{M}(p_{X|W}),\ \tilde{p}_{Y|X_\mathcal{K}} \in \mathcal{W}_\mathcal{K}(\tilde{p}_{X_\mathcal{K}}),\ I_{\tilde{p}_{Y X_\mathcal{K}|W}\, p_W}(X_m; Y | W) \le R \right\} \quad (4.2)$$
and the pseudo sphere-packing exponent
$$\tilde{E}_{psp,m}(R, L, p_{XW}, \mathcal{W}_\mathcal{K}) = \min_{\tilde{p}_{Y X_\mathcal{K}|W} \in \mathscr{P}_{Y X_\mathcal{K}|W}(p_{XW}, \mathcal{W}_\mathcal{K}, R, L, m)} D(\tilde{p}_{Y X_\mathcal{K}|W} \,\|\, \tilde{p}_{Y|X_\mathcal{K}}\, p_{X|W}^K \,|\, p_W). \quad (4.3)$$
The terminology pseudo sphere-packing exponent is used because, despite its superficial resemblance to a sphere-packing exponent [12], (4.3) does not provide a fundamental asymptotic lower bound on error probability.

Taking the maximum and minimum of $\tilde{E}_{psp,m}$ above over $m \in \mathcal{K}$, we respectively define
$$\tilde{\overline{E}}_{psp}(R, L, p_{XW}, \mathcal{W}_\mathcal{K}) = \max_{m \in \mathcal{K}} \tilde{E}_{psp,m}(R, L, p_{XW}, \mathcal{W}_\mathcal{K}), \quad (4.4)$$
$$\tilde{\underline{E}}_{psp}(R, L, p_{XW}, \mathcal{W}_\mathcal{K}) = \min_{m \in \mathcal{K}} \tilde{E}_{psp,m}(R, L, p_{XW}, \mathcal{W}_\mathcal{K}). \quad (4.5)$$
If these expressions are evaluated for the set $\mathcal{W}_\mathcal{K}^{fair}$, which is permutation-invariant, then (4.2) and (4.3) are independent of $m \in \mathcal{K}$, and the expressions (4.4) and (4.5) coincide. Define
$$E_{psp}(R, L, \mathcal{W}_\mathcal{K}) = \max_{p_{XW} \in \mathscr{P}_{XW}(L)} \tilde{E}_{psp,1}(R, L, p_{XW}, \mathcal{W}_{K_{nom}}^{fair}). \quad (4.6)$$
Denote by $p^*_{XW}$ the maximizer in (4.6), which depends on $R$ and $\mathcal{W}_{K_{nom}}^{fair}$. Finally, define
$$\overline{E}_{psp}(R, L, \mathcal{W}_\mathcal{K}) = \tilde{\overline{E}}_{psp}(R, L, p^*_{XW}, \mathcal{W}_\mathcal{K}), \quad (4.7)$$
$$\underline{E}_{psp}(R, L, \mathcal{W}_\mathcal{K}) = \tilde{\underline{E}}_{psp}(R, L, p^*_{XW}, \mathcal{W}_\mathcal{K}), \quad (4.8)$$
where no fairness requirement is imposed on $\mathcal{W}_\mathcal{K}$.
Theorem 4.1 The threshold decision rule (4.1) yields the following error exponents.
(i) The false-positive error exponent is
$$E^{FP}(R, L, \mathcal{W}_\mathcal{K}, \Delta) = \Delta. \quad (4.9)$$
(ii) The detect-one error exponent is
$$E^{one}(R, L, \mathcal{W}_\mathcal{K}, \Delta) = \overline{E}_{psp}(R + \Delta, L, \mathcal{W}_\mathcal{K}). \quad (4.10)$$
(iii) The detect-all error exponent is
$$E^{all}(R, L, \mathcal{W}_\mathcal{K}, \Delta) = \underline{E}_{psp}(R + \Delta, L, \mathcal{W}_\mathcal{K}). \quad (4.11)$$
(iv) A fair collusion strategy is optimal under the detect-one error criterion: $E^{one}(R, L, \mathcal{W}_\mathcal{K}, \Delta) = E^{one}(R, L, \mathcal{W}_\mathcal{K}^{fair}, \Delta)$.
(v) The detect-one and detect-all error exponents are the same when the colluders restrict their choice to fair strategies: $E^{one}(R, L, \mathcal{W}_\mathcal{K}^{fair}, \Delta) = E^{all}(R, L, \mathcal{W}_\mathcal{K}^{fair}, \Delta)$.
(vi) For $K = K_{nom}$, the supremum of all rates for which the detect-one error exponent of (4.10) is positive is given by
$$C^{simple}(\mathcal{W}_\mathcal{K}) = C^{simple}(\mathcal{W}_\mathcal{K}^{fair}) = \lim_{L \to \infty} \max_{p_{XW} \in \mathscr{P}_{XW}(L)} \min_{p_{Y|X_\mathcal{K}} \in \mathcal{W}_\mathcal{K}^{fair}(p_{X_\mathcal{K}})} I_{p_W p_{X|W}^K p_{Y|X_\mathcal{K}}}(X_1; Y | W) \quad (4.12)$$
and is achieved by letting $\Delta \to 0$ and $L \to \infty$.

Note. Applying (3.4) with $S = \emptyset$, we have $I(X_1; Y | W) \le \frac{1}{K} I(X_\mathcal{K}; Y | W)$ for any permutation-invariant $p_{Y|X_\mathcal{K}}$. Since this inequality is generally strict, $C^{simple}(\mathcal{W}_\mathcal{K})$ is generally lower than the fingerprinting capacity $C^{one}(\mathcal{W}_\mathcal{K})$ of (3.8). Hence the simple thresholding rule (4.1) is generally not capacity-achieving.

5 Joint Fingerprint Decoder

The encoder and joint decoder are presented in this section, and the performance of the new scheme is analyzed. As in the previous section, the encoder ensures a false-positive error exponent $\Delta$ and assumes a nominal value $K_{nom}$ for the coalition size. An arbitrarily large $L$ is selected, defining an alphabet $\mathcal{W} = \{1, 2, \cdots, L\}$.
A random constant-composition code $\mathcal{C}(\mathbf{s}, \mathbf{w}) = \{\mathbf{x}_m, 1 \le m \le 2^{NR}\}$ is generated for each $\mathbf{s} \in \mathcal{S}^N$ and $\mathbf{w} \in T^*_w$ by drawing $2^{NR}$ sequences independently and uniformly from a conditional type class $T^*_{x|sw}$. Both $T^*_w$ and $T^*_{x|sw}$ depend on $\Delta$ and $K_{nom}$, as defined below (10.6). Prior to encoding, a sequence $\mathbf{W} \in \mathcal{W}^N$ is drawn independently of $\mathbf{S}$ and uniformly from $T^*_w$, and shared with the receiver. Next, user $m$ is assigned codeword $\mathbf{x}_m \in \mathcal{C}(\mathbf{S}, \mathbf{W})$, for $1 \le m \le 2^{NR}$.

In terms of decoding, the fundamental improvement over the simple strategy of Sec. 4 resides in the use of a joint decoding rule. Specifically, the decoder maximizes a penalized empirical mutual information score over all possible coalitions of any size. The penalty is proportional to the size of the coalition.

5.1 Mutual Information of k Random Variables

Our fingerprint decoding scheme is based on the notion of mutual information between $k$ random variables $X_1, \cdots, X_k$. For $k = 3$, this mutual information is defined as [12, p. 57], [23, p. 378]
$$\overset{\circ}{I}(X_1; X_2; X_3) = H(X_1) + H(X_2) + H(X_3) - H(X_1, X_2, X_3).$$
We use the symbol $\overset{\circ}{I}$ to distinguish it from the symbol $I$ for standard mutual information between two random variables. Note the chain rule
$$\overset{\circ}{I}(X_1; X_2; X_3) = I(X_1; X_2 X_3) + I(X_2; X_3).$$
The mutual information between $k$ random variables $X_1, \cdots, X_k$ is similarly defined as the sum of their individual entropies minus their joint entropy [12, p. 57] or, equivalently, the divergence between their joint distribution and the product of their marginals:
$$\overset{\circ}{I}(X_1; \cdots; X_k) = H(X_1) + \cdots + H(X_k) - H(X_1, \cdots, X_k) = D(p_{X_1 \cdots X_k} \,\|\, p_{X_1} \cdots p_{X_k}). \quad (5.1)$$
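The definition (5.1) and the $k = 3$ chain rule above can be checked numerically; the following sketch (not from the paper) evaluates both sides on an arbitrary joint p.m.f. over three binary variables.

```python
# Numerical check (not from the paper): the definition
#   oI(X1;X2;X3) = H(X1) + H(X2) + H(X3) - H(X1,X2,X3)
# and the chain rule oI(X1;X2;X3) = I(X1; X2X3) + I(X2; X3),
# evaluated on a randomly generated joint p.m.f. over three binary variables.
import itertools, random
from math import log2

def H(p):
    """Entropy in bits of a p.m.f. given as a dict {outcome: probability}."""
    return -sum(v * log2(v) for v in p.values() if v > 0)

def marginal(pjoint, axes):
    """Marginalize a joint p.m.f. over 3-tuples onto the given coordinate axes."""
    out = {}
    for xs, v in pjoint.items():
        key = tuple(xs[i] for i in axes)
        out[key] = out.get(key, 0.0) + v
    return out

random.seed(0)
w = {xs: random.random() for xs in itertools.product([0, 1], repeat=3)}
z = sum(w.values())
p = {xs: v / z for xs, v in w.items()}                    # random joint p.m.f.

oI = sum(H(marginal(p, (i,))) for i in range(3)) - H(p)   # definition (5.1)
# chain rule: I(X1; X2X3) + I(X2; X3), each I written as H + H - H(joint)
chain = (H(marginal(p, (0,))) + H(marginal(p, (1, 2))) - H(p)) \
      + (H(marginal(p, (1,))) + H(marginal(p, (2,))) - H(marginal(p, (1, 2))))
assert abs(oI - chain) < 1e-12
```

The two sides agree identically because both telescope to the same sum of entropies; the divergence form of (5.1) also shows $\overset{\circ}{I} \ge 0$.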
Note the following properties, including the chain rules (P3) and (P4):
(P1) the mutual information (5.1) is symmetric in its arguments;
(P2) $\overset{\circ}{I}(X_1; X_2) = I(X_1; X_2)$;
(P3) $\overset{\circ}{I}(X_1; \cdots; X_k) = I(X_1; X_2 \cdots X_k) + \overset{\circ}{I}(X_2; \cdots; X_k) = \sum_{i=1}^{k-1} I(X_i; X_{i+1} \cdots X_k)$;
(P4) $\overset{\circ}{I}(X_1; \cdots; X_k) = \overset{\circ}{I}(X_1; \cdots; X_i; X_{i+1} \cdots X_k) + \overset{\circ}{I}(X_{i+1}; \cdots; X_k)$ for any $i \in \{1, 2, \cdots, k-2\}$;
(P5) $\overset{\circ}{I}(X_1; \cdots; X_k) = \sum_{i=1}^{k-1} H(X_i) - H(X_1 \cdots X_{k-1} | X_k)$.

Similarly to (5.1), we define the empirical mutual information $\overset{\circ}{I}(\mathbf{x}_1; \cdots; \mathbf{x}_k)$ between $k$ sequences $\mathbf{x}_1, \cdots, \mathbf{x}_k$ as the mutual information with respect to the joint type of $\mathbf{x}_1, \cdots, \mathbf{x}_k$. Analogously to Property (P5), we have
$$\overset{\circ}{I}(\mathbf{x}_1; \cdots; \mathbf{x}_k; \mathbf{y}) = \sum_{i=1}^k H(\mathbf{x}_i) - H(\mathbf{x}_1 \cdots \mathbf{x}_k | \mathbf{y}). \quad (5.2)$$
This leads to the following alternative interpretation of the minimum-equivocation decoder of Liu and Hughes [23]. If $\mathbf{x}_1, \cdots, \mathbf{x}_k$ are codewords from a constant-composition code $\mathcal{C}$, then $H(\mathbf{x}_i)$ is the same for all $i$, and the minimum-equivocation decoder is equivalent to a maximum-mutual-information decoder:
$$\min_{\mathbf{x}_1 \cdots \mathbf{x}_k \in \mathcal{C}} H(\mathbf{x}_1 \cdots \mathbf{x}_k | \mathbf{y}) \iff \max_{\mathbf{x}_1 \cdots \mathbf{x}_k \in \mathcal{C}} \overset{\circ}{I}(\mathbf{x}_1; \cdots; \mathbf{x}_k; \mathbf{y}). \quad (5.3)$$
There is no similar interpretation when the ordinary mutual information $I(\mathbf{x}_1 \cdots \mathbf{x}_k; \mathbf{y})$ is used [23]. Liu and Hughes showed that the minimum-equivocation decoder outperforms the ordinary maximum-mutual-information decoder in terms of random-coding exponent.

5.2 MPMI Criterion

The restriction of $\mathbf{x}_\mathcal{M}$ to a subset $\mathcal{A}$ of $\mathcal{M}$ will be denoted by $\mathbf{x}_\mathcal{A} = \{\mathbf{x}_m, m \in \mathcal{A}\}$.
For disjoint sets $\mathcal{A} = \{m_1, \cdots, m_{|\mathcal{A}|}\}$ and $\mathcal{B} = \{m_{|\mathcal{A}|+1}, \cdots, m_{|\mathcal{A}|+|\mathcal{B}|}\}$, we use the shorthand
$$\overset{\circ}{I}(\mathbf{x}_\mathcal{A}; \mathbf{y}\mathbf{x}_\mathcal{B} \,|\, \mathbf{s}\mathbf{w}) \triangleq \overset{\circ}{I}(\mathbf{x}_{m_1}; \cdots; \mathbf{x}_{m_{|\mathcal{A}|}}; \mathbf{y}\mathbf{x}_\mathcal{B} \,|\, \mathbf{s}\mathbf{w}) \quad (5.4)$$
for the mutual information between the $|\mathcal{A}| + 1$ random variables $\mathbf{x}_{m_1}, \cdots, \mathbf{x}_{m_{|\mathcal{A}|}}$ and $(\mathbf{y}, \mathbf{x}_\mathcal{B})$, conditioned on $(\mathbf{s}, \mathbf{w})$. Define the function
$$MPMI(k) = \begin{cases} 0 & \text{if } k = 0 \\ \displaystyle\max_{\mathbf{x}_\mathcal{K} \in \mathcal{C}^k(\mathbf{s}, \mathbf{w})} \overset{\circ}{I}(\mathbf{x}_\mathcal{K}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) - k(R + \Delta) & \text{if } k = 1, 2, \cdots \end{cases} \quad (5.5)$$
where $k = |\mathcal{K}|$ and
$$\overset{\circ}{I}(\mathbf{x}_\mathcal{K}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) = \overset{\circ}{I}(\mathbf{x}_1; \cdots; \mathbf{x}_k; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) = k H(\mathbf{x} \,|\, \mathbf{s}\mathbf{w}) - H(\mathbf{x}_\mathcal{K} \,|\, \mathbf{y}\mathbf{s}\mathbf{w}) \quad (5.6)$$
is the mutual information between the $k+1$ sequences $\mathbf{x}_1, \cdots, \mathbf{x}_k, \mathbf{y}$, conditioned on $(\mathbf{s}, \mathbf{w})$, as defined in (5.4). Again we stress that $\overset{\circ}{I}(\mathbf{x}_1; \cdots; \mathbf{x}_k; \mathbf{y} \,|\, \mathbf{s}\mathbf{w})$ should not be confused with the ordinary mutual information $I(\mathbf{x}_1 \cdots \mathbf{x}_k; \mathbf{y} \,|\, \mathbf{s}\mathbf{w})$ between the $k$-tuple $(\mathbf{x}_1, \cdots, \mathbf{x}_k)$ and $\mathbf{y}$, conditioned on $(\mathbf{s}, \mathbf{w})$. Our joint fingerprint decoder is a Maximum Penalized Mutual Information (MPMI) decoder:
$$\max_{k \ge 0} MPMI(k). \quad (5.7)$$
In case of a tie, the largest value of $k$ is retained. The decoder seeks the coalition size $k$ and the codewords $\{\mathbf{x}_m, m \in \hat{\mathcal{K}}\}$ in $\mathcal{C}(\mathbf{s}, \mathbf{w})$ that achieve the MPMI criterion above. The indices of these codewords form the decoded coalition $\hat{\mathcal{K}}$. If the maximizing $k$ in (5.7) is zero, the receiver outputs $\hat{\mathcal{K}} = \emptyset$. Similarly to (5.3), the MPMI decoder may equivalently be interpreted as a Minimum Penalized Equivocation criterion.

5.3 Properties

The following lemma shows that 1) each subset of the estimated coalition is significant, and 2) any extension of the estimated coalition would fail a significance test.

Lemma 5.1 Let $\hat{\mathcal{K}}$ achieve the maximum in (5.5), (5.7). Then
$$\forall \mathcal{A} \subseteq \hat{\mathcal{K}}: \quad \overset{\circ}{I}(\mathbf{x}_\mathcal{A}; \mathbf{y}\mathbf{x}_{\hat{\mathcal{K}} \setminus \mathcal{A}} \,|\, \mathbf{s}\mathbf{w}) > |\mathcal{A}|(R + \Delta). \quad (5.8)$$
Moreover, for every $\mathcal{A}$ disjoint from $\hat{\mathcal{K}}$,
$$\overset{\circ}{I}(\mathbf{x}_\mathcal{A}; \mathbf{y}\mathbf{x}_{\hat{\mathcal{K}}} \,|\, \mathbf{s}\mathbf{w}) \le |\mathcal{A}|(R + \Delta). \quad (5.9)$$
Proof.
For any $\mathcal{A} \subseteq \hat{\mathcal{K}}$, we have
$$\overset{\circ}{I}(\mathbf{x}_\mathcal{A}; \mathbf{y}\mathbf{x}_{\hat{\mathcal{K}} \setminus \mathcal{A}} \,|\, \mathbf{s}\mathbf{w}) - |\mathcal{A}|(R + \Delta) \stackrel{(a)}{=} \left[\overset{\circ}{I}(\mathbf{x}_{\hat{\mathcal{K}}}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) - |\hat{\mathcal{K}}|(R + \Delta)\right] - \left[\overset{\circ}{I}(\mathbf{x}_{\hat{\mathcal{K}} \setminus \mathcal{A}}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) - (|\hat{\mathcal{K}}| - |\mathcal{A}|)(R + \Delta)\right]$$
$$\stackrel{(b)}{=} MPMI(|\hat{\mathcal{K}}|) - \left[\overset{\circ}{I}(\mathbf{x}_{\hat{\mathcal{K}} \setminus \mathcal{A}}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) - (|\hat{\mathcal{K}}| - |\mathcal{A}|)(R + \Delta)\right] \ge MPMI(|\hat{\mathcal{K}}|) - MPMI(|\hat{\mathcal{K}}| - |\mathcal{A}|) \stackrel{(c)}{\ge} 0$$
where (a) follows from the chain rule for $\overset{\circ}{I}$, (b) holds because $\hat{\mathcal{K}}$ achieves the maximum in (5.5), and (c) because $\hat{\mathcal{K}}$ achieves the maximum in (5.7). This proves (5.8). To prove (5.9), consider any $\mathcal{A}$ disjoint from $\hat{\mathcal{K}}$ and let $\mathcal{K}' = \hat{\mathcal{K}} \cup \mathcal{A}$. We have
$$\overset{\circ}{I}(\mathbf{x}_\mathcal{A}; \mathbf{y}\mathbf{x}_{\hat{\mathcal{K}}} \,|\, \mathbf{s}\mathbf{w}) - |\mathcal{A}|(R + \Delta) \stackrel{(a)}{=} \left[\overset{\circ}{I}(\mathbf{x}_{\mathcal{K}'}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) - |\mathcal{K}'|(R + \Delta)\right] - \left[\overset{\circ}{I}(\mathbf{x}_{\hat{\mathcal{K}}}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) - |\hat{\mathcal{K}}|(R + \Delta)\right]$$
$$\stackrel{(b)}{=} \left[\overset{\circ}{I}(\mathbf{x}_{\mathcal{K}'}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) - |\mathcal{K}'|(R + \Delta)\right] - MPMI(|\hat{\mathcal{K}}|) \le MPMI(|\mathcal{K}'|) - MPMI(|\hat{\mathcal{K}}|) \stackrel{(c)}{\le} 0,$$
where (a), (b), (c) are justified in the same way as above. This proves (5.9). ✷

Reliability metric. The score $\overset{\circ}{I}(\mathbf{x}_{\hat{\mathcal{K}}}; \mathbf{y} \,|\, \mathbf{s}\mathbf{w}) - |\hat{\mathcal{K}}| R > |\hat{\mathcal{K}}| \Delta$ represents a guilt index for the estimated coalition $\hat{\mathcal{K}}$. The larger this quantity is, the stronger the evidence that the members of $\hat{\mathcal{K}}$ are guilty. Likewise, $\overset{\circ}{I}(\mathbf{x}_m; \mathbf{y}\mathbf{x}_{\hat{\mathcal{K}} \setminus \{m\}} \,|\, \mathbf{s}\mathbf{w}) - R > \Delta$ is a guilt index for accused user $m \in \hat{\mathcal{K}}$, and $\overset{\circ}{I}(\mathbf{x}_m; \mathbf{y}\mathbf{x}_{\hat{\mathcal{K}}} \,|\, \mathbf{s}\mathbf{w}) - R \le \Delta$ is a guilt index for user $m \notin \hat{\mathcal{K}}$. The smaller this index is, the stronger the evidence that $m$ is innocent.

5.4 Error Exponents

Theorem 5.2 below gives the false-positive and false-negative error exponents for our coding scheme. These exponents are given in terms of the functions defined below. Recall $\mathscr{P}_{X_\mathcal{K} W|S}(p_S, L, D_1)$ defined in (3.5). We similarly define
$$\mathscr{P}_{X_\mathcal{K}|SW}(p_{SW}, L, D_1) \triangleq \left\{ p_{X_\mathcal{K}|SW} = \prod_{k \in \mathcal{K}} p_{X_k|SW} : p_{X_1|SW} = \cdots = p_{X_K|SW},\ E\, d(S, X_1) \le D_1 \right\}.$$
Define now the following set of conditional p.m.f.'s for $X_\mathcal{K}$ given $S, W$ whose conditional marginal p.m.f.
$p_{X|SW}$ is the same for each $X_m$, $m \in \mathcal{K}$:
$$\mathcal{M}(p_{X|SW}) = \{ p_{X_K|SW} : p_{X_m|SW} = p_{X|SW}, \; \forall m \in \mathcal{K} \}.$$
Define for each $A \subseteq \mathcal{K}$ the set of conditional p.m.f.'s
$$\mathcal{P}_{YX_K|SW}(p_W, \tilde{p}_{S|W}, p_{X|SW}, \mathcal{W}_K, R, L, A) \triangleq \Big\{ \tilde{p}_{YX_K|SW} : \; \tilde{p}_{X_K|SW} \in \mathcal{M}(p_{X|SW}), \; \tilde{p}_{Y|X_K} \in \mathcal{W}_K(\tilde{p}_{X_K}), \; \frac{1}{|A|} \mathring{I}_{p_W \tilde{p}_{S|W} \tilde{p}_{YX_K|SW}}(X_A; Y X_{\mathcal{K} \setminus A} \,|\, S, W) \le R \Big\} \qquad (5.10)$$
and the pseudo sphere packing exponent
$$\tilde{E}_{psp,A}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, \mathcal{W}_K) = \min_{\tilde{p}_{YX_K|SW} \in \mathcal{P}_{YX_K|SW}(p_W, \tilde{p}_{S|W}, p_{X|SW}, \mathcal{W}_K, R, L, A)} D(\tilde{p}_{YX_K|SW}\, \tilde{p}_{S|W} \,\|\, \tilde{p}_{Y|X_K}\, p^K_{X|SW}\, p_S \,|\, p_W). \qquad (5.11)$$
Taking the maximum³ and the minimum of $\tilde{E}_{psp,A}$ above over all subsets $A \subseteq \mathcal{K}$, we define
$$\overline{\tilde{E}}_{psp}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, \mathcal{W}_K) = \tilde{E}_{psp,\mathcal{K}}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, \mathcal{W}_K), \qquad (5.12)$$
$$\underline{\tilde{E}}_{psp}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, \mathcal{W}_K) = \min_{A \subseteq \mathcal{K}} \tilde{E}_{psp,A}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, \mathcal{W}_K). \qquad (5.13)$$
Now define
$$E_{psp}(R, L, D_1, \mathcal{W}_K) = \max_{p_W \in \mathcal{P}_W} \min_{\tilde{p}_{S|W} \in \mathcal{P}_{S|W}} \max_{p_{X|SW} \in \mathcal{P}_{X|SW}(p_W, \tilde{p}_{S|W}, L, D_1)} \tilde{E}_{psp,\mathcal{K}}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, \mathcal{W}^{fair}_{K_{nom}}). \qquad (5.14)$$
Denote by $p^*_W$ and $p^*_{X|SW}$ the maximizers in (5.14), where the latter is to be viewed as a function of $\tilde{p}_{S|W}$. Also note that both $p^*_W$ and $p^*_{X|SW}$ implicitly depend on $R$ and $\mathcal{W}^{fair}_{K_{nom}}$. Finally, define
$$\overline{E}_{psp}(R, L, D_1, \mathcal{W}_K) = \min_{\tilde{p}_{S|W} \in \mathcal{P}_{S|W}} \overline{\tilde{E}}_{psp}(R, L, p^*_W, \tilde{p}_{S|W}, p^*_{X|SW}, \mathcal{W}_K), \qquad (5.15)$$
$$\underline{E}_{psp}(R, L, D_1, \mathcal{W}_K) = \min_{\tilde{p}_{S|W} \in \mathcal{P}_{S|W}} \underline{\tilde{E}}_{psp}(R, L, p^*_W, \tilde{p}_{S|W}, p^*_{X|SW}, \mathcal{W}_K). \qquad (5.16)$$

³ The property that $\mathcal{K}$ achieves $\max_{A \subseteq \mathcal{K}} \tilde{E}_{psp,A}$ is derived in the proof of Theorem 5.2, Part (iv).

Theorem 5.2 The decision rule (5.7) yields the following error exponents.
(i) The false-positive error exponent is
$$E_{FP}(R, D_1, \mathcal{W}_K, \Delta) = \Delta.$$
$$\qquad (5.17)$$
(ii) The error exponent for the (false negative) probability that the decoder fails to catch all colluders (misses some of them) is
$$E^{all}(R, L, D_1, \mathcal{W}_K, \Delta) = \underline{E}_{psp}(R + \Delta, L, D_1, \mathcal{W}_K). \qquad (5.18)$$
(iii) The error exponent for the (false negative) probability that the decoder fails to catch even one colluder (misses every single colluder) is
$$E^{one}(R, L, D_1, \mathcal{W}_K, \Delta) = \overline{E}_{psp}(R + \Delta, L, D_1, \mathcal{W}_K). \qquad (5.19)$$
(iv) $E^{one}(R, L, D_1, \mathcal{W}_K, \Delta) = E^{one}(R, L, D_1, \mathcal{W}^{fair}_K, \Delta)$.
(v) $E^{all}(R, L, D_1, \mathcal{W}^{fair}_K, \Delta) = E^{one}(R, L, D_1, \mathcal{W}^{fair}_K, \Delta)$.
(vi) If $K = K_{nom}$, the suprema of all rates for which the error exponents of (5.18) and (5.19) are positive are $C^{all}(D_1, \mathcal{W}_K)$ and $C^{one}(D_1, \mathcal{W}_K)$ of (3.9) and (3.8), respectively.

Note. The expressions (5.18) and (5.19) for the false-negative error exponents may be viewed as sequences indexed by $L$. As discussed below (3.7) and in [14, Sec. 3.5], one may show that these sequences are nondecreasing and converge to finite limits at a polynomial rate.

6 Error Exponents for Memoryless Collusion Channels

Consider the compound class (2.31) of memoryless channels. The theorems of Sec. 3 showed that compound capacity is the same as for the main problem of (2.10). We now outline how the derivation of the error exponents is modified. Retracing the steps of the proof of Theorem 5.2, it may be seen that the expressions (5.17), (5.18) and (5.19) for the error exponents remain valid, with two modifications. First, in (5.10), the constraint $\tilde{p}_{Y|X_K} \in \mathcal{W}_K$ is removed, and so the resulting set $\mathcal{P}^{memoryless}_{YX_K|SW}$ is larger than $\mathcal{P}_{YX_K|SW}$ of (5.10).
Second, the divergence cost function
$$D(\tilde{p}_{YX_K|SW}\, \tilde{p}_{S|W} \,\|\, \tilde{p}_{Y|X_K}\, p^K_{X|SW}\, p_S \,|\, p_W) \qquad (6.1)$$
in the expression (5.11) for the pseudo sphere packing exponent $\tilde{E}_{psp,A}$ is replaced by⁴
$$\min_{p_{Y|X_K} \in \mathcal{W}_K} D(\tilde{p}_{YX_K|SW}\, \tilde{p}_{S|W} \,\|\, p_{Y|X_K}\, p^K_{X|SW}\, p_S \,|\, p_W); \qquad (6.2)$$
denote by $\tilde{E}^{memoryless}_{psp,A}$ the corresponding pseudo sphere packing exponent.

⁴ This can be traced back to (10.15), where $p_{y|x_K}$ is now replaced with $p_{Y|X_K}$ in the asymptotic expression for the probability of the conditional type class $T_{y x_K | s w}$.

The divergences in (6.1) and (6.2) coincide when $p_{Y|X_K} = \tilde{p}_{Y|X_K}$, thus (6.2) is upper-bounded by (6.1). Since $p_{Y|X_K} = \tilde{p}_{Y|X_K}$ is feasible for $\mathcal{P}_{YX_K|SW}$ of (5.10), we conclude that $\tilde{E}^{memoryless}_{psp,A} \le \tilde{E}_{psp,A}$ of (5.11). Hence the false-negative error exponents in the memoryless case are upper-bounded by those of Theorem 5.2. This phenomenon is similar to results in [14]: due to the use of RM codes, the colluders' optimal strategy is a nearly-memoryless strategy, but they are precluded from using a truly memoryless strategy because that would violate the hard constraint $p_{y|x_K} \in \mathcal{W}_K$. In the memoryless case, the worst conditional type (which determines the false-negative error exponents) might be such that $p_{y|x_K} \notin \mathcal{W}_K$.

7 Proof of Converse Under Detect-All Criterion

7.1 Proof of Theorem 3.5

The encoder generates marked copies $x_m = f_N(s, v, m)$ for $1 \le m \le 2^{NR}$ and the decoder outputs an estimated coalition $g_N(y, s, v) \in \{1, \cdots, 2^{NR}\}^\star$. By Lemma 3.3, it suffices to prove the claim for the compound class of memoryless channels $\mathcal{W}_K$ of (2.31). Let $K$ be the size of the coalition and $(f_N, g_N)$ a sequence of length-$N$, rate-$R$ codes.
We show that for any such sequence of codes, reliable decoding of the fingerprints is possible only if $R \le \tilde{C}^{all}(D_1, \mathcal{W}_K)$ under the detect-all criterion.

Step 1. A lower bound on error probability is obtained when a helper provides some information to the decoder. Here the helper informs the decoder that the coalition size is $K$. There are $\binom{2^{NR}}{K} \le 2^{KNR}$ possible coalitions of size $K$. We represent a coalition as $M_{\mathcal{K}} \triangleq \{M_1, \cdots, M_K\}$, where $M_k$, $k \in \mathcal{K} = \{1, 2, \cdots, K\}$, are assumed to be drawn i.i.d. uniformly⁵ from $\{1, \cdots, 2^{NR}\}$. We similarly write $X_k \triangleq x_{M_k}$, $k \in \mathcal{K}$, and $X_{\mathcal{K}} \triangleq \{X_1, \cdots, X_K\}$. The component of $X_{\mathcal{K}}$ at position $t \in \{1, \cdots, N\}$ is denoted by $X_{\mathcal{K},t} \triangleq \{X_{1t}, \cdots, X_{Kt}\}$. Assuming memoryless collusion channel $p_{Y|X_K} \in \mathcal{W}_K$ is in effect, the joint p.m.f. of $(M_{\mathcal{K}}, S, V, X_{\mathcal{K}}, Y)$ is given by
$$p_{M_{\mathcal{K}} S V X_{\mathcal{K}} Y} = p^N_S\, p_V \prod_{k \in \mathcal{K}} \big( p_{M_k}\, 1\{X_k = f_N(S, V, M_k)\} \big)\, p^N_{Y|X_K}. \qquad (7.1)$$
Define the random variables $Q_t = \{V, S_j, j \ne t\} \in \mathcal{V}^N \times \mathcal{S}^{N-1}$ for $1 \le t \le N$. By assumption, $S_t$ and $Q_t$ are independent, and $X_{kt}$, $k \in \mathcal{K}$, are conditionally i.i.d. given $(S_t, Q_t) = (S, V)$. However, note that $X_{kt}$, $1 \le k \le K$, are generally conditionally dependent given $(S_t, V)$ alone. The joint p.m.f. of $(S_t, Q_t, X_{\mathcal{K},t}, Y_t)$ is
$$p_{S_t}\, p_{Q_t} \prod_{1 \le k \le K} p_{X_{kt}|S_t Q_t}\; p_{Y|X_K}, \qquad 1 \le t \le N \qquad (7.2)$$
where the conditional p.m.f. $p_{X_{kt}|S_t Q_t}$ is the same for all $k \in \mathcal{K}$. Now define a time-sharing random variable $T$, uniformly distributed over $\{1, \cdots, N\}$, and independent of the other random variables.

⁵ Capacity could be higher if there were constraints on the formation of coalitions, for instance if the users form social networks [25].

Let
$$X_{\mathcal{K}} \triangleq X_{\mathcal{K},T} \in \mathcal{X}^K, \quad Y \triangleq Y_T \in \mathcal{Y}, \quad S \triangleq S_T \in \mathcal{S}, \quad W \triangleq (Q_T, T) \in \mathcal{W} \triangleq \mathcal{V}^N \times \mathcal{S}^{N-1} \times \{1, \cdots, N\}.$$
(7.3)
By (7.2) and (7.3), the code $f_N$ and the random variables $S, V, M_{\mathcal{K}}$ induce an empirical p.m.f. $p_{X_K}$ which can be viewed as a function of $f_N$. The joint p.m.f. of $(S, W, X_{\mathcal{K}}, Y)$ is
$$p_S\, p_W \Big( \prod_{k \in \mathcal{K}} p_{X_k|SW} \Big) p_{Y|X_K} \qquad (7.4)$$
where the conditional p.m.f. $p_{X_k|SW}$ is the same for all $k \in \mathcal{K}$. Moreover
$$D_1 \ge E\Big[ \frac{1}{N} \sum_{t=1}^N d(S_t, X_{kt}) \Big] = E\, d(S, X_k), \qquad k \in \mathcal{K}.$$
Hence $p_{X_K W|S}$ belongs to the set $\mathcal{P}_{X_K W|S}(p_S, L, D_1)$ of (3.5), with $L = |\mathcal{W}| = N \times |\mathcal{V}|^N \times |\mathcal{S}|^{N-1}$.

Step 2. Our single-letter expressions are derived from the following inequality, which is valid for all $A \subseteq \mathcal{K}$ and $p_{Y|X_K} \in \mathcal{W}_K$:
$$I(M_A; Y \,|\, S, V) \overset{(a)}{=} I(X_A; Y \,|\, S, V)$$
$$= I(X_A; Y \,|\, X_{\mathcal{K} \setminus A}, S, V) + \underbrace{I(X_A; X_{\mathcal{K} \setminus A} \,|\, S, V)}_{=0} - I(X_A; X_{\mathcal{K} \setminus A} \,|\, Y, S, V)$$
$$\overset{(b)}{\le} I(X_A; Y \,|\, X_{\mathcal{K} \setminus A}, S, V) = H(Y \,|\, X_{\mathcal{K} \setminus A}, S, V) - H(Y \,|\, X_{\mathcal{K}}, S, V)$$
$$\overset{(c)}{=} H(Y \,|\, X_{\mathcal{K} \setminus A}, S, V) - H(Y \,|\, X_{\mathcal{K}})$$
$$\overset{(d)}{=} \sum_{t=1}^N H(Y_t \,|\, Y^{t-1}, X_{\mathcal{K} \setminus A}, S, V) - \sum_{t=1}^N H(Y_t \,|\, X_{\mathcal{K},t})$$
$$\overset{(e)}{\le} \sum_{t=1}^N H(Y_t \,|\, X_{\mathcal{K} \setminus A, t}, S, V) - \sum_{t=1}^N H(Y_t \,|\, X_{\mathcal{K},t})$$
$$\overset{(f)}{=} \sum_{t=1}^N H(Y_t \,|\, X_{\mathcal{K} \setminus A, t}, S_t, Q_t) - \sum_{t=1}^N H(Y_t \,|\, X_{\mathcal{K},t}, S_t, Q_t)$$
$$= \sum_{t=1}^N I(X_{A,t}; Y_t \,|\, X_{\mathcal{K} \setminus A, t}, S_t, Q_t) = N\, I(X_A; Y \,|\, X_{\mathcal{K} \setminus A}, S, W) \qquad (7.5)$$
where (a) is due to the data processing inequality and the fact that $X_A$ is a function of $(M_A, S, V)$, (b) holds because the codewords $\{X_k, 1 \le k \le K\}$ are mutually independent given $(S, V)$, (c) because $(S, V) \to X_{\mathcal{K}} \to Y$ forms a Markov chain, (d) is obtained using the chain rule for entropy and the fact that the collusion channel is memoryless, (e) holds because conditioning reduces entropy, and (f) because $(S, V) = (S_t, Q_t) \to X_{\mathcal{K},t} \to Y_t$ forms a Markov chain.

Step 3. Under collusion channel $p_{Y|X_K} \in \mathcal{W}_K$, let $P^{all}_e(p_{Y|X_K}) = Pr[\hat{\mathcal{K}} \ne \mathcal{K}]$ be the decoding error probability of the detect-all decoder.
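Step 3 will feed $P^{all}_e$ into a Fano-type bound of the form $|A|NR \le 1 + P_e \cdot KNR + I(M_A; Y \,|\, S, V)$. As a hedged numerical illustration of how such a bound forces a rate limit once the error probability is small, a minimal sketch (all numerical values are hypothetical stand-ins, not quantities from the paper):

```python
def fano_rate_bound(I_bits, N, A_size, K_size, P_e):
    """Rearrange |A|*N*R <= 1 + P_e*K*N*R + I into a bound on R.
    Valid whenever |A| > P_e * K; as P_e -> 0 the bound tends to
    I / (N * |A|), the single-letter rate bound of the converse."""
    assert A_size > P_e * K_size, "bound is vacuous otherwise"
    return (1.0 + I_bits) / (N * (A_size - P_e * K_size))

# Hypothetical numbers: blocklength 1000, |A| = 2, K = 3, P_e = 1%.
r_max = fano_rate_bound(I_bits=500.0, N=1000, A_size=2, K_size=3, P_e=0.01)
r_lim = fano_rate_bound(I_bits=500.0, N=1000, A_size=2, K_size=3, P_e=0.0)
```

Shrinking $P_e$ tightens the bound toward $I/(N|A|)$, which is exactly the normalization appearing in (7.7).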
The following inequalities hold for every subset $A$ of $\mathcal{K}$ and for every $p_{Y|X_K}$:
$$|A| N R \overset{(a)}{=} H(M_A) \overset{(b)}{=} H(M_A \,|\, S, V) = H(M_A \,|\, Y, S, V) + I(M_A; Y \,|\, S, V)$$
$$\le H(M_{\mathcal{K}} \,|\, Y, S, V) + I(M_A; Y \,|\, S, V)$$
$$\overset{(c)}{\le} 1 + P^{all}_e(p_{Y|X_K}) \cdot K N R + I(M_A; Y \,|\, S, V) \qquad (7.6)$$
where (a) holds because $M_A$ is uniformly distributed over $\{1, \cdots, 2^{|A|NR}\}$, (b) because $M_A$ and $(S, V)$ are independent, and (c) because of Fano's inequality. For the error probability $P^{all}_e(p_{Y|X_K})$ to vanish for each $p_{Y|X_K} \in \mathcal{W}_K$, we need
$$R \le \liminf_{N \to \infty} \min_{p_{Y|X_K} \in \mathcal{W}_K} \min_{A \subseteq \mathcal{K}} \frac{1}{N|A|} I(M_A; Y \,|\, S, V). \qquad (7.7)$$
We have
$$\min_{p_{Y|X_K} \in \mathcal{W}_K} \min_{A \subseteq \mathcal{K}} \frac{1}{N|A|} I(M_A; Y \,|\, S, V) \overset{(a)}{\le} \min_{p_{Y|X_K} \in \mathcal{W}_K} \min_{A \subseteq \mathcal{K}} \frac{1}{|A|} I(X_A; Y \,|\, X_{\mathcal{K} \setminus A}, S, W)$$
$$\overset{(b)}{\le} \max_{p_{X_K W|S} \in \mathcal{P}_{X_K W|S}(p_S, L(N), D_1)} \min_{p_{Y|X_K} \in \mathcal{W}_K} \min_{A \subseteq \mathcal{K}} \frac{1}{|A|} I(X_A; Y \,|\, X_{\mathcal{K} \setminus A}, S, W)$$
$$\le \sup_{L} \max_{p_{X_K W|S} \in \mathcal{P}_{X_K W|S}(p_S, L, D_1)} \min_{p_{Y|X_K} \in \mathcal{W}_K} \min_{A \subseteq \mathcal{K}} \frac{1}{|A|} I(X_A; Y \,|\, X_{\mathcal{K} \setminus A}, S, W)$$
$$\overset{(c)}{\le} \lim_{L \to \infty} \max_{p_{X_K W|S} \in \mathcal{P}_{X_K W|S}(p_S, L, D_1)} \min_{p_{Y|X_K} \in \mathcal{W}_K} \min_{A \subseteq \mathcal{K}} \frac{1}{|A|} I(X_A; Y \,|\, X_{\mathcal{K} \setminus A}, S, W) \qquad (7.8)$$
where (a) is due to (7.5), (b) to the fact that $p_{X_K W|S}$ given in (7.4) belongs to the set $\mathcal{P}_{X_K W|S}(p_S, L, D_1)$ defined in (3.5), with $L = L(N) = N \times |\mathcal{V}|^N \times |\mathcal{S}|^{N-1}$, and (c) because the supremand is nondecreasing in $L$. Combining (7.7) and (7.8), we obtain
$$R \le \lim_{L \to \infty} \max_{p_{X_K W|S} \in \mathcal{P}_{X_K W|S}(p_S, L, D_1)} \min_{p_{Y|X_K} \in \mathcal{W}_K(p_{X_K})} \min_{A \subseteq \mathcal{K}} \frac{1}{|A|} I(X_A; Y \,|\, X_{\mathcal{K} \setminus A}, S, W) = \lim_{L \to \infty} C^{all}_L(D_1, \mathcal{W}_K) = \tilde{C}^{all}(D_1, \mathcal{W}_K) \qquad (7.9)$$
which concludes the proof of Theorem 3.5. ✷

7.2 Proof of Corollary 3.6

By assumption, here the coalition is fair and $\mathcal{W}_K = \mathcal{W}^{fair}_K$ depends on the joint type $p_{x_K}$ of the colluders' fingerprinted sequences.
We denote this joint type by $Z \in \mathcal{Z} = \mathcal{P}^{[N]}_{X_K}$ to make the notation more compact. Note that $Z$ is a function of $(S, V, \mathcal{K})$ and that the cardinality of $\mathcal{Z}$ is at most $(N+1)^{|\mathcal{X}|^K}$. Since the channel $p_{Y|X_K}$ selected by the coalition may depend on $Z$, we indicate this dependency explicitly by representing the channel as $p_{Y|X_K Z}$ and the set of feasible channels as
$$\widetilde{\mathcal{W}}^{fair}_K = \{ p_{Y|X_K Z} : \; p_{Y|X_K, Z=z} \in \mathcal{W}^{fair}_K(z), \; \forall z \in \mathcal{Z} \}. \qquad (7.10)$$
By Lemma 3.3, it suffices to prove the claim for the compound class of memoryless channels $\mathcal{W}^{fair}_K(p_{x_K})$. Define the set
$$\mathcal{P}_{X_K W S}(p_S, L, D_1) \triangleq \{ p_S\, p_{X_K W|S} : p_{X_K W|S} \in \mathcal{P}_{X_K W|S}(p_S, L, D_1) \}$$
and slice it into the following disjoint collection of sets:
$$\forall z \in \mathcal{Z}: \quad \mathcal{P}_{X_K W S}(p_S, L, D_1, z) \triangleq \{ p_{X_K W S} \in \mathcal{P}_{X_K W S}(p_S, L, D_1) : p_{X_K} = z \}. \qquad (7.11)$$
The error probability of the decoder is not increased if a helper reveals the joint type $Z$. The entropy of $Z$ is at most $\log |\mathcal{Z}| \le |\mathcal{X}|^K \log(N+1)$. Fano's inequality (7.6) applied to $A = \mathcal{K}$ becomes
$$K N R = H(M_{\mathcal{K}} \,|\, S, V) \le H(M_{\mathcal{K}}, Z \,|\, S, V) \le |\mathcal{X}|^K \log(N+1) + H(M_{\mathcal{K}} \,|\, S, V, Z)$$
$$\le |\mathcal{X}|^K \log(N+1) + 1 + P^{all}_e(p_{Y|X_K Z}) \cdot K N R + I(M_{\mathcal{K}}; Y \,|\, S, V, Z).$$
(7.12)
Analogously to (7.5), the following single-letter expression holds for every $z \in \mathcal{Z}$ and $p_{Y|X_K} \in \mathcal{W}_K(z)$:
$$I(M_{\mathcal{K}}; Y \,|\, S, V, Z=z) = I(X_{\mathcal{K}}; Y \,|\, S, V, Z=z) = H(Y \,|\, S, V, Z=z) - H(Y \,|\, X_{\mathcal{K}}, S, V, Z=z)$$
$$\overset{(a)}{=} H(Y \,|\, S, V, Z=z) - H(Y \,|\, X_{\mathcal{K}}, Z=z)$$
$$\overset{(b)}{=} \sum_{t=1}^N H(Y_t \,|\, Y^{t-1}, S, V, Z=z) - \sum_{t=1}^N H(Y_t \,|\, X_{\mathcal{K},t}, Z=z)$$
$$\le \sum_{t=1}^N H(Y_t \,|\, S, V, Z=z) - \sum_{t=1}^N H(Y_t \,|\, X_{\mathcal{K},t}, Z=z)$$
$$\overset{(c)}{=} \sum_{t=1}^N H(Y_t \,|\, S_t, Q_t, Z=z) - \sum_{t=1}^N H(Y_t \,|\, X_{\mathcal{K},t}, S_t, Q_t, Z=z)$$
$$= \sum_{t=1}^N I(X_{\mathcal{K},t}; Y_t \,|\, S_t, Q_t, Z=z) = N\, I(X_{\mathcal{K}}; Y \,|\, S, W, Z=z) \qquad (7.13)$$
$$= N\, I_{p_{X_K W S|Z=z}\, p_{Y|X_K}}(X_{\mathcal{K}}; Y \,|\, S, W)$$
where (a) holds because $(S, V) \to (X_{\mathcal{K}}, Z) \to Y$ forms a Markov chain, (b) because the collusion channel remains memoryless even when conditioned on $Z$, and (c) because $(S, V) = (S_t, Q_t) \to (X_{\mathcal{K},t}, Z) \to Y_t$ forms a Markov chain for each $1 \le t \le N$. For the error probability $P^{all}_e(p_{Y|X_K Z})$ to vanish for each $p_{Y|X_K Z} \in \widetilde{\mathcal{W}}^{fair}_K$, we need
$$R \overset{(a)}{\le} \liminf_{N \to \infty} \min_{p_{Y|X_K Z} \in \widetilde{\mathcal{W}}^{fair}_K} \frac{1}{NK} I(M_{\mathcal{K}}; Y \,|\, S, V, Z)$$
$$\overset{(b)}{\le} \liminf_{N \to \infty} \min_{p_{Y|X_K Z} \in \widetilde{\mathcal{W}}^{fair}_K} \frac{1}{K} I_{p_{SWX_K Z}\, p_{Y|X_K Z}}(X_{\mathcal{K}}; Y \,|\, S, W, Z)$$
$$\le \lim_{N \to \infty} \max_{p_Z \in \mathcal{P}_{\mathcal{Z}}} \max_{\{p_{X_K W S|Z=z} \in \mathcal{P}_{X_K W S}(p_S, L(N), D_1, z)\}_{z \in \mathcal{Z}}} \min_{\{p_{Y|X_K} \in \mathcal{W}^{fair}_K(z)\}_{z \in \mathcal{Z}}} \frac{1}{K} \sum_{z \in \mathcal{Z}} p_Z(z)\, I_{p_{X_K W S|Z=z}\, p_{Y|X_K}}(X_{\mathcal{K}}; Y \,|\, S, W)$$
$$\overset{(c)}{=} \lim_{N \to \infty} \max_{z \in \mathcal{Z}} \max_{p_{X_K W S|Z=z} \in \mathcal{P}_{X_K W S}(p_S, L(N), D_1, z)} \min_{p_{Y|X_K} \in \mathcal{W}^{fair}_K(z)} \frac{1}{K} I_{p_{X_K W S|Z=z}\, p_{Y|X_K}}(X_{\mathcal{K}}; Y \,|\, S, W)$$
$$= \lim_{N \to \infty} \max_{p_{X_K W S} \in \mathcal{P}_{X_K W S}(p_S, L(N), D_1)} \min_{p_{Y|X_K} \in \mathcal{W}^{fair}_K(p_{X_K})} \frac{1}{K} I_{p_{X_K W S}\, p_{Y|X_K}}(X_{\mathcal{K}}; Y \,|\, S, W)$$
$$\le \lim_{L \to \infty} \max_{p_{X_K W S} \in \mathcal{P}_{X_K W S}(p_S, L, D_1)} \min_{p_{Y|X_K} \in \mathcal{W}^{fair}_K(p_{X_K})} \frac{1}{K} I(X_{\mathcal{K}}; Y \,|\, S, W)$$
$$= \lim_{L \to \infty} \max_{p_{X_K W|S} \in \mathcal{P}_{X_K W|S}(p_S, L, D_1)} \min_{p_{Y|X_K} \in \mathcal{W}^{fair}_K(p_{X_K})} \frac{1}{K} I(X_{\mathcal{K}}; Y \,|\, S, W) = \tilde{C}^{all}(D_1, \mathcal{W}^{fair}_K) \qquad (7.14)$$
where (a) follows from (7.12), (b) from (7.13), and (c) from the fact that in a game in which $Z$ is a variable chosen by the first player (here the embedder) but known to all players (embedder, colluders, receiver), there can be no advantage in randomizing $Z$, i.e., a deterministic choice of $Z$ suffices to achieve the value of the maxmin game. More formally, equality (c) is a direct consequence of the following simple lemma, using the fingerprint distributor's feasible set $\mathcal{P}_{X_K W S}(p_S, L(N), D_1, z)$ in place of $\mathcal{F}(z)$, the colluders' feasible set $\mathcal{W}^{fair}_K(z)$ in place of $\mathcal{G}(z)$, and the conditional mutual information $I(X_{\mathcal{K}}; Y \,|\, S, W)$ as the payoff function $\phi$. This concludes the proof. ✷

Lemma 7.1 Consider a discrete set $\mathcal{Z}$ and two families of sets $\mathcal{F}(z), z \in \mathcal{Z}$ and $\mathcal{G}(z), z \in \mathcal{Z}$ indexed by the elements of $\mathcal{Z}$. Then the following game with payoff function $\phi$:
$$V = \max_{p \in \mathcal{P}_{\mathcal{Z}}} \max_{\{f_z \in \mathcal{F}(z)\}_{z \in \mathcal{Z}}} \min_{\{g_z \in \mathcal{G}(z)\}_{z \in \mathcal{Z}}} \sum_{z \in \mathcal{Z}} p(z)\, \phi(f_z, g_z) \qquad (7.15)$$
admits a pure-strategy solution, i.e., the maximum over the p.m.f. $p \in \mathcal{P}_{\mathcal{Z}}$ is achieved by a deterministic $p$.

Proof: Write $f = \{f_z\}_{z \in \mathcal{Z}}$ and $g = \{g_z\}_{z \in \mathcal{Z}}$ where each $f_z \in \mathcal{F}(z)$ and $g_z \in \mathcal{G}(z)$. For each $(p, f)$, let $g^*(p, f)$ achieve the minimum over $g$ of the function $\sum_{z \in \mathcal{Z}} p(z)\, \phi(f_z, g_z)$. For each $p$, let $f^*(p)$ achieve the maximum over $f$ of the function $\sum_{z \in \mathcal{Z}} p(z)\, \phi(f_z, g^*_z(p, f))$. By inspection of (7.15), the following elementary properties hold for each $p \in \mathcal{P}_{\mathcal{Z}}$ and $z \in \mathcal{Z}$:

• The minimizing $g^*_z$ depends on $(p, f)$ via $f_z$ only, and we denote this limited dependency explicitly by $g^*_z(f_z)$. The minimizer satisfies
$$\phi(f_z, g^*_z(f_z)) = \min_{g_z \in \mathcal{G}(z)} \phi(f_z, g_z).$$
(7.16)
• The maximizing $f^*_z$ does not depend on $p$ and satisfies
$$\phi(f^*_z, g^*_z(f^*_z)) = \max_{f_z \in \mathcal{F}(z)} \min_{g_z \in \mathcal{G}(z)} \phi(f_z, g_z). \qquad (7.17)$$
Substituting (7.17) into (7.15), we obtain
$$V = \max_{p \in \mathcal{P}_{\mathcal{Z}}} \sum_{z \in \mathcal{Z}} p(z)\, \phi(f^*_z, g^*_z(f^*_z)) = \max_{z \in \mathcal{Z}} \phi(f^*_z, g^*_z(f^*_z)) = \max_{z \in \mathcal{Z}} \max_{f_z \in \mathcal{F}(z)} \min_{g_z \in \mathcal{G}(z)} \phi(f_z, g_z)$$
which proves the claim. ✷

8 Proof of Theorem 3.7: Converse Under Detect-One Criterion

By Lemma 3.3, it suffices to prove the claim for the compound class of memoryless channels $\mathcal{W}_K$. Let $\mathcal{M}_N = \{1, 2, \cdots, 2^{NR}\}$. For notational simplicity, assume two colluders ($K = 2$). The proof extends straightforwardly to larger coalitions. For the detect-one criterion, it is sufficient to consider decoding rules that return exactly one user index, i.e., the decoding rule is a mapping
$$g_N: \mathcal{Y}^N \times \mathcal{S}^N \times \mathcal{V}^N \to \mathcal{M}_N. \qquad (8.1)$$
Indeed, consider momentarily a more general decoder that returns a list of accused users. By definition of the detect-one and false-positive error criteria, correct decoding occurs if and only if the list size $L \ge 1$ and all users on the output list are guilty. One can then construct a new decoder of the form (8.1) that returns an arbitrary user if $L = 0$ and an arbitrary element of the original size-$L$ list if $L \ge 1$. The correct-decoding event for the original decoder is also a correct-decoding event for the new decoder, and so the new decoder has at least the same probability of correct decoding as the original decoder.⁶ In the following, we only consider decoding rules of the form (8.1). Denote by $\mathcal{D}_i(s, v)$ the decoding region for user $i$, i.e.,
$$y \in \mathcal{D}_i(s, v) \Leftrightarrow g_N(y, s, v) = i, \qquad \forall i \in \mathcal{M}_N.$$
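Lemma 7.1 above can be checked by brute force on small finite sets. A minimal sketch (all sets and payoffs hypothetical): it exploits the decoupling across $z$ used in the proof — the per-$z$ max/min values are computed first, and randomizing over $z$ never beats putting all mass on the best $z$.

```python
def inner_value(F, G, phi):
    """Per-z value v(z) = max_{f in F[z]} min_{g in G[z]} phi(f, g); the
    max and min decouple across z because f_z, g_z range over per-z sets."""
    return {z: max(min(phi(f, g) for g in G[z]) for f in F[z]) for z in F}

def randomized_vs_pure(F, G, phi, pmfs):
    """Value of the game over a list of candidate p.m.f.s on Z, versus the
    deterministic (pure-strategy) value max_z v(z)."""
    v = inner_value(F, G, phi)
    mixed = max(sum(p[z] * v[z] for z in v) for p in pmfs)
    return mixed, max(v.values())

# Hypothetical two-element Z with tiny strategy sets and a toy payoff.
F = {'a': [1, 2], 'b': [3]}
G = {'a': [0.5, 1.0], 'b': [2.0]}
phi = lambda f, g: f * g
pmfs = [{'a': 0.0, 'b': 1.0}, {'a': 0.5, 'b': 0.5}, {'a': 1.0, 'b': 0.0}]
mixed, pure = randomized_vs_pure(F, G, phi, pmfs)
```

In this toy game $v(a) = 1$ and $v(b) = 6$, and no p.m.f. on $\{a, b\}$ improves on the deterministic choice $z = b$, as the lemma asserts.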
⁶ The new decoder performs better than the original one in the event that the list has size $L \ge 2$ and consists of a mix of guilty and innocent users (an error is then declared for the original decoder), and the list member selected by the new decoder is guilty (a correct decision is made).

The decoding regions form a partition of $\mathcal{Y}^N$. The average probability of correct decoding is given by
$$P_c(f_N, g_N, p_{Y|X_1 X_2}) = Pr[g_N(Y, S, V) \in \mathcal{K}]$$
$$= \frac{1}{2^{2NR}} \sum_{i,j \in \mathcal{M}_N} \sum_{s \in \mathcal{S}^N} p^N_S(s) \sum_{v \in \mathcal{V}^N} p_V(v) \sum_{y \in \mathcal{D}_i(s,v) \cup \mathcal{D}_j(s,v)} p^N_{Y|X_1 X_2}(y \,|\, x_i(s,v), x_j(s,v)). \qquad (8.2)$$
Without loss of optimality we assume that randomly modulated codes (Def. 2.2, Prop. 2.2) are used. The proof is organized along thirteen steps. An arbitrarily small parameter $\delta > 0$ is chosen. Step 1 defines for each $(s, v)$ a set of bad codewords that have exponentially many neighbors within Hamming balls of radius $N\delta$ centered at these codewords. The remaining codewords constitute the so-called good set. Step 2 introduces a dense, nested family $\mathcal{W}^{fair}_{K,\delta}$ of subsets of $\mathcal{W}^{fair}_K$ indexed by $\delta$ and consisting of "nice channels". An equivalence is given between Hamming distance of two codewords and statistical distinguishability of the output of any $p_{Y|X_1 X_2} \in \mathcal{W}^{fair}_{K,\delta}$. For clarity of the exposition we initially derive error probabilities assuming that the good set is large and that both colluders are assigned codewords in the good set; these assumptions are subsequently relaxed in Steps 10 and 11. All the error probabilities up to that point are conditioned on $S, V$. Step 3 introduces the basic random variables used in the proof.
Step 4 does three things: (a) define a reference product conditional p.m.f. for $Y$ given $S, V$; (b) associate a conditional self-information to each pair of codewords; and (c) define a large set of codeword pairs whose conditional self-information is within $\delta^2$ of their average value. Step 5 defines a typical set for $Y$ given $S$, $V$, and $\mathcal{K}$. Step 6 shows that typical sets for good codeword pairs have weak overlap. Step 7 defines a collection of refined typical sets for $Y$ with bounded overlap. Step 8 defines a typical set for the host sequence $S$. Step 9 upper bounds the conditional probability of correct decoding in terms of a mutual information. Step 10 derives an analogous result conditioned on the event that both colluders are assigned codewords from the bad set. Step 11 combines the bounds for good and bad codewords into a single bound. Step 12 removes the conditioning on $S, V$ and upper bounds the unconditional probability of correct decoding (8.2) in terms of a mutual information. Step 13 derives an upper bound on that mutual information and shows that any achievable rate $R$ must be less than half of the upper bound. The proof is completed by letting $\delta \downarrow 0$.

Step 1. Denote by
$$d_H(x, x') = \sum_{t=1}^N 1\{x_t \ne x'_t\}$$
the Hamming distance between two sequences $x$ and $x'$ in $\mathcal{X}^N$, and by
$$\mathcal{M}_j(s, v, \delta) = \{ k \in \mathcal{M}_N : d_H(x_j(s,v), x_k(s,v)) \le N\delta \}, \qquad j \in \mathcal{M}_N, \; s \in \mathcal{S}^N, \; v \in \mathcal{V}^N, \; 0 \le \delta \le 1 \qquad (8.3)$$
the set of indices $k$ for the codewords $x_k(s,v)$ that are within Hamming distance $N\delta$ of codeword $x_j(s,v)$, and by $M_j(s,v,\delta) = |\mathcal{M}_j(s,v,\delta)|$ the cardinality of this set. The function $M_j(s,v,\cdot) - 1$ is akin to a cumulative distance distribution. It is nondecreasing, with $M_j(s,v,0) \ge 1$ and $M_j(s,v,1) = 2^{NR}$.
Note that for $S = \emptyset$ and random codes over $\mathcal{X} = \{0, 1\}$, $M_j(V, \delta) - 1$ is a random variable whose expectation vanishes as $N \to \infty$ for $\delta < \delta_{GV}(R)$, the Gilbert-Varshamov distance at rate $R$ [24]. Denote by
$$\mathcal{M}^{good}_N(s, v, \delta) = \{ j \in \mathcal{M}_N : |\mathcal{M}_j(s, v, \delta)| \le 2^{N\delta^{1/3}} \}, \qquad v \in \mathcal{V}^N, \; 0 \le \delta \le 1 \qquad (8.4)$$
a set of "good" indices $j$ (there are at most $2^{N\delta^{1/3}}$ codewords within Hamming distance $N\delta$ of codeword $x_j(s,v)$), and by
$$\mathcal{M}^{bad}_N(s, v, \delta) = \mathcal{M}_N \setminus \mathcal{M}^{good}_N(s, v, \delta) = \{ j \in \mathcal{M}_N : |\mathcal{M}_j(s, v, \delta)| > 2^{N\delta^{1/3}} \} \qquad (8.5)$$
the complementary set of "bad" indices. Note that any code with normalized minimum distance $\delta_{min} > 0$ satisfies $M_j(s, v, \delta) \equiv 1$ and thus $\mathcal{M}^{good}_N(s, v, \delta) \equiv \mathcal{M}_N$ for all $0 < \delta < \delta_{min}$. However, the derivations in Steps 2–8 of the proof make no assumption on the size of the sets $\mathcal{M}^{good}_N(s, v, \delta)$. Finally, for the RM codes considered here, the sets (8.3), (8.4), and (8.5) depend on the host sequence $s$ only via its type $p_s$.

Step 2. Channels $p_{Y|X_1 X_2}$ that satisfy $p_{Y|X_1 X_2}(y|x_1, x_2) = 0$ for some $y, x_1, x_2$, or $p_{Y|X_1 X_2}(\cdot|x_1, x_2) \equiv p_{Y|X_1 X_2}(\cdot|x'_1, x'_2)$ for some $(x_1, x_2) \ne (x'_1, x'_2)$, require special handling. To this end, we define the following nested family of subsets of $\mathcal{W}^{fair}_K$, indexed by $0 < \delta \le 1/|\mathcal{Y}|$:
$$\mathcal{W}^{fair}_{K,\delta} = \Big\{ p_{Y|X_1 X_2} \in \mathcal{W}^{fair}_K : \; p_{Y|X_1 X_2}(y|x_1, x_2) \ge \delta \; \forall y, x_1, x_2; \quad \delta \le D(p_{Y|X_1 = x_1, X_2 = x_2} \,\|\, p_{Y|X_1 = x'_1, X_2 = x'_2}) \le \log \delta^{-1} \; \forall (x_1, x_2) \ne (x'_1, x'_2) \Big\} \qquad (8.6)$$
where the upper bound on divergence is implied by the lower bound on $p_{Y|X_1 X_2}$. By continuity of the correct-decoding probability functional (8.2) and by the definition (8.6), we have $\tilde{C}^{one}(D_1, \mathcal{W}^{fair}_{K,\delta}) \downarrow \tilde{C}^{one}(D_1, \mathcal{W}^{fair}_K)$ as $\delta \downarrow 0$.
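The good/bad codeword split of Step 1, (8.3)–(8.5), amounts to counting Hamming-ball neighbors in the codebook. A minimal sketch with a toy codebook; the integer `threshold` is a hypothetical stand-in for the exponential bound $2^{N\delta^{1/3}}$:

```python
def hamming(x, y):
    """Hamming distance d_H between two equal-length sequences."""
    return sum(a != b for a, b in zip(x, y))

def good_bad_split(codewords, delta, threshold):
    """Indices j whose Hamming ball of radius N*delta contains at most
    `threshold` codewords (the codeword itself included, as in M_j) are
    'good'; the rest are 'bad' (cf. (8.4)-(8.5))."""
    N = len(codewords[0])
    good, bad = [], []
    for j, xj in enumerate(codewords):
        M_j = sum(1 for xk in codewords if hamming(xj, xk) <= N * delta)
        (good if M_j <= threshold else bad).append(j)
    return good, bad

# Toy codebook of length N = 4 over {0,1}; ball radius N*delta = 1.
code = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1)]
good, bad = good_bad_split(code, delta=0.25, threshold=1)
# The first two codewords are within distance 1 of each other, so each has
# two codewords in its ball and lands in the bad set; the third is isolated.
```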
Denote by
$$D_{ij,k} \triangleq \frac{1}{N} \sum_{t=1}^N D\big( p_{Y|X_1 = x_{it}(s,v), X_2 = x_{jt}(s,v)} \,\big\|\, p_{Y|X_1 = x_{it}(s,v), X_2 = x_{kt}(s,v)} \big) \qquad (8.7)$$
the normalized conditional Kullback-Leibler divergence (given $s, v$) between the distributions on $Y$ induced by codeword pairs $(i, j)$ and $(i, k)$, respectively. It follows from (8.6) that for each $p_{Y|X_1 X_2} \in \mathcal{W}^{fair}_{K,\delta}$,
$$d_H(x_j(s,v), x_k(s,v)) \le N\delta \;\Rightarrow\; D_{ij,k} \le \delta \log \delta^{-1} \qquad (8.8)$$
and
$$d_H(x_j(s,v), x_k(s,v)) > N\delta \;\Rightarrow\; D_{ij,k} > \delta^2.$$
Conversely,
$$D_{ij,k} \le \delta^2 \;\Rightarrow\; d_H(x_j(s,v), x_k(s,v)) \le N\delta. \qquad (8.9)$$
When $D_{ij,k}$ is small, we say that the codewords $x_j(s,v)$ and $x_k(s,v)$ are nearly indistinguishable at the channel output. For any $(s, v, i, j, k)$ and $p_{Y|X_1 X_2} \in \mathcal{W}^{fair}_{K,\delta}$, (8.8) and (8.9) describe an equivalence between statistical distinguishability of two codewords and Hamming distance.

Step 3. To analyze the probability of correct decoding conditioned on the event $\mathcal{K} \in (\mathcal{M}^{good}_N(S, V, \delta))^2$ that both colluders are assigned good codewords, we define the following random variables. Define $Q_t = \{V, S_j, j \ne t\}$ over the alphabet $\mathcal{Q}_N \triangleq \mathcal{V}^N \times \mathcal{S}^{N-1}$. We have $(S_t, Q_t) = (S, V)$ for each $1 \le t \le N$. Since the host sequence type $p_s$ together with any $q_t$, $1 \le t \le N$, uniquely determines $s_t$ and thus the pair $(s, v)$ (and vice-versa), we may also use $(p_s, q)$ as an equivalent representation of the pair $(s, v)$. Define a time-sharing random variable $T$ uniformly distributed over $\{1, 2, \cdots, N\}$ and independent of the other random variables. Let $S = S_T$, $Q = Q_T$, $Y = Y_T$, and $X_i = x_{i,T}(S, V)$, $\forall i \in \mathcal{M}_N$. Define the random variable $X$ drawn uniformly from $\{X_i, i \in \mathcal{M}^{good}_N(S, V, \delta)\}$.
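At each position $t$, the law of the uniformly drawn $X$ is just the empirical distribution of the good codewords' $t$-th symbols. A minimal sketch (toy codebook; the good set is a hypothetical input, not computed here):

```python
from collections import Counter

def p_x_at_t(codewords, good, t):
    """Empirical p.m.f. of the t-th symbol when the codeword index is
    drawn uniformly from the good set (cf. the definition of X above)."""
    counts = Counter(codewords[i][t] for i in good)
    return {x: c / len(good) for x, c in counts.items()}

code = [(0, 1, 1), (0, 0, 1), (1, 0, 1)]
pmf = p_x_at_t(code, good=[0, 1, 2], t=0)   # {0: 2/3, 1: 1/3}
```

Averaging this construction over $t$ via the time-sharing variable $T$ is what produces the single-letter conditional p.m.f. used in the sequel.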
The conditional p.m.f. of $X$ given $S, V, T$ is given by
$$p_{X|SVT}(x|s, v, t) = \frac{1}{|\mathcal{M}^{good}_N(s, v, \delta)|} \sum_{i \in \mathcal{M}^{good}_N(s, v, \delta)} 1\{x_{it}(s, v) = x\}, \qquad \forall x, s, v, t. \qquad (8.10)$$
Given $s, v, t$, the conditional distribution of $(X_i, X_j, Y)$ is $p_{X_i X_j|S=s, V=v, T=t}\; p_{Y|X_1 X_2}$ where
$$p_{X_i X_j|SVT}(x_1, x_2|s, v, t) = 1\{x_{it}(s, v) = x_1,\, x_{jt}(s, v) = x_2\}, \qquad x_1, x_2 \in \mathcal{X}, \; i, j \in \mathcal{M}^{good}_N(s, v, \delta), \; 1 \le t \le N. \qquad (8.11)$$
By (8.10), the average of (8.11) over $i, j \in \mathcal{M}^{good}_N(s, v, \delta)$ is the product conditional p.m.f.
$$\frac{1}{|\mathcal{M}^{good}_N(s, v, \delta)|^2} \sum_{i,j \in \mathcal{M}^{good}_N(s, v, \delta)} p_{X_i X_j|SVT}(x_1, x_2|s, v, t) = p_{X|SVT}(x_1|s, v, t)\, p_{X|SVT}(x_2|s, v, t). \qquad (8.12)$$

Step 4. The conditional distribution of each $Y_t$, $1 \le t \le N$, given $(S, V)$ and $\mathcal{K} \in (\mathcal{M}^{good}_N(S, V, \delta))^2$, is given by
$$p_{Y_t|SV}(y|s, v) = p_{Y|SVT}(y|s, v, t) = \frac{1}{|\mathcal{M}^{good}_N(s, v, \delta)|^2} \sum_{i,j \in \mathcal{M}^{good}_N(s, v, \delta)} p_{Y|X_1 X_2}(y|x_{it}(s, v), x_{jt}(s, v)) \qquad (8.13)$$
$$= \frac{1}{|\mathcal{M}^{good}_N(s, v, \delta)|^2} \sum_{i,j \in \mathcal{M}^{good}_N(s, v, \delta)} \sum_{x_1, x_2 \in \mathcal{X}} p_{X_i X_j|SVT}(x_1, x_2|s, v, t)\, p_{Y|X_1 X_2}(y|x_1, x_2)$$
$$= \sum_{x_1, x_2 \in \mathcal{X}} p_{X|SVT}(x_1|s, v, t)\, p_{X|SVT}(x_2|s, v, t)\, p_{Y|X_1 X_2}(y|x_1, x_2). \qquad (8.14)$$
For any permutation $\pi$ of $\{1, 2, \cdots, N\}$ we have $p_{Y_{\pi(t)}|SV}(y|\pi(s), v) = p_{Y_t|SV}(y|s, v)$ for all RM codes. The product conditional distribution
$$r(y|s, v) \triangleq \prod_{t=1}^N p_{Y_t|SV}(y_t|s, v) \qquad (8.15)$$
is strongly exchangeable for each $v \in \mathcal{V}^N$ and will be used as a reference conditional p.m.f. for $Y$ given $S, V$ in the sequel. We also define the following conditional self-informations (i.e., mutual information for coalition $(i, j)$ averaged over $Y_t$ (resp.
$Y$) and conditioned on $S, V$):
$$\theta_{ij,t}(s, v) \triangleq \sum_{y_t \in \mathcal{Y}} p_{Y|X_1 X_2}(y_t|x_{it}(s, v), x_{jt}(s, v)) \log \frac{p_{Y|X_1 X_2}(y_t|x_{it}(s, v), x_{jt}(s, v))}{p_{Y_t|SV}(y_t|s, v)} = D\big( p_{Y|X_1 = x_{it}(s,v), X_2 = x_{jt}(s,v)} \,\big\|\, p_{Y_t|S=s, V=v} \big) \qquad (8.16)$$
$$\theta_{ij}(s, v) \triangleq \frac{1}{N} \sum_{t=1}^N \theta_{ij,t}(s, v) = \frac{1}{N} \sum_{t=1}^N \sum_{y_t \in \mathcal{Y}} p_{Y|X_1 X_2}(y_t|x_{it}(s, v), x_{jt}(s, v)) \log \frac{p_{Y|X_1 X_2}(y_t|x_{it}(s, v), x_{jt}(s, v))}{p_{Y_t|SV}(y_t|s, v)}$$
$$= \frac{1}{N} \sum_{t=1}^N \sum_{x_1, x_2, y} 1\{x_{it}(s, v) = x_1,\, x_{jt}(s, v) = x_2\}\, p_{Y|X_1 X_2}(y|x_1, x_2) \log \frac{p_{Y|X_1 X_2}(y|x_1, x_2)}{p_{Y_t|SV}(y|s, v)}$$
$$= \sum_{t, x_1, x_2, y} p_T(t)\, p_{X_i X_j|SVT}(x_1, x_2|s, v, t)\, p_{Y|X_1 X_2}(y|x_1, x_2) \log \frac{p_{Y|X_1 X_2}(y|x_1, x_2)}{p_{Y|SVT}(y|s, v, t)}. \qquad (8.17)$$
Since $p_{Y|X_1 X_2}$ is symmetric, the expressions (8.16) and (8.17) are symmetric in $i$ and $j$. The average of $\theta_{ij}(s, v)$ over all $(i, j) \in (\mathcal{M}^{good}_N(s, v, \delta))^2$ is the conditional mutual information
$$I(s, v) \triangleq \frac{1}{|\mathcal{M}^{good}_N(s, v, \delta)|^2} \sum_{i,j \in \mathcal{M}^{good}_N(s, v, \delta)} \theta_{ij}(s, v) \qquad (8.18)$$
$$= \sum_{t, x_1, x_2, y} p_T(t)\, p_{X|SVT}(x_1|s, v, t)\, p_{X|SVT}(x_2|s, v, t)\, p_{Y|X_1 X_2}(y|x_1, x_2) \log \frac{p_{Y|X_1 X_2}(y|x_1, x_2)}{p_{Y|SVT}(y|s, v, t)}$$
$$= I_{p_T\, p^2_{X|SVT}\, p_{Y|X_1 X_2}}(X_1 X_2; Y \,|\, S = s, V = v, T). \qquad (8.19)$$
For RM codes, both $\theta_{ij}(s, v)$ and $I(s, v)$ depend on $s$ only via its type $p_s$. Since the average value of $\theta_{ij}(s, v)$ is $I(s, v)$, there cannot be too many pairs $(i, j)$ for which $\theta_{ij}(s, v)$ is well above the mean.
More precisely, there exists a symmetric subset $\tilde{\mathcal{A}}(s, v, \delta) \subseteq (\mathcal{M}^{good}_N(s, v, \delta))^2$ of size
$$|\tilde{\mathcal{A}}(s, v, \delta)| \ge \frac{\delta^2}{\delta^2 + I(s, v)} |\mathcal{M}^{good}_N(s, v, \delta)|^2 \ge \frac{\delta^2}{\delta^2 + \log |\mathcal{Y}|} |\mathcal{M}^{good}_N(s, v, \delta)|^2$$
such that $\tilde{\mathcal{A}}(s, v, \delta)$ depends on $s$ only via $p_s$ and
$$(i, j) \in \tilde{\mathcal{A}}(s, v, \delta) \;\Rightarrow\; \theta_{ij}(s, v) \le I(s, v) + \delta^2. \qquad (8.20)$$
This claim is seen to hold by contraposition. If there existed a subset $\tilde{\mathcal{A}}^c(s, v, \delta)$ of size $\frac{I(s,v)}{\delta^2 + I(s,v)} |\mathcal{M}^{good}_N(s, v, \delta)|^2$ or larger such that
$$\forall (i, j) \in \tilde{\mathcal{A}}^c(s, v, \delta): \; \theta_{ij}(s, v) > I(s, v) + \delta^2$$
we would have
$$\sum_{i,j \in \mathcal{M}^{good}_N(s, v, \delta)} \theta_{ij}(s, v) > (I(s, v) + \delta^2)\, |\tilde{\mathcal{A}}^c(s, v, \delta)| \ge |\mathcal{M}^{good}_N(s, v, \delta)|^2\, I(s, v)$$
which would contradict (8.18).⁷

⁷ As mentioned by a reviewer, the claim could alternatively be proven by application of Markov's inequality.

Moreover, the interval $[0, \log |\mathcal{Y}|]$ is covered by the finite collection of intervals
$$\Theta_l \triangleq \Big[ \frac{l \delta^2}{2}, \frac{(l+1)\delta^2}{2} \Big), \qquad l = 0, 1, \cdots, \frac{2 \log |\mathcal{Y}|}{\delta^2} \triangleq l_{max}$$
of width $\delta^2/2$, and at least one of these intervals must contain many $\theta_{ij}(s, v)$. Specifically, for some integer $0 \le l < l_{max}$ there must exist a subset $\mathcal{A}(s, v, \delta) \subseteq \tilde{\mathcal{A}}(s, v, \delta)$ with the following properties:
$$(i, j) \in \mathcal{A}(s, v, \delta) \;\Rightarrow\; \theta_{ij}(s, v) \in \Theta_l \;\Rightarrow\; |\theta_{ij}(s, v) - \bar{I}(s, v)| \le \frac{\delta^2}{4}, \qquad (8.21)$$
$$\bar{I}(s, v) \triangleq \Big( l + \frac{1}{2} \Big) \frac{\delta^2}{2}, \qquad \bar{I}(s, v) \le I(s, v) \le \log |\mathcal{Y}|, \qquad (8.22)$$
$\mathcal{A}(s, v, \delta)$ is symmetric with size at least equal to
$$|\mathcal{A}(s, v, \delta)| \ge \frac{\delta^2}{2 \log |\mathcal{Y}|} |\tilde{\mathcal{A}}(s, v, \delta)| \ge \frac{\delta^4}{2 \log |\mathcal{Y}| (\delta^2 + \log |\mathcal{Y}|)} |\mathcal{M}^{good}_N(s, v, \delta)|^2 \ge \frac{\delta^4}{4 \log^2 |\mathcal{Y}|} |\mathcal{M}^{good}_N(s, v, \delta)|^2, \qquad (8.23)$$
and $\mathcal{A}(s, v, \delta)$ depends on $s$ only via $p_s$. To summarize, the subset $\mathcal{A}(s, v, \delta) \subseteq (\mathcal{M}^{good}_N(s, v, \delta))^2$ has size nearly equal to $|\mathcal{M}^{good}_N(s, v, \delta)|^2$ and consists of the indices of the codeword pairs whose conditional self-information $\theta_{ij}(s, v)$ is close to some $\bar{I}(s, v) \le I(s, v)$.
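The identity behind (8.18)–(8.19) — the average of the pairwise divergences $\theta_{ij}$ over independent uniform pairs equals a mutual information — can be checked numerically in a single-letter toy model. A minimal sketch, assuming a hypothetical two-user channel and $X_1, X_2$ i.i.d. uniform (playing the role of the uniform draw from the good set):

```python
import math

def theta_avg_and_mi(channel, symbols):
    """For X1, X2 i.i.d. uniform on `symbols` and channel p(y|x1,x2):
    average of theta_ab = D(p(.|a,b) || p_Y) over all pairs, versus
    I(X1 X2; Y) = H(Y) - H(Y|X1 X2) computed independently via entropies."""
    n = len(symbols)
    ys = sorted({y for d in channel.values() for y in d})
    p_y = {y: sum(channel[(a, b)].get(y, 0.0) for a in symbols
                  for b in symbols) / n**2 for y in ys}
    thetas = [sum(p * math.log(p / p_y[y]) for y, p in channel[(a, b)].items() if p > 0)
              for a in symbols for b in symbols]
    h_y = -sum(p * math.log(p) for p in p_y.values() if p > 0)
    h_y_given_x = -sum(p * math.log(p) for d in channel.values()
                       for p in d.values() if p > 0) / n**2
    return sum(thetas) / len(thetas), h_y - h_y_given_x

# Hypothetical symmetric collusion channel on binary inputs/outputs.
channel = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
           (1, 0): {0: 0.5, 1: 0.5}, (1, 1): {0: 0.1, 1: 0.9}}
avg_theta, mi = theta_avg_and_mi(channel, [0, 1])
```

The two quantities agree to floating-point precision, mirroring the fact that $I(s,v)$ in (8.18) is exactly the conditional mutual information (8.19).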
Recalling (8.19) and the equivalence of the representations $(S, V)$ and $(S, Q)$, we define
$$I(p'_S, q) \triangleq I_{p_T\, p'_S\, p^2_{X|SQT}\, p_{Y|X_1 X_2}}(X_1, X_2; Y \,|\, S, Q = q, T) \qquad \forall p'_S \in \mathcal{P}_S, \; q \in \mathcal{Q}_N \qquad (8.24)$$
which is a linear functional of $p'_S$ and coincides with $I(s, v)$ in (8.19) when $p'_S = p_s$.

Step 5. Define the following subset of $\mathcal{Y}^N$:
$$\tilde{\mathcal{T}}_\delta(s, v, i, j) \triangleq \Big\{ y \in \mathcal{Y}^N : \Big| \frac{1}{N} \sum_{t=1}^N \log \frac{p_{Y|X_1 X_2}(y_t|x_{it}(s, v), x_{jt}(s, v))}{p_{Y_t|SV}(y_t|s, v)} - \theta_{ij}(s, v) \Big| \le \frac{\delta^2}{8} \Big\} \qquad (8.25)$$
which satisfies the symmetry property $\tilde{\mathcal{T}}_\delta(s, v, i, j) = \tilde{\mathcal{T}}_\delta(s, v, j, i)$ and the letter permutation-invariance property (for RM codes)
$$y \in \tilde{\mathcal{T}}_\delta(s, v, i, j) \;\Rightarrow\; \pi(y) \in \tilde{\mathcal{T}}_\delta(\pi(s), v, i, j)$$
for any permutation $\pi$ of $\{1, 2, \cdots, N\}$. We show that $\tilde{\mathcal{T}}_\delta(s, v, i, j)$ is a typical set for $Y$ conditioned on $S = s$, $V = v$, and $\mathcal{K} = \{i, j\}$, in the following sense:
$$Pr[Y \notin \tilde{\mathcal{T}}_\delta(s, v, i, j) \,|\, S = s, V = v, \mathcal{K} = \{i, j\}] \le \frac{64 \log^2 \delta}{N \delta^4}, \qquad \forall s, v, i, j \qquad (8.26)$$
which vanishes as $N \to \infty$. Indeed we may rewrite (8.26) as
$$Pr[Y \notin \tilde{\mathcal{T}}_\delta(s, v, i, j) \,|\, S = s, V = v, \mathcal{K} = \{i, j\}] = Pr\Big[ |\hat{\theta}_{ij}(s, v) - \theta_{ij}(s, v)| \ge \frac{\delta^2}{8} \,\Big|\, S = s, V = v, \mathcal{K} = \{i, j\} \Big] \qquad (8.27)$$
where
$$\hat{\theta}_{ij}(s, v) \triangleq \frac{1}{N} \sum_{t=1}^N \log \frac{p_{Y|X_1 X_2}(Y_t|x_{it}(s, v), x_{jt}(s, v))}{p_{Y_t|SV}(Y_t|s, v)}. \qquad (8.28)$$
Since $Y_t$, $1 \le t \le N$, are conditionally independent given $S, V, \mathcal{K}$, $\hat{\theta}_{ij}(s, v)$ is the average of $N$ random variables that are conditionally independent given $S = s$, $V = v$, $\mathcal{K} = \{i, j\}$.
Recalling (8.16), the conditional expectation of these random variables is given by
$$E_{Y_t|\mathbf{S}V\mathcal{K}} \left[ \log \frac{p_{Y|X_1X_2}(Y_t \,|\, x_{it}(\mathbf{s},v), x_{jt}(\mathbf{s},v))}{p_{Y_t|SV}(Y_t \,|\, \mathbf{s}, v)} \right] = \theta_{ij,t}(\mathbf{s},v), \qquad 1 \le t \le N, \tag{8.29}$$
and averaging (8.29) over $t$ yields $E_{\mathbf{Y}|\mathbf{S}V\mathcal{K}}(\hat{\theta}_{ij}(\mathbf{s},v)) = \theta_{ij}(\mathbf{s},v)$. The conditional variances of these random variables are
$$\zeta_t(\mathbf{s},v,i,j) \,\triangleq\, \mathrm{var}_{Y_t|\mathbf{S}V\mathcal{K}} \left[ \log \frac{p_{Y|X_1X_2}(Y_t \,|\, x_{it}(\mathbf{s},v), x_{jt}(\mathbf{s},v))}{p_{Y_t|SV}(Y_t \,|\, \mathbf{s}, v)} \right], \qquad 1 \le t \le N. \tag{8.30}$$
By our assumption (8.6) that $p_{Y|X_1X_2}(y|x_1,x_2) \ge \delta$ for every $y, x_1, x_2$, the argument of the log above is in the range $[\delta, 1/\delta]$. Hence $\zeta_t(\mathbf{s},v,i,j) \le \log^2 \delta$, and
$$\mathrm{var}_{\mathbf{Y}|\mathbf{S}V\mathcal{K}}(\hat{\theta}_{ij}(\mathbf{s},v)) = \frac{1}{N^2} \sum_{t=1}^N \zeta_t(\mathbf{s},v,i,j) \le \frac{\log^2 \delta}{N}.$$
By Chebyshev's inequality, the probability in (8.27) is upper-bounded by
$$\frac{E_{\mathbf{Y}|\mathbf{S}V\mathcal{K}}[(\hat{\theta}_{ij}(\mathbf{s},v) - \theta_{ij}(\mathbf{s},v))^2]}{(\delta^2/8)^2} = \frac{\mathrm{var}_{\mathbf{Y}|\mathbf{S}V\mathcal{K}}(\hat{\theta}_{ij}(\mathbf{s},v))}{(\delta^2/8)^2} \le \frac{64 \log^2 \delta}{N \delta^4},$$
which establishes (8.26).

Step 6. Define the following subsets of $\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)$, indexed by $i \in \mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)$:
$$\mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta) \,\triangleq\, \{ j \in \mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta) : (i,j) \in \mathcal{A}(\mathbf{s},v,\delta) \}, \tag{8.31}$$
which depend on $\mathbf{s}$ only via $p_{\mathbf{s}}$. We show that the typical sets $\tilde{T}_\delta(\mathbf{s},v,i,j)$, $j \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)$, have weak overlap for any fixed $\mathbf{s}, v, i$. Define the overlap factor of the good sets at $\mathbf{Y} = \mathbf{y}$:
$$M_\delta(\mathbf{y}, \mathbf{s}, v, i) \,\triangleq\, \sum_{k \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \mathbb{1}\{ \mathbf{y} \in \tilde{T}_\delta(\mathbf{s},v,i,k) \}. \tag{8.32}$$
We show there exists $\delta^* > 0$ such that
$$Pr[M_\delta(\mathbf{Y}, \mathbf{s}, v, i) > 2^{3N\sqrt{\delta}} \,|\, \mathbf{S} = \mathbf{s}, V = v, \mathcal{K} = \{i,j\}, \mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j)] < \frac{1}{N}, \qquad \forall\, N > \delta^{-8},\; \delta < \delta^*. \tag{8.33}$$
To do so, define the normalized loglikelihood ratio
$$\hat{D}_{ijk}(\mathbf{Y}) = \frac{1}{N} \log \frac{p^N_{Y|X_1X_2}(\mathbf{Y} \,|\, \mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v))}{p^N_{Y|X_1X_2}(\mathbf{Y} \,|\, \mathbf{x}_i(\mathbf{s},v), \mathbf{x}_k(\mathbf{s},v))}.$$
(8.34) If $\mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j) \cap \tilde{T}_\delta(\mathbf{s},v,i,k)$ for some $j, k \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)$, then
$$\hat{D}_{ijk}(\mathbf{Y}) \le |\hat{D}_{ijk}(\mathbf{Y})| \overset{(a)}{\le} |\theta_{ij}(\mathbf{s},v) - \theta_{ik}(\mathbf{s},v)| + 2 \times \frac{\delta^2}{8} \overset{(b)}{\le} \frac{3\delta^2}{4} \tag{8.35}$$
where inequality (a) follows from (8.25) and (b) from (8.21) and the fact that both $(i,j)$ and $(i,k)$ are in $\mathcal{A}(\mathbf{s},v,\delta)$. If $j \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)$ and $\mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j)$, it follows from (8.32) and (8.35) that
$$M_\delta(\mathbf{Y}, \mathbf{s}, v, i) = \sum_{k \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \mathbb{1}\{ \mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j) \cap \tilde{T}_\delta(\mathbf{s},v,i,k) \} \,\le\, \hat{\zeta}(\mathbf{Y})\, \mathbb{1}\{ \mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j) \} \tag{8.36}$$
where we have defined the random variable
$$\hat{\zeta}(\mathbf{Y}) \,\triangleq\, \sum_{k \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \mathbb{1}\left\{ \hat{D}_{ijk}(\mathbf{Y}) \le \frac{3\delta^2}{4} \right\}. \tag{8.37}$$
Now recalling the definition of the normalized divergence $D_{ijk}$ in (8.7), define
$$\zeta \,\triangleq\, \sum_{k \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \mathbb{1}\{ D_{ijk} \le \delta^2 \} \overset{(a)}{\le} \sum_{k \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \mathbb{1}\{ d_H(\mathbf{x}_j(\mathbf{s},v), \mathbf{x}_k(\mathbf{s},v)) \le N\delta \} \le \sum_{k \in \mathcal{M}_N} \mathbb{1}\{ d_H(\mathbf{x}_j(\mathbf{s},v), \mathbf{x}_k(\mathbf{s},v)) \le N\delta \} \overset{(b)}{=} |\mathcal{M}_j(\mathbf{s},v,\delta)| \overset{(c)}{\le} 2^{3N\sqrt{\delta}} \tag{8.38}$$
where inequality (a) follows from (8.9), (b) from (8.3), and (c) from (8.4). In Appendix C, we show that $\hat{\zeta}(\mathbf{Y}) \le \zeta$ with probability approaching 1 as $N \to \infty$; more specifically,
$$Pr[\hat{\zeta}(\mathbf{Y}) > \zeta \,|\, \mathbf{S} = \mathbf{s}, V = v, \mathcal{K} = \{i,j\}] < |\mathcal{X}|^3 |\mathcal{Y}|\, (N+1)^{|\mathcal{X}|^3}\, 2^{-N\delta^7} \tag{8.39}$$
for all $\delta$ smaller than some $\delta^{**} > 0$.
Then there exists some $\delta^* \in (0, \delta^{**})$ such that
$$\begin{aligned}
Pr[M_\delta(\mathbf{Y},\mathbf{s},v,i) > 2^{3N\sqrt{\delta}} \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}, \mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j)]
&\overset{(a)}{\le} Pr[\hat{\zeta}(\mathbf{Y}) > 2^{3N\sqrt{\delta}} \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}, \mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j)] \\
&\le \frac{Pr[\hat{\zeta}(\mathbf{Y}) > 2^{3N\sqrt{\delta}} \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}]}{Pr[\mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j) \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}]} \\
&\overset{(b)}{\le} \frac{Pr[\hat{\zeta}(\mathbf{Y}) > 2^{3N\sqrt{\delta}} \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}]}{1 - \frac{64\log^2\delta}{N\delta^4}} \\
&\overset{(c)}{\le} \frac{Pr[\hat{\zeta}(\mathbf{Y}) > \zeta \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}]}{1 - \frac{64\log^2\delta}{N\delta^4}} \\
&\overset{(d)}{<} \frac{1}{N}, \qquad \forall\, N > \delta^{-8},\; \delta < \delta^*,
\end{aligned}$$
where (a) follows from (8.36), (b) from (8.26), (c) from (8.38), and (d) from (8.39). This establishes (8.33).

Step 7. We now prune the typical sets $\tilde{T}_\delta(\mathbf{s},v,i,j)$ to exclude the points $\mathbf{y}$ that are covered by more than $2^{3N\sqrt{\delta}}$ of the typical sets. For each $\mathbf{s}, v, i, j$, define the pruned typical set
$$T_\delta(\mathbf{s},v,i,j) \,\triangleq\, \{ \mathbf{y} \in \tilde{T}_\delta(\mathbf{s},v,i,j) : M_\delta(\mathbf{y},\mathbf{s},v,i) \le 2^{3N\sqrt{\delta}} \}. \tag{8.40}$$
It follows from (8.40) and (8.32) that
$$\forall\, \mathbf{y}, \mathbf{s}, v, i: \quad \sum_{j \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \mathbb{1}\{ \mathbf{y} \in T_\delta(\mathbf{s},v,i,j) \} = M_\delta(\mathbf{y},\mathbf{s},v,i) \le 2^{3N\sqrt{\delta}}. \tag{8.41}$$
The pruned set $T_\delta(\mathbf{s},v,i,j)$ is still typical for $\mathbf{Y}$ conditioned on $\mathbf{S} = \mathbf{s}$, $V = v$, and $\mathcal{K} = \{i,j\}$, because
$$\begin{aligned}
Pr[\mathbf{Y} \notin T_\delta(\mathbf{s},v,i,j) \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}] &\overset{(a)}{\le} Pr[\mathbf{Y} \notin \tilde{T}_\delta(\mathbf{s},v,i,j) \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}] \\
&\quad + Pr[M_\delta(\mathbf{Y},\mathbf{s},v,i) > 2^{3N\sqrt{\delta}} \,|\, \mathbf{Y} \in \tilde{T}_\delta(\mathbf{s},v,i,j), \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}] \\
&\overset{(b)}{\le} \frac{64\log^2\delta}{N\delta^4} + \frac{1}{N} \overset{(c)}{\le} \frac{72\log^2\delta}{N\delta^4}, \qquad \forall\, \mathbf{s}, v, i, j,\; N > \delta^{-8},\; \delta < \delta^*, \tag{8.42}
\end{aligned}$$
where (a) follows from the definition (8.40), (b) from the inequalities (8.26) and (8.33), and (c) holds because $\delta < \frac{1}{2}$.

Step 8. Define the typical set for $\mathbf{S}$ in the variational-distance sense:
$$T_\delta \,\triangleq\, \{ \mathbf{s} : d_V(p_{\mathbf{s}}, p_S) \le \delta \}.$$
(8.43) We have the inequality
$$\begin{aligned}
Pr[\mathbf{S} \notin T_\delta] = \sum_{T_{\mathbf{s}}:\, d_V(p_{\mathbf{s}}, p_S) > \delta} P^N_S(T_{\mathbf{s}})
&\overset{(a)}{\le} \sum_{p_{\mathbf{s}}:\, d_V(p_{\mathbf{s}}, p_S) > \delta} 2^{-N D(p_{\mathbf{s}} \| p_S)} \\
&\overset{(b)}{\le} (N+1)^{|\mathcal{S}|} \max_{p_{\mathbf{s}}:\, d_V(p_{\mathbf{s}}, p_S) > \delta} 2^{-N D(p_{\mathbf{s}} \| p_S)} \\
&\overset{(c)}{\le} (N+1)^{|\mathcal{S}|} \max_{p_{\mathbf{s}}:\, D(p_{\mathbf{s}} \| p_S) > \delta^2/\ln 4} 2^{-N D(p_{\mathbf{s}} \| p_S)} \,\le\, (N+1)^{|\mathcal{S}|}\, 2^{-N \delta^2 / \ln 4}, \tag{8.44}
\end{aligned}$$
where in (a) we have used the upper bound of [12, p. 32] on the probability of a type class, in (b) the fact that the number of type classes $T_{\mathbf{s}}$ is at most $(N+1)^{|\mathcal{S}|}$ [12, p. 29], and in (c) Pinsker's inequality $D(p\|q) \ge d_V^2(p,q)/\ln 4$ [12, p. 58]. Applying successively (8.24) and (8.43), we have
$$|I(p_{\mathbf{s}}, \mathbf{q}) - I(p_S, \mathbf{q})| = \left| \sum_{s \in \mathcal{S}} (p_{\mathbf{s}}(s) - p_S(s))\, I(X_1, X_2; Y \,|\, S = s, Q = \mathbf{q}, T) \right| \le \delta \max_{s \in \mathcal{S}} I(X_1, X_2; Y \,|\, S = s, Q = \mathbf{q}, T) \le \delta \log |\mathcal{Y}|, \qquad \forall\, \mathbf{s} \in T_\delta,\; \mathbf{q} \in \mathcal{Q}^N. \tag{8.45}$$

Step 9. Given $f_N, g_N, p_{Y|X_1X_2}, \mathbf{s}, v$, we will be interested in several conditional probabilities that correct decoding occurs in conjunction with the typical event $\mathbf{Y} \in T_\delta(\mathbf{s},v,\mathcal{K})$. Define the following shorthands:
$$P_c(i,j \,|\, \mathbf{s}, v) = Pr[\text{correct decoding and } \mathbf{Y} \in T_\delta(\mathbf{s},v,i,j) \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K}=\{i,j\}] = \sum_{\mathbf{y} \in T_\delta(\mathbf{s},v,i,j) \cap (\mathcal{D}_i(\mathbf{s},v) \cup \mathcal{D}_j(\mathbf{s},v))} p^N_{Y|X_1X_2}(\mathbf{y} \,|\, \mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v)), \tag{8.46}$$
$$P_c^{\mathrm{good}}(\mathbf{s},v) = Pr[\text{correct decoding and } \mathbf{Y} \in T_\delta(\mathbf{s},v,\mathcal{K}) \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K} \in (\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta))^2] = \frac{1}{|\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)|^2} \sum_{i,j \in \mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)} P_c(i,j \,|\, \mathbf{s}, v). \tag{8.47}$$
Note that $P_c(i,j \,|\, \mathbf{s}, v)$ depends on $\mathbf{s}$ only via its type $p_{\mathbf{s}}$ (because RM codes are used).
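Step (c) of (8.44) relies on Pinsker's inequality in bits, $D(p\|q) \ge d_V^2(p,q)/\ln 4$ with $d_V$ the $L_1$ distance. The following sketch (illustrative only; the random test distributions are hypothetical) checks the inequality numerically.

```python
import math
import random

# Numerical check of Pinsker's inequality as used in (8.44):
# D(p||q) >= d_V(p,q)^2 / ln 4, with D in bits and d_V the L1 distance.
def D_bits(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def d_V(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def rand_pmf(k, rng):
    v = [rng.random() + 1e-3 for _ in range(k)]  # bounded away from 0
    s = sum(v)
    return [x / s for x in v]

rng = random.Random(2)
for _ in range(1000):
    k = rng.randint(2, 6)
    p, q = rand_pmf(k, rng), rand_pmf(k, rng)
    assert D_bits(p, q) >= d_V(p, q)**2 / math.log(4) - 1e-12
print("Pinsker verified on 1000 random pairs")
```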
The conditional probability of correct decoding given $(\mathbf{s},v)$ and the event that both colluders are assigned good codewords is
$$\begin{aligned}
\bar{P}_c^{\mathrm{good}}(\mathbf{s},v) = Pr[\text{correct decoding} \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K} \in (\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta))^2]
&\overset{(a)}{\le} P_c^{\mathrm{good}}(\mathbf{s},v) + Pr[\mathbf{Y} \notin T_\delta(\mathbf{s},v,\mathcal{K}) \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K} \in (\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta))^2] \\
&\overset{(b)}{\le} P_c^{\mathrm{good}}(\mathbf{s},v) + \frac{72\log^2\delta}{N\delta^4}, \qquad \forall\, N > \delta^{-8},\; \delta < \delta^*, \tag{8.48}
\end{aligned}$$
where (a) and (b) follow from (8.47) and (8.42), respectively. For any subset $\mathcal{B} \subseteq (\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta))^2$, possibly dependent on $\mathbf{s}, v$, we also define
$$P_c(\mathcal{B} \,|\, \mathbf{s}, v) \,\triangleq\, \frac{1}{|\mathcal{B}|} \sum_{(i,j) \in \mathcal{B}} P_c(i,j \,|\, \mathbf{s}, v). \tag{8.49}$$
Combining (8.47) and (8.49), we have
$$P_c^{\mathrm{good}}(\mathbf{s},v) = \frac{|\mathcal{B}|}{|\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)|^2} P_c(\mathcal{B} \,|\, \mathbf{s}, v) + \left(1 - \frac{|\mathcal{B}|}{|\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)|^2}\right) \underbrace{P_c((\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta))^2 \setminus \mathcal{B} \,|\, \mathbf{s}, v)}_{\le 1} \;\le\; 1 - \frac{|\mathcal{B}|}{|\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)|^2} \left(1 - P_c(\mathcal{B} \,|\, \mathbf{s}, v)\right). \tag{8.50}$$
Applying this inequality to $\mathcal{B} = \mathcal{A}(\mathbf{s},v,\delta)$ and using the cardinality bound (8.23) yields
$$P_c^{\mathrm{good}}(\mathbf{s},v) \le 1 - \frac{\delta^4}{4\log^2|\mathcal{Y}|} \left(1 - P_c(\mathcal{A}(\mathbf{s},v,\delta) \,|\, \mathbf{s}, v)\right). \tag{8.51}$$
Until this point, no assumption has been made on the size of the set $\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)$. We now assume (this assumption will be relaxed in Steps 10 and 11 of the proof) that
$$|\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)| \ge 2^{N[R - \delta^2/3]}. \tag{8.52}$$
Hence (8.23) implies $|\mathcal{A}(\mathbf{s},v,\delta)| > 2^{N(2R - \delta^2)}$ for $N$ larger than some $N_0(\delta)$.
Then we obtain the sphere-packing inequality
$$\begin{aligned}
P_c(\mathcal{A}(\mathbf{s},v,\delta) \,|\, \mathbf{s}, v)
&\overset{(a)}{=} \frac{1}{|\mathcal{A}(\mathbf{s},v,\delta)|} \sum_{(i,j) \in \mathcal{A}(\mathbf{s},v,\delta)} P_c(i,j \,|\, \mathbf{s}, v) \\
&\overset{(b)}{=} \frac{1}{|\mathcal{A}(\mathbf{s},v,\delta)|} \sum_{(i,j) \in \mathcal{A}(\mathbf{s},v,\delta)} \sum_{\mathbf{y} \in T_\delta(\mathbf{s},v,i,j) \cap (\mathcal{D}_i(\mathbf{s},v) \cup \mathcal{D}_j(\mathbf{s},v))} p^N_{Y|X_1X_2}(\mathbf{y} \,|\, \mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v)) \\
&\overset{(c)}{=} \frac{2}{|\mathcal{A}(\mathbf{s},v,\delta)|} \sum_{(i,j) \in \mathcal{A}(\mathbf{s},v,\delta)} \sum_{\mathbf{y} \in T_\delta(\mathbf{s},v,i,j) \cap \mathcal{D}_i(\mathbf{s},v)} p^N_{Y|X_1X_2}(\mathbf{y} \,|\, \mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v)) \\
&< 2^{-N(2R - \delta^2) + 1} \sum_{(i,j) \in \mathcal{A}(\mathbf{s},v,\delta)} \sum_{\mathbf{y} \in T_\delta(\mathbf{s},v,i,j) \cap \mathcal{D}_i(\mathbf{s},v)} p^N_{Y|X_1X_2}(\mathbf{y} \,|\, \mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v)) \\
&\overset{(d)}{\le} 2^{-N(2R - \delta^2) + 1} \sum_{i=1}^{2^{NR}} \sum_{j \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \sum_{\mathbf{y} \in T_\delta(\mathbf{s},v,i,j) \cap \mathcal{D}_i(\mathbf{s},v)} p^N_{Y|X_1X_2}(\mathbf{y} \,|\, \mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v)) \\
&\overset{(e)}{\le} 2^{-N(2R - \delta^2) + 1} \sum_{i=1}^{2^{NR}} \sum_{j \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} 2^{N[\theta_{ij}(\mathbf{s},v) + \delta^2/8]} \sum_{\mathbf{y} \in T_\delta(\mathbf{s},v,i,j) \cap \mathcal{D}_i(\mathbf{s},v)} r(\mathbf{y} \,|\, \mathbf{s}, v) \\
&\overset{(f)}{\le} 2^{-N(2R - \delta^2 - I(\mathbf{s},v) - \delta^2 - \delta^2/8) + 1} \sum_{i=1}^{2^{NR}} \sum_{j \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \sum_{\mathbf{y} \in T_\delta(\mathbf{s},v,i,j) \cap \mathcal{D}_i(\mathbf{s},v)} r(\mathbf{y} \,|\, \mathbf{s}, v) \\
&= 2^{-N(2R - I(\mathbf{s},v) - 17\delta^2/8) + 1} \sum_{i=1}^{2^{NR}} \sum_{j \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)} \sum_{\mathbf{y} \in \mathcal{D}_i(\mathbf{s},v)} \mathbb{1}\{\mathbf{y} \in T_\delta(\mathbf{s},v,i,j)\}\, r(\mathbf{y} \,|\, \mathbf{s}, v) \\
&\overset{(g)}{\le} 2^{-N(2R - I(\mathbf{s},v) - 17\delta^2/8) + 1}\, \underbrace{\sum_{i=1}^{2^{NR}} \sum_{\mathbf{y} \in \mathcal{D}_i(\mathbf{s},v)} r(\mathbf{y} \,|\, \mathbf{s}, v)}_{=1}\; 2^{3N\sqrt{\delta}} \\
&< 2^{-N(2R - I(\mathbf{s},v) - 3\delta^2 - 3\sqrt{\delta})}, \qquad \forall\, N > \frac{8}{7\delta^2}, \tag{8.53}
\end{aligned}$$
where (a) follows from (8.49), (b) from (8.46), (c) holds because the decoding sets $\mathcal{D}_i(\mathbf{s},v)$ are disjoint and because of the symmetry of $p_{Y|X_1X_2}$, $\mathcal{A}(\mathbf{s},v,\delta)$, and $T_\delta(\mathbf{s},v,i,j)$; (d) holds because $\mathcal{A}(\mathbf{s},v,\delta) \subseteq \{(i,j) : j \in \mathcal{M}_N^{\mathcal{A}\text{-good}}(\mathbf{s},v,i,\delta)\}$; (e) follows from (8.25) and (8.15); and (f) and (g) follow from (8.20) and (8.41), respectively. Combining (8.48), (8.51), and (8.53) yields
$$\bar{P}_c^{\mathrm{good}}(\mathbf{s},v) \le 1 - \frac{\delta^4}{4\log^2|\mathcal{Y}|} \left(1 - 2^{-N(2R - I(\mathbf{s},v) - 3\delta^2 - 3\sqrt{\delta})}\right) + \frac{72\log^2\delta}{N\delta^4}, \qquad \forall\, N > \delta^{-8},\; \delta < \delta^*.$$
(8.54) Observe that for all $2R > I(\mathbf{s},v) + \delta\log|\mathcal{Y}| + 3\delta^2 + 3\sqrt{\delta}$, the conditional correct-decoding probability
$$\bar{P}_c^{\mathrm{good}}(\mathbf{s},v) \,\lesssim\, 1 - \frac{\delta^4}{4\log^2|\mathcal{Y}|}$$
is bounded away from 1 as $N \to \infty$, under the assumption that $|\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta)| \ge 2^{N[R - \delta^2/3]}$.

Step 10. We now relax the assumption (8.52). If (8.52) does not hold, then $|\mathcal{M}_N^{\mathrm{bad}}(\mathbf{s},v,\delta)| \ge 2^{NR}(1 - 2^{-N\delta^2/3})$. Further assume both colluders are assigned bad codewords. (This last assumption is relaxed in Step 11.) Analogously to (8.48), we define the conditional probability of correct decoding,
$$\bar{P}_c^{\mathrm{bad}}(\mathbf{s},v) \,\triangleq\, Pr[\text{correct decoding} \,|\, \mathbf{S}=\mathbf{s}, V=v, \mathcal{K} \in (\mathcal{M}_N^{\mathrm{bad}}(\mathbf{s},v,\delta))^2]. \tag{8.55}$$
We show that this probability vanishes as $N \to \infty$, for any rate $R > 0$ and any channel $p_{Y|X_1X_2}$. In particular, for $\delta < \frac{1}{4000}$ we have
$$\bar{P}_c^{\mathrm{bad}}(\mathbf{s},v) \le \frac{2}{N\delta^2}, \qquad \forall\, \mathbf{s}, v. \tag{8.56}$$
The proof of (8.56) uses the same techniques as in Steps 3, 4, 5, 9, and is given in Appendix D.

Step 11. We finally consider the most general scenario in which a mix of good codewords and bad codewords is used, and the mix depends on $(\mathbf{s},v)$. By application of the inequality $Pr[A] \le Pr[A \cap B] + Pr[B^c]$ for any two events $A$ and $B$, we obtain the following two upper bounds on the correct-decoding probability, conditioned on $\mathbf{S} = \mathbf{s}$ and $V = v$:
$$P_c(\mathbf{s},v) = Pr[\text{correct decoding} \,|\, \mathbf{S}=\mathbf{s}, V=v] \le \min\left\{ Pr[\mathcal{K} \notin (\mathcal{M}_N^{\mathrm{good}}(\mathbf{s},v,\delta))^2] + \bar{P}_c^{\mathrm{good}}(\mathbf{s},v),\; Pr[\mathcal{K} \notin (\mathcal{M}_N^{\mathrm{bad}}(\mathbf{s},v,\delta))^2] + \bar{P}_c^{\mathrm{bad}}(\mathbf{s},v) \right\}. \tag{8.57}$$
Let $\beta_N(\mathbf{s},v) \triangleq 2^{-NR} |\mathcal{M}_N^{\mathrm{bad}}(\mathbf{s},v,\delta)| \in [0,1]$ be the fraction of bad codewords. Substituting (8.56) into (8.57) yields
$$P_c(\mathbf{s},v) \le \min\left\{ 1 - (1 - \beta_N(\mathbf{s},v))^2 + \bar{P}_c^{\mathrm{good}}(\mathbf{s},v),\; 1 - \beta_N^2(\mathbf{s},v) \right\} + \frac{2}{N\delta^2}.$$
The first argument of $\min\{\cdot,\cdot\}$ increases with $\beta_N$ and the second decreases.
The value of $\beta_N(\mathbf{s},v)$ that maximizes the expression above is the equalizer, $\frac{1}{2}[1 - \bar{P}_c^{\mathrm{good}}(\mathbf{s},v)]$, and thus we obtain
$$P_c(\mathbf{s},v) \le 1 - \frac{[1 - \bar{P}_c^{\mathrm{good}}(\mathbf{s},v)]^2}{4} + \frac{2}{N\delta^2}, \qquad \forall\, \beta_N(\mathbf{s},v) \in [0,1]. \tag{8.58}$$
Hence if $\bar{P}_c^{\mathrm{good}}(\mathbf{s},v)$ is bounded away from 1, so is $P_c(\mathbf{s},v)$. In particular, if (8.52) does not hold, then $\beta_N(\mathbf{s},v) \ge 1 - 2^{-N\delta^2/3}$, and
$$P_c(\mathbf{s},v) \le 1 - (1 - 2^{-N\delta^2/3})^2 + \frac{2}{N\delta^2},$$
which vanishes as $N \to \infty$ for all $R > 0$. Conversely, if (8.52) holds, then so does the upper bound (8.54) on $\bar{P}_c^{\mathrm{good}}(\mathbf{s},v)$. Using this upper bound together with the inequality $(b-a)^2 \ge b^2 - 2ac$, which is valid for $0 < a < b < c$, applied with
$$c = \frac{\delta^4}{4\log^2|\mathcal{Y}|}, \qquad b = c\left(1 - 2^{-N(2R - I(\mathbf{s},v) - 3\delta^2 - 3\sqrt{\delta})}\right), \qquad a = \frac{72\log^2\delta}{N\delta^4},$$
we obtain
$$[1 - \bar{P}_c^{\mathrm{good}}(\mathbf{s},v)]^2 \ge (b - a)^2 \ge b^2 - 2ac = \frac{\delta^8}{16\log^4|\mathcal{Y}|}\left(1 - 2^{-N(2R - I(\mathbf{s},v) - 3\delta^2 - 3\sqrt{\delta})}\right)^2 - \frac{36\log^2\delta}{N\log^2|\mathcal{Y}|}, \qquad \forall\, N > \delta^{-8},\; \delta < \delta^*.$$
Combining this inequality with (8.58) yields
$$P_c(\mathbf{s},v) \le 1 - \frac{\delta^8}{64\log^4|\mathcal{Y}|}\left(1 - 2^{-N(2R - I(\mathbf{s},v) - 3\delta^2 - 3\sqrt{\delta})}\right)^2 + \frac{9\log^2\delta}{N\log^2|\mathcal{Y}|} + \frac{2}{N\delta^2}, \qquad \forall\, N > \delta^{-8},\; \delta < \delta^*. \tag{8.59}$$

Step 12. We shall maximize the upper bound of (8.59) over $\mathbf{s} \in T_\delta$ and $v \in \mathcal{V}^N$, which amounts to maximizing $I(\mathbf{s},v)$ in the exponent. In view of the equivalence of the representations $(\mathbf{s},v)$ and $(p_{\mathbf{s}}, \mathbf{q})$, and recalling (8.24), we have
$$\max_{\mathbf{s} \in T_\delta} \max_{v \in \mathcal{V}^N} I(\mathbf{s},v) = \max_{p_{\mathbf{s}}:\, d_V(p_{\mathbf{s}}, p_S) \le \delta} \,\max_{\mathbf{q} \in \mathcal{Q}^N} I(p_{\mathbf{s}}, \mathbf{q}) \le \max_{\mathbf{q} \in \mathcal{Q}^N} I(p_S, \mathbf{q}) + \delta\log|\mathcal{Y}|, \tag{8.60}$$
where the inequality follows from (8.45).
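The equalizer step leading to (8.58) can be checked numerically: for the min of an increasing and a decreasing function of $\beta$, the maximum is attained where the two arms meet. A minimal sketch (illustrative only):

```python
# Check of the equalizer step leading to (8.58): for a fixed P in [0,1)
# (standing in for the correct-decoding probability bound), the function
#   f(beta) = min(1 - (1-beta)^2 + P, 1 - beta^2)
# over beta in [0,1] is maximized at beta* = (1-P)/2, where both arms are
# equal, giving the value 1 - (1-P)^2/4 used in (8.58).
def f(beta, P):
    return min(1 - (1 - beta)**2 + P, 1 - beta**2)

for P in (0.0, 0.3, 0.7, 0.95):
    beta_star = (1 - P) / 2
    grid_best = max(f(b / 10000, P) for b in range(10001))
    assert abs(f(beta_star, P) - (1 - (1 - P)**2 / 4)) < 1e-12
    assert grid_best <= f(beta_star, P) + 1e-12   # beta* attains the maximum
    assert grid_best >= f(beta_star, P) - 1e-3    # up to grid resolution
print("equalizer beta* = (1-P)/2 verified")
```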
The probability of correct decoding satisfies
$$\begin{aligned}
P_c(f_N, g_N, p_{Y|X_1X_2}) &= \sum_{\mathbf{s} \in \mathcal{S}^N} p^N_S(\mathbf{s}) \sum_{v \in \mathcal{V}^N} p_V(v)\, P_c(\mathbf{s},v) \,\le\, Pr[\mathbf{S} \notin T_\delta] + \max_{\mathbf{s} \in T_\delta} \max_{v \in \mathcal{V}^N} P_c(\mathbf{s},v) \\
&\overset{(a)}{\le} (N+1)^{|\mathcal{S}|} 2^{-N\delta^2/\ln 4} + \frac{9\log^2\delta}{N\log^2|\mathcal{Y}|} + \frac{2}{N\delta^2} + 1 - \frac{\delta^8}{64\log^4|\mathcal{Y}|}\left(1 - \max_{\mathbf{s} \in T_\delta} \max_{v \in \mathcal{V}^N} 2^{-N(2R - I(\mathbf{s},v) - 3\delta^2 - 3\sqrt{\delta})}\right)^2 \\
&\overset{(b)}{\le} (N+1)^{|\mathcal{S}|} 2^{-N\delta^2/\ln 4} + \frac{9\log^2\delta}{N\log^2|\mathcal{Y}|} + \frac{2}{N\delta^2} + 1 - \frac{\delta^8}{64\log^4|\mathcal{Y}|}\left(1 - 2^{-N(2R - \max_{\mathbf{q}} I(p_S,\mathbf{q}) - \delta\log|\mathcal{Y}| - 3\delta^2 - 3\sqrt{\delta})}\right)^2, \quad \forall\, N > \delta^{-8},\; \delta < \delta^*, \tag{8.61}
\end{aligned}$$
where (a) follows from (8.44) and (8.59) and (b) from (8.60). Thus for all $2R > \max_{\mathbf{q}} I(p_S, \mathbf{q}) + \delta\log|\mathcal{Y}| + 3\delta^2 + 3\sqrt{\delta}$,
$$P_c(f_N, g_N, p_{Y|X_1X_2}) \,\lesssim\, 1 - \frac{\delta^8}{64\log^4|\mathcal{Y}|} \quad \text{as } N \to \infty \tag{8.62}$$
is bounded away from 1.

Step 13. We now bound $\max_{\mathbf{q}} I(p_S, \mathbf{q})$ in (8.61) by a quantity that does not depend on $N$. Since
$$I_{p_Q\, p_T\, p_S\, p^2_{X|SQT}\, p_{Y|X_1X_2}}(X_1, X_2; Y \,|\, S, Q, T) = \sum_{\mathbf{q} \in \mathcal{Q}^N} p_Q(\mathbf{q})\, I(p_S, \mathbf{q}),$$
we have
$$\max_{\mathbf{q} \in \mathcal{Q}^N} I(p_S, \mathbf{q}) = \max_{p_Q \in \mathcal{P}_Q} I_{p_Q\, p_T\, p_S\, p^2_{X|SQT}\, p_{Y|X_1X_2}}(X_1, X_2; Y \,|\, S, Q, T) \overset{(a)}{\le} \max_{p_{QT} \in \mathcal{P}_{QT}} I_{p_{QT}\, p_S\, p^2_{X|SQT}\, p_{Y|X_1X_2}}(X_1, X_2; Y \,|\, S, Q, T) \overset{(b)}{=} \max_{p_W \in \mathcal{P}_W} I_{p_W\, p_S\, p^2_{X|SW}\, p_{Y|X_1X_2}}(X_1, X_2; Y \,|\, S, W), \tag{8.63}$$
where (a) holds because the maximization is over a larger domain ($p_{QT}$ is now unconstrained over $\mathcal{W}_N \triangleq \mathcal{Q}^N \times \{1, 2, \cdots, N\}$), and (b) is obtained by defining the random variable $W = (Q, T) \in \mathcal{W}_N$. Moreover
$$\max_{p_W \in \mathcal{P}_W} I(X_1, X_2; Y \,|\, S, W) \le \sup_{L} \max_{p_W \in \mathcal{P}_W} I(X_1, X_2; Y \,|\, S, W) = \lim_{L \to \infty} \max_{p_W \in \mathcal{P}_W} I(X_1, X_2; Y \,|\, S, W), \tag{8.64}$$
where the alphabet for $W$ on the right side is $\{1, 2, \cdots, L\}$, and the supremum and the limit are equal because the supremand is nondecreasing in $L$.
Combining (8.61), (8.63), and (8.64), we conclude that
$$P_c^*(f_N, g_N, \mathcal{W}^{\mathrm{fair}}_{K,\delta}) \,\triangleq\, \min_{p_{Y|X_1X_2} \in \mathcal{W}^{\mathrm{fair}}_{K,\delta}} P_c(f_N, g_N, p_{Y|X_1X_2}) \tag{8.65}$$
is bounded away from 1 as $N \to \infty$ for all $\delta \in (0, \delta^*)$ and all sequences of codes $(f_N, g_N)$ of rate
$$R > \frac{1}{2} \left[ \min_{p_{Y|X_1X_2} \in \mathcal{W}^{\mathrm{fair}}_{K,\delta}} \lim_{L \to \infty} \max_{p_W \in \mathcal{P}_W} I(X_1, X_2; Y \,|\, S, W) + \delta\log|\mathcal{Y}| + 3\delta^2 + 3\sqrt{\delta} \right].$$
Letting $\delta \downarrow 0$, we conclude that reliable decoding is possible only if
$$R \le \min_{p_{Y|X_1X_2} \in \mathcal{W}^{\mathrm{fair}}_K} \lim_{L \to \infty} \max_{p_W \in \mathcal{P}_W} \frac{1}{2} I(X_1, X_2; Y \,|\, S, W) = \lim_{L \to \infty} \min_{p_{Y|X_1X_2} \in \mathcal{W}^{\mathrm{fair}}_K} \max_{p_W \in \mathcal{P}_W} \frac{1}{2} I(X_1, X_2; Y \,|\, S, W) = \lim_{L \to \infty} \max_{p_W \in \mathcal{P}_W} \min_{p_{Y|X_1X_2} \in \mathcal{W}^{\mathrm{fair}}_K} \frac{1}{2} I(X_1, X_2; Y \,|\, S, W),$$
where the second equality holds by application of the minimax theorem: the mutual-information functional is linear (hence concave) in $p_W$ and convex in $p_{Y|X_1X_2}$, and the domains of $p_W$ and $p_{Y|X_1X_2}$ are convex. Since the above inequality holds for all feasible $p_{X|SW}$, we obtain
$$R \le \lim_{L \to \infty} \max_{p_{X_1X_2W|S} \in \mathcal{P}_{X_1X_2W|S}(p_S, L, D_1)} \min_{p_{Y|X_1X_2} \in \mathcal{W}^{\mathrm{fair}}_K} \frac{1}{2} I(X_1, X_2; Y \,|\, S, W) = \tilde{C}^{\mathrm{one}}(D_1, \mathcal{W}^{\mathrm{fair}}_K).$$
This concludes the proof.$^8$ $\Box$

9 Proof of Theorem 4.1

We derive the error exponents for the threshold decision rule (4.1). By symmetry of the codebook construction, the error probabilities will be independent of $\mathcal{K}$. Without loss of optimality, we assume that $\mathcal{K} = \{1, 2, \cdots, K\}$. Recalling that $\mathcal{W} = \{1, 2, \cdots, L\}$, denote by $\mathcal{P}^{[N]}_{XW}(L)$ the set of joint types over $\mathcal{X} \times \mathcal{W}$.
Define
$$\mathcal{P}^{[N]}_{YX_K|W}(p_{xw}, \mathcal{W}_K, R, L, m) = \left\{ p_{yx_K|w} : p_{x_K|w} \in \mathcal{M}(p_{x|w}),\; p_{y|x_K} \in \mathcal{W}_K(p_{x_K}),\; I(x_m; y|w) \le R \right\},$$
$$\tilde{E}_{\mathrm{psp},m,N}(R, L, p_{xw}, \mathcal{W}_K) = \min_{p_{yx_K|w} \in \mathcal{P}^{[N]}_{YX_K|W}(p_{xw}, \mathcal{W}_K, R, L, m)} D(p_{yx_K|w} \,\|\, p_{y|x_K}\, p^K_{x|w} \,|\, p_w), \tag{9.1}$$
$$\tilde{E}_{\mathrm{psp},N}(R, L, p_{xw}, \mathcal{W}_K) = \max_{m \in K} \tilde{E}_{\mathrm{psp},m,N}(R, L, p_{xw}, \mathcal{W}_K), \tag{9.2}$$
$$\underline{\tilde{E}}_{\mathrm{psp},N}(R, L, p_{xw}, \mathcal{W}_K) = \min_{m \in K} \tilde{E}_{\mathrm{psp},m,N}(R, L, p_{xw}, \mathcal{W}_K), \tag{9.3}$$
and consider the maximization
$$\max_{p_{xw} \in \mathcal{P}^{[N]}_{XW}(L)} \tilde{E}_{\mathrm{psp},1,N}(R, L, p_{xw}, \mathcal{W}^{\mathrm{fair}}_{K_{\mathrm{nom}}}). \tag{9.4}$$
Denote by $p^*_{xw}$ the maximizer above (which implicitly depends on $R$) and by $T^*_{xw}$ the corresponding type class. Let
$$E_{\mathrm{psp},N}(R, L, \mathcal{W}_K) = \tilde{E}_{\mathrm{psp},N}(R, L, p^*_{xw}, \mathcal{W}_K), \tag{9.5}$$
$$\underline{E}_{\mathrm{psp},N}(R, L, \mathcal{W}_K) = \underline{\tilde{E}}_{\mathrm{psp},N}(R, L, p^*_{xw}, \mathcal{W}_K). \tag{9.6}$$
The expressions (9.1)-(9.6) differ from (4.3)-(4.8) in that the optimizations are performed over types instead of general p.m.f.'s. We have
$$\lim_{N \to \infty} E_{\mathrm{psp},N}(R, L, \mathcal{W}_K) = E_{\mathrm{psp}}(R, L, \mathcal{W}_K), \tag{9.7}$$
$$\lim_{N \to \infty} \underline{E}_{\mathrm{psp},N}(R, L, \mathcal{W}_K) = \underline{E}_{\mathrm{psp}}(R, L, \mathcal{W}_K) \tag{9.8}$$
by (2.11) and continuity of the divergence and mutual-information functionals. With the joint type class $T^*_{xw}$ specified below (9.4), we now restate the coding and decoding scheme.

Footnote 8: The case of more than two colluders would be treated as follows. Say there are three colluders.
The definition of the restricted class of channels (8.6) would be extended as follows:
$$\mathcal{W}^{\mathrm{fair}}_{K,\delta} = \Big\{ p_{Y|X_1X_2X_3} \in \mathcal{W}^{\mathrm{fair}}_K :\; p_{Y|X_1X_2X_3}(y|x_1,x_2,x_3) \ge \delta \;\;\forall\, y, x_1, x_2, x_3, \;\text{ and }\; \delta \le D\big(p_{Y|X_1=x_1,X_2=x_2,X_3=x_3} \,\big\|\, p_{Y|X_1=x'_1,X_2=x'_2,X_3=x'_3}\big) \le \log \delta^{-1} \;\;\forall\, (x_1,x_2,x_3) \ne (x'_1,x'_2,x'_3) \Big\}.$$
Then the notions of equivalence of Hamming distance and statistical indistinguishability of two codewords apply similarly to the case of two colluders, as does the key property of bounded overlap of the typical sets, and the derivation of the sphere-packing inequality in Step 9.

Codebook. A random constant-composition code $\mathcal{C}(\mathbf{w}) = \{\mathbf{x}_m, 1 \le m \le 2^{NR}\}$ is generated for each $\mathbf{w} \in T^*_{w}$ by drawing $2^{NR}$ sequences independently and uniformly from the conditional type class $T^*_{x|w}$.

Encoder. A sequence $\mathbf{w}$ is drawn uniformly from $T^*_{w}$ and shared with the receiver. User $m$ is assigned codeword $\mathbf{x}_m$ from $\mathcal{C}(\mathbf{w})$, for $1 \le m \le 2^{NR}$.

Decoder. Given $(\mathbf{y}, \mathbf{w})$, the decoder places user $m$ on the guilty list if $I(\mathbf{x}_m; \mathbf{y} \,|\, \mathbf{w}) > R + \Delta$.

Collusion Channel. The random code described above is a RM code. By Prop. 2.2, it is sufficient to restrict our attention to strongly exchangeable collusion channels for the error-probability analysis. Recall from (2.16) and (2.17) that for such channels,
$$p_{Y|X_K}(\tilde{\mathbf{y}} \,|\, \mathbf{x}_K) = \frac{Pr[T_{y|x_K}]}{|T_{y|x_K}|} \le \frac{1}{|T_{y|x_K}|}\, \mathbb{1}\{ p_{y|x_K} \in \mathcal{W}_K(p_{x_K}) \}, \qquad \forall\, \tilde{\mathbf{y}} \in T_{y|x_K}. \tag{9.9}$$

Error Exponents. The derivation is based on the following two asymptotic equalities, which are special cases of (10.12) and (10.16) proven later.

1) Fix $\mathbf{w}$ and $\mathbf{y}$ and draw $\mathbf{x}$ uniformly from a fixed conditional type class $T^*_{x|w}$, independently of $\mathbf{y}$. Then for any $\nu \ge 0$,
$$Pr[I(\mathbf{x}; \mathbf{y} \,|\, \mathbf{w}) \ge \nu] \le 2^{-N\nu}. \tag{9.10}$$

2) Fix $\mathbf{w}$, draw $\mathbf{x}_m$, $m \in K$, i.i.d.
uniformly from a fixed conditional type class $T_{x|w}$, and then draw $\mathbf{Y}$ uniformly from the type class $T_{y|x_K}$. For any strongly exchangeable collusion channel, for any $m \in K$ and $\nu \ge 0$, we have
$$Pr[I(\mathbf{x}_m; \mathbf{y} \,|\, \mathbf{w}) \le \nu] \,\doteq\, \exp_2\{ -N \tilde{E}_{\mathrm{psp},m,N}(\nu, L, p_{xw}, \mathcal{W}_K) \}. \tag{9.11}$$

(i). False Positives. A false positive occurs if
$$\exists\, m \notin K : I(\mathbf{x}_m; \mathbf{y} \,|\, \mathbf{w}) > R + \Delta. \tag{9.12}$$
By construction of the codebook, $\mathbf{x}_m$ is conditionally independent of $\mathbf{y}$ given $\mathbf{w}$, for each $m \notin K$. There are at most $2^{NR} - K$ possible values for $m$ in (9.12). Hence the probability of false positives, conditioned on the joint type class $T_{yx_Kw}$, is
$$P_{FP}(T_{yx_Kw}, \mathcal{W}_K) = Pr[\exists\, m \notin K : I(\mathbf{x}_m; \mathbf{y}|\mathbf{w}) > R + \Delta] \overset{(a)}{\le} (2^{NR} - K)\, Pr_X[I(\mathbf{x}; \mathbf{y}|\mathbf{w}) > R + \Delta] \overset{(b)}{\le} 2^{NR}\, 2^{-N(R+\Delta)} = 2^{-N\Delta}, \tag{9.13}$$
where (a) follows from the union bound, and (b) from (9.10) with $\nu = R + \Delta$. Averaging over all type classes $T_{yx_Kw}$, we obtain $P_{FP} \le 2^{-N\Delta}$, from which (4.9) follows.

(ii). Detect-One Error Criterion. (Miss all colluders.) We first derive the error exponent for the event that the decoder misses a specific colluder $m \in K$. Any coalition $\hat{K}$ that contains $m$ fails the test (4.1), i.e., for any such $\hat{K}$,
$$I(\mathbf{x}_m; \mathbf{y} \,|\, \mathbf{w}) \le R + \Delta. \tag{9.14}$$
The probability of the miss-$m$ event, given the joint type $p^*_{xw}$, is therefore upper-bounded by the probability of the event (9.14). From (9.11) we obtain
$$p_{\mathrm{miss}\text{-}m}(p^*_{xw}, \mathcal{W}_K) \le Pr[I(\mathbf{x}_m; \mathbf{y}|\mathbf{w}) \le R + \Delta] \overset{(a)}{\le} \exp_2\{ -N \tilde{E}_{\mathrm{psp},m,N}(R + \Delta, L, p^*_{xw}, \mathcal{W}_K) \}. \tag{9.15}$$
The miss-all event is the intersection of the miss-$m$ events over $m \in K$. Its probability is
$$p_{\mathrm{miss\text{-}all}}(p^*_{xw}, \mathcal{W}_K) = Pr\left[ \bigcap_{m \in K} \{\text{miss } m \,|\, p^*_{xw}\} \right] \le \min_{m \in K} p_{\mathrm{miss}\text{-}m}(p^*_{xw}, \mathcal{W}_K) \overset{(a)}{\doteq} \min_{m \in K} \exp_2\{ -N \tilde{E}_{\mathrm{psp},m,N}(R + \Delta, L, p^*_{xw}, \mathcal{W}_K) \} \overset{(b)}{=} \exp_2\{ -N E_{\mathrm{psp},N}(R + \Delta, L, \mathcal{W}_K) \}$$
$$\overset{(c)}{\doteq} \exp_2\{ -N E_{\mathrm{psp}}(R + \Delta, L, \mathcal{W}_K) \},$$
where (a) follows from (9.15), (b) from (9.2) and (9.5), and (c) from (9.7).

(iii). Detect-All Error Criterion. (Miss Some Colluders.) The miss-some event is the union of the miss-$m$ events over $m \in K$. Its probability is
$$p_{\mathrm{miss\text{-}some}}(p^*_{xw}, \mathcal{W}_K) = Pr\left[ \bigcup_{m \in K} \{\text{miss } m \,|\, p^*_{xw}\} \right] \le \sum_{m \in K} p_{\mathrm{miss}\text{-}m}(p^*_{xw}, \mathcal{W}_K) \,\doteq\, \max_{m \in K} \exp_2\{ -N \tilde{E}_{\mathrm{psp},m,N}(R + \Delta, L, p^*_{xw}, \mathcal{W}_K) \} \overset{(a)}{=} \exp_2\{ -N \underline{E}_{\mathrm{psp},N}(R + \Delta, L, \mathcal{W}_K) \} \overset{(b)}{\doteq} \exp_2\{ -N \underline{E}_{\mathrm{psp}}(R + \Delta, L, \mathcal{W}_K) \},$$
where (a) follows from (9.3) and (9.6), and (b) from (9.8).

(iv). Fair Collusion Channels. Recall (4.2), restated here for convenience:
$$\mathcal{P}_{YX_K|W}(p_{XW}, \mathcal{W}_K, R, L, m) \,\triangleq\, \left\{ \tilde{p}_{YX_K|W} : \tilde{p}_{X_K|W} \in \mathcal{M}(p_{X|W}),\; \tilde{p}_{Y|X_K} \in \mathcal{W}_K(\tilde{p}_{X_K}),\; I_{\tilde{p}_{YX_K|W}\, p_W}(X_m; Y|W) \le R \right\}, \qquad m \in K.$$
The intersection of these sets over $m$,
$$\mathcal{P}^*(\mathcal{W}_K) \,\triangleq\, \bigcap_{m \in K} \mathcal{P}_{YX_K|W}(p_{XW}, \mathcal{W}_K, R, L, m), \tag{9.16}$$
is convex and permutation-invariant because so is $\mathcal{W}_K$, by assumption. Combining (9.16), (4.2), and (4.3), we may write (4.4) as
$$\tilde{E}_{\mathrm{psp}}(R, L, p_{XW}, \mathcal{W}_K) = \min_{\tilde{p}_{YX_K|W} \in \mathcal{P}^*(\mathcal{W}_K)} D(\tilde{p}_{YX_K|W} \,\|\, \tilde{p}_{Y|X_K}\, p^K_{X|W} \,|\, p_W). \tag{9.17}$$
For any $\tilde{p}_{YX_K|W} \in \mathcal{P}^*(\mathcal{W}_K)$ and permutation $\pi$ of $K$, define the permuted conditional p.m.f. $\tilde{p}^\pi_{YX_K|W}(y, x_K|w) = \tilde{p}_{YX_K|W}(y, x_{\pi(K)}|w)$ and the permutation-averaged p.m.f. $\tilde{p}^{\mathrm{fair}}_{YX_K|W} = \frac{1}{K!} \sum_\pi \tilde{p}^\pi_{YX_K|W}$, which also belongs to the convex set $\mathcal{P}^*(\mathcal{W}_K)$. We similarly define $\tilde{p}^\pi_{Y|X_K}$ and $\tilde{p}^{\mathrm{fair}}_{Y|X_K}$. Observe that $D(\tilde{p}^\pi_{YX_K|W} \,\|\, \tilde{p}^\pi_{Y|X_K}\, p^K_{X|W} \,|\, p_W)$ is independent of $\pi$. By convexity of the Kullback-Leibler divergence, this implies
$$D(\tilde{p}^{\mathrm{fair}}_{YX_K|W} \,\|\, \tilde{p}^{\mathrm{fair}}_{Y|X_K}\, p^K_{X|W} \,|\, p_W) \le \frac{1}{K!} \sum_\pi D(\tilde{p}^\pi_{YX_K|W} \,\|\, \tilde{p}^\pi_{Y|X_K}\, p^K_{X|W} \,|\, p_W) = D(\tilde{p}_{YX_K|W} \,\|\, \tilde{p}_{Y|X_K}\, p^K_{X|W} \,|\, p_W).$$
(9.18) Therefore the minimum in (9.17) is achieved by a permutation-invariant $\tilde{p}_{YX_K|W} = \tilde{p}^{\mathrm{fair}}_{YX_K|W}$, and the same minimum would have been obtained if $\mathcal{W}_K$ had been replaced with $\mathcal{W}^{\mathrm{fair}}_K$. Hence
$$\tilde{E}_{\mathrm{psp}}(R, L, p_{XW}, \mathcal{W}_K) = \tilde{E}_{\mathrm{psp}}(R, L, p_{XW}, \mathcal{W}^{\mathrm{fair}}_K).$$
Substituting into (4.7) and (4.10), we obtain
$$E^{\mathrm{one}}(R, L, \mathcal{W}_K, \Delta) = E^{\mathrm{one}}(R, L, \mathcal{W}^{\mathrm{fair}}_K, \Delta).$$

(v). The equality $E^{\mathrm{one}}(R, L, \mathcal{W}^{\mathrm{fair}}_K, \Delta) = E^{\mathrm{all}}(R, L, \mathcal{W}^{\mathrm{fair}}_K, \Delta)$ is straightforward because $\tilde{E}_{\mathrm{psp},m}(R, L, p_{XW}, \mathcal{W}^{\mathrm{fair}}_K)$ in (4.3) is the same for all $m \in K$, and thus $E_{\mathrm{psp}}(R, L, \mathcal{W}^{\mathrm{fair}}_K) = \underline{E}_{\mathrm{psp}}(R, L, \mathcal{W}^{\mathrm{fair}}_K)$.

(vi). Positive Error Exponents. From Part (v) above, we may restrict our attention to $\mathcal{W}_K = \mathcal{W}^{\mathrm{fair}}_K$. Consider any $\mathcal{W} = \{1, \cdots, L\}$ and $p_W$ that is positive over its support set (if it is not, reduce the value of $L$ accordingly). For any $m \in K$, the minimand in the expression (4.3) for $\tilde{E}_{\mathrm{psp},m}(R, L, p_{XW}, \mathcal{W}^{\mathrm{fair}}_K)$ is zero if and only if $\tilde{p}_{YX_K|W} = \tilde{p}_{Y|X_K}\, p^K_{X|W}$, with $\tilde{p}_{Y|X_K} \in \mathcal{W}^{\mathrm{fair}}_K(\tilde{p}_{X_K})$. Such $\tilde{p}_{YX_K|W}$ is feasible for (4.2) if and only if $(p_{XW}, \tilde{p}_{Y|X_K})$ is such that $I(X_m; Y|W) \le R$. It is not feasible, and thus a positive exponent $E^{\mathrm{one}}$ is guaranteed, if $R < I(X_1; Y|W)$. The supremum of all such $R$ is given by (4.12) and is achieved by letting $\Delta \to 0$ and $L \to \infty$. $\Box$

10 Proof of Theorem 5.2

We derive the error exponents for the MPMI decision rule (5.7). Again by symmetry of the codebook construction, the error probabilities will be independent of $\mathcal{K}$. Without loss of optimality, we assume that $\mathcal{K} = \{1, 2, \cdots, K\}$. We have also defined $\mathcal{W} = \{1, 2, \cdots, L\}$.
Define, for all $A \subseteq K$,
$$\mathcal{P}^{[N]}_{YX_K|SW}(p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K, R, L, A) = \left\{ p_{yx_K|sw} : p_{x_K|sw} \in \mathcal{M}(p_{x|sw}),\; p_{y|x_K} \in \mathcal{W}_K(p_{x_K}),\; \overset{\circ}{I}(x_A;\, y x_{K \setminus A} \,|\, sw) \le |A| R \right\}, \tag{10.1}$$
$$\breve{E}_{\mathrm{psp},A,N}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K) = \min_{p_{yx_K|sw} \in \mathcal{P}^{[N]}_{YX_K|SW}(p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K, R, L, A)} D(p_{yx_K|sw} \,\|\, p_{y|x_K}\, p^K_{x|sw} \,|\, p_{sw}), \tag{10.2}$$
$$\hat{E}_{\mathrm{psp},A,N}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K) = D(p_{s|w} \,\|\, p_S \,|\, p_w) + \breve{E}_{\mathrm{psp},A,N}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K) = \min_{p_{yx_K|sw} \in \mathcal{P}^{[N]}_{YX_K|SW}(p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K, R, L, A)} D(p_{yx_K|sw}\, p_{s|w} \,\|\, p_{y|x_K}\, p^K_{x|sw}\, p_S \,|\, p_w), \tag{10.3}$$
$$\hat{E}_{\mathrm{psp},N}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K) = \hat{E}_{\mathrm{psp},K,N}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K), \tag{10.4}$$
$$\underline{\hat{E}}_{\mathrm{psp},N}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K) = \min_{A \subseteq K} \hat{E}_{\mathrm{psp},A,N}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathcal{W}_K), \tag{10.5}$$
$$E_{\mathrm{psp},N}(R, L, D_1, \mathcal{W}_K) = \max_{p_w \in \mathcal{P}^{[N]}_W} \min_{p_{s|w} \in \mathcal{P}^{[N]}_{S|W}} \max_{p_{x|sw} \in \mathcal{P}^{[N]}_{X|SW}(p_{sw}, L, D_1)} \hat{E}_{\mathrm{psp},K,N}(R, L, p_w, p_{s|w}, p_{x|sw}, \mathcal{W}^{\mathrm{fair}}_{K_{\mathrm{nom}}}), \tag{10.6}$$
where the second equality in (10.3) is obtained by application of the chain rule for divergence. Denote by $p^*_w$ and $p^*_{x|sw}$ the maximizers in (10.6), the latter viewed as a function of $p_{s|w}$. Moreover, both $p^*_w$ and $p^*_{x|sw}$ implicitly depend on $R$ and $\mathcal{W}^{\mathrm{fair}}_{K_{\mathrm{nom}}}$. Denote by $T^*_w$ and $T^*_{x|sw}$ the corresponding type and conditional type classes. Let
$$E_{\mathrm{psp},N}(R, L, D_1, \mathcal{W}_K) = \min_{p_{s|w}} \hat{E}_{\mathrm{psp},N}(R, L, p^*_w, p_{s|w}, p^*_{x|sw}, \mathcal{W}_K), \tag{10.7}$$
$$\underline{E}_{\mathrm{psp},N}(R, L, D_1, \mathcal{W}_K) = \min_{p_{s|w}} \underline{\hat{E}}_{\mathrm{psp},N}(R, L, p^*_w, p_{s|w}, p^*_{x|sw}, \mathcal{W}_K). \tag{10.8}$$
The exponents (10.3)-(10.8) differ from (5.11)-(5.16) in that the optimizations are performed over conditional types instead of general conditional p.m.f.'s.
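The passage from types to general p.m.f.'s rests on two standard method-of-types facts: there are only polynomially many types, and each type class has probability governed exactly by a divergence exponent. A small self-contained check for a binary alphabet (the source parameter $q$ below is hypothetical):

```python
import math
from math import comb

# Two standard method-of-types facts (binary alphabet, hypothetical q):
# 1) there are only N+1 types of length-N binary sequences, vs 2^N sequences;
# 2) Pr_q[T_p] <= 2^{-N D(p||q)} exactly, for every type p.
def D_binary(p, q):
    out = 0.0
    for pi, qi in ((p, q), (1 - p, 1 - q)):
        if pi > 0:
            out += pi * math.log2(pi / qi)
    return out

q = 0.3
for N in (10, 20, 40):
    assert N + 1 < 2**N                    # polynomially many types
    total = 0.0
    for k in range(N + 1):                 # type p = k/N
        prob_type_class = comb(N, k) * q**k * (1 - q)**(N - k)
        total += prob_type_class
        assert prob_type_class <= 2.0**(-N * D_binary(k / N, q)) + 1e-12
    assert abs(total - 1.0) < 1e-9         # type classes partition {0,1}^N
print("type-class probability bound verified")
```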
We have
$$\lim_{N \to \infty} E_{\mathrm{psp},N}(R, L, D_1, \mathcal{W}_K) = E_{\mathrm{psp}}(R, L, D_1, \mathcal{W}_K), \tag{10.9}$$
$$\lim_{N \to \infty} \underline{E}_{\mathrm{psp},N}(R, L, D_1, \mathcal{W}_K) = \underline{E}_{\mathrm{psp}}(R, L, D_1, \mathcal{W}_K) \tag{10.10}$$
by (2.11) and continuity of the divergence and mutual-information functionals.

Codebook. For each $\mathbf{w} \in T^*_w$ and $\mathbf{s} \in \mathcal{S}^N$, a codebook $\mathcal{C}(\mathbf{s}, \mathbf{w}) = \{\mathbf{x}_m, 1 \le m \le 2^{NR}\}$ is generated by drawing $2^{NR}$ random vectors independently and uniformly from $T^*_{x|sw}$.

Encoder. A sequence $\mathbf{w}$ is drawn uniformly from $T^*_w$ and shared with the decoder. Given $\mathbf{s}$ and $\mathbf{w}$, user $m$ is assigned codeword $\mathbf{x}_m \in \mathcal{C}(\mathbf{s}, \mathbf{w})$.

Decoder. The decoding rule is the MPMI rule of (5.7).

Collusion Channel. This random code is a RM code, hence by application of Prop. 2.2, it is sufficient to restrict our attention to strongly exchangeable collusion channels.

Error Probability Analysis. To analyze the error probability for our random-coding scheme under strongly exchangeable collusion channels, we will again use the bound (9.9) as well as the following three properties, which originate from the basic inequalities (1.1) and (1.2).

1) Fix $(\mathbf{s}, \mathbf{w})$ and $\mathbf{z} \in \mathcal{Z}^N$, and draw $\mathbf{x}_K = \{\mathbf{x}_m, m \in K\}$ i.i.d. uniformly from a conditional type class $T_{x|sw}$, independently of $\mathbf{z}$. We have the asymptotic equality
$$Pr[T_{x_K|zsw}] = \frac{|T_{x_K|zsw}|}{|T_{x|sw}|^K} \,\doteq\, 2^{-N[K H(x|sw) - H(x_K|zsw)]} = 2^{-N \overset{\circ}{I}(x_K; z|sw)}, \tag{10.11}$$
where the last equality is due to (5.2). Then
$$Pr[\overset{\circ}{I}(x_K; z|sw) \ge \nu] = \sum_{T_{x_K|zsw}} Pr[T_{x_K|zsw}]\, \mathbb{1}\{\overset{\circ}{I}(x_K; z|sw) \ge \nu\} \,\doteq\, \sum_{T_{x_K|zsw}} 2^{-N\overset{\circ}{I}(x_K; z|sw)}\, \mathbb{1}\{\overset{\circ}{I}(x_K; z|sw) \ge \nu\} \,\doteq\, \max_{T_{x_K|zsw}} 2^{-N\overset{\circ}{I}(x_K; z|sw)}\, \mathbb{1}\{\overset{\circ}{I}(x_K; z|sw) \ge \nu\} \,\le\, 2^{-N\nu}. \tag{10.12}$$

2) Fix $\mathbf{w}$ and draw $\mathbf{s}$ i.i.d. $\sim p_S$. We have [12]
$$Pr[T_{s|w}] \,\doteq\, 2^{-N D(p_{s|w} \| p_S | p_w)}. \tag{10.13}$$

3) Fix $(\mathbf{s}, \mathbf{w})$, draw $\mathbf{x}_k$, $k \in K$, i.i.d.
uniformly from a conditional type class $T_{x|sw}$, and then draw $\mathbf{Y}$ uniformly from a single conditional type class $T_{y|x_K}$. We have
$$Pr[T_{yx_K|sw}] = Pr[T_{y|x_Ksw}]\, Pr[T_{x_K|sw}] = \frac{|T_{y|x_Ksw}|}{|T_{y|x_K}|} \cdot \frac{|T_{x_K|sw}|}{|T_{x|sw}|^K} \,\doteq\, 2^{-N[H(y|x_K) - H(y|x_Ksw)]}\; 2^{-N[K H(x|sw) - H(x_K|sw)]} = \exp_2\left\{ -N\left[ I(y; sw|x_K) + \overset{\circ}{I}(x_1; \cdots; x_K|sw) \right] \right\}. \tag{10.14}$$
Consider the two terms in brackets above. The first one may be written as
$$I(y; sw|x_K) = D(p_{ysw|x_K} \,\|\, p_{y|x_K}\, p_{sw|x_K} \,|\, p_{x_K}) = D(p_{yswx_K} \,\|\, p_{y|x_K}\, p_{swx_K}) = D(p_{yx_K|sw} \,\|\, p_{y|x_K}\, p_{x_K|sw} \,|\, p_{sw})$$
and the second one as
$$\overset{\circ}{I}(x_1; \cdots; x_K|sw) = D(p_{x_K|sw} \,\|\, p^K_{x|sw} \,|\, p_{sw}).$$
By application of the chain rule for divergence, the sum of these two terms is $D(p_{yx_K|sw} \,\|\, p_{y|x_K}\, p^K_{x|sw} \,|\, p_{sw})$. Substituting into (10.14), we obtain
$$Pr[T_{yx_K|sw}] \,\doteq\, \exp_2\left\{ -N D(p_{yx_K|sw} \,\|\, p_{y|x_K}\, p^K_{x|sw} \,|\, p_{sw}) \right\}. \tag{10.15}$$
In the derivation below we use the shorthand $e(p_{yx_K|sw})$ to represent the exponential above, and fix $T_{x|sw} = T^*_{x|sw}$. For any feasible, strongly exchangeable collusion channel, for any $A \subseteq K$ and $\nu > 0$, conditioning on $\mathbf{w} \in T^*_w$ and $\mathbf{s} \in \mathcal{S}^N$, we have
$$Pr\left[ \overset{\circ}{I}(x_A; y x_{K \setminus A}|sw) \le |A|\nu \right] \overset{(a)}{\le} \sum_{\text{feasible } T_{yx_K|sw}} Pr[T_{yx_K|sw}]\, \mathbb{1}\{\overset{\circ}{I}(x_A; y x_{K \setminus A}|sw) \le |A|\nu\} \overset{(b)}{\doteq} \sum_{\text{feasible } p_{yx_K|sw}} e(p_{yx_K|sw})\, \mathbb{1}\{\overset{\circ}{I}(x_A; y x_{K \setminus A}|sw) \le |A|\nu\}$$
$$\overset{(c)}{\doteq} \max_{\text{feasible } p_{yx_K|sw}} e(p_{yx_K|sw})\, \mathbb{1}\{\overset{\circ}{I}(x_A; y x_{K \setminus A}|sw) \le |A|\nu\} = \max_{\substack{p_{yx_K|sw}:\; p_{x_K|sw} \in \mathcal{M}(p^*_{x|sw}),\\ p_{y|x_K} \in \mathcal{W}_K}} e(p_{yx_K|sw})\, \mathbb{1}\{\overset{\circ}{I}(x_A; y x_{K \setminus A}|sw) \le |A|\nu\} = \max_{\substack{p_{yx_K|sw}:\; p_{x_K|sw} \in \mathcal{M}(p^*_{x|sw}),\\ p_{y|x_K} \in \mathcal{W}_K,\; \overset{\circ}{I}(x_A; yx_{K\setminus A}|sw) \le |A|\nu}} e(p_{yx_K|sw}) \overset{(d)}{=} \max_{p_{yx_K|sw} \in \mathcal{P}^{[N]}_{YX_K|SW}(p^*_w, p_{s|w}, p^*_{x|sw}, \mathcal{W}_K, \nu, L, A)} e(p_{yx_K|sw}) \overset{(e)}{=} \exp_2\left\{ -N \breve{E}_{\mathrm{psp},A,N}(\nu, L, p^*_w, p_{s|w}, p^*_{x|sw}, \mathcal{W}_K) \right\}, \tag{10.16}$$
where (a) follows from (9.9), (b) from (10.15), (c) from the fact that the number of conditional types is polynomial in $N$, (d) from (10.1), and (e) from (10.2).

(i). False Positives. A false positive occurs if $\hat{\mathcal{K}} \setminus K \ne \emptyset$. By application of (5.8), we have
$$\forall\, \mathcal{A} \subseteq \hat{\mathcal{K}} : \overset{\circ}{I}(x_{\mathcal{A}}; y x_{\hat{\mathcal{K}} \setminus \mathcal{A}}|sw) > |\mathcal{A}|(R + \Delta). \tag{10.17}$$
Denote by $\mathcal{B}$ the set of colluder indices $m \in K$ that are correctly identified by the decoder, and by $\mathcal{A} \triangleq \hat{\mathcal{K}} \setminus \mathcal{B}$ the complement set, which is comprised of all incorrectly accused users and has cardinality $|\mathcal{A}| \ge 1$. By construction of the codebook, $x_{\mathcal{A}}$ is independent of $y$ and $x_{\mathcal{B}}$. The probability of the event (10.17) is upper-bounded by the probability of the larger event
$$\exists\, \mathcal{A} \not\subseteq K,\; \exists\, \mathcal{B} \subseteq K : \overset{\circ}{I}(x_{\mathcal{A}}; y x_{\mathcal{B}}|sw) > |\mathcal{A}|(R + \Delta). \tag{10.18}$$
Hence the probability of false positives, conditioned on $T_{yx_Ksw}$, satisfies
$$\begin{aligned}
P_{FP}(T_{yx_Ksw}, \mathcal{W}_K) &\le Pr\left[ \bigcup_{|\mathcal{A}| \ge 1} \bigcup_{\mathcal{B} \subseteq K} \left\{ \exists\, \mathcal{A} \not\subseteq K : \overset{\circ}{I}(x_{\mathcal{A}}; y x_{\mathcal{B}}|sw) > |\mathcal{A}|(R + \Delta) \right\} \right] \\
&= Pr\left[ \bigcup_{|\mathcal{A}| \ge 1} \left\{ \exists\, \mathcal{A} \not\subseteq K : \max_{\mathcal{B} \subseteq K} \overset{\circ}{I}(x_{\mathcal{A}}; y x_{\mathcal{B}}|sw) > |\mathcal{A}|(R + \Delta) \right\} \right] \\
&= Pr\left[ \bigcup_{|\mathcal{A}| \ge 1} \left\{ \exists\, \mathcal{A} \not\subseteq K : \overset{\circ}{I}(x_{\mathcal{A}}; y x_K|sw) > |\mathcal{A}|(R + \Delta) \right\} \right] \\
&\overset{(a)}{\le} \sum_{|\mathcal{A}| \ge 1} 2^{N|\mathcal{A}|R}\, Pr\left[ \overset{\circ}{I}(x_{\mathcal{A}}; y x_K|sw) > |\mathcal{A}|(R + \Delta) \right] \overset{(b)}{\doteq} \sum_{|\mathcal{A}| \ge 1} 2^{N|\mathcal{A}|R}\; 2^{-N|\mathcal{A}|(R + \Delta)} = \sum_{|\mathcal{A}| \ge 1} 2^{-N|\mathcal{A}|\Delta} \,\doteq\, 2^{-N\Delta}, \tag{10.19}
\end{aligned}$$
where (a) follows from the union bound, and (b) from (10.12) with $x_{\mathcal{A}}$ and $y x_K$ in place of $x_K$ and $z$, respectively.
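The last step of (10.19) is pure bookkeeping: the sum over coalition sizes is a geometric series dominated by its first term. A quick numeric check (parameters hypothetical):

```python
import math

# Bookkeeping behind the last step of (10.19): after the union bound, the
# false-positive probability is at most sum_{a>=1} 2^{-N a Delta}, a geometric
# series dominated by its first term, so P_FP <= 2^{-N Delta} to first
# exponential order.
def fp_sum(N, Delta, K):
    return sum(2.0**(-N * a * Delta) for a in range(1, K + 1))

Delta, K = 0.1, 5
for N in (50, 100, 200):
    s = fp_sum(N, Delta, K)
    assert 2.0**(-N * Delta) <= s <= 2 * 2.0**(-N * Delta)   # within a factor 2
rate = -math.log2(fp_sum(200, Delta, K)) / 200
assert abs(rate - Delta) < 0.01          # exponent matches Delta
print("false-positive exponent ~", rate)
```

This is why the tunable parameter $\Delta$ directly controls the false-positive exponent, independently of the rate $R$.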
Averaging over all joint type classes $T_{yx_K sw}$, we obtain $P_{FP} \le 2^{-N\Delta}$, from which (5.17) follows.

(ii) Detect-All Error Criterion (Miss Some Colluders). Under the detect-all error event, any coalition $\tilde{K}$ that contains $K$ fails the test. By (5.8), this implies that
$$\exists A \subseteq \tilde{K}:\quad \mathring{I}(x_A; yx_{\tilde{K}\setminus A}|sw) \le |A|(R+\Delta). \qquad (10.20)$$
In particular, for $\tilde{K} = K$ we have
$$\exists A \subseteq K:\quad \mathring{I}(x_A; yx_{K\setminus A}|sw) \le |A|(R+\Delta). \qquad (10.21)$$
The probability of the miss-some event, conditioned on $(s,w)$, is therefore upper bounded by the probability of the event (10.21):
$$\begin{aligned}
p_{\text{miss-some}}(p^*_w, p_{s|w}, p^*_{x|sw}, W_K)
&\le \Pr\Bigl[\bigcup_{A\subseteq K} \mathring{I}(x_A; yx_{K\setminus A}|sw) \le |A|(R+\Delta)\Bigr] \\
&\le \sum_{A\subseteq K} \Pr\bigl[\mathring{I}(x_A; yx_{K\setminus A}|sw) \le |A|(R+\Delta)\bigr] \\
&\overset{(a)}{\le} \sum_{A\subseteq K} \exp_2\bigl\{-N\,\breve{E}_{psp,A,N}(R+\Delta, L, p^*_w, p_{s|w}, p^*_{x|sw}, W_K)\bigr\} \\
&\doteq \max_{A\subseteq K} \exp_2\bigl\{-N\,\breve{E}_{psp,A,N}(R+\Delta, L, p^*_w, p_{s|w}, p^*_{x|sw}, W_K)\bigr\} \\
&= \exp_2\Bigl\{-N \min_{A\subseteq K}\breve{E}_{psp,A,N}(R+\Delta, L, p^*_w, p_{s|w}, p^*_{x|sw}, W_K)\Bigr\} \qquad (10.22)
\end{aligned}$$
where (a) follows from (10.16) with $\nu = R+\Delta$. Averaging over $S$, we obtain
$$\begin{aligned}
p_{\text{miss-some}}(W_K)
&= \sum_{p_{s|w}} \Pr[T_{s|w}]\; p_{\text{miss-some}}(p^*_w, p_{s|w}, p^*_{x|sw}, W_K) \\
&\overset{(a)}{\doteq} \max_{p_{s|w}} \exp_2\Bigl\{-N\Bigl[D(p_{s|w}\|p_S\,|\,p^*_w) + \min_{A\subseteq K}\breve{E}_{psp,A,N}(R+\Delta, L, p^*_w, p_{s|w}, p^*_{x|sw}, W_K)\Bigr]\Bigr\} \\
&\overset{(b)}{=} \max_{p_{s|w}} \exp_2\bigl\{-N\,\hat{E}_{psp,N}(R+\Delta, L, p^*_w, p_{s|w}, p^*_{x|sw}, W_K)\bigr\} \\
&\overset{(c)}{=} \exp_2\bigl\{-N\,E_{psp,N}(R+\Delta, L, D_1, W_K)\bigr\} \\
&\overset{(d)}{\doteq} \exp_2\bigl\{-N\,E_{psp}(R+\Delta, L, D_1, W_K)\bigr\}
\end{aligned}$$
which proves (5.18). Here (a) follows from (10.13) and (10.22), (b) from the definitions (10.5) and (10.3), (c) from (10.8), and (d) from the limit property (10.10).

(iii) Detect-One Criterion (Miss All Colluders). Under the detect-one error event, either the estimated coalition $\hat{K}$ is empty, or it is a set $I$ of innocent users (disjoint from $K$).
Hence $P^{\text{one}}_e \le \Pr[\hat{K}=\emptyset] + \Pr[\hat{K}=I]$. The first probability, conditioned on $(s,w)$, is bounded as⁹
$$\begin{aligned}
\Pr[\hat{K}=\emptyset] &= \Pr[\forall K':\ MPMI(K') \le 0] \le \Pr[MPMI(K) \le 0] = \Pr\bigl[\mathring{I}(x_K; y|sw) \le K(R+\Delta)\bigr] \qquad (10.23) \\
&\doteq \exp_2\bigl\{-N\,\breve{E}_{psp,K,N}(R+\Delta, L, p^*_w, p_{s|w}, p^*_{x|sw}, W_K)\bigr\}.
\end{aligned}$$
⁹ Using the bound $\min_{K'\subseteq K}\Pr[MPMI(K')\le 0]$ would not strengthen the inequality in (10.23).

To bound the second probability, we use property (5.9) with $\hat{K}=I$ and $A=K$. We obtain
$$\mathring{I}(x_K; yx_I|sw) \le K(R+\Delta).$$
Since
$$\mathring{I}(x_K; yx_I|sw) = \mathring{I}(x_K; y|sw) + I(x_K; x_I|ysw) \ge \mathring{I}(x_K; y|sw),$$
combining the two inequalities above yields $\mathring{I}(x_K; y|sw) \le K(R+\Delta)$. The probability of this event is again given by (10.23); we conclude that
$$p_{\text{miss-all}}(p^*_w, p_{s|w}, p^*_{x|sw}, W_K) \doteq \exp_2\bigl\{-N\,\breve{E}_{psp,K,N}(R+\Delta, L, p^*_w, p_{s|w}, p^*_{x|sw}, W_K)\bigr\}.$$
Averaging over $S$ and proceeding as in Part (ii) above, we obtain
$$p_{\text{miss-all}}(W_K) \le \sum_{p_{s|w}} \Pr[T_{s|w}]\; p_{\text{miss-all}}(p^*_w, p_{s|w}, p^*_{x|sw}, W_K) \doteq \exp_2\bigl\{-N\,E_{psp}(R+\Delta, L, D_1, K, W_K)\bigr\},$$
which establishes (5.19).

(iv) Fair Collusion Channels. The proof parallels that of Theorem 4.1, Part (iv). Define
$$P^*(W_K) \triangleq \mathcal{P}_{YX_K|SW}(p_W, \tilde{p}_{S|W}, p_{X|SW}, W_K, R, L, K), \qquad (10.24)$$
which is convex and permutation-invariant. Then write (5.12) as
$$\tilde{E}_{psp}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, W_K) = \min_{\tilde{p}_{YX_K|SW} \in P^*(W_K)} D(\tilde{p}_{YX_K|SW}\,\|\,\tilde{p}_{Y|X_K}\,p^K_{X|SW}\,|\,\tilde{p}_{S|W}\,p_W). \qquad (10.25)$$
For any $\tilde{p}_{YX_K|SW} \in P^*(W_K)$ and permutation $\pi$ of $K$, define the permuted conditional p.m.f. $\tilde{p}^\pi_{YX_K|SW}$ and the permutation-averaged p.m.f. $\tilde{p}^{\text{fair}}_{YX_K|SW} = \frac{1}{K!}\sum_\pi \tilde{p}^\pi_{YX_K|SW}$, which also belongs to the convex set $P^*(W_K)$. We similarly define $\tilde{p}^\pi_{Y|X_K}$ and $\tilde{p}^{\text{fair}}_{Y|X_K}$.
The conditional divergence $D(\tilde{p}^\pi_{YX_K|SW}\,\|\,\tilde{p}^\pi_{Y|X_K}\,p^K_{X|SW}\,|\,\tilde{p}_{S|W}\,p_W)$ is independent of $\pi$. By convexity, we obtain
$$D(\tilde{p}^{\text{fair}}_{YX_K|SW}\,\|\,\tilde{p}^{\text{fair}}_{Y|X_K}\,p^K_{X|SW}\,|\,\tilde{p}_{S|W}\,p_W) \le D(\tilde{p}_{YX_K|SW}\,\|\,\tilde{p}_{Y|X_K}\,p^K_{X|SW}\,|\,\tilde{p}_{S|W}\,p_W). \qquad (10.26)$$
Therefore the minimum in (10.25) is achieved by a permutation-invariant $\tilde{p}_{YX_K|SW} = \tilde{p}^{\text{fair}}_{YX_K|SW}$, and the same minimum would have been obtained if $W_K$ had been replaced with $W^{\text{fair}}_K$. Hence
$$\tilde{E}_{psp}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, W_K) = \tilde{E}_{psp}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, W^{\text{fair}}_K).$$
Substituting into (5.15) and (5.19), we obtain
$$E^{\text{one}}(R, L, D_1, W_K, \Delta) = E^{\text{one}}(R, L, D_1, W^{\text{fair}}_K, \Delta).$$

(v) Detect-All Error Exponent for Fair Collusion Channels. Using (5.10) and (5.11), observe that $\tilde{E}_{psp}$ in (5.13) may be written as
$$\tilde{E}_{psp}(R, L, p_W, p_{S|W}, p_{X|SW}, W_K) = \min_{\tilde{p}_{YX_K|SW} \in \bar{P}^*(W_K)} D(\tilde{p}_{YX_K|SW}\,\tilde{p}_{S|W}\,\|\,\tilde{p}_{Y|X_K}\,p^K_{X|SW}\,p_S\,|\,p_W) \qquad (10.27)$$
where
$$\bar{P}^*(W_K) \triangleq \Bigl\{\tilde{p}_{YX_K|SW}:\ \tilde{p}_{X_K|SW} \in M(p_{X|SW}),\ \tilde{p}_{Y|X_K} \in W_K(\tilde{p}_{X_K}),\ \min_{A\subseteq K}\frac{1}{|A|}\mathring{I}(X_A; YX_{K\setminus A}|SW) \le R\Bigr\}.$$
Similarly to the discussion below (10.25), when $W_K = W^{\text{fair}}_K$ the minimum over $\tilde{p}_{YX_K|SW}$ in (10.27) is achieved by a permutation-invariant conditional p.m.f. Next we show that $A = K$ minimizes $\frac{1}{|A|}\mathring{I}(X_A; YX_{K\setminus A}|SW)$ over $A \subseteq K$. Indeed
$$\begin{aligned}
\frac{1}{|A|}\mathring{I}(X_A; YX_{K\setminus A}|SW)
&= \frac{1}{|A|}\Bigl[\sum_{m\in A} H(X_m|SW) + H(YX_{K\setminus A}|SW) - H(YX_K|SW)\Bigr] \\
&= H(X|SW) - \frac{1}{|A|}H(X_A|YX_{K\setminus A}SW) \\
&\overset{(a)}{\ge} H(X|SW) - \frac{1}{|K|}H(X_K|YSW) = \frac{1}{|K|}\mathring{I}(X_K; Y|SW) \qquad (10.28)
\end{aligned}$$
where (a) follows from (3.2) with $Z = (Y, S, W)$. Using (10.28) and (10.24), we obtain $\bar{P}^*(W^{\text{fair}}_K) = P^*(W^{\text{fair}}_K)$.
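Step (a) of (10.28) rests on the entropy inequality (3.2) of Lemma 3.1: for a permutation-invariant joint p.m.f. and $K = 2$, it reduces to $H(X_1\,|\,X_2, Z) \le \frac{1}{2}H(X_1, X_2\,|\,Z)$. A small numeric sanity check under an assumed exchangeable input pair and input-symmetric channel (all distributions below are made up for illustration):

```python
import math

def H(joint, idx, base=2.0):
    """Entropy of the coordinates in idx, from a dict mapping tuples -> prob."""
    marg = {}
    for t, p in joint.items():
        key = tuple(t[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log(p, base) for p in marg.values() if p > 0)

def Hcond(joint, a, b):
    """Conditional entropy H(A|B) = H(A,B) - H(B)."""
    return H(joint, a + b) - H(joint, b)

# Exchangeable p(x1,x2) and a channel p(y|x1,x2) symmetric in (x1,x2),
# so the joint p.m.f. of (X1, X2, Y) is invariant under swapping X1, X2.
p_x = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.3}
p_y1 = {0: 0.1, 1: 0.5, 2: 0.9}          # Pr[Y = 1 | x1 + x2]
joint = {}
for (x1, x2), p in p_x.items():
    q = p_y1[x1 + x2]
    joint[(x1, x2, 1)] = p * q
    joint[(x1, x2, 0)] = p * (1 - q)

# Lemma 3.1 (3.2), specialized to K = 2 with Z = Y:
lhs = Hcond(joint, (0,), (1, 2))          # H(X1 | X2, Y)
rhs = 0.5 * Hcond(joint, (0, 1), (2,))    # (1/2) H(X1, X2 | Y)
print(lhs <= rhs + 1e-12)                 # prints True
```

The check passes precisely because the joint law is permutation-invariant; breaking the symmetry of `p_x` or of the channel voids the guarantee, which is why Part (iv) restricts to fair collusion channels before invoking (10.28).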
Hence, for $W_K = W^{\text{fair}}_K$ the constraint sets in (10.27) and (10.25) coincide, the two minimizations yield the same value, and therefore
$$E^{\text{all}}(R, L, D_1, W^{\text{fair}}_K, \Delta) = E^{\text{one}}(R, L, D_1, W^{\text{fair}}_K, \Delta).$$

(vi) Positive Error Exponents. Consider any $\mathcal{W} = \{1, \cdots, L\}$ and $p_W$ that is positive over its support set (if it is not, reduce the value of $L$ accordingly). For any $A \subseteq K$, the divergence to be minimized in the expression (5.11) for $\tilde{E}_{psp,A}(R, L, p_W, \tilde{p}_{S|W}, p_{X|SW}, W_K)$ is zero if and only if $\tilde{p}_{YX_K|SW} = \tilde{p}_{Y|X_K}\,p^K_{X|SW}$ and $\tilde{p}_{S|W} = p_S$. These p.m.f.'s are feasible for (5.10) if and only if the resulting $I(X_A; YX_{K\setminus A}|SW) \le |A|R$. They are infeasible, and thus positive error exponents are guaranteed, if
$$R < \min_{A\subseteq K} \frac{1}{|A|} I(X_A; YX_{K\setminus A}|SW).$$
From Part (iv) above, we may restrict our attention to $W_K = W^{\text{fair}}_K$ under the detect-one criterion. Since the p.m.f. of $(S, W, X_K, Y)$ is permutation-invariant, by application of (3.3) we have
$$\min_{A\subseteq K}\frac{1}{|A|}I(X_A; YX_{K\setminus A}|SW) = \frac{1}{K}I(X_K; Y|SW). \qquad (10.29)$$
Hence the supremum of all $R$ for which error exponents are positive is given by $\tilde{C}^{\text{one}}(D_1, W_K)$ in (3.8) and is obtained by letting $\Delta \to 0$ and $L \to \infty$.

For any $W_K$, under the detect-all criterion, the supremum of all $R$ for which error exponents are positive is given by $\tilde{C}^{\text{all}}(D_1, W_K)$ in (3.9) and is obtained by letting $\Delta \to 0$ and $L \to \infty$. Since the optimal p.m.f. is not necessarily permutation-invariant, (10.29) does not hold in general. However, if $W_K = W^{\text{fair}}_K$, the same capacity is obtained for the detect-one and detect-all problems.
✷

11 Conclusion

We have derived exact fingerprinting capacity formulas, as opposed to the bounds derived in recent papers [4, 5, 10], and constructed a universal fingerprinting scheme. A distinguishing feature of this new scheme is the use of an auxiliary "time-sharing" randomized sequence $W$. The analysis shows that optimal coalitions are fair and that capacity and random-coding exponents are the same whether the problem is formulated as catching one colluder or all of them.

Our study also allows us to reexamine previous fingerprinting system designs from a new angle. First, randomization of the encoder via $W$ is generally needed because the payoff function in the mutual-information game is nonconcave with respect to $p_{X|S}$. Thus capacity is obtained as the value of a mutual-information game with $p_{XW|S}$ as the maximizing variable. This has motivated the construction of our randomized fingerprinting scheme, which may also be thought of as a generalization of Tardos' design [9]. Two other randomization methods are also fundamental: randomized permutation of user indices, to ensure that maximum error probability (over all possible coalitions) equals average error probability; and randomized permutation of the letters $\{1, 2, \cdots, N\}$, to cope with collusion channels with arbitrary memory.

Second, single-user decoders are simple but suboptimal. Such decoders assign a score to each user based on his individual fingerprint and the received data, and declare guilty those users whose score exceeds some threshold. While this is a reasonable approach, performance can be improved by making joint decisions about the coalition. Similarly, the fingerprinting schemes proposed in [9] and in much of the signal-processing literature might be improved by adopting a joint-decision principle, at the expense of increased decoding complexity.
Finally, several information-theoretic approaches to fingerprinting have been studied in the two years since this paper was submitted for publication, including work on spherical fingerprinting by the author [25] and his coworkers Wang [26] and Jourdas [27], on blind fingerprinting [6, 28], on binary fingerprinting under the Boneh–Shaw model by Amiri and Tardos [29], Huang and Moulin [30, 31], and Furon and Pérez-Freire [32], as well as research on two-level fingerprinting codes by Anthapadmanabhan and Barg [33]. Particularly noteworthy is [29], which presents a random coding scheme closely related to ours, with a joint decoder (improving on Tardos' earlier work [9]) that maximizes a penalized empirical mutual information criterion, similarly to Plotnik and Satt's universal decoder for the random MAC [11]. Amiri and Tardos use ordinary empirical mutual information instead of our empirical mutual information $\mathring{I}$ of $K$ variables. While both choices are capacity-achieving, ours is geared towards obtaining better error exponents, as is the case for the classical MAC decoding problem [23]. The paper [29] also outlines the proof of a converse theorem for the so-called weak fingerprinting model, in which a helper discloses all colluders except one to the decoder.

Acknowledgments. The author is very grateful to Dr. Ying Wang for reading several drafts of this paper and making comments and suggestions that have improved it. He also thanks Yen-Wei Huang, Dr. Prasanth Anthapadmanabhan, Profs. Barg and Tardos, and the anonymous reviewers for helpful comments and corrections; and Prof. Raymond Yeung and an anonymous reviewer of [28] for bringing references [22] and [11], respectively, to our attention.

A Proof of Lemma 3.1

Due to the permutation-invariant assumption on the joint p.m.f.
of $(X_K, Z)$, it suffices to establish (3.1) for $A = \{1, \cdots, k-1\}$ and $B = \{1, \cdots, k\}$, where $2 \le k \le K$. The claim then follows by induction over $k$. Let $Z_k = (Z, X^N_{k+1})$, hence $Z_{k-1} = (Z_k, X_k)$. Then (3.1) takes the form
$$\frac{1}{k-1}H(X^{k-1}_1\,|\,Z, X^N_k) \le \frac{1}{k}H(X^k_1\,|\,Z, X^N_{k+1})$$
or equivalently
$$(k-1)\,H(X^k_1\,|\,Z_k) \ge k\,H(X^{k-1}_1\,|\,Z_k X_k), \qquad 2 \le k \le K. \qquad (A.1)$$
And indeed the difference between the left and right sides of (A.1) satisfies
$$\begin{aligned}
(k-1)H(X^k_1|Z_k) - k\,H(X^{k-1}_1|Z_k X_k)
&= (k-1)\bigl[H(X_k|Z_k) + H(X^{k-1}_1|Z_k X_k)\bigr] - k\,H(X^{k-1}_1|Z_k X_k) \\
&= (k-1)H(X_k|Z_k) - H(X^{k-1}_1|Z_k X_k) \\
&\overset{(a)}{=} \sum_{i=1}^{k-1} H(X_i|Z_k) - H(X^{k-1}_1|Z_k X_k) \\
&\overset{(b)}{\ge} H(X^{k-1}_1|Z_k) - H(X^{k-1}_1|Z_k X_k) = I(X^{k-1}_1; X_k|Z_k) \overset{(c)}{\ge} 0
\end{aligned}$$
where (a) holds because the conditional p.m.f.'s $p_{X_i|Z_k}$, $1 \le i \le k$, are identical due to the permutation-invariance assumption. Inequalities (b) and (c) hold with equality when $X_i$, $1 \le i \le k$, are conditionally independent given $Z_k$.

Similarly, to establish (3.2), it suffices to prove that
$$(k-1)\,H(X^k_1\,|\,Z) \le k\,H(X^{k-1}_1\,|\,Z). \qquad (A.2)$$
We have
$$\begin{aligned}
(k-1)H(X^k_1|Z) - k\,H(X^{k-1}_1|Z)
&= (k-1)\bigl[H(X^{k-1}_1|Z) + H(X_k|Z, X^{k-1}_1)\bigr] - k\,H(X^{k-1}_1|Z) \\
&= (k-1)H(X_k|Z, X^{k-1}_1) - H(X^{k-1}_1|Z) \\
&\overset{(a)}{=} \sum_{i=1}^{k-1} H(X_i|Z, X^{i-1}_1, X^k_{i+1}) - H(X^{k-1}_1|Z) \\
&\overset{(b)}{=} \sum_{i=1}^{k-1} H(X_i|Z, X^{i-1}_1, X^k_{i+1}) - \sum_{i=1}^{k-1} H(X_i|Z, X^{i-1}_1) \\
&= -\sum_{i=1}^{k-1} I(X_i; X^k_{i+1}\,|\,Z, X^{i-1}_1) \le 0
\end{aligned}$$
where in (a) we have used the permutation invariance of the distribution of $X^k_1$, and in (b) the chain rule for entropy. ✷

B Proof of Lemma 3.3

The derivation below is given in terms of the detect-one criterion but applies straightforwardly to the detect-all criterion as well.
Denote by $C^{\text{one}}_{\text{memoryless}}(D_1, W_K)$ the compound capacity under the detect-one criterion. To prove the claim
$$C^{\text{one}}(D_1, W_K) \le C^{\text{one}}_{\text{memoryless}}(D_1, W_K), \qquad (B.1)$$
it suffices to identify a family of collusion channels satisfying the almost-sure fidelity constraint (2.10) and for which reliable decoding is impossible at rates above $C^{\text{one}}_{\text{memoryless}}(D_1, W_K)$. For any $\mathbf{x}_K$, consider the class
$$W^\epsilon_K(p_{\mathbf{x}_K}) \triangleq \Bigl\{\tilde{p}_{Y|X_K} \in \mathcal{P}_{Y|X_K}:\ \min_{p_{Y|X_K}\in W_K(p_{\mathbf{x}_K})}\ \max_{x_K, y}\,|\tilde{p}_{Y|X_K}(y|x_K) - p_{Y|X_K}(y|x_K)| \le \epsilon\Bigr\}, \quad \epsilon \ge 0, \qquad (B.2)$$
which is slightly larger than $W_K(p_{\mathbf{x}_K})$ but shrinks towards $W_K(p_{\mathbf{x}_K})$ as $\epsilon \downarrow 0$. Continuity of mutual information and of the mapping $W_K(\cdot)$ with respect to variational distance (per (2.11)) implies that
$$C^{\text{one}}(D_1, W^\epsilon_K) \uparrow C^{\text{one}}(D_1, W_K) \quad\text{as } \epsilon \downarrow 0. \qquad (B.3)$$
We now claim that if the coalition selects a memoryless channel $p_{Y|X_K} \in W_K(p_{\mathbf{x}_K})$, the constraint $p_{\mathbf{y}|\mathbf{x}_K} \in W^\epsilon_K(p_{\mathbf{x}_K})$ is satisfied with probability approaching 1 as $N \to \infty$:
$$\forall\,\epsilon > 0\ \exists\,N_0(\epsilon):\quad \Pr[p_{\mathbf{y}|\mathbf{x}_K} \in W^\epsilon_K(p_{\mathbf{x}_K})] \ge 1 - \epsilon \quad \forall N > N_0(\epsilon). \qquad (B.4)$$
To show this, define the set
$$E = \Bigl\{\mathbf{x}_K:\ \min_{x_K \in \mathcal{X}^K} p_{\mathbf{x}_K}(x_K) \ge \epsilon\,|\mathcal{X}|^{-K}\Bigr\}.$$
Without loss of generality¹⁰, assume $f_N$ is such that
$$\Pr[\mathbf{x}_K \in E] \ge 1 - \epsilon/2 \qquad (B.5)$$
where the probability is taken with respect to $M_K, S, V$. For any $\mathbf{x}_K \in E$, $x_K \in \mathcal{X}^K$, $y \in \mathcal{Y}$, if $\mathbf{y}$ is generated conditionally i.i.d. $p_{Y|X_K}$, the random variable $p_{\mathbf{y}|\mathbf{x}_K}(y|x_K)$ converges in probability to $p_{Y|X_K}(y|x_K)$ as $N \to \infty$. Hence
$$\Pr_{Y|\mathbf{X}_K=\mathbf{x}_K}\Bigl[\max_{x_K, y}\,|p_{\mathbf{y}|\mathbf{x}_K}(y|x_K) - p_{Y|X_K}(y|x_K)| \le \epsilon\Bigr] \ge 1 - \epsilon/2, \quad \forall\,\mathbf{x}_K \in E \qquad (B.6)$$
for any $N > N_0(\epsilon)$. Combining (B.5) and (B.6), we obtain (B.4).

A lower bound on error probability is obtained when a helper provides some information to the decoder.
Assume the constraint on the coalition is slightly relaxed so that they are allowed to produce pirated copies that violate the constraint $p_{\mathbf{y}|\mathbf{x}_K} \in W^\epsilon_K(p_{\mathbf{x}_K})$ with probability at most $\epsilon$, as in (B.4). In this event, the helper reveals the entire coalition to the decoder. This contributes at most $\epsilon K N R$ bits of information to the decoder and does not increase the decoder's error probability. Hence
$$C^{\text{one}}(D_1, W^\epsilon_K) + \epsilon K \le C^{\text{one}}_{\text{memoryless}}(D_1, W_K).$$
Combining this inequality with (B.3) establishes (B.1). ✷

¹⁰ One may always "fill in" each codeword $\mathbf{x}_m$ with $2\epsilon|\mathcal{X}|^{-K}N$ dummy symbols drawn from the uniform p.m.f. on $\mathcal{X}$ to ensure that (B.5) holds. The rate loss due to the "fill-in" symbols vanishes as $\epsilon \to 0$.

C Proof of (8.39)

The quantities $\hat{\zeta}(\mathbf{Y})$ and $\zeta$ are defined in (8.37) and (8.38), respectively. We first analyze
$$\begin{aligned}
\Pr\Bigl[\mathbf{1}\{\hat{D}_{ijk}(\mathbf{Y}) \le \tfrac{3\delta^2}{4}\} > \mathbf{1}\{D_{ijk} \le \delta^2\}\ \Big|\ S=\mathbf{s}, V=v, K=\{i,j\}\Bigr]
&= \Pr\Bigl[\hat{D}_{ijk}(\mathbf{Y}) \le \tfrac{3\delta^2}{4} \text{ and } D_{ijk} > \delta^2\ \Big|\ S=\mathbf{s}, V=v, K=\{i,j\}\Bigr] \\
&\le \Pr\Bigl[\hat{D}_{ijk}(\mathbf{Y}) < D_{ijk} - \tfrac{\delta^2}{4}\ \Big|\ S=\mathbf{s}, V=v, K=\{i,j\}\Bigr] \qquad (C.1)
\end{aligned}$$
for any $k \in M^{A\text{-good}}_N(\mathbf{s}, v, i, \delta)$. The shorthand $\Pr$ denotes the probability distribution on $\hat{D}_{ijk}(\mathbf{Y})$ induced by the conditional distribution $p^N_{Y|X_1X_2}(\mathbf{Y}\,|\,\mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v))$. Conditioned on $S=\mathbf{s}$, $V=v$, $K=\{i,j\}$, the normalized loglikelihood $\hat{D}_{ijk}(\mathbf{Y})$ of (8.34) is the average of $N$ independent random variables. We show that $\hat{D}_{ijk}(\mathbf{Y})$ converges in probability (and exponentially with $N$) to its expectation $D_{ijk}$ of (8.7). We may write $\hat{D}_{ijk}(\mathbf{Y})$ as a function of the joint type $p_{\mathbf{Y}\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}$ of the quadruple $(\mathbf{Y}, \mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v), \mathbf{x}_k(\mathbf{s},v))$:
$$\hat{D}_{ijk}(\mathbf{Y}) = D(p_{\mathbf{Y}\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}) \triangleq \sum_{y, x_1, x_2, x'_2} p_{\mathbf{Y}\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(y, x_1, x_2, x'_2)\,\log\frac{p_{Y|X_1X_2}(y|x_1,x_2)}{p_{Y|X_1X_2}(y|x_1,x'_2)}.$$
(C.2)

Similarly, from (8.7) we obtain
$$D_{ijk} = D(p_{\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}) \triangleq \sum_{y, x_1, x_2, x'_2} p_{\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(x_1, x_2, x'_2)\,p_{Y|X_1X_2}(y|x_1,x_2)\,\log\frac{p_{Y|X_1X_2}(y|x_1,x_2)}{p_{Y|X_1X_2}(y|x_1,x'_2)}. \qquad (C.3)$$
Subtracting (C.3) from (C.2) yields
$$\begin{aligned}
\hat{D}_{ijk}(\mathbf{Y}) - D_{ijk}
&= \sum_{y, x_1, x_2, x'_2} p_{\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(x_1, x_2, x'_2)\,\bigl[p_{\mathbf{Y}|\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(y|x_1,x_2,x'_2) - p_{Y|X_1X_2}(y|x_1,x_2)\bigr]\times\log\frac{p_{Y|X_1X_2}(y|x_1,x_2)}{p_{Y|X_1X_2}(y|x_1,x'_2)} \\
&= \sum_{y, x_1, x_2, x'_2} U(y, x_1, x_2, x'_2)\,p_{Y|X_1X_2}(y|x_1,x_2)\,\log\frac{p_{Y|X_1X_2}(y|x_1,x_2)}{p_{Y|X_1X_2}(y|x_1,x'_2)}
\end{aligned}$$
which is a linear combination of the random variables
$$U(y, x_1, x_2, x'_2) \triangleq p_{\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(x_1, x_2, x'_2)\,\Bigl[\frac{p_{\mathbf{Y}|\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(y|x_1,x_2,x'_2)}{p_{Y|X_1X_2}(y|x_1,x_2)} - 1\Bigr].$$
Note that for each $x_1, x_2, x'_2$, the minimum of $U(y, x_1, x_2, x'_2)$ over $y \in \mathcal{Y}$ is nonpositive. Owing to (8.6), we also have
$$\sum_y p_{Y|X_1X_2}(y|x_1,x_2)\,\log\frac{p_{Y|X_1X_2}(y|x_1,x_2)}{p_{Y|X_1X_2}(y|x_1,x'_2)} = D(p_{Y|X_1=x_1,X_2=x_2}\,\|\,p_{Y|X_1=x_1,X_2=x'_2}) \in [\delta, \log\delta^{-1}].$$
Hence
$$\hat{D}_{ijk}(\mathbf{Y}) - D_{ijk} \ge (\log\delta^{-1})\min_{y, x_1, x_2, x'_2} U(y, x_1, x_2, x'_2). \qquad (C.4)$$
In the sequel we omit the conditioning on $\mathbf{s}, v, i, j$ for conciseness of notation. We bound (C.1) by
$$\Pr\Bigl[\hat{D}_{ijk}(\mathbf{Y}) < D_{ijk} - \frac{\delta^2}{4}\Bigr]
\overset{(a)}{\le} \Pr\Bigl[\min_{y,x_1,x_2,x'_2} U(y,x_1,x_2,x'_2) < -\frac{\delta^2}{4\log\delta^{-1}}\Bigr]
\overset{(b)}{\le} |\mathcal{X}|^3|\mathcal{Y}|\max_{y,x_1,x_2,x'_2}\Pr[U(y,x_1,x_2,x'_2) < -\epsilon] \qquad (C.5)$$
where (a) follows from (C.4); and in (b) we have used the union bound and the shorthand $\epsilon = \frac{\delta^2}{4\log\delta^{-1}}$. Denote by
$$D_b(\alpha\,\|\,p) = \alpha\ln\frac{\alpha}{p} + (1-\alpha)\ln\frac{1-\alpha}{1-p}, \qquad 0 < \alpha < 1,$$
the large-deviations function for the Bernoulli random variable with probability $p$.
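Step (a) of the upcoming bound (C.8) is the standard Chernoff bound for the lower binomial tail, $\Pr[\mathrm{Bi}(n,p) < \alpha n] \le e^{-n D_b(\alpha\|p)}$ for $\alpha < p$ (with $D_b$ in nats, matching the $\ln$ in the definition above). A quick numeric check against the exact tail, with illustrative parameters:

```python
import math
from math import comb

def D_b(alpha, p):
    """Bernoulli large-deviations (binary KL) function D_b(alpha || p), in nats."""
    return alpha * math.log(alpha / p) + (1 - alpha) * math.log((1 - alpha) / (1 - p))

def binom_tail_below(n, p, k):
    """Exact lower tail Pr[Bi(n, p) < k]."""
    return sum(comb(n, t) * p**t * (1 - p)**(n - t) for t in range(k))

# Chernoff bound: Pr[Bi(n,p) < alpha*n] <= exp(-n * D_b(alpha || p)) for alpha < p.
n, p, alpha = 200, 0.4, 0.3          # made-up parameters for illustration
exact = binom_tail_below(n, p, math.ceil(n * alpha))
bound = math.exp(-n * D_b(alpha, p))
print(exact <= bound)                # prints True
```

The bound is crude for small $n$ but exponentially tight, which is all the proof needs: (C.8) only uses that the exponent $f(\delta)$ stays above $\delta^7$.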
Note that
$$D_b((1-\epsilon)p\,\|\,p) \sim \frac{p^2\epsilon^2}{2(1-p)} \quad\text{as } \epsilon \downarrow 0$$
and that $\frac{p^2}{1-p} > \frac{\delta^2}{1-\delta}$ for all $\delta < p < 1-\delta$. Define $f(\delta) = \min_{\delta\le p\le 1-\delta} D_b((1-\epsilon)p\,\|\,p)$. We have
$$f(\delta) \sim D_b((1-\epsilon)\delta\,\|\,\delta) \sim \frac{\epsilon^2\delta^2}{2(1-\delta)} = \frac{\delta^6}{32(1-\delta)\log^2\delta^{-1}} \gg \delta^7 \quad\text{as } \delta \downarrow 0, \qquad (C.6)$$
hence there exists $\delta^* > 0$ such that $f(\delta) > \delta^7$ for all $0 < \delta < \delta^*$.

Define the shorthand $\beta = p_{\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(x_1, x_2, x'_2) \in [0,1]$. For each $i, j, k$ and each $y, x_1, x_2, x'_2$, the count
$$\beta N\,p_{\mathbf{Y}|\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(y|x_1,x_2,x'_2) = N\,p_{\mathbf{Y}\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(y,x_1,x_2,x'_2) = \sum_{t=1}^N \mathbf{1}\{Y_t = y,\ x_{it} = x_1,\ x_{jt} = x_2,\ x_{kt} = x'_2\}$$
is a binomial random variable with $\beta N$ trials and probability $p \triangleq p_{Y|X_1X_2}(y|x_1,x_2) \in [\delta, 1-\delta]$; by (8.6), we indeed have $\delta \le p \le 1-\delta$. Next,
$$\Pr[U(y,x_1,x_2,x'_2) < -\epsilon] = \Pr\Bigl[\beta\Bigl(\frac{p_{\mathbf{Y}|\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}(y|x_1,x_2,x'_2)}{p_{Y|X_1X_2}(y|x_1,x_2)} - 1\Bigr) < -\epsilon\Bigr] = \Pr\Bigl[\frac{\mathrm{Bi}(\beta N, p)}{Np} - \beta < -\epsilon\Bigr].$$
For $\beta \le \epsilon$, this probability is zero. For $\epsilon < \beta \le 1$, we have
$$\begin{aligned}
\Pr[U(y,x_1,x_2,x'_2) < -\epsilon]
&= \Pr\bigl[\mathrm{Bi}(\beta N, p) < \beta N(1 - \epsilon/\beta)p\bigr] \qquad (C.7) \\
&\overset{(a)}{\le} 2^{-N\beta D_b((1-\epsilon/\beta)p\|p)}
\overset{(b)}{\le} 2^{-N D_b((1-\epsilon)p\|p)}
\overset{(c)}{\le} 2^{-N f(\delta)}
\overset{(d)}{<} 2^{-N\delta^7} \quad \forall\,\delta < \delta^* \qquad (C.8)
\end{aligned}$$
where (a) holds by definition of the large-deviations function $D_b$; (b) holds by convexity of the function $D_b(\cdot\,\|\,p)$: for all $\epsilon' = \epsilon/\beta \in [\epsilon, 1)$, we have $D_b((1-\epsilon')p\,\|\,p) \ge (\epsilon'/\epsilon)\,D_b((1-\epsilon)p\,\|\,p)$, with equality if $\epsilon' = \epsilon$, i.e., $\beta = 1$; (c) holds by (C.6); and (d) holds by the lower bound on $f(\delta)$.

Combining (C.5) and (C.8), we conclude that (C.1) is upper-bounded by an exponentially vanishing function of $N$ for each $\delta < \delta^*$:
$$\forall\,i,j,k:\quad \Pr\Bigl[\mathbf{1}\{\hat{D}_{ijk}(\mathbf{Y}) \le \tfrac{3\delta^2}{4}\} > \mathbf{1}\{D_{ijk} \le \delta^2\}\ \Big|\ S=\mathbf{s}, V=v, K=\{i,j\}\Bigr] \le 2^{-N\delta^7}.$$
(C.9)

This does not immediately imply that $\hat{\zeta}(\mathbf{Y}) \le \zeta$ with probability approaching 1, because the definition of $\hat{\zeta}(\mathbf{Y})$ in (8.37) involves potentially exponentially many terms $\hat{D}_{ijk}(\mathbf{Y})$. However,
$$\begin{aligned}
\Pr[\hat{\zeta}(\mathbf{Y}) > \zeta\,|\,S=\mathbf{s}, V=v, K=\{i,j\}]
&= \Pr\Bigl[\sum_k \mathbf{1}\{\hat{D}_{ijk}(\mathbf{Y}) \le \tfrac{3\delta^2}{4}\} > \sum_k \mathbf{1}\{D_{ijk} \le \delta^2\}\ \Big|\ S=\mathbf{s}, V=v, K=\{i,j\}\Bigr] \\
&\le \Pr\Bigl[\exists\,p_{\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}:\ \mathbf{1}\{\hat{D}_{ijk}(\mathbf{Y}) \le \tfrac{3\delta^2}{4}\} > \mathbf{1}\{D_{ijk} \le \delta^2\}\ \Big|\ S=\mathbf{s}, V=v, K=\{i,j\}\Bigr] \\
&\overset{(a)}{\le} (N+1)^{|\mathcal{X}|^3}\max_{p_{\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}}\Pr\Bigl[\mathbf{1}\{\hat{D}_{ijk}(\mathbf{Y}) \le \tfrac{3\delta^2}{4}\} > \mathbf{1}\{D_{ijk} \le \delta^2\}\ \Big|\ S=\mathbf{s}, V=v, K=\{i,j\}\Bigr] \\
&\overset{(b)}{\le} |\mathcal{X}|^3|\mathcal{Y}|\,(N+1)^{|\mathcal{X}|^3}\,2^{-N\delta^7} \to 0 \quad\text{as } N\to\infty
\end{aligned}$$
where (a) follows from the union bound and the fact that the number of joint types $p_{\mathbf{x}_i\mathbf{x}_j\mathbf{x}_k}$ is at most $(N+1)^{|\mathcal{X}|^3}$, and (b) from (C.1) and (C.9). This establishes (8.39). ✷

D Proof of (8.56)

Lemma D.1 There exists a partition $\{\widetilde{M}_i\}_{i\in\mathcal{I}}$ of $M^{\text{bad}}_N(\mathbf{s}, v, \delta)$ with the following properties:

(P1) $\forall\,i \in \mathcal{I}$, $\forall\,j \in \widetilde{M}_i$: $d_H(\mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v)) \le 2N\delta$;

(P2) $\forall\,i \in \mathcal{I}$: $|\widetilde{M}_i| \ge 2^{3N\sqrt{\delta}}$.

Proof. By assumption, $|M^{\text{bad}}_N(\mathbf{s},v,\delta)| \ge 2^{NR}(1 - 2^{-N\delta^2/3})$. The index set $\mathcal{I}$ and the sets $\{\widetilde{M}_i\}_{i\in\mathcal{I}}$ are constructed iteratively as follows. Denote by $i$ the smallest index in $M^{\text{bad}}_N(\mathbf{s},v,\delta)$ and initialize $\mathcal{I} = \{i\}$ and $\widetilde{M}_i = M_i(\mathbf{s},v,\delta)$. By the definition (8.3), $\widetilde{M}_i$ satisfies $d_H(\mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v)) \le N\delta$ for all $j \in \widetilde{M}_i$, hence Property (P1) holds. Also, owing to (8.5), Property (P2) holds as well. Next, find the smallest $i \in M^{\text{bad}}_N(\mathbf{s},v,\delta)$ such that $d_H(\mathbf{x}_j(\mathbf{s},v), \mathbf{x}_i(\mathbf{s},v)) > 2N\delta$ for all $j \in \mathcal{I}$, and update $\mathcal{I} \leftarrow \mathcal{I}\cup\{i\}$. By the triangle inequality, the sets $\{\widetilde{M}_i\}_{i\in\mathcal{I}}$ are disjoint. Repeat this operation until no such $i$ can be found.
At this point, the set $\mathcal{I}$ is fixed, and each remaining codeword index $j \notin \bigcup_{i\in\mathcal{I}}\widetilde{M}_i$ satisfies $d_H(\mathbf{x}_j(\mathbf{s},v), \mathbf{x}_i(\mathbf{s},v)) \le 2N\delta$ for some $i \in \mathcal{I}$. Assign the index $j$ of this codeword to $\widetilde{M}_i$; ties can be broken arbitrarily. Properties (P1) and (P2) of the set $\widetilde{M}_i$ are preserved. Repeat this operation until all the codeword indices in $M^{\text{bad}}_N(\mathbf{s},v,\delta)$ are exhausted. Upon completion of this process, the sets $\{\widetilde{M}_i\}_{i\in\mathcal{I}}$ form a partition of $M^{\text{bad}}_N(\mathbf{s},v,\delta)$ and satisfy (P1) and (P2). ✷

Assume that $K = \{i,j\} \in (M^{\text{bad}}_N(\mathbf{s},v,\delta))^2$. Consider the partition of Lemma D.1 and a genie (helper) that reveals the two "clusters" of indices $\widetilde{M}_{i^*}$ and $\widetilde{M}_{j^*}$ ($i^*, j^* \in \mathcal{I}$) to which $i$ and $j$ respectively belong. Thanks to the genie, we can enlarge the decoding regions, obtaining regions $\{D'_m(\mathbf{s},v),\ m \in \widetilde{M}_{i^*}\cup\widetilde{M}_{j^*}\}$ that contain the original decoding regions ($D_m(\mathbf{s},v) \subseteq D'_m(\mathbf{s},v)$) and form a partition of $\mathcal{Y}^N$. The conditional probabilities that $\mathbf{Y}$ is typical and that correct decoding occurs (given $\mathbf{s}, v, i \in \widetilde{M}_{i^*}, j \in \widetilde{M}_{j^*}$) for the original decoder and for the genie-aided decoder are respectively given by $P_c(i,j|\mathbf{s},v)$ in (8.46) and by
$$P'_c(i,j|\mathbf{s},v,i^*,j^*) \triangleq \sum_{\mathbf{y}\in T_\delta(\mathbf{s},v,i,j)\,\cap\,(D'_i(\mathbf{s},v)\cup D'_j(\mathbf{s},v))} p^N_{Y|X_1X_2}(\mathbf{y}\,|\,\mathbf{x}_i(\mathbf{s},v), \mathbf{x}_j(\mathbf{s},v)).$$
Since $D_m(\mathbf{s},v) \subseteq D'_m(\mathbf{s},v)$ for all $m$, we have
$$P_c(i,j|\mathbf{s},v) \le P'_c(i,j|\mathbf{s},v,i^*,j^*), \qquad \forall\,i\in\widetilde{M}_{i^*},\ j\in\widetilde{M}_{j^*}. \qquad (D.1)$$
The average of the right side over $i\in\widetilde{M}_{i^*}$ and $j\in\widetilde{M}_{j^*}$ is denoted by
$$P'_c(\mathbf{s},v,i^*,j^*) \triangleq \frac{1}{|\widetilde{M}_{i^*}|\,|\widetilde{M}_{j^*}|}\sum_{i\in\widetilde{M}_{i^*}}\sum_{j\in\widetilde{M}_{j^*}} P'_c(i,j|\mathbf{s},v,i^*,j^*). \qquad (D.2)$$
Let $(i^{**}, j^{**})$ achieve the maximum of $P'_c(\mathbf{s},v,i^*,j^*)$ over $(i^*,j^*)$, and denote by $C_1 = \widetilde{M}_{i^{**}}$ and $C_2 = \widetilde{M}_{j^{**}}$ the corresponding clusters of indices.
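The iterative construction in the proof of Lemma D.1 is a greedy covering by Hamming balls: centers are kept pairwise farther apart than the threshold, and every remaining index is assigned to a center within the threshold. A simplified single-radius sketch over toy binary codewords (function names and data are illustrative, not from the paper):

```python
def hamming(a, b):
    """Hamming distance between two equal-length tuples."""
    return sum(x != y for x, y in zip(a, b))

def greedy_clusters(codewords, radius):
    """Greedy partition: centers are pairwise > radius apart; every codeword
    is then assigned to the first center within radius (ties broken by order)."""
    centers = []
    for i, c in enumerate(codewords):
        if all(hamming(c, codewords[j]) > radius for j in centers):
            centers.append(i)
    clusters = {i: [] for i in centers}
    for i, c in enumerate(codewords):
        for j in centers:
            if hamming(c, codewords[j]) <= radius:
                clusters[j].append(i)
                break
    return clusters

words = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1), (1, 1, 1, 0), (0, 0, 1, 1)]
print(greedy_clusters(words, 1))  # -> {0: [0, 1], 2: [2, 3], 4: [4]}
```

As in the lemma, every index lands in exactly one cluster, and each cluster's members sit within the stated Hamming radius of its center; the lemma additionally tracks a cluster-size lower bound (P2), which the greedy assignment preserves because it only ever adds members.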
Analogously to Step 3, define the random variables $X_i = x_{iT}(S, V)$, $i \in M_N$, where $T$ is uniformly distributed over $\{1, 2, \cdots, N\}$ and independent of all other random variables. Define $X$ and $X'$ drawn uniformly and independently from the sets $\{X_i,\ i\in C_1\}$ and $\{X_j,\ j\in C_2\}$, respectively. The definitions and derivations in Steps 3 and 4 carry over, with $C_1\times C_2$ in place of $(M^{\text{good}}_N(\mathbf{s},v,\delta))^2$. In particular, (8.13) becomes
$$p_{Y_t|SV}(y|\mathbf{s},v) = \frac{1}{|C_1||C_2|}\sum_{i\in C_1}\sum_{j\in C_2} p_{Y|X_1X_2}(y\,|\,x_{it}(\mathbf{s},v), x_{jt}(\mathbf{s},v)). \qquad (D.3)$$
We again use the reference conditional distribution (8.15), repeated below for convenience:
$$r(\mathbf{y}|\mathbf{s},v) \triangleq \prod_{t=1}^N p_{Y_t|SV}(y_t|\mathbf{s},v). \qquad (D.4)$$
For each $i,k \in C_1$ and $j,l \in C_2$, it follows from the triangle inequality that
$$d_H(\mathbf{x}_i(\mathbf{s},v), \mathbf{x}_k(\mathbf{s},v)) \le 4N\delta \quad\text{and}\quad d_H(\mathbf{x}_j(\mathbf{s},v), \mathbf{x}_l(\mathbf{s},v)) \le 4N\delta.$$
Hence there are at most $8N\delta$ positions $t$ at which $(x_{it}(\mathbf{s},v), x_{jt}(\mathbf{s},v)) \ne (x_{kt}(\mathbf{s},v), x_{lt}(\mathbf{s},v))$. Therefore, owing to (8.6), the Kullback–Leibler divergence between the distributions of $\mathbf{Y}$ conditioned on codeword pairs $(i,j)$ and $(k,l)$, respectively, satisfies
$$D_{ijkl} \triangleq \frac{1}{N}\sum_{t=1}^N D(p_{Y|X_1=x_{it}(\mathbf{s},v),X_2=x_{jt}(\mathbf{s},v)}\,\|\,p_{Y|X_1=x_{kt}(\mathbf{s},v),X_2=x_{lt}(\mathbf{s},v)}) \le 8\delta\log\delta^{-1}. \qquad (D.5)$$
Hence the conditional self-information of (8.17) satisfies
$$\theta_{ij}(\mathbf{s},v) \triangleq \frac{1}{N}\sum_{t=1}^N D(p_{Y|X_1=x_{it}(\mathbf{s},v),X_2=x_{jt}(\mathbf{s},v)}\,\|\,p_{Y_t|S=\mathbf{s},V=v}) \overset{(a)}{\le} \frac{1}{|C_1||C_2|}\sum_{(k,l)\in C_1\times C_2} D_{ijkl} \overset{(b)}{\le} 8\delta\log\delta^{-1}$$
where (a) holds by (D.3) and convexity of the Kullback–Leibler divergence, and (b) follows from (D.5).
The average of the self-information $\theta_{ij}(\mathbf{s},v)$ over all $(i,j) \in C_1\times C_2$ is the conditional mutual information
$$I_{p_T\,p_{X_1|SVT}\,p_{X_2|SVT}\,p_{Y|X_1X_2}}(X_1X_2; Y\,|\,S=\mathbf{s}, V=v, T) = I(\mathbf{s},v) \triangleq \frac{1}{|C_1||C_2|}\sum_{(i,j)\in C_1\times C_2}\theta_{ij}(\mathbf{s},v) \le 8\delta\log\delta^{-1}.$$
Analogously to Step 5, define the typical sets
$$T_\delta(\mathbf{s},v,i,j) \triangleq \Bigl\{\mathbf{y}\in\mathcal{Y}^N:\ \underbrace{\frac{1}{N}\sum_{t=1}^N \log\frac{p_{Y|X_1X_2}(y_t\,|\,x_{it}(\mathbf{s},v), x_{jt}(\mathbf{s},v))}{p_{Y_t|SV}(y_t|\mathbf{s},v)}}_{\hat{\theta}_{ij}(\mathbf{s},v)} < 9\delta\log\delta^{-1}\Bigr\}. \qquad (D.6)$$
The random variable $\hat{\theta}_{ij}(\mathbf{s},v)$ above is the average of $N$ conditionally independent random variables (given $\mathbf{s}, v$) and converges in probability to its mean $\theta_{ij}(\mathbf{s},v) \le 8\delta\log\delta^{-1}$. Similarly to (8.26), we have
$$\Pr[\mathbf{Y}\notin T_\delta(\mathbf{s},v,i,j)\,|\,S=\mathbf{s}, V=v, K=\{i,j\}] \le \frac{1}{N\delta^2}, \qquad \forall\,\mathbf{s},v,i,j, \qquad (D.7)$$
which vanishes as $N\to\infty$. Analogously to (8.47), we define
$$P^{\text{bad}}_c(\mathbf{s},v) \triangleq \Pr[\text{correct decoding and } \mathbf{Y}\in T_\delta(\mathbf{s},v,K)\,|\,S=\mathbf{s}, V=v, K\in(M^{\text{bad}}_N(\mathbf{s},v,\delta))^2].$$
(D.8)

We have
$$\begin{aligned}
P^{\text{bad}}_c(\mathbf{s},v)
&= \frac{1}{|M^{\text{bad}}_N(\mathbf{s},v,\delta)|^2}\sum_{i,j\in M^{\text{bad}}_N(\mathbf{s},v,\delta)} P_c(i,j|\mathbf{s},v) \\
&\overset{(a)}{=} \frac{1}{|M^{\text{bad}}_N(\mathbf{s},v,\delta)|^2}\sum_{i^*,j^*\in\mathcal{I}}\ \sum_{i\in\widetilde{M}_{i^*}}\sum_{j\in\widetilde{M}_{j^*}} P_c(i,j|\mathbf{s},v) \\
&\overset{(b)}{\le} \frac{1}{|M^{\text{bad}}_N(\mathbf{s},v,\delta)|^2}\sum_{i^*,j^*\in\mathcal{I}}\ \sum_{i\in\widetilde{M}_{i^*}}\sum_{j\in\widetilde{M}_{j^*}} P'_c(i,j|\mathbf{s},v,i^*,j^*) \\
&\overset{(c)}{=} \frac{1}{|M^{\text{bad}}_N(\mathbf{s},v,\delta)|^2}\sum_{i^*,j^*\in\mathcal{I}} |\widetilde{M}_{i^*}|\,|\widetilde{M}_{j^*}|\; P'_c(\mathbf{s},v,i^*,j^*)
\overset{(d)}{\le} \max_{i^*,j^*\in\mathcal{I}} P'_c(\mathbf{s},v,i^*,j^*) \\
&= \frac{1}{|C_1||C_2|}\sum_{i\in C_1}\sum_{j\in C_2}\ \sum_{\mathbf{y}\in T_\delta(\mathbf{s},v,i,j)\,\cap\,(D'_i(\mathbf{s},v)\cup D'_j(\mathbf{s},v))} p^N_{Y|X_1X_2}(\mathbf{y}\,|\,\mathbf{x}_i(\mathbf{s},v),\mathbf{x}_j(\mathbf{s},v)) \\
&\overset{(e)}{\le} \frac{2^{N\cdot 9\delta\log\delta^{-1}}}{|C_1||C_2|}\sum_{i\in C_1}\sum_{j\in C_2}\ \sum_{\mathbf{y}\in T_\delta(\mathbf{s},v,i,j)\,\cap\,(D'_i(\mathbf{s},v)\cup D'_j(\mathbf{s},v))} r(\mathbf{y}|\mathbf{s},v) \\
&\overset{(f)}{\le} \frac{2^{N\cdot 9\delta\log\delta^{-1}}}{|C_1||C_2|}\sum_{i\in C_1}\sum_{j\in C_2}\ \sum_{\mathbf{y}\in D'_i(\mathbf{s},v)\cup D'_j(\mathbf{s},v)} r(\mathbf{y}|\mathbf{s},v) \\
&\overset{(g)}{=} \frac{2^{N\cdot 9\delta\log\delta^{-1}}}{|C_1||C_2|}\sum_{i\in C_1}\sum_{j\in C_2}\Bigl[\sum_{\mathbf{y}\in D'_i(\mathbf{s},v)} r(\mathbf{y}|\mathbf{s},v) + \sum_{\mathbf{y}\in D'_j(\mathbf{s},v)} r(\mathbf{y}|\mathbf{s},v)\Bigr]
\le \frac{2^{N\cdot 9\delta\log\delta^{-1}}}{|C_1||C_2|}\bigl(|C_2| + |C_1|\bigr) \\
&\overset{(h)}{\le} 2^{N[9\delta\log\delta^{-1} - 3\sqrt{\delta}]+1} \le 2^{-N\sqrt{\delta}+1}, \qquad \forall\,\delta < \tfrac{1}{4000} \qquad (D.9)
\end{aligned}$$
where (a) and (d) hold because $\{\widetilde{M}_i\}_{i\in\mathcal{I}}$ form a partition of $M^{\text{bad}}_N(\mathbf{s},v,\delta)$, (b) because of (D.1), (c) because of (D.2), (e) follows from (D.4) and (D.6), (f) is obtained by dropping the restriction $\mathbf{y}\in T_\delta(\mathbf{s},v,i,j)$, (g) holds because the decoding regions $D'_i(\mathbf{s},v)$ and $D'_j(\mathbf{s},v)$ are disjoint (so each family of sums of $r$ totals at most 1), and (h) because $|C_1|, |C_2| \ge 2^{3N\sqrt{\delta}}$.

Thus the probability of correct decoding (without the typicality restriction) satisfies
$$\overset{(a)}{\le} \Pr[\mathbf{Y}\notin T_\delta(\mathbf{s},v,i,j)\,|\,S=\mathbf{s},V=v,K=\{i,j\}] + P^{\text{bad}}_c(\mathbf{s},v) \overset{(b)}{\le} \frac{1}{N\delta^2} + 2^{-N\sqrt{\delta}+1} \qquad (D.10)$$
where (a) follows from (8.55) and (D.8), and (b) from (D.7) and (D.9). Hence it vanishes for all $R > 0$, all $(\mathbf{s},v)$, and all $p_{Y|X_1X_2}$. Moreover it is less than $\frac{2}{N\delta^2}$ for all $\delta < 1/4000$, and this establishes (8.56). ✷

References

[1] P. Moulin and A.
Briassouli, "The Gaussian Fingerprinting Game," Proc. Conf. Information Sciences and Systems, Princeton, NJ, March 2002.

[2] P. Moulin and J. A. O'Sullivan, "Optimal Key Design for Information-Embedding Systems," Proc. Conf. Information Sciences and Systems, Princeton, NJ, March 2002.

[3] P. Moulin and J. A. O'Sullivan, "Information-theoretic analysis of information hiding," IEEE Trans. Information Theory, Vol. 49, No. 3, pp. 563–593, March 2003.

[4] A. Somekh-Baruch and N. Merhav, "On the capacity game of private fingerprinting systems under collusion attacks," IEEE Trans. Information Theory, Vol. 51, No. 3, pp. 884–899, March 2005.

[5] A. Somekh-Baruch and N. Merhav, "Achievable error exponents for the private fingerprinting game," IEEE Trans. Information Theory, Vol. 53, No. 5, pp. 1827–1838, May 2007.

[6] Y. Wang and P. Moulin, "Capacity and Random-Coding Error Exponent for Public Fingerprinting Game," Proc. Int. Symp. on Information Theory, Seattle, WA, July 2006.

[7] P. Moulin and N. Kiyavash, "Expurgated Gaussian Fingerprinting Codes," Proc. IEEE Int. Symp. on Information Theory, Nice, France, June 2007.

[8] D. Boneh and J. Shaw, "Collusion-Secure Fingerprinting for Digital Data," in Advances in Cryptology: Proc. CRYPTO '95, Springer-Verlag, New York, 1995.

[9] G. Tardos, "Optimal Probabilistic Fingerprinting Codes," ACM Symp. on Theory of Computing, San Diego, CA, 2003.

[10] N. P. Anthapadmanabhan, A. Barg and I. Dumer, "On the Fingerprinting Capacity Under the Marking Assumption," IEEE Trans. Information Theory, Vol. 54, No. 6, pp. 2678–2689, June 2008.

[11] E. Plotnik and A. Satt, "Decoding Rule and Error Exponent for the Random Multiple-Access Channel," Proc. Int. Symp. Information Theory, p. 216, Budapest, Hungary, 1991.

[12] I. Csiszár and J.
Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, NY, 1981.

[13] I. Csiszár, "The Method of Types," IEEE Trans. Information Theory, Vol. 44, No. 6, pp. 2505–2523, Oct. 1998.

[14] P. Moulin and Y. Wang, "Capacity and Random-Coding Exponents for Channel Coding with Side Information," IEEE Trans. Information Theory, Vol. 53, No. 4, pp. 1326–1347, Apr. 2007.

[15] G. D. Forney, Jr., "Exponential Error Bounds for Erasure, List, and Decision Feedback Schemes," IEEE Trans. Information Theory, Vol. 14, No. 2, pp. 206–220, 1968.

[16] R. G. Gallager, Information Theory and Reliable Communication, Wiley, New York, 1968.

[17] R. Ahlswede, "Multiway Communication Channels," Proc. IEEE Int. Symp. on Information Theory, pp. 23–52, Tsahkadsor, Armenia, 1971.

[18] H. Liao, "Multiple Access Channels," Ph.D. dissertation, EE Department, U. of Hawaii, 1972.

[19] A. Das and P. Narayan, "Capacities of Time-Varying Multiple-Access Channels With Side Information," IEEE Trans. Information Theory, Vol. 48, No. 1, pp. 4–25, Jan. 2002.

[20] A. Barg, personal communication, Jan. 2008.

[21] R. Ahlswede, "An Elementary Proof of the Strong Converse Theorem for the Multiple-Access Channel," J. Combinatorics, Information and System Sci., Vol. 7, No. 3, pp. 216–230, 1982.

[22] T. S. Han, "Nonnegative Entropy Measures of Multivariate Symmetric Correlations," Information and Control, Vol. 36, No. 2, pp. 133–156, 1978.

[23] Y.-S. Liu and B. L. Hughes, "A new universal random coding bound for the multiple-access channel," IEEE Trans. Information Theory, Vol. 42, No. 2, pp. 376–386, Mar. 1996.

[24] A. Barg and G. D. Forney, "Random Codes: Minimum Distances and Error Exponents," IEEE Trans. Information Theory, Vol. 48, No. 9, pp. 2568–2573, Sep. 2002.

[25] P.
Moulin, "Optimal Gaussian Fingerprint Decoders," Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Taipei, Taiwan, Apr. 2009.

[26] P. Moulin and Y. Wang, "Information-Theoretic Analysis of Spherical Fingerprinting," Proc. Symp. on Information Theory and Applications, San Diego, CA, Feb. 2009.

[27] J.-F. Jourdas and P. Moulin, "High-Rate Random-Like Spherical Fingerprinting Codes with Linear Decoding Complexity," IEEE Transactions on Information Forensics and Security, Vol. 4, No. 4, pp. 768–780, Dec. 2009.

[28] Y. Wang and P. Moulin, "Blind Fingerprinting," submitted to IEEE Trans. Information Theory, Feb. 2008. Available from arXiv:0803.0265 [cs.IT].

[29] E. Amiri and G. Tardos, "High Rate Fingerprinting Codes and the Fingerprinting Capacity," Proc. 20th Annual ACM-SIAM Symposium on Discrete Algorithms, New York, NY, Jan. 2009.

[30] Y.-W. Huang and P. Moulin, "Saddle-Point Solution of the Fingerprinting Capacity Game Under the Marking Assumption," Proc. IEEE Int. Symp. on Information Theory, Seoul, Korea, July 2009.

[31] Y.-W. Huang and P. Moulin, "Capacity-Achieving Fingerprint Decoding," Proc. 1st IEEE Workshop on Information Forensics and Security, London, UK, Dec. 2009.

[32] T. Furon and L. Pérez-Freire, "Worst Case Attacks Against Binary Probabilistic Traitor Tracing Codes," Proc. 1st IEEE Workshop on Information Forensics and Security, London, UK, Dec. 2009.

[33] N. P. Anthapadmanabhan and A. Barg, "Two-Level Fingerprinting Codes," Proc. IEEE Int. Symp. on Information Theory, Seoul, Korea, July 2009.