Optimal Watermark Embedding and Detection Strategies Under Limited Detection Resources*

Neri Merhav and Erez Sabbag

October 26, 2018

Department of Electrical Engineering
Technion - Israel Institute of Technology
Technion City, Haifa 32000, Israel
{merhav@ee, erezs@tx}.technion.ac.il

Abstract

An information-theoretic approach is proposed to watermark embedding and detection under limited detector resources. First, we consider the attack-free scenario, under which asymptotically optimal decision regions in the Neyman-Pearson sense are proposed, along with the optimal embedding rule. Later, we explore the case of a zero-mean i.i.d. Gaussian covertext distribution with unknown variance under the attack-free scenario. For this case, we propose a lower bound on the exponential decay rate of the false-negative probability and prove that the optimal embedding and detection strategy is superior to the customary linear, additive embedding strategy in the exponential sense. Finally, these results are extended to the case of memoryless attacks and general worst-case attacks. Optimal decision regions and embedding rules are offered, and the worst attack channel is identified.

1 Introduction

The field of information embedding and watermarking has become a very active field of research in the last decade, both in the academic community and in the industry, due to the need to protect the vast amount of digital information available over the Internet and other data storage media and devices (see, e.g., [1]-[4]). Watermarking (WM) is a form of embedding information secretly in a host data set (e.g., an image, an audio signal, a video, etc.). In this work, we raise and examine certain fundamental questions with regard to customary methods of embedding and detection, and suggest some new ideas for the most basic setup.

Consider the system depicted in Fig. 1: Let x = x_1, ..., x_n denote a covertext sequence emitted from a memoryless source P_X, and let u = u_1, ..., u_n denote a watermark sequence available at the embedder and at the detector. Our work focuses on finding the optimal embedding and detection rules for the following binary hypothesis problem: under hypothesis H_1, the stegotext sequence y = y_1, ..., y_n is "watermarked" using the embedder y = f_n(x, u), while under H_0, y = x, i.e., the stegotext sequence is not "watermarked". An attack channel W_n(z|y), fed by the stegotext, produces a forgery z, which in turn, is observed by the detector. Now, given the forgery sequence z and the watermark sequence u, the detector needs to decide whether the forgery is "watermarked" or not. Performance is evaluated under the Neyman-Pearson criterion, namely, minimum misdetection (false-negative) probability while the false-alarm (false-positive) probability is kept lower than a prescribed level. The problem is addressed under different statistical assumptions: the covertext distribution is known or unknown to the embedder/detector, the attack channel is known to be a memoryless attack or it is a general attack channel, and the watermark sequence is deterministic or random.

[Figure 1: The watermarking and detection problem. The embedder f_n(x, u) maps the covertext x = x_1, x_2, ..., x_n and the watermark u = u_1, u_2, ..., u_n to the stegotext y; the attack channel W_n(z|y) produces the forgery z, on which the detector decides between H_0 and H_1.]

* This research was supported by the Israel Science Foundation (grant no. 223/05).

Surprisingly, this problem did not receive much attention in the information theory community. In [5], the problem of universal detection of messages via a finite-state channel was considered, and an optimal decision rule was proposed for deciding whether the observed sequence is the product of an unknown finite-state channel fed by one of two predefined sequences.
Liu and Moulin [6],[7] explored the error exponents of two popular one-bit WM systems: the spread-spectrum scheme and the quantized-index-modulation (QIM) watermarking scheme, under a general additive attack. Bounds and closed-form expressions were offered for the error exponents. We note that the setting of [6] is different from ours: here, we are trying to find the best embedder given the detection resources, under the Neyman-Pearson criterion of optimality, while in [6], the performance (the error exponent) of a given embedding scheme and a given source distribution is evaluated under additive attacks. In [8], the problem of embedding/detection was formulated under limited detection resources, and the optimal decision region and the optimal embedding rule were offered for the attack-free scenario.

Many researchers from the signal/image processing community (e.g., [2],[3],[9]-[13], [14, Sec. 4.2] and references therein) have devoted research efforts to the problem of optimal watermark embedding and detection with one common assumption: the watermark embedding rule is normally taken to be additive (linear), i.e., the stegotext vector y is given by

y = x + γu,   (1)

or multiplicative, where each component of y is given by

y_i = x_i(1 + γu_i),   i = 1, ..., n,   (2)

where in both cases, u_i = ±1, and the choice of γ controls the tradeoff between the quality of the stego-signal (in terms of the distortion relative to the covertext signal x) and the detectability of the watermark, i.e., the "signal-to-noise" ratio. Once the linear embedder (1) is adopted, elementary detection theory tells us that the optimal likelihood-ratio detector under the attack-free scenario (i.e., z = y), assuming a zero-mean, i.i.d. Gaussian covertext distribution, is a correlation detector, which decides positively (H_1: y = x + γu) if the correlation, Σ_{i=1}^n u_i y_i, exceeds a certain threshold, and negatively (H_0: y = x) otherwise. The reason is that in this case, x simply plays the role of additive noise (the additive embedding scheme is, in fact, the spread-spectrum modulation technique [15], in which the covertext is treated as additive noise). In a similar manner, the optimal test for the multiplicative embedder (2) is based on the different variances of the y_i's corresponding to u_i = +1 relative to those corresponding to u_i = -1, the former being σ_x²(1 + γ)² and the latter being σ_x²(1 - γ)², where σ_x² is the variance of each component of x. While in classical detection theory, the additivity (1) (or, somewhat less commonly, the multiplicativity (2)) of the noise is part of the channel model, and hence cannot be controlled, this is not quite the case in watermark embedding, where one has, at least in principle, the freedom to design an arbitrary embedding function y = f_n(x, u), trading off the quality of y and the detectability of u. Clearly, for an arbitrary choice of f_n, the above-described detectors are no longer optimal in general. Malvar and Florêncio [16] have noticed that better performance can be gained if γ is chosen as a function of the watermark and the covertext. However, their choice does not lead to the optimal performance, as will be shown later. Recently, Furon [17] explored the zero-bit watermark problem using a different setting, in which the watermark sequence is a function of the covertext, and under a different criterion of optimality.
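As a toy illustration of the correlation detector for the additive embedder (1) in the attack-free case, the following sketch (not from the paper; block length, embedding strength, and threshold are all illustrative) generates a Gaussian covertext, embeds a ±1 watermark additively, and thresholds the correlation:

```python
import random

random.seed(0)
n = 1000       # block length (illustrative)
gamma = 0.5    # embedding strength (illustrative)

u = [random.choice((-1.0, 1.0)) for _ in range(n)]   # watermark, u_i = ±1
x = [random.gauss(0.0, 1.0) for _ in range(n)]       # zero-mean i.i.d. Gaussian covertext

y_marked = [xi + gamma * ui for xi, ui in zip(x, u)]  # H1: additive embedding, eq. (1)
y_clean = x                                           # H0: y = x

def correlation_detector(y, u, threshold):
    """Decide H1 iff the correlation sum_i u_i y_i exceeds the threshold."""
    return sum(ui * yi for ui, yi in zip(u, y)) > threshold

# Under H1 the correlation concentrates around n*gamma; under H0 around 0,
# so a threshold halfway between the two separates them for large n.
thr = n * gamma / 2
```

With these parameters the two hypotheses are separated by many standard deviations, so the test succeeds with overwhelming probability.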
While many papers in the literature addressed the problem of computing the performance of different embedding and detection strategies and plotting their receiver operating characteristics (ROC) for different values of the problem dimension n (see, e.g., [11],[12],[18] and references therein), very few works [6],[7] deal with the optimal asymptotic behavior of the two kinds of error probabilities, i.e., the exponential decay rates of the two kinds of error probabilities as n tends to infinity. The problem of finding the optimum watermark embedder f_n for reliable WM detection is not trivial: the probabilities of errors of the two kinds (false positive and false negative) corresponding to the likelihood-ratio detector induced by a given f_n are, in general, hard to compute, and a-fortiori hard to optimize in closed form. Moreover, obtaining closed-form expressions for the optimal embedder and decision regions when the covertext distribution is unknown is even harder (see Section 2 for more details). Thus, instead of striving to seek the strictly optimum embedder, we take the following approach: suppose that one would like to limit the complexity of the detector by confining its decision to depend on a given set of statistics computed from z and u. For example, the energy of z, Σ_{i=1}^n z_i², and the correlation, Σ_{i=1}^n u_i z_i, which are the sufficient statistics used by the above-described correlation detector. Other possible statistics are those corresponding to the likelihood-ratio detector of (2), namely, the energies Σ_{i: u_i=+1} z_i² and Σ_{i: u_i=-1} z_i², and so on. Within the class of detectors based on a given set of statistics, we present the optimal (in the Neyman-Pearson sense) embedder and its corresponding detector for different settings of the problem.
First, we formulate the embedding and detection problem under the attack-free scenario. We devise an asymptotically optimal detector and embedding rule among all detectors which base their decisions on the empirical joint distribution of z and u. This modeling assumption, where the detector has access to a limited set of empirical statistics of u and z, has two motivations. First, it enables a fair comparison (in terms of detection computational resources) to different embedding/detection methods reported in the WM literature, in which most of the detectors use a similar set of statistics (mostly, correlation and energy) to base their decisions. Second, this approach highlights the tradeoff between detection complexity and performance: extending the set of statistics on which the detector can base its decisions might improve the system performance; however, it increases the detector's complexity. Later, we discuss different aspects of the basic problem, namely, practical issues regarding the implementability of the embedder, universality w.r.t. the covertext distribution, other detector statistics, and the case where the watermark sequence is random too. These results are obtained by extending the techniques presented in [5],[19]-[21], which are closely related to universal hypothesis testing problems. We apply these results to a zero-mean i.i.d. Gaussian covertext distribution with unknown variance. We propose a closed-form expression for the optimal embedder, and suggest a lower bound on the false-negative probability error exponent. By analyzing the error exponent of the additive embedder and using the suggested lower bound, we show that the optimal embedder is superior to the customary additive embedder in the exponential sense. Finally, we extend these results to memoryless attack channels and worst-case general attack channels.
The worst-attack channel is identified, and optimal embedding and detection rules are offered. The model of general worst-case attack channels, treated here, was already considered in the WM literature, but in a different context. In [22], general attack channels were considered, where the capacity and the random-coding error exponent were derived for the private watermarking game under general attack channels. In [23], the capacity of the public watermarking game under general attack channels was derived for constant-composition codes. This paper is a further development and an extension of [8],[24], and it gives a detailed account of the results of [25].

2 Basic Derivation

We begin with some notation and definitions. Throughout this work, capital letters represent scalar random variables (RVs), and specific realizations of them are denoted by the corresponding lowercase letters. Random vectors of dimension n will be denoted by bold-face letters. The notation 1{A}, where A is an event, designates the indicator function of A (i.e., 1{A} = 1 if A occurs and 1{A} = 0 otherwise). We adopt the following conventions: the minimum (maximum) of a function over an empty set is understood to be ∞ (-∞). The notation a_n ≐ b_n, for two positive sequences {a_n}_{n≥1} and {b_n}_{n≥1}, expresses asymptotic equality on the logarithmic scale, i.e.,

lim_{n→∞} (1/n) ln(a_n/b_n) = 0.

Let the vector P̂_x = {P̂_x(a), a ∈ X} denote the empirical distribution induced by a vector x ∈ X^n, where P̂_x(a) = (1/n) Σ_{i=1}^n 1{x_i = a}. The type class T(x) is the set of vectors x̃ ∈ X^n such that P̂_x̃ = P̂_x.
Similarly, the joint empirical distribution induced by (x, y) ∈ X^n × Y^n is the vector

P̂_xy = {P̂_xy(a, b), a ∈ X, b ∈ Y},   (3)

where

P̂_xy(a, b) = (1/n) Σ_{i=1}^n 1{x_i = a, y_i = b},   a ∈ X, b ∈ Y,   (4)

i.e., P̂_xy(a, b) is the relative frequency of the pair (a, b) along the pair sequence (x, y). Likewise, the type class T(x, y) is the set of all pairs (x̃, ỹ) ∈ X^n × Y^n such that P̂_x̃ỹ = P̂_xy. The conditional type class T(y|x), for given vectors x ∈ X^n and y ∈ Y^n, is the set of all vectors ỹ ∈ Y^n such that T(x, ỹ) = T(x, y). We denote by Ê_xy(·) expectation with respect to the empirical joint distribution P̂_xy. The Kullback-Leibler divergence between two distributions P and Q on A, where |A| < ∞, is defined as

D(P‖Q) = Σ_{a∈A} P(a) ln [P(a)/Q(a)],

with the conventions that 0 ln 0 = 0 and p ln(p/0) = ∞ if p > 0. We denote the empirical entropy of a vector x ∈ X^n by Ĥ_x(X), where

Ĥ_x(X) = -Σ_{a∈X} P̂_x(a) ln P̂_x(a).

Other information-theoretic quantities governed by empirical distributions (e.g., conditional empirical entropy, empirical mutual information) will be denoted similarly. For two vectors a, b ∈ R^n, the Euclidean inner product is defined as ⟨a, b⟩ = Σ_{i=1}^n a_i b_i, and the L2-norm of a vector is defined as ‖a‖ = √⟨a, a⟩. Let Vol{A} denote the volume of a set A ⊂ R^n, i.e., Vol{A} = ∫_A dx. We denote by sgn(·) the signum function, where sgn(x) = 1{x ≥ 0} - 1{x < 0}. Throughout this paper, and without essential loss of generality, we assume that the components of x, y, and z all take on values in the same finite alphabet A. In Section 4, the assumption that A is finite will be dropped, and A will be allowed to be an infinite set, like the real line.
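These empirical quantities are straightforward to compute from frequency counts; a minimal Python sketch (not part of the paper; the example vector is illustrative) implements the empirical distribution, the empirical entropy, and D(P‖Q) in nats:

```python
import math
from collections import Counter

def empirical_distribution(x):
    """P-hat_x(a): relative frequency of each symbol a in x."""
    n = len(x)
    return {a: c / n for a, c in Counter(x).items()}

def empirical_entropy(x):
    """H-hat_x(X) = -sum_a P-hat_x(a) ln P-hat_x(a), in nats."""
    return -sum(p * math.log(p) for p in empirical_distribution(x).values())

def kl_divergence(p, q):
    """D(P||Q) = sum_a P(a) ln(P(a)/Q(a)); infinite if P puts mass where Q does not."""
    return sum(pa * math.log(pa / q[a]) if q.get(a, 0) > 0 else math.inf
               for a, pa in p.items() if pa > 0)

x = [0, 1, 0, 1, 1, 0, 0, 1]          # illustrative binary sequence
p_hat = empirical_distribution(x)     # balanced: {0: 0.5, 1: 0.5}
```

For the balanced sequence above, the empirical entropy equals ln 2 and the divergence from the uniform binary distribution is zero.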
The components of the watermark u will always take on values in B = {-1, +1}, as mentioned earlier. Let us further assume that x is drawn from a given memoryless source P_X. Throughout the sequel, until Section 5 (exclusively), we assume that there is no attack, i.e., the channel W_n(z|y) is the identity channel:

W_n(z|y) = 1 if z = y, and 0 otherwise.

This is referred to as the attack-free scenario. In this scenario, the detector will use y and u to base its decisions. For a given u ∈ B^n, we would like to devise a decision rule that partitions the space A^n of sequences {y}, observed by the detector, into two complementary regions, Λ and Λ^c, such that for y ∈ Λ, we decide in favor of H_1 (watermark u is present), and for y ∈ Λ^c, we decide in favor of H_0 (watermark absent: y = x). Consider the Neyman-Pearson criterion of minimizing the false-negative probability

P_fn = Σ_{x: f_n(x,u) ∈ Λ^c} P_X(x)   (5)

subject to the following constraints: (1) given a certain distortion measure d_e(·,·) and distortion level D_e, the distortion between x and y, d_e(x, y) = d_e(x, f_n(x, u)), does not exceed nD_e; (2) the false-positive probability is upper bounded by

P_fp ≜ Σ_{y∈Λ} P_X(y) ≤ e^{-λn},   (6)

where λ > 0 is a prescribed constant. In other words, we would like to choose f_n and Λ so as to minimize P_fn subject to a distortion constraint and the constraint that the exponential decay rate of P_fp would be at least as large as λ. Clearly, the problem is a classical hypothesis testing problem (under the Neyman-Pearson criterion of optimality), with the following hypotheses: H_0: y = x (the covertext is not "marked") and H_1: y = f_n(x, u) (the covertext is "marked"). Given f_n and u, we can define the conditional distribution of y given the two hypotheses:

P(y|H_0) = P_X(y),
P(y|H_1) = Σ_{x: f_n(x,u)=y} P_X(x),
where P_X(x) is the covertext distribution. The optimal test which minimizes the false-negative probability under the Neyman-Pearson criterion of optimality is the likelihood ratio test (LRT) [26, p. 34]:

L(y) = P(y|H_1)/P(y|H_0)  ≷_{H_0}^{H_1}  η,

where η is chosen such that

P_fp(f_n, u) = Σ_{y: L(y) ≥ η} P_X(y) = e^{-nλ}.   (7)

Note that η is a function of λ, f_n, and u; therefore, we cannot find a closed-form expression for η for a general embedding rule and watermark sequence. The false-negative probability associated with the above optimal test is given by

P_fn(f_n, λ, u) = Σ_{y: L(y) < η} Σ_{x: f_n(x,u)=y} P_X(x).   (8)

Now, given a distortion level D_e measured using a distortion function d_e(·,·), we would like to devise an embedder f_n which minimizes the false-negative probability while the distortion between the covertext x and the stegotext y does not exceed nD_e and the false-positive probability is kept lower than e^{-nλ}, i.e.,

f*_n = argmin_{f_n: d_e(x, f_n(x,u)) ≤ nD_e ∀x,  P_fp(f_n,u) ≤ e^{-nλ}} P_fn(f_n, λ).   (9)

The above general problem of finding the optimal embedding rule and detection regions is by no means trivial. The fact that the probabilities of the two kinds of error cannot be expressed in closed form makes it very hard to solve this optimization problem and, as far as we know, there is no known solution for it. Moreover, obtaining closed-form expressions for the optimal embedder and decision regions when P_X is unknown is even harder. We therefore make an additional assumption regarding the statistics employed by the detector. Suppose that we limit ourselves to the class of all detectors which base their decisions on certain empirical statistics associated with u and y, for example, the empirical joint distribution of y and u, i.e., P̂_uy.
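Although L(y) admits no closed form for a general embedder, the test itself is mechanical once the two likelihoods can be evaluated; the following generic sketch uses toy memoryless distributions of our own choosing (the distributions, the observed sequence, and the threshold are illustrative, not derived from any embedder in the paper):

```python
import math

def log_likelihood(y, p):
    """ln P(y) for a memoryless distribution p over a finite alphabet."""
    return sum(math.log(p[sym]) for sym in y)

def lrt_accepts_h1(y, p1, p0, ln_eta):
    """Neyman-Pearson LRT: decide H1 iff ln L(y) = ln P(y|H1) - ln P(y|H0) >= ln(eta)."""
    return log_likelihood(y, p1) - log_likelihood(y, p0) >= ln_eta

p0 = {0: 0.5, 1: 0.5}          # H0: unmarked memoryless source (illustrative)
p1 = {0: 0.2, 1: 0.8}          # H1: distribution induced by some embedder (illustrative)
y = [1, 1, 0, 1, 1, 1, 0, 1]   # observed sequence, rich in 1's
```

A sequence dominated by 1's drives the log-likelihood ratio positive, favoring H_1; an all-zero sequence drives it negative.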
Note that the requirement that the decision of the detector depend solely on P̂_uy means that Λ and Λ^c are unions of conditional type classes of y given u. It may seem, at first glance, that the sequence u is superfluous in the definition of the problem, since it is available to all legitimate parties. However, the presence of the watermark sequence u at the detector provides the detector with a refined version of the statistics of its input (based on the joint empirical statistics of y and u) and can be regarded as a secret key shared by both legitimate sides. This additional information at the detector improves the overall performance of the system.

For a given λ > 0, define

Λ* = { y : ln P_X(y) + n Ĥ_uy(Y|U) + λn - |A| ln(n+1) ≤ 0 }.   (10)

The following theorem asserts that Λ* is an asymptotically optimal decision region:

Theorem 1. (i) P_fp(Λ*) ≤ e^{-n(λ-δ_n)}, where lim_{n→∞} δ_n = 0. (ii) For every Λ ⊆ A^n that satisfies P_fp(Λ) ≤ e^{-nλ′} for some λ′ > λ, we have Λ*^c ⊆ Λ^c for all sufficiently large n.

The theorem asserts that Λ* fulfills the false-positive constraint while minimizing the false-negative probability, i.e., for any decision region Λ which fulfills the false-positive constraint and for any embedding rule f_n(x, u), the following holds:

P_fn(Λ*^c) ≤ P_fn(Λ^c).   (11)

Proof. Let T(y|u) ⊆ Λ. Then, we have

e^{-λn} ≥ Σ_{y′∈Λ} P_X(y′) ≥ Σ_{y′∈T(y|u)} P_X(y′) ≥ |T(y|u)| · P_X(y) ≥ (n+1)^{-|A|} e^{n Ĥ_uy(Y|U)} · P_X(y),   (12)

where the first inequality is by the assumed false-positive constraint, the second inequality holds since T(y|u) ⊆ Λ, and the third inequality is due to the fact that all sequences within T(y|u) are equiprobable under P_X, as they all have the same empirical distribution, which forms the sufficient statistics for the memoryless source P_X.
In the fourth inequality, we use the well-known lower bound on the cardinality of a conditional type class in terms of the empirical conditional entropy [27], defined as

Ĥ_uy(Y|U) = -Σ_{u,y} P̂_uy(u, y) ln P̂_uy(y|u),   (13)

where P̂_uy(y|u) is the empirical conditional probability of Y given U. We have actually shown that every T(y|u) in Λ is also in Λ*; in other words, if Λ satisfies the false-positive constraint (6), it must be a subset of Λ*. This means that Λ*^c ⊆ Λ^c, and so the probability of Λ*^c is smaller than the probability of Λ^c, i.e., Λ*^c minimizes P_fn among all Λ^c corresponding to detectors that satisfy (6). To establish the asymptotic optimality of Λ*, it remains to show that Λ* itself has a false-positive exponent of at least λ, which is easy to show using the techniques of [5, eq. (6)] and references therein. Therefore, we will not include the proof of this fact here. Finally, note also that Λ* bases its decision solely on P̂_uy, as required.

While this solves the problem of the optimal detector for a given f_n, we still have to specify the optimal embedder f*_n. Define Γ*^c(f_n) to be the inverse image of Λ*^c given u, i.e.,

Γ*^c(f_n) = { x : f_n(x, u) ∈ Λ*^c }
          = { x : ln P_X(f_n(x, u)) + n Ĥ_{u,f_n(x,u)}(Y|U) + λn - |A| ln(n+1) > 0 }.   (14)

Then, following eq. (5), P_fn can be expressed as

P_fn = Σ_{x ∈ Γ*^c(f_n)} P_X(x).   (15)

Consider now the following embedder:

f*_n(x, u) = argmin_{y: d_e(x,y) ≤ nD_e} [ ln P_X(y) + n Ĥ_uy(Y|U) ],   (16)

where ties are resolved in an arbitrary fashion. Then it is clear, by definition, that Γ*^c(f*_n) ⊆ Γ*^c(f_n) for any other competing f_n that satisfies the distortion constraint, and thus f*_n minimizes P_fn subject to the constraints.
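For a known memoryless P_X, membership in Λ* of eq. (10) is a one-line computation on the empirical statistics of (u, y); a minimal sketch (the alphabet, source, sequences, and λ are chosen purely for illustration):

```python
import math
from collections import Counter

def ln_px_memoryless(y, p_x):
    """ln P_X(y) for a memoryless covertext source p_x."""
    return sum(math.log(p_x[sym]) for sym in y)

def cond_entropy(u, y):
    """Empirical conditional entropy H-hat_uy(Y|U) in nats, eq. (13)."""
    n = len(u)
    joint, marg = Counter(zip(u, y)), Counter(u)
    return -sum((c / n) * math.log(c / marg[a]) for (a, b), c in joint.items())

def in_lambda_star(y, u, p_x, lam, alphabet_size):
    """Decide H1 iff y is in Lambda* of eq. (10):
       ln P_X(y) + n*H-hat_uy(Y|U) + lam*n - |A|*ln(n+1) <= 0."""
    n = len(y)
    stat = (ln_px_memoryless(y, p_x) + n * cond_entropy(u, y)
            + lam * n - alphabet_size * math.log(n + 1))
    return stat <= 0

p_x = {0: 0.5, 1: 0.5}                      # illustrative uniform binary covertext
u_demo = [+1, +1, -1, -1] * 25              # n = 100
y_marked = [1 if a > 0 else 0 for a in u_demo]   # deterministic in u: H-hat(Y|U) = 0
y_unmarked = [0, 1, 0, 1] * 25                   # empirically independent of u
```

With λ = 0.1, the strongly dependent sequence falls inside Λ* (decide H_1), while the independent one falls outside (decide H_0).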
3 Discussion

In this section, we pause to discuss a few important aspects of our basic results, as well as possible modifications that might be of theoretical and practical interest.

3.1 Implementability of the Embedder (16)

The first impression might be that the minimization in (16) is prohibitively complex, as it appears to require an exhaustive search over the sphere {y : d_e(x, y) ≤ nD_e}, whose complexity is exponential in n. A closer look, however, reveals that the situation is not that bad. Note that for a memoryless source P_X,

ln P_X(y) = -n[ Ĥ_y(Y) + D(P̂_y‖P_X) ],   (17)

where Ĥ_y(Y) is the empirical entropy of y and D(P̂_y‖P_X) is the divergence between the empirical distribution of y, P̂_y, and the source P_X. Moreover, if d_e(·,·) is an additive distortion measure, i.e., d_e(x, y) = Σ_{i=1}^n d_e(x_i, y_i), then d_e(x, y)/n can be represented as the expected distortion with respect to the empirical distribution of x and y, P̂_xy. Thus, the minimization in (16) becomes equivalent to maximizing [Î_uy(U;Y) + D(P̂_y‖P_X)] subject to Ê_xy d_e(X, Y) ≤ D_e, where Î_uy(U;Y) denotes the empirical mutual information induced by the joint empirical distribution P̂_uy, and Ê_xy denotes the aforementioned expectation with respect to P̂_xy. Now, observe that for given x and u, both [Î_uy(U;Y) + D(P̂_y‖P_X)] and Ê_xy d_e(X, Y) depend on y only via its conditional type class given (x, u), namely, the conditional empirical distribution P̂_uxy(y|x, u). Once the optimal P̂_uxy(y|x, u) has been found, it does not matter which vector y is chosen from the corresponding conditional type class T(y|x, u).
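For very small n, the minimization in (16) can even be carried out by direct enumeration over the distortion ball; the following toy sketch (our own illustration, using Hamming distortion and an invented binary source; not an efficient implementation) makes the objective concrete:

```python
import math
import itertools
from collections import Counter

def cond_entropy(u, y):
    """Empirical conditional entropy H-hat_uy(Y|U) in nats."""
    n = len(u)
    joint, marg = Counter(zip(u, y)), Counter(u)
    return -sum((c / n) * math.log(c / marg[a]) for (a, b), c in joint.items())

def optimal_embedder(x, u, p_x, max_dist):
    """Brute-force version of eq. (16) for tiny n: over all y within the
       Hamming-distortion ball, minimize ln P_X(y) + n * H-hat_uy(Y|U)."""
    n = len(x)
    best, best_val = None, float("inf")
    for y in itertools.product(sorted(p_x), repeat=n):
        if sum(a != b for a, b in zip(x, y)) > max_dist:   # distortion constraint
            continue
        val = sum(math.log(p_x[s]) for s in y) + n * cond_entropy(u, y)
        if val < best_val:
            best, best_val = list(y), val
    return best

x_toy = [0, 0, 0, 0, 0, 0]
u_toy = [+1, +1, +1, -1, -1, -1]
# With a source biased toward 0, the embedder spends its distortion budget on
# improbable 1's, placed so that y becomes a deterministic function of u.
y_star = optimal_embedder(x_toy, u_toy, {0: 0.9, 1: 0.1}, max_dist=3)
```

The chosen y uses the full budget of three flips and makes Ĥ_uy(Y|U) = 0, in line with the type-class argument above: only the conditional type of y given (x, u) matters.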
Therefore, the optimization across n-vectors in (16) boils down to an optimization over empirical conditional distributions, and since the total number of empirical conditional distributions of n-vectors increases only polynomially with n, the search complexity reduces from exponential to polynomial as well. In practice, one may not perform such an exhaustive search over the discrete set of empirical distributions, but rather apply an optimization procedure in the continuous space of conditional distributions {P(y|x, u)} (and then approximate the solution by the closest feasible empirical distribution). At any rate, this optimization procedure is carried out in a space of fixed dimension that does not grow with n.

3.2 Universality in the Covertext Distribution

Thus far we have assumed that the distribution P_X is known. In practice, even if it is fine to assume a certain model class, like the model of a memoryless source, the assumption that the exact parameters of P_X are known is rather questionable. Suppose then that P_X is known to be memoryless but is otherwise unknown. How should we modify our results? First observe that it would then make sense to insist on the constraint (6) for every memoryless source, to be on the safe side. In other words, eq. (6) would be replaced by

max_{P_X} Σ_{y∈Λ} P_X(y) ≤ e^{-λn},   (18)

where the maximization over P_X is across all memoryless sources with alphabet A. It is then easy to see that our earlier derivation goes through as before, except that P_X(y) should be replaced by max_{P_X} P_X(y) in all places (see also [5]). Since ln max_{P_X} P_X(y) = -n Ĥ_y(Y), this means that the modified version of Λ* compares the empirical mutual information n Î_uy(U;Y) to the threshold λn - |A| ln(n+1) (the divergence term now disappears).
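The universal test just described reduces to thresholding the empirical mutual information; a minimal sketch of this rule (sequences and λ are illustrative, not from the paper):

```python
import math
from collections import Counter

def entropy(seq):
    """Empirical entropy in nats."""
    n = len(seq)
    return -sum((c / n) * math.log(c / n) for c in Counter(seq).values())

def cond_entropy(u, y):
    """Empirical conditional entropy H-hat_uy(Y|U) in nats."""
    n = len(u)
    joint, marg = Counter(zip(u, y)), Counter(u)
    return -sum((c / n) * math.log(c / marg[a]) for (a, b), c in joint.items())

def mmi_detector(y, u, lam, alphabet_size):
    """Universal detector: decide H1 iff n * I-hat_uy(U;Y) >= lam*n - |A|*ln(n+1)."""
    n = len(y)
    mi = entropy(y) - cond_entropy(u, y)     # I-hat(U;Y) = H-hat(Y) - H-hat(Y|U)
    return n * mi >= lam * n - alphabet_size * math.log(n + 1)

u_demo = [+1, +1, -1, -1] * 25                    # n = 100
y_marked = [1 if a > 0 else 0 for a in u_demo]    # I-hat(U;Y) = ln 2
y_unmarked = [0, 1, 0, 1] * 25                    # I-hat(U;Y) = 0
```

No knowledge of P_X is used anywhere: the rule depends only on the empirical statistics of (u, y).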
By the same token, and in light of the discussion in the previous paragraph, the modified version of the optimal embedder (16) maximizes Î_uy(U;Y) subject to the distortion constraint. Both the embedding rule and the detection rule are then based on the idea of maximum mutual information, which is intuitively appealing. For more on this idea and its use as a universal decoding rule, see [27, Sec. 2.5].

3.3 Other Detector Statistics

In the previous section, we focused on the class of detectors that base their decision on the empirical joint distribution of pairs of letters {(u, y)}. What about classes of detectors that base their decisions on larger (and more refined) sets of statistics? It turns out that such extensions are possible as long as we are able to assess the cardinality of the corresponding conditional type class. For example, suppose that the stegotext is suspected to undergo a desynchronization attack that cyclically shifts the data by k positions, where k lies in some uncertainty region, say, {-K, -K+1, ..., -1, 0, 1, ..., K}. Then, it would make sense to allow the detector to depend on the joint distribution of 2K + 2 vectors: y, u, and all the 2K corresponding cyclic shifts of u. Our earlier analysis will carry over, provided that the above definition of Ĥ_uy(Y|U) is replaced by the conditional empirical entropy of y given u and all its cyclic shifts. This is different from the exhaustive search (ES) approach (see, e.g., [28]) to confronting such desynchronization attacks. Note, however, that this works as long as K is fixed and does not grow with n.

3.4 Random Watermarks

Thus far, our model assumption was that x emerges from a probabilistic source P_X, whereas the watermark u is fixed, and hence can be thought of as being deterministic.
Another possible setting assumes that u is random as well; in particular, it is drawn from another source P_U, independently of x, normally the binary symmetric source (BSS). This situation may arise, for example, when security is an issue and the watermark is encrypted. In such a case, the randomness of u is induced by the randomness of the key. Here, the decision regions Λ and Λ^c will be defined as subsets of A^n × B^n, and the error probabilities P_fn and P_fp will be defined, of course, as the corresponding summations of products P_X(x)P_U(u). The fact that u is emitted from a memoryless source with a known distribution makes this model weaker compared to the model treated above, in which u is an individual sequence. Although this model is somewhat weaker, it can be analyzed for more general classes of detectors. This is because the role of the conditional type class T(y|u) would be replaced by the joint type class T(u, y), namely, the set of all pairs of sequences {(u′, y′)} that have the same empirical distribution as (u, y) (as opposed to the conditional type class, which is defined as the set of all such y's for a given u). Thus, the corresponding version of Λ* would be

Λ* = { (u, y) : ln P_X(y) + ln P_U(u) + n Ĥ_uy(U, Y) + λn - |A| ln(n+1) ≤ 0 },   (19)

where Ĥ_uy(U, Y) is the empirical joint entropy induced by (u, y), and the derivation of the optimal embedder is modified accordingly.¹ The advantage of this model, albeit somewhat weaker, is that it is easier to assess |T(u, y)| in more general situations than it is for |T(y|u)|.
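Membership in the random-watermark region Λ* of eq. (19) is again a direct computation on empirical statistics, now with the joint entropy and the watermark prior P_U; a minimal sketch (uniform binary sources, sequences, and λ are illustrative):

```python
import math
from collections import Counter

def joint_entropy(u, y):
    """Empirical joint entropy H-hat_uy(U,Y) in nats."""
    n = len(u)
    return -sum((c / n) * math.log(c / n) for c in Counter(zip(u, y)).values())

def in_lambda_star_random_wm(u, y, p_x, p_u, lam, alphabet_size):
    """Membership test for Lambda* of eq. (19): decide H1 iff
       ln P_X(y) + ln P_U(u) + n*H-hat_uy(U,Y) + lam*n - |A|*ln(n+1) <= 0."""
    n = len(y)
    stat = (sum(math.log(p_x[b]) for b in y)
            + sum(math.log(p_u[a]) for a in u)
            + n * joint_entropy(u, y)
            + lam * n - alphabet_size * math.log(n + 1))
    return stat <= 0

p_x = {0: 0.5, 1: 0.5}                # illustrative uniform binary covertext
p_u = {-1: 0.5, +1: 0.5}              # BSS watermark prior
u_demo = [+1, +1, -1, -1] * 25        # n = 100
y_marked = [1 if a > 0 else 0 for a in u_demo]   # H-hat(U,Y) = ln 2
y_unmarked = [0, 1, 0, 1] * 25                   # H-hat(U,Y) = 2 ln 2
```

A pair (u, y) with strong empirical dependence has small joint entropy and is accepted as watermarked; an empirically independent pair is rejected.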
For example, if x is a first-order Markov source, rather than i.i.d., and one is then naturally interested in the statistics formed by the frequency counts of triples {u_i = u, y_i = y, y_{i-1} = y′}, then there is no known expression for the cardinality of the corresponding conditional type class, but it is still possible to assess the size of the joint type class in terms of the empirical first-order Markov entropy of the pairs {(u_i, y_i)}. Another example of the differences between random and deterministic watermarks can be seen in Section 6. It should also be pointed out that once u is assumed random (say, drawn from a BSS), it is possible to devise a decision rule that is asymptotically optimum for an individual covertext sequence, i.e., to drop the assumption that x emerges from a probabilistic source of a known model. The resulting decision rule, obtained using a similar technique, accepts H_1 whenever Ĥ_uy(U|Y) ≤ ln 2 - λ, and the embedder minimizes Ĥ_uy(U|Y) subject to the distortion constraint accordingly.

¹ Note that in the universal case (where both P_X and P_U are unknown), this leads again to the same empirical mutual information detector as before.

4 Continuous Alphabet – the Gaussian Case

In the previous sections, we considered, for convenience, the simple case where the components of both x and y take on values in a finite alphabet. It is more common and more natural, however, to model x and y as vectors in R^n. Beyond the fact that summations should be replaced by integrals in the analysis of the previous section, this requires, in general, an extension of the method of types [27], used above, to vectors with real-valued components (see, e.g., [29],[30],[31]).
In a nutshell, a conditional type class, in such a case, is the set of all $\mathbf{y}$-vectors in $\mathbb{R}^n$ whose joint sufficient statistics with $\mathbf{u}$ have (within infinitesimally small tolerance) prescribed values, and to have an analysis parallel to that of the previous section, we have to be able to assess the exponential order of the volume of the conditional type class. Suppose that $\mathbf{x}$ is a zero-mean Gaussian vector whose covariance matrix is $\sigma^2 I$, $I$ being the $n \times n$ identity matrix, where $\sigma^2$ is unknown (cf. Subsection 3.2). Let us suppose also that the statistics to be employed by the detector are the energy $\sum_{i=1}^n y_i^2$ and the correlation $\sum_{i=1}^n u_i y_i$. These assumptions are the same as in many theoretical papers in the watermark detection literature. Then, the conditional empirical entropy $\hat{H}_{\mathbf{u}\mathbf{y}}(Y|U)$ should be replaced by the empirical differential entropy $\hat{h}_{\mathbf{u}\mathbf{y}}(Y|U)$, given by [30]:

$$\hat{h}_{\mathbf{u}\mathbf{y}}(Y|U) = \frac{1}{2}\ln\left[2\pi e \cdot \min_\beta \frac{1}{n}\sum_{i=1}^n (y_i - \beta u_i)^2\right] = \frac{1}{2}\ln\left[2\pi e\left(\frac{1}{n}\sum_{i=1}^n y_i^2 - \frac{\big(\frac{1}{n}\sum_{i=1}^n u_i y_i\big)^2}{\frac{1}{n}\sum_{i=1}^n u_i^2}\right)\right] = \frac{1}{2}\ln\left[2\pi e\left(\frac{1}{n}\sum_{i=1}^n y_i^2 - \Big(\frac{1}{n}\sum_{i=1}^n u_i y_i\Big)^2\right)\right], \qquad (20)$$

where the last equality holds since $\frac{1}{n}\sum_{i=1}^n u_i^2 = 1$ for the antipodal watermark considered here. The justification of eq. (20) is as follows. For a given $\epsilon > 0$, define the set

$$T_\epsilon(\mathbf{y}|\mathbf{u}) = \left\{ \tilde{\mathbf{y}} \in \mathbb{R}^n : \left|\sum_{i=1}^n y_i^2 - \sum_{i=1}^n \tilde{y}_i^2\right| \le n\epsilon,\; \left|\sum_{i=1}^n y_i u_i - \sum_{i=1}^n \tilde{y}_i u_i\right| \le n\epsilon \right\}. \qquad (21)$$

Similarly as in Lemma 3 of [30], it can be shown that

$$\lim_{\epsilon \to 0}\lim_{n \to \infty} \frac{1}{n}\ln\big[\mathrm{Vol}\{T_\epsilon(\mathbf{y}|\mathbf{u})\}\big] = \hat{h}_{\mathbf{u}\mathbf{y}}(Y|U). \qquad (22)$$

To see this, define an auxiliary channel $\mathbf{y} = \beta\mathbf{u} + \mathbf{z}$, where $\mathbf{z} \sim \mathcal{N}(0, \sigma_z^2 I)$ (this channel is used only to evaluate $\mathrm{Vol}\{T_\epsilon(\mathbf{y}|\mathbf{u})\}$ and is not related to the actual distribution of $\mathbf{y}$ given $\mathbf{u}$; see [30, p. 1262]).
By tuning the parameters $\beta$ and $\sigma_z^2$ so that the expectations of $\frac{1}{n}\sum_{i=1}^n y_i^2$ and $\frac{1}{n}\sum_{i=1}^n y_i u_i$ equal $\frac{1}{n}\sum_{i=1}^n \tilde{y}_i^2$ and $\frac{1}{n}\sum_{i=1}^n \tilde{y}_i u_i$, respectively, the set $T_\epsilon(\mathbf{y}|\mathbf{u})$ has high probability under the auxiliary channel given $\mathbf{u}$. Moreover, any two vectors in $T_\epsilon(\mathbf{y}|\mathbf{u})$ have conditional pdf's which are exponentially equivalent. Accordingly, using the same technique as in the proof of Lemma 3 in [30, p. 1268] (which is based on these observations), we derive upper and lower bounds on $\mathrm{Vol}\{T_\epsilon(\mathbf{y}|\mathbf{u})\}$. These bounds are identical on the logarithmic scale, and so

$$\mathrm{Vol}\{T_\epsilon(\mathbf{y}|\mathbf{u})\} \doteq e^{n[\hat{h}_{\mathbf{u}\mathbf{y}}(Y|U) + \Delta(\epsilon)]}, \qquad (23)$$

where $\lim_{\epsilon\to 0}\Delta(\epsilon) = 0$. Note that the order in which the limits are taken in (22) is important: we first take the dimension $n$ to infinity, and only then do we take $\epsilon$ to zero. Mathematically speaking, if $\epsilon$ goes to zero for a finite dimension $n$, the volume of $T_\epsilon(\mathbf{y}|\mathbf{u})$ equals zero. The order of the limits has a practical meaning too. The fact that $\epsilon$ is positive for any given dimension means that the detector can calculate the correlation and the energy only with limited precision. In the absence of such a realistic limitation, one can offer an embedding rule (under the attack-free case and for a continuous alphabet) with zero false-negative and false-positive probabilities, by designing an embedder whose range has measure zero.² This additional limitation, which we implicitly impose on the detector, is very natural and exists in every practical system. Using the same technique used to evaluate $\hat{h}_{\mathbf{u}\mathbf{y}}(Y|U)$ in (20), it can easily be shown that

$$\lim_{\epsilon\to 0}\lim_{n\to\infty} \frac{1}{n}\ln\big[\mathrm{Vol}\{T_\epsilon(\mathbf{y})\}\big] = \frac{1}{2}\ln\left(2\pi e \cdot \frac{1}{n}\sum_{i=1}^n y_i^2\right) \triangleq \hat{h}_{\mathbf{y}}(Y), \qquad (24)$$

where

$$T_\epsilon(\mathbf{y}) = \left\{ \tilde{\mathbf{y}} \in \mathbb{R}^n : \left|\sum_{i=1}^n y_i^2 - \sum_{i=1}^n \tilde{y}_i^2\right| \le n\epsilon \right\}.$$
(25)

Therefore, the optimal embedder maximizes

$$\hat{I}_{\mathbf{u}\mathbf{y}}(U;Y) = -\frac{1}{2}\ln\left(1 - \frac{\big(\frac{1}{n}\sum_{i=1}^n u_i y_i\big)^2}{\frac{1}{n}\sum_{i=1}^n y_i^2}\right), \qquad (26)$$

or, equivalently,³ maximizes

$$R(\mathbf{u},\mathbf{y}) \triangleq \frac{\langle\mathbf{u},\mathbf{y}\rangle^2}{\|\mathbf{y}\|^2} \qquad (27)$$

subject to the distortion constraint, which in this case will naturally be taken to be Euclidean, $\sum_{i=1}^n (x_i - y_i)^2 \le nD_e$. While our discussion in Subsection 3.1, regarding optimization over conditional distributions, does not apply directly to the continuous case considered here, the problem can still be represented as optimization over a finite-dimensional space whose dimension is fixed, independently of $n$. In fact, this fixed dimension is 2, as implied by the next lemma.

Lemma 1. The optimal embedding rule under the above setting has the following form:

$$f^*_n(\mathbf{x},\mathbf{u}) = a\mathbf{x} + b\mathbf{u}. \qquad (28)$$

²E.g., the spread-transform dither modulation (STDM) embedder proposed in [32, Sec. V.B] achieves zero false-negative probability under the attack-free scenario because the embedder range has measure zero. We thank M. Barni for drawing our attention to this fact.

³Note also that the corresponding detector, which compares $\hat{I}_{\mathbf{u}\mathbf{y}}(U;Y)$ to a threshold, is equivalent to a correlation detector which compares the (absolute) correlation to a threshold that depends on the energy of $\mathbf{y}$, rather than to a fixed threshold (see, e.g., [28]).

Proof. Clearly, every $\mathbf{y} \in \mathbb{R}^n$ can be represented as $\mathbf{y} = a\mathbf{x} + b\mathbf{u} + \mathbf{z}$, where $a$ and $b$ are real-valued coefficients and $\mathbf{z}$ is orthogonal to both $\mathbf{x}$ and $\mathbf{u}$ (i.e., $\langle\mathbf{u},\mathbf{z}\rangle = \langle\mathbf{x},\mathbf{z}\rangle = 0$). Now, for any given $\mathbf{y} = a\mathbf{x} + b\mathbf{u} + \mathbf{z}$ such that $\mathbf{z} \ne 0$, the vector projected onto the subspace spanned by $\mathbf{x}$ and $\mathbf{u}$, $\tilde{\mathbf{y}} = a\mathbf{x} + b\mathbf{u}$, achieves a higher squared normalized correlation w.r.t. $\mathbf{u}$ than the vector $\mathbf{y}$.
To see this, consider the following chain of inequalities:

$$R(\mathbf{u},\mathbf{y}) = \frac{\langle\mathbf{u},\mathbf{y}\rangle^2}{\|\mathbf{y}\|^2} = \frac{\langle\mathbf{u}, a\mathbf{x}+b\mathbf{u}+\mathbf{z}\rangle^2}{\langle a\mathbf{x}+b\mathbf{u}+\mathbf{z},\, a\mathbf{x}+b\mathbf{u}+\mathbf{z}\rangle} = \frac{\langle\mathbf{u}, a\mathbf{x}+b\mathbf{u}\rangle^2}{\|a\mathbf{x}+b\mathbf{u}\|^2 + \|\mathbf{z}\|^2} \le R(\mathbf{u},\tilde{\mathbf{y}}). \qquad (29)$$

In addition, if $\mathbf{y}$ fulfills the distortion constraint, then so does the projected vector $\tilde{\mathbf{y}}$, i.e.,

$$\|\mathbf{y}-\mathbf{x}\|^2 = \|(a-1)\mathbf{x}+b\mathbf{u}+\mathbf{z}\|^2 = \|(a-1)\mathbf{x}+b\mathbf{u}\|^2 + \|\mathbf{z}\|^2 \ge \|(a-1)\mathbf{x}+b\mathbf{u}\|^2 = \|\tilde{\mathbf{y}}-\mathbf{x}\|^2. \qquad (30)$$

Therefore, the optimal embedder must have the form $\mathbf{y} = a\mathbf{x} + b\mathbf{u}$. In summary, given any $\mathbf{y}$ that satisfies the distortion constraint, by projecting $\mathbf{y}$ onto the subspace spanned by $\mathbf{x}$ and $\mathbf{u}$, we improve the correlation without violating the distortion constraint.

Upon manipulating this optimization problem, taking advantage of its special structure, one can further reduce its dimensionality and transform it into a search over one parameter only (the details are in Subsection 4.1). Going back to the opening discussion in the Introduction, at first glance, this seems very close to the linear embedder (1) that is so customarily used (with one additional degree of freedom, allowing also scaling of $\mathbf{x}$). A closer look, however, reveals that this is not quite the case, because the optimal values of $a$ and $b$ here depend on $\mathbf{x}$ and $\mathbf{u}$ (via the joint statistics $\sum_{i=1}^n x_i^2$ and $\sum_{i=1}^n u_i x_i$) rather than being fixed. Therefore, this is not a linear embedder.

4.1 Explicit Derivation of the Optimal Embedder

In this subsection, we present a closed-form expression for the optimal embedder. As was shown in the previous section, the following optimization problem should be solved:

$$\max \left[\frac{\big(\frac{1}{n}\sum_{i=1}^n y_i u_i\big)^2}{\frac{1}{n}\sum_{i=1}^n y_i^2}\right] \quad \text{subject to: } \sum_{i=1}^n (y_i - x_i)^2 \le nD_e. \qquad (31)$$

Substituting $\mathbf{y} = a\mathbf{x} + b\mathbf{u}$ into eq.
(31) gives:

$$\max_{a,b\in\mathbb{R}} \left[\frac{a^2\rho^2 + 2ab\rho + b^2}{a^2\alpha^2 + 2ab\rho + b^2}\right] \quad \text{subject to: } (a-1)^2\alpha^2 + 2(a-1)b\rho + b^2 \le D_e, \qquad (32)$$

where $\alpha^2 \triangleq \frac{1}{n}\sum_{i=1}^n x_i^2$ and $\rho \triangleq \frac{1}{n}\sum_{i=1}^n x_i u_i$. Note that $\alpha^2 \ge \rho^2$ by the Cauchy-Schwarz inequality.

Theorem 2. The optimal values of $(a,b)$ are:

• If $D_e \ge \alpha^2 - \rho^2$:
$$a^* = 0; \qquad b^* = \rho + \sqrt{\rho^2 - \alpha^2 + D_e}. \qquad (33)$$

• If $D_e < \alpha^2 - \rho^2$:
$$a^* = \arg\max\big\{ t(a) : a \in \{a_1, a_2, a_3, a_4\} \cap R \big\}, \qquad b^* = a^* \cdot t(a^*), \qquad (34)$$

where
$$t(a) = \frac{(1-a)\rho + \mathrm{sgn}(\rho)\sqrt{D_e - (a-1)^2(\alpha^2-\rho^2)}}{a}, \qquad (35)$$
$$R = \left[1 - \sqrt{\frac{D_e}{\alpha^2-\rho^2}},\; 1 + \sqrt{\frac{D_e}{\alpha^2-\rho^2}}\right], \qquad (36)$$
and
$$a_{1,2} = \frac{(\alpha^2-\rho^2)(\alpha^2-D_e) \pm \sqrt{D_e\,\rho^2\,(\alpha^2-\rho^2)(\alpha^2-D_e)}}{\alpha^2(\alpha^2-\rho^2)}, \qquad a_{3,4} = 1 \pm \sqrt{\frac{D_e}{\alpha^2-\rho^2}}. \qquad (37)$$

The proof is purely technical and is therefore deferred to the Appendix. We note that in the case where $D_e \ll \alpha^2 - \rho^2$, the value of $a^*$ tends to 1 and the value of $b^*$ tends to $\mathrm{sgn}(\rho)\sqrt{D_e}$. Hence, the linear embedder is not optimal even in the case where $D_e \ll \alpha^2$. We will next use the above values to devise a lower bound on the exponential decay rate of the false-negative probability of the optimal embedder, and then compare it to an upper bound on the false-negative exponent of the linear embedder.

4.2 Lower Bound on the False-Negative Error Exponent of the Optimal Embedder

Since the calculation of the exact false-negative exponent of the optimal embedder is highly non-trivial, in this subsection we derive a lower bound on this exponent. Later, we show that even this lower bound is by far larger than the false-negative exponent of the additive embedder. Therefore, the additive embedder is sub-optimal in terms of the exponential decay rate of its false-negative probability.
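Before turning to the exponent analysis, the closed form of Theorem 2 can be sketched numerically. This is a minimal sketch, assuming an antipodal ($\pm 1$) watermark; the function names are ours, and rather than relying on the sign conventions of $t(a)$, the candidates $a_1,\dots,a_4$ of eq. (37) are screened by evaluating the objective (27) directly, which yields the same maximizer.

```python
import numpy as np

def R_stat(u, y):
    """Detection statistic (27): squared correlation normalized by the energy of y."""
    return np.dot(u, y) ** 2 / np.dot(y, y)

def optimal_embedder(x, u, De):
    """Optimal (a, b) of the embedder y = a*x + b*u per Theorem 2.

    Assumes u has +/-1 components, so (1/n) * sum(u_i^2) = 1 as in the text.
    """
    n = len(x)
    alpha2 = np.dot(x, x) / n                     # alpha^2, covertext energy
    rho = np.dot(x, u) / n                        # rho, covertext-watermark correlation
    if De >= alpha2 - rho ** 2:                   # covertext can be "erased", eq. (33)
        return 0.0, rho + np.sqrt(rho ** 2 - alpha2 + De)
    Q = alpha2 - rho ** 2
    half = np.sqrt(De / Q)                        # allowable range R = [1-half, 1+half]
    base = Q * (alpha2 - De)
    root = np.sqrt(De * rho ** 2 * Q * (alpha2 - De))
    cands = [1.0 - half, 1.0 + half,              # a_3, a_4 (endpoints of R)
             (base + root) / (alpha2 * Q),        # a_1
             (base - root) / (alpha2 * Q)]        # a_2
    s = 1.0 if rho >= 0 else -1.0                 # sgn(rho)
    best = None
    for a in cands:
        if not (1.0 - half <= a <= 1.0 + half):
            continue
        rad = max(De - (a - 1.0) ** 2 * Q, 0.0)
        b = (1.0 - a) * rho + s * np.sqrt(rad)    # b = a * t(a), eq. (35)
        if best is None or R_stat(u, a * x + b * u) > R_stat(u, best[0] * x + best[1] * u):
            best = (a, b)
    return best
```

Note that for every candidate the distortion constraint (32) is met with equality, which the projection argument of Lemma 1 makes natural: any slack could be spent on increasing the correlation.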
The lower bound will be obtained by exploring the performance of a sub-optimal embedder of the form $\mathbf{y} = \mathbf{x} + \mathrm{sgn}(\rho)\sqrt{D_e}\,\mathbf{u}$, which we name the sign embedder. This embedder is obtained by setting $a = 1$ in (28) (note that this value is in the allowable range $R$ of $a$). We assume that $\mathbf{X} \sim \mathcal{N}(0, \sigma^2 I)$. First, we calculate a threshold value $T$ which always guarantees a false-positive exponent not smaller than $\lambda$. Using the proposed detector (26), the false-positive probability can be expressed as

$$P_{fp} = \Pr\{\hat{I}_{\mathbf{u}\mathbf{y}}(U;Y) > T \mid H_0\} = \Pr\{\hat{\rho}_{\mathbf{u}\mathbf{y}}^2 > 1 - e^{-2T} \mid H_0\} = 2\Pr\{\hat{\rho}_{\mathbf{u}\mathbf{y}} > \sqrt{1 - e^{-2T}} \mid H_0\},$$

where $\hat{\rho}_{\mathbf{u}\mathbf{y}} = \frac{\langle\mathbf{u},\mathbf{y}\rangle}{\|\mathbf{u}\|\cdot\|\mathbf{y}\|}$ is the normalized correlation between $\mathbf{u}$ and $\mathbf{y}$. Because $\mathbf{Y} = \mathbf{X}$ under $H_0$, and because of the radial symmetry of the pdf of $\mathbf{X}$, we can conclude that for large $n$ [33, p. 295]:

$$P_{fp} = \frac{2A_n(\theta)}{A_n(\pi)} \doteq e^{n\ln(\sin\theta)},$$

where $A_n(\theta)$⁴ is the surface area of the $n$-dimensional spherical cap cut from a unit sphere about the origin by a right circular cone of half angle $\theta = \arccos\sqrt{1-e^{-2T}}$ ($0 < \theta \le \pi/2$). Since we require that $P_{fp} \le e^{-n\lambda}$, $\ln(\sin\theta)$ must not exceed $-\lambda$, which means that

$$\ln(\sin\theta) \le -\lambda \;\Longrightarrow\; T \ge -\frac{1}{2}\ln\big(1 - \cos^2(\arcsin(e^{-\lambda}))\big) = \lambda, \qquad (38)$$

where the last equality is obtained using the fact that $\cos(\arcsin(x)) = \sqrt{1-x^2}$. Hence, setting $T = \lambda$ ensures a false-positive probability not greater than $e^{-n\lambda}$ for large $n$. Define the false-negative exponent of the sign embedder

$$E^{se}_{fn} \triangleq \lim_{n\to\infty} -\frac{1}{n}\ln P_{fn}, \qquad (39)$$

where the false-negative probability is given by

$$P_{fn} = \Pr\{\hat{I}_{\mathbf{u}\mathbf{y}}(U;Y) \le \lambda \mid H_1\} = \Pr\{\hat{\rho}_{\mathbf{u}\mathbf{y}}^2 \le 1 - e^{-2\lambda} \mid H_1\}. \qquad (40)$$

Theorem 3.
The false-negative exponent of the sign embedder is given by

$$E^{se}_{fn}(\lambda, D_e) = \begin{cases} 0, & \dfrac{D_e e^{-2\lambda}}{1-e^{-2\lambda}} \le \sigma^2 \\[2mm] \dfrac{1}{2}\left[\dfrac{D_e e^{-2\lambda}}{\sigma^2(1-e^{-2\lambda})} - \ln\dfrac{D_e e^{-2\lambda}}{\sigma^2(1-e^{-2\lambda})} - 1\right], & \text{else.} \end{cases} \qquad (41)$$

The proof, which is mainly technical, is deferred to the Appendix. Let us explore some of the properties of $E^{se}_{fn}(\lambda, D_e)$. First, it is clear that $E^{se}_{fn}(0, D_e) = \infty$ (the detector output is constantly $H_1$), since $\hat{\rho}_{\mathbf{u}\mathbf{y}}^2 \ge 0$. In addition, $E^{se}_{fn}(\lambda, 0) = 0$ ($\mathbf{y} = \mathbf{x}$ and therefore contains no information on $\mathbf{u}$). For a given $D_e$, $E^{se}_{fn}(\lambda, D_e) = 0$ for $\lambda \ge \frac{1}{2}\ln\big(1 + \frac{D_e}{\sigma^2}\big)$. The exact value of the optimal exponent achieved when the optimal embedder is employed is too involved to calculate. However, we can use some of the properties of the optimal embedder to improve the lower bound on the optimal exponent. According to Theorem 2, in the case where $D_e \ge \alpha^2 - \rho^2$, the optimal embedder can completely "erase" the covertext and therefore achieves zero false-negative probability. We use this property to improve the performance by introducing a sub-optimal embedder which outperforms the sign embedder. Since $D_e \ge \alpha^2$ implies $D_e \ge \alpha^2 - \rho^2$, the following embedding rule is obtained:

$$\mathbf{y} = a\mathbf{x} + b\mathbf{u}, \quad \text{where } (a,b) = \begin{cases} \big(0,\; \rho + \sqrt{\rho^2 - \alpha^2 + D_e}\big), & D_e \ge \alpha^2 \\ \big(1,\; \mathrm{sgn}(\rho)\sqrt{D_e}\big), & \text{else.} \end{cases} \qquad (42)$$

This embedder, which is an improved version of the sign embedder (but still sub-optimal), erases the covertext in the cases where $D_e \ge \alpha^2$ (to keep the embedding rule a function of one parameter, we chose to "erase" the covertext only if $D_e \ge \alpha^2$). Its performance is presented in the following corollary.

⁴It is well known [33, p. 293] that $A_n(\theta) = \frac{(n-1)\pi^{(n-1)/2}}{\Gamma(\frac{n}{2})}\int_0^\theta \sin^{n-2}(\varphi)\,d\varphi$ and $A_n(\pi) = 2A_n(\pi/2)$.

Corollary 1.
For $\lambda > \frac{1}{2}\ln 2$, the false-negative exponent of the improved sign embedder is given by

$$E(\lambda, D_e) = \begin{cases} 0, & D_e \le \sigma^2 \\ \dfrac{1}{2}\left[\dfrac{D_e}{\sigma^2} - \ln\dfrac{D_e}{\sigma^2} - 1\right], & \text{else;} \end{cases} \qquad (43)$$

otherwise, the false-negative exponent equals $E^{se}_{fn}(\lambda, D_e)$.

The proof is deferred to the Appendix. The fact that the optimal embedder can offer a positive false-negative exponent for every value of $\lambda$ is not surprising, due to its ability to erase the covertext, which leads to zero false-negative probability. Although the improved sign embedder offers a tighter lower bound, the improvement is made only in the case where $D_e \ge \sigma^2$ (though this is not known a priori to the embedder). Nevertheless, it emphasizes the true potential of the optimal embedder and the fact that the sign embedder is truly inferior to the optimal embedder. In Figure 2, the false-negative exponent of the sign embedder and that of the improved embedder are plotted as functions of $\lambda$ for given values of $D_e$ and $\sigma$. The point where the two graphs break apart is $\lambda = \frac{1}{2}\ln 2$. From this point on, the improved sign embedder achieves a fixed value of $\frac{1}{2}\big(D_e/\sigma^2 - \ln(D_e/\sigma^2) - 1\big)$.

[Figure 2: Error exponents of the sign embedder and its improved version for $\sigma^2 = 1$ and $D_e = 2$.]

4.3 Comparison to the Additive Embedder

Our next goal is to calculate the exponent of the false-negative probability of the linear additive embedder $\mathbf{y} = \mathbf{x} + \sqrt{D_e}\,\mathbf{u}$, where a normalized correlation detector is employed. Again, we first calculate a threshold value used by the detector which ensures a false-positive probability not greater than $e^{-n\lambda}$. The false-positive probability is given by

$$P_{fp} = \Pr\{\hat{\rho}_{\mathbf{u}\mathbf{y}} > T \mid H_0\} = \Pr\left\{\frac{\langle\mathbf{u},\mathbf{x}\rangle}{\|\mathbf{u}\|\cdot\|\mathbf{x}\|} > T\right\} = \frac{A_n(\theta)}{A_n(\pi)}$$
$$\doteq e^{n\ln(\sin\theta)}, \qquad (44)$$

where $\theta = \arccos(T)$ ($0 < \theta \le \pi/2$). The second equality is due to the fact that $\mathbf{Y} = \mathbf{X}$ under $H_0$, and the third equality is, again, due to the radial symmetry of the pdf of $\mathbf{X}$. Then, $\ln(\sin\theta) \le -\lambda$ implies

$$T \ge \cos\big(\arcsin(e^{-\lambda})\big) = \sqrt{1 - e^{-2\lambda}}, \qquad (45)$$

and therefore, letting $T = \sqrt{1-e^{-2\lambda}}$ ensures a false-positive probability exponentially not greater than $e^{-n\lambda}$. Note that $\lambda \ge 0$ implies that $T$ must be non-negative. Define

$$\Psi_1(r) \triangleq \arccos\left[\frac{\sqrt{D_e}(T^2-1) + T\sqrt{r - D_e(1-T^2)}}{\sqrt{r}}\right] \qquad (46)$$

and define the false-negative exponent of the additive embedder

$$E^{ae}_{fn} \triangleq \lim_{n\to\infty} -\frac{1}{n}\ln P_{fn}, \qquad (47)$$

where the false-negative probability is given by

$$P_{fn} = \Pr\{\hat{\rho}_{\mathbf{u}\mathbf{y}} \le \sqrt{1 - e^{-2\lambda}} \mid H_1\}. \qquad (48)$$

Theorem 4. The false-negative exponent of the additive embedder is given by

$$E^{ae}_{fn}(\lambda, D_e) = \min\big\{E_1(\lambda, D_e),\; E_2(\lambda, D_e)\big\}, \qquad (49)$$

where

$$E_1(\lambda, D_e) = \min_{r > D_e e^{-2\lambda}} \frac{1}{2}\left[\frac{r}{\sigma^2} - \ln\frac{r}{\sigma^2} - 2\ln\sin\Psi_1(r) - 1\right] \quad \text{and} \quad E_2(\lambda, D_e) = E^{se}_{fn}(\lambda, D_e). \qquad (50)$$

Let us examine some of the properties of $E^{ae}_{fn}(\lambda, D_e)$. It is easy to see that $E^{ae}_{fn}(\lambda, D_e) \le E_2(\lambda, D_e) = E^{se}_{fn}(\lambda, D_e)$, i.e., the upper bound on the additive-embedder exponent serves as a lower bound on the optimal-embedder exponent. It is clear that $E^{ae}_{fn}(\lambda, 0) = 0$, since $E^{ae}_{fn}(\lambda, 0) \le E^{se}_{fn}(\lambda, 0) = 0$. In contrast to the sign embedder, it turns out that $E^{ae}_{fn}(0, D_e) < \infty$. To see why this is the case, let us look at

$$E_1(0, D_e) = \min_{r > D_e} f(r), \qquad (51)$$

where $f(r) = \frac{1}{2}\big[\frac{r}{\sigma^2} - \ln\frac{r}{\sigma^2} - 2\ln\sin\Psi_1(r) - 1\big]$. Now, since $f(r)$ is finite for $r > D_e$, the minimum value of $f(r)$ must be finite too. This is the case where the threshold value equals zero and the probability that there is an embedded vector $\mathbf{Y}$ with negative correlation to $\mathbf{u}$ is not zero. Clearly, for a given $D_e$, $E^{ae}_{fn}(\lambda, D_e) = 0$ for $\lambda \ge \frac{1}{2}\ln\big(1 + \frac{D_e}{\sigma^2}\big)$.
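The closed form (41), which by Theorem 4 also upper-bounds the additive embedder's exponent, is easy to evaluate numerically. A minimal sketch, with entropies in nats and a function name of our choosing:

```python
import math

def sign_embedder_exponent(lam, De, sigma2=1.0):
    """False-negative exponent of the sign embedder, eq. (41) (Theorem 3).

    lam: false-positive exponent constraint (lambda), De: embedding distortion,
    sigma2: covertext variance. Returns math.inf at lam = 0 (detector always says H1).
    """
    if lam == 0.0:
        return math.inf
    ratio = De * math.exp(-2.0 * lam) / (1.0 - math.exp(-2.0 * lam))
    if ratio <= sigma2:
        return 0.0
    x = ratio / sigma2
    return 0.5 * (x - math.log(x) - 1.0)
```

Checking the boundary cases discussed in the text: the exponent vanishes for $D_e = 0$, is infinite at $\lambda = 0$, and drops to zero exactly at $\lambda = \frac{1}{2}\ln(1 + D_e/\sigma^2)$.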
Numerical calculations show that $E^{ae}_{fn}(\lambda, D_e)$ vanishes even for smaller values of $\lambda$; however, the exact smallest value of $\lambda$ for which $E^{ae}_{fn}(\lambda, D_e) = 0$ is hard to find. In Figures 3, 4 and 5, we compare the two embedding strategies by plotting their exponents as functions of $\lambda$ for several values of $\sigma^2/D_e$.

[Figure 3: Error exponents of the two embedding strategies ($\sigma^2/D_e = 0.1$).]

[Figure 4: Error exponents of the two embedding strategies ($\sigma^2/D_e = 1$).]

[Figure 5: Error exponents of the two embedding strategies ($\sigma^2/D_e = 10$).]

4.4 Discussion

When we take a closer look at the results, the fact that the sign embedder achieves better performance should not surprise us. Clearly, when the correlation between $\mathbf{x}$ and $\mathbf{u}$ is non-negative, the additive embedder and the sign embedder achieve the same performance. However, when the correlation between $\mathbf{x}$ and $\mathbf{u}$ is negative (which happens with probability $1/2$, due to the radial symmetry of the pdf of the covertext), this is no longer true. In this case, the additive embedder tries to maximize the correlation $\rho$ between the covertext $\mathbf{x}$ and the watermark $\mathbf{u}$ (while the detector compares the normalized correlation $\hat{\rho}_{\mathbf{u}\mathbf{y}}$ between $\mathbf{y}$ and $\mathbf{u}$ to a given threshold); however, these efforts are directed the wrong way. Contrary to the additive embedding scheme, the sign embedder tries to maximize the absolute value of the correlation, while the detector compares the absolute value of the normalized correlation to a given threshold. In this case, the sign embedder in fact drives the correlation $\rho$ further negative.
This difference is best exemplified in the case where $\lambda = 0$: the sign embedder achieves $E^{se}_{fn}(0, D_e) = \infty$, while $E^{ae}_{fn}(0, D_e)$ is finite, since the probability of embedded vectors $\mathbf{Y}$ for which $\hat{\rho}_{\mathbf{u}\mathbf{y}} < 0$ is not zero. We note that although the sign embedder is sub-optimal, it achieves much better performance than the additive embedder, with only a slight increase in complexity, due to the calculation of $\mathrm{sgn}(\rho)$.

5 Attacks

Let us now extend the setup to include attacks. We first discuss attacks in general and then confine our attention to memoryless attacks. In Section 6, we will discuss general worst-case attacks. The case of attack is characterized by the fact that the input to the detector is no longer the vector $\mathbf{y}$, as before, but another vector, $\mathbf{z} = (z_1, \ldots, z_n)$, that is the output of a channel fed by $\mathbf{y}$, which we shall denote by $W_n(\mathbf{z}|\mathbf{y})$, as shown in Fig. 1. For convenience, we will assume that the components of $\mathbf{z}$ take on values in the same alphabet $\mathcal{A}$, which will again be assumed finite, as in Sections 2 and 3. Thus, the operation of the attack, which in general may be stochastic, is thought of as a channel. Denoting the channel output marginal by $Q(\mathbf{z}) = \sum_{\mathbf{y}} P_X(\mathbf{y}) W_n(\mathbf{z}|\mathbf{y})$, the analysis of this case is, in principle, the same as before. Assuming, for example, that $Q$ is memoryless (which is the case when both $P_X$ and $W_n$ are memoryless, i.e., $W_n(\mathbf{z}|\mathbf{y}) = \prod_{i=1}^n W(z_i|y_i)$ for some discrete memoryless channel $W: \mathcal{A} \to \mathcal{A}$), then $\Lambda^*$ is as in Section 2, except that $P_X$, $Y$, and $\mathbf{y}$ should be replaced by $Q$, $Z$, and $\mathbf{z}$, respectively.
The optimal embedder then becomes

$$f^*_n(\mathbf{x},\mathbf{u}) = \arg\min_{\{\mathbf{y}:\, d_e(\mathbf{x},\mathbf{y}) \le nD_e\}} \sum_{\mathbf{z}\in\Lambda^c_*} W_n(\mathbf{z}|\mathbf{y}), \qquad (52)$$

for the redefined version of $\Lambda^c_*$, which is given by

$$\Lambda^c_* = \left\{ \mathbf{z} : \ln Q(\mathbf{z}) + n\hat{H}_{\mathbf{z}\mathbf{u}}(Z|U) + n\lambda - |\mathcal{A}|\ln(n+1) > 0 \right\} \qquad (53)$$
$$\phantom{\Lambda^c_*} = \left\{ \mathbf{z} : -n\hat{I}_{\mathbf{z}\mathbf{u}}(Z;U) - nD(\hat{P}_{\mathbf{z}}\|Q) + n\lambda - |\mathcal{A}|\ln(n+1) > 0 \right\}, \qquad (54)$$

where $\hat{P}_{\mathbf{z}}$ is the empirical distribution of $\mathbf{z}$. Evidently, eq. (52) is not a convenient formula to work with. Therefore, let us try to simplify it. For a given $\mathbf{y}$, let us rewrite (52) as follows:

$$\sum_{\mathbf{z}\in\Lambda^c_*} W_n(\mathbf{z}|\mathbf{y}) = \sum_{T(\mathbf{z}|\mathbf{y},\mathbf{u})\subseteq\Lambda^c_*}\; \sum_{\mathbf{z}'\in T(\mathbf{z}|\mathbf{y},\mathbf{u})} W_n(\mathbf{z}'|\mathbf{y}) = \sum_{T(\mathbf{z}|\mathbf{y},\mathbf{u})\subseteq\Lambda^c_*} |T(\mathbf{z}|\mathbf{y},\mathbf{u})|\, W_n(\mathbf{z}|\mathbf{y}). \qquad (55)$$

It is easy to show that for a given $\mathbf{z}' \in T(\mathbf{z}|\mathbf{y},\mathbf{u})$ and a memoryless channel $W_n(\mathbf{z}|\mathbf{y})$, the probability of $\mathbf{z}'$ given $\mathbf{y}$ is given by the following expression:

$$W_n(\mathbf{z}'|\mathbf{y}) = e^{-n\left[\hat{H}_{\mathbf{y}\mathbf{z}}(Z|Y) + \sum_{a\in\mathcal{A}} \hat{P}_{\mathbf{y}}(a)\, D\big(\hat{P}_{\mathbf{y}\mathbf{z}}(Z|Y=a)\,\|\,W(Z|Y=a)\big)\right]}. \qquad (56)$$

Using the fact that the cardinality of $T(\mathbf{z}|\mathbf{y},\mathbf{u})$ is given by

$$|T(\mathbf{z}|\mathbf{y},\mathbf{u})| \doteq e^{n\hat{H}_{\mathbf{u}\mathbf{y}\mathbf{z}}(Z|Y,U)}, \qquad (57)$$

we conclude that $f^*_n(\mathbf{x},\mathbf{u}) \in T^*(\mathbf{y}|\mathbf{x},\mathbf{u})$, where $T^*(\mathbf{y}|\mathbf{x},\mathbf{u})$ corresponds to the following conditional empirical distribution:

$$\hat{P}^*_{\mathbf{u}\mathbf{x}\mathbf{y}}(Y|X,U) = \arg\max_{\hat{P}_{\mathbf{u}\mathbf{x}\mathbf{y}}(Y|X,U):\, \hat{E}_{\mathbf{x}\mathbf{y}} d_e(X,Y)\le D_e} \left\{ \min_{\hat{P}_{\mathbf{u}\mathbf{y}\mathbf{z}}(Z|Y,U):\, \hat{I}_{\mathbf{u}\mathbf{z}}(Z;U)+D(\hat{P}_{\mathbf{z}}\|Q)\le\lambda} \left[ \hat{I}_{\mathbf{u}\mathbf{y}\mathbf{z}}(Z;U|Y) + \sum_{a\in\mathcal{A}} \hat{P}_{\mathbf{y}}(a)\, D\big(\hat{P}_{\mathbf{y}\mathbf{z}}(Z|Y=a)\,\|\,W(Z|Y=a)\big) \right] \right\}, \qquad (58)$$

i.e., for given $\mathbf{u}$ and $\mathbf{x}$, we search for the empirical distribution $\hat{P}_{\mathbf{u}\mathbf{x}\mathbf{y}}(Y|X,U)$ which maximizes the exponent of the false-negative probability, dictated by the dominating conditional type $T(\mathbf{z}|\mathbf{y},\mathbf{u})$ in $\Lambda^c_*$. Once the optimal empirical distribution $\hat{P}^*_{\mathbf{u}\mathbf{x}\mathbf{y}}(Y|X,U)$ has been found, it does not matter which vector $\mathbf{y}$ is chosen from the corresponding conditional type $T^*(\mathbf{y}|\mathbf{x},\mathbf{u})$.
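The detector implied by (54) only needs the empirical mutual information of $(\mathbf{z},\mathbf{u})$ and the divergence of $\hat{P}_{\mathbf{z}}$ from the output marginal $Q$. A minimal sketch for finite alphabets, ignoring the vanishing $|\mathcal{A}|\ln(n+1)/n$ term; the function names and dictionary representation of $Q$ are our own illustrative choices.

```python
import math
from collections import Counter

def empirical_mi_and_div(z, u, Q):
    """Empirical mutual information I_hat(Z;U) and divergence D(P_hat_z || Q), in nats.

    z, u: equal-length sequences over finite alphabets; Q: dict, the single-letter
    channel output marginal under H0.
    """
    n = len(z)
    Pz, Pu, Pzu = Counter(z), Counter(u), Counter(zip(z, u))
    mi = sum((c / n) * math.log((c / n) / ((Pz[a] / n) * (Pu[b] / n)))
             for (a, b), c in Pzu.items())
    div = sum((c / n) * math.log((c / n) / Q[a]) for a, c in Pz.items())
    return mi, div

def detect(z, u, Q, lam):
    """Declare H1 when I_hat(Z;U) + D(P_hat_z || Q) >= lambda, per eq. (54)
    (dropping the polynomial O(log n / n) correction)."""
    mi, div = empirical_mi_and_div(z, u, Q)
    return "H1" if mi + div >= lam else "H0"
```

Intuitively, an attack can weaken the empirical dependence between $\mathbf{z}$ and $\mathbf{u}$, but the divergence term penalizes forgeries whose empirical marginal drifts away from the nominal output distribution $Q$.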
6 General Attack Channels

In this section, we extend the results of the previous sections to general attack channels subject to a distortion criterion. Consider a covertext sequence $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ emitted from a memoryless source $P_X$, as before. Let $d_a: \mathcal{Y}\times\mathcal{Z} \to \mathbb{R}^+$ denote another bounded single-letter distortion measure. An attacker subject to distortion level $D_a$ w.r.t. $d_a$ is a channel $W_n$, fed by a stegotext $\mathbf{y}$, which produces a forgery $\mathbf{z}$ such that

$$d_a(\mathbf{y},\mathbf{z}) \triangleq \sum_{i=1}^n d_a(y_i, z_i) \le nD_a \qquad \forall (\mathbf{y},\mathbf{z}) \in \mathcal{A}^n \times \mathcal{A}^n. \qquad (59)$$

We denote the set of attack channels which satisfy (59) by $\mathcal{W}_n(D_a)$. For a given $\mathbf{u}$, we would like to devise a decision rule that partitions the space $\mathcal{A}^n$ of sequences $\{\mathbf{z}\}$, observed by the detector, into two complementary regions, $\Lambda$ and $\Lambda^c$, such that for $\mathbf{z}\in\Lambda$ we decide in favor of $H_1$ (watermark $\mathbf{u}$ is present), and for $\mathbf{z}\in\Lambda^c$ we decide in favor of $H_0$ (watermark absent: $\mathbf{y} = \mathbf{x}$). Consider the Neyman-Pearson criterion of minimizing the worst-case false-negative probability

$$P_{fn} \triangleq \max_{W_n\in\mathcal{W}_n(D_a)} P_{fn}(f_n, \Lambda, W_n), \qquad (60)$$

where

$$P_{fn}(f_n, \Lambda, W_n) \triangleq \sum_{\mathbf{z}\in\Lambda^c}\; \sum_{\mathbf{y}\in\mathcal{A}^n}\; \sum_{\mathbf{x}:\, f_n(\mathbf{x},\mathbf{u})=\mathbf{y}} P_X(\mathbf{x})\, W_n(\mathbf{z}|\mathbf{y}), \qquad (61)$$

and $P_X(\mathbf{x}) = \prod_{i=1}^n P_X(x_i)$, subject to the following constraints: (1) the distortion between $\mathbf{x}$ and $\mathbf{y}$ does not exceed $nD_e$; (2) the false-positive probability is upper bounded by

$$P_{fp} \triangleq \max_{W_n\in\mathcal{W}_n(D_a)} P_{fp}(\Lambda, W_n) \le e^{-n\lambda}, \qquad (62)$$

where $\lambda > 0$ is a prescribed constant and

$$P_{fp}(\Lambda, W_n) \triangleq \sum_{\mathbf{z}\in\Lambda}\; \sum_{\mathbf{y}\in\mathcal{A}^n} P_X(\mathbf{y})\, W_n(\mathbf{z}|\mathbf{y}). \qquad (63)$$

In other words, we would like to choose an embedder $f_n$ and a decision region $\Lambda$ so as to minimize $P_{fn}$, subject to a distortion constraint (between the covertext and the stegotext) and to the constraint that the exponential decay rate of $P_{fp}$ be at least as large as $\lambda$, for any attack channel in $\mathcal{W}_n(D_a)$.
Similarly as in Section 2, we focus on the class of detectors that base their decisions on the empirical joint distribution of $\mathbf{z}$ and $\mathbf{u}$.

6.1 Strongly Exchangeable Attack Channels

First, we restrict the set of attack channels to strongly exchangeable channels (the exact definition will be given in the sequel). Later, this restriction will be dropped, and the attack channel will be allowed to be any member of $\mathcal{W}_n(D_a)$. However, in that case, random watermarks (rather than deterministic ones) must be considered. The use of strongly exchangeable channels in the context of general attack channels was proposed in [22], where Somekh-Baruch and Merhav showed (in another context) that the worst strongly exchangeable attack channel is as bad as the worst general attack channel, while strongly exchangeable channels are much easier to analyze. In the sequel, we adjust the proof technique proposed in [22] to fit our needs.

Definition 1. A strongly exchangeable channel $W_n$ is one that satisfies, for all $\mathbf{y}\in\mathcal{A}^n$, $\mathbf{z}\in\mathcal{A}^n$,

$$W_n(\mathbf{z}'|\mathbf{y}') = W_n(\mathbf{z}|\mathbf{y}), \qquad \forall (\mathbf{y}',\mathbf{z}') \in T(\mathbf{y},\mathbf{z}).$$

Denote the set of all strongly exchangeable channels that operate on $n$-tuples by $\mathcal{C}^{ex}_n$, and let $\mathcal{W}^{ex}_n(D_a) = \mathcal{W}_n(D_a) \cap \mathcal{C}^{ex}_n$. Define

$$W^*_n(\mathbf{z}|\mathbf{y}) = \frac{c_n(\mathbf{y})}{|T(\mathbf{z}|\mathbf{y})|}\, \mathbf{1}\{d_a(\mathbf{y},\mathbf{z}) \le nD_a\}, \qquad (64)$$

where $c_n(\mathbf{y}) = \big[\sum_{\mathbf{z}:\, d_a(\mathbf{y},\mathbf{z})\le nD_a} \frac{1}{|T(\mathbf{z}|\mathbf{y})|}\big]^{-1}$ [22, p. 543]. Clearly, $W^*_n \in \mathcal{W}^{ex}_n(D_a)$. Note that $c_n(\mathbf{y})$ equals the reciprocal of the number of conditional types $T(\mathbf{z}|\mathbf{y})$ such that $d_a(\mathbf{y},\mathbf{z}) \le nD_a$ [22, p. 543], which implies that $(n+1)^{-|\mathcal{A}|^2} \le c_n(\mathbf{y}) \le 1$. Hence, $c_n(\mathbf{y})$ varies at most polynomially in $n$. Define

$$\Lambda^* = \left\{ \mathbf{z} : \hat{I}_{\mathbf{z}\mathbf{u}}(Z;U) + \min_{\hat{P}_{\mathbf{y}}:\, \hat{E}_{\mathbf{y}\mathbf{z}} d_a(Y,Z)\le D_a} D(\hat{P}_{\mathbf{y}}\|P_X) \ge \frac{|\mathcal{A}|\ln(n+1)}{n} + \lambda \right\}. \qquad (65)$$

Lemma 2. (i) For every $W_n \in \mathcal{W}^{ex}_n(D_a)$, $P_{fp}(\Lambda^*, W_n) \le e^{-n(\lambda-\delta_n)}$, where $\lim_{n\to\infty}\delta_n = 0$.
(ii) For any $\Lambda \subseteq \mathcal{A}^n$ that satisfies $P_{fp}(\Lambda, W_n) \le e^{-n\lambda'}$ for all $W_n \in \mathcal{W}^{ex}_n(D_a)$, for some $\lambda' > \lambda$, we have $\Lambda^c_* \subseteq \Lambda^c$ for all sufficiently large $n$.

Proof. Let $T(\mathbf{z}|\mathbf{u}) \subseteq \Lambda$. Then, we have

$$e^{-n\lambda} \ge \max_{W_n\in\mathcal{W}^{ex}_n(D_a)} P_{fp}(\Lambda, W_n) = \max_{W_n\in\mathcal{W}^{ex}_n(D_a)} \sum_{\mathbf{z}\in\Lambda}\sum_{\mathbf{y}\in\mathcal{A}^n} P_X(\mathbf{y}) W_n(\mathbf{z}|\mathbf{y}) \ge \sum_{\mathbf{z}\in\Lambda}\sum_{\mathbf{y}\in\mathcal{A}^n} P_X(\mathbf{y}) W^*_n(\mathbf{z}|\mathbf{y}) = \sum_{T(\mathbf{z}|\mathbf{u})\subseteq\Lambda}\;\sum_{\mathbf{z}'\in T(\mathbf{z}|\mathbf{u})}\;\sum_{\mathbf{y}\in\mathcal{A}^n} P_X(\mathbf{y}) W^*_n(\mathbf{z}'|\mathbf{y}) = \sum_{T(\mathbf{z}|\mathbf{u})\subseteq\Lambda}\;\sum_{\mathbf{z}'\in T(\mathbf{z}|\mathbf{u})} Q^*(\mathbf{z}'), \qquad (66)$$

where $Q^*(\mathbf{z}) \triangleq \sum_{\mathbf{y}\in\mathcal{A}^n} P_X(\mathbf{y}) W^*_n(\mathbf{z}|\mathbf{y})$. Now,

$$Q^*(\mathbf{z}) = \sum_{\mathbf{y}\in\mathcal{A}^n} P_X(\mathbf{y}) W^*_n(\mathbf{z}|\mathbf{y}) = \sum_{T(\mathbf{y}|\mathbf{z})\subset\mathcal{A}^n}\;\sum_{\mathbf{y}'\in T(\mathbf{y}|\mathbf{z})} P_X(\mathbf{y}')\, \frac{c_n(\mathbf{y}')}{|T(\mathbf{z}|\mathbf{y}')|}\, \mathbf{1}\{d_a(\mathbf{y}',\mathbf{z})\le nD_a\}$$
$$\doteq \sum_{T(\mathbf{y}|\mathbf{z})\subset\mathcal{A}^n} |T(\mathbf{y}|\mathbf{z})|\, e^{-n[\hat{H}_{\mathbf{y}}(Y)+D(\hat{P}_{\mathbf{y}}\|P_X)]}\, e^{-n\hat{H}_{\mathbf{y}\mathbf{z}}(Z|Y)}\, c_n(\mathbf{y})\, \mathbf{1}\{d_a(\mathbf{y},\mathbf{z})\le nD_a\}$$
$$\doteq \sum_{T(\mathbf{y}|\mathbf{z})\subset\mathcal{A}^n} e^{-n[\hat{H}_{\mathbf{y}}(Y)+D(\hat{P}_{\mathbf{y}}\|P_X)-\hat{H}_{\mathbf{y}\mathbf{z}}(Y|Z)+\hat{H}_{\mathbf{y}\mathbf{z}}(Z|Y)]}\, c_n(\mathbf{y})\, \mathbf{1}\{d_a(\mathbf{y},\mathbf{z})\le nD_a\}$$
$$\doteq \exp\left\{ -n\left[ \hat{H}_{\mathbf{z}}(Z) + \min_{\hat{P}_{\mathbf{y}}:\, \hat{E}_{\mathbf{y}\mathbf{z}} d_a(Y,Z)\le D_a} D(\hat{P}_{\mathbf{y}}\|P_X) \right] \right\}, \qquad (67)$$

where the last equality stems from the fact that $c_n(\mathbf{y})$ is at most polynomial in $n$. Clearly, for any $\mathbf{z}' \in T(\mathbf{z})$, the following holds:

$$Q(\mathbf{z}) = \sum_{\mathbf{y}\in\mathcal{A}^n} P_X(\mathbf{y}) W_n(\mathbf{z}|\mathbf{y}) = \sum_{\pi(\mathbf{y})} P_X(\pi(\mathbf{y})) W_n(\pi(\mathbf{z})|\pi(\mathbf{y})) = Q(\pi(\mathbf{z})) = Q(\mathbf{z}'), \qquad (68)$$

where the second equality is because $W_n \in \mathcal{W}^{ex}_n(D_a)$ and $\pi(\cdot)$ is a permutation of $\{1,\ldots,n\}$ such that $\mathbf{z}' = \pi(\mathbf{z})$. Hence $Q^*(\mathbf{z}') = Q^*(\mathbf{z})$ for all $\mathbf{z}' \in T(\mathbf{z})$.
Following (66), we get

$$e^{-n\lambda} \ge \sum_{T(\mathbf{z}|\mathbf{u})\subseteq\Lambda} |T(\mathbf{z}|\mathbf{u})|\, Q^*(\mathbf{z}) \ge |T(\mathbf{z}|\mathbf{u})|\, Q^*(\mathbf{z}) \ge |T(\mathbf{z}|\mathbf{u})| \exp\left\{ -n\left[ \hat{H}_{\mathbf{z}}(Z) + \min_{\hat{P}_{\mathbf{y}}:\, \hat{E}_{\mathbf{y}\mathbf{z}} d_a(Y,Z)\le D_a} D(\hat{P}_{\mathbf{y}}\|P_X) \right] \right\}$$
$$\ge \exp\left\{ -n\left[ \hat{H}_{\mathbf{z}}(Z) - \hat{H}_{\mathbf{z}\mathbf{u}}(Z|U) + \min_{\hat{P}_{\mathbf{y}}:\, \hat{E}_{\mathbf{y}\mathbf{z}} d_a(Y,Z)\le D_a} D(\hat{P}_{\mathbf{y}}\|P_X) \right] \right\} (n+1)^{-|\mathcal{A}|}$$
$$= \exp\left\{ -n\left[ \hat{I}_{\mathbf{z}\mathbf{u}}(Z;U) + \min_{\hat{P}_{\mathbf{y}}:\, \hat{E}_{\mathbf{y}\mathbf{z}} d_a(Y,Z)\le D_a} D(\hat{P}_{\mathbf{y}}\|P_X) \right] \right\} (n+1)^{-|\mathcal{A}|}. \qquad (69)$$

In the same spirit as in the attack-free scenario, we have shown that every $T(\mathbf{z}|\mathbf{u})$ in $\Lambda$ is also in $\Lambda^*$. Therefore, $\Lambda^c_* \subseteq \Lambda^c$, and so the probability of $\Lambda^c_*$ is smaller than the probability of $\Lambda^c$, i.e., $\Lambda^c_*$ minimizes $P_{fn}$ among all $\Lambda^c$ corresponding to detectors that satisfy (62). It remains to show that $\Lambda^*$ itself has a false-positive exponent which is at least as large as $\lambda$ for sufficiently large $n$. Clearly, for any attack channel $W_n \in \mathcal{W}^{ex}_n(D_a)$,

$$W_n(\mathbf{z}|\mathbf{y}) = \frac{\sum_{\mathbf{z}'\in T(\mathbf{z}|\mathbf{y})} W_n(\mathbf{z}'|\mathbf{y})}{|T(\mathbf{z}|\mathbf{y})|} = \frac{W_n(T(\mathbf{z}|\mathbf{y})|\mathbf{y})}{|T(\mathbf{z}|\mathbf{y})|} \le \frac{1}{|T(\mathbf{z}|\mathbf{y})|}\, \mathbf{1}\{d_a(\mathbf{y},\mathbf{z}) \le nD_a\}, \qquad (70)$$

where the first equality is because $W_n(\mathbf{z}'|\mathbf{y}) = W_n(\mathbf{z}|\mathbf{y})$ for all $\mathbf{z}' \in T(\mathbf{z}|\mathbf{y})$. Moreover, similarly as in (67), combined with the fact that $c_n(\mathbf{y})$ is at most polynomial in $n$, we have

$$\sum_{\mathbf{y}\in\mathcal{A}^n} P_X(\mathbf{y})\, \frac{\mathbf{1}\{d_a(\mathbf{y},\mathbf{z}')\le nD_a\}}{|T(\mathbf{z}'|\mathbf{y})|} \doteq \exp\left\{ -n\left[ \hat{H}_{\mathbf{z}}(Z) + \min_{\hat{P}_{\mathbf{y}}:\, \hat{E}_{\mathbf{y}\mathbf{z}} d_a(Y,Z)\le D_a} D(\hat{P}_{\mathbf{y}}\|P_X) \right] \right\}.$$
(71)

Using (70) and (71), it follows that $\Lambda^*$ indeed fulfills the false-positive constraint for any attack channel $W_n \in \mathcal{W}^{ex}_n(D_a)$:

$$\max_{W_n\in\mathcal{W}^{ex}_n(D_a)} P_{fp}(\Lambda^*, W_n) = \max_{W_n\in\mathcal{W}^{ex}_n(D_a)} \sum_{\mathbf{z}\in\Lambda^*}\sum_{\mathbf{y}\in\mathcal{A}^n} P_X(\mathbf{y}) W_n(\mathbf{z}|\mathbf{y}) \le \sum_{T(\mathbf{z}|\mathbf{u})\subseteq\Lambda^*}\;\sum_{\mathbf{z}'\in T(\mathbf{z}|\mathbf{u})}\;\sum_{\mathbf{y}\in\mathcal{A}^n} P_X(\mathbf{y})\, \frac{\mathbf{1}\{d_a(\mathbf{y},\mathbf{z}')\le nD_a\}}{|T(\mathbf{z}'|\mathbf{y})|}$$
$$\doteq \sum_{T(\mathbf{z}|\mathbf{u})\subseteq\Lambda^*}\;\sum_{\mathbf{z}'\in T(\mathbf{z}|\mathbf{u})} \exp\left\{ -n\left( \hat{H}_{\mathbf{z}}(Z) + \min_{\hat{P}_{\mathbf{y}}:\, \hat{E}_{\mathbf{y}\mathbf{z}} d_a(Y,Z)\le D_a} D(\hat{P}_{\mathbf{y}}\|P_X) \right) \right\} \doteq \sum_{T(\mathbf{z}|\mathbf{u})\subseteq\Lambda^*} e^{n\hat{H}_{\mathbf{u}\mathbf{z}}(Z|U)} \exp\left\{ -n\left( \hat{H}_{\mathbf{z}}(Z) + \min_{\hat{P}_{\mathbf{y}}:\, \hat{E}_{\mathbf{y}\mathbf{z}} d_a(Y,Z)\le D_a} D(\hat{P}_{\mathbf{y}}\|P_X) \right) \right\}$$
$$\le \sum_{T(\mathbf{z}|\mathbf{u})\subseteq\Lambda^*} \exp\left\{ -n\hat{I}_{\mathbf{u}\mathbf{z}}(Z;U) \right\} \exp\left\{ -n \min_{\hat{P}_{\mathbf{y}}:\, \hat{E}_{\mathbf{y}\mathbf{z}} d_a(Y,Z)\le D_a} D(\hat{P}_{\mathbf{y}}\|P_X) \right\} \le (n+1)^{|\mathcal{A}|} e^{-n\lambda} \doteq e^{-n(\lambda-\delta_n)}, \qquad (72)$$

where $\delta_n = \frac{|\mathcal{A}|\ln(n+1)}{n} \to 0$ as $n\to\infty$.

Our next step is to find an embedder which minimizes the false-negative probability under the given decision region, for any attack channel $W_n \in \mathcal{W}^{ex}_n(D_a)$. Following Section 5, the optimal embedder can be written as follows:

$$f^*_n(\mathbf{x},\mathbf{u}) = \arg\min_{\mathbf{y}:\, d_e(\mathbf{x},\mathbf{y})\le nD_e}\; \max_{W_n\in\mathcal{W}^{ex}_n(D_a)} \sum_{\mathbf{z}\in\Lambda^c_*} W_n(\mathbf{z}|\mathbf{y}). \qquad (73)$$

Lemma 3. For any attack channel $W_n \in \mathcal{W}^{ex}_n(D_a)$, the optimal embedder $f^*_n$ which minimizes the false-negative probability can be expressed in the following manner:

$$f^*_n(\mathbf{x},\mathbf{u}) = \mathbf{y}, \qquad \mathbf{y} \in T^*(\mathbf{y}|\mathbf{x},\mathbf{u}), \qquad (74)$$

where $T^*(\mathbf{y}|\mathbf{x},\mathbf{u})$ corresponds to the following conditional empirical distribution:

$$\hat{P}_{\mathbf{u}\mathbf{x}\mathbf{y}}(Y|X,U) = \arg\max_{\hat{P}_{\mathbf{u}\mathbf{x}\mathbf{y}}(Y|X,U):\, \hat{E}_{\mathbf{x}\mathbf{y}} d_e(X,Y)\le D_e} \left\{ \min_{\hat{P}_{\mathbf{u}\mathbf{y}\mathbf{z}}(Z|Y,U):\; \hat{I}_{\mathbf{u}\mathbf{z}}(Z;U)+\min_{\hat{P}_{\mathbf{y}}:\, \hat{E}_{\mathbf{y}\mathbf{z}} d_a(Y,Z)\le D_a} D(\hat{P}_{\mathbf{y}}\|P_X) < \lambda} \hat{I}_{\mathbf{u}\mathbf{y}\mathbf{z}}(Z;U|Y) \right\}. \qquad (75)$$

Proof.
For a given $\mathbf{y} \in \mathcal{A}^n$,

$$\max_{W_n\in\mathcal{W}^{ex}_n(D_a)} \sum_{\mathbf{z}\in\Lambda^c_*} W_n(\mathbf{z}|\mathbf{y}) = \max_{W_n\in\mathcal{W}^{ex}_n(D_a)} \sum_{T(\mathbf{z}|\mathbf{y},\mathbf{u})\subseteq\Lambda^c_*}\;\sum_{\mathbf{z}'\in T(\mathbf{z}|\mathbf{y},\mathbf{u})} W_n(\mathbf{z}'|\mathbf{y}) \le \sum_{T(\mathbf{z}|\mathbf{y},\mathbf{u})\subseteq\Lambda^c_*}\;\sum_{\mathbf{z}'\in T(\mathbf{z}|\mathbf{y},\mathbf{u})} |T(\mathbf{z}|\mathbf{y})|^{-1}\, \mathbf{1}\{d_a(\mathbf{y},\mathbf{z}')\le nD_a\}$$
$$\le \sum_{T(\mathbf{z}|\mathbf{y},\mathbf{u})\subseteq\Lambda^c_*} |T(\mathbf{z}|\mathbf{y},\mathbf{u})|\cdot|T(\mathbf{z}|\mathbf{y})|^{-1}\, \mathbf{1}\{d_a(\mathbf{y},\mathbf{z}')\le nD_a\} \doteq \max_{T(\mathbf{z}|\mathbf{y},\mathbf{u})\subseteq\Lambda^c_*} e^{-n\hat{I}_{\mathbf{u}\mathbf{y}\mathbf{z}}(Z;U|Y)}. \qquad (76)$$

Therefore $f^*_n(\mathbf{x},\mathbf{u}) \in T^*(\mathbf{y}|\mathbf{x},\mathbf{u})$, where $T^*(\mathbf{y}|\mathbf{x},\mathbf{u})$ corresponds to the conditional empirical distribution (75).

Note that the optimal embedder and the optimal decision rule correspond to the case where the detector and the embedder are tuned to the worst possible channel $W^*_n$. To extend the above results to general attack channels (i.e., channels that are members of $\mathcal{W}_n(D_a)$ rather than $\mathcal{W}^{ex}_n(D_a)$), we must consider the random-watermark setting (cf. Subsection 3.4). The reason for this will be made clear in the sequel.

6.2 Random Watermarks and General Attack Channels

In the spirit of Subsection 3.4, from this point on we use the model in which $\mathbf{u}$ is random as well, in particular, drawn from another source $P_U$, independently of $\mathbf{x}$; normally, the binary symmetric source (BSS). In this case, the decision regions $\Lambda$ and $\Lambda^c$ are defined as subsets of $\mathcal{A}^n\times\mathcal{B}^n$, and the error probabilities $P_{fn}$ and $P_{fp}$ are defined, again, as the corresponding summations of products $P_X(\mathbf{x})P_U(\mathbf{u})$. The corresponding version of $\Lambda^*$, proposed for strongly exchangeable attack channels, would be

$$\Lambda^{**} \triangleq \left\{ (\mathbf{z},\mathbf{u}) : \hat{I}_{\mathbf{z}\mathbf{u}}(Z;U) + D(\hat{P}_{\mathbf{u}}\|P_U) + \min_{\hat{P}_{\mathbf{y}}:\, \hat{E}_{\mathbf{y}\mathbf{z}} d_a(Y,Z)\le D_a} D(\hat{P}_{\mathbf{y}}\|P_X) \ge \frac{|\mathcal{A}|\ln(n+1)}{n} + \lambda \right\}. \qquad (77)$$

Theorem 5. (i) For every $W_n \in \mathcal{W}_n(D_a)$, $P_{fp}(\Lambda^{**}, W_n) \le e^{-n(\lambda-\delta_n)}$, where $\lim_{n\to\infty}\delta_n = 0$.
(ii) For any $\Lambda \subseteq \mathcal{A}^n \times \mathcal{B}^n$ that satisfies $P_{fp}(\Lambda, W_n) \le e^{-n\lambda'}$ for all $W_n \in \mathcal{W}_n(D_a)$ and some $\lambda' > \lambda$, we have $\Lambda^{**c} \subseteq \Lambda^c$ for all sufficiently large $n$.

To prove the above theorem in the case of general attack channels, we first need to ensure that the false-positive probability under $\Lambda^{**}$ is smaller than $e^{-n\lambda}$ for any attack channel in $\mathcal{W}_n(D_a)$. We use an argument, which was used in [22, Lemma 4], to prove that the worst strongly exchangeable attack channel is as bad as the worst general channel, and therefore we can reuse the results of Lemma 2. For the sake of completeness, we rephrase the argument and adjust it to our problem.

Proof. Given a general attack channel $W_n \in \mathcal{W}_n(D_a)$, let $\pi$ denote a permutation of $\{1, \ldots, n\}$ and let $W^{\pi}_n(\mathbf{z}|\mathbf{y}) \triangleq W_n(\pi(\mathbf{z})|\pi(\mathbf{y}))$. Clearly,

$$\tilde{W}_n(\mathbf{z}|\mathbf{y}) = \frac{1}{n!} \sum_{\pi} W^{\pi}_n(\mathbf{z}|\mathbf{y})$$

is a strongly exchangeable channel. For a given $W_n \in \mathcal{W}_n(D_a)$, let the false-positive probability under $\Lambda$ be

$$P_{fp}(\Lambda, W_n) \triangleq \sum_{\mathbf{u}} P_U(\mathbf{u}) \sum_{\mathbf{z} \in \Lambda(\mathbf{u})}\ \sum_{\mathbf{y} \in \mathcal{A}^n} P_X(\mathbf{y})\, W_n(\mathbf{z}|\mathbf{y}), \qquad (78)$$

where $\Lambda(\mathbf{u}) = \{\mathbf{z}: (\mathbf{z},\mathbf{u}) \in \Lambda\}$. Recall that any decision region $\Lambda$ is a union of joint type classes $\{T(\mathbf{u},\mathbf{z})\}$. Since $P_{fp}(\Lambda, W_n)$ is affine in $W_n$, we can see that

$$\frac{1}{n!}\sum_{\pi} P_{fp}(\Lambda, W^{\pi}_n) = \frac{1}{n!}\sum_{\pi}\ \sum_{\mathbf{u}} P_U(\mathbf{u}) \sum_{\mathbf{z} \in \Lambda(\mathbf{u})}\ \sum_{\mathbf{y}} P_X(\mathbf{y})\, W^{\pi}_n(\mathbf{z}|\mathbf{y}) = \sum_{\mathbf{u}} P_U(\mathbf{u}) \sum_{\mathbf{z} \in \Lambda(\mathbf{u})}\ \sum_{\mathbf{y}} P_X(\mathbf{y}) \left( \frac{1}{n!}\sum_{\pi} W^{\pi}_n(\mathbf{z}|\mathbf{y}) \right) = P_{fp}\left( \Lambda,\ \frac{1}{n!}\sum_{\pi} W^{\pi}_n \right). \qquad (79)$$
Now, for a given permutation $\pi$,

$$\begin{aligned}
P_{fp}(\Lambda, W^{\pi}_n) &= \sum_{\mathbf{u}} P_U(\mathbf{u}) \sum_{\mathbf{z} \in \Lambda(\mathbf{u})}\ \sum_{\mathbf{y}} P_X(\mathbf{y})\, W_n(\pi(\mathbf{z})|\pi(\mathbf{y})) \\
&= \sum_{\mathbf{u}} P_U(\pi(\mathbf{u})) \sum_{\mathbf{z} \in \Lambda(\pi(\mathbf{u}))}\ \sum_{\mathbf{y}} P_X(\pi(\mathbf{y}))\, W_n(\pi(\mathbf{z})|\pi(\mathbf{y})) \\
&= \sum_{\mathbf{u}} P_U(\mathbf{u}) \sum_{\mathbf{z} \in \Lambda(\mathbf{u})}\ \sum_{\mathbf{y}} P_X(\mathbf{y})\, W_n(\mathbf{z}|\mathbf{y}) = P_{fp}(\Lambda, W_n),
\end{aligned} \qquad (80)$$

where the second equality follows since $\mathbf{z} \in \Lambda(\mathbf{u}) \Rightarrow \pi(\mathbf{z}) \in \Lambda(\pi(\mathbf{u}))$ (because $\Lambda$ is a union of joint type classes $\{T(\mathbf{u},\mathbf{z})\}$), and the third equality follows from the fact that $P_X$ and $P_U$ are memoryless, which implies $P_X(\pi(\mathbf{y})) = P_X(\mathbf{y})$ and $P_U(\pi(\mathbf{u})) = P_U(\mathbf{u})$.

From (79) and (80), we get that for any $\Lambda$,

$$P_{fp}\left( \Lambda,\ \frac{1}{n!}\sum_{\pi} W^{\pi}_n \right) = \frac{1}{n!}\sum_{\pi} P_{fp}(\Lambda, W^{\pi}_n) = \frac{1}{n!}\sum_{\pi} P_{fp}(\Lambda, W_n) = P_{fp}(\Lambda, W_n). \qquad (81)$$

Therefore, for any $\Lambda$,

$$\max_{W_n \in \mathcal{W}_n(D_a)} P_{fp}(\Lambda, W_n) = \max_{W_n \in \mathcal{W}^{ex}_n(D_a)} P_{fp}(\Lambda, W_n). \qquad (82)$$

Hence, the worst general attack channel is no worse than the worst strongly exchangeable channel, and we can therefore confine our search to the set of strongly exchangeable channels, under which $\Lambda^{**}$, defined in (77), is optimal. Using a proof similar to that of Lemma 2, it is easy to show that under $\Lambda^{**}$ the false-positive probability is indeed no greater than $\exp\{-n(\lambda - \delta_n)\}$, where $\lim_{n \to \infty} \delta_n = 0$.

Note that the summation over $\mathbf{u}$ (and the fact that any $\Lambda$ is a union of types) is what enabled this argument; this might suggest that for a deterministic watermark, a general attack channel can be worse than the worst strongly exchangeable channel. However, such a channel might depend on the watermark sequence, which is not available to the attacker. This is exactly the reason why the random-watermark setting is considered in the general attack scenario. Once again, it is easy to verify that $\Lambda^{**}$ does not violate the false-positive probability constraint under a general attack channel while minimizing the false-negative probability. We now proceed to find the optimal embedder.
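The permutation-averaging identity (80)–(81) can be checked numerically on a toy example. The sketch below is an illustration, not part of the paper: it drops the watermark $\mathbf{u}$ for brevity, draws an arbitrary channel, and verifies that the false-positive probability is unchanged when the channel's inputs and outputs are jointly permuted, provided $\Lambda$ is a union of type classes and $P_X$ is memoryless.

```python
import itertools
import random

n = 3
seqs = list(itertools.product([0, 1], repeat=n))

# Memoryless covertext source P_X: i.i.d. Bernoulli(0.3).
def P_X(y):
    p = 0.3
    out = 1.0
    for b in y:
        out *= p if b == 1 else 1 - p
    return out

# An arbitrary channel W(z|y), normalized over z for each y.
random.seed(0)
W = {y: [random.random() for _ in seqs] for y in seqs}
for y in seqs:
    s = sum(W[y])
    W[y] = [w / s for w in W[y]]

# Lambda as a union of type classes: all z of type (0,...,0) or (1,...,1).
Lam = [z for z in seqs if sum(z) in (0, n)]

def pfp(channel):
    return sum(P_X(y) * channel[y][seqs.index(z)] for z in Lam for y in seqs)

# Coordinate permutation pi and the permuted channel W^pi(z|y) = W(pi(z)|pi(y)).
pi = (2, 0, 1)
perm = lambda t: tuple(t[i] for i in pi)
W_pi = {y: [W[perm(y)][seqs.index(perm(seqs[j]))] for j in range(len(seqs))]
        for y in seqs}

# Eq. (80): the false-positive probability is permutation-invariant.
assert abs(pfp(W) - pfp(W_pi)) < 1e-12
```

Both ingredients of the proof are visible here: permuting $\mathbf{y}$ leaves $P_X(\mathbf{y})$ unchanged, and permuting $\mathbf{z}$ maps $\Lambda$ onto itself because membership depends only on the type.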
The false-negative probability for a given attack channel $W_n$, embedder $f_n$, and decision region $\Lambda$ can be written as follows:

$$P_{fn}(f_n, \Lambda, W_n) = \sum_{\mathbf{u} \in \mathcal{B}^n} P_U(\mathbf{u})\, P_{fn}(f_n, \Lambda(\mathbf{u}), W_n), \qquad (83)$$

where

$$P_{fn}(f_n, \Lambda(\mathbf{u}), W_n) = \sum_{\mathbf{z} \in \Lambda^c(\mathbf{u})}\ \sum_{\mathbf{y} \in \mathcal{A}^n}\ \sum_{\mathbf{x}:\ f_n(\mathbf{x},\mathbf{u}) = \mathbf{y}} P_X(\mathbf{x})\, W_n(\mathbf{z}|\mathbf{y}). \qquad (84)$$

Corollary 2. For any attack channel $W_n \in \mathcal{W}_n(D_a)$, the optimal embedder $f^{**}_n$ which minimizes the false-negative probability is the embedder defined in (74).

Proof. Clearly, for any $\mathbf{u} \in \mathcal{B}^n$,

$$\min_{f_n(\mathbf{x},\mathbf{u}):\ d_e(\mathbf{x},\mathbf{y}) \le nD_e}\ \max_{W_n \in \mathcal{W}_n(D_a)} P_{fn}(f_n, \Lambda^{**}(\mathbf{u}), W_n) \ge \min_{f_n(\mathbf{x},\mathbf{u}):\ d_e(\mathbf{x},\mathbf{y}) \le nD_e}\ \max_{W_n \in \mathcal{W}^{ex}_n(D_a)} P_{fn}(f_n, \Lambda^{**}(\mathbf{u}), W_n) = \max_{W_n \in \mathcal{W}^{ex}_n(D_a)} P_{fn}(f^*_n, \Lambda^{**}(\mathbf{u}), W_n), \qquad (85)$$

but, on the other hand,

$$\min_{f_n(\mathbf{x},\mathbf{u}):\ d_e(\mathbf{x},\mathbf{y}) \le nD_e}\ \max_{W_n \in \mathcal{W}_n(D_a)} P_{fn}(f_n, \Lambda^{**}, W_n) \le \max_{W_n \in \mathcal{W}_n(D_a)} P_{fn}(f^*_n, \Lambda^{**}, W_n) = \max_{W_n \in \mathcal{W}^{ex}_n(D_a)} P_{fn}(f^*_n, \Lambda^{**}, W_n), \qquad (86)$$

where the last equality is easily obtained from the above argument when applied to embedders which use a fixed conditional type $T(\mathbf{y}|\mathbf{x},\mathbf{u})$ to produce the stegotext (as $f^*_n$ does). Therefore, the optimal embedder in the case of a general attack channel is $f^*_n$, proposed in Theorem 3. Note that from (74) and (77), the false-negative error exponent can be expressed in closed form using the method of types [27].

6.3 Discussion

In this section, we extended the basic setup presented in Section 2 to the case of general attack channels. First, we solved the problem for the case where the watermark sequence is deterministic, under strongly exchangeable channels. Then we treated the case of general attack channels, but we had to assume that the watermark sequence $\mathbf{u}$ is random too. However, this should not surprise us.
Clearly, for a given watermark, the worst attack channel depends on the watermark (even though the watermark is not known to the attacker). In this case, the attacker can imitate the detector operation: first, it decides which hypothesis is more likely (using a decision rule similar to that used by the detector); then, it can try to "push" the stegotext in the wrong direction, causing a false detection. A similar behavior can be seen in the case of a random watermark message $\mathbf{u}$ and a deterministic covertext sequence $\mathbf{x}$. If $d_e = d_a$ and $D_a \ge D_e$, the worst channel (which does depend on the covertext $\mathbf{x}$) is the following: if $\mathbf{y} \ne \mathbf{x}$ (hypothesis $H_1$), then $\mathbf{z} = \mathbf{x}$, i.e., the channel completely erases the message; otherwise (hypothesis $H_0$), the channel tries to "push" $\mathbf{y}$ into $\Lambda$. In this case, both the false-negative probability and the false-positive probability might converge to one. The reason for this is rooted in the fact that the set of attack channels has not been limited. In Subsection 6.1, we restricted the class of attack channels to strongly exchangeable channels and obtained non-trivial results. Other limitations may be imposed on the attack channels (e.g., blockwise memoryless channels, finite-state channels) if meaningful results are to be obtained.

Note that the worst attack strategy $W^*_n$ is independent of $\lambda$, the covertext distribution $P_X$, and even the embedding strategy and its distortion level $D_e$ (assuming that the embedder uses a fixed conditional type $T(\mathbf{y}|\mathbf{x},\mathbf{u})$ to produce the stegotext). The attack strategy depends only on the allowable distortion level $D_a$. Therefore, the embedding strategy can be designed assuming that the worst attack channel is present. This can be useful in evaluating the performance (in terms of false-negative probability) of suboptimal embedders.

Appendix

Proof of Theorem 2. First, we explore the case where $a = 0$, i.e., $\mathbf{y} = b\mathbf{u}$.
Substituting $a = 0$ in the constraint of eq. (32), we get $b^2 - 2\rho b + (\alpha^2 - D_e) \le 0$. The fact that $b$ is a real number implies that the discriminant of $b^2 - 2\rho b + (\alpha^2 - D_e)$ is non-negative, which leads to $\rho^2 - (\alpha^2 - D_e) \ge 0$, or $D_e \ge \alpha^2 - \rho^2$. This corresponds to the case where the stegotext includes only a fraction of $\mathbf{u}$ without violating the distortion constraint. In this case, the false-negative probability is zero (the distortion constraint is so loose that it allows one to "erase" the covertext), and we can choose $b^* = \rho + \sqrt{\rho^2 - \alpha^2 + D_e}$ as the optimal solution. From now on, we assume that $D_e < \alpha^2 - \rho^2$, which means that $a = 0$ is not a legitimate solution.

Let us assume that $\rho \ge 0$. Define $t \triangleq b/a$ and rewrite (32) by dividing the numerator and the denominator by $a^2$:

$$\max_{t \in \mathbb{R}} f(t) \quad \text{subject to:} \quad a^2 t^2 + 2(a-1)a\rho t + (a-1)^2 \alpha^2 \le D_e, \qquad (A\text{-}1)$$

where

$$f(t) = \frac{(t+\rho)^2}{(t+\rho)^2 + (\alpha^2 - \rho^2)}.$$

It is easy to show that maximizing $f(t)$ is equivalent to maximizing $t$. Since $t$ is a real number, the discriminant of $a^2 t^2 + 2(a-1)a\rho t + (a-1)^2 \alpha^2 - D_e$ must be non-negative, i.e.,

$$\Delta = 4a^2 \left[ D_e - (a-1)^2 (\alpha^2 - \rho^2) \right] \ge 0, \qquad (A\text{-}2)$$

which leads to

$$1 - \sqrt{\frac{D_e}{\alpha^2 - \rho^2}} \le a \le 1 + \sqrt{\frac{D_e}{\alpha^2 - \rho^2}}. \qquad (A\text{-}3)$$

Hence, $a$ must be in the range $\mathcal{R} \triangleq \left[ 1 - \sqrt{D_e/(\alpha^2 - \rho^2)},\ 1 + \sqrt{D_e/(\alpha^2 - \rho^2)} \right]$. Let us rewrite the constraint as follows:

$$[a t + (a-1)\rho]^2 + (a-1)^2 (\alpha^2 - \rho^2) - D_e \le 0, \qquad (A\text{-}4)$$

consequently,

$$\frac{(1-a)\rho - \sqrt{D_e - (a-1)^2 (\alpha^2 - \rho^2)}}{a} \le t \le \frac{(1-a)\rho + \sqrt{D_e - (a-1)^2 (\alpha^2 - \rho^2)}}{a}. \qquad (A\text{-}5)$$

Our next step is to maximize the upper bound on $t$ over the allowable range of $a$:

$$\arg\max_{a \in \mathcal{R}} t(a), \qquad (A\text{-}6)$$

where

$$t(a) = \frac{(1-a)\rho + \sqrt{D_e - (a-1)^2 (\alpha^2 - \rho^2)}}{a}. \qquad (A\text{-}7)$$
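The optimization (A-6) lends itself to a quick numeric sanity check. The sketch below (an illustration with assumed sample values for $\alpha^2$, $\rho$, and $D_e$, not part of the paper) scans $t(a)$ of (A-7) over the range $\mathcal{R}$ of (A-3) and confirms that the maximizing pair saturates the distortion constraint (A-4), as the choice of the upper root in (A-5) implies.

```python
import numpy as np

# Numeric sketch of (A-6)-(A-7); alpha2, rho, De are assumed sample
# values satisfying De < alpha2 - rho^2 (so a = 0 is excluded).
alpha2, rho, De = 4.0, 1.0, 1.5
beta = alpha2 - rho**2
half = np.sqrt(De / beta)
a = np.linspace(1 - half, 1 + half, 100001)[1:-1]   # interior of R, eq. (A-3)

# Upper bound t(a) from (A-5)/(A-7), maximized over the grid.
t = ((1 - a) * rho + np.sqrt(De - (a - 1)**2 * beta)) / a
i = int(np.argmax(t))
a_star, t_star = a[i], t[i]
b_star = a_star * t_star

# At the upper root, the distortion constraint (A-4) holds with equality:
# [a t + (a-1) rho]^2 + (a-1)^2 (alpha2 - rho^2) = De.
lhs = (a_star * t_star + (a_star - 1) * rho)**2 + (a_star - 1)**2 * beta
assert abs(lhs - De) < 1e-9
assert t_star > 0
```

The grid search plays the role of the closed-form candidates $a_{1,\ldots,4}$; it is useful for cross-checking them on concrete parameter values.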
After differentiating with respect to $a$ and equating to zero, we get

$$a_{1,2} = \frac{(\alpha^2 - \rho^2)(\alpha^2 - D_e) \pm \sqrt{D_e \rho^2}\, \sqrt{(\alpha^2 - \rho^2)(\alpha^2 - D_e)}}{\alpha^2 (\alpha^2 - \rho^2)}. \qquad (A\text{-}8)$$

Accordingly, the optimal values of $a$ and $b$ are

$$(a^*, b^*) = \left( \arg\max\left\{ t(a):\ a \in \{a_1, a_2, a_3, a_4\} \cap \mathcal{R} \right\},\ a^* \cdot t(a^*) \right), \qquad (A\text{-}9)$$

where $a_{3,4} = 1 \pm \sqrt{D_e/(\alpha^2 - \rho^2)}$. The same results are obtained in the case where $\rho < 0$.

Proof of Theorem 3. It is easy to show that under $H_1$,

$$\hat{\rho}^2_{uy} = \frac{(|\rho| + \sqrt{D_e})^2}{(|\rho| + \sqrt{D_e})^2 + (\alpha^2 - \rho^2)}, \qquad (A\text{-}10)$$

where $\alpha^2$ and $\rho$ are functions of the random vector $\mathbf{X}$. By conditioning on $\alpha^2$, we can express the false-negative probability as

$$P_{fn} = \int_0^{\infty} \Pr\left\{ \hat{\rho}^2_{uy} \le 1 - e^{-2\lambda} \,\middle|\, H_1, \alpha^2 = r \right\} p_{\alpha^2}(r)\, dr, \qquad (A\text{-}11)$$

where $n\alpha^2/\sigma^2$ is $\chi^2$-distributed with $n$ degrees of freedom; the probability density function of the $\chi^2$ distribution with $n$ degrees of freedom is given by

$$p_{\chi^2_n}(z) = \frac{(1/2)^{n/2}}{\Gamma(n/2)}\, z^{n/2 - 1} e^{-z/2}, \qquad z \ge 0,$$

and $\Gamma(\cdot)$ denotes the Gamma function. Now, given $\alpha^2$, $D_e$, and a threshold value $\tau \triangleq 1 - e^{-2\lambda}$, let us find the range of $\rho$ for which $\hat{\rho}^2_{uy} \le \tau$, i.e.,

$$\hat{\rho}^2_{uy}(\rho) \triangleq \frac{(|\rho| + \sqrt{D_e})^2}{(|\rho| + \sqrt{D_e})^2 + (\alpha^2 - \rho^2)} \le \tau. \qquad (A\text{-}12)$$

The function $\hat{\rho}^2_{uy}(\rho)$ is symmetric with respect to the $\rho$ axis, monotonically increasing in $|\rho|$, and attains its minimum value $\frac{D_e}{D_e + \alpha^2}$ at $\rho = 0$. Hence, for $\alpha^2 < \frac{D_e(1-\tau)}{\tau}$, $\hat{\rho}^2_{uy}$ is greater than $\tau$. After solving (A-12) with respect to $\rho$ and using the fact that $\tau \le 1$, we get that $|\hat{\rho}_{uy}| \le \sqrt{\tau}$ implies

$$|\rho| \le \sqrt{D_e}(\tau - 1) + \sqrt{D_e \tau^2 + \tau \alpha^2 - \tau D_e}$$

as long as $\alpha^2 \ge \frac{D_e(1-\tau)}{\tau}$.
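The boundary value of $|\rho|$ obtained from (A-12) can be verified directly: substituting it back into $\hat{\rho}^2_{uy}$ of (A-10) must return exactly $\tau$. A small sketch (with assumed sample values for $\alpha^2$, $D_e$, and $\tau$, not part of the paper):

```python
import math

# Check the bound derived from (A-12): at
# |rho| = sqrt(De)*(tau - 1) + sqrt(De*tau^2 + tau*alpha2 - tau*De),
# the squared normalized correlation (A-10) equals tau exactly.
alpha2, De, tau = 4.0, 1.0, 0.5          # assumed sample values
assert alpha2 >= De * (1 - tau) / tau    # regime where the bound applies
rho = math.sqrt(De) * (tau - 1) + math.sqrt(De * tau**2 + tau * alpha2 - tau * De)

def rho_hat_sq(r):
    s = abs(r) + math.sqrt(De)           # numerator term of (A-10)
    return s * s / (s * s + alpha2 - r * r)

assert abs(rho_hat_sq(rho) - tau) < 1e-12
```

Since $\hat{\rho}^2_{uy}(\rho)$ is monotonically increasing in $|\rho|$, equality at the boundary confirms that all $|\rho|$ below it satisfy $\hat{\rho}^2_{uy} \le \tau$.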
Define

$$\Theta(r) \triangleq \arccos\left[ \frac{\sqrt{D_e}(\tau - 1) + \sqrt{D_e \tau^2 + \tau r - \tau D_e}}{\sqrt{r}} \right]. \qquad (A\text{-}13)$$

It follows that

$$\begin{aligned}
\Pr\left\{ \hat{\rho}^2_{uy} \le \tau \,\middle|\, H_1, \alpha^2 = r \right\}
&= \Pr\left\{ \rho^2 \le \left[ \sqrt{D_e}(\tau - 1) + \sqrt{D_e \tau^2 + \tau r - \tau D_e} \right]^2 \,\middle|\, H_1, \alpha^2 = r \right\} \\
&= 1 - \Pr\left\{ \rho^2 > \left[ \sqrt{D_e}(\tau - 1) + \sqrt{D_e \tau^2 + \tau r - \tau D_e} \right]^2 \,\middle|\, H_1, \alpha^2 = r \right\} \\
&= 1 - \frac{2 A_n(\Theta(r))}{A_n(\pi)},
\end{aligned}$$

where $\frac{A_n(\Theta(r))}{A_n(\pi)} \doteq e^{n \ln \sin \Theta(r)}$. We note that $\Pr\{\hat{\rho}^2_{uy} \le \tau \mid H_1, \alpha^2\} = 0$ for $\alpha^2$ in the range $\left[0, \frac{D_e(1-\tau)}{\tau}\right]$. Therefore,

$$\begin{aligned}
P^{(n)}_{fn} &= \frac{(1/2)^{n/2}}{\Gamma(n/2)} \int_{\frac{D_e(1-\tau)}{\tau}}^{\infty} \left( 1 - e^{n \ln \sin \Theta(r)} \right) e^{-\frac{nr}{2\sigma^2}} \left( \frac{nr}{\sigma^2} \right)^{\frac{n-2}{2}} dr \\
&= \frac{(1/2)^{n/2}\, n^{\frac{n-2}{2}}}{\Gamma(n/2)} \left[ \int_{\frac{D_e(1-\tau)}{\tau}}^{\infty} \frac{\sigma^2}{r}\, e^{-\frac{nr}{2\sigma^2}}\, e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr - \int_{\frac{D_e(1-\tau)}{\tau}}^{\infty} \frac{\sigma^2}{r}\, e^{n \ln \sin \Theta(r)}\, e^{-\frac{nr}{2\sigma^2}}\, e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr \right]. \qquad (A\text{-}14)
\end{aligned}$$

Our next step is to evaluate the exponential decay rate of (A-14). It is easy to see that the first integral in (A-14) has a slower exponential decay rate and therefore dictates the overall decay rate. To evaluate the exponential decay rate of $P^{(n)}_{fn}$ as $n \to \infty$, we use Laplace's method for integrals (see footnote 5); we need to find the slowest exponential decay rate of the integrand within the limits of the integral. It is easy to show that

$$\lim_{n \to \infty} \frac{1}{n} \ln\left[ \frac{(1/2)^{n/2}\, n^{\frac{n-2}{2}}}{\Gamma(n/2)} \right] = \frac{1}{2}, \qquad (A\text{-}15)$$

and therefore the overall exponent is given by

$$E^{se}_{fn}(\tau, D_e) = \min_{r \ge \frac{D_e(1-\tau)}{\tau}} \frac{1}{2}\left[ \frac{r}{\sigma^2} - \ln\frac{r}{\sigma^2} - 1 \right]. \qquad (A\text{-}16)$$

The function $g(r) = r/\sigma^2 - \ln(r/\sigma^2) - 1$, $r \in (0, \infty)$, achieves its minimum at $r = \sigma^2$, and $g(\sigma^2) = 0$. Therefore, in the case where $\frac{D_e(1-\tau)}{\tau} \le \sigma^2$, we have $E^{se}_{fn}(\tau, D_e) = 0$; otherwise, the minimum of (A-16) is attained at $r = \frac{D_e(1-\tau)}{\tau}$. Hence, the false-negative exponent of the sign embedder is given by

$$E^{se}_{fn}(\tau, D_e) = \begin{cases} 0, & \frac{D_e(1-\tau)}{\tau} \le \sigma^2 \\ \frac{1}{2}\left[ \frac{D_e(1-\tau)}{\tau \sigma^2} - \ln \frac{D_e(1-\tau)}{\tau \sigma^2} - 1 \right], & \text{else.} \end{cases} \qquad (A\text{-}17)$$

Setting $\tau = 1 - e^{-2\lambda}$ achieves (41).

Proof of Corollary 1.
Since the false-negative probability of the improved embedder (42) is zero for $\alpha^2 \le D_e$, in the case where $\frac{1-\tau}{\tau} \le 1$ (i.e., $\lambda \ge \frac{1}{2}\ln 2$) we can rewrite the integral (A-14) with the lower limit equal to $D_e$ (no longer dependent on $\lambda$) as follows:

$$P^{(n)}_{fn} = \frac{(1/2)^{n/2}}{\Gamma(n/2)} \int_{D_e}^{\infty} \left( 1 - e^{n \ln \sin \Theta(r)} \right) e^{-\frac{nr}{2\sigma^2}} \left( \frac{nr}{\sigma^2} \right)^{\frac{n-2}{2}} dr. \qquad (A\text{-}18)$$

Optimizing by Laplace's method, as done in the proof of Theorem 3, leads to (43).

Proof of Theorem 4. Given $\lambda > 0$, the false-negative probability is given by

$$P_{fn} = \Pr\left\{ \hat{\rho}_{uy} \le \sqrt{1 - e^{-2\lambda}} \,\middle|\, H_1 \right\}, \qquad (A\text{-}19)$$

where the normalized correlation, under $H_1$, is given by

$$\hat{\rho}_{uy} = \frac{\rho + \sqrt{D_e}}{\sqrt{\alpha^2 + 2\sqrt{D_e}\,\rho + D_e}} < T. \qquad (A\text{-}20)$$

The function $\hat{\rho}_{uy}(\rho)$ achieves its minimum at $\rho = -\alpha^2/\sqrt{D_e}$. Since $\rho \in [-\alpha, \alpha]$, we conclude that in the case where $\alpha^2 \ge D_e$, $\hat{\rho}_{uy} < T$ implies $\rho < \sqrt{D_e}(T^2 - 1) + T\sqrt{\alpha^2 - D_e(1 - T^2)}$ (since $\hat{\rho}_{uy}(\rho)$ is monotonically increasing in $\rho$, and $\hat{\rho}_{uy}(-\alpha) = -1$). If $(1 - T^2) D_e \le \alpha^2 < D_e$, then $\hat{\rho}_{uy} < T$ implies

$$\sqrt{D_e}(T^2 - 1) - T\sqrt{\alpha^2 - D_e(1 - T^2)} \le \rho \le \sqrt{D_e}(T^2 - 1) + T\sqrt{\alpha^2 - D_e(1 - T^2)}.$$

Otherwise, for $\alpha^2 < (1 - T^2) D_e$, we have $\hat{\rho}_{uy} \ge T$ for all $\rho \in [-\alpha, \alpha]$. Define

$$\Psi_1(r) \triangleq \arccos\left[ \frac{\sqrt{D_e}(T^2 - 1) + T\sqrt{r - D_e(1 - T^2)}}{\sqrt{r}} \right], \qquad (A\text{-}21)$$

$$\Psi_2(r) \triangleq \arccos\left[ \frac{\sqrt{D_e}(T^2 - 1) - T\sqrt{r - D_e(1 - T^2)}}{\sqrt{r}} \right]. \qquad (A\text{-}22)$$

We need to pay attention to the point $r_0 = \frac{D_e(1 - T^2)}{T^2}$, at which $\Psi_1(r_0) = \pi/2$.

⁵ Laplace's method is a general technique for obtaining the asymptotic behavior of integrals of the form $I(x) = \int_a^b f(t) e^{x\Phi(t)}\,dt$ as $x \to \infty$. In this case $c \in [a, b]$, the maximizer of $\Phi(t)$ in the interval $[a, b]$, dictates the asymptotic behavior of the integral (assuming $f(c) \ne 0$); in the above case,

$$\lim_{n \to \infty} -\frac{1}{n} \ln\left[ \int_a^b f(t)\, e^{-n\Phi(t)}\, dt \right] = \min_{t \in [a, b]} \Phi(t).$$

See [34, Sec. 6.4], [35, Ch. 4] for more information.
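The statement in footnote 5 can be illustrated numerically on the very integral it is applied to: the sketch below (with assumed sample values for $\sigma^2$, $D_e$, and $\lambda$, not part of the paper) integrates $e^{-n g(r)}$ over $[r_0, \infty)$, as in (A-16), and checks that the empirical decay rate approaches the closed form (A-17).

```python
import numpy as np

# Numeric illustration of footnote 5, applied to (A-16)-(A-17):
# -(1/n) ln ∫_{r0}^∞ e^{-n g(r)} dr  →  min_{r >= r0} g(r) = g(r0).
# sigma2, De, lam are assumed sample values with De(1-tau)/tau > sigma2.
sigma2, De, lam = 1.0, 8.0, 0.2
tau = 1.0 - np.exp(-2.0 * lam)
r0 = De * (1.0 - tau) / tau

def g(r):                                 # exponent of (A-16)
    return 0.5 * (r / sigma2 - np.log(r / sigma2) - 1.0)

E_closed = 0.0 if r0 <= sigma2 else g(r0)     # closed form (A-17)

r = np.linspace(r0, r0 + 40.0, 400001)
shift = g(r).min()                        # factor out the peak to avoid underflow
for n in (100, 400, 1600):
    I = np.trapz(np.exp(-n * (g(r) - shift)), r)
    rate = shift - np.log(I) / n          # empirical decay rate at blocklength n
    assert abs(rate - E_closed) < 0.05
```

The gap between `rate` and `E_closed` shrinks like $O(\ln n / n)$, exactly the sub-exponential correction that Laplace's method discards.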
Beyond that point ($r > r_0$), the probability of false negative given $\alpha^2 = r$ goes to one as $n$ tends to infinity. Therefore, the false-negative probability can be written as follows. In the case where $\frac{1 - T^2}{T^2} > 1$ (i.e., $\lambda < \frac{1}{2}\ln 2$),

$$\begin{aligned}
P^{(n)}_{fn} = \frac{(1/2)^{n/2}\, n^{\frac{n-2}{2}}}{\Gamma(n/2)} \Bigg[ & \int_{D_e(1-T^2)}^{D_e} \frac{\sigma^2}{r} \left( e^{n \ln \sin \Psi_1(r)} - e^{n \ln \sin \Psi_2(r)} \right) e^{-\frac{nr}{2\sigma^2}}\, e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr \\
&+ \int_{D_e}^{\frac{D_e(1-T^2)}{T^2}} \frac{\sigma^2}{r}\, e^{n \ln \sin \Psi_1(r)}\, e^{-\frac{nr}{2\sigma^2}}\, e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr \\
&+ \int_{\frac{D_e(1-T^2)}{T^2}}^{\infty} \frac{\sigma^2}{r} \left( 1 - e^{n \ln \sin \Psi_1(r)} \right) e^{-\frac{nr}{2\sigma^2}}\, e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr \Bigg]. \qquad (A\text{-}23)
\end{aligned}$$

The first integral in (A-23) represents the false-negative probability when both $\Psi_1(r)$ and $\Psi_2(r)$ are greater than $\pi/2$; in this case, we need to subtract the areas of two caps, i.e., $\frac{A_n(\pi - \Psi_1(r)) - A_n(\pi - \Psi_2(r))}{A_n(\pi)}$. The second integral in (A-23) stems from the fact that for $r \ge D_e$ the false-negative probability (given $\alpha^2 = r$) equals $\frac{A_n(\pi - \Psi_1(r))}{A_n(\pi)}$. The last integral in (A-23) stems from the fact that the false-negative probability (given $\alpha^2 = r$) equals $1 - \frac{A_n(\Psi_1(r))}{A_n(\pi)}$.

Similarly, in the case where $\frac{1 - T^2}{T^2} \le 1$ (i.e., $\lambda \ge \frac{1}{2}\ln 2$),

$$\begin{aligned}
P^{(n)}_{fn} = \frac{(1/2)^{n/2}\, n^{\frac{n-2}{2}}}{\Gamma(n/2)} \Bigg[ & \int_{D_e(1-T^2)}^{\frac{D_e(1-T^2)}{T^2}} \frac{\sigma^2}{r} \left( e^{n \ln \sin \Psi_1(r)} - e^{n \ln \sin \Psi_2(r)} \right) e^{-\frac{nr}{2\sigma^2}}\, e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr \\
&+ \int_{\frac{D_e(1-T^2)}{T^2}}^{D_e} \frac{\sigma^2}{r} \left( 1 - e^{n \ln \sin \Psi_1(r)} - e^{n \ln \sin \Psi_2(r)} \right) e^{-\frac{nr}{2\sigma^2}}\, e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr \\
&+ \int_{D_e}^{\infty} \frac{\sigma^2}{r} \left( 1 - e^{n \ln \sin \Psi_1(r)} \right) e^{-\frac{nr}{2\sigma^2}}\, e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr \Bigg]. \qquad (A\text{-}24)
\end{aligned}$$

Since we are interested in the exponential decay rate (to first order), the slowest exponent dictates the overall exponential behavior. Therefore, the fact that $\sin \Psi_1(r) > \sin \Psi_2(r)$ for $D_e(1 - T^2) \le r \le D_e(1 - T^2)/T^2$ implies that $P_{fn}$
$$\doteq \frac{(1/2)^{n/2}\, n^{\frac{n-2}{2}}}{\Gamma(n/2)} \Bigg[ \int_{D_e(1-T^2)}^{\frac{D_e(1-T^2)}{T^2}} \frac{\sigma^2}{r}\, e^{n \ln \sin \Psi_1(r)}\, e^{-\frac{nr}{2\sigma^2}}\, e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr + \int_{\frac{D_e(1-T^2)}{T^2}}^{\infty} \frac{\sigma^2}{r}\, e^{-\frac{nr}{2\sigma^2}}\, e^{\frac{n}{2}\ln(r/\sigma^2)}\, dr \Bigg]. \qquad (A\text{-}25)$$

Again, using Laplace's method for integrals [35, Ch. 4], we can conclude that

$$E^{ae}_{fn}(T, D_e) = \min\left\{ E_1(T, D_e),\ E_2(T, D_e) \right\}, \qquad (A\text{-}26)$$

where

$$E_1(T, D_e) = \min_{D_e(1-T^2) \le r \le \frac{D_e(1-T^2)}{T^2}} \left[ \frac{r}{2\sigma^2} - \frac{1}{2}\ln\frac{r}{\sigma^2} - \ln \sin \Psi_1(r) - \frac{1}{2} \right], \qquad (A\text{-}28)$$

and $E_2(T, D_e)$ is given by

$$E_2(T, D_e) = \begin{cases} 0, & \frac{D_e(1-T^2)}{T^2} \le \sigma^2 \\ \frac{1}{2}\left[ \frac{D_e(1-T^2)}{T^2 \sigma^2} - \ln \frac{D_e(1-T^2)}{T^2 \sigma^2} - 1 \right], & \text{else.} \end{cases} \qquad (A\text{-}29)$$

Since $T^2 = 1 - e^{-2\lambda}$, we have $E_2(\lambda, D_e) = E^{se}_{fn}(\lambda, D_e)$ and therefore $E^{ae}_{fn}(\lambda, D_e) \le E^{se}_{fn}(\lambda, D_e)$. Our next step is to prove that $E_1(T, D_e) < E_2(T, D_e)$ when $\frac{D_e(1-T^2)}{T^2} > \sigma^2$ (otherwise, $E^{ae}_{fn}(T, D_e) = 0$). Define

$$f(r) = \frac{r}{2\sigma^2} - \frac{1}{2}\ln\frac{r}{\sigma^2} - \ln \sin \Psi_1(r) - \frac{1}{2}. \qquad (A\text{-}30)$$

$f(r)$ is a continuous, non-negative function in the range $D_e(1-T^2) < r \le \frac{D_e(1-T^2)}{T^2}$. Clearly,

$$E_1(T, D_e) \le f\left( \frac{D_e(1-T^2)}{T^2} \right) = E_2(T, D_e). \qquad (A\text{-}31)$$

In addition, $f'(r)$ is continuous in the above range. It can easily be shown that

$$f'\left( \frac{D_e(1-T^2)}{T^2} \right) = \frac{1}{2\sigma^2}\left[ 1 - \frac{T^2 \sigma^2}{D_e(1-T^2)} \right] > 0; \qquad (A\text{-}32)$$

hence, $f(r)$ is monotonically increasing in a small neighborhood of $\frac{D_e(1-T^2)}{T^2}$, and therefore $E_1(T, D_e) < E_2(T, D_e)$. This fact leads to the conclusion that $E^{ae}_{fn}(\lambda, D_e) < E^{se}_{fn}(\lambda, D_e)$. The exact value of $E_1(T, D_e)$ is cumbersome and therefore will not be presented.

References

[1] R. Anderson and F. Petitcolas, "On the limits of steganography," IEEE J. Select. Areas Commun., vol. 16, no. 4, pp. 474–481, May 1998.

[2] F. Petitcolas, R. Anderson, and M. Kuhn, "Information hiding – a survey," Proc. IEEE, vol. 87, no. 7, pp. 1062–1078, July 1999.

[3] I. J. Cox, M. L. Miller, and A. L.
McKellips, "Watermarking as communications with side information," Proc. IEEE, vol. 87, no. 7, pp. 1127–1141, July 1999.

[4] P. Moulin and J. O'Sullivan, "Information-theoretic analysis of information hiding," IEEE Trans. Inform. Theory, vol. 49, no. 3, pp. 563–593, Mar. 2003.

[5] N. Merhav, "Universal detection of messages via finite-state channels," IEEE Trans. Inform. Theory, vol. 46, no. 6, pp. 2242–2246, Sept. 2000.

[6] T. Liu and P. Moulin, "Error exponents for watermarking game with squared-error constraints," in Proceedings of the International Symposium on Information Theory (ISIT '03), Yokohama, Japan, July 2003.

[7] ——, "Error exponents for one-bit watermarking," in Proceedings of the Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP '03), vol. 3, Apr. 2003, pp. 65–68.

[8] N. Merhav, "An information-theoretic view of watermark embedding-detection and geometric attacks," June 2005, presented at WaCha '05, Barcelona, Spain. [Online]. Available: http://www.ee.technion.ac.il/people/merhav/papers/p98.pdf

[9] F. Hartung and M. Kutter, "Multimedia watermarking techniques," Proc. IEEE, vol. 87, no. 7, pp. 1079–1107, July 1999.

[10] J. Linnartz, T. Kalker, and G. Depovere, "Modelling the false alarm and missed detection rate for electronic watermarks," in Information Hiding: Second International Workshop, IH '98, Portland, Oregon, USA, Apr. 1998, p. 329.

[11] M. L. Miller and J. A. Bloom, "Computing the probability of false watermark detection," in IH '99: Proceedings of the Third International Workshop on Information Hiding. London, UK: Springer-Verlag, 2000, pp. 146–158.

[12] M. L. Miller, I. J. Cox, and J. A. Bloom, "Informed embedding: Exploiting image and detector information during watermark insertion," in International Conference on Image Processing (ICIP '00), vol. 3, 2000, pp. 1–4.
[13] C. Podilchuk and E. Delp, "Digital watermarking: Algorithms and applications," IEEE Signal Processing Mag., vol. 18, no. 4, pp. 33–46, July 2001.

[14] M. Barni and F. Bartolini, Watermarking Systems Engineering: Enabling Digital Assets Security and Other Applications. Marcel Dekker, 2004.

[15] F. Hartung, J. Su, and B. Girod, "Spread spectrum watermarking: Malicious attacks and counterattacks," in Proceedings of SPIE Vol. 3657, Security and Watermarking of Multimedia Contents, San Jose, CA, Jan. 1999, pp. 147–158.

[16] H. S. Malvar and D. A. F. Florêncio, "Improved spread spectrum: a new modulation technique for robust watermarking," IEEE Trans. Signal Processing, vol. 51, no. 4, pp. 898–905, Apr. 2003.

[17] T. Furon, "A constructive and unifying framework for zero-bit watermarking," submitted to IEEE Trans. Information Forensics and Security, 2006.

[18] J. Hernandez and F. Perez-Gonzalez, "Statistical analysis of watermarking schemes for copyright protection of images," Proc. IEEE, vol. 87, no. 7, pp. 1142–1166, July 1999.

[19] M. Gutman, "Asymptotically optimal classification for multiple tests with empirically observed statistics," IEEE Trans. Inform. Theory, vol. 35, no. 2, pp. 401–408, Mar. 1989.

[20] N. Merhav, M. Gutman, and J. Ziv, "On the estimation of the order of a Markov chain and universal data compression," IEEE Trans. Inform. Theory, vol. 35, no. 5, pp. 1014–1019, Sept. 1989.

[21] ——, "Estimating the number of states of a finite-state source," IEEE Trans. Inform. Theory, vol. 38, no. 1, pp. 61–65, Jan. 1992.

[22] A. Somekh-Baruch and N. Merhav, "On the error exponent and capacity games of private watermarking systems," IEEE Trans. Inform. Theory, vol. 49, no. 3, pp. 537–562, Mar. 2003.

[23] ——, "On the capacity game of public watermarking systems," IEEE Trans. Inform. Theory, vol. 50, no. 3, pp.
511–524, Mar. 2004.

[24] E. Sabbag and N. Merhav, "Optimal watermark embedding and detection strategies under limited detection resources," in Proc. Int. Symp. on Information Theory (ISIT '06), Seattle, USA, 2006, pp. 173–177.

[25] ——, "Optimal watermark embedding and detection strategies under general worst case attacks," accepted to Int. Symp. on Information Theory (ISIT '07), June 2007.

[26] H. L. Van Trees, Detection, Estimation and Modulation Theory – Volume I. New York: John Wiley & Sons, 1968.

[27] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press, 1981.

[28] M. Barni, "Effectiveness of exhaustive search and template matching against watermark desynchronization," IEEE Signal Processing Lett., vol. 12, no. 2, pp. 158–161, Feb. 2005.

[29] N. Merhav, "On the estimation of the model order in exponential families," IEEE Trans. Inform. Theory, vol. 35, no. 5, pp. 1109–1114, Sept. 1989.

[30] ——, "Universal decoding for memoryless Gaussian channels with a deterministic interference," IEEE Trans. Inform. Theory, vol. 39, no. 4, pp. 1261–1269, July 1993.

[31] N. Merhav, G. Kaplan, A. Lapidoth, and S. Shamai (Shitz), "On information rates for mismatched decoders," IEEE Trans. Inform. Theory, vol. 40, no. 6, pp. 1953–1967, Nov. 1994.

[32] B. Chen and G. Wornell, "Quantization index modulation: A class of provably good methods for digital watermarking and information embedding," IEEE Trans. Inform. Theory, vol. 47, no. 4, pp. 1423–1443, May 2001.

[33] A. D. Wyner, "A bound on the number of distinguishable functions which are time-limited and approximately band-limited," SIAM Journal on Applied Mathematics, vol. 24, no. 3, pp. 289–297, May 1973.

[34] C. M. Bender and S. A. Orszag, Advanced Mathematical Methods for Scientists and Engineers.
New York: McGraw-Hill, 1978.

[35] N. G. de Bruijn, Asymptotic Methods in Analysis, 3rd ed. North-Holland Publishing Company, 1970.