DNA-Inspired Information Concealing
Protection of the sensitive content is crucial for extensive information sharing. We present a technique of information concealing, based on introduction and maintenance of families of repeats. Repeats in DNA constitute a basic obstacle for its recon…
Authors: Lukas Kencl and Martin Loebl
DNA-INSPIRED INFORMATION CONCEALING

LUKAS KENCL AND MARTIN LOEBL

Abstract. Protection of sensitive content is crucial for extensive information sharing. We present a technique of information concealing based on the introduction and maintenance of families of repeats. Repeats in DNA constitute a basic obstacle to its reconstruction by hybridisation. Information concealing in DNA by repeats is considered in [1].

1. Introduction

Contemporary computer systems may be distributed and may consist of many interconnected processing units or a large number of networked computer subsystems. In addition, contemporary digital networks may consist of a large number of end nodes and intermediate nodes. In all these systems, information, in the form of sequences over some alphabet of symbols, is circulating or being stored. The entity controlling a subsystem or a node is often unwilling, or prohibited, to share these information sequences with other nodes. However, sharing some reduced local information might be very useful for security, stability and various analyses of system performance, and for data mining. Such analysis might, for example, identify frequently appearing segments through approximate statistical analysis of segment frequency, which makes it possible to detect replicating malicious code (worms). It also makes it possible to identify segments that are markers of computer viral infection, by detecting patterns that exist in a database of malicious sequences. Such databases are used e.g. in contemporary intrusion detection systems or spam filters. It has been shown that pattern matching against only fixed-length prefixes or substrings of longer sequences can provide approximate hints as to the presence of suspicious content [2].
Likewise, established worm detection techniques such as Autograph [3] or EarlyBird [4] are based on counting the frequency of small blocks of a fixed size. Sharing reduced local information among the members of an interconnected computer system or communication network thus helps to discover attacks earlier. Affected parts may be isolated and further spread of the attack prevented.

The benefits of sharing local information may be reaped if there exists a computational information processing which preserves local information (e.g. all segments up to a certain maximal length) and makes it impossible to reconstruct longer or sensitive parts of the information sequences. We call such information processing concealing. Systems which conceal information and share the concealed information are likely to possess a competitive advantage in the form of robustness, attack resistance and immunity, owing to their ability to exchange, publish and protect information.

Clearly, any information concealing algorithm needs to address two conflicting goals: (1) preserving the presence and, possibly, the frequency rank of segments of a given size (so that spam identification and worm detection remain possible), while (2) making reconstruction of content longer than the predefined limit computationally hard (e.g. disabling interpretation or understanding of the private content).

1.1. Main contribution. The main contribution of this paper is
• Formulation of the information concealing problem
• Presentation of an information concealing algorithm
• Analysis of the algorithm and a proof of the hardness of reconstruction of the input sequence

M. Loebl is with the Department of Applied Mathematics and Institute of Theoretical Informatics (ITI), Charles University, Prague, Czech Republic. L. Kencl is with the Research and Development Centre (RDC), Czech Technical University, Prague, Czech Republic.

2. Related Work

2.1. Repeats in DNA. Our inspiration comes from an important feature of eukaryotic DNA, namely that it contains various repeat families, and that their presence constitutes a basic difficulty in DNA reconstruction by hybridisation [6]. A large proportion of eukaryotic genomes is composed of DNA segments that are repeated, either precisely or in variant form, more than once. Highly repeated segments are arranged in two ways: as tandem arrays or dispersed among many unlinked genomic locations. As yet, no function has been associated with many of the repeats [8].

In the paper [1] which accompanies this paper, the authors propose that in eukaryotes the cells have DNA as a depositary of concealed genetic information and that the genome achieves self-concealing by accumulation and maintenance of repeats. The protected information may be shared, and this is useful for the development of intercellular communication and of multicellular organisms. The assertion that the repeats are maintained in DNA in a programmed way for self-concealing explains basic puzzling features of repeats: the uniformity along with the polymorphism of the repeated sequences; the freedom of the repeated DNA to adopt quite different primary sequences in closely related species; and the apparent non-functionality of the precise amount or the precise sequence of the repeats.

The problem of repeat content versus DNA sequencing is receiving extensive attention from biologists, computer scientists and mathematicians (see [5], [6], [7]).

2.2. Repeats versus DNA reconstruction. We explain the basic idea of concealing by repeats in this subsection. Assume we are given a collection $K$ of segments of DNA.
Each segment $S$ from $K$ is divided into two parts, the initial part $S(I)$ and the terminal part $S(T)$. We thus may write $S = S(I) \mid S(T)$. This is an artificial assumption imposed only for clarity of presentation. A reconstruction of $K$ is a sequence of its segments such that the terminal part of each segment agrees with the initial part of the next segment in the sequence. If several of these initial and terminal parts coincide, there may be an exponential number of possible reconstructions.

Let us consider a very simple example. Let $K$ be the following collection of segments, where the initial and the terminal parts are divided by the vertical line:

A|B, B|A, A|C, C|A, B|C, C|B.

The following sequences are some of the possible reconstructions:

ABACBCA, ACABCBA, BACABCB, ABCACBA.

In this simple example, even unlimited computational power is useless to anybody who wants to pick out the correct reconstruction from the many possible ones.

This phenomenon may well be described in terms of the de Bruijn graph: this graph has a node for each segment which is an initial or a terminal part of an element of $K$. For each segment $S$ of $K$ there is an arrow (a directed edge) from $S(I)$ to $S(T)$.

Figure 1. De Bruijn graph for $K$ (nodes A, B, C).

The possible reconstructions now correspond to the walks on the de Bruijn graph in which each directed edge is traversed exactly once. These walks are usually called Euler walks. If a node of the de Bruijn graph has more than one outgoing directed edge, then locally there are several independent ways to traverse these edges. The number of Euler walks of the de Bruijn graph is therefore typically exponential in the number of such nodes (see [7] for the calculations).

2.3. Concealing in Information and Communication Technologies.
The concept of hiding private or sensitive data while preserving some form of structural information has been studied recently in various subdomains of ICT. Some techniques concentrate on hiding the originator of information, i.e. anonymization; others focus on enabling particular functions over data that can be shared among multiple partners, such as private matching.

2.3.1. Concealing Network Data. An anonymization scheme over network packet IP addresses called CryptoPan [11] preserves the prefix hierarchy of the original addresses while using hashing to make them computationally hard to reconstruct. This in turn makes it possible to share network traces (with packet headers only) while preserving the prefix hierarchy. Similarly, in [12], the structure of router configuration files and data is preserved while the actual values are obfuscated.

A technique to process and transform the network packet payload has been proposed in [13]. This method uses dictionaries of important sequences that are valuable from the data mining perspective and should be preserved, while encrypting the rest of the information with a cryptographically strong hash function. The technique performs well in terms of data protection; however, it only permits the study of content portions predetermined by a known list, and thus cannot be used to detect previously unknown content in the payload, such as a malicious subsequence.

The popular Bloom filter [14] approach is used in constructing the Hierarchical Bloom Filter payload attribution technique [15]. A Bloom filter can store (incompletely but efficiently) input items (which can be substrings) and easily answer set membership queries. It consists of $k$ hash functions, each associating one of $m$ numbers to each input item. Set membership queries exhibit no false negatives, but can have false positives.
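The Bloom filter behaviour described above can be illustrated with a minimal sketch. It is not taken from [14] or [15]: the class name `BloomFilter`, the use of salted SHA-256 to obtain the $k$ hash functions, and the sample segments are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: k hash functions, each mapping an item to one of m positions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [False] * m

    def _positions(self, item: str):
        # Assumption: the k hash functions are SHA-256 salted with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = True

    def __contains__(self, item: str) -> bool:
        # True for every stored item (no false negatives);
        # may also be True for an item never stored (false positive).
        return all(self.bits[p] for p in self._positions(item))


stored = ["the a", "aim o", "f thi"]  # hypothetical payload segments
bf = BloomFilter(m=256, k=3)
for seg in stored:
    bf.add(seg)
```

Every stored segment is reported as present; a segment that was never stored is usually, but not always, reported as absent, which is the false-positive behaviour mentioned in the text.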
Payload attribution with a Hierarchical Bloom Filter stores segments of network packet payloads together with their IP source and destination addresses. Each payload is cut into segments $s_1, \ldots, s_n$. The $s_i$'s are stored in a Bloom filter of level 0, the pairs $s_1 s_2, s_3 s_4, \ldots$ in a Bloom filter of level 1, quadruples in level 2, and so on. A query on an excerpt of payload, which may consist of several consecutive blocks, may return the source and destination address by running through consecutive hierarchy levels. The authors propose deployment at network concentration points. Privacy protection is to be achieved by restricting which entities can pose queries; otherwise exhaustive attacks might lead to payload reconstruction.

2.3.2. Private matching. Private matching [16, 17] focuses on the problem of two entities trying to find common data elements in their databases without revealing private information. The basic property (and the difference from the general information concealing problem) is that only two parties are involved; a multi-party solution is suggested as future work. Further problems are the asymmetry in the sequence of information exchange among the parties and the required presumption of honesty ('semi-honesty' in the paper).

Private matching is a special case of the cryptographic theory of multi-party computation: $m$ parties want to compute a function $f$ on their $m$ inputs. In the ideal model, where a trusted party exists, the parties give their inputs to the trusted authority, which calculates $f$ and returns the result to each party. The ideal model assumes an ideal situation: for example, no protocol can prevent a party from changing its input before the communication starts. A secure multi-party computation protocol emulates what happens in the ideal model.
Paper [16] also introduces 'data ownership certificates' to make the private matching protocols unspoofable. This technique is shown in [18] to be useful in a more practical setting, enabling privacy-protecting sharing of e-mail white-lists.

2.3.3. Data Masking. Various techniques of masking, sanitizing and obfuscating data have been studied to enable test or third-party development over sensitive databases (such as Human Resources data). After sanitization, the database remains usable (the look-and-feel and some relations and distributions are preserved) but the information content is secure. The techniques used include masking, shuffling, substitution, number variance, encryption etc. [19]. These techniques share a similar goal with information concealing, but focus on structured data without the need to preserve local information.

2.3.4. Data Mining and Anonymization. In data mining, anonymization mechanisms (obfuscating the originator or the private part of the data) are currently studied intensively. Privacy mechanisms can be classified into several categories, according to where they are deployed during the life cycle of the data. The mechanism proposed in this paper falls into the category where individuals trust no one but themselves and conceal their respective data before making them available for sharing. The existing algorithms in this category [20, 21, 22, 23, 24] are called local perturbation; they are based on different ideas than the concealing procedure proposed in this letter. In another category, data publishing, data are anonymized at a central server; the individuals are required to trust this server [25]. Anonymization in social networks is studied in [26]. An important theoretical foundation for data anonymity and originator protection was laid in [29].
The k-anonymity model for protecting privacy allows data holders to release private data in such a way that each individual cannot be distinguished from at least k-1 other individuals also in the release.

2.3.5. Steganography. This form of information hiding [27, 28] is the related art and science of writing hidden messages in such a way that no one apart from the sender and the intended recipient realizes there is a hidden message; nowadays this includes concealment of digital information within computer files. Correspondingly, steganalysis is the art of detecting the hidden information.

Steganography is a mature science, focusing in particular on the domain of Digital Rights Management (DRM), where various 'watermarking' or 'tamper-proofing' techniques may seamlessly embed extra information about the origin of a digital work within the work itself. This is a different goal from that of the proposed information concealing. While the embedded information may be well concealed and thus very hard to reconstruct, data mining of such information would generally not be possible. However, it would be very interesting to apply steganalysis tools to the information concealing 'attacker problem' (see Section 3), and this certainly belongs among our future work.

2.3.6. Information Retrieval. The 'attacker problem' of concealed string reconstruction (see Section 3) has a strong connection to the problem of information retrieval [30], where probabilistic information about the expected string (e.g. natural text) may be used to derive further information or assist text reconstruction.

2.4. Segments shuffling. Finally, we mention that our first attempt to solve the anonymization problem [31] used random permutations of a collection of short overlapping segments. This method by itself, however, does not conceal the original data.
It is shown in this paper that, in order to sufficiently extend the families of repeats of the resulting sequence and make the concealing successful, other procedures need to be performed as well. In particular, the overlapping segments containing complete local information need to be prolonged by attaching additional short segments to their beginning and/or their end. The shuffling permutation also needs to satisfy some properties. This is described in the rest of the paper.

3. Information Concealing Problem

We formally introduce the information-sequence concealing problem. Let $|\omega|$ denote the number of symbols (length) of sequence $\omega$. The sequence concealing problem is the following: given a sequence $\omega$ and a small positive integer $k$, we want to transform $\omega$ into another sequence $\omega^F$ so that:

I. If $s$ is a segment of $\omega$ with $|s| \le k$, then $s$ is a segment of $\omega^F$.
II. It is computationally hard to reconstruct sequence $\omega$ from $\omega^F$.
III. The length of $\omega^F$ is linear in $|\omega|$.
IV. It is also desirable that a segment not in $\omega$ appears in $\omega^F$ only with low probability, and that the relative frequency (i.e. frequency rank) of segments of $\omega$ of a given length is preserved in $\omega^F$. The precise statement of these two conditions is, however, strongly application dependent.

Given the statement of the information concealing problem, the key issue is how much information about $\omega$ an attacker can deduce from $\omega^F$; let us call this issue the attacker problem. Clearly, the answer to the attacker problem is application dependent. If the input sequence $\omega$ is very restrictive, e.g. if a short prefix uniquely determines a larger part of $\omega$ and the $k$-segments of $\omega$ can be distinguished within the larger $k$-segment superset of $\omega^F$, then inevitably a large part of the input $\omega$ may be reconstructed from $\omega^F$.
In quite a number of practical situations (DNA sequence, computer program, sound and video trace, text on a non-specific topic), however, this is not the case. Moreover, for restrictive input sequences, we can perform preparatory procedures (such as procedure $S$ described below) which make the input sequence less specific. This partly justifies the following consistency assumption concerning the attacker problem, which we need to make in order to carry out the security analysis of the concealing algorithm.

Proposition 3.1. The complete input of the attacker problem, i.e. all the useful information an attacker has about $\omega$, is $\omega^F$, the length $|\omega|$, the length $k$ of the preserved segments and the concealing algorithm used in obtaining $\omega^F$.

Thus, an attacker may use the list of frequencies of repeats of segments of $\omega^F$, along with knowledge of the concealing algorithm, to attempt the reconstruction of $\omega$.

4. Concealing by repeats

The input of the problem is a sequence over an alphabet. We first turn it into a cyclic sequence by connecting its beginning and end. Next we describe five procedures which are used in the algorithm. The basic pattern of all the procedures is the same and may be described as follows: the input is a cyclic sequence $\omega$. First, $\omega$ is partitioned into consecutive disjoint blocks. Then the terminal part of the preceding block, of length $o$ (the overlap), is added in front of each block. The resulting segments contain all the studied local information; depending on the procedure, these segments will also contain some excess information, which is vital in the proposed composition of the procedures that forms our concealing algorithm. Next, a segment called dust can, but need not, be added behind each segment. The enhanced blocks are called the cards. The last step consists in arranging the cards into the output cyclic sequence $\omega^F$.
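The common pattern just described (partition, overlap, arrangement) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function names `make_cards` and `arrange` are hypothetical, block lengths are passed in explicitly instead of being drawn at random, and the optional dust step is omitted.

```python
import random


def make_cards(omega, block_lengths, o):
    """Cut the cyclic sequence omega into consecutive blocks of the given
    lengths and put the o symbols that cyclically precede each block in
    front of it (the overlap). Dust is not added in this sketch."""
    if sum(block_lengths) != len(omega):
        raise ValueError("block lengths must sum to len(omega)")
    n = len(omega)
    cards = []
    start = 0
    for length in block_lengths:
        # Overlap: the o symbols just before the block, wrapping around cyclically.
        overlap = "".join(omega[(start - o + j) % n] for j in range(o))
        cards.append(overlap + omega[start:start + length])
        start += length
    return cards


def arrange(cards, rng):
    """Concatenate the cards in a random order chosen by rng."""
    shuffled = list(cards)
    rng.shuffle(shuffled)
    return "".join(shuffled)


cards = make_cards("abcdefghij", [4, 3, 3], o=2)
# cards == ["ijabcd", "cdefg", "fghij"]
omega_f = arrange(cards, random.Random(0))
```

With overlap $o = 2$, every cyclic segment of length up to $o + 1 = 3$ of the input lies inside some card, which is the sense in which the cards contain the studied local information.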
The first procedure $S$ has a preparatory character in the concealing algorithm. Several runs of $S$ have the role of breaking the local sequential order in the input sequence.

4.1. Procedure $S(\omega, o, lb, ub)$. Its input is a cyclic sequence $\omega$, and it has parameters $o$, $lb$, $ub$: $o$ stands for the size of the overlap, $lb$ for the lower bound on the length of a block, and $ub$ for the upper bound on the length of a block. The procedure $S(\omega, o, lb, ub)$ is defined by steps 1-4 below.

1. We partition (sometimes we say that we cut) $\omega$ into consecutive disjoint blocks $P_1, \ldots, P_m$ such that the length of each $P_i$ is chosen at random between $lb$ and $ub$.
2. We add an overlap of length $o$ in front of each block. The overlapping segments thus contain all the original sub-segments of length up to $o + 1$.
3. The blocks enhanced by the overlaps now start and end with the corresponding overlaps. If these were arranged into a cyclic sequence, the overlaps would be neighbours. This may help an attacker in reconstruction. To break the neighbourhood relationship of the overlaps, we may add dust (a randomly chosen segment) behind each block. Adding dust is optional and application dependent. A natural restriction is that the dust is a segment of the input sequence and that the average length of dust is $\tfrac{1}{2}(lb + ub) - o$, to match the average length of the segments complementing the overlaps. However, depending on the application and on the stringency of condition IV of the sequence concealing problem, the length of the dust may be different and the dust need not be a segment of the input sequence.
4. We arrange the resulting cards randomly into a cyclic sequence.

As an illustration we perform $S$ on an example input sequence:

Example 1: Procedure $S(\omega, o, lb, ub)$

Input $\omega$ = 'the aim of this paper is to present an information concealing algorithm', parameters $o = 3$, $lb = 4$, $ub = 6$.

1. First, the input sequence is partitioned randomly into blocks of length 4, 5 or 6. The blocks are divided by '+' below:

'the ai + m of t + his + pape + r is + to pr + esent + an i + nfor + matio + n co + ncea + ling + algor + ithm +'

2. Next we add the overlap (of length $o = k - 1 = 3$) in front of each block:

'thmthe ai + aim of t + f this + is pape + aper is + is to pr + present + ent an i + n infor + formati + tion co + concea + cealing + ingalgor + gorithm +'

3. Next we add the dust (of length approximately 2) behind each block, and we get the cards:

'thmthe aip + aim of tim + f this con + is pape in + aper is a + is to pro p + presentese + ent an ilgo + n infori + formatifo + tion co + concea ci + cealing pa + ingalgor p + gorithmap +'

4. Finally the output is given by arranging the cards in a random order (here we use the order 14, 9, 10, 13, 5, 3, 12, 1, 6, 4, 7, 11, 8, 15, 2):

'ingalgor pn infori formatifo cealing paaper is af this conconcea cithmthe aipis to pro pis pape in presentesetion co ent an ilgogorithmap aim of tim'

4.2. Procedure $S_1(\omega, lb, ub)$. Procedure $S_1(\omega, lb, ub)$ is like $S$, but the overlap is always the whole preceding block, typically exceeding the size needed to preserve the studied local information (this excess is used in the composition of the procedures forming our concealing algorithm). Hence, if the blocks are $\omega = P_1 P_2 P_3 \ldots P_m$, then the cards of $S_1$ are $P_1 P_2, P_2 P_3, \ldots, P_m P_1$. Each $P_i$ appears once as the initial segment and once as the terminal segment of a card. Hence, the cyclic consecutive order of the cards of $S_1$ may be described by a permutation $\pi$ of $1, \ldots, m$; for further discussion it turns out to be useful to define this permutation so that it assigns, to each terminal block of a card, the initial block of the next card.

By a permutation of $1, \ldots, m$ we mean a bijection from the set $\{1, \ldots, m\}$ onto itself. If $\pi$ is a permutation then $\pi^{-1}$ denotes the inverse permutation ($\pi(x) = y$ if and only if $\pi^{-1}(y) = x$). Hence, in our formalism, card $P_{i-1} P_i$ is followed by card $P_{\pi(i)} P_{\pi(i)+1}$. The output of $S_1$ thus always has the form

$P_1 P_2 P_{\pi(2)} P_{\pi(2)+1} \ldots P_{\pi^{-1}(1)-1} P_{\pi^{-1}(1)}$.

For instance, if we have $m = 3$ then the cards are $P_1 P_2$, $P_2 P_3$, $P_3 P_1$, and a shuffling which results in the sequence $P_1 P_2 P_3 P_1 P_2 P_3$ is described by the permutation $\pi(1) = 2$, $\pi(2) = 3$, $\pi(3) = 1$.

4.2.1. Acceptable permutations. For our purposes, not all permutations $\pi$ are acceptable; let us formally denote by $A$ the set of all the acceptable permutations. To define $A$, we first introduce an auxiliary bipartite graph $G(\pi)$.

Definition 4.1. Graph $G(\pi)$ has vertex-set $V = V_1 \cup V_2$ where $V_1 = \{u_1, \ldots, u_m\}$ and $V_2 = \{v_1, \ldots, v_m\}$. The edge-set of $G(\pi)$ is the union of three disjoint perfect matchings of the vertex-set, namely:
1. The perfect matching $M_1$ consisting of the edges $\{u_i, v_i\}$.
2. The perfect matching $M_2$ consisting of the edges $\{u_{i+1}, v_i\}$.
3. The perfect matching $M_3$ consisting of the edges $\{u_{\pi(i)}, v_i\}$.

Definition 4.2. We construct a directed graph $G'(\pi)$ from $G(\pi)$ by first directing each edge of $M_2 \cup M_3$ from $V_2$ to $V_1$, and then contracting each edge of $M_1$.

Definition 4.3 (of the set $A$ of all acceptable permutations). Permutation $\pi$ is acceptable ($\pi \in A$) if and only if the following two conditions are satisfied:
1. The directed graph $G'(\pi)$ has a directed Eulerian closed walk in which the edges of $M_2$ and $M_3$ alternate. This condition is equivalent to saying that the permutation $\pi$ describes a rearrangement of the cards of $S_1$ into a sequence.
2. In the auxiliary graph $G(\pi)$, the union of the perfect matchings $M_2 \cup M_3$ contains many (at least $m/c$, where $c \ge 2$ is a small constant) cycles.
This condition is added in order to make the reconstruction of the input sequence hard; see the sections below.

The following observation about the graph $G(\pi)$ will be used in the analysis of the attacker problem.

Observation 4.4. Let $G(\pi)$ be as in Definition 4.1. For $v_i \in V_2$ let $s(v_i) = P_i$ be its associated segment. Then we have the following equality between cyclic sequences:

$P_1 P_2 P_3 \ldots P_m = s(M_2(1))\, s(M_2(2)) \ldots s(M_2(m))$,

where $M_2(i)$ denotes the vertex of $V_2$ connected with $u_i \in V_1$ by an edge of $M_2$.

For illustration we perform $S_1$ on the output sequence of the previous example (which would be the natural use of $S_1$, as described later):

Example 2: Procedure $S_1(\omega, lb, ub)$

Input $\omega$ = 'ingalgor pn infori formatifo cealing paaper is af this conconcea cithmthe aipis to pro pis pape in presentesetion co ent an ilgogorithmap aim of tim', parameters $lb = 6$ and $ub = 8$.

First, the input sequence is partitioned randomly into blocks of length 6, 7 or 8. The blocks are divided by '+' below:

'ingalgo + r pn inf + ori for + matifo + cealing + paape + r is a + f this c + onconce + a cithm + the aipi + s to pro + pis p + ape in + present + esetion + co ent + an ilg + ogorith + map ai + m of tim +'

Next we add the overlap in front of each block. For procedure $S_1$ the overlap is always the whole preceding block.
We get the following cards; to make the example easier to understand we indicate by '*' the division of each card into two blocks:

'm of tim*ingalgo + ingalgo*r pn inf + r pn inf*ori for + ori for*matifo + matifo*cealing + cealing*paape + paape*r is a + r is a*f this c + f this c*onconce + onconce*a cithm + a cithm*the aipi + the aipi*s to pro + s to pro*pis p + pis p*ape in + ape in*present + present*esetion + esetion*co ent + co ent*an ilg + an ilg*ogorith + ogorith*map ai + map ai*m of tim +'

Finally the output is given by rearranging the cards by an acceptable permutation, i.e. by a permutation whose corresponding bipartite graph consists of a lot of cycles. The smallest length of a cycle is 4. It is not difficult to see that the following permutation $\pi$ creates nine 4-cycles and one 6-cycle. In the following description of $\pi$, the cycles are grouped together; for instance the first 4-cycle has edges $(v_1, u_{10})$, $(v_9, u_2)$, $(v_1, u_2)$, $(v_9, u_{10})$. The first two of them belong to the perfect matching $M_3$, the last two belong to the perfect matching $M_2$.

[$\pi(1) = 10$, $\pi(9) = 2$]; [$\pi(2) = 6$, $\pi(5) = 3$]; [$\pi(3) = 9$, $\pi(8) = 4$]; [$\pi(7) = 13$, $\pi(12) = 8$]; [$\pi(14) = 11$, $\pi(10) = 15$]; [$\pi(11) = 18$, $\pi(17) = 12$]; [$\pi(19) = 14$, $\pi(13) = 20$]; [$\pi(16) = 21$, $\pi(20) = 17$]; [$\pi(15) = 19$, $\pi(18) = 16$]; [$\pi(21) = 7$, $\pi(4) = 5$, $\pi(6) = 1$].

Hence the final sequence is as follows (for ease of understanding we preserve the separation symbols '*', which in reality would not be present):

'ingalgo*r pn inf paape*r is a pis p*ape inthe aipi*s to prof this c*onconcer pn inf*ori foronconce*a cithm present*esetion m of tim*ingalgoa cithm*the aipian ilg*ogorithape in*presentogorith*map aico ent*an ilgesetion*co ent s to pro*pis pmap ai*m of timr is a*f this cmatifo*cealingori for*matifo cealing*paape'

4.3. Procedure $S_{1+}(\omega, lb, ub)$.
If the input of the procedure $S_1$ comes from several runs of the preparatory procedure $S$ described above, then we need to modify $S_1$ in order to make its output generic, that is, to intentionally preserve the attacker-confusing overlaps. This modified procedure is called $S_{1+}$. We recall that $S_1$ repeats the whole blocks $P_i$, i.e. the output of $S_1$ is the cyclic sequence

$P_1 P_2 P_{\pi(2)} P_{\pi(2)+1} \ldots P_{\pi^{-1}(1)-1} P_{\pi^{-1}(1)}$.

We assume that the input $\omega$ of $S_{1+}$ comes from repeated runs of procedure $S$, and so $\omega$ contains a lot of segments of length $o$ (the overlaps of the runs of $S$) repeated at least twice; let us denote by $R$ the set of all these segments. Procedure $S_{1+}$ starts like $S_1$ by partitioning $\omega$ into blocks $P_1, P_2, \ldots, P_m$. The blocks of $S_{1+}$ cut some of the segments from $R$. To reflect this, we write $P_i = r^T_{i-1} Q_i r^I_i$ where
• Segment $r^T_{i-1}$ is an empty segment or a terminal segment of an element of $R$ cut by the partition between blocks $P_{i-1}$ and $P_i$.
• Segment $r^I_i$ is an empty segment or an initial segment of an element of $R$ cut by the partition between blocks $P_i$ and $P_{i+1}$.

Summarizing this notation we write

$P_1 P_2 \ldots P_m = Q_1 r_1 Q_2 r_2 Q_3 r_3 \ldots r_{m-1} Q_m r_m$,

where each $r_i$ is either an element of $R$ that is cut by the blocks of $S_{1+}$, or an empty segment. Each $P_i = r^T_{i-1} Q_i r^I_i$, where $r_i = r^I_i r^T_i$.

The first difference between $S_1$ and $S_{1+}$ is that the overlaps of $S_{1+}$ are not the whole preceding blocks. Instead, the overlap added in front of block $P_{i+1}$ is $Q_i r^I_i$. Hence, block $P_{i+1}$ with the overlap added in front of it has the form $Q_i r_i Q_{i+1} r^I_{i+1}$.
To make the cards of $S_{1+}$ more generic (see the same step in the description of Procedure $S$), we change each such $Q_i r_i Q_{i+1} r^I_{i+1}$ into $Q_i r_i Q_{i+1} r'_{i+1}$, where $r'_{i+1}$ is obtained from $r^I_{i+1}$ by adding a segment so that $r'_{i+1}$ has length $o$ and is repeated elsewhere in $\omega$. Summarising, the output of $S_{1+}$ has the form

$Q_1 * Q_2 * Q_{\pi(2)} * Q_{\pi(2)+1} * \ldots * Q_{\pi^{-1}(1)-1} * Q_{\pi^{-1}(1)} *$,

where each $*$ stands for a segment of length $o$ which is repeated (at least) twice in this output, or for the empty string. More specifically, if $*$ follows segment $Q_i$ then it is equal to $r_i$ or to $r'_i$.

4.4. Procedure $S_2(\omega, o)$. Let $S_2(\omega, o)$ be as follows: we assume its input is an output of $S_1$, i.e. it is the cyclic sequence

$P_1 P_2 P_{\pi(2)} P_{\pi(2)+1} \ldots P_{\pi^{-1}(1)-1} P_{\pi^{-1}(1)}$.

Note that in this sequence, each block $P_i$ appears twice. Procedure $S_2$ first cuts each $P_i$ randomly into $P^1_i, P^2_i$ so that the length of $P^1_i$ is at least $o$, i.e. the whole overlap of length $o$, which we denote by $o_i$, is contained in $P^1_i$. The trick of the concealing algorithm is that both copies of each $P_i$ are cut in the same way! Let $o_i P^2_i$ denote $P^2_i$ with the added overlap. For example, if $P_i$ is equal to 'abcdefghijkl' and $o = 3$ then a possible cut of $S_2$ is 'abcde+fghijkl'; $P^1_i$ is equal to 'abcde', $P^2_i$ is equal to 'fghijkl' and $o_i P^2_i$ is equal to 'cdefghijkl'.

We may describe the set of the cards of $S_2$ as the disjoint union of two sets $C_1 \cup C_2$, where

$C_1 = \{o_1 P^2_1 P^1_2,\; o_2 P^2_2 P^1_3,\; \ldots,\; o_m P^2_m P^1_1\}$

and

$C_2 = \{o_1 P^2_1 P^1_{\pi(1)},\; o_2 P^2_2 P^1_{\pi(2)},\; \ldots,\; o_m P^2_m P^1_{\pi(m)}\}$.

We remark here that the cards of $C_1$ correspond to the edges of the perfect matching $M_2$ of graph $G(\pi)$ and the cards of $C_2$ correspond to the edges of the perfect matching $M_3$ of $G(\pi)$ (see Definition 4.1). Finally $S_2$ arranges $C_1 \cup C_2$ into a random cyclic sequence.
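The construction of the card sets $C_1$ and $C_2$ can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name `s2_cards` is hypothetical, indices are 0-based (so `pi[i]` plays the role of $\pi(i+1) - 1$), the cut positions are supplied explicitly instead of being drawn at random, the sample permutation is not checked for acceptability, and the final random arrangement is left out.

```python
def s2_cards(blocks, pi, cuts, o):
    """Build the card sets C1 and C2 of S2 from the blocks P_1..P_m.

    blocks: list of strings P_1..P_m (0-indexed here)
    pi:     0-indexed permutation, pi[i] is the index of the block P_{pi(i)}
    cuts:   cuts[i] is the length of the first part of block i (must be >= o)
    o:      overlap length
    """
    m = len(blocks)
    if any(c < o for c in cuts):
        raise ValueError("each first part must contain the whole overlap")
    first = [blocks[i][:cuts[i]] for i in range(m)]    # first part of each block
    second = [blocks[i][cuts[i]:] for i in range(m)]   # second part of each block
    # Overlap o_i: the last o symbols of the first part.
    overlap = [first[i][len(first[i]) - o:] for i in range(m)]
    c1 = [overlap[i] + second[i] + first[(i + 1) % m] for i in range(m)]
    c2 = [overlap[i] + second[i] + first[pi[i]] for i in range(m)]
    return c1, c2


blocks = ["abcdefghijkl", "mnopqrst", "uvwxyz"]
pi = [2, 0, 1]          # illustrative permutation only
cuts = [5, 4, 3]        # 'abcde+fghijkl', 'mnop+qrst', 'uvw+xyz'
c1, c2 = s2_cards(blocks, pi, cuts, o=3)
# c1 == ["cdefghijklmnop", "nopqrstuvw", "uvwxyzabcde"]
# c2 == ["cdefghijkluvw", "nopqrstabcde", "uvwxyzmnop"]
```

Because the same cut position is used for a block wherever it occurs, the cards of `c1` and `c2` that start from the same block share the prefix `overlap[i] + second[i]`; for the first block this is 'cdefghijkl', as in the example in the text.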
For illustration we perform $S_2$ on the output sequence of the previous Example 2 (which would be the natural use of $S_2$, as described later):

Example 3: Procedure $S_2(\omega, o)$
Input $\omega$ = ’ ingalgo* r pn inf paap e*r is a pis p*ap e inthe aipi*s to prof this c*onconcer pn inf*ori foronconce*a cithm pres en t*esetio n m of tim*ingal g oa cithm*the aipian ilg*ogo rithap e in* pres en togorith*map aico ent *an ilgesetio n *co en t s to pro* pis pmap ai*m of timr is a*f this cmatifo*cealingori for*m atifo cealing* paape ’, parameter $o = 3$.

A consistent partitioning into blocks is indicated below:
’ inga + lgo* r pn i + nf pa + ap e*r i s + a pis + p*ap e + inthe a + ipi*s to + prof thi s + c*onco + ncer pn i + nf*ori + foronco + nce*a ci + thm pre + sen t*eseti + o n m of + tim *inga + lg oa ci + thm*the a + ipian i + lg*ogo ri + thap e + in* pre + sen togori + th*map + aico e + n t *an i + lgeseti + on * co e + n t s to + pro* pis + pmap + ai*m of + timr i s + a*f this + cmati + fo*ceal + ingori + for*mati + fo ceal + ing* pa + ap e ’

Next we add an overlap (of length $o$) in front of each block (and we delete the 'helpful symbol' *):
’ ap einga + ngalg or pn i + n inf pa + paap er is + is a pis + pi s pap e + ap e inthe a + e aipis to + to prof this + is conco + nconcer pn i + n i nfori + ori foronco + nconcea ci + cithm pre + presen teseti + etion m of + of timing a + ng algoa ci + cithm the a + e aipian i + n ilgogo ri + orithap e + ap e in pre + presentogori + orithmap + ap aico e + o en t an i + n ilgese ti + etion co e + o ent to + to pro pis + pis pmap + ap aim o f + of tim r is + i s af thi s + is cmati + atifo ceal + ealing ori + ori formati + atifo ceal + e aling pa + paape ’

Finally we rearrange the cards in a random order.
The resulting sequence is as follows:
’ n inforio ent s to ori formati paap en i lgeseti pis pap e n ilgo goringalgor pn iap eing aof timr is is af this presen tesetin inf paealingoriealing papresentogorietion m o f atifo cealap aim of ngalgoa cie aipisan iof tim ingaatifo cealis cmatipi s pmap orithap e is conco o ri foroncoto pro pise aipis to paaper isnconcer pn ieti on co e i s a pis cithm p reo en t an ito prof this nconcea ciap ai co eape int he aorithmap cithmthe aap e in pre ’

4.5. Procedure $S_2^+(\omega, o)$.
We assume its input is an output of $S_1^+$. This procedure is defined analogously to $S_2$, with the only difference that the cuts are performed on segments $Q_i$ instead of segments $P_i$.

5. The concealing algorithm
Let the input string be $\omega$, and let the length of the preserved segments be $k$. We consider two scenarios, weak concealing and strong concealing, depending on the nature of the input. We perform the weak concealing algorithm if the input is nonspecific, i.e., short segments have many possible alternative prolongations, or if there is no outside knowledge about the likelihood of presence of some segments in the input (e.g. an English text). The weak concealing algorithm may be described as
$$\omega_F = S_2(S_1(\omega, 3k/2, 2k), k-1).$$
We choose the block length in $S_1$ to be longer, and we overlap the whole blocks in $S_1$, since we want to ensure that the cuts of $S_2$ may be done in the same way in each of the two copies of the blocks $P_i$. The strong concealing algorithm may be written as
$$\omega_F = S_2^+(S_1^+(S(\ldots S(\omega, k-1, k, 3k/2) \ldots), 3k/2, 2k), k-1),$$
where the number of repetitions of procedure $S$ is application specific.

6. Analysis of the concealing algorithm
Observation 6.1. The concealing algorithm preserves all segments of length $k$ present in the input sequence $\omega$ within the output sequence $\omega_F$.
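As an informal check of Observation 6.1 on a single cutting step, the sketch below cuts a string at arbitrary positions, prepends to each piece the $k-1$ characters preceding its cut, and compares the sets of length-$k$ segments. It is an illustration under simplifying assumptions (a linear rather than cyclic string, arbitrary cut positions, hypothetical function names), not a model of the full procedures.

```python
from typing import List, Set


def kgrams(s: str, k: int) -> Set[str]:
    """All contiguous segments of length k in s (empty if len(s) < k)."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def cut_with_overlap(s: str, cuts: List[int], k: int) -> List[str]:
    """Cut s at the given positions; each piece is extended to the left
    by the k-1 characters that precede its starting cut."""
    bounds = [0] + sorted(cuts) + [len(s)]
    pieces = []
    for start, end in zip(bounds, bounds[1:]):
        pieces.append(s[max(0, start - (k - 1)):end])
    return pieces


def preserved(s: str, cuts: List[int], k: int) -> bool:
    """True if every length-k segment of s occurs in some piece."""
    covered: Set[str] = set()
    for piece in cut_with_overlap(s, cuts, k):
        covered |= kgrams(piece, k)
    return kgrams(s, k) <= covered


text = "the concealing algorithm preserves segments"
ok = preserved(text, [5, 11, 12, 30], 4)
```

With an overlap of length $k-1$ the check succeeds for any choice of cuts, while calling `cut_with_overlap` with overlap length 0 loses the segments that straddle a cut.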
This observation is straightforward: whenever any of the above procedures cuts the input string, an overlap of length at least $k-1$ is added in front of the segment following the cut, thus preserving all subsegments of length $k$ which would otherwise be partitioned by the cut. It is also straightforward that both the weak and the strong concealing algorithms are linear in $|\omega|$ if we have
• Access to a generator of random permutations of the numbers less than $|\omega|$,
• Access to a generator of random elements of $\mathcal{A}$ (see Definition 4.3).
A random permutation may be generated in linear time (see [9]). We will not discuss the complexity of generating random elements of $\mathcal{A}$. Instead, we specify a large subset $\mathcal{B}$ of $\mathcal{A}$ such that generating a random element of $\mathcal{B}$ may be reduced to generating a random permutation of a number less than $|\omega|$. Each element of $\mathcal{B}$ may be constructed as follows: we take any permutation $\pi$ of $m/2$ (we assume $m$ is even) and we consider the pairing $P(\pi)$ of $\{1, 2, \ldots, m\}$ given by $(1, \pi(1) + m/2), \ldots, (m/2, \pi(m/2) + m/2)$. This pairing may be viewed as an involution $i(\pi)$ on $m$ (a permutation $\alpha$ is an involution if $\alpha(\alpha(x)) = x$ for each $x$). Finally, we get the element $\beta = \beta(\pi)$ of $\mathcal{B}$ by shifting $i(\pi)$ by 1, i.e., by letting $\beta(a) = i(\pi)(a) + 1$ modulo $m$; we impose the additional condition that $O^j(1) \neq 1$ for $j < m$, where $O(a) = \beta(a) + 1$. This condition makes sure that the first condition of the definition of an acceptable permutation is satisfied. The following observation is straightforward.

Observation 6.2. $|\mathcal{B}| \leq (m/2 - 1)!$. Further, the graphs defined by a permutation from $\mathcal{B}$ are disjoint unions of $m/2$ cycles of length 4. Generating a random element from $\mathcal{B}$ is as hard as choosing a random permutation of $m/2$.

The following observation is also straightforward.

Observation 6.3.
The length of the output of each of the procedures applied to input $\omega$ is linear in $|\omega|$. For example, for $S$ and $S_1$ it is $2|\omega|$.

7. Hardness of the attacker problem
We recall that the attacker problem introduced in Section 3 (see also Proposition 3.1) reads: How much information about $\omega$ can an attacker deduce from $\omega_F$, $|\omega|$, $k$ and the knowledge of the concealing algorithm?
For instance, the attacker can try to get all the overlaps of $S_2$ since, assuming $\omega_F$ has no accidental repeats, these overlaps appear exactly four times in $\omega_F$ and no other segment has this property. The attacker may partition $\omega_F$ into cards as indicated by all these overlaps. She gets a collection of cards, with segments of length $k-1$ marked at the beginning and the end of each card. The attacker wants to overlap these marked segments.
Depending on whether $\omega_F$ has accidental repeats, the attacker possibly cuts in more places than the boundaries of the original cards used in the algorithm. Hence, in her collection of cards some overlaps should not have been considered, and some segments have overlaps with more than one other card.
These considerations naturally specify the domino and donkey problems. In a more realistic situation the attacker does not know the correct list of cards of $S_2$, and hence she needs to choose which 4-repeats to ignore. We may assume that she has some hints as to which overlaps are 'likely' correct. This is the situation we model by the following problem.

Shortest domino row problem (SDRP). Assume we are given a collection of dominoes (a domino is a rectangle partitioned vertically into two squares, where one is initial and the other one is terminal), and we are also given a graph on the squares. This graph should be interpreted as the graph of hints.
We want to put all the dominoes into a row, so that if two consecutive squares are connected by an edge of the graph, we can put one square on top of the other (i.e., identify them). The aim is to make the resulting row as short as possible, i.e. to satisfy as many hints as possible.
Let us define the (de Bruijn-type) graph $G = (V, E)$, where $V$ is the set of all the squares and $E$ is the set of the dominoes: edge $e_i$ connects the squares of domino $Q_i$. The following observation is straightforward.

Observation 7.1. There is a natural bijection between the set of the Euler circuits (eulerian closed walks) of $G$ and the set of all the circular sequences consistent with the overlapping dominoes $Q_1, \ldots, Q_m$.

Theorem 1. The SDRP is search-NP-complete.

Proof. Assume that in the auxiliary graph there is an edge between two squares only if they are equal, but not all such edges need be present. This is exactly consistent with our interpretation. Now, in the reformulation with the de Bruijn graph and the Euler circuit, this corresponds to the problem where we are given a graph, with some transitions between neighbouring edges recommended, and we want to find an Euler circuit with as many recommended transitions as possible. A particular instance is that some transitions are forbidden, and we want to find out whether an Euler circuit in which all the transitions are allowed exists. This is known to be NP-complete ([10]). We have in fact a search instance of this problem: we know that such an Euler circuit exists, and we want to find it. There is a standard trick which shows that the decision problem is polynomial if the search problem is polynomial: assume there is a polynomial algorithm $A$ that solves the search version, and let its running time be $n^{10}$, say. To solve the decision problem, we apply $A$ to an input.
It either finds the right Euler circuit, and then the answer is YES, or it runs longer than $n^{10}$, and then the answer is NO.

In the donkey problem we assume that $\omega_F$ has no accidental repeats. What does the attacker get? There are two versions of the algorithm. Let us first consider the strong concealing, where the preliminary step is performed.
1. As described above, using the 4-repeats of length $k-1$ of $\omega_F$, the attacker gets the cards of $S_2^+$, i.e. $C_1 \cup C_2$, where $C_1 = \{ o_1 Q^2_1 r_1 Q^1_2,\ o_2 Q^2_2 r_2 Q^1_3,\ \ldots,\ o_m Q^2_m r_m Q^1_1 \}$ and $C_2 = \{ o_1 Q^2_1 r'_1 Q^1_{\pi(1)},\ o_2 Q^2_2 r'_2 Q^1_{\pi(2)},\ \ldots,\ o_m Q^2_m r'_m Q^1_{\pi(m)} \}$.
2. The attacker also gets each $Q^1_i$ and each $o_i Q^2_i$, since these are exactly the maximal initial and terminal segments of the cards above which are repeated twice in $\omega_F$.
3. By matching the overlaps, the attacker gets each pair $Q^1_i Q^2_i$, since the overlap $o_i$ in $o_i Q^2_i$ is a terminal segment of $Q^1_i$ and we may assume that these cannot be misinterpreted.
4. What does the attacker get from the initial applications of procedure $S$? Each of their overlaps (of length $k-1$) appears at least twice in the input of $S_1^+$. Moreover, most of the cuts of the procedures $S$ are different. Let us recall here that the dust may also be among these overlaps. Procedures $S_1^+$ and $S_2$ cut into some of these. Those that are cut remain 2-repeats; those that are not cut may gain repeats. Moreover, $S_1^+$ introduces dust at the border of each card: this adds 2-repeats of strings of length $k-1$ which are indistinguishable from the 2-repeats coming from the initial procedures $S$.
If the weak concealing algorithm is applied, the attacker has 1., 2., 3., where $Q^2_i r_i$ and $Q^2_i r'_i$ are replaced by $P^2_i$, and $Q^1_i$ is replaced by $P^1_i$.
The next proposition summarises the possible types of repeats introduced by the algorithm.

Proposition 7.2.
All the repeats of $\omega_F$ generated by the weak or strong concealing algorithm are those described in 1., 2., 3., 4.

Corollary 7.3. All the useful information for the attacker problem is $|\omega|$, $k$, and 1., 2., 3., 4.

The information 1., 2., 3. may be described by the auxiliary bipartite graph $G(\pi)$ defined in Definition 4.1. If the weak concealing algorithm is applied, information [4.] does not exist. The attacker problem is thus reduced to the following:

The donkey-decision problem. The input is a bipartite graph $G$ where the vertices in both parts $V_1, V_2$ are ordered. Let $V_1 = \{u_1, \ldots, u_m\}$ and $V_2 = \{v_1, \ldots, v_m\}$. Moreover, a segment $s(v)$ of length at least $3k/2$ is associated with each element of $V_2$. The set of the edges of $G$ is formed by a disjoint union of two perfect matchings $M_2, M_3$. The attacker needs to reconstruct the string $s(M_2(1)) s(M_2(2)) \ldots s(M_2(m))$, where $M_2(i)$ is the vertex of $V_2$ connected with $u_i \in V_1$ by an edge of $M_2$.

The difficulty of the donkey-decision problem is the following: the bipartite graph $G$ is a union of two edge-disjoint perfect matchings. Each vertex of $G$ thus has degree 2, and $G$ is a union of disjoint cycles. To solve the donkey-decision problem, one needs to choose the correct perfect matching independently in each of these cycles (namely, the perfect matching induced by $M_2$). This is impossible, and the list of all the possibilities is almost always exponential in the number of the cycles, since each of the cycles has two perfect matchings. This is analysed precisely below, when we speak about the feasible solutions.
Next we argue that, when the strong concealing algorithm is applied, the attacker problem is reduced to the donkey-decision problem too. The attacker is left with the statistics of the repeats of $\omega_F$.
This is the reason why we introduced the dust in $S_1^+$: it makes sure that the 2-repeats appear symmetric for both matchings $M_2, M_3$. This hides the repeats introduced by the initial applications of procedure $S$. The information of [4.] is thus useless. We obtain:

Proposition 7.4. The attacker problem for both strong and weak concealing is reduced to the analysis of the donkey-decision problem.

A feasible solution to the donkey-decision problem is any sequence $s(M(1)) s(M(2)) \ldots s(M(m))$, where $M$ is any perfect matching of the input bipartite graph $G$. In order to solve the donkey-decision problem, one needs to choose, from the pool of these feasible solutions, the unique correct one.
Next we argue that unless the input to our problem is extremely restrictive, there is an exponential number of competing solutions. The bipartite graphs $G$ coming from $\mathcal{A}$ have at least $2^{m/c}$ perfect matchings. The output sequences of two perfect matchings $M, N$ may still be equal: if the cycles have length 4, this happens if and only if the two vertices $v_i, v_j$ of $V_2$ in each 4-cycle in which $M, N$ differ have the same associated segment ($s(v_i) = s(v_j)$, as defined in Observation 4.4). For instance, if all the vertices of $V_2$ have the same associated segment, then there is only one competing solution. This extreme situation may happen if the input $\omega$ is a sequence of repetitions of one symbol only. If two symbols may appear in the segments (of length at least $3k/2$) associated with the vertices of $V_2$, then the probability that in a 4-cycle the corresponding pairs of strings are indistinguishable is $2^{-3ka/2}$. Hence only with exponentially small probability is there less than an exponential number of feasible solutions.

8. Conclusion
We define the information concealing problem and propose an algorithm to solve it.
It is based on the intuition coming from the difficulties of DNA reconstruction by hybridisation. The algorithm may be efficiently implemented. In analysing the amount of information leaked by the concealing algorithm to an attacker (this is called the attacker problem in the paper), we first consider the case that the output contains random repeats; this leads to the domino problem, which is shown to be NP-complete. Even if the attacker solves the domino problem, she is faced with the donkey problem, which is reduced to the donkey-decision problem. It is shown that with high probability the donkey-decision problem has an exponential number of feasible solutions, among which the attacker needs to choose the correct one.

References
[1] J. Blamey, L. Kencl, M. Loebl, DNA self-concealing by repeats, under review.
[2] R. Ramaswamy, L. Kencl, and G. Iannaccone. Approximate fingerprinting to accelerate pattern matching. In Proceedings of the ACM Internet Measurement Conference (IMC), Rio de Janeiro, Brazil, 2006.
[3] H.-A. Kim and B. Karp. Autograph: Toward automated, distributed worm signature detection. In Proceedings of the 13th Usenix Security Symposium (Security 2004), San Diego, CA, August 2004.
[4] S. Singh, C. Estan, G. Varghese, and S. Savage. Automated worm fingerprinting. In Proceedings of the ACM/USENIX Symposium on Operating System Design and Implementation (OSDI), San Francisco, CA, USA, 2004.
[5] R. Guy-Franck, F. Paques, EMBO Reports 1, 122-126, 2000.
[6] P. A. Pevzner, Computational Molecular Biology: An Algorithmic Approach, The MIT Press, Cambridge, Massachusetts, London, England (2000).
[7] R. Arratia, B. Bollobas, D. Coppersmith, G. B. Sorkin, Euler circuits and DNA sequencing by hybridization, Discrete Applied Mathematics 104, 63-96, 2000.
[8] M. Singer, P. Berg, Genes and Genomes, University Science Books, Sausalito, California (1991).
[9] D. E.
Knuth, The Art of Computer Programming (III edition), Addison-Wesley, Reading, Massachusetts (1997).
[10] M. R. Garey, D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness.
[11] J. Xu, J. Fan, M. H. Ammar, and S. B. Moon. Prefix-preserving IP address anonymization: Measurement-based security evaluation and a new cryptography-based scheme. In Proceedings of the IEEE International Conference on Network Protocols (ICNP), 2002.
[12] D. Maltz, J. Zhan, G. Xie, H. Zhang, G. Hjalmtysson, J. Rexford, and A. Greenberg. Structure preserving anonymization of router configuration data. In Proceedings of the ACM Internet Measurement Conference (IMC), Taormina, Italy, 2004.
[13] R. Pang and V. Paxson. A high-level programming environment for packet trace anonymization and transformation. In Proceedings of ACM SIGCOMM, 2003.
[14] B. Bloom. Space/time tradeoffs in hash coding with allowable errors. In CACM, page 422.
[15] K. Shanmugasundaram, H. Broennimann, and N. Memon. Payload attribution via hierarchical Bloom filters. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), Washington D.C., USA, October 2004.
[16] Y. Li, J. Tygar, and J. M. Hellerstein. Private matching. In Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004.
[17] R. Agrawal, A. Evfimievski, and R. Srikant. Information sharing across private databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, San Diego, CA, USA, 2003.
[18] S. Garriss, M. Kaminsky, M. J. Freedman, B. Karp, D. Mazières, and H. Yu. Re: Reliable email. In Proceedings of the Symposium on Networked Systems Design and Implementation (NSDI), San Jose, CA, USA, 2006.
[19] Net 2000 Ltd. Data masking whitepapers. http://www.datamasker.com/, 2005.
[20] G. Miklau, D.
Suciu, A Formal Analysis of Information Disclosure in Data Exchange, Proc. of ACM SIGMOD Intl. Conf. on Management of Data (2006).
[21] R. Agrawal, R. Srikant, Privacy-preserving data mining, Proc. of ACM SIGMOD Intl. Conf. on Management of Data (2000).
[22] S. Agrawal, J. R. Haritsa, A framework for high-accuracy privacy-preserving mining, Proceedings of the 21st International Conference on Data Engineering ICDE (2005).
[23] A. Evfimievski, J. Gehrke, R. Srikant, Limiting privacy breaches in privacy preserving data mining, Proc. of ACM Symp. on Principles in Database Systems PODS (2003).
[24] N. Mishra, M. Sandler, Privacy via pseudorandom sketches, Proc. of ACM Symp. on Principles in Database Systems PODS (2006).
[25] V. Rastogi, D. Suciu, S. Hong, The Boundary Between Privacy and Utility in Data Publishing, Proc. of Intl. Conf. on Very Large Data Bases VLDB (2007).
[26] M. Hay, M. Miklau, G.D. Jensen, S. Weiss, et al., Anonymizing Social Networks, University of Massachusetts, Amherst, Technical Report (2007).
[27] N. F. Johnson, Z. Duric, and S. Jajodia, Information Hiding: Steganography and Watermarking - Attacks and Countermeasures, Springer (2000).
[28] I. Cox, M. Miller, J. Bloom, J. Fridrich and T. Kalker, Digital Watermarking and Steganography, Second Edition, The Morgan Kaufmann Series in Multimedia Information and Systems (2007).
[29] L. Sweeney, k-anonymity: a model for protecting privacy, International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10 (5), 557-570, (2002).
[30] C. D. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, (2008).
[31] L. Kencl, J. Zamora and M. Loebl, Packet Content Anonymization by Hiding Words, IEEE INFOCOM, (2006).