Attacking and Defending Covert Channels and Behavioral Models
Valentino Crespi, George Cybenko, Annarita Giani

Abstract: In this paper we present methods for attacking and defending k-gram statistical analysis techniques that are used, for example, in network traffic analysis and covert channel detection. The main new result is our demonstration of how to use a behavior's or process' k-order statistics to build a stochastic process that has those same k-order stationary statistics but possesses different, deliberately designed, (k+1)-order statistics if desired. Such a model realizes a "complexification" of the process or behavior which a defender can use to monitor whether an attacker is shaping the behavior. By deliberately introducing designed (k+1)-order behaviors, the defender can check to see if those behaviors are present in the data. We also develop constructs for source codes that respect the k-order statistics of a process while encoding covert information. One fundamental consequence of these results is that certain types of behavior analysis techniques come down to an arms race in the sense that the advantage goes to the party that has more computing resources applied to the problem.

Points of view in this document are those of the authors and do not necessarily represent the official position of the sponsoring agencies or the U.S. Government. V. Crespi is with the Department of Computer Science, California State University at Los Angeles, Los Angeles CA 90032 USA; email: vcrespi@calstatela.edu. Crespi's work was partially supported by AFOSR Grant FA9550-07-1-0421 and by NSF Grant HRD-0932421. G. Cybenko is with the Thayer School of Engineering, Dartmouth College, Hanover NH 03755; email: gvc@dartmouth.edu. Cybenko's work was partially supported by Air Force Research Laboratory contracts FA8750-10-1-0045 and FA8750-09-1-0174, AFOSR contract FA9550-07-1-0421, U.S.
Department of Homeland Security Grant 2006-CS-001-000001 and DARPA Contract HR001-06-1-0033. A. Giani is with the Department of EECS, University of California at Berkeley, Berkeley CA 94720; email: agiani@eecs.berkeley.edu. Giani's work was partially supported by U.S. Department of Homeland Security Grant 2006-CS-001-000001 and DARPA Contract HR001-06-1-0033 when she was a Ph.D. student at Dartmouth.

September 17, 2021 DRAFT

Index Terms: Covert Channels, Exfiltration, Probabilistic Automata, Cognitive Attack, Anomaly Detection.

I. INTRODUCTION

Computer security researchers have been investigating statistical behavioral modeling techniques as a means for determining whether a machine, a network or data packet contents are behaving "normally" or not. These so-called behavior analysis techniques implicitly model stochastic processes at some level of fidelity. Consider, for example, the problem of detecting covert channels. Some existing approaches assume that an adversary has installed an exfiltrating agent, or Trojan, which operates by encoding data in a way that introduces detectable regularities in some network traffic statistics. For example, Giani et al. [1] and Cabuk et al. [2] estimate certain first order statistics of packet inter-arrival delays in order to determine whether a timing covert channel is being used. Dainotti et al. [3] learn a Hidden Markov Process [4], [5], [6], [7], [8] using both packet inter-arrival delays and packet sizes to detect traffic anomalies. Other techniques are based on various analyses of n-gram statistics [9]. In fact, some have called techniques that match n-gram statistics "mimicry attacks" and, while techniques have been developed for detecting certain simple types of mimicry, techniques for building mimicry attacks as described in the present paper appear to be novel [9].
General discussions of covert channels and their taxonomies, existence and modeling have been published [10], [11], [12], [13], [14]. The design, implementation and experimental evaluation of several specific covert channel attacks in real systems is of specific interest [14]. That work presents threat models, achievable bit rates, noise properties and channel capacities for covert channels. The existence and successful use of a covert channel is based on the assumption that the covert channel code does not perturb the measured statistical properties of behavior so that, over time, a covert transmission does not introduce discernible patterns which are different than expected, at least with respect to what is measured. In this paper we assume the ability to learn a k-gram type model of "normal behavior." This is simply done by counting the occurrences of k-grams and then normalizing to produce frequencies or probabilities. It is important to note that researchers often talk about entropy as a channel statistic [10], [15], but entropy is typically calculated from k-order statistics, so our methods for preserving k-order statistics preserve all lower order statistics and will also preserve the entropy.

We present a technique for encoding messages that respects these k-order statistics. Both attacker and defender can use this coding technique. The attacker could exfiltrate coded information while the defender could embed an encoded reference message or carrier to detect manipulations of the channel by an adversary attempting covert communications. That is, for any order k, an attacker or a defender can encode covert messages while otherwise respecting the k-order statistics of the traffic.
Also, we show how a defender can create a process of order k+1 which has the same k-order statistics but specifically designed (k+1)-order statistics that the defender can easily monitor to see if the (k+1)-order statistics have been changed. Researchers have recently started to develop systematic taxonomies and examples of attacks against statistical machine learning techniques [16]. In that spirit, the present work develops specific techniques to both attack and defend using certain statistical approaches.

We discuss these methods in the context of behaviors that have a finite set of observable symbols (the alphabet). Interpacket arrival times, packet sizes, header fields, packet contents and so on are examples of such observables if quantized into a finite number of bins. Our approach models the observables as a stationary stochastic process X [17]. After estimating the k-order statistics, we build a Probabilistic Deterministic Finite Automaton (PDFA) model [18], [19], [20], [21] that realizes the k-order statistics. Using that PDFA, we show that: 1) an adversary can encode messages covertly while respecting the k-order statistics; 2) the defender can encode reference messages or a carrier while respecting the k-order statistics; and 3) the defender can build a more complex process which has the same k-order statistics but possesses deliberately designed (k+1)-order statistics.

Examples of such covert channels in network traffic include, but are not restricted to:
• Timing Channels: The observable symbols are the inter-packet time delays, appropriately quantized;
• Size Channels: The observable symbols are quantized sizes of the packets;
• Header Channels: The observable symbols are various header fields in TCP/IP packets which can be manipulated by the transmitting entity without violating protocol semantics. Several such fields are known to exist [22].
It is important to clarify right away what we mean by a k-order statistic and a k-gram. Suppose we have an alphabet consisting of {α, β, γ} and we observe a sequence comprised of that alphabet, say αααβαβγγ. The first order statistics are [1/2 1/4 1/4], indicating that 1/2 of the symbols are α's, 1/4 are β's and 1/4 are γ's. The 1-grams are merely {α, β, γ}. The 2-grams observed in this sequence are αα, αα, αβ, βα, αβ, βγ, γγ, and the 2-order statistics for the 9 possible 2-grams αα, αβ, αγ, βα, ββ, βγ, γα, γβ, γγ are respectively [2/7 2/7 0 1/7 0 1/7 0 0 1/7]. That is, our k-grams are obtained by moving a sliding window of width k across the data one symbol at a time. This is not to be confused with moving that window across the data sequence k symbols at a time.

The following discussion shows how a timing covert channel can be constructed based on a beacon and argues that a naive encoding of covert messages based on packet inter-arrival times produces a clearly detectable distortion of the 1-order statistics of those time intervals in network traffic [23], [1]. Figure 1 describes the setup. Machine A sends a regularly timed beacon to machine D. (Such a beacon can be a time server request or a stay-alive beacon, for instance.) The inter-packet delays seen at machine B are not regular due to internal routing delays in the LAN. (These statistics were actually measured from a regularly timed beacon traveling several hops.) An intruder was able to compromise and control machine B, which is inside the local network and a relay for the traffic between A and D. (B could be a proxy server, border router or other device, for example.) Assume we set up a machine C outside the internal network perimeter to check for timing covert channels. C has seen a certain distribution of inter-packet delays coming from A going to D.
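The sliding-window convention above is easy to check mechanically. The following sketch (with α, β, γ written as the ASCII characters a, b, g) reproduces the 1-order and 2-order statistics of the example sequence:

```python
from collections import Counter

def kgram_stats(seq, k):
    """Relative frequencies of k-grams obtained by sliding a window of
    width k across seq one symbol at a time."""
    grams = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()}

seq = "aaababgg"                  # the example sequence, in ASCII
stats1 = kgram_stats(seq, 1)      # {'a': 0.5, 'b': 0.25, 'g': 0.25}
stats2 = kgram_stats(seq, 2)      # 'aa': 2/7, 'ab': 2/7, 'ba', 'bg', 'gg': 1/7
```

Note that the 8-symbol sequence yields only 7 overlapping 2-grams, which is why the 2-order statistics are sevenths rather than eighths.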
Fig. 1. An intruder controlling machine B inside a local network exfiltrates data coded in inter-packet delays received by machine C en route to machine D. Monitoring outside the intranet LAN will show the same first order inter-packet delays with or without the covert channel as constructed in this paper.

In this paper, we show how machine B can encode covert messages in the inter-packet delays in such a way that the first order statistics as seen by C remain unchanged from the original distribution. Conversely, we can deliberately defend against such channels by encoding messages so that any manipulation of the delays will be detectable on the outside at machine C, because the covert message will not be received at C. Figure 2 shows the number of packets received with a given delay in two scenarios. The horizontal axis reports the inter-arrival time in seconds, and the vertical axis the number of packets received with those delays.
In the left graph of Figure 2 are the observed inter-packet delays resulting from a regularly timed beacon traversing multiple hops in a LAN. In the right hand graph, we depict a naive covert timing channel using two time intervals to encode a message. It is evident from the data that the naive covert communication in the right graph can be easily detected if the 1-order statistics of normal traffic have been measured and are those on the left.

Fig. 2. First order statistics of inter-packet delays of normal traffic (left) and a poorly designed covert channel using two delays only (right). (Packets were deliberately routed to hop several times between source and destination.) This paper develops techniques for creating covert channels that have the same statistics as the ones depicted on the left, even if higher order statistics are measured.

However, the 1-order distribution on the left can be generated either by normal traffic, as it was obtained, or by a covert channel, as we will show. In this contribution, we develop a more sophisticated approach than the naive approach insofar as we also consider statistics of arbitrarily higher order, i.e. k > 1, and our results effectively show that, for any k, defenders and attackers both have technical approaches for, respectively, defending or attacking a k-order behavior with respect to covert communications. Consequently, the situation is an arms race in the sense that whichever side has the ability to learn the highest order statistics wins.

A. Outline of the Paper

In Section II we present an illustrative example. In Section III we describe our method and show how to manipulate a behavior's statistics with Probabilistic Automata. In Section IV we provide a numerical example. In Section V we show how to use Probabilistic Automata to build a channel code that respects the statistics of traffic up to some predecided order.
Finally, Section VI contains some conclusions and directions for future work.

II. A SIMPLE ILLUSTRATIVE EXAMPLE

To illustrate these concepts, consider a simple binary observable with values 0 and 1. It is assumed that these observables are irrelevant to the normal operation of the underlying system and its semantics. For example, the observables could be quantized inter-arrival times or unused packet header fields. Assume that the 1-order statistics of these observables are r_0 > 0 and r_1 > 0 with r_0 + r_1 = 1. This means that the relative frequencies of 0's and 1's as observed in the behavior are r_0 and r_1 respectively.

Now suppose an attacker has estimated these probabilities and seeks to exfiltrate messages while respecting them. This is possible and, later in this paper, we review standard source coding ideas that allow the attacker to create such codes efficiently. In fact, if the messages to be sent are binary and Bernoulli with p = 0.5 (such as for encrypted and/or compressed messages), then there are codes that use 1/H(r_0) = 1/H(r_1) channel symbols per original message bit, where H(x) = -(x log2 x + (1-x) log2(1-x)) is the binary entropy function. We show how to construct such codes to respect k-order statistics as well. By the same token, the defender can encode a reference signal, also respecting the first order statistics as above, which can be decoded and verified at the receiving end.

Note that no specific second order statistics r_00, r_01, r_10 and r_11 have been modeled so far, but if the process is modeled by a Bernoulli process with p = r_1 then the second order statistics would be r_ij = Prob(ij) = r_i · r_j by independence.
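The entropy function and the resulting coding cost can be computed directly. A minimal sketch, with r_0 = 0.3 chosen purely for illustration rather than taken from any measured data:

```python
import math

def H(x):
    """Binary entropy H(x) = -(x log2 x + (1 - x) log2(1 - x))."""
    if x in (0.0, 1.0):
        return 0.0
    return -(x * math.log2(x) + (1 - x) * math.log2(1 - x))

# H(x) = H(1 - x), so H(r0) = H(r1) when r0 + r1 = 1, and a Bernoulli(0.5)
# message costs 1/H(r0) channel symbols per covert message bit.
r0 = 0.3
cost = 1.0 / H(r0)   # roughly 1.13 channel symbols per covert bit
```

The cost is always at least 1 because a biased channel symbol carries less than one bit of entropy.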
However, the defender can construct a second order process with second order statistics r_00, r_01, r_10 and r_11 for which r_ij ≠ r_i · r_j while satisfying the required first order statistics, namely r_0 and r_1. If the attacker exploits the channel through a purely first order process, the constructed second order statistics r_ij will likely not be observed by the defender, who could then conclude that the traffic is being shaped by an adversary.

To illustrate this 2-order construction, consider an automaton with two states, Q = {0, 1}, corresponding to the two 1-grams of observables. Let X be the matrix of the transition probabilities

X = [ p_00  p_01 ]
    [ p_10  p_11 ].

We seek PDFAs that have the stationary distribution

π = [r_0  1-r_0] = [1-r_1  r_1] = [1-r  r].

Specifically, we seek X satisfying [1-r  r] · X = π X = π = [1-r  r] with X being a stochastic matrix (non-negative with row sums equal to 1). The class of PDFAs that are 1-order equivalent to the given process is therefore determined by a set of linear equality and inequality constraints as follows:

(1-r) · p_00 + r · p_10 = 1-r
(1-r) · p_01 + r · p_11 = r
p_00 + p_01 = 1
p_10 + p_11 = 1
p_ij ≥ 0.

The four equations are linearly dependent and we can reduce them to the three equations and constraints

r · p_11 - (1-r) · p_00 = 2r - 1
p_00 + p_01 = 1
p_10 + p_11 = 1
0 ≤ p_00, p_11 ≤ 1.

There are an infinite number of solutions according to

p_11 = ((1-r)/r) · p_00 + (2r-1)/r,   0 ≤ p_00, p_11 ≤ 1.   (1)

For example, if r = 0.3 and 1-r = 0.7, then the constraints become

p_11 = (0.7 · p_00 - 0.4) / 0.3,   0 ≤ p_00, p_11 ≤ 1,

so letting p_00 = 0.8 we get p_11 = 0.16/0.3 = 0.5333... and therefore p_01 = 0.2 and p_10 = 0.4666.... This yields 2-order statistics of:

r_00 = p_00 · π_1 = 0.8 · 0.7 = 0.56
r_01 = p_01 · π_1 = 0.2 · 0.7 = 0.14
r_10 = p_10 · π_2 = 0.4666... · 0.3 = 0.14
r_11 = p_11 · π_2 = 0.5333... · 0.3 = 0.16.

Notice that r_01 = r_10 = 0.14, r_00 + r_01 = r_0 = π_1 = 0.7 and r_01 + r_11 = r_1 = π_2 = 0.3, as required.

Another, equivalent way to derive these relations is to note that there are two trivial solutions for X, namely X_1 = I_2 (the 2 by 2 identity matrix) and X_2 = 1 · π where 1 = [1 1]^T is the column vector whose entries are all 1's. These two solutions are always different. Moreover, we can see that any convex combination ρX_1 + (1-ρ)X_2 for 0 ≤ ρ ≤ 1 is also a solution to all the constraints and in fact yields the same class of solutions as above.

The point of this example is that we can shape the second order statistics of the observables without changing the first order statistics. In particular, multiple choices for p_00 (and so for r_00) are possible, all of which lead to the same 1-order statistics. A defender can shape the second order statistics so that if an attacker only obeys the first order statistics, the defender can detect that the expected second order statistics are wrong. Note that the second order process in this example satisfies additional constraints: the marginal distributions must agree with the first order process, namely r_01 + r_11 = r_1 and so on, and r_01 = r_10 must hold as well (a symmetry which arises because the numbers of 0-to-1 and 1-to-0 transitions in the observed sequence must be equal). For higher order processes, the construction involves identifying and dealing with additional constraints and finding realizations which satisfy them. These generalizations to higher orders are one of the main contributions of this paper.

To apply this construction to the empirical data shown in Figure 2, normalize the counts into frequencies or probabilities by dividing by the total packet count. This yields a vector of probabilities:

R = [0.0029 0.0144 0.0734 0.1453 0.3094 0.1295 0.1151 0.1079 0.1007 0 0 0 0.0014]   (2)

where the coordinates 1 through 13 correspond to delays of 0.01 through 0.13 in increments of 0.01. We seek to construct a Markov Chain whose states correspond to observable inter-packet delays and whose transition probabilities, P, describe the probability that one delay follows another. As explained above, P must satisfy two matrix equations (capturing the facts that R is a stationary vector for P and that P is row stochastic):

R · P = R   and   P · 1 = 1,

where 1 is the column vector of all ones. Moreover, the entries of P are all non-negative. In this simple case, there are two solutions which are simple to identify, namely

P_B = 1 · R   and   P_D = I   (3)

where I is the 13 by 13 identity matrix. The reader can easily check that both these matrices satisfy the two required matrix equations. This construct is simple for 1-grams but becomes more complex for general k-grams as shown below. Moreover, for any 0 ≤ α ≤ 1, P_α = αP_B + (1-α)P_D is also a solution. Whereas P_B defines a Bernoulli process and P_D describes a completely disconnected Markov Chain with an infinite number of fixed distributions, P_α defines a Markov Chain that is irreducible, aperiodic and not a Bernoulli process for any 0 < α < 1. Therefore, P_α can be used by a defender to create specific second order statistics which an attacker would have to first model and then respect.

III. CONSTRUCTING THE AUTOMATA

In this section, we show how to construct automata that can reproduce observed statistics computed from data. Let Σ = {a, b, c, ...} be the finite observable alphabet and σ = |Σ| < ∞ be the number of observables. We are assuming that we have sequences of observables from which we compute the relative frequencies of k-grams (k ≥ 1):

0 ≤ R(x) ≤ 1,   Σ_{x ∈ Σ^k} R(x) = 1.
Here Σ^k is the set of k-grams; that is, the set of all possible sequences of length k drawn from the alphabet Σ.

Roughly speaking, if s_0 s_1 ... s_{n-1} = S_{0:n-1} is an observed data sequence of length n > k, R(x) is approximated by the number of occurrences of the substring x in S_{0:n-1} divided by the total number of substrings of length k in S_{0:n-1}, namely n-k+1. The set of R(x)'s is precisely what we mean by the k-order statistics of the observations. These statistics must satisfy certain regularity conditions required by the proposed construction, so some care must be taken in their computation. Specifically, the identity

Σ_{a ∈ Σ} R(ay) = Σ_{b ∈ Σ} R(yb) = R(y)

should hold for every y ∈ Σ^{k-1}. This can be accomplished by appending s_0 s_1 ... s_{k-2} to S_{0:n-1} as a suffix, effectively creating a periodic string, and counting occurrences in the periodic string. Moreover, this can be repeated for every 1 < j < k by using a circular buffer appending s_0 s_1 ... s_{j-2}. All marginal distributions

Σ_{w ∈ Σ^{k-j}} R(wy) = Σ_{w ∈ Σ^{k-j}} R(yw) = R(y)

will then hold for all y ∈ Σ^j. (Details are left to the reader.)

We will now construct a special type of Markov Chain in which the elements of Σ^k are the states and the semantics of the k-grams are preserved, so that if x = ay ∈ Σ^k is an observed k-gram with a, b ∈ Σ, then P(ay, yb) is the probability of transitioning from state ay to state yb. Such transitions are the only ones possible in the Markov Chain k-gram model. Such models are called k-th order Markov Models, k Markov Chains or k-gram models by different authors [19], [24]. Let π be the vector of measured k-gram statistics, R(x), and let P be the desired Markov Chain transition probabilities: P = (P(x, x′)), where the entries of both π and P are indexed by x, x′ ∈ Σ^k.
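The circular-buffer counting just described can be sketched as follows; the wraparound guarantees that the marginalization identity holds exactly. This is a minimal illustration on a made-up binary sequence, not the paper's measured data:

```python
from collections import Counter

def circular_kgram_stats(seq, k):
    """k-gram frequencies of seq treated as a circular string: appending
    the first k-1 symbols makes every position start exactly one k-gram."""
    ext = seq + seq[:k - 1]
    grams = [ext[i:i + k] for i in range(len(seq))]
    return {g: c / len(seq) for g, c in Counter(grams).items()}

alphabet = "ab"
seq = "aababbaaabab"                     # illustrative observation sequence
R1 = circular_kgram_stats(seq, 1)
R2 = circular_kgram_stats(seq, 2)

# Marginalization: sum_a R(ay) = sum_b R(yb) = R(y) for every y.
for y in alphabet:
    left = sum(R2.get(a + y, 0.0) for a in alphabet)
    right = sum(R2.get(y + b, 0.0) for b in alphabet)
    assert abs(left - R1[y]) < 1e-9 and abs(right - R1[y]) < 1e-9
```

With ordinary (non-circular) counting the two marginal sums can disagree by O(1/n) boundary effects; the circular buffer removes them entirely.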
The stationary probabilities of the desired Markov Chain are precisely π when the equation πP = π is satisfied. This matrix equation consists of σ^k equations, and the stochasticity requirement on P is another σ^k equations, resulting in the following 2σ^k equations overall:

Σ_{x ∈ Σ^k} P(x, x′) R(x) = R(x′),  ∀ x′ ∈ Σ^k  (stationary probability conditions)   (4)

Σ_{x′ ∈ Σ^k} P(x, x′) = 1,  ∀ x ∈ Σ^k  (probability requirements)   (5)

where P(x, x′) ≥ 0 as well. Because of the relationship between k-grams and the Markov Chain that we are seeking to construct, we can only have P(x, x′) ≠ 0 when x = ay and x′ = yb for some a, b ∈ Σ and y ∈ Σ^{k-1}. That is, y is the suffix of the state x = ay and we can only transition to states x′ = yb which have y as a prefix and some suffix b ∈ Σ. Accordingly, for every y ∈ Σ^{k-1}, we have the 2σ equations

Σ_{a ∈ Σ} P(ay, yb) R(ay) = R(yb),  ∀ b ∈ Σ,   (6)

Σ_{b ∈ Σ} P(ay, yb) = 1,  ∀ a ∈ Σ,   (7)

P(ay, yb) ≥ 0,   (8)

which are completely decoupled from the equations corresponding to (k-1)-grams other than y. Accordingly, we can solve each system independently. Noting that the k-gram statistics R(x) satisfy the marginalization relations

Σ_{a ∈ Σ} R(ay) = Σ_{b ∈ Σ} R(yb) = R(y),  ∀ y ∈ Σ^{k-1},

and summing over b in the equations (6), we get

Σ_{b ∈ Σ} Σ_{a ∈ Σ} P(ay, yb) R(ay) = Σ_{a ∈ Σ} Σ_{b ∈ Σ} P(ay, yb) R(ay) = Σ_{a ∈ Σ} R(ay) = Σ_{b ∈ Σ} R(yb) = R(y),

which is an identity not involving the unknown P(ay, yb). Accordingly, there are no more than 2σ - 1 linearly independent equations in (6). In fact, if we define pre(y) to be the number of nonzero R(ay) and post(y) to be the number of nonzero R(yb), there are in fact no more than pre(y) · post(y) unknown probabilities P(ay, yb), and no more than pre(y) + post(y) - 1 independent equations altogether.

A. The Standard Solution

One solution to the equations, which we call the Standard Solution, is P̄(ay, yb) = R(yb)/R(y), because then

Σ_{a ∈ Σ} P̄(ay, yb) R(ay) = Σ_{a ∈ Σ} R(ay) R(yb)/R(y) = (R(yb)/R(y)) Σ_{a ∈ Σ} R(ay) = R(yb)

and

Σ_{b ∈ Σ} P̄(ay, yb) = Σ_{b ∈ Σ} R(yb)/R(y) = R(y)/R(y) = 1.

This specific solution has pre(y) · post(y) nonzero probabilities P̄(ay, yb) for the substring y ∈ Σ^{k-1} by construction. This Markov Chain is irreducible because we have constructed the transition probabilities from a circular buffer, so that there is a nonzero probability of going from any state with nonzero probability R(x) to any other state with nonzero probability. If additionally the constructed Standard Solution Markov Chain is aperiodic, its unique stationary distribution is precisely R(x) and its entropy rate is

H(P̄) = H_P(X_{k+1} | X_1^k) = - Σ_{a ∈ Σ} Σ_{y ∈ Σ^{k-1}} Σ_{b ∈ Σ} R(ay) P̄(ay, yb) log(P̄(ay, yb)).   (9)

B. Extended Solutions

If pre(y) and post(y) are both strictly greater than 1, then pre(y) · post(y) > pre(y) + post(y) - 1. From the theory of linear programming, there are feasible solutions to the linear program defined by (6), (7) and (8) which have no more than pre(y) + post(y) - 1 nonzero coordinates, namely the Basic Feasible Solutions [25]. Let such a Basic Feasible Solution be P̂(ay, yb). As derived above, there are solutions with exactly pre(y) · post(y) nonzero coordinates, namely the Standard Solutions P̄(ay, yb). Note that strict convex combinations of P̂ with P̄,

P_u = u·P̂ + (1-u)·P̄  with 0 < u < 1,

define a continuum of solutions to (6), (7) and (8), with each solution corresponding to an irreducible Markov Chain.
This is the case because every state is reachable from every other state with nonzero probability due to the construction of the Standard Solution. Moreover, when pre(y) and post(y) are both strictly greater than 1, P̂ and P̄ are different. As an aside, we have observed that Basic Feasible Solutions typically result in reducible chains because those solutions involve a minimal number of nonzero transition probabilities.

IV. NUMERICAL EXAMPLES

In this section we demonstrate the constructions described above.

1) We consider data generated by the automaton depicted in Figure 3, which is a Hidden Markov Model (HMM), M = {A(0), A(1)}, defined by the two transition matrices

A(0) = [ 0.5  0.5 ]     A(1) = [ 0    0 ]
       [ 0    0.5 ],           [ 0.5  0 ].

Fig. 3. A two state Hidden Markov Model used to generate data for the example. Transitions between states are labeled with the emitted symbols and the probabilities that the transitions occur, so that 0|0.5 means the transition occurs with probability 0.5 and emits the symbol "0."

The stochastic process of observables is not Markovian of any order, as can be seen from the fact that P(y_t = 0 | y_{t-k} ... y_{t-1} = 0^k) ≠ P(y_t = 0 | y_{t-k-1} y_{t-k} ... y_{t-1} = 10^k) for any k. Moreover, it can be shown that this process is not equivalent to any Probabilistic Deterministic Finite State Automaton. We generated a sequence of 1000 observations by performing a simulation of this HMM, starting in state 1.

2) We set k = 2 and computed the statistics R(xy) and R(xyz) by scanning the data sequence from left to right and computing sample averages as appropriate:

R(00) = 0.513   R(01) = 0.244   R(10) = 0.244   R(11) = 0.000,

and

R(000) = 0.338   R(001) = 0.174   R(010) = 0.244   R(011) = 0.000
R(100) = 0.174   R(101) = 0.070   R(110) = 0.000   R(111) = 0.000.

Observe that R(01) = R(10), which is a necessary regularity that follows from the marginalization property:

Σ_a R(ay) = Σ_b R(yb) = R(y).

In order to be sure that the estimates verify those consistency conditions, we have treated the data stream as a circular buffer as described previously.

3) We built the Standard Solution, P, where P(ay, yb) = R(yb)/R(y), and then we computed a different numerical solution, P̂, of the linear program (6), (7) and (8).¹ The two solutions are summarized below:

P(00,00) = 0.678   P(00,01) = 0.322
P(10,00) = 0.678   P(10,01) = 0.322
P(01,10) = 1.000   P(01,11) = 0.000
P(11,10) = 1.000   P(11,11) = 0.000,

P̂(00,00) = 1.000   P̂(00,01) = 0.000
P̂(10,00) = 0.000   P̂(10,01) = 1.000
P̂(01,10) = 1.000   P̂(01,11) = 0.000
P̂(11,10) = 0.000   P̂(11,11) = 1.000.

Note that the Basic Feasible Solution P̂ has a maximal number of zeros and results in a reducible chain with three communicating classes, namely {00}, {01, 10}, {11}. By convexity,

P_u = u·P + (1-u)·P̂

is also a solution for any 0 < u < 1, so that for u = 0.5 and u = 0.2 we obtain respectively the following two different 2-gram models:

P_0.5(00,00) = 0.839   P_0.5(00,01) = 0.161
P_0.5(10,00) = 0.339   P_0.5(10,01) = 0.661
P_0.5(01,10) = 1.000   P_0.5(01,11) = 0.000
P_0.5(11,10) = 0.500   P_0.5(11,11) = 0.500,

P_0.2(00,00) = 0.936   P_0.2(00,01) = 0.064
P_0.2(10,00) = 0.136   P_0.2(10,01) = 0.864
P_0.2(01,10) = 1.000   P_0.2(01,11) = 0.000
P_0.2(11,10) = 0.200   P_0.2(11,11) = 0.800.

¹ P̂ is a Basic Feasible Solution obtained by employing the Matlab linprog function.
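These listings can be verified directly. The following sketch rebuilds the Standard Solution from the measured 2-gram statistics quoted in the text and forms the convex combination with the Basic Feasible Solution; it is a verification in Python rather than the authors' Matlab code:

```python
# 2-gram statistics measured from the HMM data (values quoted in the text).
R2 = {"00": 0.513, "01": 0.244, "10": 0.244, "11": 0.000}
R1 = {y: R2["0" + y] + R2["1" + y] for y in "01"}   # marginals R(y)

# Standard Solution P(ay, yb) = R(yb)/R(y); transitions ay -> yb only.
P = {}
for ay in R2:
    y = ay[1]
    for b in "01":
        P[(ay, y + b)] = R2[y + b] / R1[y] if R1[y] > 0 else 0.0

# Basic Feasible Solution listed in the text.
Phat = {("00", "00"): 1.0, ("00", "01"): 0.0, ("10", "00"): 0.0,
        ("10", "01"): 1.0, ("01", "10"): 1.0, ("01", "11"): 0.0,
        ("11", "10"): 0.0, ("11", "11"): 1.0}

u = 0.5
Pu = {t: u * P[t] + (1 - u) * Phat[t] for t in P}   # the model P_0.5
```

Both P and Pu leave the measured 2-gram distribution stationary, which is exactly the property the construction requires.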
4) No w compare the original 2 -order statistics specified by M with the statistics specified by the two new models, namely P 0 . 5 and P 0 . 2 as abov e: R (00) = 0 . 513 R (01) = 0 . 244 R (10) = 0 . 244 R (11) = 0 . 000 , R 0 . 5 (00) = 0 . 513 R 0 . 5 (01) = 0 . 244 R 0 . 5 (10) = 0 . 244 R 0 . 5 (11) = 0 . 000 , R 0 . 2 (00) = 0 . 513 R 0 . 2 (01) = 0 . 244 R 0 . 2 (10) = 0 . 244 R 0 . 2 (11) = 0 . 000 . They are numerically identical as expected. Finally we verify that the 3 -order statistics are all different from each other and from the 3 -order statistics of the original data, R , pre viously listed. R (000) = 0 . 348 R (001) = 0 . 165 R (010) = 0 . 244 R (011) = 0 . 000 R (100) = 0 . 165 R (101) = 0 . 079 R (110) = 0 . 000 R (111) = 0 . 000 , ˆ R (000) = 0 . 513 ˆ R (001) = 0 . 000 ˆ R (010) = 0 . 244 ˆ R (011) = 0 . 000 ˆ R (100) = 0 . 000 ˆ R (101) = 0 . 244 ˆ R (110) = 0 . 000 ˆ R (111) = 0 . 000 , R 0 . 5 (000) = 0 . 430 R 0 . 5 (001) = 0 . 083 R 0 . 5 (010) = 0 . 244 R 0 . 5 (011) = 0 . 000 R 0 . 5 (100) = 0 . 083 R 0 . 5 (101) = 0 . 161 R 0 . 5 (110) = 0 . 000 R 0 . 5 (111) = 0 . 000 , R 0 . 2 (000) = 0 . 480 R 0 . 2 (001) = 0 . 033 R 0 . 2 (010) = 0 . 244 R 0 . 2 (011) = 0 . 000 R 0 . 2 (100) = 0 . 033 R 0 . 2 (101) = 0 . 211 R 0 . 2 (110) = 0 . 000 R 0 . 2 (111) = 0 . 000 . These 3 -order statistics are calculated using the relationships ˜ R ( ay b ) = R ( ay ) · ˜ P ( ay , y b ) for the v arious ˜ R, a, y , b . Moreover , the R u are the same con ve x combinations as the the various P u ’ s. This e xample illustrates the v arious constructions we ha ve described in complete generality in the pre vious section. September 17, 2021 DRAFT CRESPI, CYBENKO, GIANI 17 V . 
V. A COVERT CHANNEL CODING TECHNIQUE

In the previous section we showed that, given observed string frequencies R(z), z ∈ Σ^k, we can construct multiple Markov Chains, M, whose states are the k-grams (z ∈ Σ^k), whose transition probabilities are P(ay, yb), a ∈ Σ, y ∈ Σ^{k−1}, and whose stationary distributions are precisely the observed R. We now show how to use such a Markov Chain to encode messages while preserving the statistics, R, of the channel. This means that someone monitoring the channel will observe the same k-gram statistics in spite of the fact that covert messages can be communicated within that channel. As noted before, this can be exploited by either attacker or defender.

Conceptually, the coding concept is the opposite of the classical Shannon Source Coding Theorem [17]: traditionally we start with a stochastic source with entropy rate H that we seek to compress into binary strings, whereas in this case we start with a collection of 2^r messages which we wish to efficiently encode using the dynamics and statistics of the given stochastic process. Because we have to respect the statistics of the channel, the encoding will typically not compress but expand the number of bits needed. Nonetheless, we still seek efficiency with respect to observing the channel's k-gram statistics. In this work we assume, for simplicity, that the covert communication channel is noiseless, noting that the results can be extended to noisy channels in the traditional way. A more thorough analysis is deferred to a future study in which the Shannon capacity of noisy channels will be considered.

This construction involves several steps:

1) Compute the entropy of the irreducible Markov Chain M, H_M, specified by transition probabilities P_M and stationary distribution R_M:

    H_M = − Σ_{ay ∈ Σ^k} R_M(ay) Σ_{b ∈ Σ} P_M(ay, yb) log_2(P_M(ay, yb)).
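Step 1 can be sketched as follows (a Python stand-in, using the Standard Solution P of the previous section; because the tabulated probabilities are rounded, the result agrees with the H_P = 0.6863 reported for these examples only up to that rounding):

```python
# Sketch of step 1: entropy rate of a k-gram Markov Chain,
# H_M = - sum_{ay} R_M(ay) sum_b P_M(ay, yb) log2 P_M(ay, yb).
import math

def entropy_rate(R, P):
    """R: stationary distribution over states; P: (state, successor) -> probability."""
    return -sum(R[s] * p * math.log2(p) for (s, _), p in P.items() if p > 0.0)

# Standard Solution P for the 2-gram example of the previous section:
R = {"00": 0.513, "01": 0.244, "10": 0.244, "11": 0.000}
P = {("00", "00"): 0.678, ("00", "01"): 0.322,
     ("10", "00"): 0.678, ("10", "01"): 0.322,
     ("01", "10"): 1.000, ("01", "11"): 0.000,
     ("11", "10"): 1.000, ("11", "11"): 0.000}

print(round(entropy_rate(R, P), 4))   # close to the reported H_P = 0.6863
```

Only the two stochastic states (00 and 10) contribute: transitions with probability 1 or 0 add no entropy, which is also why the fully deterministic P̂ below has H = 0.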
Note that we construct the Markov Chains to have a given stationary distribution, R, so only P_M differs across the different models. For the examples developed in the previous section, we have computed:

    H_P = 0.6863,  H_P̂ = 0,  H_P_0.2 = 0.3165,  H_P_0.5 = 0.5520.

Note that P̂ is entirely deterministic and so has zero entropy.

Since we constructed these Markov Chains so that different transitions from a state correspond to different observables (that is, they are DPFAs), knowledge of the initial state of the Markov Chain yields a one-to-one correspondence between state sequences and observation sequences. Hence the entropy rates of the Markov Chain state sequences and the resulting observation sequences are the same. Let Y_s represent the stochastic process of observations produced by the constructed Markov Chain, M, starting in state s ∈ M. All states in M are recurrent by construction, so the entropy rate of each process Y_s is the same and equal to H_M.

2) Apply the Shannon–McMillan–Breiman Asymptotic Equipartition Property (AEP) Theorem [17] to each Y_s, showing that for large n there are approximately 2^{nH_M} typical sequences of length n of Y_s, each occurring with probability approximately (1/2)^{nH_M}. Consequently, in order to encode 2^r covert messages, say C_i with 1 ≤ i ≤ 2^r, we must have r ≤ nH_M, or equivalently n ≥ r/H_M, so n is selected to encode 2^r different covert message sequences accordingly.

3) Construct length-n typical sequences of Y_s by starting in state s and then performing a random walk of length n in M according to the probabilities P_M. Such random walks define observation sequences of length n in Σ^n. Produce 2^r ≤ 2^{nH_M} unique sequences for each state s, labeling them Y_s(i), where 1 ≤ i ≤ 2^r ≤ 2^{nH_M}. (If a random walk produces a sequence already generated, simply repeat until a novel random walk is produced.)
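The sizing in step 2 reduces to a one-line computation. This illustrative fragment (names are ours) picks the smallest n with r ≤ nH, using the 2-gram example's Standard Solution entropy H_P ≈ 0.6863:

```python
# Sketch of the AEP sizing in step 2: to index 2^r covert messages with typical
# sequences of a source of entropy rate H, take n = ceil(r / H) symbols.
import math

def codeword_length(r, H):
    """Smallest n with r <= n*H, i.e. 2^r <= 2^(n*H) typical sequences."""
    return math.ceil(r / H)

# For the 2-gram example's Standard Solution (H_P = 0.6863 bits/symbol),
# r = 8 message bits need a walk of n = ceil(8 / 0.6863) symbols:
print(codeword_length(8, 0.6863))   # → 12
```

The low-entropy inter-packet-delay chain used later in this section illustrates the cost of respecting the channel statistics: the smaller H is, the longer each codeword must be.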
4) Note that the k-gram frequencies of each z ∈ Σ^k within the Y_s(i) approach the original R(z) as n → ∞, because R is the stationary distribution of the Markov Chain and Y_s(i) is produced by taking a random walk in the chain.

5) For each state s, assign the covert message C_i to Y_s(i). Pick a random initial state s(0) and assign a sequence of covert messages C_{i_1} C_{i_2} ... C_{i_m} to Y_{s(0)}(i_1) Y_{s(1)}(i_2) ... Y_{s(m−1)}(i_m), where s(j) is recursively defined as the state in which Y_{s(j−1)}(i_j) ended. Because each random walk in the sequence thus constructed starts in the state in which the previous random walk ended, the concatenated sequence of random walks is also a legal random walk in the Markov Chain, obeying all the transition probabilities. Moreover, the k-gram statistics of the overall concatenated sequence of mn observations are approximately R, and approach R as n → ∞. The encoded sequence is uniquely decodable by the receiver as well.

Fig. 4. The frequencies of the 13 different delays measured from the codeword sequence are in the right graph. This is to be compared with the left graph, which is from the empirical data as in Figure 2. Someone monitoring the delays would see no change in the distribution, but a covert channel is present.

To illustrate this construction, consider the example presented in the left of Figure 2, where we use the 1-order statistics as in equation (2). We take the convex combination (see Section II) P = 0.75·P_B + 0.25·P_D, which results in an entropy of H_P = 0.004 as computed from (9). We build a (2^r, n) codebook as described above with r = 8 and n = ⌈r/H_P⌉ = 1995. That is, we encode binary sequences of length 8 into inter-packet delay sequences of length 1995.
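Steps 3 and 5 can be sketched as follows. This is a hypothetical Python implementation (the names `random_walk` and `codebook` are ours, and the chain is the small 2-gram Standard Solution rather than the 13-symbol delay chain used in the experiment):

```python
# Sketch of steps 3 and 5: codewords are random walks in the k-gram chain, so
# concatenating walks, each starting where the previous one ended, is itself a
# legal walk with the same k-gram statistics.
import random

# Successor distribution of the Standard Solution P for the 2-gram example.
P = {"00": [("00", 0.678), ("01", 0.322)],
     "01": [("10", 1.0)],
     "10": [("00", 0.678), ("01", 0.322)],
     "11": [("10", 1.0)]}

def random_walk(s, n, rng):
    """Length-n observation sequence from state s; returns (observations, end state)."""
    out = []
    for _ in range(n):
        states, weights = zip(*P[s])
        s = rng.choices(states, weights=weights)[0]
        out.append(s[-1])             # observed symbol b of the transition ay -> yb
    return "".join(out), s

def codebook(s, r, n, rng):
    """2^r unique length-n walks from s; on a repeat, simply draw again (step 3)."""
    words = {}
    while len(words) < 2 ** r:
        w, end = random_walk(s, n, rng)
        words.setdefault(w, end)      # remember the end state for chaining (step 5)
    return words

book = codebook("00", r=3, n=12, rng=random.Random(0))
print(len(book))                      # 2^3 = 8 codewords from state 00
```

Because each stored codeword carries its end state, a message sequence is encoded by looking up each block in the codebook of the state where the previous walk ended, exactly as described in step 5.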
We encoded 16 blocks of 8 random source bits each into 16 · 1995 = 31920 symbols from the alphabet Σ = {1, 2, ..., 13}, which correspond to the delays in the left graph of Figure 2. The obtained 1-order statistics of the resulting 31920-symbol concatenated codeword are

    R0 = [0.0029 0.0144 0.0734 0.1453 0.3094 0.1295 0.1151 0.1079 0.1007 0 0 0 0.0014]

and are depicted in Figure 4. Note that the empirical frequencies and graphs are identical to the displayed precision. This illustrates empirically the effectiveness of the construction described in this paper. Matlab code for reproducing these results is available upon request.

VI. CONCLUSIONS AND FUTURE WORK

This paper has demonstrated that covert channels can exist even when arbitrarily high order statistics about a channel are estimated and monitored. The resulting covert channels can be used to either exploit or defend the channel, and the advantage goes to the party that has the ability to estimate the highest order statistics.

The adversarial nature of this situation falls within the scope of cognitive attacks [26], [27]. It can be described abstractly as follows: the environment (for example, inter-packet delays) is modeled as a stochastic process X (such as a Hidden Markov Model, Markov Chain or other formalism). Both the attacker, A, and the defender, D, monitor the environment through functions f_A ∈ F and f_D ∈ F respectively (for example, f_D(X) could be the probability distribution of k-grams produced by X). The attacker guesses f_D and manipulates X in order to produce a new process, A(X), so that covert communications can be performed while respecting the behavior that the defender expects; namely, f_D(X) = f_D(A(X)).
On the other hand, the defender, anticipating the attacker's guess of f_D, picks a different f̃_D and manipulates X to produce a new process D(X) so that:

1) f_D(A(D(X))) = f_D(D(X)) = f_D(X) = f_D(A(X)): the defensive shaping action is imperceptible to anyone monitoring via f_D;

2) f̃_D(A(D(X))) ≠ f̃_D(D(X)): the attacker's action (that is, creation of a covert channel) is detectable by the defender.

The game consists of attacker and defender guessing and then exploiting each other's monitoring strategy and manipulating the environment accordingly. The common objective of the players is to alter the environment in a manner that is imperceptible to the opponent in order to perform a secret task (covert communication or covert channel detection).

This work raises some questions which are deferred to future work. In particular, the following directions are worthy of future investigation:

• Inter-packet delays involve real-world time, so the question of stability when shaping the channel must be considered. That is, packets can be delayed by certain times only if there are packets in the queue to be delayed. Queuing aspects of timing channels and the possibility of jamming them have been studied [28]. Relating this work to timing channel jamming will be investigated.

• We used a circular buffer in Section IV to numerically estimate k-gram statistics so that the statistics have the required marginalization properties. A single-pass, online algorithm for implementing this circular buffer only requires storing the first and last k symbols of the data. In the absence of such a buffer, the empirical statistics will not in general obey the marginalization identities, and so some additional processing would be required.
The use of singular value decompositions, non-negative matrix factorizations or other decomposition methods for imposing the regularity might be worth exploring further as alternatives to the circular buffer approach.

• In principle, one can attempt to build automata smaller than the Markov Chains we construct. In particular, Probabilistic Finite Automata (PFA) [29], [30], [31] could implement Markov Chains based on k-grams but using fewer states. Unlike k-gram based Markov Chains, k-PSAs have states that are labeled with input sequences of length at most k, so they can be seen as "variable length" k-gram Markov Chains. They can be learned efficiently in the KL-PAC sense [32], [33], [34] and are generally smaller than k-gram based Markov Chains (by having fewer states).

• Within the space of possible Markov Chains that realize given k-gram statistics, it would be good to select the "best" chain from the point of view of maximizing entropy, so that the covert channel coding is as efficient as possible. Our experiments suggest that the so-called Standard Solutions presented in Section III-A have the largest entropy, although we have not been able to prove that analytically.

• It is reasonable to ask how our results relate to the use of Hidden Markov Models for modeling traffic, as for example in [3]. It is known that a Hidden Markov Model with n states is completely determined by the 2n-grams produced by the model, so that reproducing 2n-gram statistics will result in the same n-state Hidden Markov Model [8].

REFERENCES

[1] V. Berk, A. Giani, and G. Cybenko, "Detection of covert channel encoding in network packet delays," in Proc. of FloCon 2005, Pittsburgh, PA, 2005.
[2] S. Cabuk, C. Brodley, and C. Shields, "IP covert timing channels: design and detection," in Proceedings of the 11th ACM Conference on Computer and Communications Security, 2004.
[3] A. Dainotti, A.
Pescapé, P. S. Rossi, F. Palmieri, and G. Ventre, "Internet traffic modeling by means of Hidden Markov Models," Computer Networks, vol. 52, pp. 2645–2662, 2008.
[4] Y. Ephraim, "Hidden Markov Processes," IEEE Transactions on Information Theory, vol. 48, no. 6, pp. 1518–1569, 2002.
[5] L. R. Rabiner, "A tutorial on Hidden Markov Models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[6] L. Finesso and P. Spreij, "Approximate Nonnegative Matrix Factorization via alternating minimization," in Proceedings of Mathematical Theory of Networks and Systems, Leuven, Belgium, 2004.
[7] ——, "Nonnegative Matrix Factorization and I-divergence alternating minimization," Linear Algebra and its Applications, vol. 416, pp. 270–287, 2006.
[8] G. Cybenko and V. Crespi, "Learning Hidden Markov Models using Nonnegative Matrix Factorization," IEEE Transactions on Information Theory, 2011, to appear. [Online]. Available: http://arxiv.org/abs/0809.4086
[9] K. Wang, J. J. Parekh, and S. J. Stolfo, "Anagram: a content anomaly detector resistant to mimicry attack," Springer Lecture Notes in Computer Science, Recent Advances in Intrusion Detection, vol. 4219, 2006.
[10] R. Anderson and F. Petitcolas, "On the limits of steganography," IEEE Journal on Selected Areas in Communications, vol. 16, no. 4, pp. 474–481, May 1998.
[11] R. Kemmerer, "Shared Resource Matrix Methodology: an approach to identifying storage and timing channels," ACM Transactions on Computer Systems, vol. 1, no. 3, pp. 256–277, August 1983.
[12] Z. Wang and R. B. Lee, "New constructive approach to covert channel modeling and channel capacity estimation," in Information Security, ser. Lecture Notes in Computer Science, J. Zhou, J. Lopez, R. H. Deng, and F. Bao, Eds. Springer Berlin / Heidelberg, 2005, vol. 3650, pp. 498–505.
[13] A. Grusho, N. Grusho, and E.
Timonina, "Problems of modeling in the analysis of covert channels," in Computer Network Security, ser. Lecture Notes in Computer Science, I. Kotenko and V. Skormin, Eds. Springer Berlin / Heidelberg, 2010, vol. 6258, pp. 118–124.
[14] H. Okhravi, S. Bak, and S. T. King, "Design, implementation and evaluation of covert channel attacks," in HST '10: Proceedings of the IEEE Conference on Technologies for Homeland Security, Oct 2010.
[15] I. Moskowitz and M. Kang, "Covert channels – here to stay?" in Computer Assurance, 1994. COMPASS '94: Safety, Reliability, Fault Tolerance, Concurrency and Real Time, Security. Proceedings of the Ninth Annual Conference on, June 27 – July 1, 1994, pp. 235–243.
[16] M. Barreno, P. L. Bartlett, F. J. Chi, A. D. Joseph, B. Nelson, B. I. P. Rubinstein, U. Saini, and J. D. Tygar, "Open problems in the security of learning," in Proceedings of the 1st ACM Workshop on AISec, Conference on Computer and Communications Security. Alexandria, Virginia, USA: Association for Computing Machinery, 2008, pp. 19–26.
[17] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 1991.
[18] E. Vidal, F. Thollard, C. de la Higuera, F. Casacuberta, and R. Carrasco, "Probabilistic Finite State Machines – Part I," PAMI, vol. 27, no. 7, pp. 1013–1025, July 2005.
[19] ——, "Probabilistic Finite State Machines – Part II," PAMI, vol. 27, no. 7, pp. 1026–1039, July 2005.
[20] C. de la Higuera and J. Oncina, "Learning deterministic linear languages," Lecture Notes in Computer Science – Lecture Notes in Artificial Intelligence, vol. 2375, pp. 185–200, 2002.
[21] C. de la Higuera and J. Oncina, "Learning stochastic finite automata," Lecture Notes in Computer Science – Lecture Notes in Artificial Intelligence, vol. 3264, pp. 175–186, 2004.
[22] S. J. Murdoch and S.
Lewis, "Embedding covert channels into TCP/IP," 7th Information Hiding Workshop, Barcelona, Catalonia (Spain), June 2005.
[23] A. Giani, V. H. Berk, and G. Cybenko, "Data exfiltration and covert channels," Proceedings of the SPIE, vol. 6201, Sensors, and Command, Control, Communications, and Intelligence (C3I) Technologies for Homeland Security and Homeland Defense IV, Orlando, Florida, April 2006.
[24] F. Jelinek, Statistical Methods for Speech Recognition. MIT Press, 1997.
[25] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Dover, July 1998.
[26] G. Cybenko, A. Giani, and P. Thompson, "Cognitive Hacking," Advances in Computers, vol. 60, pp. 36–75, 2004.
[27] ——, "Cognitive Hacking: a battle for the mind," IEEE Computer, vol. 35, no. 8, pp. 50–56, 2002.
[28] J. Giles and B. Hajek, "An information-theoretic and game-theoretic study of timing channels," IEEE Transactions on Information Theory, vol. 48, no. 9, pp. 2455–2477, Sep 2002.
[29] D. Ron, Y. Singer, and N. Tishby, "The power of amnesia," Advances in Neural Information Processing Systems, vol. 6, 1993.
[30] ——, "Learning probabilistic automata with variable memory length," in Proceedings of the Workshop on Computational Learning Theory, 1994.
[31] N. Palmer and P. W. Goldberg, "PAC-learnability of probabilistic deterministic finite state automata in terms of variation distance," Theor. Comput. Sci., vol. 387, no. 1, pp. 18–31, 2007.
[32] A. Clark and F. Thollard, "PAC-learnability of probabilistic deterministic finite state automata," Journal of Machine Learning Research, vol. 5, pp. 437–497, 2004.
[33] N. Abe and M. Warmuth, "On the computational complexity of approximating distributions by probabilistic automata," Machine Learning, vol. 9, pp. 205–260, 1992.
[34] A. Apostolico and G.
Bejerano, "Optimal amnesic probabilistic automata or how to learn and classify proteins in linear time and space," Journal of Computational Biology, vol. 7, no. 3/4, pp. 381–393, 2000.

Valentino Crespi received his Laurea Degree and his Ph.D. Degree in Computer Science from the University of Milan, Italy, in July 1992 and July 1997, respectively. From September 1998 to August 2000 he was an Assistant Professor of Computer Science at the Eastern Mediterranean University, Famagusta, North Cyprus, and from September 2000 to August 2003 he worked at Dartmouth College, Hanover, NH, as a Research Faculty. Since September 2003 he has been on the faculty of the Department of Computer Science, California State University, Los Angeles, currently in the capacity of Associate Professor. His research interests include Distributed Computing, Tracking Systems, UAV Surveillance, Sensor Networks, Information and Communication Theory, Complexity Theory and Combinatorial Optimization. At Dartmouth College he developed the TASK project and consulted for the Process Query Systems project, directed by Prof. George Cybenko. At CSULA he has been teaching lower-division, upper-division and master's courses on Algorithms, Data Structures, Java Programming, Compilers, Theory of Computation and Computational Learning of Languages and Stochastic Processes. During his professional activity Dr. Crespi has published a number of papers in prestigious journals and conferences of Applied Mathematics, Computer Science and Engineering. Moreover, Dr. Crespi is currently a member of the ACM and of the IEEE.

George Cybenko is the Dorothy and Walter Gramm Professor of Engineering at the Thayer School of Engineering at Dartmouth College. Prior to joining Dartmouth, he was Professor of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign.
His current research interests are in machine learning, signal processing and computer security. Cybenko was founding Editor-in-Chief of IEEE Computing in Science and Engineering and IEEE Security & Privacy. He earned his B.Sc. (Toronto) and Ph.D. (Princeton) degrees in mathematics and is a Fellow of the IEEE.

Annarita Giani received her Laurea (Master's degree) in Mathematics from the Università di Pisa, Italy. Thereafter, she worked as a researcher for the Italian Registration Authority, as well as the Istituto di Informatica e Telematica del Consiglio Nazionale delle Ricerche in Pisa. In 2001 she moved to the United States to commence a Ph.D. in Computer Engineering at Dartmouth College's Thayer School of Engineering, Hanover, New Hampshire. While at Dartmouth, she participated in the Process Query System (PQS) project sponsored by the Advanced Research and Development Activity (ARDA). Her dissertation addressed issues relating to computer security, anomaly tracking and cognitive attacks. She received her doctoral degree in 2007. She presently holds the position of postdoctoral fellow at the Department of Electrical Engineering and Computer Science at the University of California at Berkeley. While at Berkeley, she has been working on security for wireless and sensor networks, as well as security issues related to body sensor networks and critical infrastructures.

September 17, 2021 DRAFT