Polar Coding for Secure Transmission and Key Agreement
Wyner's work on wiretap channels and the recent works on information-theoretic security are based on random codes. Achieving information-theoretic security with practical coding schemes is of definite interest. In this note, the attempt is to overc…
Authors: O. Ozan Koyluoglu, Hesham El Gamal
The notion of information theoretic secrecy was introduced by Shannon to study secure communication over point-to-point noiseless channels [1]. This line of work was later extended by Wyner [2] to noisy channels. Wyner's degraded wiretap channel assumes that the eavesdropper channel is a degraded version of the one seen by the legitimate receiver. Under this assumption, Wyner showed that the advantage of the main channel over that of the eavesdropper, in terms of the lower noise level, can be exploited to transmit secret bits using random codes. This keyless secrecy result was then extended to a more general (broadcast) model in [3] and to the Gaussian setting in [4]. Recently, there has been a renewed interest in wireless physical layer security (see, e.g., Special Issue on Information Theoretic Security, IEEE Trans. Inf. Theory, June 2008 and references therein). However, designing practical codes to achieve secrecy for any given main and eavesdropper channels has remained an elusive task.
In [5], the authors constructed LDPC-based wiretap codes for certain binary erasure channel (BEC) and binary symmetric channel (BSC) scenarios. In particular, when the main channel is noiseless and the eavesdropper channel is a BEC, [5] presented codes that approach secrecy capacity. For other scenarios, secrecy-capacity-achieving code design is stated as an open problem. Similarly, [6] considers the design of secure nested codes for the noiseless main channel setting (see also [7]).

(Footnote: This work is partially supported by Los Alamos National Labs (LANL) and by the National Science Foundation (NSF). The first author is partially supported by the Presidential Fellowship award of the Ohio State University. This work will appear in Proc. 21st Annual IEEE International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), Sept. 2010, Istanbul, Turkey.)
This work considers secret communication over a binary-input degraded wiretap channel. Using the polar coding technique of Arıkan [8], we show that non-trivial secrecy rates are achievable. To the best of our knowledge, this coding technique is the first provable and practical (having low encoding and decoding complexity) secrecy encoding technique for this set of channels. In the special case of symmetric main and eavesdropper channels, this technique achieves the secrecy capacity of the channel. Next, we consider fading wiretap channels and propose a key agreement scheme where the users are only assumed to have statistical knowledge of the eavesdropper CSI. The enabling observation is that, by blindly using the scheme over many fading blocks, the users will eventually create an advantage over Eve, which can then be exploited to generate secret keys using privacy amplification techniques.
Throughout this paper, vectors are denoted by x_1^N = (x_1, x_2, …, x_N), or by x if we omit the indices. Random variables are denoted by capital letters, e.g., X, which are defined over sets denoted by calligraphic letters, e.g., 𝒳. For a given set A ⊂ {1, …, N}, we write x_A to denote the sub-vector {x_i : i ∈ A}. Omitting the random variables, we use the following shorthand for probability distributions: p(x) ≜ Pr(X = x), p(x|y) ≜ Pr(X = x | Y = y).
Consider a binary-input DMC (B-DMC) given by W(y|x), where x ∈ 𝒳 = {0, 1} and y ∈ 𝒴 for some finite set 𝒴. N uses of W are denoted by W^N(y_1^N | x_1^N) = ∏_{i=1}^N W(y_i | x_i). The symmetric capacity of a B-DMC W is given by

I(W) ≜ Σ_{y ∈ 𝒴} Σ_{x ∈ 𝒳} (1/2) W(y|x) log [ W(y|x) / ( (1/2) W(y|0) + (1/2) W(y|1) ) ],

which is the mutual information I(X; Y) when the input X is uniformly distributed. The Bhattacharyya parameter of W is given by

Z(W) ≜ Σ_{y ∈ 𝒴} sqrt( W(y|0) W(y|1) ),

which measures the reliability of W, as it is an upper bound on the probability of ML decision error on a single use of the channel.
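As a concrete illustration, both channel parameters can be evaluated directly from the transition probabilities. The sketch below is our own (the channel matrices are assumptions for illustration, not taken from the paper); it computes I(W) and Z(W) for a BSC and checks the well-known values:

```python
import math

def symmetric_capacity(W):
    """I(W) in bits; W[x][y] = W(y|x) for x in {0, 1}, uniform input assumed."""
    total = 0.0
    for y in range(len(W[0])):
        avg = 0.5 * W[0][y] + 0.5 * W[1][y]   # output distribution under uniform input
        for x in (0, 1):
            if W[x][y] > 0:
                total += 0.5 * W[x][y] * math.log2(W[x][y] / avg)
    return total

def bhattacharyya(W):
    """Z(W) = sum over y of sqrt(W(y|0) W(y|1))."""
    return sum(math.sqrt(W[0][y] * W[1][y]) for y in range(len(W[0])))

bsc = [[0.89, 0.11], [0.11, 0.89]]   # BSC with crossover probability 0.11
print(symmetric_capacity(bsc))       # = 1 - h(0.11), about 0.5
print(bhattacharyya(bsc))            # = 2 * sqrt(0.11 * 0.89)
```

For a BEC with erasure probability ε, the same functions return I(W) = 1 − ε and Z(W) = ε, consistent with the definitions above.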
Polar codes were recently introduced by Arıkan [8]. These codes can be encoded and decoded with complexity O(N log N), while achieving an overall block-error probability that is bounded as O(2^{−N^β}) for any fixed β < 1/2 ([8], [10]). In [8], channel polarization is used to construct codes (polar codes) that achieve the symmetric capacity, I(W), of any given B-DMC W. Channel polarization consists of two operations: channel combining and channel splitting. Let u_1^N be the vector to be transmitted. The combined channel is represented by W_N and is given by

W_N(y_1^N | u_1^N) ≜ W^N(y_1^N | u_1^N B_N F^{⊗n}),

where B_N is a bit-reversal permutation matrix, N = 2^n, and F ≜ [1 0; 1 1]. Note that the actual channel input here is given by x_1^N = u_1^N B_N F^{⊗n}. The channel splitting constructs N binary-input channels from W_N, where the transformation is given by

W_N^{(i)}(y_1^N, u_1^{i−1} | u_i) ≜ Σ_{u_{i+1}^N ∈ 𝒳^{N−i}} (1/2^{N−1}) W_N(y_1^N | u_1^N).
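The transform x_1^N = u_1^N B_N F^{⊗n} admits an FFT-like implementation: a bit-reversal permutation followed by n stages of XOR butterflies. A minimal sketch (the function name `polar_encode` and the test vectors are ours, not from the paper):

```python
def polar_encode(u):
    """Compute x = u B_N F^{(x)n} over GF(2): bit-reversal permutation
    followed by n butterfly stages of F = [[1, 0], [1, 1]]."""
    N = len(u)
    n = N.bit_length() - 1
    assert N == 1 << n, "block length must be a power of two"
    # bit-reversal permutation B_N
    x = [u[int(format(i, '0{}b'.format(n))[::-1], 2)] for i in range(N)]
    # n stages of XOR butterflies implement multiplication by F^{(x)n}
    half = 1
    while half < N:
        for start in range(0, N, 2 * half):
            for i in range(start, start + half):
                x[i] ^= x[i + half]
        half *= 2
    return x

print(polar_encode([0, 0, 0, 1]))  # -> [1, 1, 1, 1], the last row of F (x) F
```

Since the map is linear over GF(2), encoding the XOR of two inputs yields the XOR of their codewords, which is a convenient sanity check.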
The polarization phenomenon is shown by the following theorem.
Theorem 1 (Theorem 1 of [8]): For any B-DMC W, N = 2^n for some n, and δ ∈ (0, 1), we have

(1/N) |{i : I(W_N^{(i)}) ∈ (1 − δ, 1]}| → I(W)  and  (1/N) |{i : I(W_N^{(i)}) ∈ [0, δ)}| → 1 − I(W)  as N → ∞.
In order to derive the rate of the channel polarization, the random process Z_n is defined in [8] and [10]. Basically, Z_n is the Bhattacharyya parameter of the channel obtained by following a uniformly random path in the recursive channel-splitting construction for n levels, i.e., Z_n = Z(W_N^{(i)}) for an index i chosen uniformly at random. The rate of the channel polarization is given by the following.

Theorem 2 (Theorem 1 of [10]): For any B-DMC W and for any given β < 1/2,

lim_{n→∞} Pr( Z_n ≤ 2^{−2^{βn}} ) = I(W).
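For the BEC, the single-step evolution of the Bhattacharyya parameter is exact: Z(W⁻) = 2Z − Z² and Z(W⁺) = Z². The polarization of Theorems 1 and 2 can thus be observed numerically; the sketch below is our own illustration (the threshold 10⁻³ is an arbitrary choice):

```python
def bec_polarize(eps, n):
    """Z-parameters of the N = 2^n synthesized channels for a BEC(eps);
    the recursion Z(W-) = 2Z - Z^2, Z(W+) = Z^2 is exact for the BEC."""
    zs = [eps]
    for _ in range(n):
        zs = [w for z in zs for w in (2 * z - z * z, z * z)]
    return zs

zs = bec_polarize(0.5, 10)                       # N = 1024 subchannels
good = sum(z < 1e-3 for z in zs) / len(zs)       # nearly noiseless fraction
bad = sum(z > 1 - 1e-3 for z in zs) / len(zs)    # nearly useless fraction
print(good, bad)  # fractions approach I(W) = 0.5 and 1 - I(W) = 0.5
```

Note that the mean of the Z-parameters is preserved at every level, since (2Z − Z² + Z²)/2 = Z; the distribution simply spreads toward the endpoints 0 and 1.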
Now, the idea of polar coding is clear: the encoder-decoder pair, utilizing the polarization effect, transmits data through the subchannels for which Z(W_N^{(i)}) is near 0. In [8], the polar code (N, K, A, u_{A^c}) for a B-DMC W is defined by

x_1^N = u_1^N B_N F^{⊗n}, with the information bits placed on u_A,

where u_{A^c} is a given frozen vector, and the information set A is chosen such that |A| = K and Z(W_N^{(i)}) ≤ Z(W_N^{(j)}) for all i ∈ A, j ∈ A^c. The frozen vector u_{A^c} is given to the decoder. Arıkan's successive cancellation (SC) decoder estimates the input as follows. For the frozen indices, û_{A^c} = u_{A^c}. For the remaining indices i ∈ A: û_i = 0, if W_N^{(i)}(y_1^N, û_1^{i−1} | 0) ≥ W_N^{(i)}(y_1^N, û_1^{i−1} | 1), and û_i = 1, otherwise. With this decoder, it is shown in [8] that the average block error probability over the ensemble (consisting of all possible frozen vector choices) of polar codes is bounded by

P_e ≤ Σ_{i ∈ A} Z(W_N^{(i)}).
We now state the result of [8] using the bound given in [10].

Theorem 3 (Theorem 2 of [10]): For any given B-DMC W with I(W) > 0, let R < I(W) and β ∈ (0, 1/2) be fixed. The block error probability of polar coding under SC decoding (averaged over possible choices of frozen vectors) satisfies P_e = O(2^{−N^β}).
Note that, for any given β ∈ (0, 1/2) and ε > 0, we can define the sequence of polar codes by choosing the information indices as A ≜ {i : Z(W_N^{(i)}) ≤ (1/N) 2^{−N^β}}. Then, from the above theorems, for sufficiently large N, we can achieve the rate R ≥ I(W) − ε with average block error probability (averaged over the possible choices of u_{A^c}) of O(2^{−N^β}) under SC decoding. (See also [11].)
This result shows the existence of a polar code (N, K, A, u_{A^c}) achieving the symmetric capacity of W. We remark that any frozen vector choice of u_{A^c} will work for symmetric channels [8]. For our purposes, we will denote a polar code for a B-DMC W by C(N, F, u_F), where the frozen set is given by F ≜ A^c. Note that A denotes the indices of information transmission for the polar code, whereas F is the set of frozen indices.
We conclude this section by noting the following lemma (given in [11]) regarding polar coding over degraded channels.
Lemma 4 (Lemma 4.7 of [11]): If a B-DMC W′ is degraded with respect to W, then W′^{(i)}_N is degraded with respect to W^{(i)}_N, and Z(W′^{(i)}_N) ≥ Z(W^{(i)}_N).
A discrete memoryless wiretap channel is denoted by (𝒳, W(y_m, y_e | x), 𝒴_m × 𝒴_e) for some finite sets 𝒳, 𝒴_m, 𝒴_e. Here the symbols x ∈ 𝒳 are the channel inputs, and the symbols (y_m, y_e) ∈ 𝒴_m × 𝒴_e are the channel outputs observed at the main decoder and at the eavesdropper, respectively. The channel is memoryless and time-invariant:

p(y_{m,i}, y_{e,i} | x_1^i, y_m^{i−1}, y_e^{i−1}) = W(y_{m,i}, y_{e,i} | x_i).

We assume that the transmitter has a secret message M which is to be transmitted to the receiver in N channel uses and to be secured from the eavesdropper. In this setting, a secret codebook has the following components:
1) The secret message set 𝓜. The transmitted messages are assumed to be uniformly distributed over this message set.
2) A stochastic encoding function f(·) at the transmitter which maps the secret messages to the transmitted symbols, f : 𝓜 → 𝒳^N (each message is mapped to a codeword chosen at random by the encoder).
3) A decoding function φ(·) at the receiver which maps the received symbols to an estimate of the message, φ : 𝒴_m^N → 𝓜.
The reliability of transmission is measured by the probability of error P_e ≜ Pr{φ(Y_m^N) ≠ M}. We say that the rate R is an achievable secrecy rate if, for any given ε > 0, there exists a secret codebook such that

(1/N) H(M) ≥ R − ε,   P_e ≤ ε,   and   (1/N) I(M; Y_e^N) ≤ ε,

for sufficiently large N. Consider a degraded binary-input wiretap channel, where, for the input set 𝒳 = {0, 1}, the main channel is given by W_m(y_m | x) and the eavesdropper channel is

W_e(y_e | x) = Σ_{y_m ∈ 𝒴_m} W_m(y_m | x) W_d(y_e | y_m).
Here, the degradation is due to the channel W_d(y_e | y_m). Note that, due to the degradation, polar codes designed for the eavesdropper channel can be used for the main channel. For a given sufficiently large N and β ∈ (0, 1/2), let C_m ≜ C(N, F_m, u_{F_m}) be a polar code for the main channel W_m with frozen set F_m; by Lemma 4, the frozen set F_e of a polar code for the eavesdropper channel W_e satisfies F_m ⊆ F_e. For each choice v_m of the bits on F_e \ F_m, C_e(v_m) ≜ C(N, F_e, u_{F_e}(v_m)) is a polar code for the eavesdropper channel, and the ensemble ∪_{v_m, u_{F_m}} C_e(v_m) is a symmetric-capacity-achieving polar code ensemble for the eavesdropper channel W_e (if the eavesdropper channel is symmetric, any frozen vector choice will work [8], and hence the code achieves the capacity of the eavesdropper channel for any v_m, u_{F_m}). This implies that the code for the main channel can be partitioned as C_m = ∪_{v_m} C_e(v_m). This observation, when considered over the ensemble of codes, enables us to construct secrecy-achieving polar coding schemes, even if the eavesdropper channel is not symmetric, as characterized by the following theorem.
Theorem 5: For a binary-input degraded wiretap channel, the perfect secrecy rate of I(W m ) -I(W e ) is achieved by polar coding.
Proof: Encoding: We map the secret message to be transmitted to v_m and generate a random vector v_r of length |A_e|, each entry i.i.d. according to the uniform distribution over 𝒳. Then, the channel input is constructed with x_1^N = u_1^N B_N F^{⊗n}, where u_{F_m} is the frozen vector of the polar code C_m, u_{F_e \ F_m} = v_m, and u_{A_e} = v_r. The polar code ensemble is constructed over all different choices of frozen vectors, i.e., u_{F_m}.
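For the BEC special case, the index partition used in this proof (randomization bits on A_e, secret bits on F_e \ F_m, frozen bits on F_m) can be computed explicitly from the exact Z-parameter recursion, since degradation guarantees Z_e ≥ Z_m index by index (Lemma 4). A sketch under assumed erasure probabilities and an arbitrary threshold (our own illustration):

```python
def bec_z(eps, n):
    """Exact Bhattacharyya parameters of the 2^n synthesized BEC channels."""
    zs = [eps]
    for _ in range(n):
        zs = [w for z in zs for w in (2 * z - z * z, z * z)]
    return zs

def wiretap_partition(eps_m, eps_e, n, thr=1e-3):
    """Index sets of Theorem 5 for a degraded BEC wiretap pair (eps_m < eps_e):
    A_e carries the randomization bits v_r, F_e \\ F_m the secret message v_m,
    and F_m is frozen. Degradation gives Z_e[i] >= Z_m[i], so F_m lies in F_e."""
    N = 1 << n
    zm, ze = bec_z(eps_m, n), bec_z(eps_e, n)
    A_e = {i for i in range(N) if ze[i] < thr}    # good for Eve too: randomized
    A_m = {i for i in range(N) if zm[i] < thr}    # good for Bob
    return A_e, A_m - A_e, set(range(N)) - A_m    # (A_e, F_e \ F_m, F_m)

A_e, secret, frozen = wiretap_partition(0.3, 0.6, 10)
print(len(secret) / 1024)  # tends to I(W_m) - I(W_e) = 0.3 as n grows
```

The three sets partition {1, …, N}, matching the encoding rule above: Eve can resolve the bits on A_e, Bob can resolve everything outside F_m, and the gap in between carries the secret message.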
Decoding: The vectors v_m and v_r can be decoded with the SC decoder described above with error probability P_e = O(2^{−N^β}) (averaged over the ensemble), achieving a rate R = (1/N)|F_e \ F_m| → I(W_m) − I(W_e).

Security: Let us assume that the vector v_m is given to the eavesdropper along with u_{F_m}. Then, employing SC decoding, the eavesdropper can decode the random vector v_r with P_e = O(2^{−N^β}) averaged over the ensemble. Utilizing Fano's inequality and averaging over the code ensemble seen by Eve, i.e., over V_m and U_{F_m}, we obtain

(1/N) H(V_r | Y_e^N, V_m, U_{F_m}) ≤ ε(N),

where ε(N) → 0 as N → ∞.
Then, the mutual information leakage to the eavesdropper, averaged over the ensemble, can be bounded as follows:

I(V_m; Y_e^N | U_{F_m})
= H(V_m) − H(V_m, V_r | Y_e^N, U_{F_m}) + H(V_r | Y_e^N, V_m, U_{F_m})
≤ H(V_m) + H(V_r) − H(V_m, V_r | Y_e^N, U_{F_m}) − H(V_r) + N ε(N)
(a)= I(U_{A_m}; Y_e^N | U_{F_m}) − H(V_r) + N ε(N)
(b)≤ I(X_1^N; Y_e^N) − H(V_r) + N ε(N)
(c)≤ N I(W_e) − H(V_r) + N ε(N)
(d)≤ N (ε + ε(N)),

where in (a) we use that U_1^N has i.i.d. uniformly distributed entries, so that (V_m, V_r) = U_{A_m} is independent of U_{F_m} with H(V_m) + H(V_r) = H(U_{A_m}); (b) follows from the data processing inequality, as X_1^N is a deterministic function of U_1^N; (c) holds since the channel input is i.i.d. uniform; and (d) holds, for a given ε > 0 and sufficiently large N, since H(V_r) = |A_e| ≥ N(I(W_e) − ε). As the reliability and secrecy constraints are satisfied averaged over the ensemble, there exists a polar code with some fixed u_{F_m} achieving the secure rate I(W_m) − I(W_e).
Note that in the above result, the code satisfying the reliability and the secrecy constraints can be found from the ensemble by an exhaustive search. However, as block length increases, almost all the codes in the ensemble will do equally well. If the eavesdropper channel is symmetric, then the secrecy constraint is satisfied for any given frozen vector u Fm and the code search is only for the reliability constraint. If the eavesdropper channel is not symmetric, a prefix channel can be utilized to have this property.
Corollary 6: For non-symmetric eavesdropper channels, the channel can be prefixed with some p(x|x′) such that the resulting eavesdropper channel

W′_e(y_e | x′) = Σ_{x ∈ 𝒳} W_e(y_e | x) p(x | x′)

is symmetric. Then, using the scheme above, the secret rate I(W′_m) − I(W′_e) of the resulting prefixed channels is achievable.
Finally, we note that, if the main and eavesdropper channels are symmetric, the scheme achieves the secrecy capacity, and any code in the ensemble, i.e., any fixed u_{F_m}, satisfies both the reliability and secrecy constraints.
Corollary 7: For a binary-input degraded wiretap channel with symmetric main and eavesdropper channels, polar coding achieves the secrecy capacity, i.e., C(W m ) -C(W e ), of the channel.
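As a numeric illustration of Corollary 7, consider a degraded BSC wiretap channel: the main channel is BSC(p_m) and the eavesdropper observes its output through a further BSC(p_d), so W_e is the cascade BSC(p_e). The secrecy capacity is then C(W_m) − C(W_e) = h(p_e) − h(p_m). A small check, with parameter values that are assumptions for illustration only:

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_wiretap_secrecy_capacity(p_m, p_d):
    """Degraded BSC wiretap channel: main channel BSC(p_m); the eavesdropper
    sees the cascade BSC(p_e) with p_e = p_m (1 - p_d) + (1 - p_m) p_d."""
    p_e = p_m * (1 - p_d) + (1 - p_m) * p_d
    return (1 - h2(p_m)) - (1 - h2(p_e))   # C(W_m) - C(W_e) = h2(p_e) - h2(p_m)

print(bsc_wiretap_secrecy_capacity(0.05, 0.1))  # about 0.30 secret bits per use
```

As expected, a noisier degrading channel (larger p_d, with p_e still below 1/2) only increases the achievable secrecy capacity, while p_d = 0 gives zero.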
We note that the stated results are achievable by encoders and decoders with complexity O(N log N) each. In addition, if the channels are binary erasure channels (BECs), then there exist algorithms with complexity O(N) for the code construction [8].
In this section, we focus on the following key agreement problem: Alice, over a fading wiretap channel, would like to agree on a secret key with Bob in the presence of a passive eavesdropper Eve. We focus on the special case of binary erasure main and eavesdropper channels, for which the code construction is shown to be simple [8]; during fading block i, the main and eavesdropper channels are BECs with erasure probabilities ε_m^{(i)} and ε_e^{(i)}, respectively. Here, the channels W_m and W_e are random, the outcomes of which result in the channels of each block. Instantaneous eavesdropper CSI is not known at the users; only statistical knowledge of it is assumed. The channels are assumed to be physically degraded w.r.t. some order at each block.² Note that, in this setup, the eavesdropper channel can be better than the main channel on the average.
We utilize the proposed secrecy encoding scheme for the wiretap channel at each fading block. Omitting the block indices, the frozen and information bits are denoted by u_{F_m} and u_{A_m}, respectively. Information bits are uniformly distributed binary random variables and are mapped to u_{A_m}. The secret and randomization bits among these information bits are denoted by V_m and V_r, respectively. Frozen bits are provided both to the main receiver and to the eavesdropper at each block. (We omit writing this side information below, as the all-zero vector can be chosen as the frozen vector for the erasure channel [8].) Note that Alice and Bob do not know the length of V_m^{(i)} at fading block i. In particular, there may not be any secured bits at a given fading block.
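To see how the per-block advantage accumulates, one can estimate the average per-symbol secret rate E[[I(W_m) − I(W_e)]^+] by Monte Carlo. The sketch below assumes, purely for illustration, that the erasure probabilities are i.i.d. uniform on [0, 1] across blocks (the scheme itself only requires statistical knowledge of some fading distribution):

```python
import random

def average_key_rate(trials=100_000, seed=1):
    """Monte Carlo estimate of the per-symbol key rate E[[I(W_m) - I(W_e)]^+]
    for block-fading BECs; the uniform fading model is an assumption made
    here for illustration only."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        eps_m, eps_e = rng.random(), rng.random()   # per-block erasure probs
        I_m, I_e = 1.0 - eps_m, 1.0 - eps_e         # BEC: I(W) = 1 - eps
        total += max(I_m - I_e, 0.0)                # no advantage -> no key bits
    return total / trials

print(average_key_rate())  # about 1/6 ~ 0.167 under this uniform fading model
```

Even though Eve's channel may be better than Bob's on average, the positive part inside the expectation is what keeps the accumulated key rate strictly positive.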
Considering the resulting information accumulation over a block, we obtain the quantities H(V_m^{(i)}) and H(V_r^{(i)}), where the former denotes the amount of secure information generated at block i (here the secrecy level is the bound on the mutual information leakage rate), and the latter denotes the remaining information; from the previous section, (1/N) H(V_m^{(i)}) approaches [I(W_m^{(i)}) − I(W_e^{(i)})]^+ for large N. Note that these entropies are random variables, as the channels are random over the blocks. Remarkably, this scheme converts the fading phenomenon into an advantage for Alice and Bob (similar to the enabling observation utilized in [12]). Exploiting this observation and coding over LM fading blocks, the proposed scheme below creates an advantage for the main users: as L, M, N get large, the information bits, denoted by W*, are w.h.p. reliably decoded at Bob, while Eve is left with an equivocation about them proportional to E[[I(W_m) − I(W_e)]^+]. This accomplishes both the advantage distillation and information reconciliation phases of a key agreement protocol [13], [14]. Now, a third phase (called privacy amplification) is needed to distill a shorter string K from W*, about which Eve has only a negligible amount of information. The privacy amplification step can be done with universal hashing as considered in [13]. We first state the following definitions and lemma regarding universal hashing, and then formalize the main result of this section in the following theorem.

Definition 8: A class 𝒢 of functions A → B is universal if, for any x₁ ≠ x₂ in A, the probability that g(x₁) = g(x₂) is at most 1/|B| when g is chosen at random from 𝒢 according to the uniform distribution.
There are efficient universal classes; e.g., to map n bits to r bits, the class of linear functions given by r × n matrices needs rn bits to describe [15]. Note that the hash function should have low complexity since 1) it will be revealed to each user, and 2) Alice and Bob will compute g(W*). There are more efficient classes with polynomial-time evaluation complexity and O(n) description complexity [15].
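The r × n binary matrix class mentioned above can be sketched as follows (our own minimal implementation; rows are stored as n-bit integers and g(x) takes parities of row AND x):

```python
import random

def random_linear_hash(n, r, rng):
    """Draw a member of the universal class of GF(2)-linear maps
    {0,1}^n -> {0,1}^r: an r x n binary matrix, rows as n-bit integers."""
    return [rng.getrandbits(n) for _ in range(r)]

def apply_hash(rows, x):
    """g(x): output bit j is the inner product (mod 2) of row j and x."""
    return [bin(row & x).count('1') & 1 for row in rows]

rng = random.Random(7)
g = random_linear_hash(n=16, r=4, rng=rng)
print(apply_hash(g, 0b1010110011100101))  # a 4-bit hash of a 16-bit string
```

For any fixed x₁ ≠ x₂, the collision probability over the random choice of the matrix is exactly 2^{−r}, which is precisely the requirement of Definition 8.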
Generalized privacy amplification, proposed in [13], is based on the following property of universal hashing.
Lemma 9 (Theorem 3 of [13]): Let X ∈ 𝒳 be a random variable with distribution P_X and Rényi entropy (of second order) R(X) = −log₂ E[P_X(X)]. Let G be a random choice (according to the uniform distribution) of a member of a universal class of hash functions 𝒳 → {0, 1}^r, and let Q = G(X). Then, we have

H(Q | G) ≥ r − 2^{r − R(X)} / ln 2.
Exploiting the proposed coding scheme, which creates an advantage in favor of Bob over the fading channel, we use the hash functions described above and obtain the following result.

Theorem 10: For any given ε > 0 and sufficiently large L, M, and N, Alice and Bob can use K = G(W*) as their secret key (here G is chosen uniformly at random from a universal class of hash functions {0, 1}^n → {0, 1}^r, where n and r scale in proportion to E[I(W_m)] and E[[I(W_m) − I(W_e)]^+], respectively), satisfying

Pr{G(W*) ≠ G(Ŵ*)} ≤ ε  and  I(K; Y*_e, G) ≤ ε,

where Y*_e denotes Eve's total received symbols.

Proof: We repeat the described scheme over LM fading blocks. Due to the construction above, we have
(1/N) H(V_m^{(i)} | Y_e^{(i)}) ≥ [I(W_m^{(i)}) − I(W_e^{(i)})]^+ − ε₁,

where ε₁ → 0 as N gets large (follows from the fact that conditioning does not increase entropy and the security of V_m^{(i)}), and

(1/N) H(V_m^{(i)}, V_r^{(i)} | Y_m^{(i)}) ≤ ε₂,

where ε₂ → 0 as N → ∞ (follows from Fano's inequality).
We now consider the total information accumulation and leakage. Let W* denote the collection of information bits transmitted over the LM fading blocks, and denote the estimate of it at Bob as Ŵ*. We obtain that there exist N₁, M₁ s.t. for any N ≥ N₁ and M ≥ M₁, we have, by a union bound over the blocks,

Pr{Ŵ* ≠ W*} ≤ LM O(2^{−N^β}),

for some β ∈ (0, 1/2).
Focusing on a particular super block, omitting the index (l) in (W̄^{(l)}, Ȳ_e^{(l)}), and using (16) and (17) in (20), we obtain

H(W̄ | Ȳ_e) ≥ MN (E[[I(W_m) − I(W_e)]^+] − ε₄ − ε₅),

where ε₄ and ε₅ vanish as M, N get large.
In order to translate H(W*|Y*_e) into Rényi entropy, so as to use Lemma 9 in our problem, we resort to typical sequences, as for a uniform random variable both measures coincide. Considering (W̄^{(1)}, Ȳ_e^{(1)}), …, (W̄^{(L)}, Ȳ_e^{(L)}) as L repetitions of the experiment of super-block random variables (W̄, Ȳ_e), we define the event T based on typical sets as follows [16]: T = 1 if the observed sequences are jointly typical; otherwise, we set T = 0 and denote δ₀ ≜ Pr{T = 0}. Then, by Lemma 6 of [16], as L → ∞, Lδ₀ → 0 and Lδ → 0.
We continue as follows.
where δ* → 0 as M, N → ∞. Thus, for the given ε*, there exist M₂ and N₂ such that the bound holds for M ≥ M₂ and N ≥ N₂, where in (a) δ₀ is s.t. Lδ₀ → 0 as L → ∞, and (b) is due to Lemma 9 given above, together with (24) and the choice of r.
Here, for the given ε > 0, there exist M₃, N₃ s.t. for M ≥ M₃ and N ≥ N₃, the residual term of Lemma 9 satisfies

2^{r − R(W̄^{(1)}, …, W̄^{(L)} | Ȳ_e, T = 1)} / ln 2 ≤ ε/2.

Hence, we obtain
where (a) holds if M ≥ M₂ and N ≥ N₂, and (b) holds if M ≥ M₃ and N ≥ N₃. Now, we choose some M ≥ max{M₁, M₂, M₃}. For this choice of M, we choose sufficiently large L and sufficiently large N such that N ≥ max{N₁, N₂, N₃} and the bound on δ₀ holds, which follows as δ₀L → 0 as L → ∞ in (22). (In fact, due to [16, Lemma 4 and Lemma 6], for any ε′ > 0, we can take δ₀L ≤ ε′ as L gets large.) Therefore, for this choice of L, M, N, we obtain the desired result from (18), (19), and (29). In addition, for this choice of L, M, N, we bound H(K) ≥ r − ε due to (25), which shows that the key is approximately uniform.

A few remarks are now in order.

1) Existing code designs in the literature and the previous section of this work assume that Eve's channel is known at Alice and Bob. In the above scheme, Alice and Bob only need statistical knowledge of the eavesdropper CSI. Also, the main channel is not necessarily stronger than the eavesdropper channel, which is not the case for degraded wiretap settings.
2) The above scheme can be used for the wiretap channel of Section IV by setting M = 0 to achieve strong secrecy (assuring arbitrarily small information leakage) instead of the weak notion (making the leakage rate small). See also [16].
3) The results can be extended to arbitrary binary-input channels along the same lines, using the result of Section IV. In such a setting, the above theorem would be reformulated with n = LMN(E[I(W_m)] − ε*) and r = LMN(E[[I(W_m) − I(W_e)]^+] − ε*). However, the code construction complexity for such channels may not scale as well as that of the erasure channels [8].
In this work, we considered polar coding for binary-input DMCs with a degraded eavesdropper. We showed that polar coding can be utilized to achieve non-trivial secrecy rates for this set of channels. The results might be extended to arbitrary discrete memoryless channels using the techniques given in [17]. The second focus of this work was secret key agreement over fading channels, where we showed that Alice and Bob can create an advantage over Eve by using the polar coding scheme at each fading block, which is then exploited with privacy amplification techniques to generate keys. This result is interesting in the sense that part of the key agreement protocol is established information-theoretically over fading channels while only requiring statistical knowledge of the eavesdropper CSI at the users.
We acknowledge the concurrent work [9], which independently established the result that polar codes can achieve the secrecy capacity of degraded wiretap channels when both the main and eavesdropper channels are binary-input and symmetric (Corollary 7 of this note).
(Footnote 2: Remarkably, a random walk model with packet erasures can be covered by this model. Also, a parallel channel model is equivalent to this scenario.)