Analysis of a procedure for inserting steganographic data into VoIP calls

The paper concerns performance analysis of a steganographic method, dedicated primarily for VoIP, which was recently filed for patenting under the name LACK. The performance of the method depends on the procedure of inserting covert data into the str…

Authors: Wojciech Mazurczyk, Jozef Lubacz

Key words – steganography, information hiding, performance analysis

Wojciech MAZURCZYK* and Józef LUBACZ*

The paper concerns performance analysis of a steganographic method, intended primarily for VoIP, which was recently filed for patenting [1] under the name LACK. The performance of the method depends on the procedure of inserting covert data into the stream of audio packets. After a brief presentation of the LACK method, the paper focuses on analysis of the dependence of the insertion procedure on the probability distribution of VoIP call duration.

1. INTRODUCTION

Network steganography is a method of hiding secret data inside the ordinary data transmitted by users so that, in the ideal case, the hidden data cannot be detected by a third party scanning the data flow. A new steganographic method called LACK (Lost Audio PaCKets Steganography) was recently proposed and filed for patenting [1] in Poland. The method is described in some detail in [2]. A detailed review of steganographic methods that may be applied to IP telephony can be found in [2] and [9]. In general, the methods can be classified into the following two groups:

- Steganographic methods that modify packets: network protocol headers or payload fields. Examples of such solutions include modifications of free or redundant header fields of the IP, UDP or RTP protocols during the conversation phase, and of signalling messages in e.g. SIP [10]. Information hiding based on modifying the packets' payload usually uses digital audio watermarking algorithms (e.g. DSSS [13] and QIM [14]).
- Steganographic methods that modify packets' time relations. Examples of such solutions include altering the sequence order of RTP packets [11] and modifying their inter-packet delay [12].
With respect to the two groups above, LACK is a hybrid steganographic method, since it modifies both the packets' content and their time dependencies. In general, the LACK method is intended for a broad class of multimedia, real-time applications, but its main foreseen application (at least for now) is VoIP. The proposed method utilises the fact that for usual multimedia communication protocols like RTP (Real-Time Transport Protocol) [3], excessively delayed packets are not used for reconstruction of the transmitted data at the receiver (the packets are considered useless and discarded).

* Warsaw University of Technology, Faculty of Electronics and Information Technology, Institute of Telecommunications, 15/19 Nowowiejska Str., 00-665 Warsaw, Poland, {W.Mazurczyk, J.Lubacz}@tele.pw.edu.pl

The main idea of LACK is as follows. At the transmitter, some selected audio packets are intentionally delayed before transmission. If the delay of such packets at the receiver is considered excessive, the packets are discarded by a receiver not aware of the steganographic procedure. The payload of the intentionally delayed packets is used to transmit secret information to receivers aware of the procedure, so no extra packets are generated. For unaware receivers the hidden data is "invisible".

The effectiveness of LACK depends on many factors, such as the details of the communication procedure (in particular the type of codec used, the size of the voice frame, the size of the receiving buffer, etc.) and the network QoS (packet delay and packet loss probability).

No real-world steganographic method is perfect: whatever the method, the hidden information can potentially be discovered. In general, the more hidden information is inserted into the data stream, the greater the chance that it will be detected, e.g. by scanning the data flow or by some other steganalysis method.
Moreover, the more audio packets are used to send covert data, the greater the deterioration of the quality of the VoIP connection. Thus the procedure of inserting hidden data should be carefully chosen and controlled in order to minimize the chance of detecting the inserted data and to avoid excessive deterioration of the QoS.

To avoid excessive deterioration of the QoS, the lost packet ratio must be kept below a certain acceptable level. This level depends on the speech codec used. For example, according to [8], the maximum loss tolerance is 1% for G.723.1, 2% for G.729A and 3% for G.711 codecs. If a special mechanism for dealing with lost packets at the receiver is utilized, such as PLC (Packet Loss Concealment) [15], the acceptable level of lost packets increases, e.g. for the G.711 codec from 3% to 5%. This value provides an upper limit for the hidden transmission rate. In general, the amount of steganographic data that can be sent using LACK depends on the acceptable level of packet loss. For example, for the G.711 speech codec with a data rate of 64 kbit/s and a frame size of 20 ms, if the packet loss probability introduced for LACK purposes is 0.5%, then, provided that the total packet loss does not exceed the acceptable level, the theoretical hidden communication rate is 320 b/s (0.5% of 64 kbit/s).

In the present paper we shall focus on the hidden data insertion rate IR [bits/s]. Obviously, IR depends on the amount of hidden data to be sent and on the call duration. In principle, the call duration may be adjusted to the amount of hidden data to be sent. This, however, could make the duration distribution of calls applying LACK differ from that of LACK-less calls, and as a consequence make LACK vulnerable to statistical steganalysis based on call duration. Thus, rather than adjusting the call duration to the amount of hidden data to be sent, it is preferable to adjust the hidden data insertion rate IR to the duration distribution of LACK-less calls.
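The 320 b/s figure quoted for G.711 in the example above can be reproduced with a few lines of arithmetic. The Python sketch below is only an illustration: the function name and its parameters are hypothetical and do not come from the LACK specification, and it assumes that every packet diverted for LACK carries a full voice frame worth of covert payload.

```python
def lack_hidden_rate_bps(codec_rate_bps: float, frame_ms: float, lack_fraction: float) -> float:
    """Theoretical hidden rate if a fraction of voice packets carries covert payload.

    Hypothetical helper: assumes each diverted packet carries a full frame of payload.
    """
    payload_bits = codec_rate_bps * frame_ms / 1000.0  # voice payload bits per packet
    packets_per_second = 1000.0 / frame_ms
    return payload_bits * packets_per_second * lack_fraction


# G.711 example from the text: 64 kbit/s, 20 ms frames, 0.5% of packets used by LACK
rate = lack_hidden_rate_bps(64_000, 20, 0.005)
bits_per_packet = 64_000 * 20 / 1000.0   # 1280 bits (160 bytes) of payload per packet
seconds_between = (20 / 1000.0) / 0.005  # average spacing between LACK packets, in seconds
print(f"{rate:.0f} bit/s, {bits_per_packet:.0f} bits per packet, "
      f"one LACK packet every {seconds_between:.0f} s on average")
```

Under these assumptions the frame size cancels out of the rate, which reduces to the codec rate multiplied by the fraction of packets used. The frame size only sets the granularity: with 20 ms frames, one packet in 200 is diverted.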
Adjusting IR to the call duration distribution of LACK-less calls requires making IR dependent on that distribution. Obviously, this is not an important question in the case of sporadic LACK use; it becomes important in the case of a predefined group of frequent LACK users. In the present paper we focus on such a case. Moreover, in the presented analysis we consider the dependence of the IR of a particular call on the elapsed time of that call, i.e. we consider an IR that can (potentially) be made time dependent, adjusted to the foreseen residual call duration. As shown in our analysis, such a time-dependent IR procedure potentially allows the IR to be decreased during the call, compared to the IR at call initiation time. In effect, the negative influence of LACK on QoS can be decreased. Such an effect can be achieved for call duration distributions with a coefficient of variation greater than 1; available experimental data concerning VoIP call duration distributions seem to indicate that this is a realistic assumption for real-life VoIP calls.

It should be emphasised that the LACK procedure introduced in this paper can be utilized by legitimate LACK users who use their own VoIP calls to exchange covert data, but also by intruders who are able to covertly send data using third-party VoIP calls (e.g. as a result of earlier successful attacks using trojans or worms, or by distributing a modified version of popular VoIP software). This is a usual tradeoff requiring consideration in a broader steganography context, which is beyond the scope of this paper.

2. THE VoIP CALL DURATION DISTRIBUTION

For PSTN the call duration probability distribution was well known from extensive experimental research. For many decades an exponential distribution was assumed to be a good enough approximation for engineering purposes. VoIP is a relatively new service and thus only a small amount of reliable experimental data is available, so in many research papers concerning IP voice traffic (e.g. [4], [5], [6]) an exponential call duration is still assumed. Recent measurements show, however, that this assumption is far from realistic. Birke et al. [7] captured real VoIP traffic traces (about 150 000 calls) from FastWeb, an Italian telecom operator. The obtained call duration probability distribution is reproduced in Fig. 1 with a solid line. To illustrate qualitatively the degree to which the experimental results differ from the exponential distribution and some other chosen distributions (hyperexponential and log-normal), these were drawn with broken lines in Fig. 1. As can be seen, the differences are considerable and no straightforward approximation of the experimental data with standard distributions is available. In particular, the exponential distribution is far from being realistic. The experimental data yield an average call duration E(D) = 117.31 s and a standard deviation σ(D) = 278.74 s, thus the coefficient of variation is C_v = σ(D)/E(D) = 2.37 (for the exponential distribution C_v = 1).

To achieve an analytic approximation of the experimental data, a combination of some standard distributions can be used, for example:

$$
f_D(x) =
\begin{cases}
\dfrac{1}{1.55\,x\sqrt{2\pi}}\; e^{-\frac{(\ln(x) - 3.8)^2}{4.805}} & \text{for } 0 < x < 27.5 \\[2ex]
0.000114\,e^{-0.00114x} + 0.027252\,e^{-0.03028x} & \text{for } 27.5 \le x < 66.5 \\[2ex]
\dfrac{1}{1.55\,x\sqrt{2\pi}}\; e^{-\frac{(\ln(x) - 3.8)^2}{4.805}} & \text{for } 66.5 \le x \le 455
\end{cases}
\tag{1}
$$

The above analytic approximation is quite complex and of little practical use for our purposes, i.e. for establishing the dependence of the insertion rate IR on some simple enough characterization of the call duration distribution. Our guess was that this can be achieved by characterizing a considerably wide range of call duration distribution types with the coefficient of variation C_v and then expressing the IR through C_v.

Fig. 1. VoIP call duration – comparison of experimental data with selected probability distributions [plot of f_D(x) versus call duration x [s]; curves: Experimental (C_v = 2.37), Log-normal (C_v = 3.17), Hyper-exponential (C_v = 2.37), Exponential (C_v = 1)]

A reasonably wide range of call duration distribution types can be achieved and effectively analyzed with the 2-parameter Weibull distribution, with appropriately chosen parameters: the shape parameter k > 0 and the scale parameter λ > 0. The complementary cumulative probability distribution function $\bar F_D$ and the probability density function $f_D$ are as follows:

$$
\bar F_D(x; k, \lambda) = e^{-(x/\lambda)^k}, \qquad
f_D(x; k, \lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}
\tag{2}
$$

The λ parameter was set so as to achieve the above experimental average call duration E(D) = 117.31 s, and the k parameter was varied so as to obtain a wide range of C_v values. The analyzed values are summarized in Table 1.

k      λ        C_v
3.4    130.57   0.32
2      132.37   0.52
1.2    124.71   0.84
1      117.31   1
0.8    103.54   1.26
0.6    77.97    1.76
0.5    58.65    2.23
0.4    35.3     3.14

Table 1. Chosen Weibull distribution parameters k and λ and corresponding C_v values.

In Fig. 2 the Weibull probability distribution is depicted for the parameters from Tab. 1 to illustrate the resulting wide range of distribution shapes. Note that for k = 1 the Weibull distribution equals the exponential distribution (C_v = 1), for k = 2 it becomes the Rayleigh distribution (C_v = 0.52) and for k = 3.4 it resembles the normal distribution (C_v = 0.32).

Fig. 2. Weibull distribution for various k, λ and C_v [plot of f_D(x; k, λ) versus call duration x [s]; curves for k = 3.4, 2, 1.5, 1.2, 1, 0.8, 0.6, 0.5 and 0.4, with C_v = 0.32, 0.52, 0.68, 0.84, 1, 1.26, 1.76, 2.23 and 3.14 respectively]

3. TIME DEPENDENT INSERTION RATE

For an arbitrary instant of a call, the average residual call duration E(R) is well known to be equal to

$$E(R) = \frac{E(D^2)}{2E(D)} \tag{3}$$

or equivalently

$$E(R) = \frac{1 + C_v^2}{2}\,E(D) \tag{4}$$

Suppose that at the beginning of a call the insertion rate is set to IR = S/E(D), where S is the amount of steganographic data to be sent covertly. As mentioned in section 1, if C_v > 1, and thus E(R) > E(D), which seems to be the case for VoIP calls as indicated above, then beginning from some arbitrary instant of the call we may decrease the insertion rate to IR = S/E(R), which is beneficial from the point of view of QoS and resistance to detection of the hidden data.

The above indicates that it is reasonable to make the insertion rate dependent on the elapsed time of a call. It is nevertheless not practical to use the classical definition of residual call duration, since it involves an arbitrary time instant and not the current call duration. We are rather interested in the expected call duration on condition that the call has already lasted t units of time, E(D|D>t), which is given by (5) and from which the bounds (6) follow. For the Weibull distributions considered in the previous section these become (7) and (8). For the parameters from Tab. 1 we obtain the results shown in Fig. 3. The figure also illustrates the E(D|D>t) function for the experimental data presented in the previous section.

The curves from Fig. 3 may be approximated with good accuracy by the simple expression (9) in t and C_v. Using this simple approximation we may establish the time-dependent insertion rate we were looking for. Suppose that the amount of remaining steganographic data to be sent at time t is S_R(t). Then the insertion rate at time t may be expressed as in (10), i.e. as S_R(t) divided by E(D|D>t). Finally, we may modify the above IR(t) with a correction factor CF < 1 to reflect the fact that the LACK procedure may decrease to some extent the QoS of the speech transmission, and also to take account of the required robustness to steganalysis: IR*(t) = CF · IR(t).
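The quantities used in this section can be evaluated numerically. The following Python sketch is only an illustration under stated assumptions: all function names are hypothetical, the integral in the Weibull form of E(D|D>t) is evaluated exactly by a plain Simpson rule (an editorial choice, not the authors' approximation), and the remaining amount of data S_R(t) is simply supplied by the caller.

```python
import math

MEAN_CALL_S = 117.31  # experimental mean call duration E(D) quoted in the text, in seconds


def weibull_scale(mean: float, k: float) -> float:
    """Scale λ that yields the requested mean, using E(D) = λ·Γ(1 + 1/k)."""
    return mean / math.gamma(1.0 + 1.0 / k)


def weibull_cv(k: float) -> float:
    """Coefficient of variation of a Weibull distribution (it depends on k only)."""
    g1 = math.gamma(1.0 + 1.0 / k)
    g2 = math.gamma(1.0 + 2.0 / k)
    return math.sqrt(g2 / (g1 * g1) - 1.0)


def expected_duration_given_elapsed(t, k, lam, steps=6000, u_max=60.0):
    """E(D | D > t) for a Weibull(k, λ) call duration, with t in seconds.

    Starts from t + e^{(t/λ)^k} ∫_t^∞ e^{-(x/λ)^k} dx and substitutes
    u = (x/λ)^k - (t/λ)^k, which gives t + (λ/k) ∫_0^∞ (u + u0)^{1/k - 1} e^{-u} du.
    The integral is truncated at u_max and evaluated with Simpson's rule
    (steps must be even); this is a crude quadrature chosen for brevity.
    """
    if t <= 0.0:
        return lam * math.gamma(1.0 + 1.0 / k)  # E(D | D > 0) = E(D)
    u0 = (t / lam) ** k
    h = u_max / steps

    def g(u):
        return (u + u0) ** (1.0 / k - 1.0) * math.exp(-u)

    total = g(0.0) + g(u_max)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * g(i * h)
    return t + (lam / k) * total * h / 3.0


def insertion_rate(s_remaining_bits, t, k, lam):
    """IR(t) = S_R(t) / E(D | D > t), in bits per second."""
    return s_remaining_bits / expected_duration_given_elapsed(t, k, lam)


if __name__ == "__main__":
    for k in (2.0, 1.0, 0.5):
        lam = weibull_scale(MEAN_CALL_S, k)
        print(f"k={k}: lambda={lam:.2f} s, Cv={weibull_cv(k):.2f}, "
              f"E(D|D>60 s)={expected_duration_given_elapsed(60.0, k, lam):.1f} s, "
              f"IR(60 s)={insertion_rate(1000.0, 60.0, k, lam):.2f} bit/s")
```

Two closed forms are useful sanity checks for such a sketch: for k = 1 the result must equal t + λ (the memoryless exponential case), and for k = 0.5 it equals t + 2λ(1 + sqrt(t/λ)). The quadrature loses accuracy for k > 1 when t is very close to zero, where the integrand becomes steep.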
The correction factor is, however, a very simplified solution of the problem.

The equations (5)–(10) referred to above read as follows:

$$E(D \mid D > t) = \frac{1}{P(D > t)} \int_t^\infty x\,f_D(x)\,dx = t + \frac{1}{\bar F_D(t)} \int_t^\infty \bar F_D(x)\,dx \tag{5}$$

$$t \le E(D \mid D > t) \le \frac{E(D)}{\bar F_D(t)} \tag{6}$$

$$E(D \mid D > t) = t + e^{(t/\lambda)^k} \int_t^\infty e^{-(x/\lambda)^k}\,dx \tag{7}$$

$$t \le E(D \mid D > t) \le \lambda\,\Gamma\!\left(1 + \frac{1}{k}\right) e^{(t/\lambda)^k} \tag{8}$$

(9) [approximation of E(D|D>t) as a simple function of t and C_v, involving the constants 1.32 and 0.59; its exact form is not recoverable from this copy]

$$IR(t) = \frac{S_R(t)}{E(D \mid D > t)} \tag{10}$$

Fig. 3. E(D|D>t) for different Weibull distributions [plot of E(D|D>t) [min] versus t [min]; curves: Weibull with k = 3.4, 2, 1.2, 1, 0.8, 0.6, 0.5 and 0.4 (C_v = 0.32, 0.52, 0.84, 1, 1.26, 1.76, 2.23 and 3.14), and Experimental (C_v = 2.37)]

Based on the results presented in Fig. 3 and equation (10), the IR(t) values for the chosen Weibull distributions are as presented in Fig. 4.

Fig. 4. IR(t) for different Weibull distributions for S = 1000 bits [plot of IR(t) [bit/s] versus t [s]; curves: Weibull with k = 3.4, 2, 1.2, 1, 0.8, 0.6, 0.5 and 0.4 (C_v = 0.32, 0.52, 0.84, 1, 1.26, 1.76, 2.23 and 3.14)]

As can be seen, call duration distributions with higher C_v yield higher IR values. In effect, distributions with higher C_v allow more steganographic data to be transmitted.

4. CONCLUSIONS AND FUTURE WORK

The LACK steganographic method is a new idea which requires detailed performance evaluation. This paper is only an initial step in this direction. We have focused on only one aspect, namely the dependence of the procedure for inserting hidden data on the call duration probability distribution.
It was shown that the insertion rate may be effectively made dependent on the current call duration time, and that this dependence can be expressed with good accuracy through the coefficient of variation of the call duration probability distribution. The derived formulae are simple and can be straightforwardly implemented. The effectiveness of the resulting procedure will depend on the accuracy of the estimated mean call duration and of the coefficient of variation of the call duration.

The proposed procedure was made as simple as possible. A more sophisticated version of the procedure would require more detailed information about the call duration probability distribution, which might be too demanding considering the current limited experience with VoIP traffic. Nevertheless, theoretical research seems worthwhile. The authors have analyzed the problem of expressing the insertion rate function IR(t) through the P(D>x|D>t) distribution instead of the E(D|D>t) function which was considered in the present paper. The results will be presented in a future paper.

Another task is to take into account the dependence of the IR(t) function on constraints implied by QoS requirements and by the required resistance of LACK to steganalysis. In the present paper we have practically not considered this problem, apart from introducing a correction factor into the IR(t), which is clearly only an indication of the problem.

ACKNOWLEDGMENTS

• This research was partially supported by the Ministry of Science and Higher Education, Poland (grant no. 3968/B/T02/2008/34).
• The authors would like to thank R. Birke, M. Mellia, M. Petracca and D. Rossi from Politecnico di Torino for sharing details of their VoIP experimental data.

REFERENCES

[1] MAZURCZYK W., SZCZYPIORSKI K., Sposób steganograficznego ukrywania i przesyłania danych w sieci telekomunikacyjnej [A method of steganographic hiding and transmission of data in a telecommunication network], Patent Application no. P-384940, 15 April 2008
[2] MAZURCZYK W., SZCZYPIORSKI K., Steganography of VoIP Streams, URL: http://www.arxiv.org/abs/0805.2938
[3] SCHULZRINNE H., CASNER S., FREDERICK R., JACOBSON V., RTP: A Transport Protocol for Real-Time Applications, IETF, RFC 3550, July 2003
[4] CHOI Y., LEE J., KIM T. G., LEE K. H., Efficient QoS Scheme for Voice Traffic in Converged LAN, Proceedings of International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS'03), July 20-24, 2003, Montreal, Canada.
[5] MILOUCHEVA I., NASSRI A., ANZALONI A., Automated Analysis of Network QoS Parameters for Voice over IP Applications, D41 – 2nd Inter-Domain Performance and Simulation Workshop (IPS 2004).
[6] BARTOLI M., et al., Deliverable 19: Evaluation of Inter-Domain QoS Modelling, Simulation and Optimization, INTERMON-IST-2001-34123, URL: http://www.ist-intermon.org/overview/im-wp5-v100-unibe-d19-pf.pdf
[7] BIRKE R., MELLIA M., PETRACCA M., ROSSI D., Understanding VoIP from Backbone Measurements, 26th IEEE International Conference on Computer Communications (INFOCOM 2007), 6-12 May 2007, pp. 2027-35, ISBN 1-4244-1047-9
[8] NA S., YOO S., Allowable Propagation Delay for VoIP Calls of Acceptable Quality, In Proc. of First International Workshop, AISA 2002, Seoul, Korea, August 1-2, 2002, LNCS, Springer Berlin / Heidelberg, Volume 2402/2002, pp. 469-480, 2002
[9] MAZURCZYK W., SZCZYPIORSKI K., Covert Channels in SIP for VoIP signalling, In: Hamid Jahankhani, Kenneth Revett, and Dominic Palmer-Brown (Eds.): ICGeS 2008 - Communications in Computer and Information Science (CCIS) 12, Springer Verlag Berlin Heidelberg, Proc. of 4th International Conference on Global E-security 2008, London, United Kingdom, pp. 65-72, June 2008
[10] HANDLEY M., SCHULZRINNE H., ROSENBERG J., SIP: Session Initiation Protocol, IETF RFC 3261, June 2002
[11] BERK V., GIANI A., CYBENKO G., Detection of Covert Channel Encoding in Network Packet Delays, Tech. Rep. TR2005-536, Department of Computer Science, Dartmouth College, Nov. 2005, URL: http://www.ists.dartmouth.edu/library/149.pdf
[12] VENKATRAMAN B. R., NEWMAN-WOLFE R. E., Capacity Estimation and Auditability of Network Covert Channels, Proc. IEEE Symp. Security and Privacy, May 1995, pp. 186–98.
[13] COX I., KILIAN J., LEIGHTON F., SHAMOON T., Secure spread spectrum watermarking for multimedia, IEEE Transactions on Image Processing 6(12): 1997, pp. 1673–1687.
[14] CHEN B., WORNELL G. W., Quantization index modulation: A class of provably good methods for digital watermarking and information embedding, IEEE Trans. Info. Theory, vol. 47, no. 4, pp. 1423–1443, May 2001.
[15] ITU-T Recommendation, G.711: Pulse code modulation (PCM) of voice frequencies, November 1988.
