A Molecular Model for Communication through a Secrecy System

Codes have been used for centuries to convey secret information. To a cryptanalyst, the interception of a code is only the first step in recovering a secret message. Deoxyribonucleic acid (DNA) is a biological and molecular code. Through the work of Mar…

Authors: O. Okunoye Babatunde

Okunoye Babatunde O.
Department of Pure and Applied Biology, Ladoke Akintola University of Technology, P.M.B 4000 Ogbomoso, Nigeria.
Email: babatundeokunoye@yahoo.co.uk

Codes have been used for centuries to convey secret information. To a cryptanalyst, the interception of a code is only the first step in recovering a secret message. Deoxyribonucleic acid (DNA) is a biological and molecular code. Through the work of Marshall Nirenberg and others, DNA is now understood to specify amino acids in triplet codes of bases. The possibility of DNA encoding secret information in a natural language is explored, since a code is expected to have a distinct mathematical solution.

Key Words: Molecular, Communication, Secrecy

INTRODUCTION

The problems of cryptography and secrecy systems furnish an interesting application of communication theory (Shannon, 1948). A secrecy system can be defined as a single transformation on a language of the form $T_1 M = E$, where $T_1$ = transformation, $M$ = Message and $E$ = Cryptogram (Shannon, 1949). A code carrying secret information is expected to have a distinct mathematical solution. Codes have extensive military applications, and the breaking of the Enigma code was pivotal in the Allied war effort in the Second World War.

Marshall Nirenberg (Nirenberg & Matthaei, 1961; Nirenberg & Leder, 1964) and co-workers at the National Institutes of Health, Bethesda, Maryland, U.S.A., between the years 1961-1964 put forward a biochemical solution to the genetic code. We now understand that the four bases of DNA are arranged as 64 triplet codons, which in turn specify 20 amino acids (Brock & Madigan, 1991). A mathematical solution to the genetic code is proposed in this paper, using the virus Bacteriophage T4 as a model.

METHODS

The numbers of Adenine, Thymine, Guanine and Cytosine residues were counted in 3,183 turns of bases in the genome of Bacteriophage T4, in the 5' to 3' direction (i.e. from base 168900 backwards). This represents 31,830 bases. Bacteriophage T4 and its genome sequence represent the best understood model for functional genomics and proteomics (Miller et al., 2003). The Bacteriophage T4 base sequence was obtained from GenBank with accession number AF158101.

There are 10 bases per turn of the DNA helix (Brock & Madigan, 1991). Therefore the numbers of Adenine, Thymine, Guanine and Cytosine, which per turn of the helix add up to 10, will have different permutations. For example: Adenine 1, Thymine 2, Guanine 3, Cytosine 4. Another example is Adenine 0, Thymine 5, Guanine 0, Cytosine 5. Throughout the course of the investigation involving 3,183 turns of bases, no individual base occurred more than 8 times. The number of any one base per turn ranged over 0, 1, 2, 3, … 8. A base which did not appear in a turn was recorded as 0. Thus we have 230 permutations of the numbers 0, 1, 2, 3, … 8 which add up to 10. These 230 permutations can be grouped into 21 groups based on the numbers which constitute them. For example, the permutations 0055, 0505, 5005, 0550, 5500, 5050 are grouped as one group. The frequencies and probabilities of each number group in 3,183 turns of T4 phage DNA were calculated.

Assuming that the genetic code is producing English text, the probabilities of English letters were calculated in 3,183 letters chosen from chapter 5 of Wuthering Heights (Brontë, 1965), a classical English novel. Considerable reductions in text are possible in the English language without losing information, due to the statistical nature of the language, the high frequency of certain words, etc. (Shannon, 1949). This property of the English language, whereby certain letters can be omitted without losing the information conveyed, is called Redundancy (Shannon, 1949). Redundancy is of central importance in the study of secrecy systems (Shannon, 1949). The letters C, Q, V, X, Z were thus omitted. The frequency histograms of the number groups and of the English letters show similarities. A simple substitution is then established by replacing each number group with an English letter, in order of increasing probabilities.

| S/N | Number Group | Number of Permutations | Frequency | Probability |
|---|---|---|---|---|
| 1 | 0055 (X18) | 6 | 5 | 0.0016 |
| 2 | 0028 (X19) | 12 | 7 | 0.0022 |
| 3 | 0118 (X21) | 12 | 11 | 0.0035 |
| 4 | 1117 (X14) | 4 | 15 | 0.0047 |
| 5 | 0037 (X17) | 12 | 16 | 0.0050 |
| 6 | 0046 (X20) | 12 | 23 | 0.0072 |
| 7 | 0226 (X16) | 12 | 63 | 0.0198 |
| 8 | 0127 (X3) | 16 | 76 | 0.0239 |
| 9 | 0136 (X10) | 16 | 109 | 0.0343 |
| 10 | 1144 (X13) | 6 | 119 | 0.0374 |
| 11 | 0244 (X6) | 12 | 135 | 0.0424 |
| 12 | 0145 (X1) | 16 | 140 | 0.0440 |
| 13 | 1126 (X8) | 12 | 144 | 0.0453 |
| 14 | 0334 (X12) | 12 | 145 | 0.0456 |
| 15 | 1333 (X9) | 4 | 181 | 0.0569 |
| 16 | 2224 (X15) | 4 | 188 | 0.0591 |
| 17 | 1135 (X7) | 12 | 195 | 0.0613 |
| 18 | 0235 (X11) | 16 | 225 | 0.0707 |
| 19 | 1225 (X4) | 12 | 290 | 0.0911 |
| 20 | 2233 (X2) | 6 | 295 | 0.0927 |
| 21 | 1234 (X5) | 16 | 800 | 0.2514 |
| Total | | 230 | 3183 | 0.9988 |

Figure 1a. Table showing the number groups, the number of their permutations, their frequencies and probabilities in 3,183 turns of T4 Phage Genome.

| S/N | Letter | Frequency | Probability |
|---|---|---|---|
| 1 | J | 5 | 0.0016 |
| 2 | K | 26 | 0.0082 |
| 3 | P | 52 | 0.0163 |
| 4 | F | 53 | 0.0167 |
| 5 | G | 69 | 0.0217 |
| 6 | Y | 70 | 0.0220 |
| 7 | B | 83 | 0.0261 |
| 8 | W | 86 | 0.0270 |
| 9 | M | 91 | 0.0286 |
| 10 | U | 92 | 0.0289 |
| 11 | L | 131 | 0.0412 |
| 12 | D | 149 | 0.0468 |
| 13 | R | 183 | 0.0575 |
| 14 | O | 219 | 0.0688 |
| 15 | S | 221 | 0.0694 |
| 16 | N | 235 | 0.0738 |
| 17 | T | 239 | 0.0751 |
| 18 | H | 253 | 0.0795 |
| 19 | I | 253 | 0.0795 |
| 20 | A | 291 | 0.0914 |
| 21 | E | 382 | 0.1200 |
| Total | | 3183 | 1.0001 |

Figure 1b. Frequencies and probabilities of English letters out of 3,183 letters of an English text (Wuthering Heights). In the case of the letters H and I, which both occur 253 times and therefore have the same probability, the consonant is placed before the vowel.

[Graph: Frequency graph of 3,183 letters of chapter 5 of Wuthering Heights. The letters are arranged in the order they appear in the text: I N T H E O U R S F M A W B G L D Y P K J. Vertical axis: frequency, 0 to 450.]

[Graph: Frequency graph of number groups in 3,183 turns of T4 phage DNA. The number groups are written in the order they appear in T4 phage DNA in the 5' to 3' direction: X1 to X21. Vertical axis: frequency, 0 to 900.]

| S/N | Number group | English Letter |
|---|---|---|
| 1 | 0055 | J |
| 2 | 0028 | K |
| 3 | 0118 | P |
| 4 | 1117 | F |
| 5 | 0037 | G |
| 6 | 0046 | Y |
| 7 | 0226 | B |
| 8 | 0127 | W |
| 9 | 0136 | M |
| 10 | 1144 | U |
| 11 | 0244 | L |
| 12 | 0145 | D |
| 13 | 1126 | R |
| 14 | 0334 | O |
| 15 | 1333 | S |
| 16 | 2224 | N |
| 17 | 1135 | T |
| 18 | 0235 | H |
| 19 | 1225 | I |
| 20 | 2233 | A |
| 21 | 1234 | E |

Figure 2. A simple substitution table.

RESULTS

When the substitution is effected, 3,183 English letters are obtained. The probable word method (Shannon, 1949) is used to recover any secret message in the English text produced by substitution. The 'probable words' may be words or phrases expected in the particular message due to its source, or they may merely be common words or syllables which occur in any text in the language, such as the, and, tion, that, and the like in English (Shannon, 1949).

Using the probable word method did not exactly yield a message, yet it was discovered that over 300 English words could be spelt out from the text. Phrases occur occasionally, and it was possible to make out a few sentences from words adjacent to each other or closely aggregated. Reconstructions are necessary to bring out the sense in the sentences. The only sentence found not needing reconstruction was HO A SEAL, which is an exclamation: HO A SEAL! Hundreds of words and phrases were found in the text without reconstruction; nevertheless, if reconstructions are applied to words and phrases, the word and phrase count will be considerably increased.

| S/N | Words | Phrases |
|---|---|---|
| 1 | IT | 'BE HIM' |
| 2 | HE | 'A SEA' |
| 3 | IS | 'IT DIE' |
| 4 | ME | 'LET TWO' |
| 5 | US | 'I READ' |
| 6 | THEIR | 'AN HAT' |
| 7 | DEN | 'AS IS' |
| 8 | SUN | 'SEE A SANE' |
| 9 | MEN | 'A RIDE' |
| 10 | ROB | 'LORE DEN' |
| 11 | LIE | 'USE LEG' |
| 12 | SEED | 'END IN' |
| 13 | AREA | 'HER HAT' |
| 14 | HEAL | 'SELL BEAR' |
| 15 | WEST | 'BE OR' |
| 16 | STAR | 'I SUE' |
| 17 | ELITE | 'NUNS ARE' |
| 18 | HEARD | 'A REAL' |
| 19 | TENET | 'LET TWO' |
| 20 | BERTH | 'TEA SEED' |

Sentences (text as found = reconstruction):
1. HO A SEAL.
2. I EE TIRE = I TIRE.
3. WE RRH HE EAT EE A TREE = WE EAT A TREE / HE EAT A TREE.
4. USE O HE A NE SEA = HE USE A SEA.
5. HE A IS RR WOE = HE IS A WOE / HE IS WOE.
6. WOE E HIT Y A IT = WOE HIT IT.
7. A W NUN HE E SEE = A NUN HE SEE(S).
8. TOE LET AU TIMS RTRI HEAL = LET TIM'S TOE HEAL.
9. HEAL DI US = HEAL US.
10. SO E LOAD ET LET TWO = SO LET TWO LOAD.
11. I READ I H T SIR = I READ (IT) SIR.
12. IT T FIT HUE = HUE IT FIT.
13. RED EROS H BE B ME = RED EROS BE ME / RED ROSE BE ME.
14. YES HH HE ERED = YES HE ERRED.
15. HE A SEE ASA NET = HE SEE(S) A NET / HE SEES AS A NET.

Figure 3a. Some words, phrases and sentences found in the text.

| S/N | Words and Phrases | Reconstruction |
|---|---|---|
| 1 | TWROH | THROW |
| 2 | REAML | REALM |
| 3 | STARDMO | STARDOM |
| 4 | SEDN | SEND |
| 5 | SEALDE | SEALED |
| 6 | EROS† | ROSE |
| 7 | EAEST | EAST |
| 8 | ADEEENOSINE | ADENOSINE |
| 9 | HEAEEARD | HEARD |
| 10 | SEEED | SEED |
| 11 | NEEAT | NEAT |
| 12 | HEEAT | HEAT / HE EAT |
| 13 | TEEEN | TEN / TEEN |
| 14 | HEAET | HEAT / HE ATE |
| 15 | ROTEN | ROTTEN |
| 16 | ISAUED AT | ISSUED AT |
| 17 | INHELAED | IN HE LAID |
| 18 | SUNGER SOLD | SINGER SOLD |
| 19 | SADEN | SADDEN |
| 20 | BEBLE | BIBLE |

Figure 3b. Some additional words and phrases by reconstruction.
† EROS is a Latin word in its own right, meaning love.

DISCUSSION

In substituting the number groups with English letters, the assumption made was that the genetic code is producing English text. For the 3,183 turns of T4 phage DNA investigated, no individual base occurred more than 8 times. This produces 21 number groups. Had any of the bases exceeded 8, e.g. 9, an additional number group consisting of the permutations of the numbers 0, 0, 1, 9 would be added, making 22. Had any of the bases exceeded 9, e.g. 10, the number groups would total 23, because of the permutations of the numbers 0, 0, 0, 10.

It remains to be shown whether a similar result can be obtained with other languages, especially those with a number of letters near 21, for example Hebrew with 22 letters.

Counting of number groups and substitution with English letters was done in the 5' to 3' direction, i.e. starting from the last turn of T4 phage DNA (168900-168891) backwards. A similar result might be obtained in the 3' to 5' direction.

Finally, in choosing which of the English letters to omit, the author did not rely solely on the property of Redundancy (D). For example, the letter u usually follows q in English words, so the u can be omitted without loss (Shannon, 1949). Yet the letter u was not omitted. It can be said that intuition played a role in choosing what to include and omit. It also remains to be seen whether a similar result can be obtained by omitting a different set of letters of the English alphabet.

References

1. Brock, T. D., Madigan, M. T. (1991). Biology of Micro-organisms. (Prentice-Hall: New Jersey).
2. Brontë, E. (1965). Wuthering Heights (The Penguin English Library: Middlesex).
3. Miller, E. S., Kutter, E., Mosig, G., Arisaka, F., Kunisawa, T., Ruger, W. (2003). Bacteriophage T4 Genome. Microbiol. Mol. Biol. Rev. 67, 86-157.
4. Nirenberg, M. W., Leder, P. (1964). RNA code words and Protein Synthesis. Science 145, 1399-1407.
5. Nirenberg, M. W., Matthaei, J. H. (1961). The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides. Proceedings of the National Academy of Sciences of the USA 47, 1588-1602.
6. Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell Syst. Tech. J. 27, 379-423, 623-656.
7. Shannon, C. E. (1949). Communication theory of Secrecy Systems. Bell Syst. Tech. J. 28, 656-715.
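The counting and substitution steps described in METHODS can be sketched in a few lines of Python. This is an illustrative sketch only, not the author's code: the function names (`number_groups`, `turn_groups`, `substitution_table`) are hypothetical, ties in frequency are broken alphabetically (an assumption that happens to place H before I, as in Figure 1b), and the toy sequence at the end is invented for the demonstration. The turns are read in whatever order the sequence is supplied. A direct enumeration reproduces the 21 number groups, and the 22 and 23 groups mentioned in the DISCUSSION, but it yields 270 ordered arrangements rather than the 230 reported: each of the five groups with four distinct digits (0127, 0136, 0145, 0235, 1234) has 4! = 24 orderings, where Figure 1a lists 16.

```python
from collections import Counter
from itertools import product

BASES = "ATGC"
TURN = 10  # bases per helical turn, as stated in METHODS


def number_groups(max_count=8, total=TURN):
    """Count, for each sorted digit group, the ordered (A, T, G, C) tuples that form it."""
    groups = Counter()
    for counts in product(range(max_count + 1), repeat=len(BASES)):
        if sum(counts) == total:
            groups[tuple(sorted(counts))] += 1
    return groups


def turn_groups(sequence):
    """Yield the sorted base-count group of each complete 10-base turn."""
    for start in range(0, len(sequence) - TURN + 1, TURN):
        turn = sequence[start:start + TURN]
        yield tuple(sorted(turn.count(base) for base in BASES))


def substitution_table(group_freq, letter_freq):
    """Pair groups with letters in order of increasing frequency.

    Ties are broken by sort order of the key (an assumption of this sketch).
    """
    groups = sorted(group_freq, key=lambda g: (group_freq[g], g))
    letters = sorted(letter_freq, key=lambda c: (letter_freq[c], c))
    return dict(zip(groups, letters))


groups = number_groups()
print(len(groups), sum(groups.values()))  # 21 270

# Toy demonstration using three frequencies copied from Figures 1a and 1b.
table = substitution_table(
    {(0, 0, 5, 5): 5, (2, 2, 3, 3): 295, (1, 2, 3, 4): 800},
    {"J": 5, "A": 291, "E": 382},
)
toy_sequence = "TTTTTCCCCC" + "AACCGGGTTT" + "ATTGGGCCCC"  # invented example
print("".join(table[g] for g in turn_groups(toy_sequence)))  # JAE
```

Because each group is stored as a sorted tuple, the mapping ignores which base carries which count, which mirrors the grouping of 0055, 0505, 5005 and so on into a single number group.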
