Protein Folding: A New Geometric Analysis

Walter A. Simmons
Department of Physics and Astronomy
University of Hawaii at Manoa
Honolulu, HI 96822

Joel L. Weiner
Department of Mathematics
University of Hawaii at Manoa
Honolulu, HI 96822

7/23/2008

Abstract

A geometric analysis of protein folding, which complements many of the models in the literature, is presented. We examine the process from unfolded strand to the point where the strand becomes self-interacting. A central question is how it is possible that so many initial configurations proceed to fold to a unique final configuration. We put energy and dynamical considerations temporarily aside and focus upon the geometry alone. We parameterize the structure of an idealized protein using the concept of a ribbon from differential geometry. The deformation of the ribbon is described by introducing a generic twisting Ansatz. The folding process in this picture entails a change in shape guided by the local amino acid geometry. The theory is reparameterization invariant from the start, so the final shape is independent of folding time. We develop differential equations for the changing shape. For some parameter ranges, a sine-Gordon torsion soliton is found. This purely geometric waveform has properties similar to those of dynamical solitons. Namely, a threshold distortion of the molecule is required to initiate the soliton, after which small additional distortions do not change the waveform. In this analysis, the soliton twists the molecule until bonds form. The analysis reveals a quantitative relationship between the geometry of the amino acids and the folded form.
1 Introduction

In the more than half century since it was established that the amino acid sequence of a protein molecule determines the unique folded configuration [1], and that unfolded proteins re-fold from some range of initial conditions to the same end state, a wide range of physics and geometry based models have been intensively studied. Recent reviews can be found in [2], [3], and [4], and there are many books that deal with the subject [5], [6], [7].

The fact that proteins fold spontaneously from a range of initial configurations to a unique end state, in spite of the small energy available, is astonishing. In particular, there are many forces involved [2], and each of these is related to a change in shape through some non-linear, non-local (and possibly temperature dependent) material response tensor.

Particularly challenging to our understanding is the initial phase, which ends with the molecule becoming self-interacting. The insensitivity of folding to variation of initial conditions, to noise, and to ambient conditions leads us to conjecture that there is some essentially geometric aspect of folding that guides this initial stage.

In this paper, we put the dynamics, such as [8], and energy landscapes [3] temporarily aside and analyze the geometry of folding using differential geometry, which is the natural mathematical language for describing shapes and changes of shape. Our analysis leads to a set of differential equations which are potentially useful to model-builders. For a limited range of parameters, we solve the equations analytically and find that a torsion-wave soliton emerges. This soliton lives in a space of possible molecular shapes, wherein it describes a twisting deformation, which ultimately stops when the molecule becomes self-interacting. This soliton is different from the various dynamical solitons that have appeared in the literature for some time, e.g. [9], but it has similar characteristics; namely, a threshold for formation, stability against noise, a hierarchy of forms, and a non-linear superposition principle.

Our starting point is a construct known as a ribbon, which is a pair of writhing space curves. One space curve, the base curve, aligns with an average backbone of the molecule. The other curve, the neighboring curve, carries information about the local (amino acid) geometry, especially the location of the side-chains, and writhes about the base curve. (The ribbon has been previously applied to double-stranded DNA [10] and is compatible with the coils and kinks seen in protein molecules.)

We introduce a deformation Ansatz, which is a generic response to any torque. We have constructed the Ansatz so that the theory is reparameterization invariant; the folding is the same no matter how quickly or slowly it happens. Differential geometry then leads us to a set of differential equations, which will be discussed in some generality elsewhere. In this paper we focus upon a subset, which arises by making specific, empirically motivated assumptions about the parameters, and which leads to the soliton.

We discuss the soliton solutions and how they can be accommodated to a structure which is segmented, not continuous. One surprising property of the simplest solution, the antikink, is that the more planar the initial shape of the unfolded molecule, the faster the molecule folds. This property may not generalize to other solutions.

Before presenting our calculations in the next section, we conclude this section by remarking that it would be natural to combine our results with models based upon torsion angle [8] and/or energy landscape.
If the geometrical features described here also appear in models with dynamics, then a step will have been taken toward understanding the insensitivity of folding to initial conditions, to noise, and to environmental factors. The reparameterization invariance of our analysis is also encouraging.

2 Ribbons and their deformations

In this section we discuss ribbons and a certain kind of deformation associated to them. A ribbon can be viewed as a space curve with a field of planes tangent to the curve. It is reasonable for a discussion of ribbons to start with a discussion of the differential geometry of space curves. Thus let $\vec{x} : [0, L] \to E^3$ be an embedded, i.e. one-to-one, curve in Euclidean 3-space given as a function of arc length $s$, where $L$ is the length of the curve $\vec{x}$. We suppose that $\vec{x}$ has enough differentiability so that all that we discuss exists. We may define the unit tangent vector field $\vec{e}_1 = \vec{x}_s$, where the subscript denotes differentiation with respect to $s$. If we assume that the curve $\vec{x}$ is non-degenerate, i.e., that $\vec{x}_s$ and $\vec{x}_{ss}$ are linearly independent at all points of the curve, then we can complete the vector field $\vec{e}_1$ to the Frenet frame $\vec{e}_1, \vec{e}_2, \vec{e}_3$, where $\vec{e}_2$ is the principal normal and $\vec{e}_3$ is the binormal. This moving frame satisfies the well-known Frenet-Serret equations:

$$ d\vec{e}_1 = \kappa\, \vec{e}_2\, ds, \qquad d\vec{e}_2 = (-\kappa\, \vec{e}_1 + \tau\, \vec{e}_3)\, ds, \qquad d\vec{e}_3 = -\tau\, \vec{e}_2\, ds. $$

The functions $\kappa$ and $\tau$ give the curvature and torsion of the space curve $\vec{x}$, respectively.

For later use, we wish to point out that we may view $\vec{e}_1$ as a curve taking values in $S^2$, the unit sphere centered at the origin of $E^3$. If we represent $\vec{e}_1$ as a function of its arc length $\sigma$, then one may write $\vec{e}_1 : [0, K] \to S^2$, where $K = \int_0^L \kappa\, ds$ is the length of $\vec{e}_1$. It follows from the definition of the curvature $\kappa$ that

$$ \frac{d\sigma}{ds} = \kappa. \tag{1} $$

It is straightforward that the geodesic curvature $k$ of $\vec{e}_1$ is given by

$$ k = \frac{\tau}{\kappa}. \tag{2} $$

To obtain a ribbon from a space curve we need to associate to the space curve a field of planes tangent to the space curve. Since at each point of the space curve the vector $\vec{e}_1$ lies in the plane tangent at that point, the plane is completely determined by giving a vector $\vec{\nu}$ that is perpendicular to $\vec{e}_1$ and lies in the tangent plane. Necessarily $\vec{\nu}$ is in the plane spanned by $\vec{e}_2$ and $\vec{e}_3$. Thus we introduce the function $\psi : [0, K] \to \mathbb{R}$ by requiring that the following hold:

$$ \vec{\nu} = \cos\psi\, \vec{e}_2 + \sin\psi\, \vec{e}_3. \tag{3} $$

The space curve $\vec{x}$ we introduced represents the base curve of our model. The neighboring curve in our model is represented by $\vec{x} + f \vec{e}_2$, where $f$ is a positive function of the arc length $s$.

Our primary interest in this section lies in how this ribbon deforms under a twisting operation on the base curve which depends upon the neighboring curve at every point. More specifically, we are interested in an essentially adiabatic, reparameterization invariant change in the shape of the base curve. To this end, we parameterize the variation of the ribbon by means of a parameter $u$. Thus all quantities under consideration become functions of $s$ and $u$. For example, the torsion $\tau(s)$ becomes $\tau(s, u)$.

In this section, we examine the variation in the geometry of the ribbon as $u$ ranges over some domain by means of differential equations in the geometric invariants mentioned in the preceding paragraphs. Our ansatz is the following equation:

$$ \frac{\partial \vec{e}_1}{\partial u}(s, u) = \gamma(u)\, f(s)\, \vec{\nu}(s, u). $$

The coefficient $\gamma(u)$ is a positive function chosen for later convenience. Moreover, it is a function of $u$ so that the form of the equation is invariant under changes of the parameter $u$.
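As a concrete check of the curvature, the torsion, and relation (2), the sketch below evaluates them for a circular helix. The helix, its radius `r`, its pitch constant `c`, and all function names are illustrative assumptions and do not come from the paper. The code uses the standard parameterization-independent formulas for curvature and torsion, so the helix need not be written in terms of arc length.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return sum(p * q for p, q in zip(a, b))


def helix_derivatives(r, c, t):
    """First three derivatives of x(t) = (r cos t, r sin t, c t)."""
    d1 = (-r * math.sin(t), r * math.cos(t), c)
    d2 = (-r * math.cos(t), -r * math.sin(t), 0.0)
    d3 = (r * math.sin(t), -r * math.cos(t), 0.0)
    return d1, d2, d3


def curvature_torsion(d1, d2, d3):
    """kappa = |x' x x''| / |x'|^3 and tau = (x' x x'') . x''' / |x' x x''|^2."""
    n = cross(d1, d2)
    kappa = math.sqrt(dot(n, n)) / dot(d1, d1) ** 1.5
    tau = dot(n, d3) / dot(n, n)
    return kappa, tau


# Hypothetical helix parameters, chosen only for illustration.
r, c = 2.0, 0.5
kappa, tau = curvature_torsion(*helix_derivatives(r, c, 0.7))
k = tau / kappa  # geodesic curvature of the tangent curve, equation (2)
print(f"kappa = {kappa:.6f}, tau = {tau:.6f}, k = tau/kappa = {k:.6f}")
```

For a helix the tangent vector traces a circle of latitude on the unit sphere, and the computed ratio reduces to `c / r`. A planar curve (`c = 0`) gives `k = 0`, which is the "fairly planar" starting shape considered later in this section.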
We consider any parameter which is a continuously differentiable function of $u$ with positive derivative as an admissible parameter for representing the variation. Since $u$ will change over some interval, time dependence enters indirectly through this parameter. We emphasize that in this section we are studying changes in shape under a twisting deformation, not dynamics.

Given what has just been said, we may as well choose a parameter $u$ for which $\gamma = 1$. Thus we study the variation in the ribbon induced by the following differential equation:

$$ \frac{\partial \vec{e}_1}{\partial u}(s, u) = f(s)\, \vec{\nu}(s, u). \tag{4} $$

We study this variation under the following assumptions: if we view the base curve as a polygon with atoms at the vertices, then the lengths of the segments and the angles between successive segments will remain constant under the variation. In our model, which is differentiable, this corresponds to the following:

1. The element of arc length $ds$ is invariant during the variation.
2. The curvature $\kappa$ is invariant during the variation.

It follows from these assumptions and equation (1) that the element of arc $d\sigma$ remains unchanged during the variation.

It is well known that the shape of the base curve $\vec{x}$ is completely determined by $\kappa$ and $\tau$ given as functions of the arc length $s$. Since $\kappa$ does not depend on $u$, we regard it as a known function. Thus our goal is to determine how $\tau$ depends on $u$. Since $\tau = k\kappa$, we can just as well determine how $k$ depends on $u$. Finally, our ansatz can be viewed as defining a variation of the curve $\vec{e}_1$. Thus we transfer the variational problem to the sphere $S^2$ and study how $\vec{e}_1$ varies under our ansatz, which amounts to studying how $k$ varies under our ansatz.

In what follows we use the "method of moving frames" and differential forms to make our calculations. The reader may want to use the text [11] as a reference for what is done below.
We summarize all we know about $\vec{e}_1$ and the variation in the following equations, where $v$ represents a quantity to be determined. These equations follow from the Frenet-Serret equations and equations (1), (2), (3) and (4).

$$ d\vec{e}_1 = \vec{e}_2\, d\sigma + (f\cos\psi\, \vec{e}_2 + f\sin\psi\, \vec{e}_3)\, du \tag{5} $$

$$ d\vec{e}_2 = (-\vec{e}_1 + k\, \vec{e}_3)\, d\sigma + (-f\cos\psi\, \vec{e}_1 + v\, \vec{e}_3)\, du \tag{6} $$

$$ d\vec{e}_3 = -k\, \vec{e}_2\, d\sigma + (-f\sin\psi\, \vec{e}_1 - v\, \vec{e}_2)\, du \tag{7} $$

We compute the exterior derivatives of the above equations and use the fact that $d^2 \vec{e}_i = 0$. From the $\vec{e}_2$ and $\vec{e}_3$ components of $d^2 \vec{e}_1$ and the $\vec{e}_3$ component of $d^2 \vec{e}_2$, we get the following equations.

$$ k = -\psi_\sigma + \frac{f_\sigma}{f} \cot\psi \tag{8} $$

$$ v = f_\sigma \csc\psi \tag{9} $$

$$ 0 = -k_u + v_\sigma + f \sin\psi \tag{10} $$

Using equations (8) and (9), we substitute for $k$ and $v$ in equation (10) to obtain the following second order p.d.e. in $\psi$.

$$ \left[ \psi_\sigma - \frac{f_\sigma}{f} \cot\psi \right]_u + \left[ f_\sigma \csc\psi \right]_\sigma + f \sin\psi = 0 \tag{11} $$

If we make the further assumption that $f$ is constant, then $f_\sigma = 0$ and this equation becomes

$$ \psi_{\sigma u} + f \sin\psi = 0, \tag{12} $$

the sine-Gordon equation.

We consider the implications of this equation given that it has soliton solutions. If the base curve is initially fairly planar, i.e., its torsion (or equivalently $k$) is not too far from zero along its entire length, we consider how a soliton might explain the folding which is always observed. Note, for later use, that if $k$ is close to zero over the entire length of the base curve, it follows from equation (8) (assuming $f$ is constant) that $\psi$ is close to being constant. We base our arguments on antikinks, which for us take the form

$$ \psi(\sigma, u) = 4 \tan^{-1}\left[ \exp\left( \sqrt{f} \left( a u - \frac{\sigma}{a} + b \right) \right) \right], \tag{13} $$

where $a > 0$ and $b$ are constants.

We need to recall that the partial differential equation (11), and hence (12), is defined on the domain $[0, K] \times [0, U]$, where $U$ may be some positive real or $\infty$. Thus one must consider either equation as part of an initial value problem on that domain.
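That the antikink (13) solves the sine-Gordon equation (12) can be checked numerically. The sketch below is a sanity check only: it approximates the mixed derivative $\psi_{\sigma u}$ by central finite differences and evaluates the residual $\psi_{\sigma u} + f \sin\psi$ at a few sample points. The parameter values, the step size `h`, and the function names are assumptions made for illustration.

```python
import math


def antikink(sigma, u, f=1.0, a=2.0, b=0.0):
    """Equation (13): psi = 4 arctan(exp(sqrt(f) * (a*u - sigma/a + b)))."""
    theta = math.sqrt(f) * (a * u - sigma / a + b)
    return 4.0 * math.atan(math.exp(theta))


def mixed_derivative(psi, sigma, u, h=1e-3):
    """Central finite-difference approximation of d^2 psi / (d sigma d u)."""
    return (psi(sigma + h, u + h) - psi(sigma + h, u - h)
            - psi(sigma - h, u + h) + psi(sigma - h, u - h)) / (4.0 * h * h)


def residual(sigma, u, f=1.0, a=2.0, b=0.0):
    """Left-hand side of equation (12) evaluated on the antikink."""
    def psi(s, w):
        return antikink(s, w, f, a, b)
    return mixed_derivative(psi, sigma, u) + f * math.sin(psi(sigma, u))


# Sample points (sigma, u), chosen arbitrarily for illustration.
samples = [(0.0, 0.0), (1.0, 0.5), (3.0, 0.2), (-1.0, 1.0)]
for sigma, u in samples:
    print(f"sigma={sigma:5.2f} u={u:4.2f} residual={residual(sigma, u):.2e}")
```

The residuals are not exactly zero because of the finite-difference truncation error, which shrinks as `h` is reduced (until rounding error takes over). The same check can be repeated for other positive values of `f` and `a`.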
Since the curves $\sigma = \text{constant}$ and $u = \text{constant}$ are the characteristic curves of either partial differential equation, we need to consider our problem as a characteristic initial value problem. Thus it is natural to suppose that the values of $\psi$ are given on the curves $u = 0$ and $\sigma = 0$. Then we must accept as known that

$$ \psi(\sigma, 0) = 4 \tan^{-1}\left[ \exp\left( \sqrt{f} \left( -\frac{\sigma}{a} + b \right) \right) \right]. $$

Given that initially we assume the base curve is fairly planar, the function $\psi$ is close to being a constant function on $[0, K]$. We choose the constant $b$ so that $4 \tan^{-1}\left[ \exp\left( \sqrt{f}\, b \right) \right]$ approximates that constant value, and we choose $a$ very large so that $4 \tan^{-1}\left[ \exp\left( \sqrt{f} \left( -\frac{\sigma}{a} + b \right) \right) \right]$ approximates that constant value on the whole interval $[0, K]$. We must also accept as known that

$$ \psi(0, u) = 4 \tan^{-1}\left[ \exp\left( \sqrt{f} \left( a u + b \right) \right) \right]. $$

If a vibration of the left end point of the base curve can be represented by this function, then the antikink given by equation (13) describes the subsequent motion of the base curve in the following fashion. As $u$ increases, the antikink moves along the base curve and simultaneously, due to equation (8), a "bump" of geodesic curvature, which corresponds to a "bump" of torsion, moves along the base curve. The effect of this "bump" of torsion is to twist the base curve into a position where bonds are sure to be formed. At this point, our model is no longer viable and the "bump" leaves in its wake the twisted form of the base chain of the protein.

Should the initial values of $\psi$ be different and the vibrations of the left end point be of a different form as well, one can presume that there are other solitons which satisfy these initial conditions and thus ultimately produce a twist that moves along the base curve, leading to the formation of a stable twisted molecule.
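The moving "bump" can be made explicit for the antikink. With $f$ constant, equation (8) reduces to $k = -\psi_\sigma$, and differentiating equation (13) gives $k = (2\sqrt{f}/a)\,\operatorname{sech}\theta$ with $\theta = \sqrt{f}\,(au - \sigma/a + b)$. The sketch below evaluates this profile, the position of its peak, and the corresponding torsion $\tau = k\kappa$ from equation (2). The function names and the numerical values (including the curvature `kappa`, which the model treats as a known function) are assumptions chosen for illustration.

```python
import math


def theta(sigma, u, f, a, b):
    """Phase of the antikink in equation (13)."""
    return math.sqrt(f) * (a * u - sigma / a + b)


def geodesic_curvature(sigma, u, f, a, b):
    """k = -psi_sigma for the antikink with constant f (from equation (8))."""
    return 2.0 * math.sqrt(f) / (a * math.cosh(theta(sigma, u, f, a, b)))


def bump_centre(u, a, b):
    """Value of sigma at which theta = 0, i.e. where the bump peaks."""
    return a * a * u + a * b


def torsion(sigma, u, f, a, b, kappa):
    """tau = k * kappa, from equation (2); kappa is assumed known."""
    return geodesic_curvature(sigma, u, f, a, b) * kappa


# Hypothetical parameter values, for illustration only.
f, a, b, kappa = 1.0, 2.0, 0.5, 0.8
for u in (0.0, 0.5, 1.0):
    centre = bump_centre(u, a, b)
    peak_k = geodesic_curvature(centre, u, f, a, b)
    peak_tau = torsion(centre, u, f, a, b, kappa)
    print(f"u={u:.1f} centre={centre:.2f} peak k={peak_k:.3f} peak tau={peak_tau:.3f}")
```

Under these assumptions the peak sits at $\sigma = a^2 u + ab$, so it advances along the tangent curve at rate $a^2$ per unit of $u$, while its height is $2\sqrt{f}/a$. A larger $a$ therefore gives a faster but lower and broader bump.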
The process just described for antikinks can, in fact, lead to similar conclusions if one assumes that $f$ is a piecewise constant function, rather than a constant function, on $[0, K]$. Let us suppose that $f$ takes the values $f_1$ and $f_2$ on the subintervals $[0, \sigma_1]$ and $[\sigma_1, \sigma_2]$ of $[0, K]$, respectively. To construct a continuous, piecewise differentiable solution of equation (12) we need the following to be true for all $u$ in $[0, U]$:

$$ \sqrt{f_1} \left( a_1 u - \frac{\sigma_1}{a_1} + b_1 \right) = \sqrt{f_2} \left( a_2 u - \frac{\sigma_1}{a_2} + b_2 \right). $$

If we regard $a_1$ and $b_1$ as known, we can clearly choose $a_2$ and $b_2$ for this to be true: matching the coefficients of $u$ fixes $a_2$, and matching the constant terms then fixes $b_2$. Thus, if we again assume that $\psi$ is fairly constant on $[0, K]$, one can still find solutions of equation (12) that give rise to a moving "bump" of torsion along the base curve.

Our parameter $u$ does not represent time but must be a monotonically increasing function of time. Even though we are not dealing with dynamics, if we want to bring time into our considerations then our ansatz becomes

$$ \frac{\partial \vec{e}_1}{\partial t}(s, t) = \gamma(t)\, f(s)\, \vec{\nu}(s, t), $$

for some positive real-valued function $\gamma(t)$. One easily finds that equation (11) becomes

$$ \left[ \psi_\sigma - \frac{f_\sigma}{f} \cot\psi \right]_t + \gamma \left[ f_\sigma \csc\psi \right]_\sigma + \gamma f \sin\psi = 0, $$

and the sine-Gordon equation takes the form

$$ \psi_{\sigma t} + \gamma f \sin\psi = 0. $$

The formula for an antikink becomes

$$ \psi(\sigma, t) = 4 \tan^{-1}\left[ \exp\left( \sqrt{f} \left( a\, g(t) - \frac{\sigma}{a} + b \right) \right) \right], $$

where $\frac{dg}{dt} = \gamma$ and $g(0) = 0$. If we suppose $g$ is fairly constant and depends primarily upon the medium in which the protein is found (as opposed to depending upon the protein itself), then we can argue as follows. The more planar the initial shape of the base curve, the closer to being constant the initial values of $\psi$ are. Hence, the larger the value of $a$ must be so that the antikink approximates those initial values well.
However, the larger the value of $a$, the faster the soliton moves along the base curve, and hence the faster the "bump" of torsion moves along the base curve and, consequently, the faster the formation of the twisted molecule.

3 Conclusions

In this paper we have presented the results of a purely geometric analysis of the protein folding process. Our most important results are as follows.

i. We parameterized a coarse-grained model of a protein molecule, which quantitatively describes only the shape of the molecule; the description is independent of position and orientation. The parameterization includes the backbone and a distillation of the geometry of the side-chains that we refer to as the neighboring curve. We call this model a ribbon. This parameterization is particularly useful for geometrical and structural studies, provided that the coarse-graining is appropriate. The ribbon is introduced to allow for analytic studies using differential geometry.

ii. We extended this parameterization to include changes of shape. This parameterization, which again depends only upon shape, not upon position, orientation, or space-time motion, defines a curve, i.e., a trajectory, of possible protein shapes. We constrained the possible shapes in appropriate ways by fixing length and bending. Changes in the twisting of the protein shapes are allowed and reflect the presence of amino acids through the idealized neighboring curve acting on the backbone. A molecule in this description follows some trajectory from unfolded to partly folded, where chemical bonds form. Our formulation is reparameterization invariant from the outset; therefore the folding geometry is independent of wall-clock time.

iii. Most importantly, we used the above parameterization to study possible trajectories in the case of an ad hoc but not unreasonable constraint on the shape.
We found a trajectory associated to a soliton solution of the sine-Gordon equation, producing what one might call a torsion soliton. The soliton produces a torsional distortion of the molecule. The distortion of the molecule is fixed, because of bond formation, during propagation of the soliton. This soliton, which arises from geometric relationships within the folding molecule, is geometrical and thus different from the dynamical solitons that are well known in protein science. However, it has similar properties.

a. Its stability, i.e., the fact that it propagates without continuous input of energy and is indifferent to scattering, temperature, or forces that may vary from cell to cell, may explain how it is possible that the folding is unique in spite of the variety of forces, response tensors, and environmental conditions involved.

b. If this soliton occurs in nature, it may also explain the other puzzles that were raised in the Introduction; a threshold distortion of the molecule is required to establish the soliton, but the sensitivity to initial conditions is minimal.

c. The soliton describes a self-focusing torsional wave. Since energy was temporarily put aside at the outset, the energetics of the soliton remain unspecified here. Obviously, the soliton will be relevant only if it is energetically allowed, but the fact that it arises from geometry alone suggests that a relatively flat energy landscape might suffice.

List of References

1. Anfinsen CB (1973) Principles that govern the Folding of Protein Chains. Science 181: 223-230.
2. Dill KA, Ozkan B, Weikl TR (2008) The Protein Folding Problem. Annu Rev Biophys 37: 289-316.
3. Onuchic JN, Luthey-Schulten Z, Wolynes G (1997) Theory of Protein Folding: The Energy Landscape Perspective. Annu Rev Phys Chem 48: 545-600.
4. Finkelstein AV, Galzitskaya OV (2004) Physics of Protein Folding. Physics of Life Reviews 1: 23-56.
5. Fersht A (1991) Structure and Mechanism in Protein Science, (W.H. Freeman & Co., New York).
6. Finkelstein AV, Ptitsyn OB (2002) Protein Physics, (Academic Press, London).
7. Wales DJ (2003) Energy Landscapes, (Cambridge, New York).
8. Guntert P, Mumenthaler C, Wuethrich K (1997) Torsion Angle Dynamics for NMR Structure Calculation with the New Program DYANA. J Mol Biol 273: 283-298.
9. Davydov AS (1971) Theory of Molecular Excitations, (Plenum Press, New York-London).
10. White JH, Bauer WR (1986) Calculation of the Twist and the Writhe for representative models of DNA. J Mol Biol 189: 329-341.
11. O'Neill B (1997) Elementary Differential Geometry, Second Edition, (Academic Press, San Diego, London).
