Convergence Thresholds of Newton's Method for Monotone Polynomial Equations


Authors: Javier Esparza, Stefan Kiefer, Michael Luttenberger

Symposium on Theoretical Aspects of Computer Science 2008 (Bordeaux), pp. 289-300, www.stacs-conf.org

Institut für Informatik, Technische Universität München, Germany
E-mail address: {esparza,kiefer,luttenbe}@in.tum.de

Abstract. Monotone systems of polynomial equations (MSPEs) are systems of fixed-point equations $X_1 = f_1(X_1, \ldots, X_n), \ldots, X_n = f_n(X_1, \ldots, X_n)$ where each $f_i$ is a polynomial with positive real coefficients. The question of computing the least non-negative solution of a given MSPE $X = f(X)$ arises naturally in the analysis of stochastic models such as stochastic context-free grammars, probabilistic pushdown automata, and back-button processes. Etessami and Yannakakis have recently adapted Newton's iterative method to MSPEs. In a previous paper we have proved the existence of a threshold $k_f$ for strongly connected MSPEs, such that after $k_f$ iterations of Newton's method each new iteration computes at least 1 new bit of the solution. However, the proof was purely existential. In this paper we give an upper bound for $k_f$ as a function of the minimal component of the least fixed point $\mu f$ of $f(X)$. Using this result we show that $k_f$ is at most single exponential resp. linear for strongly connected MSPEs derived from probabilistic pushdown automata resp. from back-button processes. Further, we prove the existence of a threshold for arbitrary MSPEs after which each new iteration computes at least $1/w2^h$ new bits of the solution, where $w$ and $h$ are the width and height of the DAG of strongly connected components.

1. Introduction

A monotone system of polynomial equations (MSPE for short) has the form

$X_1 = f_1(X_1, \ldots, X_n)$
$\vdots$
$X_n = f_n(X_1, \ldots, X_n)$

where $f_1, \ldots$
, $f_n$ are polynomials with positive real coefficients. In vector form we denote an MSPE by $X = f(X)$. We call MSPEs "monotone" because $x \leq x'$ implies $f(x) \leq f(x')$ for every $x, x' \in \mathbb{R}^n_{\geq 0}$. MSPEs appear naturally in the analysis of many stochastic models, such as context-free grammars (with numerous applications to natural language processing [19, 15] and computational biology [21, 4, 3, 17]), probabilistic programs with procedures [6, 2, 10, 8, 7, 9, 11], and web-surfing models with back buttons [13, 14].

1998 ACM Subject Classification: G.1.5, Mathematics of Computing, Numerical Analysis.
Key words and phrases: Newton's Method, Fixed-Point Equations, Formal Verification of Software, Probabilistic Pushdown Systems.
This work was supported by the project Algorithms for Software Model Checking of the Deutsche Forschungsgemeinschaft (DFG). Part of this work was done at the Universität Stuttgart.
(c) J. Esparza, S. Kiefer, and M. Luttenberger. Creative Commons Attribution-NoDerivs License.

By Kleene's theorem, a feasible MSPE $X = f(X)$ (i.e., an MSPE with at least one solution) has a least solution $\mu f$; this solution can be irrational and non-expressible by radicals. Given an MSPE and a vector $v$ encoded in binary, the problem whether $\mu f \leq v$ holds is in PSPACE and at least as hard as the SQUARE-ROOT-SUM problem, a well-known problem of computational geometry (see [10, 12] for more details).

For the applications mentioned above the most important question is the efficient numerical approximation of the least solution. Finding the least solution of a feasible system $X = f(X)$ amounts to finding the least solution of $F(X) = 0$ for $F(X) = f(X) - X$.
For this we can apply (the multivariate version of) Newton's method [20]: starting at some $x^{(0)} \in \mathbb{R}^n$ (we use uppercase to denote variables and lowercase to denote values), compute the sequence

$x^{(k+1)} := x^{(k)} - (F'(x^{(k)}))^{-1} F(x^{(k)})$

where $F'(X)$ is the Jacobian matrix of partial derivatives. While in general the method may not even be defined ($F'(x^{(k)})$ may be singular for some $k$), Etessami and Yannakakis proved in [10, 12] that this is not the case for the Decomposed Newton's Method (DNM), which decomposes the MSPE into strongly connected components (SCCs) and applies Newton's method to them in a bottom-up fashion (see Section 2 for details).

The results of [10, 12] provide no information on the number of iterations needed to compute $i$ valid bits of $\mu f$, i.e., to compute a vector $\nu$ such that $|\mu f_j - \nu_j| / |\mu f_j| \leq 2^{-i}$ for every $1 \leq j \leq n$. In a former paper [16] we obtained a first positive result on this problem. We proved that for every strongly connected MSPE $X = f(X)$ there exists a threshold $k_f$ such that for every $i \geq 0$ the $(k_f + i)$-th iteration of Newton's method has at least $i$ valid bits of $\mu f$. So, loosely speaking, after $k_f$ iterations DNM is guaranteed to compute at least 1 new bit of the solution per iteration; we say that DNM converges linearly with rate 1. The problem with this result is that its proof provides no information on $k_f$ other than its existence.

In this paper we show that the threshold $k_f$ can be chosen as

$k_f = 3n^2 m + 2n^2 |\log \mu_{\min}|$

where $n$ is the number of equations of the MSPE, $m$ is such that all coefficients of the MSPE can be given as ratios of $m$-bit integers, and $\mu_{\min}$ is the minimal component of the least solution $\mu f$. It can be objected that $k_f$ depends on $\mu f$, which is precisely what Newton's method should compute.
However, for MSPEs coming from stochastic models, such as the ones listed above, we can do far better. (A subset of variables and their associated equations forms an SCC if the value of any variable in the subset influences the value of all variables in the subset; see Section 2 for details.) The following observations and results help to deal with $\mu_{\min}$:

• We obtain a syntactic bound on $\mu_{\min}$ for probabilistic programs with procedures (having stochastic context-free grammars and back-button stochastic processes as special instances) and prove that in this case $k_f \leq n 2^{n+2} m$.

• We show that if every procedure has a non-zero probability of terminating, then $k_f \leq 3nm$. This condition always holds in the special case of back-button processes [13, 14]. Hence, our result shows that $i$ valid bits can be computed in time $O((nm + i) \cdot n^3)$ in the unit-cost model of Blum, Shub and Smale [1], where each single arithmetic operation over the reals can be carried out exactly and in constant time. It was proved in [13, 14] by a reduction to a semidefinite programming problem that $i$ valid bits can be computed in $\mathrm{poly}(i, n, m)$ time in the classical (Turing-machine based) computation model. We do not improve this result, because we do not have a proof that round-off errors (which are inevitable on Turing-machine based models) do not crucially affect the convergence of Newton's method. But our result sheds light on the convergence of a practical method to compute $\mu f$.

• Finally, since $x^{(k)} \leq x^{(k+1)} \leq \mu f$ holds for every $k \geq 0$, as Newton's method proceeds it provides better and better lower bounds for $\mu_{\min}$ and thus for $k_f$. In the paper we exhibit an MSPE for which, using this fact and our theorem, we can prove that no component of the solution reaches the value 1.
This cannot be proved by just computing more iterations, no matter how many.

The paper contains two further results concerning non-strongly-connected MSPEs. Firstly, we show that DNM still converges linearly even if the MSPE has more than one SCC, albeit with a poorer convergence rate. Secondly, we prove that Newton's method is well-defined also for non-strongly-connected MSPEs. Thus, it is not necessary to decompose an MSPE into its SCCs, although decomposing the MSPE may be preferred for efficiency reasons.

The paper is structured as follows. In Section 2 we state preliminaries and give some background on Newton's method applied to MSPEs. Sections 3, 5, and 6 contain the three results of the paper. Section 4 contains applications of our main result. We conclude in Section 7. Missing proofs can be found in a technical report [5].

2. Preliminaries

In this section we introduce our notation and formalize the concepts mentioned in the introduction.

2.1. Notation

$\mathbb{R}$ and $\mathbb{N}$ denote the sets of real, respectively natural numbers. We assume $0 \in \mathbb{N}$. $\mathbb{R}^n$ denotes the set of $n$-dimensional real valued column vectors and $\mathbb{R}^n_{\geq 0}$ the subset of vectors with non-negative components. We use bold letters for vectors, e.g. $x \in \mathbb{R}^n$, where we assume that $x$ has the components $x_1, \ldots, x_n$. Similarly, the $i$-th component of a function $f: \mathbb{R}^n \to \mathbb{R}^n$ is denoted by $f_i$. $\mathbb{R}^{m \times n}$ denotes the set of matrices having $m$ rows and $n$ columns. The transpose of a vector or matrix is indicated by the superscript $\top$. The identity matrix of $\mathbb{R}^{n \times n}$ is denoted by $\mathrm{Id}$.

The formal Neumann series of $A \in \mathbb{R}^{n \times n}$ is defined by $A^* = \sum_{k \in \mathbb{N}} A^k$. It is well known that $A^*$ exists if and only if the spectral radius of $A$ is less than 1, i.e. $\max\{|\lambda| \mid \lambda \in \mathbb{C}$ is an eigenvalue of $A\} < 1$. If $A^*$ exists, we have $A^* = (\mathrm{Id} - A)^{-1}$.
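The identity $A^* = (\mathrm{Id} - A)^{-1}$ is easy to check numerically. The following sketch (the matrix is an arbitrary choice of ours, not from the paper) compares partial sums of the Neumann series against the inverse:

```python
import numpy as np

# A small nonnegative matrix with spectral radius < 1 (eigenvalues 0.5 and 0.1).
A = np.array([[0.2, 0.3],
              [0.1, 0.4]])
assert max(abs(np.linalg.eigvals(A))) < 1

# Partial sums of the Neumann series A* = sum_k A^k ...
S = np.zeros_like(A)
P = np.eye(2)
for _ in range(200):
    S += P
    P = P @ A

# ... converge to (Id - A)^{-1}.
inverse = np.linalg.inv(np.eye(2) - A)
print(np.allclose(S, inverse))  # True
```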
The partial order $\leq$ on $\mathbb{R}^n$ is defined as usual by setting $x \leq y$ if $x_i \leq y_i$ for all $1 \leq i \leq n$. By $x < y$ we mean $x \leq y$ and $x \neq y$. Finally, we write $x \prec y$ if $x_i < y_i$ in every component.

We use $X_1, \ldots, X_n$ as variable identifiers and arrange them into the vector $X$. In the following $n$ always denotes the number of variables, i.e. the dimension of $X$. While $x, y, \ldots$ denote arbitrary elements in $\mathbb{R}^n$, resp. $\mathbb{R}^n_{\geq 0}$, we write $X$ if we want to emphasize that a function is given w.r.t. these variables. Hence, $f(X)$ represents the function itself, whereas $f(x)$ denotes its value for $x \in \mathbb{R}^n$. If $Y$ is a set of variables and $x$ a vector, then by $x_Y$ we mean the vector obtained by restricting $x$ to the components in $Y$.

The Jacobian of a differentiable function $f(X)$ with $f: \mathbb{R}^n \to \mathbb{R}^m$ is the matrix $f'(X)$ given by

$f'(X) = \begin{pmatrix} \frac{\partial f_1}{\partial X_1} & \cdots & \frac{\partial f_1}{\partial X_n} \\ \vdots & & \vdots \\ \frac{\partial f_m}{\partial X_1} & \cdots & \frac{\partial f_m}{\partial X_n} \end{pmatrix}$

2.2. Monotone Systems of Polynomials

Definition 2.1. A function $f(X)$ with $f: \mathbb{R}^n_{\geq 0} \to \mathbb{R}^n_{\geq 0}$ is a monotone system of polynomials (MSP) if every component $f_i(X)$ is a polynomial in the variables $X_1, \ldots, X_n$ with coefficients in $\mathbb{R}_{\geq 0}$. We call an MSP $f(X)$ feasible if $y = f(y)$ for some $y \in \mathbb{R}^n_{\geq 0}$.

Fact 2.2. Every MSP $f$ is monotone on $\mathbb{R}^n_{\geq 0}$, i.e. for $0 \leq x \leq y$ we have $f(x) \leq f(y)$.

Since every MSP is continuous, Kleene's fixed-point theorem (see e.g. [18]) applies.

Theorem 2.3 (Kleene's fixed-point theorem). Every feasible MSP $f(X)$ has a least fixed point $\mu f$ in $\mathbb{R}^n_{\geq 0}$, i.e., $\mu f = f(\mu f)$ and, in addition, $y = f(y)$ implies $\mu f \leq y$. Moreover, the sequence $(\kappa^{(k)}_f)_{k \in \mathbb{N}}$ with $\kappa^{(0)}_f := 0$ and $\kappa^{(k+1)}_f := f(\kappa^{(k)}_f) = f^{k+1}(0)$ is monotonically increasing with respect to $\leq$ (i.e. $\kappa^{(k)}_f \leq \kappa^{(k+1)}_f$) and converges to $\mu f$.
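Theorem 2.3 yields the most naive approximation scheme: iterate $f$ from $0$. A minimal sketch (the one-dimensional MSP $f(X) = 0.5X^2 + 0.5$ is our own illustrative choice, not from the paper; its least fixed point is $\mu f = 1$):

```python
# Kleene sequence for the one-dimensional MSP f(X) = 0.5*X^2 + 0.5:
# kappa_0 = 0, kappa_{k+1} = f(kappa_k). The least fixed point is mu_f = 1.
def f(x):
    return 0.5 * x * x + 0.5

kappa = 0.0
seq = [kappa]
for _ in range(1000):
    kappa = f(kappa)
    seq.append(kappa)

# Monotonically increasing and bounded from above by the least fixed point.
assert all(a <= b for a, b in zip(seq, seq[1:]))
assert seq[-1] <= 1.0
print(round(seq[-1], 3))
```

On this example the Kleene sequence converges slowly (roughly $\kappa^{(k)} \approx 1 - 2/k$, so each extra valid bit costs about twice as many iterations), which is one motivation for Newton's method below.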
In the following we call $(\kappa^{(k)}_f)_{k \in \mathbb{N}}$ the Kleene sequence of $f(X)$, and drop the subscript whenever $f$ is clear from the context. Similarly, we sometimes write $\mu$ instead of $\mu f$.

A variable $X_i$ of an MSP $f(X)$ is productive if $\kappa^{(k)}_i > 0$ for some $k \in \mathbb{N}$. An MSP is clean if all its variables are productive. It is easy to see that $\kappa^{(n)}_i = 0$ implies $\kappa^{(k)}_i = 0$ for all $k \in \mathbb{N}$. As for context-free grammars, we can determine all productive variables in time linear in the size of $f$.

Notation 2.4. In the following, we always assume that an MSP $f$ is clean and feasible. I.e., whenever we write "MSP", we mean "clean and feasible MSP", unless explicitly stated otherwise.

For the formal definition of the Decomposed Newton's Method (DNM) (see also Section 1) we need the notion of dependence between variables.

Definition 2.5. Let $f(X)$ be an MSP. $X_i$ depends directly on $X_k$, denoted by $X_i \trianglelefteq X_k$, if $\frac{\partial f_i}{\partial X_k}(X)$ is not the zero polynomial. $X_i$ depends on $X_k$ if $X_i \trianglelefteq^* X_k$, where $\trianglelefteq^*$ is the reflexive transitive closure of $\trianglelefteq$. An MSP is strongly connected (short: an scMSP) if all its variables depend on each other.

Any MSP can be decomposed into strongly connected components (SCCs), where an SCC $S$ is a maximal set of variables such that each variable in $S$ depends on each other variable in $S$. The following result for strongly connected MSPs was proved in [10, 12]:

Theorem 2.6. Let $f(X)$ be an scMSP and define the Newton operator $N_f$ as follows:

$N_f(X) = X + (\mathrm{Id} - f'(X))^{-1} (f(X) - X)$

We have: (1) $N_f(x)$ is defined for all $0 \leq x \prec \mu f$ (i.e., $(\mathrm{Id} - f'(x))^{-1}$ exists). Moreover, $f'(x)^* = \sum_{k \in \mathbb{N}} f'(x)^k$ exists for all $0 \leq x \prec \mu f$, and so $N_f(X) = X + f'(X)^* (f(X) - X)$.
(2) The Newton sequence $(\nu^{(k)}_f)_{k \in \mathbb{N}}$ with $\nu^{(k)} = N^k_f(0)$ is monotonically increasing, bounded from above by $\mu f$ (i.e. $\nu^{(k)} \leq f(\nu^{(k)}) \leq \nu^{(k+1)} \prec \mu f$), and converges to $\mu f$.

DNM works by substituting the variables of lower SCCs by corresponding Newton approximations that were obtained earlier.

3. A Threshold for scMSPs

In this section we obtain a threshold after which DNM is guaranteed to converge linearly with rate 1. We showed in [16] that for worst-case results on the convergence of Newton's method it is enough to consider quadratic MSPs, i.e., MSPs whose monomials have degree at most 2. The reason is that any MSP (resp. scMSP) $f$ can be transformed into a quadratic MSP (resp. scMSP) $\tilde{f}$ by introducing auxiliary variables. This transformation is very similar to the transformation of a context-free grammar into Chomsky normal form. The transformation does not accelerate DNM, i.e., DNM on $f$ is at least as fast (in a formal sense) as DNM on $\tilde{f}$, and so for a worst-case analysis it suffices to consider quadratic systems. We refer the reader to [16] for details.

We start by defining the notion of "valid bits".

Definition 3.1. Let $f(X)$ be an MSP. A vector $\nu$ has $i$ valid bits of the least fixed point $\mu f$ if $|\mu f_j - \nu_j| / |\mu f_j| \leq 2^{-i}$ for every $1 \leq j \leq n$.

In the rest of the section we prove the following:

Theorem 3.2. Let $f(X)$ be a quadratic scMSP. Let $c_{\min}$ be the smallest nonzero coefficient of $f$ and let $\mu_{\min}$ and $\mu_{\max}$ be the minimal and maximal component of $\mu f$, respectively. Let

$k_f = n \cdot \log \frac{\mu_{\max}}{c_{\min} \cdot \mu_{\min} \cdot \min\{\mu_{\min}, 1\}}.$

Then $\nu^{(\lceil k_f \rceil + i)}$ has $i$ valid bits of $\mu f$ for every $i \geq 0$.

Loosely speaking, the theorem states that after $k_f$ iterations of Newton's method, every subsequent iteration guarantees at least one more valid bit.
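The Newton operator of Theorem 2.6 is short to implement. A sketch on a hand-made two-variable quadratic scMSP (the system below is our own illustration, not from the paper; $f$ and its Jacobian are coded by hand):

```python
import numpy as np

# Newton operator N_f(x) = x + (Id - f'(x))^{-1} (f(x) - x) for the
# illustrative quadratic scMSP
#   f1(X) = 0.4*X2 + 0.3
#   f2(X) = 0.3*X1**2 + 0.4
def f(x):
    return np.array([0.4 * x[1] + 0.3,
                     0.3 * x[0] ** 2 + 0.4])

def jacobian(x):
    return np.array([[0.0, 0.4],
                     [0.6 * x[0], 0.0]])

def newton_step(x):
    return x + np.linalg.solve(np.eye(2) - jacobian(x), f(x) - x)

x = np.zeros(2)          # nu^(0) = 0
iterates = [x]
for _ in range(30):
    x = newton_step(x)
    iterates.append(x)

# As Theorem 2.6 states, the Newton sequence is monotonically increasing
# and converges to the least fixed point mu_f.
assert all((a <= b + 1e-12).all() for a, b in zip(iterates, iterates[1:]))
assert np.allclose(f(x), x)   # x is (numerically) a fixed point
print(np.round(x, 3))
```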
It may be objected that $k_f$ depends on the least fixed point $\mu f$, which is precisely what Newton's method should compute. However, in the next section we show that there are important classes of MSPs (in fact, those which motivated our investigation) for which bounds on $\mu_{\min}$ can be easily obtained.

The following corollary is weaker than Theorem 3.2, but less technical in that it avoids a dependence on $\mu_{\max}$ and $c_{\min}$.

Corollary 3.3. Let $f(X)$ be a quadratic scMSP of dimension $n$ whose coefficients are given as ratios of $m$-bit integers. Let $\mu_{\min}$ be the minimal component of $\mu f$. Let $k_f = 3n^2 m + 2n^2 |\log \mu_{\min}|$. Then $\nu^{(\lceil k_f \rceil + i)}$ has at least $i$ valid bits of $\mu f$ for every $i \geq 0$.

Corollary 3.3 follows from Theorem 3.2 by a suitable bound on $\mu_{\max}$ in terms of $c_{\min}$ and $\mu_{\min}$ [5] (notice that, since $c_{\min}$ is the quotient of two $m$-bit integers, we have $c_{\min} \geq 1/2^m$).

In the rest of the section we sketch the proof of Theorem 3.2. The proof makes crucial use of vectors $d \succ 0$ such that $d \geq f'(\mu f) d$. We call a vector satisfying these two conditions a cone vector of $f$ or, when $f$ is clear from the context, just a cone vector. In a previous paper we have shown that if the matrix $(\mathrm{Id} - f'(\mu f))$ is singular, then $f$ has a cone vector ([16], Lemmata 4 and 8). As a first step towards the proof of Theorem 3.2 we show the following stronger proposition.

Proposition 3.4. Any scMSP has a cone vector.

To a cone vector $d = (d_1, \ldots, d_n)$ we associate two parameters, namely the maximum and the minimum of the ratios $\mu f_1 / d_1, \mu f_2 / d_2, \ldots, \mu f_n / d_n$, which we denote by $\lambda_{\max}$ and $\lambda_{\min}$, respectively.
The second step consists of showing (Proposition 3.6) that given a cone vector $d$, the threshold $k_{f,d} = \log(\lambda_{\max}/\lambda_{\min})$ satisfies the same property as $k_f$ in Theorem 3.2, i.e., $\nu^{(\lceil k_{f,d} \rceil + i)}$ has $i$ valid bits of $\mu f$ for every $i \geq 0$. This follows rather easily from the following fundamental property of cone vectors: a cone vector leads to an upper bound on the error of Newton's method.

Lemma 3.5. Let $d$ be a cone vector of an MSP $f$ and let $\lambda_{\max} = \max\{\frac{\mu f_i}{d_i}\}$. Then $\mu f - \nu^{(k)} \leq 2^{-k} \lambda_{\max} d$.

Proof Idea. Consider the ray $g(t) = \mu f - t d$ starting in $\mu f$ and headed in the direction $-d$. It is easy to see that $g(\lambda_{\max})$ is the intersection of $g$ with an axis which is located farthest from $\mu f$. One can then prove $g(\frac{1}{2} \lambda_{\max}) \leq \nu^{(1)}$, where $g(\frac{1}{2} \lambda_{\max})$ is the point of the ray equidistant from $g(\lambda_{\max})$ and $\mu f$. By repeated application of this argument one obtains $g(2^{-k} \lambda_{\max}) \leq \nu^{(k)}$ for all $k \in \mathbb{N}$.

[Figure: the curves $X_1 = f_1(X)$ and $X_2 = f_2(X)$, the Newton iterates $\nu^{(k)}$ for $0 \leq k \leq 2$, and the corresponding points $g(2^{-k} \lambda_{\max})$ on the ray $g$ through $\mu f = g(0)$; the picture shows $\nu^{(k)} \geq g(2^{-k} \lambda_{\max})$.]

Now we easily obtain:

Proposition 3.6. Let $f(X)$ be an scMSP and let $d$ be a cone vector of $f$. Let $k_{f,d} = \log \frac{\lambda_{\max}}{\lambda_{\min}}$, where $\lambda_{\max} = \max_j \frac{\mu f_j}{d_j}$ and $\lambda_{\min} = \min_j \frac{\mu f_j}{d_j}$. Then $\nu^{(\lceil k_{f,d} \rceil + i)}$ has at least $i$ valid bits of $\mu f$ for every $i \geq 0$.

We now proceed to the third and final step. We have the problem that $k_{f,d}$ depends on the cone vector $d$, about which we only know that it exists (Proposition 3.4).
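The bound of Lemma 3.5 is attained exactly in a simple degenerate case (our own illustration, not from the paper): for the one-dimensional MSP $f(X) = 0.5X^2 + 0.5$ we have $\mu f = 1$ and $f'(\mu f) = 1$, so $d = 1$ is a cone vector with $\lambda_{\max} = \lambda_{\min} = 1$, and the Newton error is halved in every step, one valid bit per iteration:

```python
# For f(X) = 0.5*X^2 + 0.5 we have F(X) = f(X) - X = 0.5*(X-1)^2, so the
# Newton step simplifies to x - F(x)/F'(x) = 0.5*x + 0.5 and the error
# 1 - nu^(k) equals exactly 2^(-k), matching the bound
# mu_f - nu^(k) <= 2^(-k) * lambda_max * d of Lemma 3.5 with equality.
def newton_step(x):
    F = 0.5 * x * x - x + 0.5          # F(x) = f(x) - x
    dF = x - 1.0                       # F'(x)
    return x - F / dF

nu = 0.0
for k in range(1, 21):
    nu = newton_step(nu)
    assert abs((1.0 - nu) - 2.0 ** (-k)) < 1e-12
print(nu)  # 1 - 2**-20
```

Compare this with the Kleene sequence of the same system, which needs roughly $2^k$ iterations for $k$ bits.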
We now sketch how to obtain the threshold $k_f$ claimed in Theorem 3.2, which is independent of any cone vector. Consider Proposition 3.6 and let $\lambda_{\max} = \frac{\mu f_i}{d_i}$ and $\lambda_{\min} = \frac{\mu f_j}{d_j}$. We have $k_{f,d} = \log\big(\frac{d_j}{d_i} \cdot \frac{\mu f_i}{\mu f_j}\big)$. The idea is to bound $k_{f,d}$ in terms of $c_{\min}$. We show that if $k_{f,d}$ is very large, then there must be variables $X, Y$ such that $X$ depends on $Y$ only via a monomial that has a very small coefficient, which implies that $c_{\min}$ is very small.

4. Stochastic Models

As mentioned in the introduction, several problems concerning stochastic models can be reduced to problems about the least solution $\mu f$ of an MSPE $f$. In these cases, $\mu f$ is a vector of probabilities, and so $\mu_{\max} \leq 1$. Moreover, we can obtain information on $\mu_{\min}$, which leads to bounds on the threshold $k_f$.

4.1. Probabilistic Pushdown Automata

Our study of MSPs was initially motivated by the verification of probabilistic pushdown automata. A probabilistic pushdown automaton (pPDA) is a tuple $P = (Q, \Gamma, \delta, Prob)$ where $Q$ is a finite set of control states, $\Gamma$ is a finite stack alphabet, $\delta \subseteq Q \times \Gamma \times Q \times \Gamma^*$ is a finite transition relation (we write $pX \hookrightarrow q\alpha$ instead of $(p, X, q, \alpha) \in \delta$), and $Prob$ is a function which to each transition $pX \hookrightarrow q\alpha$ assigns its probability $Prob(pX \hookrightarrow q\alpha) \in (0, 1]$ so that for all $p \in Q$ and $X \in \Gamma$ we have $\sum_{pX \hookrightarrow q\alpha} Prob(pX \hookrightarrow q\alpha) = 1$. We write $pX \stackrel{x}{\hookrightarrow} q\alpha$ instead of $Prob(pX \hookrightarrow q\alpha) = x$. A configuration of $P$ is a pair $qw$, where $q$ is a control state and $w \in \Gamma^*$ is a stack content. A probabilistic pushdown automaton $P$ naturally induces a possibly infinite Markov chain with the configurations as states and transitions given by: $pX\beta \stackrel{x}{\hookrightarrow} q\alpha\beta$ for every $\beta \in \Gamma^*$ iff $pX \stackrel{x}{\hookrightarrow} q\alpha$. We assume w.l.o.g. that if $pX \stackrel{x}{\hookrightarrow} q\alpha$ is a transition then $|\alpha| \leq 2$.
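The induced Markov chain can be illustrated by simulation on a toy instance of ours (not from the paper): a one-state pPDA with rules $pX \stackrel{0.4}{\hookrightarrow} p\varepsilon$ and $pX \stackrel{0.6}{\hookrightarrow} pXX$ terminates from $pX$ with probability $2/3$, the least root of $x = 0.4 + 0.6x^2$ (the other root is 1), and a step-capped Monte Carlo estimate agrees:

```python
import random

# One-state pPDA: X -> eps with prob 0.4, X -> XX with prob 0.6.
# The termination probability is the least root of x = 0.4 + 0.6*x^2, i.e. 2/3.
random.seed(0)

def run_terminates(max_steps=500):
    stack = 1                      # number of X symbols on the stack
    for _ in range(max_steps):
        if stack == 0:
            return True
        if random.random() < 0.4:
            stack -= 1             # pop rule  pX -> p eps
        else:
            stack += 1             # push rule pX -> pXX
    return False                   # capped runs counted as non-terminating

trials = 2000
estimate = sum(run_terminates() for _ in range(trials)) / trials
print(round(estimate, 2))
```

The step cap slightly biases the estimate downwards (some very long terminating runs are cut off), but the effect is negligible here.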
pPDAs and the equivalent model of recursive Markov chains have been very thoroughly studied [6, 2, 10, 8, 7, 9, 11]. These papers have shown that the key to the analysis of pPDAs are the termination probabilities $[pXq]$, where $p$ and $q$ are states and $X$ is a stack letter, defined as follows (see e.g. [6] for a more formal definition): $[pXq]$ is the probability that, starting at the configuration $pX$, the pPDA eventually reaches the configuration $q\varepsilon$ (empty stack). It is not difficult to show that the vector of termination probabilities is the least fixed point of the MSPE containing the equation

$[pXq] = \sum_{pX \stackrel{x}{\hookrightarrow} rYZ} x \cdot \sum_{t \in Q} [rYt] \cdot [tZq] \;+\; \sum_{pX \stackrel{x}{\hookrightarrow} rY} x \cdot [rYq] \;+\; \sum_{pX \stackrel{x}{\hookrightarrow} q\varepsilon} x$

for each triple $(p, X, q)$. Call this quadratic MSPE the termination MSPE of the pPDA (we assume that termination MSPEs are clean, and it is easy to see that they are always feasible). We immediately have that if $X = f(X)$ is a termination MSPE, then $\mu_{\max} \leq 1$. We also obtain a lower bound on $\mu_{\min}$:

Lemma 4.1. Let $X = f(X)$ be a termination MSPE with $n$ variables. Then $\mu_{\min} \geq c_{\min}^{2^{n+1} - 1}$.

Together with Theorem 3.2 we get the following exponential bound for $k_f$.

Proposition 4.2. Let $f$ be a strongly connected termination MSPE with $n$ variables and whose coefficients are expressed as ratios of $m$-bit numbers. Then $k_f \leq n 2^{n+2} m$.

We conjecture that there is a lower bound on $k_f$ which is exponential in $n$, for the following reason. We know a family $(f^{(n)})_{n = 1, 3, 5, \ldots}$ of strongly connected MSPs with $n$ variables and irrational coefficients such that $c^{(n)}_{\min} = \frac{1}{4}$ for all $n$ and $\mu^{(n)}_{\min}$ is double-exponentially small in $n$. Experiments suggest that $\Theta(2^n)$ iterations are needed for the first bit of $\mu f^{(n)}$, but we do not have a proof.

4.2.
Strict pPDAs and Back-Button Processes

A pPDA is strict if for all $pX \in Q \times \Gamma$ and all $q \in Q$ the transition relation contains a pop rule $pX \stackrel{x}{\hookrightarrow} q\varepsilon$ for some $x > 0$. Essentially, strict pPDAs model programs in which every procedure has at least one terminating execution that does not call any other procedure. The termination MSPE of a strict pPDA is of the form $b(X, X) + lX + c$ with $c \succ 0$. So we have $\mu f \geq c$, which implies $\mu_{\min} \geq c_{\min}$. Together with Theorem 3.2 we get:

Proposition 4.3. Let $f$ be a strongly connected termination MSPE with $n$ variables and whose coefficients are expressed as ratios of $m$-bit numbers. If $f$ is derived from a strict pPDA, then $k_f \leq 3nm$.

Since in most applications $m$ is small, we obtain an excellent convergence threshold.

In [13, 14] Fagin et al. introduce a special class of strict pPDAs called back-button processes: in a back-button process there is only one control state $p$, and any rule is of the form $pA \stackrel{b_A}{\hookrightarrow} p\varepsilon$ or $pA \stackrel{l_{AB}}{\hookrightarrow} pBA$. So the stack corresponds to a path through a finite graph with $\Gamma$ as set of nodes and edges $A \to B$ for $pA \stackrel{l_{AB}}{\hookrightarrow} pBA$. In [13, 14] back-button processes are used to model the behaviour of web-surfers: $\Gamma$ is the set of web pages, $l_{AB}$ is the probability that a web-surfer uses a link from page $A$ to page $B$, and $b_A$ is the probability that the surfer pushes the "back" button of the web browser while visiting $A$. Thus, the termination probability $[pAp]$ is simply the probability that, if $A$ is on top of the stack, $A$ is eventually popped from the stack. The termination probabilities are the least solution of the MSPE consisting of the equations

$[pAp] \;=\; b_A + \sum_{pA \stackrel{l_{AB}}{\hookrightarrow} pBA} l_{AB}\,[pBp]\,[pAp] \;=\; b_A + [pAp] \sum_{pA \stackrel{l_{AB}}{\hookrightarrow} pBA} l_{AB}\,[pBp].$

4.3.
An Example

As an example of application of Theorem 3.2 consider the following scMSPE $X = f(X)$:

$\begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix} = \begin{pmatrix} 0.4 X_2 X_1 + 0.6 \\ 0.3 X_1 X_2 + 0.4 X_3 X_2 + 0.3 \\ 0.3 X_1 X_3 + 0.7 \end{pmatrix}$

The least solution of the system gives the revocation probabilities of a back-button process with three web pages. For instance, if the surfer is at page 2 it can choose between following links to pages 1 and 3 with probabilities 0.3 and 0.4, respectively, or pressing the back button with probability 0.3.

We wish to know if any of the revocation probabilities is equal to 1. Performing 14 Newton steps (e.g. with Maple) yields an approximation $\nu^{(14)}$ to the termination probabilities with

$\begin{pmatrix} 0.98 \\ 0.97 \\ 0.992 \end{pmatrix} \leq \nu^{(14)} \leq \begin{pmatrix} 0.99 \\ 0.98 \\ 0.993 \end{pmatrix}.$

We have $c_{\min} = 0.3$. In addition, since Newton's method converges to $\mu f$ from below, we know $\mu_{\min} \geq 0.97$. Moreover, $\mu_{\max} \leq 1$, as $1 = f(1)$ and so $\mu f \leq 1$. Hence $k_f \leq 3 \cdot \log \frac{1}{0.97 \cdot 0.3 \cdot 0.97} \leq 6$. Theorem 3.2 then implies that $\nu^{(14)}$ has (at least) 8 valid bits of $\mu f$. As $\mu f \leq 1$, the absolute errors are bounded by the relative errors, and since $2^{-8} \leq 0.004$ we know:

$\mu f \prec \nu^{(14)} + \begin{pmatrix} 2^{-8} \\ 2^{-8} \\ 2^{-8} \end{pmatrix} \prec \begin{pmatrix} 0.994 \\ 0.984 \\ 0.997 \end{pmatrix} \prec \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$

So Theorem 3.2 gives a proof that all 3 revocation probabilities are strictly smaller than 1.

5. Linear Convergence of the Decomposed Newton's Method

Given a strongly connected MSPE $f$, Theorem 3.2 states that, if we have computed $k_f$ preparatory iterations of Newton's method, then after $i$ additional iterations we can be sure to have computed at least $i$ bits of $\mu f$. We call this linear convergence with rate 1. Now we show that DNM, which handles non-strongly-connected MSPs, converges linearly as well. We also give an explicit convergence rate.
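Returning to the example of Section 4.3, the 14 Newton steps can be reproduced with a few lines of numpy (our own sketch; the system and the bounds on $\nu^{(14)}$ are the ones stated above):

```python
import numpy as np

# The back-button scMSPE of the example in Section 4.3:
#   X1 = 0.4*X2*X1 + 0.6
#   X2 = 0.3*X1*X2 + 0.4*X3*X2 + 0.3
#   X3 = 0.3*X1*X3 + 0.7
def f(x):
    x1, x2, x3 = x
    return np.array([0.4 * x2 * x1 + 0.6,
                     0.3 * x1 * x2 + 0.4 * x3 * x2 + 0.3,
                     0.3 * x1 * x3 + 0.7])

def jacobian(x):
    x1, x2, x3 = x
    return np.array([[0.4 * x2, 0.4 * x1, 0.0],
                     [0.3 * x2, 0.3 * x1 + 0.4 * x3, 0.4 * x2],
                     [0.3 * x3, 0.0, 0.3 * x1]])

nu = np.zeros(3)
for _ in range(14):
    nu = nu + np.linalg.solve(np.eye(3) - jacobian(nu), f(nu) - nu)

print(np.round(nu, 3))
# Newton approaches mu_f from below, so nu is a lower bound on the
# revocation probabilities; all three components stay strictly below 1.
assert (nu < 1.0).all()
```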
Let $f(X)$ be any quadratic MSP (again we assume quadratic MSPs throughout this section), and let $h(f)$ denote the height of the DAG of strongly connected components (SCCs). The convergence rate of DNM crucially depends on this height: in the worst case one needs asymptotically $\Theta(2^{h(f)})$ iterations in each component per bit, assuming one performs the same number of iterations in each component. To get a sharper result, we suggest to perform a different number of iterations in each SCC, depending on its depth. The depth of an SCC $S$ is the length of the longest path in the DAG of SCCs from $S$ to a top SCC.

In addition, we use the following notation. For a depth $t$, we denote by $comp(t)$ the set of SCCs of depth $t$. Furthermore we define $C(t) := \bigcup comp(t)$ and $C_{>}(t) := \bigcup_{t' > t} C(t')$ and, analogously, $C_{<}(t)$. We will sometimes write $v_t$ for $v_{C(t)}$, $v_{>t}$ for $v_{C_>(t)}$, and $v_{<t}$ for $v_{C_<(t)}$.

Write $\Delta^{(j)}_t := \mu_t - \nu^{(j)}_t$ for the error in the components of depth $t$, and let $\hat{\mu}_t^{(j)} := \mu\big( f_t(X)[X_{>t} / \nu^{(j)}_{>t}] \big)$, i.e., $\hat{\mu}_t^{(j)}$ is the least fixed point of $f_t$ after the approximations from the lower SCCs have been applied. So $\Delta^{(j)}_t$ consists of the propagation error $(\mu_t - \hat{\mu}_t^{(j)})$ and the newly inflicted approximation error $(\hat{\mu}_t^{(j)} - \nu^{(j)}_t)$. The following lemma, technically non-trivial to prove, gives a bound on the propagation error.

Lemma 5.2 (Propagation error). There is a constant $c > 0$ such that $\|\mu_t - \hat{\mu}_t\| \leq c \cdot \sqrt{\|\mu_{>t} - \nu_{>t}\|}$ holds for all $\nu_{>t}$ with $0 \leq \nu_{>t} \leq \mu_{>t}$, where $\hat{\mu}_t = \mu\big( f_t(X)[X_{>t} / \nu_{>t}] \big)$.

Intuitively, Lemma 5.2 states that if $\nu_{>t}$ has $k$ valid bits of $\mu_{>t}$, then $\hat{\mu}_t$ has roughly $k/2$ valid bits of $\mu_t$. In other words, (at most) one half of the valid bits are lost on each level of the DAG due to the propagation error. The following theorem assures that, after combining the propagation error and the approximation error, DNM still converges linearly.

Theorem 5.3.
Let $f$ be a quadratic MSP. Let $\nu^{(j)}$ denote the result of calling DNM$(f, j)$ (see Figure 1). Then there is a $k_f \in \mathbb{N}$ such that $\nu^{(k_f + i)}$ has at least $i$ valid bits of $\mu f$ for every $i \geq 0$.

We conclude that increasing $i$ by one gives us asymptotically at least one additional bit in each component and, by Proposition 5.1, costs $w(f) \cdot 2^{h(f)+1}$ additional Newton iterations. In the technical report [5] we give an example showing that the bound above is essentially optimal, in the sense that an exponential (in $h(f)$) number of iterations is in general needed to obtain an additional bit.

6. Newton's Method for General MSPs

Etessami and Yannakakis [10] introduced DNM because they could show that the matrix inverses used by Newton's method exist if Newton's method is run on each SCC separately (see Theorem 2.6). It may be surprising that the matrix inverses used by Newton's method exist even if the MSP is not decomposed. More precisely, one can show the following theorem; see [5].

Theorem 6.1. Let $f(X)$ be any MSP, not necessarily strongly connected. Let the Newton operator $N_f$ be defined as before:

$N_f(X) = X + (\mathrm{Id} - f'(X))^{-1} (f(X) - X)$

Then the Newton sequence $(\nu^{(k)}_f)_{k \in \mathbb{N}}$ with $\nu^{(k)} = N^k_f(0)$ is well-defined (i.e., the matrix inverses exist), monotonically increasing, bounded from above by $\mu f$ (i.e. $\nu^{(k)} \leq \nu^{(k+1)} \prec \mu f$), and converges to $\mu f$.

By exploiting Theorem 5.3 and Theorem 6.1 one can show the following theorem, which addresses the convergence speed of Newton's method in general.

Theorem 6.2. Let $f$ be any quadratic MSP. Then the Newton sequence $(\nu^{(k)})_{k \in \mathbb{N}}$ is well-defined and converges linearly to $\mu f$. More precisely, there is a $k_f \in \mathbb{N}$ such that $\nu^{(k_f + i \cdot (h(f)+1) \cdot 2^{h(f)})}$ has at least $i$ valid bits of $\mu f$ for every $i \geq 0$.
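The bottom-up strategy of DNM can be sketched on a hand-made two-SCC system (our own minimal rendering, not the DNM$(f, j)$ pseudocode of Figure 1): solve the lower SCC first, substitute its Newton approximation into the upper SCC, then solve the upper SCC.

```python
# Two SCCs: the lower component X2 = 0.25*X2**2 + 0.5 does not depend on
# X1, while the upper component X1 = 0.5*X1*X2 + 0.25 depends on X2.
# Exact solutions: mu_2 = 2 - sqrt(2), mu_1 = 0.25/(1 - 0.5*mu_2).

def newton_1d(g, dg, iterations=50):
    """Newton's method for the scalar fixed-point equation x = g(x):
    x <- x + (g(x) - x) / (1 - g'(x))."""
    x = 0.0
    for _ in range(iterations):
        x = x + (g(x) - x) / (1.0 - dg(x))
    return x

# Bottom-up: first the lower SCC ...
x2 = newton_1d(lambda x: 0.25 * x * x + 0.5, lambda x: 0.5 * x)
# ... then the upper SCC, with X2 replaced by its approximation.
x1 = newton_1d(lambda x: 0.5 * x * x2 + 0.25, lambda x: 0.5 * x2)

assert abs(x2 - (2 - 2 ** 0.5)) < 1e-9
assert abs(x1 - 0.25 / (1 - 0.5 * x2)) < 1e-9
print(round(x1, 4), round(x2, 4))
```

Here the substituted value $x_2$ is only an approximation of $\mu_2$, which is exactly the propagation error that Lemma 5.2 bounds.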
Again, the $2^{h(f)}$ factor cannot be avoided in general, as shown by an example in [5].

7. Conclusions

We have proved a threshold $k_f$ for strongly connected MSPEs: after $k_f + i$ Newton iterations we have $i$ bits of accuracy. The threshold $k_f$ depends on the representation size of $f$ and on the least solution $\mu f$. Although this latter dependence might seem to be a problem, lower and upper bounds on $\mu f$ can be easily derived for stochastic models (probabilistic programs with procedures, stochastic context-free grammars, and back-button processes). In particular, this allows us to show that $k_f$ depends linearly on the representation size for back-button processes. We have also shown by means of an example that the threshold $k_f$ improves when the number of iterations increases.

In [16] we left open the problem whether DNM converges linearly for non-strongly-connected MSPEs. We have proven that this is the case, although the convergence rate is poorer: if $h$ and $w$ are the height and width of the graph of SCCs of $f$, then there is a threshold $\tilde{k}_f$ such that $\tilde{k}_f + i \cdot w \cdot 2^{h+1}$ iterations of DNM compute at least $i$ valid bits of $\mu f$, where the exponential factor cannot be avoided in general. Finally, we have shown that Newton's method is well-defined for the whole MSPE (the required matrix inverses exist), whether the MSPE is strongly connected or not.

Acknowledgments. The authors wish to thank Kousha Etessami and anonymous referees for very valuable comments.

References

[1] L. Blum, M. Shub, and S. Smale. On a theory of computation and complexity over the real numbers: NP-completeness, recursive functions and universal machines. Bulletin of the Amer. Math. Society, 21(1):1-46, 1989.
[2] T. Brázdil, A. Kučera, and O. Stražovský. On the decidability of temporal properties of probabilistic pushdown automata.
In Proceedings of STACS 2005, volume 3404 of LNCS, pages 145–157. Springer, 2005.
[3] R.D. Dowell and S.R. Eddy. Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics, 5(71), 2004.
[4] R. Durbin, S.R. Eddy, A. Krogh, and G.J. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.
[5] J. Esparza, S. Kiefer, and M. Luttenberger. Convergence thresholds of Newton's method for monotone polynomial equations. Technical report, Technische Universität München, 2007.
[6] J. Esparza, A. Kučera, and R. Mayr. Model-checking probabilistic pushdown automata. In Proceedings of LICS 2004, pages 12–21, 2004.
[7] J. Esparza, A. Kučera, and R. Mayr. Quantitative analysis of probabilistic pushdown automata: Expectations and variances. In Proceedings of LICS 2005, pages 117–126. IEEE Computer Society Press, 2005.
[8] K. Etessami and M. Yannakakis. Algorithmic verification of recursive probabilistic systems. In Proceedings of TACAS 2005, volume 3440 of LNCS, pages 253–270. Springer, 2005.
[9] K. Etessami and M. Yannakakis. Checking LTL properties of recursive Markov chains. In Proceedings of the 2nd Int. Conf. on Quantitative Evaluation of Systems (QEST'05), 2005.
[10] K. Etessami and M. Yannakakis. Recursive Markov chains, stochastic grammars, and monotone systems of nonlinear equations. In STACS, pages 340–352, 2005.
[11] K. Etessami and M. Yannakakis. Recursive Markov decision processes and recursive stochastic games. In Proceedings of ICALP 2005, volume 3580 of LNCS, pages 891–903. Springer, 2005.
[12] K. Etessami and M. Yannakakis. Recursive Markov chains, stochastic grammars, and monotone systems of nonlinear equations, 2006. Draft journal submission, http://homepages.inf.ed.ac.uk/kousha/bib_index.html.
[13] R. Fagin, A.R. Karlin, J. Kleinberg, P. Raghavan, S. Rajagopalan, R. Rubinfeld, M. Sudan, and A. Tomkins. Random walks with "back buttons" (extended abstract). In STOC, pages 484–493, 2000.
[14] R. Fagin, A.R. Karlin, J. Kleinberg, P. Raghavan, S. Rajagopalan, R. Rubinfeld, M. Sudan, and A. Tomkins. Random walks with "back buttons". Annals of Applied Probability, 11(3):810–862, 2001.
[15] S. Geman and M. Johnson. Probabilistic grammars and their applications, 2002.
[16] S. Kiefer, M. Luttenberger, and J. Esparza. On the convergence of Newton's method for monotone systems of polynomial equations. In Proceedings of STOC, pages 217–226. ACM, 2007.
[17] B. Knudsen and J. Hein. Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Research, 31(13):3423–3428, 2003.
[18] W. Kuich. Handbook of Formal Languages, volume 1, chapter 9: Semirings and Formal Power Series: Their Relevance to Formal Languages and Automata, pages 609–677. Springer, 1997.
[19] C. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[20] J.M. Ortega and W.C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, 1970.
[21] Y. Sakakibara, M. Brown, R. Hughey, I.S. Mian, K. Sjölander, R.C. Underwood, and D. Haussler. Stochastic context-free grammars for tRNA. Nucleic Acids Research, 22:5112–5120, 1994.

This work is licensed under the Creative Commons Attribution-NoDerivs License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/3.0/.
