Alternatives with stronger convergence than coordinate-descent iterative LMI algorithms

In this note we aim at putting more emphasis on the fact that trying to solve non-convex optimization problems with coordinate-descent iterative linear matrix inequality algorithms leads to suboptimal solutions, and put forward other optimization methods better equipped to deal with such problems (having theoretical convergence guarantees and/or being more efficient in practice). This fact, already outlined at several places in the literature, still appears to be disregarded by a sizable part of the systems and control community. Thus, the main elements of this issue and better optimization alternatives are presented and illustrated by means of an example.

Authors: Emile Simon, Vincent Wertz

Index Terms—Optimization algorithms, LMIs, BMIs, Convergence

I. MAIN POINT

Probably because of the success and large domination of convex and Linear Matrix Inequality (LMI) optimization in systems and control theory for the last two decades, it regularly happens that researchers try to solve non-convex optimization problems through the use of LMIs or convex subsets/approximations, as if using this path was adequate for all problems or the only acceptable possibility. The point of this brief note is to shed more light on this misunderstanding, and to put forward other optimization approaches with strong convergence guarantees that are very efficient in practice for non-convex problems.

We consider in particular the problems admitting a Bilinear Matrix Inequality (BMI) representation: those involving product terms between variables, sometimes called 'complicating variables'. A classical attempt to solve these problems is given by the following heuristic:

1) Split the set of complicating variables in two subsets, so that fixing either subset turns the BMI into an LMI.
2) Choose/design initial values for one of the two subsets.
3) Fix the variables of the subset obtained at the previous step, and take the variables of the other subset as free optimization variables.
4) Minimize the objective with LMI optimization.
5) Repeat steps 3) and 4) iteratively, until the objective value reaches a given target or decreases less than a given accuracy.

This forms what can be described as a coordinate-descent iterative LMI algorithm (CDILMI), which is the most common kind of iterative LMI algorithm (ILMI). The sequence of objectives generated by this type of scheme is only guaranteed to be monotonically non-increasing.

E. Simon and V. Wertz are with the Pole of Applied Mathematics, ICTEAM Institute, Université Catholique de Louvain, 4 avenue Georges Lemaître, 1348 Louvain-la-Neuve, Belgium (Corresponding author: simonemile@gmail.com, Tel/Fax. +32 10 47 88 10/21 80).

A. Lack of convergence of CDILMIs

The issue is that such heuristics lead to 'partially optimal' solutions [1]: optimal in the 'directions' of the two subsets of variables, but not in all directions. This means that algorithms of this type stop at solutions which are not locally optimal and are unable to follow directions that would improve the objective. The contingency where such an algorithm leads to a locally optimal solution should be very seldom: as outlined in a similar context in [2], it should almost never be the case. Note also that ILMIs typically stall once the sequence of solutions reaches a border of the BMI feasible set, while being nowhere close to a locally optimal solution. Remark however that it is possible to develop ILMIs guaranteed to converge to locally optimal solutions: one such rare example is given in [3], where the proof of convergence does not rely on a coordinate-descent principle (other ingredients are used to ensure convergence).
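The failure mode just described is easy to reproduce on a toy problem. The sketch below (ours, not from the note; plain Python with NumPy standing in for the LMI solver) runs exact coordinate descent on f(x, y) = -x*y over [-1, 1]^2: started at (0, 0), neither coordinate step can find a strict improvement, so the iteration stalls at a partial optimum with objective 0, even though f(t, t) = -t^2 is strictly better along the diagonal.

```python
import numpy as np

def f(x, y):
    # bilinear toy objective: coordinate-wise flat at (0, 0)
    return -x * y

def line_min(fun, current, grid, tol=1e-12):
    """Exact line minimization on a grid; keep the current point unless a
    strictly better one exists (mimicking the LMI step of the heuristic)."""
    best = min(grid, key=fun)
    return best if fun(best) < fun(current) - tol else current

grid = np.linspace(-1.0, 1.0, 201)
x, y = 0.0, 0.0
for _ in range(20):
    x = line_min(lambda v: f(v, y), x, grid)   # step 3)-4): y fixed, minimize over x
    y = line_min(lambda v: f(x, v), y, grid)   # swap the subsets and repeat

print((x, y), f(x, y))   # stalls at (0.0, 0.0) with objective 0.0
print(f(1.0, 1.0))       # -1.0: a strictly better point along the diagonal
```

The stalled point is optimal in each coordinate 'direction' separately, which is exactly the partial optimality of [1]: no descent direction is visible to the alternation, yet the point is not a local minimum.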
Anyhow, ILMIs will not maintain the guarantee of convergence towards globally optimal solutions nor the polynomial-time complexity bound (unless P = NP), both of which hold for convex/LMI problems. Instead of only trying to explore non-convex spaces with convex subsets or approximations, other alternatives that should be considered are local optimization methods (which moreover do not require additional optimization variables, much unlike convex approximations or relaxations).

B. Alternative 1: Gradient-based methods

The main difficulty is that many optimization problems in systems and control are non-smooth, so usual gradients do not exist everywhere. For instance, minimizing the H∞ norm is a min-max objective function, i.e. locally Lipschitz, and the spectral abscissa is not even locally Lipschitz. To ensure convergence for these problems, it is necessary to consider non-smooth optimization methods. In the current context, two methods clearly stand out: the open-source HIFOO [4], [5] and the commercial hinfstruct [6] ([7]) (both implemented under Matlab). We strongly advise the reader to consult these papers and related works. Both methods consider classical optimal reduced-order (so non-convex) controller designs for continuous-time Linear Time Invariant systems. While the range of problems covered by HIFOO is currently broader than that of hinfstruct, the latter method allows designing any control architecture made of conventional components as well as customized blocks.

More to the point, what matters are their underlying mechanisms for dealing with non-smoothness. We give a simplified description as follows. HIFOO uses a conventional BFGS algorithm in its first phase, and then random gradient sampling and bundling in the second phase.
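A caricature of the gradient-sampling phase just mentioned can be given in a few lines. This is only a sketch under our own simplifications: HIFOO takes the minimum-norm element of the convex hull of the sampled gradients (a small QP), whereas here we crudely average them, and the test objective is ours, not from the note.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # nonsmooth test objective with a kink along x[0] = 0
    return abs(x[0]) + 0.5 * x[1] ** 2

def grad(x):
    # gradient where it exists (f is differentiable almost everywhere)
    return np.array([np.sign(x[0]), x[1]])

def sampled_direction(x, radius, m=20):
    """Average of gradients sampled in a ball around x. Near the kink the
    sampled signs cancel and the direction shrinks: a crude stand-in for the
    minimum-norm convex combination used by gradient sampling proper."""
    G = np.array([grad(x + radius * rng.uniform(-1, 1, size=2)) for _ in range(m)])
    return -G.mean(axis=0)

x, t = np.array([2.0, 1.0]), 0.5
f0 = f(x)
for _ in range(60):
    cand = x + t * sampled_direction(x, radius=t)
    if f(cand) < f(x):
        x = cand
    else:
        t *= 0.5          # shrink step and sampling radius near the kink

print(f0, f(x))           # the objective decreases well below its initial value
```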
This converges based on the hypothesis that the function is differentiable almost everywhere, so there exist gradients near the current iterate which can be exploited. The mechanism behind hinfstruct is more elegant because it relies on extension sets of the Clarke subdifferential at each iteration. Thus, local information is completely available and is used to build a quadratic tangent model of the objective, efficiently optimized at the current iterate. It can be found in some places, such as [8], that hinfstruct should in general be faster than HIFOO. Considering their theoretical convergence guarantees, ease of use, robustness and efficiency in practice, using either of these methods should present only advantages compared to ILMIs or any heuristics with weak convergence properties.

C. Alternative 2: Derivative-free methods

Another possible direction for solving optimization problems, which makes sense when gradient information is not available (at all, or accurately, or at an acceptable computational cost), is to consider Derivative-Free Optimization (DFO) methods (see the book [9] for a comprehensive presentation, or [10, Chap. 6] for a summarized one). Apparently, the first work in systems and control where the idea of using a DFO method was thoroughly investigated is developed in the related papers [11] and [7]: the authors investigate the convergence of the multidirectional search (MDS) algorithm on non-smooth problems (spectral abscissa and H∞ norm), outline the lack of convergence of this method for these problems, and propose additional non-smooth (first-order) steps which ensure convergence (this work later led to hinfstruct [6]). Next to [11] and [7], the direction of using classical DFO methods was also visited in [12], with the fundamental problem of static output feedback (SOF) stabilization (using three implementations available in [13]).
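As an aside, the spectral abscissa α(A) = max Re λ(A) mentioned among the non-smooth objectives above is a prototypical hard case: it can fail to be locally Lipschitz wherever eigenvalues coalesce. A small sketch (our example, not from the note):

```python
import numpy as np

def spectral_abscissa(A):
    """alpha(A) = maximal real part of the eigenvalues of A."""
    return np.max(np.linalg.eigvals(A).real)

# Jordan-type 2x2 block: A(t) = [[0, 1], [t, 0]] has eigenvalues +/- sqrt(t),
# so alpha(A(t)) = sqrt(t) for t > 0: infinite slope at t = 0, hence the
# spectral abscissa is not locally Lipschitz there.
for t in [1e-2, 1e-4, 1e-6]:
    A = np.array([[0.0, 1.0], [t, 0.0]])
    print(t, spectral_abscissa(A))   # ~ sqrt(t): 0.1, 0.01, 0.001
```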
The benchmark results in [12] show that these methods are very successful for these (conjectured) NP-hard problems. Motivated by both these papers and the availability of the provably convergent method HIFOO, we investigated further the performance of a DFO method on a broad benchmark of not only finding stabilizing SOF controllers but also minimizing the H2 and H∞ norms of a performance channel [14]. It appeared that, despite only 0th-order information being used by the DFO method, it performed reasonably well compared to HIFOO (sometimes better, both for the cpu times and the objective values). Still, as outlined in [12], an explanation for the good performance of DFO methods remains to be identified. In our opinion, such reasons were partly outlined in [7] when references are given to [15] and [16], where important convergence results are developed. It is worthwhile to note that these strong convergence guarantees are largely absent from the systems and control literature: we could not find any paper where a development relies on these guarantees. These proofs are however the key behind the convergence of DFO methods. Five results must be noted in particular: [15], [17], [18], [16], [19], summarized as follows.

For smooth unconstrained problems, convergence to a stationary point is guaranteed by [15], and the gradient norm is shown to be tied to the step size parameter [17], [9, p. 123]. A hierarchical convergence analysis is proposed in [18] for the non-smooth case: the authors provide convergence results under assumptions ranging from strict differentiability, regularity, Lipschitz continuity and lower semi-continuity to even general non-smooth functions. Their main result in the Lipschitz case is that the method generates a limit point where the Clarke directional derivatives are non-negative in a set of positive spanning directions.
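The pattern-search scheme whose convergence [15] and [17] analyze can be sketched in a few lines. This is our minimal version, polling only the 2n coordinate directions; on smooth problems the step size shrinks only near stationary points, which is the link between step size and gradient norm mentioned above.

```python
import numpy as np

def pattern_search(f, x0, step=1.0, tol=1e-8, max_iter=10_000):
    """Minimal coordinate pattern search (a sketch of the GPS scheme): poll
    the 2n points x +/- step * e_i; move on improvement, otherwise halve the
    step, and stop once the step falls below tol."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    for _ in range(max_iter):
        improved = False
        for d in np.vstack([np.eye(n), -np.eye(n)]):
            if f(x + step * d) < f(x):
                x = x + step * d
                improved = True
                break
        if not improved:
            step *= 0.5
            if step < tol:
                break
    return x

# smooth convex test problem with its minimum at (1, -2)
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
x = pattern_search(f, [0.0, 0.0])
print(x)   # close to [1, -2]
```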
Later, [16] generalizes the method and proposes the MADS algorithm for non-smooth constrained optimization problems, for which the set of directions with non-negative Clarke derivatives becomes asymptotically dense in R^n. This convergence analysis is extended to the second order in [20] and to a class of discontinuous functions in [19]. Note that since only a finite number of search directions are generated in practice, these asymptotic convergence guarantees may present some limitations (see [10, pp. 111-114] and also [21]).

From a user perspective, DFO methods are simple to use and implement and may quickly be tried as a first attempt to gain insight into an optimization problem, or alternatively as a last resort when other methods are not adequate. These methods, now supported by strong convergence properties, will yield better solutions than those obtained with heuristics not backed up by such convergence analysis. We recommend [9, Chap. 1] for a more detailed description. We also refer to the thesis [10] and references therein for broader analyses and presentations on the current topic.

In summary, as long as an ILMI is not guaranteed to lead to locally optimal solutions, this kind of scheme might only be useful to find initial solutions (and not even necessarily then, because such solutions may be poorly located). Therefore, other optimization methods having better convergence properties and/or efficiency in practice should be used instead. This concludes the main point of this note. In the second (and last) section, we draw an illustration by means of an example with a recently proposed CDILMI.

II. EXAMPLE OF CDILMI AND RESULTS

We illustrate the above-mentioned ideas with the algorithm of [22], which considers the design of reduced-order filters that must be positive (i.e.
with all entries of the state-space matrices positive in the discrete-time case) and respect a given performance level, with the performance chosen as the H∞ norm of the filtering error system, in the context of discrete-time positive LTI systems. It must be noted that in [22] the objective is to find a solution under a given performance level and not to minimize this objective function. In that sense, the focus there is not put on the convergence of the algorithm towards locally optimal solutions, but rather on several other contributions of the paper. In particular, a structure is put forward using the system augmentation approach which can be exploited to deal with constraints (here the positivity constraint) that could otherwise not be cast under a BMI and corresponding ILMI.

Numerical results and comments

The details of the problem are given in [22, Sec. IV], and are omitted here for the sake of brevity¹. We directly copy the state-space matrices of the filter obtained in [22] hereunder:

Â = 0.22819, B̂ = [0.00003 0.00003], Ĉ = 0.14130, D̂ = [0.17889 0.34404].

This filter solution is not satisfying because its dynamical part (matrices Â, B̂, Ĉ) is canceled out by the very small entries of the B̂ matrix. So this filter can be approximated by only its D̂ matrix, with almost no impact (0.003%) on the performance level (around 0.1417). What probably happened is that the CDILMI was blocked at this solution against a border of the feasible set (here the positivity constraint), which is a typical phenomenon of this type of algorithm. This illustrates that more convergent local optimization methods should have been considered instead. For such LTI filter/controller design problems, we recommend in particular the methods HIFOO and hinfstruct, guaranteed to find locally optimal solutions and very efficient in practice.
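The negligible contribution of the dynamic part is easy to verify numerically from the matrices quoted above. A quick check (ours): with a scalar state, the filter transfer matrix is F(z) = Ĉ(zI - Â)⁻¹B̂ + D̂, so the dynamic term is ĈB̂/(z - Â), whose magnitude on the unit circle peaks at z = 1.

```python
import numpy as np

# State-space data of the filter quoted from [22]
A, C = 0.22819, 0.14130
B = np.array([0.00003, 0.00003])
D = np.array([0.17889, 0.34404])

# Magnitude of the dynamic contribution C*B/(z - A) over the unit circle
z = np.exp(1j * np.linspace(0.0, np.pi, 1000))
dyn = np.abs(C * B[:, None] / (z - A))   # per input channel, per frequency
print(dyn.max())   # about 5.5e-6, i.e. roughly 0.003% of the D entries
```

The peak dynamic gain is four to five orders of magnitude below the entries of D̂, consistent with the 0.003% figure quoted above.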
Note however that the current versions of these programs are implemented for continuous-time LTI systems and do not yet feature the possibility of adding a constraint on the admissible range of the variables, such as the positivity constraint.

Anyhow, using gradient information will not prove necessary to obtain excellent results on the considered problem. For illustration purposes, we performed 100 optimizations from random initial solutions with several DFO methods². Almost all of the solutions had a performance level lower than 0.1415, and most of the solutions thus obtained (depending on the method used) had performance levels between 0.0447 and 0.0448, and never lower (thus 0.0447 is a candidate for a globally optimal level). This gives one illustration of the fact that general-purpose local optimization methods may largely outperform CDILMIs. One of the obtained solutions is for instance the following:

Â = 0.0561, B̂ = [0.2686 1.0749], Ĉ = 0.3094, D̂ = [0.1521 0.1089],

which is located inside the feasible set, i.e. neither against the positivity constraint nor the limit of stability. We tried many local optimization methods from that solution and none could improve it, including a version of MADS which has a theoretical convergence guarantee applying to the current context of non-smooth locally Lipschitz problems [18]. One method in particular was between 90 and 100% successful at reaching solutions with such performance levels (depending on the accuracy used)³: the implementation of the Nelder-Mead algorithm by N. Higham, available in [13] and also used in [12], which we restart at the last solution returned until no improvement is obtained to a given accuracy (this improves the method a lot, see the hyperlinked works for details).
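The restart wrapper just described is a one-loop affair. Here is a sketch using SciPy's Nelder-Mead in place of Higham's Matlab implementation (an assumed substitute; the experiments above used [13]), on a standard test function rather than the filter problem:

```python
import numpy as np
from scipy.optimize import minimize

def restarted_nelder_mead(f, x0, tol=1e-10, max_restarts=50):
    """Restart Nelder-Mead from its own last answer until the objective
    stops improving to the given accuracy (the wrapper described above)."""
    x, fx = np.asarray(x0, dtype=float), f(x0)
    for _ in range(max_restarts):
        res = minimize(f, x, method="Nelder-Mead")
        if res.fun < fx - tol:
            x, fx = res.x, res.fun
        else:
            break
    return x, fx

# Rosenbrock function: a single Nelder-Mead run from a poor start can stop
# short of the minimum at (1, 1); restarting recovers it.
rosen = lambda v: (1 - v[0]) ** 2 + 100 * (v[1] - v[0] ** 2) ** 2
x, fx = restarted_nelder_mead(rosen, [-1.5, 2.0])
print(x, fx)   # close to [1, 1] with a near-zero objective
```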
In practice, this method should lead to better solutions than ILMIs that are not supported by a strong convergence analysis. And when (sub)gradient information is readily available, we recommend using HIFOO or hinfstruct, which come with solid convergence certificates of local optimality, or developing a method based on the same mechanisms.

¹We only mention that the value of b3 was erroneously written = 0.0128 in [22], instead of its correct value = 0.385.
²Full experimentation details are given in [10, Subsec. 8.2.3].
³For convenience, the Matlab files reproducing these results are available on www.mathworks.com/matlabcentral/fileexchange/33219

ACKNOWLEDGMENTS

The authors gratefully acknowledge the reviewers and editor for their constructive comments, and Charles Audet for his help with the summary of DFO convergence results. Also acknowledged are Ping Li for comments on an earlier version and Didier Henrion for pointing out the reference [2]. This research is supported by the Belgian Network DYSCO (Dynamical Systems, Control, and Optimization) funded by the Interuniversity Attraction Poles Programme of the Belgian State, Science Policy Office.

REFERENCES

[1] R.E. Wendell and A.P. Hurter, "Minimization of a non-separable objective function subject to disjoint constraints", Operations Research, vol. 24, no. 4, pp. 643-657, 1976.
[2] J.W. Helton and O. Merino, "Coordinate optimization for bi-convex matrix inequalities", in Proceedings of the 36th IEEE Conf. on Decision & Control, 1997, pp. 3609-3613.
[3] T.D. Quoc, S. Gumussoy, W. Michiels, and M. Diehl, "Combining convex-concave decompositions and linearization approaches for solving BMIs, with application to static output feedback", IEEE Transactions on Automatic Control, accepted, 2011.
[4] J.V. Burke, D. Henrion, A.S. Lewis, and M.L.
Overton, "HIFOO - A Matlab package for fixed-order controller design and H∞ optimization", in Proceedings of the IFAC Symposium on Robust Control Design, 2006.
[5] J.V. Burke, D. Henrion, A.S. Lewis, and M.L. Overton, "Stabilization via nonsmooth, nonconvex optimization", IEEE Transactions on Automatic Control, vol. 51, no. 11, pp. 1760-1769, 2006.
[6] P. Apkarian and D. Noll, "Nonsmooth H∞ synthesis", IEEE Transactions on Automatic Control, vol. 51, no. 1, pp. 71-86, 2006.
[7] P. Apkarian and D. Noll, "Controller design via nonsmooth multidirectional search", SIAM J. on Control and Optimization, vol. 44, no. 6, pp. 1923-1949, 2006.
[8] D. Ankelhed, "On design of low-order H∞ controllers", PhD thesis, Linköping University, 2011.
[9] A. Conn, K. Scheinberg and L.N. Vicente, "Introduction to Derivative-Free Optimization", MPS-SIAM Series on Optimization, SIAM, 2009.
[10] E. Simon, "A perspective for optimization in systems and control: from LMIs to derivative-free methods", PhD thesis, Université catholique de Louvain, 2012. http://hdl.handle.net/2078.1/114822
[11] P. Apkarian and D. Noll, "Controller design via nonsmooth multidirectional search", 2nd IFAC Symposium on SSC, Mexico, 2004.
[12] D. Henrion, "Solving static output feedback problems by direct search optimization", in Proceedings of the IEEE CACSD Conf., Munich, 2006.
[13] N.J. Higham, "The matrix computation toolbox." http://www.maths.manchester.ac.uk/~higham/mctoolbox/
[14] E. Simon, "Optimal static output feedback design through direct search", in Proc. of the 50th IEEE CDC & ECC, pp. 296-301, 2011. http://arxiv.org/abs/1104.5369v2
[15] V. Torczon, "On the convergence of pattern search algorithms", SIAM J. on Optimization, vol. 7, pp. 1-25, 1997.
[16] C. Audet and J.E. Dennis Jr., "Mesh Adaptive Direct Search algorithms for constrained optimization", SIAM J. on Optimization, vol. 17, no. 1, pp.
188-217, 2006.
[17] E.D. Dolan, R.M. Lewis and V. Torczon, "On the local convergence of pattern search", SIAM J. on Optimization, vol. 14, pp. 567-583, 2003.
[18] C. Audet and J.E. Dennis, "Analysis of generalized pattern searches", SIAM J. on Optimization, vol. 13, no. 3, pp. 889-903, 2003.
[19] L.N. Vicente and A. Custodio, "Analysis of direct searches for discontinuous functions", to appear in Mathematical Programming, 2011.
[20] M.A. Abramson and C. Audet, "Convergence of mesh adaptive direct search to second-order stationary points", SIAM J. on Optimization, vol. 17, no. 2, pp. 606-619, 2006.
[21] C. Audet, "Convergence results for pattern search algorithms are tight", Optimization and Engineering, vol. 5, no. 2, pp. 101-122, 2004.
[22] P. Li, J. Lam and Z. Shu, "H∞ positive filtering for positive linear discrete-time systems: an augmentation approach", IEEE Transactions on Automatic Control, vol. 55, no. 10, pp. 2337-2342, 2010.
