Complexity of Unconstrained $L_2$-$L_p$ Minimization

Xiaojun Chen (Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China; e-mail: maxjchen@polyu.edu.hk)
Dongdong Ge (Antai School of Economics and Management, Shanghai Jiao Tong University, Shanghai, China; e-mail: ddge@sjtu.edu.cn)
Zizhuo Wang (Department of Management Science and Engineering, Stanford University, Stanford, CA 94305; e-mail: zzwang@stanford.edu)
Yinyu Ye (Department of Management Science and Engineering, Stanford University, Stanford, CA 94305; and Visiting Professor, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China; e-mail: yinyu-ye@stanford.edu)

September 13, 2018

Abstract

We consider the unconstrained $L_2$-$L_p$ minimization: find a minimizer of $\|Ax-b\|_2^2+\lambda\|x\|_p^p$ for given $A\in R^{m\times n}$, $b\in R^m$ and parameters $\lambda>0$, $p\in[0,1)$. This problem has been studied extensively in variable selection and sparse least squares fitting for high-dimensional data. Theoretical results show that the minimizers of the $L_2$-$L_p$ problem have various attractive features due to the concavity and non-Lipschitzian property of the regularization function $\|\cdot\|_p^p$. In this paper, we show that the $L_q$-$L_p$ minimization problem is strongly NP-hard for any $p\in[0,1)$ and $q\ge 1$, including its smoothed version. On the other hand, we show that, by choosing the parameters $(p,\lambda)$ carefully, a minimizer, global or local, will have certain desired sparsity. We believe that these results provide new theoretical insights into the study and application of concave regularized optimization problems.

Keywords. Nonsmooth optimization, nonconvex optimization, variable selection, sparse solution reconstruction, bridge estimator.

MSC2010 Classification. 90C26, 90C51

1 Introduction

In this paper, we consider the following $L_2$-$L_p$ minimization problem:

$$\mathrm{Minimize}_x \quad f_p(x) := \|Ax-b\|_2^2 + \lambda\|x\|_p^p \qquad (1)$$

with data and parameters $A=(a_1,\ldots,a_n)\in R^{m\times n}$, $0\neq b\in R^m$, $\lambda>0$ and $0\le p<1$, and variable $x\in R^n$. This regularized formulation has been studied extensively in variable selection and sparse least squares fitting for high-dimensional data; see [1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 13] and references therein. Here, when $p=0$,

$$\|x\|_0^0 = \|x\|_0 = |\{i : x_i \neq 0\}|,$$

that is, the number of nonzero entries in $x$. The original goal of the model was to find a least squares solution with fewer nonzero entries for an under-determined linear system that has more variables than data measurements. For this purpose, people considered the regularized $L_2$-$L_0$ problem. For instance, the variable subset selection method can be viewed as the $L_2$-$L_0$ problem, which is the most popular method of regression regularization used in statistics [6]. However, the $L_0$ regularized problem is difficult to deal with because of the discrete structure of the 0-norm, while the solvability of the $L_2$-$L_p$ problem for $p\in(0,1)$ can be derived from the continuity and level boundedness of $f_p$. A (global) minimizer of the $L_2$-$L_p$ problem is also called a bridge estimator in the statistical literature [6] and has various nice properties, including the oracle property [4, 10, 11]. Moreover, theoretical results show that, in distinguishing zero and nonzero entries of coefficients in sparse high-dimensional approximation, bridge estimators have advantages over the Lasso estimators, which minimize the following convex $L_2$-$L_1$ problem:

$$\mathrm{Minimize}_x \quad f_1(x) := \|Ax-b\|_2^2 + \lambda\|x\|_1. \qquad (2)$$

Due to these advantages, researchers have been interested in the $L_p$ regularization problem for $0<p<1$.
However, the $L_2$-$L_p$ problem (1) is a nonconvex, non-Lipschitz optimization problem, and there are not many optimization theories for analyzing this type of problem. Many practical approaches have been developed to tackle problem (1), see, e.g., [1, 2, 3, 10, 12], but there is no globally convergent algorithm that is guaranteed to find a global minimizer or bridge estimator. To the best of our knowledge, the computational complexity of the $L_2$-$L_p$ minimization problem remains an open problem. One may attempt to draw a hardness result from the following problem:

$$\mathrm{Minimize} \quad \|x\|_p^p \quad \text{Subject to} \quad Ax=b, \qquad (3)$$

which is shown in [9] to be strongly NP-hard for $p\in[0,1)$; or from the problem

$$\mathrm{Minimize} \quad \|x\|_0 \quad \text{Subject to} \quad \|Ax-b\|_2\le\epsilon, \qquad (4)$$

which is shown in [13] to be NP-hard for certain $\epsilon$. From a complexity theory perspective, an NP-hard optimization problem with a polynomially bounded objective function does not admit a polynomial-time algorithm, and a strongly NP-hard optimization problem with a polynomially bounded objective function does not even admit a fully polynomial-time approximation scheme (FPTAS), unless P=NP [16]. Indeed, the $L_2$-$L_p$ problem (1) can be viewed as a quadratic penalty problem for problem (3). Intuitively, solving an unconstrained penalty problem should be easier than solving the constrained problem. Unfortunately, we show that this is not true. More precisely, we show that finding a global minimizer of the $L_2$-$L_p$ problem (1) remains strongly NP-hard for all $0\le p<1$ and $\lambda>0$, including its smoothed version. We also extend the strong NP-hardness result to the $L_q$-$L_p$ minimization problem for $q\ge 1$.
On the positive side, we present a sufficient condition on the choice of $\lambda$ under which all minimizers of the $L_2$-$L_p$ problem for given $(A,b,p)$, global or local, have the desired sparsity, as long as their objective value is below that of the all-zero solution. Under this condition, any such local optimal solution of problem (1) is a sparse estimator for the original problem. This may explain why many methods, e.g., [1, 2, 3, 10, 12], have reported encouraging computational results, although what they compute may not be a global minimizer.

The remainder of this paper is organized as follows. In Section 2, we present sufficient conditions on the choice of $\lambda$ to meet the sparsity requirement for global or local minimizers of the $L_2$-$L_p$ minimization problem. In general, when $\lambda$ is sufficiently large with respect to the data $(A,b)$ and $p$, the number of nonzero entries in any minimizer of the problem must be small. In Section 3, we prove that the $L_q$-$L_p$ minimization problem

$$\mathrm{Minimize}_x \quad f_{q,p}(x) := \|Ax-b\|_q^q + \lambda\|x\|_p^p \qquad (5)$$

is strongly NP-hard for any given $0\le p<1$, $q\ge 1$ and $\lambda>0$. We then extend the hardness result to its smoothed version

$$\mathrm{Minimize}_x \quad f_{q,p,\epsilon}(x) := \|Ax-b\|_q^q + \lambda\sum_{i=1}^n(|x_i|+\epsilon)^p \qquad (6)$$

for any given $0<p<1$, $q\ge 1$, $\lambda>0$ and $\epsilon>0$, even though the objective function in this case is Lipschitz continuous. Thus, changing the non-Lipschitz regularization model (5) to the Lipschitz continuous model (6) gains no advantage in terms of computational complexity. Finally, we show that our results are consistent with existing findings in the statistical literature, but give more specific bounds for choosing the regularization parameters.
We also illustrate that, for the purpose of finding a least squares solution with a targeted number of nonzero entries, finding a local minimizer of problem (1) is likely to accomplish the same objective as finding a global minimizer does.

In the rest of the paper, we define $z^0=0$ if $z=0$ and $z^0=1$ if $z\neq 0$. We use $(x\cdot y)$ to denote the vector $(x_1y_1,\ldots,x_ny_n)^T\in R^n$ and $\|\cdot\|$ to denote the $L_2$ norm.

2 Choosing the parameter $\lambda$ for sparsity

In applications like variable selection and sparse solution reconstruction, one wants to find least squares estimators with no more than $k$ nonzero entries. On the other hand, one obviously wants to avoid the all-zero solution. The $L_2$-$L_p$ regularized approach is to first solve the $L_2$-$L_p$ problem (1) to find a minimizer, then eliminate all variables that have zero values in the minimizer, and finally solve the least squares problem using only the remaining variables. Thus, the key is to control the support size of the minimizers of problem (1) so that it does not exceed $k$, and this is typically accomplished by selecting a suitable $\lambda$. We now give a sufficient condition on $\lambda$ for the minimizers of the $L_2$-$L_p$ problem to have the desired sparsity.

Theorem 1. Let

$$\beta(k) = k^{p/2-1}\left(\frac{2\alpha}{p(1-p)}\right)^{p/2}\|b\|^{2-p}, \quad \alpha=\max_{1\le i\le n}\|a_i\|^2, \quad 1\le k\le n. \qquad (7)$$

The following statements hold.
(1) If $\lambda\ge\beta(k)$, any minimizer $x^*$ of the $L_2$-$L_p$ problem (1) satisfies $\|x^*\|_0<k$ for $k\ge 2$.
(2) If $\lambda\ge\beta(1)$, $x^*=0$ is the unique minimizer of the $L_2$-$L_p$ problem (1).
(3) Suppose that the set $C:=\{x \mid Ax=b\}$ is non-empty. Then, if $\lambda\le \|b\|^2/\|x_c\|_p^p$ for some $x_c\in C$, any minimizer $x^*$ of the $L_2$-$L_p$ problem (1) satisfies $\|x^*\|_0\ge 1$.

Proof. Suppose that $x^*\neq 0$ is a global minimizer of the $L_2$-$L_p$ problem (1). Let $B=A_T\in R^{m\times|T|}$, where $T=\mathrm{support}(x^*)$ and $|T|=\|x^*\|_0$ is the cardinality of the set $T$.
By Theorem 2.1 and Theorem 2.3 in [3], the columns of $B$ are linearly independent and $x^*$ must satisfy

$$2B^T(Bx_T^*-b) + p\lambda\left(|x_T^*|^{p-2}\cdot x_T^*\right) = 0. \qquad (8)$$

This implies $Ax^*-b = Bx_T^*-b \neq 0$. Hence we have

$$f_p(x^*) = \|Ax^*-b\|^2 + \lambda\|x^*\|_p^p > \lambda\sum_{i\in T}|x_i^*|^p \ge \lambda|T|\left(\frac{\lambda p(1-p)}{2\alpha}\right)^{p/(2-p)}, \qquad (9)$$

where the last inequality follows from the lower bound theory for local minimizers of (1) in [3, Theorem 2.1].

(1) Suppose that $\lambda\ge\beta(k)$. If $x^*$ is a nonzero minimizer of (1) with $\|x^*\|_0\ge k\ge 1$, then from (9) and the definition of $\beta(k)$ in (7), we have

$$f_p(x^*) > k\,\lambda^{2/(2-p)}\left(\frac{p(1-p)}{2\alpha}\right)^{p/(2-p)} \ge \|b\|^2 = f_p(0).$$

This contradicts the assumption that $x^*$ is a minimizer of (1). Hence $\|x^*\|_0<k$.

(2) Suppose $\lambda\ge\beta(1)$. If $x^*$ is a nonzero minimizer of (1), then there is an index $i$ such that $x_i^*\neq 0$ and

$$f_p(x^*) = \|Ax^*-b\|^2 + \lambda\|x^*\|_p^p > \lambda|x_i^*|^p \ge \lambda\left(\frac{\lambda p(1-p)}{2\alpha}\right)^{p/(2-p)} \ge \|b\|^2 = f_p(0).$$

This contradicts the assumption that $x^*$ is a minimizer of (1). Hence, $x=0$ is the unique minimizer of (1).

(3) Note that $f_p(0)=\|b\|^2$ and $f_p(x_c)=\lambda\|x_c\|_p^p$ for $x_c\in C$. Therefore, if

$$\lambda \le \frac{\|b\|^2}{\|x_c\|_p^p} \quad \text{for some } x_c\in C, \qquad (10)$$

then $f_p(0)\ge f_p(x_c)$. Since $x_c$ is not a stationary point of the $L_2$-$L_p$ problem [3], there is $\tilde{x}$ near $x_c$ such that $f_p(x_c)>f_p(\tilde{x})$. Hence $x=0$ cannot be a global minimizer of (1). ∎

Remark 1. It was known that $x=0$ is a local minimizer of the $L_2$-$L_p$ problem (1) for any value of $\lambda>0$ [3], and that $x=0$ is a global minimizer of (1) for a "sufficiently large" $\lambda$ [10]. Theorem 1, for the first time, establishes a specific bound $\beta(1)$ such that $x=0$ is the unique global minimizer of (1) for $\lambda\ge\beta(1)$. An important algorithmic implication of Theorem 1 is that, for given data $(A,b)$ and $p$, choosing $\lambda\ge\beta(k)$ for a small constant $k$ does not help to solve the original sparse least squares problem.
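To illustrate Theorem 1(2) numerically, here is a short Python sketch (our addition, not part of the paper; the instance $A$, $b$ is chosen arbitrarily). It evaluates $\beta(1)$ from (7) for a tiny problem and checks on a coarse grid that $x=0$ attains the smallest objective value when $\lambda=\beta(1)$.

```python
import itertools, math

# Small illustrative instance (our choice, not from the paper)
A = [[1.0, 0.0], [0.0, 2.0]]      # columns a_1 = (1,0), a_2 = (0,2)
b = [1.0, 1.0]
p = 0.5

alpha = max(sum(A[i][j]**2 for i in range(2)) for j in range(2))  # max_i ||a_i||^2
norm_b = math.sqrt(sum(bi**2 for bi in b))

def beta(k):
    # beta(k) = k^{p/2-1} (2*alpha/(p(1-p)))^{p/2} ||b||^{2-p}, cf. (7)
    return k**(p/2 - 1) * (2*alpha/(p*(1 - p)))**(p/2) * norm_b**(2 - p)

lam = beta(1)   # Theorem 1(2): with lambda >= beta(1), x = 0 is the unique minimizer

def f(x):
    r = [sum(A[i][j]*x[j] for j in range(2)) - b[i] for i in range(2)]
    return sum(ri*ri for ri in r) + lam*sum(abs(xj)**p for xj in x)

grid = [i*0.05 for i in range(-30, 31)]
best = min((f((u, v)), (u, v)) for u, v in itertools.product(grid, grid))
print(best)   # smallest value is ||b||^2 = 2.0, attained at the origin
```

A grid search is of course no proof, but the margin here is large: every nonzero grid point pays either a residual or a penalty well above $\|b\|^2$.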
For a small constant $k$, say from 1 to 3, one might be better off enumerating all combinations of solutions, each with no more than $k$ nonzero entries, to find a minimizer. This can be done in time strongly polynomial in the problem dimensions.

One may also be interested in the relation between $\lambda$ and the support sizes of local minimizers of the $L_2$-$L_p$ problem (1). We present the following result on the sparsity of certain local minimizers of (1).

Theorem 2. Let

$$\gamma(k) = k^{p-1}\left(\frac{2\|A\|}{p}\right)^{p}\|b\|^{2-p}. \qquad (11)$$

If $\lambda\ge\gamma(k)$, then any local minimizer $x^*$ of problem (1) with $f_p(x^*)\le f_p(0)=\|b\|^2$ satisfies $\|x^*\|_0<k$ for $k\ge 2$.

Proof. Note that (8) holds for any local minimizer of the $L_2$-$L_p$ problem (1). By Theorem 2.3 in [3], for any local minimizer $x^*$ of the $L_2$-$L_p$ problem (1) in the level set $\{x: f_p(x)\le f_p(0)\}$, we have

$$f_p(x^*) = \|Ax^*-b\|^2 + \lambda\|x^*\|_p^p > \lambda\sum_{i\in T}|x_i^*|^p \ge \lambda|T|\left(\frac{\lambda p}{2\|A\|\|b\|}\right)^{p/(1-p)}, \qquad (12)$$

where $T=\mathrm{support}(x^*)$. If $|T|=\|x^*\|_0\ge k\ge 1$, then

$$f_p(x^*) > \lambda k\left(\frac{\lambda p}{2\|A\|\|b\|}\right)^{p/(1-p)} = \lambda^{1/(1-p)}k\left(\frac{p}{2\|A\|}\right)^{p/(1-p)}\|b\|^{p/(p-1)} \ge \|b\|^2 = f_p(0),$$

which is a contradiction. ∎

Theorem 1 concerns global minimizers of the $L_2$-$L_p$ problem (1), while Theorem 2 concerns its local minimizers in the level set $\{x: f_p(x)\le f_p(0)\}$. Since $x=0$ is a trivial local minimizer of problem (1), we believe any good method would likely find a minimizer that is at least better than $x=0$. Below, we use an example to illustrate the bounds presented in Theorems 1 and 2.

Example 2.1. Consider the following $L_2$-$L_{1/2}$ minimization problem:

$$\mathrm{Minimize} \quad f(x) := (x_1+x_2-1)^2 + \lambda\left(\sqrt{|x_1|}+\sqrt{|x_2|}\right). \qquad (13)$$

From $A=(1,1)$, $b=1$ and $x_c=(1,0)$, we easily find the data needed in Theorem 1 and Theorem 2:

$$\alpha=1, \quad \|b\|=1, \quad \beta(k)=8^{1/4}k^{-3/4}, \quad \frac{\|b\|^2}{\|x_c\|_p^p}=1, \quad \gamma(k)=32^{1/4}k^{-1/2}.$$
For $k=2$, we have $\beta(2)=1$. Using parts (1) and (3) of Theorem 1, we can claim that any minimizer $x^*$ of (13) with $\lambda=1$ satisfies $\|x^*\|_0=1$. Using part (2) of Theorem 1, we can claim that $x=0$ is the unique minimizer of (13) when $\lambda\ge\beta(1)=8^{1/4}$.

The bound $\beta(1)$ can be improved further. In fact, we can give a number $\beta^*\le\beta(1)$ such that $x=0$ is the unique minimizer of (13) for $\lambda\ge\beta^*$, by using the first- and second-order necessary conditions [3] for (1). For $\lambda=\frac{8}{3\sqrt{3}}<8^{1/4}$, it is easy to see that $(x_1,x_2)=(1/3,0)$ and $(x_1,x_2)=(0,1/3)$ are the two vectors satisfying

$$2x_1(x_1+x_2-1)+\frac{\lambda}{2}\sqrt{|x_1|}=0, \quad 2x_2(x_1+x_2-1)+\frac{\lambda}{2}\sqrt{|x_2|}=0,$$

and

$$H(x) = 2\begin{pmatrix} x_1^2 & x_1x_2 \\ x_1x_2 & x_2^2 \end{pmatrix} - \frac{\lambda}{4}\begin{pmatrix} \sqrt{|x_1|} & 0 \\ 0 & \sqrt{|x_2|} \end{pmatrix} = 0.$$

However, since the third-order derivative of $g(t):=f((1/3+t)e_1)$ (or $g(t):=f((1/3+t)e_2)$) is strictly positive on both sides of $t=0$, the points $(x_1,x_2)=(1/3,0)$ and $(x_1,x_2)=(0,1/3)$ are not local minimizers. Moreover, these two vectors are the only nonzero vectors satisfying both the first- and second-order necessary conditions. We can therefore claim that $x=0$ is the unique global minimizer of (13) for this $\lambda$.

Our theorems reinforce the findings from the statistical literature that global minimizers of the $L_2$-$L_p$ regularization problem may have many advantages over those of other, convex regularization problems, and the new results give precise bounds on how to choose $\lambda$ for the desired sparsity. The remaining question is: is the $L_2$-$L_p$ regularization problem (1) tractable for given $\lambda>0$ and $0\le p<1$? Or, more specifically, is there an efficient or polynomial-time algorithm to find a global minimizer of problem (1)? Unfortunately, we prove a strong negative result in the next section.
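The quantities in Example 2.1 are easy to check numerically. The Python sketch below (ours; the grid search is a crude stand-in for exact minimization) recomputes $\beta(k)$ and $\gamma(k)$ for $A=(1,1)$, $b=1$, $p=1/2$, and verifies that with $\lambda=1$ the grid minimizer of (13) has exactly one nonzero entry.

```python
import itertools

p = 0.5
beta = lambda k: 8**0.25 * k**(-0.75)    # beta(k) = 8^{1/4} k^{-3/4} from Example 2.1
gamma = lambda k: 32**0.25 * k**(-0.5)   # gamma(k) = 32^{1/4} k^{-1/2}
assert abs(beta(2) - 1.0) < 1e-12        # beta(2) = 1, as claimed

lam = 1.0                                # parts (1) and (3) apply: support size is exactly 1
f = lambda x1, x2: (x1 + x2 - 1)**2 + lam*(abs(x1)**p + abs(x2)**p)

grid = [i*0.01 for i in range(0, 121)]   # minimizers of (13) lie in [0, 1.2]^2 here
val, (x1, x2) = min((f(u, v), (u, v)) for u, v in itertools.product(grid, grid))
support = sum(1 for t in (x1, x2) if t != 0.0)
print(val, (x1, x2), support)            # one coordinate is zero, value below f(0) = 1
```

The grid minimizer sits near $(0.7, 0)$ (or its mirror image), with objective value about $0.93 < 1 = f(0)$, consistent with parts (1) and (3) of Theorem 1.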
3 The $L_2$-$L_p$ problem is strongly NP-hard

As mentioned earlier, one may attempt to draw a negative result directly from the constrained $L_p$ problem (3) or (4). However, it is well known that the quadratic penalty function is not exact, because its minimizer is generally not the same as the solution of the corresponding constrained optimization problem; see, e.g., [14]. For example, the all-zero vector is a local minimizer of the $L_2$-$L_p$ problem (1), but it may not even be feasible for the $L_p$ problem (3). On the other hand, the set of all basic feasible solutions of (3) is exactly the set of its local minimizers [9], but such a local minimizer of (3) may not even be a stationary point of problem (1). In fact, there is no $\lambda>0$ such that $\bar{x}$, an arbitrary feasible solution of problem (3), satisfies the first-order necessary condition of the $L_2$-$L_p$ problem (1). Another difference between (3) and (1) is the following: it has been shown in [9] that any solution is a local minimizer of (3) as long as it satisfies the first- and second-order necessary optimality conditions of (3); Example 2.1, however, shows that this is not true for the $L_2$-$L_p$ problem (1). Thus, we need somewhat new proofs for the hardness result. To facilitate the new proof, we first prove that problem (5) is NP-hard, and then extend this to the strong NP-hardness result.

Theorem 3. Minimization problem (5) is NP-hard for any given $0\le p<1$, $q\ge 1$ and $\lambda>0$.

We first prove a useful technical lemma.

Lemma 4. Consider the problem

$$\mathrm{Minimize}_{z\in R} \quad g(z) := |1-z|^q + \frac{1}{2}|z|^p \qquad (14)$$

for given $0\le p<1$ and $q\ge 1$. It is minimized at a unique point (denoted by $z^*(p,q)$) on $(0,1]$, and the optimal value $c(p,q)$ is at most $\frac{1}{2}$.

Proof. First, it is easy to see that when $p=0$, $g(z)$ has a unique minimizer at $z=1$, and the optimal value is $\frac{1}{2}$. Now we consider the case when $p\neq 0$.
Note that $g(z)>g(0)=1$ for all $z<0$, and $g(z)>g(1)=\frac{1}{2}$ for all $z>1$. Therefore the minimum point must lie within $[0,1]$. To optimize $g(z)$ on $[0,1]$, we check its first derivative

$$g'(z) = -q(1-z)^{q-1} + \frac{p}{2}z^{p-1}. \qquad (15)$$

We have $g'(0^+)=+\infty$ and $g'(1)=\frac{p}{2}>0$. Therefore, if the function $g(z)$ has at most two stationary points in $(0,1)$, the first one must be a local maximum, the second one must be the unique global minimum, and the minimum value $c(p,q)$ must be less than $\frac{1}{2}$.

Now we check the possible stationary points of $g(z)$. Solving

$$g'(z) = -q(1-z)^{q-1} + \frac{p}{2}z^{p-1} = 0,$$

we get $z^{1-p}(1-z)^{q-1} = \frac{p}{2q}$. Define $h(z) = z^{1-p}(1-z)^{q-1}$. We have

$$h'(z) = h(z)\left(\frac{1-p}{z} - \frac{q-1}{1-z}\right).$$

Note that $\frac{1-p}{z}-\frac{q-1}{1-z}$ is decreasing in $z$ and must have a root on $(0,1)$. Therefore, there exists a point $\bar{z}\in(0,1)$ such that $h'(z)>0$ for $z<\bar{z}$ and $h'(z)<0$ for $z>\bar{z}$. This implies that $h(z)=\frac{p}{2q}$ can have at most two solutions in $(0,1)$, i.e., $g(z)$ can have at most two stationary points. By the previous discussion, the lemma holds. ∎

Proof of Theorem 3. First, we claim that without loss of generality we only need to consider the problem with $\lambda=\frac{1}{2}$. This is because, given any problem of the form (5), we can make the transformation $\tilde{x}=(2\lambda)^{1/p}x$, $\tilde{A}=(2\lambda)^{-1/p}A$ and $\tilde{b}=b$, which scales the problem to

$$\mathrm{Minimize}_{\tilde{x}} \quad \|\tilde{A}\tilde{x}-\tilde{b}\|_q^q + \frac{1}{2}\|\tilde{x}\|_p^p. \qquad (16)$$

Note that this transformation is invertible, i.e., for any given $\lambda_0$, one can transform an instance with $\lambda=\lambda_0$ to one with $\lambda=\frac{1}{2}$ and vice versa. Therefore, we only need to consider the case $\lambda=\frac{1}{2}$. Now we present a polynomial-time reduction from the well-known NP-complete partition problem [8] to problem (16).
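Before turning to the reduction, note that Lemma 4 is easy to probe numerically. The short Python sketch below (our illustration; the $(p,q)$ pairs and the grid search are arbitrary choices) locates the minimizer of $g(z)=|1-z|^q+\frac{1}{2}|z|^p$ and confirms that it lies in $(0,1]$ with value below $\frac{1}{2}$ for the sampled pairs with $q>1$.

```python
def g(z, p, q):
    # the one-dimensional objective of (14)
    return abs(1 - z)**q + 0.5*abs(z)**p

def argmin_g(p, q, lo=-1.0, hi=2.0, steps=3000):
    # crude grid search over [lo, hi]; fine enough to locate z*(p, q)
    zs = [lo + i*(hi - lo)/steps for i in range(steps + 1)]
    return min(zs, key=lambda z: g(z, p, q))

for p, q in [(0.5, 2.0), (0.25, 2.0), (0.9, 3.0)]:
    z_star = argmin_g(p, q)
    c = g(z_star, p, q)
    print(p, q, round(z_star, 3), round(c, 4))
    assert 0 < z_star <= 1 and c < 0.5   # Lemma 4: minimizer in (0,1], value below 1/2
```

For $p=0.5$, $q=2$, for instance, the grid places $z^*(p,q)$ near $0.87$ with $c(p,q)\approx 0.48$.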
The partition problem can be described as follows: given a set $S$ of rational numbers $\{a_1,a_2,\ldots,a_n\}$, is there a way to partition $S$ into two disjoint subsets $S_1$ and $S_2$ such that the sum of the numbers in $S_1$ equals the sum of the numbers in $S_2$?

Given an instance of the partition problem with $a=(a_1,a_2,\ldots,a_n)^T\in R^n$, we consider the following minimization problem of the form (16):

$$\mathrm{Minimize}_{x,y} \quad P(x,y) = |a^T(x-y)|^q + \sum_{1\le j\le n}|x_j+y_j-1|^q + \frac{1}{2}\sum_{1\le j\le n}\left(|x_j|^p+|y_j|^p\right). \qquad (17)$$

We have

$$\mathrm{Minimize}_{x,y}\; P(x,y) \ge \mathrm{Minimize}_{x,y}\; \sum_{1\le j\le n}|x_j+y_j-1|^q + \frac{1}{2}\sum_{1\le j\le n}\left(|x_j|^p+|y_j|^p\right)$$
$$= \sum_{1\le j\le n} \mathrm{Minimize}_{x_j,y_j}\; |x_j+y_j-1|^q + \frac{1}{2}\left(|x_j|^p+|y_j|^p\right) = n\cdot \mathrm{Minimize}_z\; |1-z|^q + \frac{1}{2}|z|^p,$$

where the last equality follows from the fact that $|x_j|^p+|y_j|^p\ge|x_j+y_j|^p$ and that we can always choose one of them to be $0$ so that equality holds. By applying Lemma 4, we have $P(x,y)\ge n\,c(p,q)$.

Now we claim that there exists an equitable partition for the partition problem if and only if the optimal value of (17) equals $n\,c(p,q)$. First, if $S$ can be evenly partitioned into two sets $S_1$ and $S_2$, then we define $(x_i=z^*(p,q),\,y_i=0)$ if $a_i$ belongs to $S_1$ and $(x_i=0,\,y_i=z^*(p,q))$ otherwise. These $(x_j,y_j)$ provide an optimal solution of $P(x,y)$ with optimal value $n\,c(p,q)$. On the other hand, if the optimal value of (17) is $n\,c(p,q)$, then in the optimal solution, for each $i$, we must have either $(x_i=z^*(p,q),\,y_i=0)$ or $(x_i=0,\,y_i=z^*(p,q))$. We must also have $a^T(x-y)=0$, which implies that there exists an equitable partition of the set $S$. Thus Theorem 3 is proved. ∎

In the following, using a similar idea, we prove a stronger result.

Theorem 5.
Minimization problem (5) is strongly NP-hard for any given $0\le p<1$, $q\ge 1$ and $\lambda>0$.

Proof. We present a polynomial-time reduction from the well-known strongly NP-hard 3-partition problem [7, 8]. The 3-partition problem can be described as follows: given a multiset $S$ of $n=3m$ integers $\{a_1,a_2,\ldots,a_n\}$ with sum $mB$, can $S$ be partitioned into $m$ subsets such that the sum of the numbers in each subset is equal? We consider the following minimization problem of the form (16):

$$\mathrm{Minimize} \quad P(x) = \sum_{j=1}^m\Big|\sum_{i=1}^n a_ix_{ij}-B\Big|^q + \sum_{i=1}^n\Big|\sum_{j=1}^m x_{ij}-1\Big|^q + \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^m|x_{ij}|^p. \qquad (18)$$

The remaining argument is the same as in the proof of Theorem 3. ∎

Theorem 5 implies that the $L_2$-$L_p$ minimization problem is strongly NP-hard. Next, we generalize the hardness result to the smoothed version of the problem in (6).

Theorem 6. Minimization problem (6) is strongly NP-hard for any given $0<p<1$, $q\ge 1$, $\lambda>0$ and $\epsilon>0$.

Proof. We again consider the same 3-partition problem and claim that it can be reduced to a minimization problem of the form (6). Again, it suffices to consider only the case $\lambda=\frac{1}{2}$ (here we consider the hardness result for any given $\epsilon>0$; note that after the scaling, $\epsilon$ may have changed). Consider

$$\mathrm{Minimize}_x \quad P_\epsilon(x) = \sum_{j=1}^m\Big|\sum_{i=1}^n a_ix_{ij}-B\Big|^q + \sum_{i=1}^n\Big|\sum_{j=1}^m x_{ij}-1\Big|^q + \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^m(|x_{ij}|+\epsilon)^p. \qquad (19)$$

We have

$$\mathrm{Minimize}_x\; P_\epsilon(x) \ge \mathrm{Minimize}_x\; \sum_{i=1}^n\Big|\sum_{j=1}^m x_{ij}-1\Big|^q + \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^m(|x_{ij}|+\epsilon)^p$$
$$= \sum_{i=1}^n \mathrm{Minimize}_x\; \Big|\sum_{j=1}^m x_{ij}-1\Big|^q + \frac{1}{2}\sum_{j=1}^m(|x_{ij}|+\epsilon)^p = n\cdot\left(\mathrm{Minimize}_z\; |1-z|^q + \frac{1}{2}(|z|+\epsilon)^p + \frac{m-1}{2}\epsilon^p\right).$$

The last equality comes from the submodularity of the function $(x+\epsilon)^p$ and the fact that one can always choose only one of the $x_{ij}$ to be nonzero in each set so that equality holds.
Consider the function $g_\epsilon(z) = |1-z|^q + \frac{1}{2}(|z|+\epsilon)^p$. Similarly to Lemma 4, one can prove that $g_\epsilon(z)$ has a unique minimizer in $[0,1]$. Denote the resulting minimum value $\min_z g_\epsilon(z) + \frac{m-1}{2}\epsilon^p$ by $c(p,q,\epsilon)$; we then know that $P_\epsilon(x)\ge n\,c(p,q,\epsilon)$. One can then argue that the 3-partition problem has a solution if and only if the optimal value of (19) equals $n\,c(p,q,\epsilon)$. Therefore Theorem 6 holds. ∎

The above results reveal that finding a global minimizer of the $L_q$-$L_p$ minimization problem is strongly NP-hard; that is, the original sparse least squares problem is intrinsically hard, and no regularized optimization model or method can help much in the worst case. In other words, relaxing $L_0$ to $L_p$ for some $0<p<1$ in the regularization gains no significant advantage in terms of (worst-case) computational complexity.

4 Bounds $\beta(k)$ and $\gamma(k)$ for asymptotic properties

Given the strong negative result for computing a global minimizer, our hope now is to find a local minimizer of problem (1) that is still good enough for the desired sparsity, say, no more than $k$ nonzero entries. This is indeed guaranteed by Theorem 2 if one chooses $\lambda\ge\gamma(k)$ of (11) instead of $\lambda\ge\beta(k)$ of (7). In the following, we present a positive result in the bridge estimator model considered by [4, 10, 11].

Consider the asymptotic properties of the $L_2$-$L_p$ minimization (1), where the sample size $m$ tends to infinity in the model of [4, 10, 11]. Suppose that the true estimator $x^*$ has no more than $k$ nonzero entries. One expects that there is a sequence of bridge estimators, i.e., solutions $x_m^*$ of

$$\mathrm{Minimize} \quad \|Ax-b\|^2 + \lambda_m\|x\|_p^p,$$

such that $\mathrm{dist}(\mathrm{support}\{x_m^*\},\mathrm{support}\{x^*\})\to 0$ as $m\to\infty$, with probability 1. In applications of variable selection, the design matrix is typically standardized so that $\|a_i\|^2=m$ for $i=1,\ldots,n$.
Moreover, the smallest and largest eigenvalues $\rho_1$ and $\rho_2$ of the covariate matrix $P_m=\frac{1}{m}A^TA$ satisfy $0<c_1\le\rho_1\le\rho_2<c_2$ for some constants $c_1$ and $c_2$; see [10]. This assumption implies that $\sqrt{c_1m}\le\|A\|\le\sqrt{c_2m}$. For simplicity, let us fix $\|A\|=\sqrt{m}$ and $p=1/2$. Then we have

$$\beta(k) = k^{-3/4}(8m)^{1/4}\|b\|^{3/2} \quad \text{and} \quad \gamma(k) = k^{-1/2}(16m)^{1/4}\|b\|^{3/2}.$$

One can see that $\gamma(k)>\beta(k)$ for all $k\ge 1$. If $k$ is a constant, $\beta(k)$ and $\gamma(k)$ are of the same order in $m$ and $\|b\|$. Thus, finding any local minimizer of problem (1) in the objective level set $\{x: f_p(x)\le f_p(0)\}$ is sufficient to guarantee the desired sparsity when $\lambda_m=\beta(k)$. That is, there is no significant difference in guaranteed sparsity between global and local minimizers of problem (1). This also seems to be observed in computational experiments when the true estimator is extremely sparse. Of course, when $k$ increases as $m\to\infty$, a global minimizer of problem (1) would likely become sparser than a local minimizer, since $\beta(k)/\gamma(k)=O(k^{-1/4})$.

In general, both $\beta(k)$ and $\gamma(k)$ meet the conditions in the analysis of consistency and oracle efficiency of bridge estimators in [10, 11]. In their model, the parameter $\lambda_m$ is required to satisfy certain conditions. For instance,

$$\text{([11, Theorem 3])} \quad \lambda_m m^{-p/2}\to\lambda_0\ge 0 \ \text{ as } m\to\infty, \qquad (20)$$
$$\text{([10, A3(a)])} \quad \lambda_m m^{-1/2}\to 0 \ \text{ as } m\to\infty. \qquad (21)$$

With $\|a_i\|^2=m$ for $i=1,\ldots,n$ and $\|A\|=\sqrt{m}$ in their model, we have

$$\beta(k)m^{-p/2} = k^{p/2-1}\left(\frac{2}{p(1-p)}\right)^{p/2}\|b\|^{2-p} \to \lambda_0\ge 0 \ \text{ as } m\to\infty$$

and

$$\beta(k)m^{-1/2} = k^{p/2-1}\left(\frac{2}{p(1-p)}\right)^{p/2}\|b\|^{2-p}\,m^{(p-1)/2} \to 0 \ \text{ as } m\to\infty.$$

For $\gamma(k)$, we have

$$\gamma(k)m^{-p/2} = k^{p-1}\left(\frac{2}{p}\right)^{p}\|b\|^{2-p} \to \lambda_0\ge 0 \ \text{ as } m\to\infty$$

and

$$\gamma(k)m^{-1/2} = k^{p-1}\left(\frac{2}{p}\right)^{p}\|b\|^{2-p}\,m^{(p-1)/2} \to 0 \ \text{ as } m\to\infty.$$

Hence, both $\lambda_m=\beta(k)$ and $\lambda_m=\gamma(k)$ satisfy (20) and (21).
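The closed forms above are easy to tabulate. The Python sketch below (ours; the value of $\|b\|$ and the ranges of $k$ and $m$ are arbitrary choices) evaluates $\beta(k)$ and $\gamma(k)$ under the standardization $\|A\|=\sqrt{m}$, $p=1/2$, and checks that $\gamma(k)>\beta(k)$ and that $\lambda_m m^{-1/2}$ shrinks as $m$ grows, as required by (21).

```python
norm_b = 1.0   # ||b|| fixed for illustration

def beta(k, m):
    # beta(k) = k^{-3/4} (8m)^{1/4} ||b||^{3/2} for p = 1/2
    return k**(-0.75) * (8*m)**0.25 * norm_b**1.5

def gamma(k, m):
    # gamma(k) = k^{-1/2} (16m)^{1/4} ||b||^{3/2} for p = 1/2
    return k**(-0.5) * (16*m)**0.25 * norm_b**1.5

m = 10**6
for k in range(1, 11):
    assert gamma(k, m) > beta(k, m)      # gamma(k)/beta(k) = (2k)^{1/4} > 1

# condition (21): lambda_m * m^{-1/2} -> 0 as m grows (here ~ m^{-1/4})
vals = [beta(4, 10**j) * (10**j)**-0.5 for j in range(2, 8)]
assert all(u > v for u, v in zip(vals, vals[1:]))   # strictly decreasing toward 0
print(vals[0], vals[-1])
```

With $p=1/2$, the product $\beta(k)m^{-1/2}$ behaves like a constant times $m^{-1/4}$, so the decrease toward zero is slow but monotone, matching the displayed limits.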
Moreover, by Theorem 1 and Theorem 2, any minimizer of the $L_2$-$L_p$ problem (1) with $\lambda=\lambda_m$ is likely to have fewer than $k$ nonzero entries. Hence, each of them could be a good choice for the consistency and oracle efficiency of bridge estimators obtained by solving the unconstrained $L_2$-$L_p$ minimization problem (1).

References

[1] R. Chartrand, Exact reconstruction of sparse signals via nonconvex minimization, IEEE Signal Processing Letters, 14 (2007), 707-710.
[2] R. Chartrand and V. Staneva, Restricted isometry properties and nonconvex compressive sensing, Inverse Problems, 24 (2008), 1-14.
[3] X. Chen, F. Xu and Y. Ye, Lower bound theory of nonzero entries in solutions of ℓ₂-ℓₚ minimization, SIAM J. Scientific Computing, 32 (2010), 2832-2852.
[4] J. Fan and R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96 (2001), 1348-1360.
[5] S. Foucart and M. J. Lai, Sparsest solutions of underdetermined linear systems via ℓ_q minimization for 0 < q ≤ 1, Applied and Computational Harmonic Analysis, 26 (2009), 395-407.
[6] I. E. Frank and J. H. Friedman, A statistical view of some chemometrics regression tools (with discussion), Technometrics, 35 (1993), 109-148.
[7] M. R. Garey and D. S. Johnson, "Strong" NP-completeness results: motivation, examples, and implications, Journal of the Association for Computing Machinery, 25 (1978), 499-508.
[8] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman, New York, 1979.
[9] D. Ge, X. Jiang and Y. Ye, A note on the complexity of Lₚ minimization, to appear in Math. Programming, 2011.
[10] J. Huang, J. L. Horowitz and S. Ma, Asymptotic properties of bridge estimators in sparse high-dimensional regression models, The Annals of Statistics, 36 (2008), 587-613.
[11] K. Knight and W. J. Fu, Asymptotics for lasso-type estimators, The Annals of Statistics, 28 (2000), 1356-1378.
[12] M. Lai and Y. Wang, An unconstrained ℓ_q minimization with 0 < q < 1 for sparse solution of underdetermined linear systems, SIAM J. Optimization, 21 (2011), 82-101.
[13] B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM J. Computing, 24 (1995), 227-234.
[14] J. Nocedal and S. J. Wright, Numerical Optimization, 2nd Edition, Springer, New York, 2006.
[15] R. Tibshirani, Regression shrinkage and selection via the Lasso, J. Royal Statistical Society B, 58 (1996), 267-288.
[16] V. Vazirani, Approximation Algorithms, Springer, Berlin, 2003.