Support Recovery and $\ell_2$-Error Bound for Sparse Regression with Quadratic Measurements via Weakly-Convex-Concave Regularization

The recovery of unknown signals from quadratic measurements finds extensive applications in fields such as phase retrieval, power system state estimation, and unlabeled distance geometry. This paper investigates the finite sample properties of weakly…

Authors: Jun Fan, Jingyu Yang, Xinyu Zhang

Support Recovery and $\ell_2$-Error Bound for Sparse Regression with Quadratic Measurements via Weakly-Convex-Concave Regularization
Supp ort Reco v ery and   -Error Bound for Sparse Regression with Quadratic Measuremen ts via W eakly-Con v ex-Conca v e Regularization Jun F an, Jingyu Y ang, Xin yu Zhang Sc ho ol of Science, Heb ei Univ ersity of T ec hnolog and Liqun W ang ∗ Departmen t of Statistics, Universit y of Manitoba F ebruary 20, 2026 Abstract The reco v ery of unkno wn signals from quadratic measurements nds extensiv e applications in elds suc h as phase retriev al, p o w er system state estimation, and unlab eled distance geometry . This pap er inv estigates the nite sample prop erties of w eakly con v ex–conca ve regularized estimators in high-dimensional quadratic mea- suremen ts mo dels. By employing a w eakly conv ex–concav e p enalized least squares approac h, w e establish supp ort recov ery and   -error b ounds for the lo cal minimizer. T o solve the corresp onding optimization problem, we adopt tw o pro ximal gradient strategies, where the pro ximal step is computed either in closed form or via a weigh ted   appro ximation, dep ending on the regularization function. Numerical examples demonstrate the ecacy of the prop osed method. K eywor ds: Noncon vex statistics, Finite sample error bound, Consistency , Optimization algorithm. ∗ Corresp onding author: Liqun.W ang@umanitoba.ca. This work was supp orte d by the National Nat- ur al Scienc e F oundation of China under Gr ants 12571345 and 12271022; and the Natur al Scienc es and Engine ering R ese ar ch Council of Canada under Gr ant 4924-2023. 1 1 INTR ODUCTION 1 In tro duction W e consider the follo wing quadratic measuremen t regression mo del    󰆸    󰆸  󰄤         (1) where     denotes the observ ed resp onse,      is a symmetric design matrix, 󰆸    is the v ector of unkno wn parameters represen ting true signals, and 󰄤    is the random noise. Mo del ( 1 ) arises in numerous applications in physics, engineering, and data science, includ- ing phase retriev al ( Candes et al. 2015 ), generalized phase retriev al ( W ang & Xu 2019 ), the unassigned distance geometry problem ( Huang & Dokmanić 2021 ), and p o wer system state estimation ( W ang et al. 2019 ). Recen t y ears hav e witnessed a surge of in terests and metho dological dev elopmen ts for this mo del. Notable adv ances include Thaker et al. ( 2020 ), Huang et al. ( 2020 ), Chen & Ng ( 2022 ), and F an, Sun, Y an & Zhou ( 2025 ), who prop osed v arious algorithms and theoretical guaran tees of optimal solutions under v arious problem settings. It is w orth noting that when the design matrix can b e decomp osed as a outer pro duct of a real v ector         , mo del ( 1 ) is referred to as real-v alued phase retriev al. In this case, it also represents a sp ecial case of the single index mo del, with index    󰆸 and square link function. Single index mo dels are widely used in statistics and econometrics ( Xu et al. 2022 , Y ang et al. 2024 ). In high-dimensional regime where  is large or even exceeds  , a common assumption is that the true signal is sparse, i.e., only a small num b er of its en tries are non-zero. This sparsit y assumption is crucial for ensuring identiabilit y and statistical eciency , leading to the problem of sparse regression with quadratic measuremen ts. Several studies ha ve explored this c hallenging problem. F or instance, F an et al. ( 2018 ) studied an        - regularized least squares metho d, established a weak oracle prop erty and proposed a xed- 2 1 INTR ODUCTION p oin t iterativ e algorithm. Bolte et al. ( 2018 ) called the regularized least squares problem for mo del ( 1 ) quadratic in verse problem and proposed a Bregman proximal gradien t algorithm (PGA). Subsequently , Zhang et al. ( 2023 ) and Ding et al. ( 2025 ) developed Bregman PGA to solv e regularized problems including quadratic in v erse problem as a sp ecial case. While existing literature has primarily fo cused on algorithmic dev elopmen t for the regularized least squares formulation of quadratic in v erse problem (or model ( 1 )), the corresp onding statistical guaran tees remain relativ ely underexplored. More recently , Chen et al. ( 2025 ) emplo y ed the thresholded Wirtinger o w algorithm originally prop osed b y ( Cai et al. 2016 ) for sparse phase retriev al to solv e the regularized least squares problem for mo del ( 1 ). A sp ecial case of sparse quadratic mo del is sparse phase retriev al that attracted considerable atten tion ( Cai et al. 2016 , 2022 , Soltanolk otabi 2019 , Xia & Xu 2021 , Huang & Xu 2024 ). T o address sparse regression in linear mo dels, v arious other regularization techniques hav e b een dev elop ed, such as the        p enalt y ( F rank & F riedman 1993 ), the SCAD ( F an & Li 2001 ), and the MCP ( Zhang 2010 ). It is well-kno wn that LASSO has constant deriv ativ e that induces p ersistent shrinkage and biased estimation( Zou 2006 ). In con trast, conca v e penalties (e.g., SCAD, MCP , and        ) ha ve deriv atives that v anish for large co ecien ts, thereb y mitigating shrinkage bias and yielding nearly unbiased estimates. Ho w ev er, noncon vex p enalties in tro duces computational challenges. In particular, highly noncon v ex p enalties ma y cause n umerous lo cal minima, and the con vergence of rst-order algorithm (e.g., PGA) relies on fa v orable prop erties of the ob jectiv e function. One such prop ert y is w eak con vexit y , which is a key condition that generally ensures the pro ximal op erator is single-v alued ( W ang 2010 , Khanh et al. 2025 ), under standard tec hnical assump- tions. If this condition is violated, the pro ximal op erator ma y be multi-v alued, which poses a p oten tial risk that iterativ e algorithms could fail to con v erge to a meaningful solution. This c hallenge motiv ates the study of regularizers that balance statistical and computa- 3 1 INTR ODUCTION tional considerations. Loh & W ainwrigh t ( 2015 ) dev elop ed a general statistical analysis for a class of weakly conv ex regularized M-estimators and established nite sample error b ounds b et w een an y stationary p oin t of the p enalized estimator and the p opulation-optimal solution without regularization (i.e., the solution minimizing the exp ected p opulation risk). While their framew ork applies to a wide range of mo dels, including linear and generalized linear mo dels and graphical LASSO, it cannot b e directly applied to sparse quadratic mea- suremen t problems. The main diculty is that the least squares loss for this mo del leads to a highly nonconv ex quartic p olynomial that are computationally v ery c hallenging to minimize. In fact, as p oin ted out by Candes et al. ( 2015 ), it is already known that deter- mining whether a stationary p oin t of a quartic p olynomial is a lo cal minimizer is NP-hard. Moreo v er, Loh & W ainwrigh t ( 2015 ) imp osed an additional constrain t, whic h in tro duces an extra tuning parameter that must b e chosen carefully to ensure that the true v alue 󰆸  is a feasible p oint, thereby further increasing the diculty of solving the problem. W eakly con v ex regularization metho ds hav e also b een widely used in sparse signal recov ery ( Y ang et al. 2019 , K omuro et al. 2022 ) and linear in verse problems ( Sh uma ylov et al. 2024 , Goujon et al. 2024 , Ebner et al. 2025 ), but these w orks do not explicitly exploit concav e penalties. T o mitigate the abov e men tioned numerical issues, in this pap er, we prop ose a specic class of concav e p enalties that also satisfy w eak con v exity , termed W eakly-Con v ex-Conca ve P enalties (W CCP). The ma jor adv antage of this class is that it pro vides a exible framew ork for analyzing lo cal minimizers in sparse quadratic measuremen t problems, combining the statistical b enets of concavit y with the algorithmic guarantees of w eak conv exity . Sp ecically , in this pap er, w e study the regularization problem min 󰈌       󰆸    󰆸        󰅫  󰆸 (2) where 󰄗    is a tuning parameter that encourages sparsit y , and  󰅫   is a W CCP . As men tioned b efore, model ( 1 ) constitutes a sp ecial case of the single index mo del with a 4 1 INTR ODUCTION kno wn square link function. Therefore, the prop osed metho d ( 2 ) has potential applications for single index models. Our con tributions are threefold. First, w e bridge a notable gap in the literature b y providing the rst systematic statistical analysis of the WCCP-regularized estimator for the quadratic measuremen t model ( 1 ). W e establish rigorous supp ort recov ery guarantees and   -error b ounds for its lo cal minimizer. w e deriv e the statistical prop erties of the W CCP-regularized estimator for the quadratic measuremen t mo del ( 1 ) and establish supp ort reco very and tigh t   -error b ounds for the lo cal minimizer. Second, w e dev elop iterativ e algorithms tailored to dierent regularization functions that provide con v ergence guarantees. Third, w e demonstrate the eectiveness of our metho d, particularly in the context of sparse phase retriev al, and sho w that it p erforms comparable or b etter than the existing LASSO-based approac hes. The rest of this pap er is organized as follo ws. Section 2 presen ts the assumptions and statistical prop erties of the estimator. Section 3 in tro duces tw o iterativ e algorithms and pro vides a con v ergence analysis. Numerical experiments are presen ted in Section 4. T ech- nical lemmas and proofs are deferred to App endices A and B. Notation F or a v ector            , let  ,   , and   denote its Euclidean,   , and   norms, resp ectively . Denote b y   the  th column of the    identit y matrix   . F or any    matrix  , denote    max        . The submatrix of  with rows and columns indexed b y sets   and   is denoted      . The sub v ector of 󰆸 indexed b y  is denoted 󰆸  . The  th standard basis vector in   is denoted   . F or a function  , we denote its gradien t and Hessian b y  and    , resp ectively , and its sub dierential b y 󰄪  . Let 󰆸  b e the true parameter v alue, and denote    supp 󰆸         󰆸   5 2 FINITE SAMPLE ST A TISTICAL RESUL TS       . Dene the oracle regularized least square estimator as follows,  󰆸    arg min 󰈌         󰆸         󰆸            󰅫  󰄍   2 Finite Sample Statistical Results In this section, w e deriv e the nite sample prop erties and consistency of the w eakly conv ex- conca v e estimator in the high-dimensional case where  is large than  . W e assume that ln     󰅠  with some constant 󰄌    and denote the n um b er of non-zero elements of the true signal b y  . F ollowing literature ( Huang et al. 2008 , F an et al. 2018 ), w e assume that there exists constan ts          suc h that    min    󰆸         max    󰆸            (3) F urther regarding the observ ed response and design matrix, following F an et al. ( 2018 ), the data are assumed to b e standardized as        and           (4) F or the regularization function, noise and design matrix, we mak e the following assump- tions. Assumption 1. The r e gularization function is c o or dinate sep ar able:  󰅫  󰆸      󰅫  󰆸   for some sc alar function  󰅫  which satises (i) The function  󰅫  is c onc ave on   and ther e exists 󰄘   such that  󰅫    󰅬    is c onvex. (ii) The function  󰅫  satises  󰅫     and  󰅫     for    . 6 2 FINITE SAMPLE ST A TISTICAL RESUL TS (iii) F or    , the function  󰅫  is non-de cr e asing and    󰅁    is non-incr e asing in  . (iv) The function  󰅫  is dier entiable with derivative   󰅫  for al l    , and sub dier en- tiable at    , satisfying lim     󰅫    󰄗  󰄧 with 󰄧   . The function   󰅫  is lo c al ly Lipschitz c ontinuous in   . Compared to the conditions of ( Loh & W ain wright 2017 , Loh 2017 ), Assumption 1 addition- ally requires the p enalt y function to b e conca v e and its deriv ative   󰅫  to b e lo cally Lipschitz con tin uous on the p ositiv e half-line. In fact, the condition used b y ( Loh & W ain wrigh t 2017 , Loh 2017 ) is relaxed enough to include the LASSO p enalt y . As mentioned earlier, conca v e penalties are kno wn to p ossess stronger v ariable selection capabilities. Therefore, w e explicitly emphasize the concavit y of the penalty function in our framework. Com- pared to a general concav e p enalt y , the imp osition of w eak con vexit y here constrains its degree of noncon v exity . As previously men tioned, this ensures the proximal mapping of the p enalt y function remains single-v alued, which is crucial for ac hieving superior algorithmic con v ergence. Combining the second order con tinuous dierentiabilit y of the loss and the assumption ab out the lo cally Lipsc hitz’s con tin uity of   󰅫  enables us to use the general- ized Hessian to verify that a stationary p oin t is a lo cal minimizer. In fact, many concav e p enalties, including SCAD and MCP , possess locally Lipsc hitz contin uous rst deriv ativ es on the p ositive half-line. Moreo v er, in addition to the well-kno wn SCAD and MCP regularizers, sev eral other com- monly used regularizers also satisfy the conditions in Assumption 1, we present a few examples b elo w. Example 1. Firm regularizer ( W o o dw orth & Chartrand 2016 ):  󰅫         󰄗    󰅫          󰅫        where 󰄗  is regularization parameter and    is a xed parameter. Assumption 1 holds 7 2 FINITE SAMPLE ST A TISTICAL RESUL TS with 󰄧   and 󰄘    . Example 2. LOG regularizer ( Lob o et al. 2007 ):  󰅫    󰄗  log     where 󰄗  is regularization parameter and    is a xed parameter. Assumption 1 holds with 󰄧    and 󰄘   . Example 3. EXP regularizer ( Bradley et al. 1998 ):  󰅫    󰄗       where 󰄗  is regularization parameter and    is a xed parameter. Assumption 1 holds with 󰄧   and 󰄘    . Assumption 2. L et err ors 󰄤     󰄤   b e indep endent and identic al ly distribute d Sub- Gaussian with varianc e pr oxy 󰄝  , and 󰄤  have zer o me an and p ositive varianc e  󰄤   . Assumption 3. F or any      , ther e exist c onstants        satisfying that                               T o establish the nite‑sample prop erties of our prop osed estimator, we rst analyze the oracle estimator  󰆸   , which is dened with prior knowledge of the true supp ort set. The follo wing theorem provides its non‑asymptotic consistency guaran tee. Theorem 1. Under mo del ( 1 ) and A ssumption 1-3, if     , then    󰆸    󰆸                 wher e     exp  ln           ,        ln     󰄗  󰄧   and    max 󰄝               . 8 2 FINITE SAMPLE ST A TISTICAL RESUL TS The condition     is non‑restrictive, as the right‑hand side diminishes to near zero for an y practical  and  , meaning an y nite uniform b ound   will automatically satisfy it. Theorem 1 shows that the estimation error of the oracle estimator is bounded by   . Under the standard high‑dimensional scaling  ln   ln    , one can choose 󰄗  suc h that 󰄗       , whic h ensures     and thus the consistency of  󰆸   as    . Building on this oracle result, a standard and necessary step for analyzing the high- dimensional estimator is to imp ose conditions on the design matrix   to con trol its column correlations. Assumption 4. Ther e exists a c onstant     such that                                                 wher e  is the K r one cker pr o duct. The inequalities in Assumption 4 are similar to the partial orthogonality condition prop osed b y Huang et al. ( 2008 ) for linear mo dels, which was later extended to quadratic mo dels by F an et al. ( 2018 ). They can b e regarded as that the sub-matrix of   corresp onding to   and the other complemen tary matrix being orthogonal. Theorem 2. Under mo del ( 1 ) and A ssumption 1-4, supp ose               ln                  (5) and max    ln          󰄝    ln     ln     󰄗  󰄧          (6) If 󰄘        and           󰄝     ln             (7) then ther e exists a lo c al minimizer  󰆸 of ( 2 ) such that 9 2 FINITE SAMPLE ST A TISTICAL RESUL TS (i)   󰆸                 , (ii)   min   󰆸  󰆸     󰆸  󰆸                  . Remark 1. The conditions in Theorem 2 dene a non‑asymptotic feasible region for the problem dimensions     and the regularization parameter 󰄗  , within whic h the desired statistical guaran tees hold with explicit constan ts. F rom an asymptotic p ersp ectiv e (    , allo wing     ), these conditions are naturally satised under standard sparsit y assumptions. The inequalities in ( 5 ) impose constrain ts on the scaling b etw een  and  . The rst in- equalit y ,             , implies      . The remaining t wo inequalities inv olve terms that decay to zero as  gro ws and are th us satised for suciently large  . The admissible range for 󰄗  in ( 6 ) requires that the lo w er b ound (whic h dominates the noise) do es not exceed the upper bound (whic h preserv es the signal). Under the standard high‑dimensional scaling  ln   ln     and the sparsit y condition      , the lo w er bound is  while the upp er b ound is    . Hence, for large  , the admissible in terv al for 󰄗  is non‑empt y . Moreov er, b y choosing 󰄗  to be of the same order as the lo w er b ound (which tends to zero), we ensure 󰄗    and 󰄗       . The upper bound is only needed to con trol the bias in the non‑asymptotic analysis and does not conict with 󰄗    asymptotically . The condition 󰄘        restricts the w eak con v exit y parameter 󰄘 of the p enalty function, ensuring that it does not dominate the curv ature of the loss function. F or t ypical w eakly con v ex p enalties (e.g., SCAD, MCP), 󰄘 is a xed constant. If    , the righ t‑hand side gro ws un bounded, so a xed 󰄘 satises the inequality for large  ; if  remains b ounded, the condition imp oses a mild xed upp er b ound on 󰄘 . The inequalit y ( 7 ) inv olv es        ln     󰄗  󰄧   . Because 󰄗       as 10 3 OPTIMIZA TION ALGORITHM argued ab o v e, we ha ve     as    , which implies the estimation error b ound in Theorem 2 v anishes asymptotically . In summary , these conditions join tly describ e a regime where sparsit y grows slowly and dimension may grow exp onen tially with  , a t ypical setting in high‑dimensional statistics. Therefore, all conditions in Theorem 2 are compatible with b oth nite‑sample guaran tees and asymptotic consistency . Remark 2. F or the sparse phase retriev al problem, Cai et al. ( 2016 ) considered parameter estimation of real-v alued signals b y minimizing the empirical   loss function. Their results sho w that the estimator  󰆸 satises min   󰆸  󰆸     󰆸  󰆸      ln   as ln     . In our result, when the regularization parameter satises 󰄗      ln   ln  , the error b ound simplies to   ln   under the assumption that  ln    ln   , whic h sho ws that our estimator attains the kno wn optimal rate for sparse phase retriev al. Recen tly , Huang & Xu ( 2020 ) studied the estimation p erformance of the nonlinear Lasso of phase retriev al. Their results sho w that the estimator  󰆸 satises min   󰆸  󰆸     󰆸  󰆸    󰄤   . A ccording to Cai et al. ( 2009 ), the error 󰇏 satises 󰇏  󰄝       ln  with high probability as    . In this case, the error bound simplies to 󰄝       ln  . It is clear that the estimation error exceeds the estimation error b ound established in our w ork. 3 Optimization Algorithm In this section, w e discuss the n umerical computation of the problem to assess the p erfor- mance of the prop osed metho d. Since  and  are giv en, to simplify notation we denote 󰄗   󰄗 , so that ( 2 ) is written as min 󰅡   󰆸  󰆸   󰅫 󰆸 (8) 11 3 OPTIMIZA TION ALGORITHM where 󰆸       󰆸    󰆸      and 󰄗   . Noting the term 󰆸 is smo oth and the term  󰅫 󰆸 is w eakly conv ex-concav e, we emplo y the PGA to solv e the problem. The PGA is an established classical algorithm ( Beck 2017 ) and has been applied in v arious inv erse and optimization problems (e.g., Bolte et al. ( 2018 ), Soltanolkotabi ( 2019 ), Zhang et al. ( 2023 ), F an, Y an, Xiu & Liu ( 2025 )). The core component of PGA is the pro ximal op erator. Giv en a function        , recall that the proximal op erator associated with  is dened as prox    arg min             In particular, given a w eigh t vector           T with     for all  , w e denote the pro ximal op erator of the w eighted   -norm     at  by      max  min   where the max and min op erations are p erformed comp onen t-wise. Note that this reduces to the classical soft thresholding op erator prox 󰅫   when    󰄗 for all  . F or proximal op erators that does not admit a closed-form solution, we can adopt weigh ted   algorithm based on the conca vity of the regularization function. This analysis yields tw o alternative xed-p oin t characterizations of the minimizers as fol- lo ws. Prop osition 1. Ther e exists a c onstant    , for any 󰄞       , with    sup 󰈌    󰆸 and    󰆸     󰆸   , such that for any minimizer  󰆸 of pr oblem ( 2 ) , it satises  󰆸  prox 󰅲  󰅁   󰆸  󰄞   󰆸 (9) and  󰆸    󰆸  󰄞   󰆸 󰄞   󰅫  (10) 12 3 OPTIMIZA TION ALGORITHM wher e the  th element of the ve ctor   󰅫 is dene d as      󰅫         max   󰅫   󰄍   󰄐   if  󰄍    󰄗󰄧 if  󰄍    Her e 󰄐  is any p ositive numb er. These t wo characterizations pro vide the foundation for our algorithmic design. According to the relationship ( 9 ), the iterativ e sc heme for solving ( 2 ) is outlined in Algorithm 1. F or ( 10 ), the weigh t   󰅫 dep ends on the unknown optimal solution  󰆸 rendering the problem circular and not directly solv able. T o o v ercome this, w e adopt the classic framework of iterativ ely rew eigh ted   algorithms ( Zou ( 2006 ), Bai et al. 2024 ). Within this framework, w e employ the Ma jorization-Minimization (MM) tec hnique. At eac h iteration, the regularization term is linearly appro ximated via its rst-order expansion, whic h constructs a surrogate function. This leads to a sequence of subproblems of the form min 󰈌  󰆸    󰅫  󰆸   where   󰅫 is adaptiv ely up dated weigh ts computed from the curren t iterate 󰆸  . The next iterate 󰆸  is then updated according to the rule sp ecied in Algorithm 2, which stems from an approximate minimization step for this surrogate function. It is w orth noting that computing the step size 󰄞  is a crucial step, and its selection depends on the parameter   , which is dicult to determine. T o address this, we emplo y the Armijo line searc h method ( 11 ). The existence of such a smallest   and the admissible range for 󰄞  are established in Lemma 5 of App endix B. Ha ving established the step size selection strategy , w e no w turn to the con v ergence analysis of the algorithms. Prop osition 2. A ssume that 󰆸   is the se quenc e gener ate d by A lgorithm 1 or 2. Then the fol lowing c onclusions hold, 13 4 NUMERICAL EXPERIMENTS Algorithm 1 Require: Data       , parameters   󰄐   󰄏   󰄎     󰄎     Ensure: 󰆸  1: Initialization: Cho ose a sp ectral initialization point 󰆸  and set    2: Computation: Compute 󰆸   pro x 󰅲   󰅁 󰆸   󰄞  󰆸   , where 󰄞   󰄎  󰄎    and   is the smallest nonnegativ e integer suc h that:  󰆸     󰆸    󰄏 󰆸   󰆸     (11) 3: while 󰆸   󰆸    󰄐 max  󰆸   and    do 4: Set      , go bac k to Step 3 5: end while 6: Output: 󰆸  (i)  󰆸   and   󰆸   b oth ar e monotonic al ly de cr e asing. (ii) lim  󰆸   󰆸     (iii) Every ac cumulation p oint of the se quenc e 󰆸   gener ate d by A lgorithm 1 satises the xe d-p oint e quation ( 9 ) , and that gener ate d by A lgorithm 2 satises ( 10 ) . 4 Numerical Exp erimen ts In this section, w e examine the nite sample p erformance of the prop osed metho d through n umerical simulations. All experiments are implemen ted in MA TLAB(R2023b) on a laptop with 32GB memory . The MA TLAB co de is av ailable at htt ps: //gi thub .co m/fj math /sp ar seQMR_WCCP . W e compares the p erformance of the SCAD, MCP , Firm, LOG, and EXP regularizers with the   and   regularizers. The initial p oint is obtained using the sparse sp ec- 14 4 NUMERICAL EXPERIMENTS Algorithm 2 Require: Data       , parameters   󰄐 󰄐    󰄏   󰄎     󰄎     Ensure: 󰆸  1: Initialization: Cho ose a sp ectral initialization point 󰆸  and set    2: W eight computation: T ake   󰅫             with   󰅫  max   󰅫 󰄍    󰄐   if 󰄍     and     󰄗󰄧 if 󰄍     3: Up date: Compute 󰆸    󰆸   󰄞  󰆸   󰄗󰄞     , where 󰄞   󰄎 󰄌   and   is the smallest nonnegative integer suc h that condition ( 11 ) holds 4: while 󰆸   󰆸    󰄤 max  󰆸   and    do 5: Set      , go bac k to Step 3 6: end while 7: Output: 󰆸  tral initialization metho d prop osed b y Chen et al. ( 2025 ). F or all regularization meth- o ds, the optimal step size is determined using the Armijo line search. The regulariza- tion parameter is c hosen according to the sc heme of Chen & Ng ( 2022 ), dened as 󰄗     ln        󰆸    󰆸      󰆸 , where the co ecien t  is determined via cross-v alidation. The relative error in the exp erimen ts is computed as follo ws, Relerr  min   󰆸  󰆸     󰆸  󰆸   󰆸    Exp erimen t 1. The sample size is  , and the measuremen t matrix  is taken to b e a symmetric matrix with Gaussian entries. The noise 󰇏 follo ws a sub-Gaussian distribution. The true signal 󰆸  is generated as a Gaussian random vector. This exp eriment compares the p erformance of WCCP-regularized with the   and   regularizers under v arying  ratios. In the exp eriments, noise 󰄤   󰄝   is added to the measurements   . Regarding the signal dimension  and sparsity level  , w e consider t w o cases:       and       . F or 󰄝   and  , the reco v ery of the unkno wn signal 󰆸  is considered 15 4 NUMERICAL EXPERIMENTS successful when the error is b elow   and   resp ectiv ely . The measurement ratios are set as        , with eac h exp eriment run 100 times. The performance is ev aluated based on success rate and a verage relative error, with the latter is calculated only for successful trials. F urthermore, Supplementary T ables S1–S8 conrm that as  approac hes 1, several WCCP regularizers—particularly SCAD and MCP—achiev e low er relativ e errors and higher stability than   and   , while b eing orders of magnitude faster. In contrast, Firm, LOG and EXP incur substantially higher computational costs. Ov erall, SCAD and MCP oer the best trade‑o betw een accuracy and eciency . Figure 1 illustrates the curv es of success rate as functions of the ratio  . The results conrm that WCCP methods generally outp erform   regularization and are comp etitive with   . The SCAD and MCP p enalties achiev e the highest success rates, esp ecially as  increases. Other W CCP members (Firm, LOG, and EXP) also exhibit strong p erformance, with LOG and EXP proving particularly robust across noise lev els. Detailed mean squared error and v ariance results for all measuremen t ratios are pro vided in App endix C (T ables 5- 12). These results sho w that as  approac hes 1, our works exhibits sup erior relativ e-error p erformance. Therefore, our method is more stable in identifying the sparse supp ort. Exp erimen t 2. This exp eriment demonstrates the p erformance of WCCP-regularized metho d in the sparse phase retriev al problem. The proximal op erator of the EXP regular- izer is computationally exp ensiv e; therefore, Algorithm 2 is used to solv e the problem in this exp eriment. The sparse test signal is generated from the standard test image ”camera- man.tif ” (a v ailable in MA TLAB’s image processing to olbox) follo wing the w a v elet-based sparsication pro cedure Jagatap & Hegde ( 2018 ) and Shah & Hegde ( 2021 ). Our implemen- tation adapts the publicly av ailable co de from Shah & Hegde ( 2021 ): w e resize the image to 64×64 pixels, apply a 4-level Haar wa velet transform, and retain only the top  of the co ecien ts b y magnitude. This diers from Shah & Hegde ( 2021 ) in b oth the test image 16 4 NUMERICAL EXPERIMENTS 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 n/d 0 0.2 0.4 0.6 0.8 1 Success rate 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 n/d 0 0.1 0.2 0.3 0.4 0.5 Success rate 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 n/d 0 0.2 0.4 0.6 0.8 1 Success rate 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 n/d 0 0.2 0.4 0.6 0.8 1 Success rate Figure 1 used and the sparsity criterion (p ercentage-based vs. xed coun t). In the exp erimen ts, noise 󰄤   󰄝   is added to the measuremen ts. The comprehensive numerical exp eriments demonstrate the clear adv an tages of W CCP- regularized metho ds ov er traditional approac hes for sparse phase retriev al. As sho wn in T ables 1 and 3 , WCCP metho ds, particularly SCAD and MCP , consisten tly achiev e near- p erfect reconstruction (SSIM  0.9992) across all noise lev els and measurement ratios, while traditional   and   metho ds sho w signicant p erformance degradation, esp ecially under high noise conditions and lo w measuremen t ratios. Notably , MCP ac hiev es p er- fect reconstruction at measuremen t ratio    , whereas   requires    to reac h comparable qualit y , demonstrating the sup erior sparse reco v ery capability of WCCP regularizers. In addition to enhanced reconstruction qualit y , WCCP metho ds oer substan tial compu- 17 A PR OOFS OF THEOREM 1 AND 2 tational adv antages. As evidenced in T ables 2 and 4 , SCAD and MCP complete recon- struction tasks in signican tly less time compared to   regularization, while main taining sup erior accuracy . F or instance, at noise lev el 󰄝    ,   requires 139.08 seconds while SCAD and MCP complete in only 0.61 and 0.58 seconds resp ectively . These results con- rm that WCCP-regularized metho ds provide an ecient and robust framework for sparse signal recov ery , making them particularly suitable for practical applications requiring high- qualit y reconstruction under challenging conditions. (a) Original Image (256×256) Original resolution (b) Resized Image (64×64) Used in sparse phase retrieval experiments (c) Sparsified Image (64×64, 5% coeff.) Ground truth for SSIM evaluation Figure 2 A Pro ofs of Theorem 1 and 2 Without loss of generalit y , let       and 󰆸   󰆸        , and then denote                       F or con venience, we denote  󰆸  󰆸   󰅫  󰆸 where   󰆸       󰆸    󰆸      . Since   󰅫  is lo cally Lipsc hitz con tin uous on   , b y Rademac her’s theorem, the second deriv ative   󰅫  exists almost everywhere. F or any 18 A PR OOFS OF THEOREM 1 AND 2 T able 1: Structural Similarit y Index (SSIM) p erformance of sparse regularization metho ds at dierent noise lev els (measurement ratio    ) Method Noise Level 󰅱 0           1 L1 0.8142 (0.3295) 0.8410 (0.3246) 0.8020 (0.3508) 0.8805 (0.2901) 0.8416 (0.3238) 0.8825 (0.2848) 0.7733 (0.3554) SCAD 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) MCP 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000)   0.9597 (0.1765) 0.9582 (0.1831) 0.9614 (0.1690) 0.9992 (0.0000) 0.9197 (0.2446) 0.9577 (0.1855) 0.9221 (0.2373) Firm 0.8787 (0.2960) 0.9564 (0.1912) 0.7615 (0.3731) 0.8071 (0.3422) 0.8418 (0.3231) 0.9206 (0.2424) 0.8782 (0.2956) LOG 0.9212 (0.2401) 0.9992 (0.0000) 0.9992 (0.0000) 0.9191 (0.2466) 0.9591 (0.1792) 0.9992 (0.0000) 0.9591 (0.1793) EXP 0.8353 (0.3372) 0.8310 (0.3453) 0.7186 (0.3931) 0.8468 (0.3129) 0.8371 (0.3337) 0.9122 (0.2675) 0.7116 (0.4023) T able 2: Computational time of sparse regularization metho ds at dieren t noise levels (measuremen t ratio    ) Method Noise Level 󰅱 0           1 L1 1.7725 1.9901 139.0842 1.9897 1.8820 1.9666 1.3740 SCAD 0.5458 0.6096 0.6105 0.5703 0.5774 0.5958 0.7209 MCP 0.5206 0.5516 0.5760 0.5341 0.5313 0.7673 0.6821   0.4745 0.5401 0.5185 0.5150 0.4807 0.4364 0.3899 Firm 0.6950 0.7791 0.7060 0.6844 0.6630 84.0347 0.5769 LOG 1.1945 1.3752 1.3729 1.1638 1.2105 1.3171 0.9131 EXP 1.1392 15.2622 1.1028 1.1917 1.1054 1.2542 0.8009 19 A PR OOFS OF THEOREM 1 AND 2 T able 3: Structural Similarity Index (SSIM) p erformance at dieren t measuremen t ratios (noise level 󰄝    ) Method Measurement Ratio  0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 L1 0.1255 (0.0708) 0.4185 (0.2080) 0.6918 (0.3482) 0.8556 (0.2946) 0.8875 (0.2736) 0.8916 (0.2631) 0.9647 (0.1541) 0.9992 (0.0000) 0.9646 (0.1543) SCAD 0.4886 (0.0312) 0.6165 (0.0409) 0.9082 (0.1958) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) MCP 0.4614 (0.0298) 0.5978 (0.0347) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000)   0.2317 (0.1600) 0.6062 (0.0509) 0.9019 (0.2536) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) Firm 0.0820 (0.0823) 0.5535 (0.3544) 0.7897 (0.3724) 0.9112 (0.2708) 0.9287 (0.2172) 0.9648 (0.1535) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) LOG 0.2770 (0.1481) 0.5740 (0.1302) 0.8487 (0.2430) 0.9180 (0.2500) 0.9614 (0.1689) 0.9259 (0.2256) 0.9992 (0.0000) 0.9992 (0.0000) 0.9992 (0.0000) EXP 0.0564 (0.0198) 0.1998 (0.1355) 0.7809 (0.3571) 0.7959 (0.3620) 0.8439 (0.3187) 0.8873 (0.2733) 0.9292 (0.2156) 0.9992 (0.0000) 0.9992 (0.0000) T able 4: Computational time at dierent measuremen t ratios (noise level 󰄝    ) Method Measurement Ratio  0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 L1 0.8037 1.5078 1.9668 1.9321 2.0649 2.0567 2.1940 2.3576 2.3357 SCAD 0.3875 0.4539 0.5693 0.5984 0.6111 0.6245 0.6540 0.6864 0.7105 MCP 0.3673 0.4219 0.5328 0.5630 0.5906 0.5945 0.6095 0.6428 0.6673   0.3269 0.3634 0.4327 0.4728 0.5130 0.5303 0.5673 0.5848 0.6059 Firm 0.4306 0.5050 0.6558 0.6647 0.6916 0.7159 0.7635 0.7762 0.8061 LOG 0.7325 0.8526 1.0285 1.1950 1.3259 1.4074 1.5055 1.5480 1.6134 EXP 0.7063 0.8373 1.0086 1.1550 1.2877 1.3680 1.4670 1.5155 1.5763 20 A PR OOFS OF THEOREM 1 AND 2     , the Clarke generalized second-order deriv ative of  󰅫 at   is dened as 󰄪   󰅫     co  lim    󰅫             󰅫    exists  where co denotes the conv ex h ull (all conv ex com binations of the set). Moreov er, since  󰅫 is 󰄘 -weakly conv ex, its Clarke generalized second-order deriv ative satises 󰄪   󰅫   󰄘      The following lemma is analogous to Lemma 3.1 in F an, Sun, Y an & Zhou ( 2025 ) but uses a dieren t tail b ound for the noise term, whic h leads to a slightly dierent probabilit y lo w er b ound and a dierent constan t in the deviation inequalit y . Lemma 1. Under A ssumptions 2-4, if    and     it fol lows that        exp  ln         wher e the event is dene d by     sup             󰄤    󰄝      ln      Pr o of. The pro of follows the same structure as Lemma 3.1 in F an, Sun, Y an & Zhou ( 2025 ). The only dierence is our c hoice of the parameter  when b ounding      󰄤   via Bernstein’s inequalit y . While F an, Sun, Y an & Zhou ( 2025 ) uses   󰄝  , w e take   󰄝  ln  . This yields a dieren t tail probability   and, under the condition     , leads to the deviation inequalit y sup             󰄤    󰄝      ln      W e no w detail the mo died steps. By Assumption 2 and Lemma 1.12 of Rigollet & Hütter ( 2017 ), 󰄤    󰄤   is sub‑exponential with parameter 󰄝  . Applying Bernstein’s inequalit y (Theorem 1.13 in Rigollet & Hütter ( 2017 )) with   󰄝  ln  , we obtain       󰄤    󰄤     󰄝  ln   exp  ln      21 A PR OOFS OF THEOREM 1 AND 2 Since 󰄝   󰄤   (due to the sub‑Gaussian prop erty), it follo ws that       󰄤    󰄝   󰄝  ln      (12) The remaining argumen ts are identical to those in F an, Sun, Y an & Zhou ( 2025 ). In particular, with probability at least    exp  ln       , sup             󰄤    󰄝      ln       󰄝   󰄝  ln    Using     , we hav e  󰄝   󰄝  ln   󰄝      ln        ln      ln      whic h implies  󰅱  󰅱  ln    󰄝      ln   . Consequently , sup             󰄤     󰄝      ln      󰄝      ln      This completes the proof. Pro of of Theorem 1 Without loss of generality , w e assume   󰆸    󰆸       󰆸    󰆸    . Subsequen tly , it is easy to verify 󰆸       󰆸    󰆸    . Consider the lev el set   󰆸         󰆸      󰅫  󰆸       󰆸       󰅫  󰆸    . It is clear that inf 󰈌       󰆸      󰅫  󰆸    inf 󰈌      󰆸      󰅫  󰆸   Since        󰅫   is contin uous and the level set is compact, there exists at least one minimizer  󰆸  in the level set. Noting that condition (iv) of Assumption 1 implies that all sub-dieren tial and deriv atives of  󰅫  are b ounded in magnitude b y 󰄗  󰄧 , we get    󰅫    󰆸       󰅫  󰆸     󰄗  󰄧  󰆸    󰆸      󰄗   󰄧  󰆸    󰆸    (13) 22 A PR OOFS OF THEOREM 1 AND 2 A ccording to the denition of  󰆸   , for any 󰆸     ,      󰆸       󰅫    󰆸        󰆸      󰅫  󰆸   Then we get        󰆸       󰅫    󰆸        󰆸       󰅫  󰆸            󰆸       󰆸              󰆸      󰆸           󰅫    󰆸       󰅫  󰆸            󰆸    󰆸          󰆸    󰆸             󰆸    󰆸          󰆸    󰆸   󰄤     󰅫    󰆸       󰅫  󰆸          󰆸    󰆸       󰆸    󰆸             󰆸    󰆸          󰆸    󰆸   󰄤     󰅫    󰆸       󰅫  󰆸    Com bining the inequality ( 13 ), w e obtain      󰆸    󰆸       󰆸    󰆸          󰆸    󰆸     󰆸    󰆸         󰆸    󰆸     󰆸    󰆸           󰆸    󰆸     󰆸    󰆸    󰄤   󰄗   󰄧  󰆸    󰆸    Under the even t   , we get   󰆸    󰆸     󰄝      󰆸     ln      󰄗  󰄧     󰆸     whic h yields that   󰆸    󰆸     󰅱         ln    󰅫  󰅻             ln    󰅫  󰅻    (14) 23 A PR OOFS OF THEOREM 1 AND 2 where    max 󰄝               . Combing this and Lemma 1 , w e get the desired result. T o pro ve Theorem 2, w e also need the following lemma. Lemma 2. Dene     max  sup               󰄤        󰄝   󰄝  ln    with     󰄝     ln  ln   . Under A ssumptions 2-4, it fol lows that              Pr o of. Let  󰄏            󰄏  . Then follo ws from the Lemma 14.27 of Bühlmann & V an De Geer ( 2011 ) that there exist       v ectors   with      and       suc h that                     Using this fact, w e obtain the follo wing inequalities max  sup               󰄤    max  sup                      󰄤    max  max                󰄤    max  sup                       󰄤   (15) Noting the fact that 󰆸  󰆸      󰆸  󰆸    for any    matrix  and vectors 󰆸 󰆸     , one can conclude from ( 4 ) that                                              (16) for any     . 24 A PR OOFS OF THEOREM 1 AND 2 F or the rst term of ( 15 ), we can apply the coun table subadditivit y of probabilit y , Bern- stein’s inequality , inequalit y ( 16 ) and              to obtain, for an y    ,  max  max                󰄤                          󰄤       exp      󰄝                  exp    󰄝       exp    󰄝       ln     ln    T aking     , we drive  max  max                󰄤           F or the second term of ( 15 ), we can calculate that max  sup                       󰄤    max  sup                               󰄤    max  sup                                     󰄤    sup                                󰄤             󰄤    (17) Com bing this, inequalities ( 12 ), ( 15 ) and ( 17 ), get the desired result. 25 A PR OOFS OF THEOREM 1 AND 2 Lemma 3. Under A ssumptions 2-4, it fol lows that         wher e the event   is dene d by     max              󰆸   󰄤       󰄝   ln     ln    Pr o of. F rom Bernstein’s inequality , ( 16 ) and ( 3 ), w e can conclude that  max              󰆸   󰄤      exp      󰄝           󰆸       exp    󰄝  󰆸        exp    󰄝        ln  By taking         󰄝   ln  ln   , we subsequently obtain  max              󰆸   󰄤           Then, we get the desired result. By combining the ideas from the pro ofs of Theorem 3.3 in Chen et al. ( 2013 ) and Theorem 2.4 in Bai et al. ( 2024 ), w e deriv e the following result. Lemma 4. Supp ose A ssumption 1 holds. L et 󰆸 b e a stationary p oint of Pr oblem ( 2 ) , i.e.,     󰆸  󰄪  󰅫  󰆸 If for any nonzer o ve ctor     with     for al l     ,       󰆸     min 󰄪   󰅫  󰄍        (18) then 󰆸 is a strict lo c al minimum of Pr oblem ( 2 ) . 26 A PR OOFS OF THEOREM 1 AND 2 Pro of of Theorem 2 A ccording to Theorem 1, we can obtain the existence of a estimator  󰆸      , and the error b ound betw een  󰆸   and 󰆸   is   . Let  󰆸 be a vector with  󰆸     󰆸   and  󰆸     . Firstly , w e prov e that  󰆸 is a stationary point of Problem ( 2 ), i.e.,       󰆸  󰄪  󰅫    󰆸 (19) By simple caiculation, w e hav e     󰆸  󰄪  󰅫    󰆸            󰆸       󰆸          󰆸          󰆸       󰆸          󰆸               󰅫    󰄍      󰄪  󰅫    󰄍                󰆸        󰅫    󰆸        󰆸    󰄪  󰅫    󰆸       Since       󰆸        󰅫    󰆸      w e only need to prov e       󰆸  󰄪  󰅫    󰆸    It suces to pro ve that     󰆸      󰄗  󰄧 (20) since 󰄪  󰅫    󰄍    󰄗  󰄧 󰄗  󰄧 . Noting     󰆸           󰆸     󰆸        󰆸           󰆸    󰆸          󰆸    󰆸       󰆸             󰆸   󰄤   w e get     󰆸             󰆸    󰆸          󰆸    󰆸       󰆸                󰆸   󰄤     (21) W e rst estimate the upper b ound of the rst term on the righ t-hand side of the ab o v e inequalit y . Using again the fact that 󰆸  󰆸      󰆸  󰆸    for an y    matrix  and 27 A PR OOFS OF THEOREM 1 AND 2 v ectors 󰆸 󰆸     , we conclude from Hölder’s inequalit y , Assumption 4 and ( 3 ) that         󰆸    󰆸          󰆸    󰆸       󰆸      max    󰆸    󰆸                󰆸    󰆸           󰆸     max    󰆸    󰆸       󰆸                󰆸    󰆸             󰆸    󰆸       󰆸     max  max  max          󰆸    󰆸                     󰆸    󰆸       󰆸       󰆸    󰆸     max  max                         󰆸    󰆸     󰆸    󰆸     󰆸                      󰆸    󰆸     󰆸      󰆸    󰆸     󰆸      󰆸    󰆸          󰆸                      (22) if the ev ent   o ccurs. Here, the fth inequality relies on the b ounds   󰆸    󰆸       and    󰆸    . The rst b ound follows from ( 14 ) under even t   . The second is a consquence of conditions ( 7 ) and      whic h together imply                        󰆸    W e pro ceed to estimate the upp er b ound of the second term on the right-hand side of inequalit y ( 21 ). Notice that           󰆸   󰄤     max               󰆸   󰄤    max                󰆸    󰆸   󰄤    max              󰆸   󰄤   28 A PR OOFS OF THEOREM 1 AND 2  max  sup               󰄤    max              󰆸   󰄤       󰄝           󰄝   ln     ln    under the even t      . Therefore, com bining the ab ov e inequalit y , inequalities ( 21 ) and ( 22 ), we get     󰆸                     󰄝   󰄝  ln       󰄝   ln     ln   (23) under the even t         . On the other hand, note that the left inequality of condition ( 7 ) results in    󰄝   ln     ln      󰄗  󰄧 (24) By the rst inequalit y of condition ( 5 ), w e get            󰄗  󰄧      󰄗  󰄧 whic h together with the left inequalit y of condition ( 7 ) also leads to             ln                   ln          ln        󰄗  󰄧 By simple calculation, w e get    󰄝   ln  ln    ln      󰄝   ln  ln         ln         ln    ln     ln        ln        29 A PR OOFS OF THEOREM 1 AND 2 where the last ineauality follo ws from the second inequality of condition ( 5 ). Com bing this and the inequality ( 24 ) enable us to get    󰄝     ln      ln    ln        󰄗  󰄧 By simple calculation, w e get    󰄝   ln  ln   󰅫  󰅻      󰄝   ln  ln    󰄗  󰄧          ln    ln     ln    󰄗  󰄧        where the last ineauality follo ws from the second inequality of condition ( 6 ). Com bing this and the inequality ( 24 ) enable us to get    󰄝     ln      ln    󰄗  󰄧      󰄗  󰄧 Considering  󰅱  󰅱  ln      󰄝   ln  ln               ln   ln     ln          where rst inequality deriv es from    (resulting in ln    ) and ln   ln     ln  and the last ineauality follo ws from the second inequalit y of condition ( 6 ). Combing this and the inequality ( 24 ) enable us to get  󰄝   󰄝  ln      󰄗  󰄧 By these inequalities, the denition of   and inequality ( 23 ), we get ( 20 ) under the ev ent         . Next, we prov e that  󰆸 is a lo cal minimizer if the even t         o ccurs. Based on Lemma 4 , it suces to prov e that  󰆸 satises condition ( 18 ). Note that the conv exity of  󰅫    󰅬    implies min 󰄪   󰅫   󰄘   for an y    30 A PR OOFS OF THEOREM 1 AND 2 and hence     min 󰄪   󰅫    󰄍    󰄘     On the other hand, a simple calculation yields that         󰆸     min 󰄪   󰅫    󰆸             󰆸      min 󰄪   󰅫    󰆸    󰄘       󰄘    Therefore, it remains to pro v e that for an y nonzero v ector     with     for all     ,         󰆸     󰄘     (25) W e no w pro ceed to pro ve this inequalit y . Since               󰄤        sup             󰄤   w e ha v e               󰄤    󰄝      ln         if the even t   o ccurs. Notice that        󰆸    󰆸          󰆸    󰆸                    󰆸    󰆸          󰆸    󰆸                    Com bining these tw o inequalities and Assumption 3, under the ev ent   , we ha v e for an y 31 A PR OOFS OF THEOREM 1 AND 2 non-zero vector     with     for all     ,         󰆸     󰄘              󰆸  󰆸             󰆸     󰆸             󰄘           󰆸                  󰆸    󰆸          󰆸    󰆸                         󰄤      󰄘       󰆸              󰆸    󰆸     󰆸    󰆸        󰄝      ln          󰄘        󰆸              󰆸    󰆸     󰆸    󰆸          󰆸          󰄘      󰆸        󰆸    󰆸      󰆸     󰆸    󰆸          󰆸       󰄍    󰆸     󰆸    󰆸     󰄝      ln      󰄘      󰆸        󰆸              󰆸      󰄝      ln      󰄘                            󰄝      ln      󰄘                       󰄝      ln      󰄘                         󰄝      ln      󰄘     where the fth inequalit y follo ws from        and      . Com bing the ab o v e inequalit y and the condition 󰄘        and ( 6 ), w e get ( 25 ) under the ev en t   . Ov erall, we prov e that  󰆸 is a lo cal minimizer under the even t         . By Lemmas 1 , 2 and 2 , we complete the pro of. 32 B PR OOFS OF PROPOSITIONS 1 AND 2 B Pro ofs of Prop ositions 1 and 2 Pro of of Prop osition 1 Based on the similar pro of metho d to that of Theorem 5 in F an et al. ( 2018 ), it is easy to pro v e equation (3.6) and th us it is omitted. W e mainly pro v e equation (3.7). F or an y 󰄞   , dene the follo wing auxiliary function  󰅲 󰆸󰆹   󰆹   󰆹  󰆸  󰆹    󰄞 󰆸  󰆹     󰅫 󰆹       󰅫 󰄍    󰄎   (1) where  󰅫         max   󰅫 󰄎   󰄐   if 󰄎    󰄗󰄧 if 󰄎    It is clear that minimizing ( 3 ) with resp ect to 󰆸 is equiv alent to the follo wing minimization problem min 󰈌    󰆸  󰆹  󰄞 󰆹    󰄞     󰅫 󰄍   (2) F or any    , let    󰆸     󰆸   and    sup 󰈌    󰆸 Then for an y 󰄞       and 󰆸 󰆹    , we hav e  󰆸  󰆹   󰆹  󰆸  󰆹     󰆸  󰆹     󰇅󰆸  󰆹    󰅫 󰆸  󰆹   󰆹  󰆸  󰆹     󰆸  󰆹     󰇅󰆸  󰆹    󰅫 󰆹        󰄍    󰄎    󰆹   󰆹  󰆸  󰆹     󰆸  󰆹     󰇅󰆸  󰆹    󰅫 󰆹    󰅢   max   󰅫 󰄎   󰄐  󰄍    󰄎     󰅢   󰄗󰄧󰄍    󰄎     󰅲 󰆸󰆹     󰆸  󰆹     󰇅󰆸  󰆹    󰄞 󰆸  󰆹     󰅲 󰆸󰆹      󰆸  󰆹     󰄞 󰆸  󰆹     󰅲 󰆸󰆹  33 B PR OOFS OF PROPOSITIONS 1 AND 2 where    󰄪  󰅫 󰄎   and 󰇅  󰆹  󰄌󰆸  󰆹  for some 󰄌    . The rst inequality follo ws from the conca vity of  󰅫 , the second inequalit y follows from the upp er bound of 󰄪  󰅫 and the third inequality follo ws from 󰇅   . Assuming  󰆸  arg min 󰈌   󰆸 and  󰆸  arg min 󰈌  󰆸  󰆸 , then w e get  󰅲   󰆸  󰆸   󰅲   󰆸  󰆸     󰆸     󰆸   󰅲   󰆸  󰆸 hence  󰆸  arg min 󰈌   󰅲 󰆸  󰆸 . It means that  󰆸 is also a minimizer of the problem ( 2 ) with 󰆹   󰆸 . It is clear that problem ( 2 ) can b e solved using the soft-thresholding op erator  , therefore w e can get ( 10 ). T o pro ve Prop osition 2 , we also need the follo wing lemma. Lemma 5. L et    󰄍    ,    sup 󰅡    󰄍   wher e    󰄍     󰄍   󰄍        F or any 󰄏   󰄌   󰄎     , dene          if 󰄎     󰄏     [ log 󰅢  󰄎     󰄏  ]   otherwise  Then ( 11 ) holds, and ther e exists a nonne gative inte ger   such that 󰄞   󰄎  󰄎     󰄎   Pro of sketc h The pro of follows the same structure as that of Lemmas 8 and 9 in F an et al. ( 2018 ). The core function  is iden tical. Although the regularizer considered here (a w eakly con vex concav e regularizer or a w eigh ted   p enalt y) diers in form from the        regularizer in F an et al. ( 2018 ), the argument relies only on the co ercivit y of the regularizer and the nonexpansiveness of the proximal op erator, b oth of whic h hold in our setting. Consequently , the entire line of reasoning applies directly , and detailed calculations are omitted. Pro of of Prop osition 2 Regarding Algorithm 1, based on the similar pro of metho d to that of Theorem S3.1 in F an et al. ( 2018 ), it is easy to prov e it and so is omitted. The 34 B PR OOFS OF PROPOSITIONS 1 AND 2 conclusion for Algorithm 2 is pro ved b elo w, where the pro of of (i) follo ws the approac h of Prop osition 3.2 in Bai et al. ( 2024 ). F or any 󰄞   , dene the follo wing auxiliary function   󰅲 󰆸 󰆸    󰆸    󰆸   󰆸  󰆸     󰄞 󰆸  󰆸      󰅫 󰆹        󰅫 󰄍    󰄍    (3) where   󰅫         max   󰅫 󰄍    󰄐   if 󰄍     󰄗󰄧 if 󰄍     Notice that the iteration 󰆸   󰆸   󰄞  󰆸   󰄗󰄞     is equiv alent to 󰆸   arg min   󰆸  󰆸   󰄞  󰆸     󰄗󰄞     󰆸  Hence the sequence 󰆸   is b ounded b ecause ev ery entry of   is p ositiv e, 󰄗   and 󰄞   󰄎  󰄎     󰄎   from Lemma 5 . Then there exists a positive constant  suc h that   sup   󰆸    . (i) F rom the conv avit y of   󰅫    for any    , w e conclude that     󰅫 󰄍         󰅫 󰄍      󰅡      󰅫 󰄍   󰄍     󰄍      󰅡    󰄚   󰄍     󰄍      󰅡      󰅫 󰄍   󰄍     󰄍      󰅡    󰄗󰄧󰄍     󰄍          󰅫 󰄍     󰄍    (4) where the second inequality follows from lim     󰅫   󰄗󰄧 and 󰄧   . A dditionally , from the T a ylor expansion, it follows that 󰆸    󰆸    󰆸   󰆸   󰆸      󰆸   󰆸     (5) 35 B PR OOFS OF PROPOSITIONS 1 AND 2 Com bining ( 4 ) and ( 5 ), w e can get  󰆸     󰆸   󰆸        󰅫 󰄍     󰆸        󰅫 󰄍    󰆸   󰆸   󰆸       󰆸   󰆸          󰅫 󰄍     󰄍    (6) F or eac h subproblem, 󰆸  staises the optimal condition   󰆸     󰄞  󰆸   󰆸      󰅫  󰄪 󰆸     whic h implies that there exists an v ector 󰆼   󰄑      󰄑      󰄪 󰆸    suc h that 󰆸     󰄞  󰆸   󰆸      󰅫  󰆼    (7) where   󰅫    󰅫      󰅫   . Therefore, we obtain   󰅲  󰆸   󰆸      󰅲  󰆸   󰆸   󰆸   󰆸   󰆸     󰄞  󰆸   󰆸           󰄍     󰄍    󰆸   󰆸   󰆸       󰄑      󰄍    󰄍      󰄞  󰆸   󰆸    󰆸     󰄞 󰆸   󰆸          󰄑     󰆸   󰆸     󰄞  󰆸   󰆸      󰄞  󰆸   󰆸     where the inequalit y follo ws from the con vexit y of    and   󰅫   , and the last equalit y deriv es from ( 7 ). Com bining this and ( 6 ), w e get  󰆸     󰆸     󰅲  󰆸   󰆸      󰅲  󰆸   󰆸      󰆸   󰆸      󰄞  󰆸   󰆸       󰄞      󰆸   󰆸     (8) 36 B PR OOFS OF PROPOSITIONS 1 AND 2 (ii) F rom the denition of 󰆸  and ( 8 ), we ha v e     󰆸     󰆸     󰆸     󰆸       󰄞         󰆸   󰆸     Hence,    󰆸   󰆸      , whic h implies lim  󰆸   󰆸      . (iii) Because 󰆸   and 󰄞   ha ve conv ergent subsequences, without loss of generality , as- sume that 󰆸   󰆸  and 󰄞   󰄞  as    (9) and denote   󰅲 󰆸 󰆸    󰆸    󰆸   󰆸  󰆸     󰄞 󰆸  󰆸      󰅫 󰆹         󰄍    󰄍    with            max   󰅫 󰄍    󰄐   if 󰄍     󰄗󰄧 if 󰄍     It is known that the iteration 󰆸   󰆸   󰄞  󰆸   󰄗󰄞     Then for an y 󰆸    ,   󰅲  󰆸   󰆸      󰅲  󰆸 󰆸   Com bing the contin uousness of the function   󰅲  󰆸   and the limit ( 9 ), w e can obtain   󰅲  󰆸   󰆸    lim    󰅲  󰆸   󰆸    lim    󰅲  󰆸 󰆸      󰅲  󰆸 󰆸   Based on the abov e inequality , we get 󰆸   arg min 󰈌    󰅲  󰆸 󰆸   A ccording to the Prop osition 1, it means that 󰆸  is also satises 󰆸  󰆸  󰄞 󰆸 󰄞  󰅫  Then, we complete the proof. 37 REFERENCES C A dditional n umerical results T able 5: MSE(SD) of relative error for       and 󰄝        SCAD MCP Firm LOG EXP 0.1 MSE 1.091 (0.1414) 1.114 (0.1394) 1.091 (0.1646) 1.111 (0.1571) 1.152 (0.1743) 1.166 (0.1704) 1.187 (0.1779) 0.2 MSE 1.068 (0.1778) 1.108 (0.1568) 1.042 (0.1955) 1.03 (0.2445) 1.12 (0.3158) 1.133 (0.2951) 1.122 (0.349) 0.3 MSE 0.9705 (0.293) 1.052 (0.2643) 0.9269 (0.2784) 0.8665 (0.3783) 0.7603 (0.6056) 0.9416 (0.4998) 0.823 (0.5875) 0.4 MSE 0.5529 (0.5201) 0.7402 (0.4788) 0.6218 (0.407) 0.4724 (0.4664) 0.4097 (0.5788) 0.5031 (0.5934) 0.4627 (0.6022) 0.5 MSE 0.3079 (0.5235) 0.4849 (0.5386) 0.4257 (0.4788) 0.3526 (0.477) 0.2461 (0.4983) 0.2959 (0.5359) 0.2387 (0.4991) 0.6 MSE 0.05125 (0.2487) 0.2159 (0.4243) 0.2071 (0.3718) 0.06997 (0.2575) 0.04582 (0.2242) 0.08147 (0.2997) 0.04663 (0.2289) 0.7 MSE 0.07634 (0.302) 0.08627 (0.316) 0.1135 (0.3098) 0.05085 (0.2215) 0.07238 (0.2859) 0.07605 (0.3014) 0.05124 (0.2502) 0.8 MSE 0.01417 (0.1356) 0.008939 (0.08315) 0.02032 (0.141) 0.01137 (0.1108) 0.0005478 (0.0003627) 0.0003913 (0.0003058) 0.0004205 (0.0003104) 0.9 MSE 0.01311 (0.1256) 0.02341 (0.1631) 0.01126 (0.1101) 0.02173 (0.1514) 0.01205 (0.1157) 0.01312 (0.1279) 0.012 (0.1164) 1.0 MSE 0.0005213 (0.0002837) 0.0004813 (0.0008072) 0.008467 (0.08181) 0.0002522 (0.0001432) 0.0004765 (0.0002836) 0.0003329 (0.0002278) 0.0003577 (0.0002334) References Bai, L., Hu, Y., W ang, H. & Y ang, X. (2024), ‘A v oiding strict saddle p oin ts of noncon v ex regularized problems’, arXiv pr eprint . Bec k, A. (2017), First-or der metho ds in optimization , SIAM. Bolte, J., Sabach, S., T eb oulle, M. & V aisb ourd, Y. (2018), ‘First order metho ds b eyond con v exit y and lipsc hitz gradient con tinuit y with applications to quadratic in v erse prob- lems’, SIAM Journal on Optimization 28 (3), 2131–2151. Bradley , P . S., Mangasarian, O. L. & Street, W. N. (1998), ‘F eature selection via mathe- matical programming’, INFORMS Journal on Computing 10 (2), 209–217. Bühlmann, P . & V an De Geer, S. (2011), Statistics for high-dimensional data: metho ds, the ory and applic ations , Springer Science & Business Media. 38 REFERENCES REFERENCES T able 6: MSE(SD) of relative error for       and 󰄝        SCAD MCP Firm LOG EXP 0.1 MSE 1.085 (0.1328) 1.106 (0.1324) 1.087 (0.1423) 1.101 (0.1628) 1.153 (0.1718) 1.16 (0.1624) 1.175 (0.1777) 0.2 MSE 1.049 (0.1958) 1.101 (0.1642) 1.021 (0.2048) 1.043 (0.2243) 1.094 (0.3356) 1.121 (0.2632) 1.13 (0.342) 0.3 MSE 0.8273 (0.3511) 0.9234 (0.3213) 0.8148 (0.3337) 0.7896 (0.3815) 0.6387 (0.593) 0.6949 (0.5856) 0.6682 (0.612) 0.4 MSE 0.6436 (0.5085) 0.8282 (0.4484) 0.6899 (0.395) 0.5693 (0.4823) 0.4613 (0.5924) 0.59 (0.5957) 0.5024 (0.607) 0.5 MSE 0.243 (0.4811) 0.509 (0.5236) 0.4271 (0.471) 0.2844 (0.4427) 0.2429 (0.4918) 0.2886 (0.5111) 0.205 (0.4681) 0.6 MSE 0.203 (0.4607) 0.2593 (0.493) 0.2211 (0.405) 0.1493 (0.3698) 0.174 (0.4275) 0.1916 (0.454) 0.1882 (0.4472) 0.7 MSE 0.02853 (0.1732) 0.07949 (0.2845) 0.08671 (0.2745) 0.02283 (0.1482) 0.02904 (0.1793) 0.03935 (0.211) 0.04164 (0.2227) 0.8 MSE 0.003159 (0.002873) 0.03442 (0.1868) 0.05568 (0.2171) 0.02166 (0.1408) 0.002817 (0.002608) 0.0135 (0.1145) 0.01544 (0.1327) 0.9 MSE 0.0153 (0.1272) 0.01504 (0.1357) 0.01208 (0.1083) 0.01222 (0.1099) 0.01495 (0.1276) 0.01413 (0.1258) 0.01482 (0.1317) 1.0 MSE 0.002603 (0.001302) 0.0154 (0.1383) 0.01241 (0.1108) 0.01251 (0.112) 0.01539 (0.1304) 0.01415 (0.1254) 0.001726 (0.0009298) T able 7: MSE(SD) of relative error for       and 󰄝        SCAD MCP Firm LOG EXP 0.1 MSE 1.121 (0.06421) 1.143 (0.05974) 1.138 (0.09982) 1.144 (0.07845) 1.18 (0.1058) 1.231 (0.08987) 1.263 (0.1073) 0.2 MSE 1.103 (0.15) 1.147 (0.1259) 1.062 (0.2094) 1.049 (0.2355) 1.087 (0.3919) 1.165 (0.3045) 1.15 (0.377) 0.3 MSE 0.9678 (0.3185) 1.101 (0.2053) 0.8716 (0.3311) 0.866 (0.3872) 0.7233 (0.6405) 0.8856 (0.5504) 0.7427 (0.6431) 0.4 MSE 0.5304 (0.5944) 0.8253 (0.482) 0.5715 (0.497) 0.4767 (0.5222) 0.3911 (0.5887) 0.5146 (0.6151) 0.4587 (0.6179) 0.5 MSE 0.2341 (0.476) 0.4294 (0.534) 0.293 (0.4405) 0.1878 (0.4069) 0.1399 (0.3996) 0.2263 (0.4728) 0.1496 (0.4082) 0.6 MSE 0.1276 (0.3851) 0.1752 (0.4071) 0.1024 (0.3097) 0.08888 (0.3035) 0.07372 (0.2932) 0.08987 (0.3296) 0.08984 (0.329) 0.7 MSE 0.06553 (0.2857) 0.08276 (0.3045) 0.04321 (0.2121) 0.04276 (0.21) 0.06264 (0.2731) 0.06505 (0.2842) 0.05078 (0.2491) 0.8 MSE 0.0003446 (0.000144) 0.02425 (0.1695) 0.01038 (0.1022) 0.02158 (0.1507) 0.0003082 (0.0001376) 0.0002127 (0.0001035) 0.0002275 (0.0001077) 0.9 MSE 0.0002978 (0.0001248) 0.0001692 (0.0001473) 0.01013 (0.09999) 0.0001356 (6.212e-05) 0.0002639 (0.0001202) 0.0001809 (9.108e-05) 0.0001931 (9.303e-05) 1.0 MSE 0.0002786 (0.0001441) 0.0001873 (0.0002725) 0.0001275 (6.871e-05) 0.0001276 (7.027e-05) 0.0002492 (0.0001398) 0.0001706 (0.0001048) 0.0001842 (0.0001158) 39 REFERENCES REFERENCES T able 8: MSE(SD) of relative error for       and 󰄝        SCAD MCP Firm LOG EXP 0.1 MSE 1.129 (0.07115) 1.151 (0.06742) 1.153 (0.07808) 1.175 (0.09573) 1.192 (0.1179) 1.245 (0.09804) 1.273 (0.1223) 0.2 MSE 1.12 (0.1201) 1.169 (0.09863) 1.114 (0.1472) 1.094 (0.2043) 1.221 (0.2625) 1.227 (0.2284) 1.219 (0.3407) 0.3 MSE 0.9922 (0.2916) 1.091 (0.2339) 0.9472 (0.2679) 0.9405 (0.3577) 0.8092 (0.6053) 0.95 (0.5133) 0.8946 (0.5978) 0.4 MSE 0.4294 (0.5732) 0.8253 (0.4619) 0.5233 (0.4976) 0.431 (0.5097) 0.3336 (0.5659) 0.4446 (0.6011) 0.3967 (0.6094) 0.5 MSE 0.2368 (0.4876) 0.5002 (0.5624) 0.2478 (0.4343) 0.1508 (0.3751) 0.1756 (0.4334) 0.2441 (0.4929) 0.2194 (0.4847) 0.6 MSE 0.05631 (0.2677) 0.1342 (0.3705) 0.1246 (0.3398) 0.05633 (0.2435) 0.06604 (0.2818) 0.07899 (0.3117) 0.06739 (0.29) 0.7 MSE 0.01509 (0.1325) 0.01446 (0.1328) 0.01168 (0.1076) 0.02301 (0.156) 0.01473 (0.1312) 0.02736 (0.1845) 0.01441 (0.132) 0.8 MSE 0.001777 (0.0008609) 0.01031 (0.09262) 0.01142 (0.106) 0.000823 (0.0004187) 0.001569 (0.0008123) 0.00113 (0.0006776) 0.00119 (0.0006817) 0.9 MSE 0.001553 (0.0006901) 0.0008882 (0.0005476) 0.01075 (0.09993) 0.000747 (0.0003353) 0.001367 (0.0006242) 0.0009917 (0.0004932) 0.001044 (0.0004984) 1.0 MSE 0.00136 (0.0005866) 0.0007221 (0.0004205) 0.0006343 (0.0002849) 0.0006249 (0.0002817) 0.0012 (0.0005614) 0.0008326 (0.0004163) 0.0008955 (0.0004457) T able 9: Mean Time (SD) for       and 󰄝        SCAD MCP Firm LOG EXP 0.1 Time 0.03377 (0.01813) 0.1205 (0.09658) 0.01371 (0.007892) 0.01626 (0.01263) 0.1419 (0.09485) 0.3334 (0.1808) 0.55 (0.3274) 0.2 Time 0.02298 (0.007626) 0.05555 (0.02541) 0.003957 (0.00106) 0.005396 (0.009777) 0.06276 (0.03421) 0.08763 (0.01004) 0.2456 (0.1286) 0.3 Time 0.07842 (0.04432) 0.0535 (0.05498) 0.006069 (0.003454) 0.00785 (0.007068) 0.07034 (0.07559) 0.1158 (0.06931) 0.519 (0.3738) 0.4 Time 0.341 (0.1211) 0.1132 (0.08343) 0.02539 (0.01485) 0.04185 (0.04007) 0.1626 (0.1324) 0.2637 (0.2602) 1.015 (1.207) 0.5 Time 0.6451 (0.1132) 0.1928 (0.1897) 0.0556 (0.02597) 0.06179 (0.06821) 0.1708 (0.0657) 0.2991 (0.3289) 1.042 (1.746) 0.6 Time 0.5028 (0.1514) 0.1376 (0.1614) 0.05767 (0.02264) 0.05976 (0.02864) 0.1421 (0.04185) 0.1573 (0.2019) 0.3919 (0.93) 0.7 Time 0.3975 (0.1663) 0.1015 (0.1064) 0.07437 (0.07595) 0.08195 (0.1162) 0.1253 (0.03575) 0.1538 (0.2255) 0.3738 (1.007) 0.8 Time 0.3499 (0.1075) 0.09129 (0.03348) 0.07957 (0.03062) 0.07738 (0.03206) 0.142 (0.03733) 0.1259 (0.1176) 0.1909 (0.06753) 0.9 Time 0.3183 (0.1172) 0.0907 (0.03449) 0.09587 (0.1122) 0.1031 (0.1207) 0.1454 (0.02508) 0.1609 (0.2382) 0.2925 (0.9214) 1.0 Time 0.303 (0.05565) 0.1067 (0.0494) 0.1029 (0.0561) 0.1073 (0.0597) 0.1552 (0.03678) 0.1841 (0.2233) 0.1944 (0.06617) 40 REFERENCES REFERENCES T able 10: Mean Time (SD) for       and 󰄝        SCAD MCP Firm LOG EXP 0.1 Time 0.01993 (0.01462) 0.07793 (0.06112) 0.00739 (0.00417) 0.008899 (0.005657) 0.08419 (0.06419) 0.1891 (0.09824) 0.2951 (0.1907) 0.2 Time 0.1336 (0.04051) 0.3351 (0.1643) 0.02581 (0.01689) 0.02901 (0.0108) 0.4944 (0.2485) 0.61 (0.1112) 1.245 (0.4835) 0.3 Time 0.3753 (0.1275) 0.2123 (0.1632) 0.0257 (0.01092) 0.04378 (0.08122) 0.302 (0.2415) 0.4586 (0.2208) 1.296 (1.137) 0.4 Time 0.6192 (0.1141) 0.1942 (0.1632) 0.03955 (0.02311) 0.04445 (0.0207) 0.191 (0.1407) 0.4549 (0.3075) 1.53 (1.698) 0.5 Time 0.6784 (0.1806) 0.1867 (0.179) 0.05155 (0.02586) 0.06454 (0.1035) 0.1413 (0.05757) 0.3199 (0.3196) 0.9733 (1.625) 0.6 Time 0.4605 (0.161) 0.1458 (0.1995) 0.04942 (0.02432) 0.05805 (0.06407) 0.1279 (0.04818) 0.2552 (0.2949) 1.105 (2.237) 0.7 Time 0.3264 (0.1112) 0.0944 (0.1296) 0.05519 (0.02142) 0.06366 (0.08287) 0.1171 (0.04002) 0.1778 (0.1704) 0.33 (0.7549) 0.8 Time 0.3621 (0.118) 0.1184 (0.1856) 0.0924 (0.08049) 0.09844 (0.07525) 0.1592 (0.04999) 0.1901 (0.1602) 0.3504 (1.017) 0.9 Time 0.3463 (0.1191) 0.1029 (0.05027) 0.1115 (0.1813) 0.1292 (0.215) 0.1702 (0.03466) 0.2192 (0.1908) 0.2837 (0.6244) 1.0 Time 0.3231 (0.1258) 0.09973 (0.035) 0.1363 (0.2276) 0.1741 (0.3248) 0.1673 (0.04315) 0.2882 (0.3655) 0.2437 (0.1679) T able 11: Mean Time (SD) for       and 󰄝        SCAD MCP Firm LOG EXP 0.1 Time 0.05887 (0.01293) 0.2234 (0.08841) 0.01697 (0.005035) 0.01699 (0.004914) 0.1988 (0.09614) 0.3827 (0.07207) 0.6776 (0.1144) 0.2 Time 0.1291 (0.04144) 0.3178 (0.1516) 0.01979 (0.004716) 0.02326 (0.01892) 0.2496 (0.1418) 0.4005 (0.08792) 2.426 (1.495) 0.3 Time 0.7639 (0.2156) 0.289 (0.2028) 0.04582 (0.01895) 0.05365 (0.02381) 0.2879 (0.1784) 0.7338 (0.3489) 4.571 (3.882) 0.4 Time 1.514 (0.2979) 0.4177 (0.3474) 0.1032 (0.06255) 0.123 (0.1618) 0.3198 (0.1482) 0.7819 (0.6811) 4.638 (5.593) 0.5 Time 1.508 (0.4161) 0.3399 (0.3124) 0.1315 (0.05244) 0.1227 (0.03739) 0.291 (0.09094) 0.5467 (0.6743) 2.529 (4.824) 0.6 Time 3.892 (1.556) 1.135 (1.459) 0.5243 (0.1609) 0.5964 (0.9066) 1.018 (0.2602) 1.366 (1.971) 3.584 (9.087) 0.7 Time 2.256 (1.078) 0.6103 (0.5344) 0.4549 (0.08223) 0.4341 (0.0722) 0.7783 (0.1122) 0.9275 (1.521) 2.688 (9.078) 0.8 Time 1.738 (0.2359) 0.5042 (0.092) 0.5566 (0.7428) 0.5519 (0.7202) 0.7686 (0.1213) 0.702 (0.708) 1.018 (0.9496) 0.9 Time 1.576 (0.1965) 0.531 (0.1265) 0.7153 (0.9071) 0.73 (1.01) 0.7896 (0.1359) 1.044 (1.544) 0.9093 (0.4753) 1.0 Time 1.438 (0.183) 0.5723 (0.2167) 0.645 (0.6552) 0.6464 (0.5954) 0.7828 (0.09955) 0.8698 (0.9806) 0.9996 (0.9732) 41 REFERENCES REFERENCES T able 12: Mean Time (SD) for       and 󰄝        SCAD MCP Firm LOG EXP 0.1 Time 0.06015 (0.01408) 0.2368 (0.08766) 0.01761 (0.004725) 0.01972 (0.01006) 0.2015 (0.08702) 0.3895 (0.06751) 0.6818 (0.07771) 0.2 Time 0.1358 (0.06205) 0.3425 (0.1847) 0.02148 (0.008248) 0.02143 (0.007374) 0.2777 (0.1744) 0.4298 (0.1278) 2.522 (1.414) 0.3 Time 0.859 (0.2789) 0.3643 (0.2975) 0.04648 (0.02163) 0.06987 (0.1505) 0.2603 (0.182) 0.7928 (0.3222) 5.155 (3.626) 0.4 Time 1.832 (0.2002) 0.484 (0.3226) 0.1184 (0.06135) 0.1091 (0.04349) 0.3176 (0.09057) 0.8818 (0.7331) 4.047 (5.325) 0.5 Time 1.812 (0.435) 0.3944 (0.3735) 0.1328 (0.04252) 0.1257 (0.03247) 0.2781 (0.08556) 0.6853 (0.7801) 2.898 (5.014) 0.6 Time 1.234 (0.4237) 0.3135 (0.3604) 0.1599 (0.04195) 0.1796 (0.2797) 0.2856 (0.06056) 0.4151 (0.5578) 1.49 (4.121) 0.7 Time 0.9523 (0.2734) 0.2349 (0.275) 0.1719 (0.0329) 0.1647 (0.03235) 0.2858 (0.06317) 0.3363 (0.3792) 0.654 (2.279) 0.8 Time 0.7881 (0.1328) 0.2051 (0.05422) 0.1973 (0.1323) 0.201 (0.1327) 0.2805 (0.04738) 0.3521 (0.2724) 0.4649 (0.4407) 0.9 Time 0.7322 (0.1035) 0.2063 (0.02253) 0.2012 (0.03551) 0.2529 (0.3343) 0.2927 (0.04338) 0.3249 (0.1853) 0.4405 (0.2648) 1.0 Time 0.6781 (0.0835) 0.2296 (0.05449) 0.2365 (0.1932) 0.2358 (0.1469) 0.3107 (0.05463) 0.4408 (0.4755) 0.4186 (0.06463) Cai, J.-F., Li, J., Lu, X. & Y ou, J. (2022), ‘Sparse signal recov ery from phaseless measure- men ts via hard thresholding pursuit’, A pplie d and Computational Harmonic A nalysis 56 , 367–390. Cai, T. T., Li, X. & Ma, Z. (2016), ‘Optimal rates of conv ergence for noisy sparse phase retriev al via thresholded wirtinger o w’, The A nnals of Statistics 44 (5), 2221–2251. Cai, T. T., Xu, G. & Zhang, J. (2009), ‘On recov ery of sparse signals via l1 minimization’, IEEE T r ansactions on Information The ory 55 (7), 3388–3397. Candes, E. J., Li, X. & Soltanolkotabi, M. (2015), ‘Phase retriev al via wirtinger o w: Theory and algorithms’, IEEE T r ansactions on Information The ory 61 (4), 1985–2007. Chen, J. & Ng, M. K. (2022), ‘Error b ound of empirical   risk minimization for noisy standard and generalized phase retriev al problems’, arXiv pr eprint . Chen, J., Ng, M. K. & Liu, Z. (2025), ‘Solving quadratic systems with full-rank matrices using sparse or generativ e priors’, IEEE T r ansactions on Signal Pr o c essing . 42 REFERENCES REFERENCES Chen, X., Niu, L. & Y uan, Y. (2013), ‘Optimalit y conditions and a smo othing trust region newton metho d for nonlipsc hitz optimization’, SIAM Journal on Optimization 23 (3), 1528–1552. Ding, K., Li, J. & T oh, K.-C. (2025), ‘Nonconv ex sto c hastic bregman proximal gradi- en t method with application to deep learning’, Journal of Machine L e arning R ese ar ch 26 (39), 1–44. Ebner, A., Sc h w ab, M. & Haltmeier, M. (2025), ‘Error estimates for weakly conv ex frame-based regularization including learned lters’, SIAM Journal on Imaging Scienc es 18 (2), 822–850. F an, J., Kong, L., W ang, L. & Xiu, N. (2018), ‘V ariable selection in sparse regression with quadratic measurements’, Statistic a Sinic a 28 (3), 1157–1178. F an, J. & Li, R. (2001), ‘V ariable selection via nonconcav e p enalized likelihoo d and its oracle prop erties’, Journal of the A meric an statistic al A sso ciation 96 (456), 1348–1360. F an, J., Sun, J., Y an, A. & Zhou, S. (2025), ‘An oracle gradien t regularized newton metho d for quadratic measuremen ts regression’, A pplie d and Computational Harmonic A nalysis 78 , 101775. F an, J., Y an, A., Xiu, X. & Liu, W. (2025), ‘Robust sparse phase retriev al: Statistical guar- an tee, optimality theory and con vergen t algorithm’, arXiv pr eprint . F rank, L. E. & F riedman, J. H. (1993), ‘A statistical view of some chemometrics regression to ols’, T e chnometrics 35 (2), 109–135. Goujon, A., Neuma yer, S. & Unser, M. (2024), ‘Learning w eakly conv ex regularizers for con- v ergen t image-reconstruction algorithms’, SIAM Journal on Imaging Scienc es 17 (1), 91– 115. 43 REFERENCES REFERENCES Huang, J., Horo witz, J. L. & Ma, S. (2008), ‘Asymptotic prop erties of bridge estimators in sparse high-dimensional regression models’, The A nnals of Statistics 36 (2), 587–613. Huang, M. & Xu, Z. (2020), ‘The estimation p erformance of nonlinear least squares for phase retriev al’, IEEE T r ansactions on Information The ory 66 (12), 7967–7977. Huang, M. & Xu, Z. (2024), ‘Performance b ounds of the in tensity-based estimators for noisy phase retriev al’, Applie d and Computational Harmonic A nalysis 68 , 101584. Huang, S. & Dokmanić, I. (2021), ‘Reconstructing p oin t sets from distance distributions’, IEEE T r ansactions on Signal Pr o c essing 69 , 1811–1827. Huang, S., Gupta, S. & Dokmanić, I. (2020), ‘Solving complex quadratic systems with full-rank random matrices’, IEEE tr ansactions on Signal Pr o c essing 68 , 4782–4796. Jagatap, G. & Hegde, C. (2018), T ow ards sample-optimal metho ds for solving random quadratic equations with structure, in ‘2018 IEEE International Symposium on Informa- tion Theory (ISIT)’, pp. 2296–2300. Khanh, P . D., Mordukho vich, B. S., Phat, V. T. & T ran, D. B. (2025), ‘Inexact pro ximal metho ds for weakly con vex functions’, Journal of Glob al Optimization 91 (3), 611–646. K om uro, K., Y ukaw a, M. & Ca v alcan te, R. L. G. (2022), ‘Distributed sparse optimization with w eakly con vex regularizer: Consensus promoting and appro ximate moreau enhanced p enalties to w ards global optimalit y’, IEEE T r ansactions on Signal and Information Pr o- c essing over Networks 8 , 514–527. Lob o, M. S., F azel, M. & Bo yd, S. (2007), ‘P ortfolio optimization with linear and xed transaction costs’, A nnals of Op er ations R ese ar ch 152 , 341–365. Loh, P .-L. (2017), ‘Statistical consistency and asymptotic normalit y for high-dimensional robust m-estimators’, The A nnals of Statistics 45 (2), 866–896. 44 REFERENCES REFERENCES Loh, P .-L. & W ainwrigh t, M. J. (2015), ‘Regularized m-estimators with nonconv exity: Sta- tistical and algorithmic theory for lo cal optima’, The Journal of Machine L e arning R e- se ar ch 16 (1), 559–616. Loh, P .-L. & W ainwrigh t, M. J. (2017), ‘Supp ort recov ery without incoherence: A case for noncon v ex regularization’, The A nnals of Statistics 45 (6), 2455–2482. Rigollet, P . & Hütter, J.-C. (2017), High dimensional statistics. Lecture notes for course. Shah, V. & Hegde, C. (2021), ‘Sparse signal recov ery from mo dulo observ ations’, EURASIP Journal on A dvanc es in Signal Pr o c essing 2021 (1), 1–17. Sh uma ylo v, Z., Budd, J., Mukherjee, S. & Schönlieb, C.-B. (2024), ‘W eakly con vex regu- larisers for in v erse problems: Con vergence of critical p oin ts and primal-dual optimisa- tion’, arXiv pr eprint . Soltanolk otabi, M. (2019), ‘Structured signal reco v ery from quadratic measuremen ts: Breaking sample complexity barriers via noncon v ex optimization’, IEEE T r ansactions on Information The ory 65 (4), 2374–2400. Thak er, P . K., Dasarath y , G. & Nedić, A. (2020), ‘On the sample complexity and opti- mization landscape for quadratic feasibilit y problems’, IEEE International Symp osium on Information The ory (ISIT) pp. 1438–1443. W ang, G., Zhu, H., Giannakis, G. B. & Sun, J. (2019), ‘Robust p ow er system state estima- tion from rank-one measurements’, IEEE T r ansactions on Contr ol of Network Systems 6 (4), 1391–1403. W ang, X. (2010), ‘On c heb yshev functions and klee functions’, Journal of mathematic al analysis and applic ations 368 (1), 293–310. 45 REFERENCES REFERENCES W ang, Y. & Xu, Z. (2019), ‘Generalized phase retriev al: measurement num b er, matrix reco v ery and b ey ond’, A pplie d and Computational Harmonic A nalysis 47 (2), 423–446. W o o dworth, J. & Chartrand, R. (2016), ‘Compressed sensing reco v ery via nonconv ex shrinkage p enalties’, Inverse Pr oblems 32 (7), 075004. Xia, Y. & Xu, Z. (2021), ‘Sparse phase retriev al via phaselifto ’, IEEE T r ansactions on Signal Pr o c essing 69 , 2129–2143. Xu, W., W ang, H. J. & Li, D. (2022), ‘Extreme quantile estimation based on the tail single-index mo del’, Statistic a Sinic a 32 (2), 893–914. Y ang, C., Shen, X., Ma, H., Chen, B., Gu, Y. & So, H. C. (2019), ‘W eakly con v ex regu- larized robust sparse recov ery metho ds with theoretical guarantees’, IEEE T r ansactions on Signal Pr o c essing 67 (19), 5046–5061. Y ang, W., Shi, H., Guo, X. & Zou, C. (2024), ‘Robust group and simultaneous inferences for high-dimensional single index mo del’, A dvanc es in Neur al Information Pr o c essing Systems 37 , 65217–65259. Zhang, C.-H. (2010), ‘Nearly unbiased v ariable selection under minimax conca v e p enalty’, The A nnals of Statistics 38 (2), 894–942. Zhang, H., Zhang, L. & Y ang, H.-X. (2023), ‘Revisiting linearized bregman iterations under lipsc hitz-lik e con v exity condition’, Mathematics of Computation 92 (340), 779–803. Zou, H. (2006), ‘The adaptiv e lasso and its oracle prop erties’, Journal of the A meric an statistic al asso ciation 101 (476), 1418–1429. 46

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment