Approximation properties of certain operator-induced norms on Hilbert spaces

Arash A. Amini (b), Martin J. Wainwright (a, b)

(a) Department of Statistics and (b) Department of Electrical Engineering and Computer Sciences, UC Berkeley, Berkeley, CA 94720

Email addresses: amini@eecs.berkeley.edu (Arash A. Amini), wainwrig@stat.berkeley.edu (Martin J. Wainwright)

Preprint submitted to Journal of Approximation Theory, October 19, 2018

Abstract

We consider a class of operator-induced norms, acting as finite-dimensional surrogates to the $L^2$ norm, and study their approximation properties over Hilbert subspaces of $L^2$. The class includes, as a special case, the usual empirical norm encountered, for example, in the context of nonparametric regression in reproducing kernel Hilbert spaces (RKHS). Our results have implications for the analysis of $M$-estimators in models based on finite-dimensional linear approximation of functions, and also for some related packing problems.

Keywords: $L^2$ approximation, Empirical norm, Quadratic functionals, Hilbert spaces with reproducing kernels, Analysis of $M$-estimators

1. Introduction

Given a probability measure $\mathbb{P}$ supported on a compact set $\mathcal{X} \subset \mathbb{R}^d$, consider the function class

$$ L^2(\mathbb{P}) := \big\{ f : \mathcal{X} \to \mathbb{R} \;\big|\; \|f\|_{L^2(\mathbb{P})} < \infty \big\}, \tag{1} $$

where $\|f\|_{L^2(\mathbb{P})} := \sqrt{\int_{\mathcal{X}} f^2(x)\, d\mathbb{P}(x)}$ is the usual $L^2$ norm$^{1}$ defined with respect to the measure $\mathbb{P}$. It is often of interest to construct approximations to this $L^2$ norm that are "finite-dimensional" in nature, and to study the quality of approximation over the unit ball of some Hilbert space $\mathcal{H}$ that is continuously embedded within $L^2$.

Footnote 1: We also use $L^2(\mathcal{X})$ or simply $L^2$ to refer to the space (1), with corresponding conventions for its norm. Also, one can take $\mathcal{X}$ to be a compact subset of any separable metric space and $\mathbb{P}$ a (regular) Borel measure.
For example, in approximation theory and mathematical statistics, a collection of $n$ design points in $\mathcal{X}$ is often used to define a surrogate for the $L^2$ norm. In other settings, one is given some orthonormal basis of $L^2(\mathbb{P})$, and defines an approximation based on the sum of squares of the first $n$ (generalized) Fourier coefficients. For problems of this type, it is of interest to gain a precise understanding of the approximation accuracy in terms of its dimension $n$ and other problem parameters.

The goal of this paper is to study such questions in reasonable generality for the case of Hilbert spaces $\mathcal{H}$. We let $\Phi_n : \mathcal{H} \to \mathbb{R}^n$ denote a continuous linear operator on the Hilbert space, which acts by mapping any $f \in \mathcal{H}$ to the $n$-vector $\big( [\Phi_n f]_1, [\Phi_n f]_2, \ldots, [\Phi_n f]_n \big)$. This operator defines the $\Phi_n$-semi-norm

$$ \|f\|_{\Phi_n} := \sqrt{ \sum_{i=1}^{n} [\Phi_n f]_i^2 }. \tag{2} $$

In the sequel, with a minor abuse of terminology,$^{2}$ we refer to $\|f\|_{\Phi_n}$ as the $\Phi_n$-norm of $f$. Our goal is to study how well $\|f\|_{\Phi_n}$ approximates $\|f\|_{L^2}$ over the unit ball of $\mathcal{H}$ as a function of $n$ and other problem parameters. We provide a number of examples of the sampling operator $\Phi_n$ in Section 2.2. Since the dependence on the parameter $n$ should be clear, we frequently omit the subscript to simplify notation.

In order to measure the quality of approximation over $\mathcal{H}$, we consider the quantity

$$ R_\Phi(\varepsilon) := \sup \big\{ \|f\|_{L^2}^2 \;\big|\; f \in B_{\mathcal{H}},\ \|f\|_\Phi^2 \le \varepsilon^2 \big\}, \tag{3} $$

where $B_{\mathcal{H}} := \{ f \in \mathcal{H} \mid \|f\|_{\mathcal{H}} \le 1 \}$ is the unit ball of $\mathcal{H}$. The goal of this paper is to obtain sharp upper bounds on $R_\Phi$. As discussed in Appendix C, a relatively straightforward argument can be used to translate such upper bounds into lower bounds on the related quantity

$$ T_\Phi(\varepsilon) := \inf \big\{ \|f\|_\Phi^2 \;\big|\; f \in B_{\mathcal{H}},\ \|f\|_{L^2}^2 \ge \varepsilon^2 \big\}. \tag{4} $$

Footnote 2: This can be justified by identifying $f$ and $g$ if $\Phi f = \Phi g$, i.e. considering the quotient $\mathcal{H} / \ker \Phi$.
We also note that, for a complete picture of the relationship between the semi-norm $\|\cdot\|_\Phi$ and the $L^2$ norm, one can also consider the related pair

$$ \overline{T}_\Phi(\varepsilon) := \sup \big\{ \|f\|_\Phi^2 \;\big|\; f \in B_{\mathcal{H}},\ \|f\|_{L^2}^2 \le \varepsilon^2 \big\}, \quad \text{and} \tag{5a} $$

$$ \underline{R}_\Phi(\varepsilon) := \inf \big\{ \|f\|_{L^2}^2 \;\big|\; f \in B_{\mathcal{H}},\ \|f\|_\Phi^2 \ge \varepsilon^2 \big\}. \tag{5b} $$

Our methods are also applicable to these quantities, but we limit our treatment to $(R_\Phi, T_\Phi)$ so as to keep the contribution focused.

Certain special cases of linear operators $\Phi$, and associated functionals, have been studied in past work. In the special case $\varepsilon = 0$, we have

$$ R_\Phi(0) = \sup \big\{ \|f\|_{L^2}^2 \;\big|\; f \in B_{\mathcal{H}},\ \Phi(f) = 0 \big\}, $$

a quantity that corresponds to the squared diameter of $B_{\mathcal{H}} \cap \operatorname{Ker}(\Phi)$, measured in the $L^2$-norm. Quantities of this type are standard in approximation theory (e.g., [1, 2, 3]), for instance in the context of Kolmogorov and Gelfand widths. Our primary interest in this paper is the more general setting with $\varepsilon > 0$, for which additional factors are involved in controlling $R_\Phi(\varepsilon)$. In statistics, there is a literature on the case in which $\Phi$ is a sampling operator, which maps each function $f$ to a vector of $n$ samples, and the norm $\|\cdot\|_\Phi$ corresponds to the empirical $L^2$-norm defined by these samples. When these samples are chosen randomly, techniques from empirical process theory [4] can be used to relate the two terms. As discussed in the sequel, our results have consequences for this setting of random sampling.

As an example of a problem in which an upper bound on $R_\Phi$ is useful, let us consider a general linear inverse problem, in which the goal is to recover an estimate of the function $f^*$ based on the noisy observations

$$ y_i = [\Phi f^*]_i + w_i, \quad i = 1, \ldots, n, $$

where $\{w_i\}$ are zero-mean noise variables, and $f^* \in B_{\mathcal{H}}$ is unknown.
An estimate $\hat{f}$ can be obtained by solving a least-squares problem over the unit ball of the Hilbert space, that is, by solving the convex program

$$ \hat{f} := \arg\min_{f \in B_{\mathcal{H}}} \sum_{i=1}^{n} \big( y_i - [\Phi f]_i \big)^2 . $$

For such estimators, there are fairly standard techniques for deriving upper bounds on the $\Phi$-semi-norm of the deviation $\hat{f} - f^*$. Our results in this paper on $R_\Phi$ can then be used to translate this to a corresponding upper bound on the $L^2$-norm of the deviation $\hat{f} - f^*$, which is often a more natural measure of performance.

As an example where the dual quantity $T_\Phi$ might be helpful, consider the packing problem for a subset $\mathcal{D} \subset B_{\mathcal{H}}$ of the Hilbert ball. Let $M(\varepsilon; \mathcal{D}, \|\cdot\|_{L^2})$ be the $\varepsilon$-packing number of $\mathcal{D}$ in $\|\cdot\|_{L^2}$, i.e., the maximal number of functions $f_1, \ldots, f_M \in \mathcal{D}$ such that $\|f_i - f_j\|_{L^2} \ge \varepsilon$ for all $i \ne j$, $i, j = 1, \ldots, M$. Similarly, let $M(\varepsilon; \mathcal{D}, \|\cdot\|_\Phi)$ be the $\varepsilon$-packing number of $\mathcal{D}$ in the $\|\cdot\|_\Phi$ norm. Now, suppose that for some fixed $\varepsilon$, $T_\Phi(\varepsilon) > 0$. Then, if we have a collection of functions $\{f_1, \ldots, f_M\}$ which is an $\varepsilon$-packing of $\mathcal{D}$ in the $\|\cdot\|_{L^2}$ norm, then the same collection will be a $\sqrt{T_\Phi(\varepsilon)}$-packing of $\mathcal{D}$ in $\|\cdot\|_\Phi$. This implies the following useful relationship between packing numbers:

$$ M(\varepsilon; \mathcal{D}, \|\cdot\|_{L^2}) \le M\big( \sqrt{T_\Phi(\varepsilon)}; \mathcal{D}, \|\cdot\|_\Phi \big). $$

The remainder of this paper is organized as follows. We begin in Section 2 with background on the Hilbert space set-up, and provide various examples of the linear operators $\Phi$ to which our results apply. Section 3 contains the statement of our main result, and illustration of some of its consequences for different Hilbert spaces and linear operators. Finally, Section 4 is devoted to the proofs of our results.

Notation. For any positive integer $p$, we use $\mathbb{S}_+^p$ to denote the cone of $p \times p$ positive semidefinite matrices. For $A, B \in \mathbb{S}_+^p$, we write $A \succeq B$ or $B \preceq A$ to mean $A - B \in \mathbb{S}_+^p$.
For any square matrix $A$, let $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ denote its minimal and maximal eigenvalues, respectively. We will use both $\sqrt{A}$ and $A^{1/2}$ to denote the symmetric square root of $A \in \mathbb{S}_+^p$. We will use $\{x_k\} = \{x_k\}_{k=1}^{\infty}$ to denote a (countable) sequence of objects (e.g. real numbers and functions). Occasionally we might denote an $n$-vector as $\{x_1, \ldots, x_n\}$. The context will determine whether the elements between braces are ordered. The symbols $\ell^2 = \ell^2(\mathbb{N})$ are used to denote the Hilbert sequence space consisting of real-valued sequences equipped with the inner product $\langle \{x_k\}, \{y_k\} \rangle_{\ell^2} := \sum_{k=1}^{\infty} x_k y_k$. The corresponding norm is denoted as $\|\cdot\|_{\ell^2}$.

2. Background

We begin with some background on the class of Hilbert spaces of interest in this paper and then proceed to provide some examples of the sampling operators of interest.

2.1. Hilbert spaces

We consider a class of Hilbert function spaces contained within $L^2(\mathcal{X})$, and defined as follows. Let $\{\psi_k\}_{k=1}^{\infty}$ be an orthonormal sequence (not necessarily a basis) in $L^2(\mathcal{X})$ and let $\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \cdots > 0$ be a sequence of positive weights decreasing to zero. Given these two ingredients, we can consider the class of functions

$$ \mathcal{H} := \Big\{ f \in L^2(\mathbb{P}) \;\Big|\; f = \sum_{k=1}^{\infty} \sqrt{\sigma_k}\, \alpha_k \psi_k, \ \text{for some } \{\alpha_k\}_{k=1}^{\infty} \in \ell^2(\mathbb{N}) \Big\}, \tag{6} $$

where the series in (6) is assumed to converge in $L^2$. (The series converges since $\sum_{k=1}^{\infty} (\sqrt{\sigma_k}\, \alpha_k)^2 \le \sigma_1 \|\{\alpha_k\}\|_{\ell^2}^2 < \infty$.) We refer to the sequence $\{\alpha_k\}_{k=1}^{\infty} \in \ell^2$ as the representative of $f$. Note that this representation is unique because $\sigma_k$ is strictly positive for all $k \in \mathbb{N}$.

If $f$ and $g$ are two members of $\mathcal{H}$, say with associated representatives $\alpha = \{\alpha_k\}_{k=1}^{\infty}$ and $\beta = \{\beta_k\}_{k=1}^{\infty}$, then we can define the inner product

$$ \langle f, g \rangle_{\mathcal{H}} := \sum_{k=1}^{\infty} \alpha_k \beta_k = \langle \alpha, \beta \rangle_{\ell^2}. \tag{7} $$

With this choice of inner product, it can be verified that the space $\mathcal{H}$ is a Hilbert space. (In fact, $\mathcal{H}$ inherits all the required properties directly from $\ell^2$.) For future reference, we note that for two functions $f, g \in \mathcal{H}$ with associated representatives $\alpha, \beta \in \ell^2$, their $L^2$-based inner product is given by$^{3}$ $\langle f, g \rangle_{L^2} = \sum_{k=1}^{\infty} \sigma_k \alpha_k \beta_k$.

We note that each $\psi_k$ is in $\mathcal{H}$, as it is represented by a sequence with a single nonzero element, namely, the $k$-th element, which is equal to $\sigma_k^{-1/2}$. It follows from (7) that $\langle \sqrt{\sigma_k}\, \psi_k, \sqrt{\sigma_j}\, \psi_j \rangle_{\mathcal{H}} = \delta_{kj}$. That is, $\{\sqrt{\sigma_k}\, \psi_k\}$ is an orthonormal sequence in $\mathcal{H}$. Now, let $f \in \mathcal{H}$ be represented by $\alpha \in \ell^2$. We claim that the series in (6) also converges in $\mathcal{H}$ norm. In particular, $\sum_{k=1}^{N} \sqrt{\sigma_k}\, \alpha_k \psi_k$ is in $\mathcal{H}$, as it is represented by the sequence $\{\alpha_1, \ldots, \alpha_N, 0, 0, \ldots\} \in \ell^2$. It follows from (7) that $\big\| f - \sum_{k=1}^{N} \sqrt{\sigma_k}\, \alpha_k \psi_k \big\|_{\mathcal{H}}^2 = \sum_{k=N+1}^{\infty} \alpha_k^2$, which converges to 0 as $N \to \infty$. Thus, $\{\sqrt{\sigma_k}\, \psi_k\}$ is in fact an orthonormal basis for $\mathcal{H}$.

Footnote 3: In particular, for $f \in \mathcal{H}$, $\|f\|_{L^2} \le \sqrt{\sigma_1}\, \|f\|_{\mathcal{H}}$, which shows that the inclusion $\mathcal{H} \subset L^2$ is continuous.

We now turn to a special case of particular importance to us, namely the reproducing kernel Hilbert space (RKHS) of a continuous kernel. Consider a symmetric bivariate function $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, where $\mathcal{X} \subset \mathbb{R}^d$ is compact$^{4}$. Furthermore, assume $K$ to be positive semidefinite and continuous. Consider the integral operator $I_K$ mapping a function $f \in L^2$ to the function $I_K f := \int K(\cdot, y) f(y)\, d\mathbb{P}(y)$. As a consequence of Mercer's theorem [5, 6], $I_K$ is a compact operator from $L^2$ to $C(\mathcal{X})$, the space of continuous functions on $\mathcal{X}$ equipped with the uniform norm$^{5}$. Let $\{\sigma_k\}$ be the sequence of nonzero eigenvalues of $I_K$, which are positive, can be ordered in nonincreasing order and converge to zero.
Let $\{\psi_k\}$ be the corresponding eigenfunctions, which are continuous and can be taken to be orthonormal in $L^2$. With these ingredients, the space $\mathcal{H}$ defined in equation (6) is the RKHS of the kernel function $K$. This can be verified as follows.

As another consequence of Mercer's theorem, $K$ has the decomposition

$$ K(x, y) = \sum_{k=1}^{\infty} \sigma_k \psi_k(x) \psi_k(y), \tag{8} $$

where the convergence is absolute and uniform (in $x$ and $y$). In particular, for any fixed $y \in \mathcal{X}$, the sequence $\{\sqrt{\sigma_k}\, \psi_k(y)\}$ is in $\ell^2$. (In fact, $\sum_{k=1}^{\infty} (\sqrt{\sigma_k}\, \psi_k(y))^2 = K(y, y) < \infty$.) Hence, $K(\cdot, y)$ is in $\mathcal{H}$, as defined in (6), with representative $\{\sqrt{\sigma_k}\, \psi_k(y)\}$. Furthermore, it can be verified that the convergence in (6) can be taken to be also pointwise$^{6}$. To be more specific, for any $f \in \mathcal{H}$ with representative $\{\alpha_k\}_{k=1}^{\infty} \in \ell^2$, we have $f(y) = \sum_{k=1}^{\infty} \sqrt{\sigma_k}\, \alpha_k \psi_k(y)$ for all $y \in \mathcal{X}$. Consequently, by definition of the inner product (7), we have

$$ \langle f, K(\cdot, y) \rangle_{\mathcal{H}} = \sum_{k=1}^{\infty} \alpha_k \sqrt{\sigma_k}\, \psi_k(y) = f(y), $$

so that $K(\cdot, y)$ acts as the representer of evaluation. This argument shows that for any fixed $y \in \mathcal{X}$, the linear functional on $\mathcal{H}$ given by $f \mapsto f(y)$ is bounded, since we have

$$ |f(y)| = \big| \langle f, K(\cdot, y) \rangle_{\mathcal{H}} \big| \le \|f\|_{\mathcal{H}} \, \|K(\cdot, y)\|_{\mathcal{H}}, $$

hence $\mathcal{H}$ is indeed the RKHS of the kernel $K$.

Footnote 4: Also assume that $\mathbb{P}$ assigns positive mass to every open Borel subset of $\mathcal{X}$.

Footnote 5: In fact, $I_K$ is well defined over $L^1 \supset L^2$ and the conclusions about $I_K$ hold as an operator from $L^1$ to $C(\mathcal{X})$.

Footnote 6: The convergence is actually even stronger, namely it is absolute and uniform, as can be seen by noting that $\sum_{k=n+1}^{m} |\alpha_k \sqrt{\sigma_k}\, \psi_k(y)| \le \big( \sum_{k=n+1}^{m} \alpha_k^2 \big)^{1/2} \big( \sum_{k=n+1}^{m} \sigma_k \psi_k^2(y) \big)^{1/2} \le \big( \sum_{k=n+1}^{m} \alpha_k^2 \big)^{1/2} \max_{y \in \mathcal{X}} \sqrt{K(y, y)}$.
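The Mercer decomposition (8) can be checked numerically on a concrete kernel. The sketch below is only an illustration: it borrows the kernel $K(x, y) = \min(x, y)$ on $[0, 1]$ and its eigenpairs from Section 3.2.2, and the function names (`sigma`, `psi`, `mercer_partial_sum`), the evaluation points, and the truncation level of 2000 terms are choices made here, not part of the paper.

```python
import math

# Assumption: eigenpairs of K(x, y) = min(x, y) on [0, 1] under the uniform
# measure, as quoted in Section 3.2.2.
def sigma(k):
    return ((2 * k - 1) * math.pi / 2.0) ** -2

def psi(k, x):
    return math.sqrt(2.0) * math.sin((2 * k - 1) * math.pi * x / 2.0)

def mercer_partial_sum(x, y, terms):
    """Partial sum of the Mercer series (8), truncated after `terms` terms."""
    return sum(sigma(k) * psi(k, x) * psi(k, y) for k in range(1, terms + 1))

# Compare the truncated series with the kernel at a few (arbitrary) points.
points = [(0.2, 0.7), (0.5, 0.5), (1.0, 0.3)]
errors = [abs(mercer_partial_sum(x, y, 2000) - min(x, y)) for x, y in points]
print(errors)
```

Since the terms decay like $k^{-2}$, the truncation error after $N$ terms is of order $1/N$, so the printed errors should be small but not at machine precision.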
This fact plays an important role in the sequel, since some of the linear operators that we consider involve pointwise evaluation.

A comment regarding the scope: our general results hold for the basic setting introduced in equation (6). For those examples that involve pointwise evaluation, we assume the more refined case of the RKHS described above.

2.2. Linear operators, semi-norms and examples

Let $\Phi : \mathcal{H} \to \mathbb{R}^n$ be a continuous linear operator, with co-ordinates $[\Phi f]_i$ for $i = 1, 2, \ldots, n$. It defines the (semi)-inner product

$$ \langle f, g \rangle_\Phi := \langle \Phi f, \Phi g \rangle_{\mathbb{R}^n}, \tag{9} $$

which induces the semi-norm $\|\cdot\|_\Phi$. By the Riesz representation theorem, for each $i = 1, \ldots, n$, there is a function $\varphi_i \in \mathcal{H}$ such that $[\Phi f]_i = \langle \varphi_i, f \rangle_{\mathcal{H}}$ for any $f \in \mathcal{H}$.

Let us illustrate the preceding definitions with some examples.

Example 1 (Generalized Fourier truncation). Recall the orthonormal sequence $\{\psi_i\}_{i=1}^{\infty}$ underlying the Hilbert space. Consider the linear operator $T_{\psi_1^n} : \mathcal{H} \to \mathbb{R}^n$ with co-ordinates

$$ [T_{\psi_1^n} f]_i := \langle \psi_i, f \rangle_{L^2}, \quad \text{for } i = 1, 2, \ldots, n. \tag{10} $$

We refer to this operator as the (generalized) Fourier truncation operator, since it acts by truncating the (generalized) Fourier representation of $f$ to its first $n$ co-ordinates. More precisely, by construction, if $f = \sum_{k=1}^{\infty} \sqrt{\sigma_k}\, \alpha_k \psi_k$, then

$$ [\Phi f]_i = \sqrt{\sigma_i}\, \alpha_i, \quad \text{for } i = 1, 2, \ldots, n. \tag{11} $$

By definition of the Hilbert inner product (7), and since $\sqrt{\sigma_i}\, \psi_i$ has the $i$-th unit sequence as its representative, we have $\alpha_i = \langle \sqrt{\sigma_i}\, \psi_i, f \rangle_{\mathcal{H}}$, so that we can write $[\Phi f]_i = \langle \varphi_i, f \rangle_{\mathcal{H}}$, where $\varphi_i := \sigma_i \psi_i$. ♦

Example 2 (Domain sampling). A collection $x_1^n := \{x_1, \ldots, x_n\}$ of points in the domain $\mathcal{X}$ can be used to define the (scaled) sampling operator $S_{x_1^n} : \mathcal{H} \to \mathbb{R}^n$ via

$$ S_{x_1^n} f := n^{-1/2} \big( f(x_1), \ldots, f(x_n) \big), \quad \text{for } f \in \mathcal{H}. \tag{12} $$

As previously discussed, when $\mathcal{H}$ is a reproducing kernel Hilbert space (with kernel $K$), the (scaled) evaluation functional $f \mapsto n^{-1/2} f(x_i)$ is bounded, and its Riesz representation is given by the function $\varphi_i = n^{-1/2} K(\cdot, x_i)$. ♦

Example 3 (Weighted domain sampling). Consider the setting of the previous example. A slight variation on the sampling operator (12) is obtained by adding some weights to the samples:

$$ W_{x_1^n, w_1^n} f := n^{-1/2} \big( w_1 f(x_1), \ldots, w_n f(x_n) \big), \quad \text{for } f \in \mathcal{H}, \tag{13} $$

where $w_1^n = (w_1, \ldots, w_n)$ is chosen such that $\sum_{k=1}^{n} w_k^2 = 1$. Clearly, $\varphi_i = n^{-1/2} w_i K(\cdot, x_i)$.

[As an example of how this might arise, consider approximating $f(t)$ by $\sum_{k=1}^{n} f(x_k) G_n(t, x_k)$, where $\{G_n(\cdot, x_k)\}$ is a collection of functions in $L^2(\mathcal{X})$ such that $\langle G_n(\cdot, x_k), G_n(\cdot, x_j) \rangle_{L^2} = n^{-1} w_k^2 \delta_{kj}$. Proper choices of $\{G_n(\cdot, x_i)\}$ might produce better approximations to the $L^2$ norm in the cases where one insists on choosing elements of $x_1^n$ to be uniformly spaced, while $\mathbb{P}$ in (1) is not a uniform distribution. Another slightly different but closely related case is when one approximates $f^2(t)$ over $\mathcal{X} = [0, 1]$ by, say, $n^{-1} \sum_{k=1}^{n-1} f^2(x_k) W(n(t - x_k))$ for some function $W : [-1, 1] \to \mathbb{R}_+$ and $x_k = k/n$. Again, non-uniform weights are obtained when $\mathbb{P}$ is nonuniform.] ♦

3. Main result and some consequences

We now turn to the statement of our main result, and the development of some of its consequences for various models.

3.1. General upper bounds on $R_\Phi(\varepsilon)$

We now turn to upper bounds on $R_\Phi(\varepsilon)$, which was defined previously in (3). Our bounds are stated in terms of a real-valued function defined as follows: for matrices $D, M \in \mathbb{S}_+^p$,

$$ L(t, M, D) := \max \Big\{ \lambda_{\max} \big( D - t \sqrt{D}\, M \sqrt{D} \big),\ 0 \Big\}, \quad \text{for } t \ge 0. \tag{14} $$

Here $\sqrt{D}$ denotes the matrix square root, valid for positive semidefinite matrices.

The upper bounds on $R_\Phi(\varepsilon)$ involve principal submatrices of certain infinite-dimensional matrices, or equivalently linear operators on $\ell^2(\mathbb{N})$, that we define here. Let $\Psi$ be the infinite-dimensional matrix with entries

$$ [\Psi]_{jk} := \langle \psi_j, \psi_k \rangle_\Phi, \quad \text{for } j, k = 1, 2, \ldots, \tag{15} $$

and let $\Sigma = \operatorname{diag}\{\sigma_1, \sigma_2, \ldots\}$ be a diagonal operator. For any $p = 1, 2, \ldots$, we use $\Psi_p$ and $\Psi_{\tilde{p}}$ to denote the principal submatrices of $\Psi$ on rows and columns indexed by $\{1, 2, \ldots, p\}$ and $\{p+1, p+2, \ldots\}$, respectively. A similar notation will be used to denote submatrices of $\Sigma$.

Theorem 1. For all $\varepsilon \ge 0$, we have

$$ R_\Phi(\varepsilon) \le \inf_{p \in \mathbb{N}} \inf_{t \ge 0} \Big\{ L(t, \Psi_p, \Sigma_p) + t \Big( \varepsilon + \sqrt{ \lambda_{\max} \big( \Sigma_{\tilde{p}}^{1/2} \Psi_{\tilde{p}} \Sigma_{\tilde{p}}^{1/2} \big) } \Big)^2 + \sigma_{p+1} \Big\}. \tag{16} $$

Moreover, for any $p \in \mathbb{N}$ such that $\lambda_{\min}(\Psi_p) > 0$, we have

$$ R_\Phi(\varepsilon) \le \Big( 1 - \frac{\sigma_{p+1}}{\sigma_1} \Big) \frac{1}{\lambda_{\min}(\Psi_p)} \Big( \varepsilon + \sqrt{ \lambda_{\max} \big( \Sigma_{\tilde{p}}^{1/2} \Psi_{\tilde{p}} \Sigma_{\tilde{p}}^{1/2} \big) } \Big)^2 + \sigma_{p+1}. \tag{17} $$

Remark (a). These bounds cannot be improved in general. This is most easily seen in the special case $\varepsilon = 0$. Setting $p = n$, bound (17) implies that $R_\Phi(0) \le \sigma_{n+1}$ whenever $\Psi_n$ is strictly positive definite and $\Psi_{\tilde{n}} = 0$. This bound is sharp in a "minimax sense", meaning that equality holds if we take the infimum over all bounded linear operators $\Phi : \mathcal{H} \to \mathbb{R}^n$. In particular, it is straightforward to show that

$$ \inf_{\substack{\Phi : \mathcal{H} \to \mathbb{R}^n \\ \Phi \text{ surjective}}} R_\Phi(0) = \inf_{\substack{\Phi : \mathcal{H} \to \mathbb{R}^n \\ \Phi \text{ surjective}}} \ \sup_{f \in B_{\mathcal{H}}} \big\{ \|f\|_{L^2}^2 \;\big|\; \Phi f = 0 \big\} = \sigma_{n+1}, \tag{18} $$

and moreover, this infimum is in fact achieved by some linear operator. Such results are known from the general theory of $n$-widths for Hilbert spaces (e.g., see Chapter IV in Pinkus [2] and Chapter 3 of [7]).

In the more general setting of $\varepsilon > 0$, there are operators for which the bound (17) is met with equality.
As a simple illustration, recall the (generalized) Fourier truncation operator $T_{\psi_1^n}$ from Example 1. First, it can be verified that $\langle \psi_k, \psi_j \rangle_{T_{\psi_1^n}} = \delta_{jk}$ for $j, k \le n$ and $\langle \psi_k, \psi_j \rangle_{T_{\psi_1^n}} = 0$ otherwise. Taking $p = n$, we have $\Psi_n = I_n$, that is, the $n$-by-$n$ identity matrix, and $\Psi_{\tilde{n}} = 0$. Taking $p = n$ in (17), it follows that for $\varepsilon^2 \le \sigma_1$,

$$ R_{T_{\psi_1^n}}(\varepsilon) \le \Big( 1 - \frac{\sigma_{n+1}}{\sigma_1} \Big) \varepsilon^2 + \sigma_{n+1}. \tag{19} $$

As shown in Appendix E, the bound (19) in fact holds with equality. In other words, the bounds of Theorem 1 are tight in this case. Also, note that (19) implies $R_{T_{\psi_1^n}}(0) \le \sigma_{n+1}$, showing that the (generalized) Fourier truncation operator achieves the minimax bound of (18). Figure 1 provides a geometric interpretation of these results.

Figure 1: Geometry of Fourier truncation. The plot shows the set $\{ (\|f\|_{L^2}, \|f\|_\Phi) : \|f\|_{\mathcal{H}} \le 1 \} \subset \mathbb{R}^2$ for the case of the (generalized) Fourier truncation operator $T_{\psi_1^n}$.

Remark (b). In general, it might be difficult to obtain a bound on $\lambda_{\max}\big( \Sigma_{\tilde{p}}^{1/2} \Psi_{\tilde{p}} \Sigma_{\tilde{p}}^{1/2} \big)$, as it involves the infinite-dimensional matrix $\Psi_{\tilde{p}}$. One may obtain a simple (although not usually sharp) bound on this quantity by noting that for a positive semidefinite matrix, the maximal eigenvalue is bounded by the trace, that is,

$$ \lambda_{\max}\big( \Sigma_{\tilde{p}}^{1/2} \Psi_{\tilde{p}} \Sigma_{\tilde{p}}^{1/2} \big) \le \operatorname{tr}\big( \Sigma_{\tilde{p}}^{1/2} \Psi_{\tilde{p}} \Sigma_{\tilde{p}}^{1/2} \big) = \sum_{k > p} \sigma_k [\Psi]_{kk}. \tag{20} $$

Another relatively easy-to-handle upper bound is

$$ \lambda_{\max}\big( \Sigma_{\tilde{p}}^{1/2} \Psi_{\tilde{p}} \Sigma_{\tilde{p}}^{1/2} \big) \le \big|\big|\big| \Sigma_{\tilde{p}}^{1/2} \Psi_{\tilde{p}} \Sigma_{\tilde{p}}^{1/2} \big|\big|\big|_\infty = \sup_{k > p} \sum_{r > p} \sqrt{\sigma_k} \sqrt{\sigma_r}\, \big| [\Psi]_{kr} \big|. \tag{21} $$

These bounds can be used, in combination with appropriate block partitioning of $\Sigma_{\tilde{p}}^{1/2} \Psi_{\tilde{p}} \Sigma_{\tilde{p}}^{1/2}$, to provide sharp bounds on the maximal eigenvalue.
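The trace bound (20) and the row-sum bound (21) are easy to compare on a small example. The sketch below makes several assumptions that are not in the text: it replaces the infinite matrix by a finite 12-by-12 truncation, it borrows the eigenpairs and uniform sampling points of the Sobolev example in Section 3.2.2, and it estimates $\lambda_{\max}$ by plain power iteration. All function names are hypothetical.

```python
import math

# Assumption: eigenpairs of the kernel min(x, y), quoted from Section 3.2.2.
def sigma(k):
    return ((2 * k - 1) * math.pi / 2.0) ** -2

def psi(k, x):
    return math.sqrt(2.0) * math.sin((2 * k - 1) * math.pi * x / 2.0)

def gram_entry(k, r, n):
    """[Psi]_{kr} = <psi_k, psi_r>_Phi for uniform sampling x_l = l/n."""
    return sum(psi(k, l / n) * psi(r, l / n) for l in range(1, n + 1)) / n

def tail_block(p, n, size):
    """Finite truncation of Sigma^{1/2} Psi Sigma^{1/2} on indices p+1..p+size."""
    idx = range(p + 1, p + size + 1)
    return [[math.sqrt(sigma(k) * sigma(r)) * gram_entry(k, r, n) for r in idx]
            for k in idx]

def lambda_max_psd(M, iters=500):
    """Power-iteration estimate of the top eigenvalue of a symmetric PSD matrix.

    The returned Rayleigh quotient never exceeds the true lambda_max.
    """
    v = [1.0] * len(M)
    lam = 0.0
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in M]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        lam = sum(vi * sum(a * b for a, b in zip(row, v)) for vi, row in zip(v, M))
    return lam

M = tail_block(p=4, n=4, size=12)
lam = lambda_max_psd(M)
trace_bound = sum(M[i][i] for i in range(len(M)))                # bound (20)
row_sum_bound = max(sum(abs(x) for x in row) for row in M)       # bound (21)
print(lam, trace_bound, row_sum_bound)
```

Which of the two bounds is tighter depends on the structure of the matrix, which is one reason for combining them with block partitioning.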
Block partitioning is useful due to the following: for a positive semidefinite matrix $M = \begin{pmatrix} A_1 & C \\ C^T & A_2 \end{pmatrix}$, we have $\lambda_{\max}(M) \le \lambda_{\max}(A_1) + \lambda_{\max}(A_2)$. We leave the details on the application of these ideas to the examples in Section 3.2.

3.2. Some illustrative examples

Theorem 1 has a number of concrete consequences for different Hilbert spaces and linear operators, and we illustrate a few of them in the following subsections.

3.2.1. Random domain sampling

We begin by stating a corollary of Theorem 1 in application to random time sampling in a reproducing kernel Hilbert space (RKHS). Recall from equation (12) the time sampling operator $S_{x_1^n}$, and assume that the sample points $\{x_1, \ldots, x_n\}$ are drawn in an i.i.d. manner according to some distribution $\mathbb{P}$ on $\mathcal{X}$. Let us further assume that the eigenfunctions $\psi_k$, $k \ge 1$, are uniformly bounded$^{7}$ on $\mathcal{X}$, meaning that

$$ \sup_{k \ge 1} \sup_{x \in \mathcal{X}} |\psi_k(x)| \le C_\psi. \tag{22} $$

Finally, we assume that $\|\sigma\|_1 := \sum_{k=1}^{\infty} \sigma_k < \infty$, and that

$$ \sigma_{pk} \le C_\sigma \sigma_k \sigma_p, \quad \text{for some positive constant } C_\sigma \text{ and for all large } p, \tag{23} $$

$$ \sum_{k > p^m} \sigma_k \le \sigma_p, \quad \text{for some positive integer } m \text{ and for all large } p. \tag{24} $$

Let $m_\sigma$ be the smallest $m$ for which (24) holds. These conditions on $\{\sigma_k\}$ are satisfied, for example, for both a polynomial decay $\sigma_k = O(k^{-\alpha})$ with $\alpha > 1$ and an exponential decay $\sigma_k = O(\rho^k)$ with $\rho \in (0, 1)$. In particular, for the polynomial decay, using the tail bound (B.1) in Appendix B, we can take $m_\sigma = \lceil \frac{\alpha}{\alpha - 1} \rceil$ to satisfy (24). For the exponential decay, we can take $m_\sigma = 1$ for $\rho \in (0, \frac{1}{2})$ and $m_\sigma = 2$ for $\rho \in (\frac{1}{2}, 1)$ to satisfy (24). Define the function

$$ G_n(\varepsilon) := \frac{1}{\sqrt{n}} \sqrt{ \sum_{j=1}^{\infty} \min\{ \sigma_j, \varepsilon^2 \} }, \tag{25} $$

as well as the critical radius

$$ r_n := \inf \{ \varepsilon > 0 : G_n(\varepsilon) \le \varepsilon^2 \}. \tag{26} $$

Footnote 7: One can replace $\sup_{x \in \mathcal{X}}$ with the essential supremum with respect to $\mathbb{P}$.

Corollary 1.
Suppose that $r_n > 0$ and $64\, C_\psi^2\, m_\sigma\, r_n^2 \log(2 n r_n^2) \le 1$. Then for any $\varepsilon^2 \in [r_n^2, \sigma_1)$, we have

$$ \mathbb{P}\Big[ R_{S_{x_1^n}}(\varepsilon) > (\widetilde{C}_\psi + \widetilde{C}_\sigma)\, \varepsilon^2 \Big] \le 2 \exp\Big( -\frac{1}{64\, C_\psi^2\, r_n^2} \Big), \tag{27} $$

where $\widetilde{C}_\psi := 2(1 + C_\psi)^2$ and $\widetilde{C}_\sigma := 3(1 + C_\psi^{-1})\, C_\sigma \|\sigma\|_1 + 1$.

We provide the proof of this corollary in Appendix A. As a concrete example, consider a polynomial decay $\sigma_k = O(k^{-\alpha})$ for $\alpha > 1$, which satisfies the assumptions on $\{\sigma_k\}$. Using the tail bound (B.1) in Appendix B, one can verify that $r_n^2 = O(n^{-\alpha/(\alpha+1)})$. Note that, in this case,

$$ r_n^2 \log(2 n r_n^2) = O\big( n^{-\frac{\alpha}{\alpha+1}} \log n^{\frac{1}{\alpha+1}} \big) = O\big( n^{-\frac{\alpha}{\alpha+1}} \log n \big) \to 0, \quad n \to \infty. $$

Hence the conditions of Corollary 1 are met for sufficiently large $n$. It follows that for some constants $C_1$, $C_2$ and $C_3$, we have

$$ R_{S_{x_1^n}}\big( C_1 n^{-\frac{\alpha}{2(\alpha+1)}} \big) \le C_2\, n^{-\frac{\alpha}{\alpha+1}} $$

with probability at least $1 - 2 \exp\big( -C_3\, n^{\frac{\alpha}{\alpha+1}} \big)$ for sufficiently large $n$.

3.2.2. Sobolev kernel

Consider the kernel $K(x, y) = \min(x, y)$ defined on $\mathcal{X}^2$, where $\mathcal{X} = [0, 1]$. The corresponding RKHS is of Sobolev type and can be expressed as

$$ \big\{ f \in L^2(\mathcal{X}) \;\big|\; f \text{ is absolutely continuous},\ f(0) = 0 \text{ and } f' \in L^2(\mathcal{X}) \big\}. $$

Also consider a uniform domain sampling operator $S_{x_1^n}$, that is, that of (12) with $x_i = i/n$, $i \le n$, and let $\mathbb{P}$ be uniform (i.e., the Lebesgue measure restricted to $[0, 1]$).

This setting has the benefit that many interesting quantities can be computed explicitly, while also having some practical appeal. The following can be shown about the eigen-decomposition of the integral operator $I_K$ introduced in Section 2:

$$ \sigma_k = \Big[ \frac{(2k - 1)\pi}{2} \Big]^{-2}, \qquad \psi_k(x) = \sqrt{2} \sin\big( \sigma_k^{-1/2} x \big), \qquad k = 1, 2, \ldots. $$

In particular, the eigenvalues decay as $\sigma_k = O(k^{-2})$. To compute $\Psi$, we write

$$ [\Psi]_{kr} = \langle \psi_k, \psi_r \rangle_\Phi = \frac{1}{n} \sum_{\ell=1}^{n} \Big\{ \cos\frac{(k - r)\ell\pi}{n} - \cos\frac{(k + r - 1)\ell\pi}{n} \Big\}. \tag{28} $$

We note that $\Psi$ is periodic in $k$ and $r$ with period $2n$.
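Formula (28) and the stated periodicity can be checked directly from the definition $[\Psi]_{kr} = n^{-1} \sum_{\ell} \psi_k(\ell/n)\, \psi_r(\ell/n)$. The following is a minimal sketch; the function names and the choice $n = 5$ are illustrative assumptions, not taken from the paper.

```python
import math

def psi(k, x):
    # Eigenfunction of the Sobolev kernel: sqrt(2) * sin((2k - 1) * pi * x / 2).
    return math.sqrt(2.0) * math.sin((2 * k - 1) * math.pi * x / 2.0)

def psi_entry_direct(k, r, n):
    """[Psi]_{kr} computed from the definition with samples x_l = l/n."""
    return sum(psi(k, l / n) * psi(r, l / n) for l in range(1, n + 1)) / n

def psi_entry_formula(k, r, n):
    """[Psi]_{kr} computed from the cosine formula (28)."""
    return sum(math.cos((k - r) * l * math.pi / n)
               - math.cos((k + r - 1) * l * math.pi / n)
               for l in range(1, n + 1)) / n

n = 5  # illustrative sample size
indices = range(1, 4 * n + 1)

# Largest discrepancy between the definition and formula (28).
max_formula_gap = max(abs(psi_entry_direct(k, r, n) - psi_entry_formula(k, r, n))
                      for k in indices for r in indices)

# Largest change when the row index is shifted by one period 2n.
max_period_gap = max(abs(psi_entry_direct(k, r, n) - psi_entry_direct(k + 2 * n, r, n))
                     for k in indices for r in indices)

print(max_formula_gap, max_period_gap, psi_entry_direct(1, 1, n))
```

Both gaps should be at the level of floating-point rounding, and the diagonal entry printed last should equal $1 + 1/n$.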
It is easily verified that $n^{-1} \sum_{\ell=1}^{n} \cos(q \ell \pi / n)$ is equal to $-1/n$ for odd values of $q$ and zero for even values, other than $q = 0, \pm 2n, \pm 4n, \ldots$, for which it equals one. It follows that

$$ [\Psi]_{kr} = \begin{cases} 1 + \frac{1}{n} & \text{if } k - r = 0, \\ -1 - \frac{1}{n} & \text{if } k + r = 2n + 1, \\ \frac{1}{n} (-1)^{k - r} & \text{otherwise}, \end{cases} \tag{29} $$

for $1 \le k, r \le 2n$. Letting $\mathbf{1}_s \in \mathbb{R}^n$ be the vector with entries $(\mathbf{1}_s)_j = (-1)^{j+1}$, $j \le n$, we observe that $\Psi_n = I_n + \frac{1}{n} \mathbf{1}_s \mathbf{1}_s^T$. It follows that $\lambda_{\min}(\Psi_n) = 1$. It remains to bound the terms in (17) involving the infinite sub-block $\Psi_{\tilde{n}}$.

The $\Psi$ matrix of this example, given by (29), shares certain properties with the $\Psi$ obtained in other situations involving periodic eigenfunctions $\{\psi_k\}$. We abstract away these properties by introducing a class of periodic $\Psi$ matrices. We call $\Psi_{\tilde{n}}$ a sparse periodic matrix if each row (or column) is periodic and in each period only a vanishing fraction of elements are large. More precisely, $\Psi_{\tilde{n}}$ is sparse periodic if there exist positive integers $\gamma$ and $\eta$, and positive constants $c_1$ and $c_2$, all independent of $n$, such that each row of $\Psi_{\tilde{n}}$ is periodic with period $\gamma n$, and for any row $k$, there exists a subset of elements $S_k = \{\ell_1, \ldots, \ell_\eta\} \subset \{1, \ldots, \gamma n\}$ such that

$$ \big| [\Psi]_{k, n+r} \big| \le c_1, \quad r \in S_k, \tag{30a} $$

$$ \big| [\Psi]_{k, n+r} \big| \le c_2\, n^{-1}, \quad r \in \{1, \ldots, \gamma n\} \setminus S_k. \tag{30b} $$

The elements of $S_k$ could depend on $k$, but the cardinality of this set should be the constant $\eta$, independent of $k$ and $n$. Also, note that we are indexing rows and columns of $\Psi_{\tilde{n}}$ by $\{n+1, n+2, \ldots\}$; in particular, $k \ge n + 1$. For this class, we have the following lemma (Lemma 1), whose proof can be found in Appendix B.

Figure 2: Sparse periodic $\Psi$ matrices, panels (a) and (b). Display (a) is a plot of the $N$-by-$N$ leading principal submatrix of $\Psi$ for the Sobolev kernel $(s, t) \mapsto \min\{s, t\}$.
Here $n = 9$ and $N = 6n$; the period is $2n = 18$. Display (b) is the same plot for a Fourier-type kernel. The plots exhibit sparse periodic patterns as defined in Section 3.2.2.

Lemma 1. Assume $\Psi_{\tilde{n}}$ to be sparse periodic as defined above and $\sigma_k = O(k^{-\alpha})$, $\alpha \ge 2$. Then,

(a) for $\alpha > 2$, $\lambda_{\max}\big( \Sigma_{\tilde{n}}^{1/2} \Psi_{\tilde{n}} \Sigma_{\tilde{n}}^{1/2} \big) = O(n^{-\alpha})$, $n \to \infty$,

(b) for $\alpha = 2$, $\lambda_{\max}\big( \Sigma_{\tilde{n}}^{1/2} \Psi_{\tilde{n}} \Sigma_{\tilde{n}}^{1/2} \big) = O(n^{-2} \log n)$, $n \to \infty$.

In particular, (29) implies that $\Psi_{\tilde{n}}$ is sparse periodic with parameters $\gamma = 2$, $\eta = 2$, $c_1 = 2$ and $c_2 = 1$. Hence, part (b) of Lemma 1 applies. Now, we can use (17) with $p = n$ to obtain

$$ R_{S_{x_1^n}}(\varepsilon) \le 2 \varepsilon^2 + O\big( n^{-2} \log n \big), \tag{31} $$

where we have also used $(a + b)^2 \le 2a^2 + 2b^2$.

3.2.3. Fourier-type kernels

In this example, we consider an RKHS of functions on $\mathcal{X} = [0, 1] \subset \mathbb{R}$, generated by a Fourier-type kernel defined as $K(x, y) := \kappa(x - y)$, $x, y \in [0, 1]$, where

$$ \kappa(x) = \zeta_0 + \sum_{k=1}^{\infty} 2 \zeta_k \cos(2\pi k x), \quad x \in [-1, 1]. \tag{32} $$

We assume that $(\zeta_k)$ is an $\mathbb{R}_+$-valued nonincreasing sequence in $\ell^1$, i.e. $\sum_k \zeta_k < \infty$. Thus, the trigonometric series in (32) is absolutely (and uniformly) convergent. As for the operator $\Phi$, we consider the uniform time sampling operator $S_{x_1^n}$, as in the previous example. That is, the operator defined in (12) with $x_i = i/n$, $i \le n$. We take $\mathbb{P}$ to be uniform.

This setting again has the benefit of being simple enough to allow for explicit computations while also being practically important. One can argue that the eigen-decomposition of the kernel integral operator is given by

$$ \psi_1 = \psi_0^{(c)}, \quad \psi_{2k} = \psi_k^{(c)}, \quad \psi_{2k+1} = \psi_k^{(s)}, \quad k \ge 1, \tag{33} $$

$$ \sigma_1 = \zeta_0, \quad \sigma_{2k} = \zeta_k, \quad \sigma_{2k+1} = \zeta_k, \quad k \ge 1, \tag{34} $$

where $\psi_0^{(c)}(x) := 1$, $\psi_k^{(c)}(x) := \sqrt{2} \cos(2\pi k x)$ and $\psi_k^{(s)}(x) := \sqrt{2} \sin(2\pi k x)$ for $k \ge 1$.

For any integer $k$, let $((k))_n$ denote $k$ modulo $n$.
Also, let $k \mapsto \delta_k$ be the function defined over the integers which is 1 at $k = 0$ and zero elsewhere. Let $\iota := \sqrt{-1}$. Using the identity $n^{-1} \sum_{\ell=1}^{n} \exp(\iota 2\pi k \ell / n) = \delta_{((k))_n}$, one obtains the following:

$$ \langle \psi_k^{(c)}, \psi_j^{(c)} \rangle_\Phi = \big[ \delta_{((k-j))_n} + \delta_{((k+j))_n} \big] \Big( \frac{1}{\sqrt{2}} \Big)^{\delta_k + \delta_j}, \tag{35a} $$

$$ \langle \psi_k^{(s)}, \psi_j^{(s)} \rangle_\Phi = \delta_{((k-j))_n} - \delta_{((k+j))_n}, \tag{35b} $$

$$ \langle \psi_k^{(c)}, \psi_j^{(s)} \rangle_\Phi = 0, \quad \text{valid for all } j, k \ge 0. \tag{35c} $$

It follows that $\Psi_n = I_n$ if $n$ is odd and $\Psi_n = \operatorname{diag}\{1, 1, \ldots, 1, 2\}$ if $n$ is even. In particular, $\lambda_{\min}(\Psi_n) = 1$ for all $n \ge 1$. It is also clear that the principal submatrix of $\Psi$ on indices $\{2, 3, \ldots\}$ has periodic rows and columns with period $2n$. It follows that $\Psi_{\tilde{n}}$ is sparse periodic as defined in Section 3.2.2 with parameters $\gamma = 2$, $\eta = 2$, $c_1 = 2$ and $c_2 = 0$.

Suppose for example that the eigenvalues decay polynomially, say as $\zeta_k = O(k^{-\alpha})$ for $\alpha > 2$. Then, applying (17) with $p = n$, in combination with Lemma 1 part (a), we get

$$ R_{S_{x_1^n}}(\varepsilon) \le 2 \varepsilon^2 + O(n^{-\alpha}). \tag{36} $$

As another example, consider the exponential decay $\zeta_k = \rho^k$, $k \ge 1$, for some $\rho \in (0, 1)$, which corresponds to the Poisson kernel. In this case, the tail sum of $\{\sigma_k\}$ decays as the sequence itself, namely, $\sum_{k > n} \sigma_k \le 2 \sum_{k > n} \rho^k = \frac{2\rho}{1 - \rho} \rho^n$. Hence, we can simply use the trace bound (20) together with (17) to obtain

$$ R_{S_{x_1^n}}(\varepsilon) \le 2 \varepsilon^2 + O(\rho^n). \tag{37} $$

4. Proof of Theorem 1

We now turn to the proof of our main theorem. Recall from Section 2.1 the correspondence between any $f \in \mathcal{H}$ and a sequence $\alpha \in \ell^2$; also, recall the diagonal operator $\Sigma : \ell^2 \to \ell^2$ defined by the matrix $\operatorname{diag}\{\sigma_1, \sigma_2, \ldots\}$. Using the definition (15) of the $\Psi$ matrix, we have

$$ \|f\|_\Phi^2 = \langle \alpha, \Sigma^{1/2} \Psi \Sigma^{1/2} \alpha \rangle_{\ell^2}. $$

By definition (6) of the Hilbert space $\mathcal{H}$, we have $\|f\|_{\mathcal{H}}^2 = \sum_{k=1}^{\infty} \alpha_k^2$ and $\|f\|_{L^2}^2 = \sum_k \sigma_k \alpha_k^2$.
Letting $B_{\ell^2} = \{ \alpha \in \ell^2 \mid \|\alpha\|_{\ell^2} \le 1 \}$ be the unit ball in $\ell^2$, we conclude that $R_\Phi$ can be written as

$$ R_\Phi(\varepsilon) = \sup_{\alpha \in B_{\ell^2}} \big\{ Q_2(\alpha) \;\big|\; Q_\Phi(\alpha) \le \varepsilon^2 \big\}, \tag{38} $$

where we have defined the quadratic functionals

$$ Q_2(\alpha) := \langle \alpha, \Sigma \alpha \rangle_{\ell^2}, \quad \text{and} \quad Q_\Phi(\alpha) := \langle \alpha, \Sigma^{1/2} \Psi \Sigma^{1/2} \alpha \rangle_{\ell^2}. \tag{39} $$

Also let us define the symmetric bilinear form

$$ B_\Phi(\alpha, \beta) := \langle \alpha, \Sigma^{1/2} \Psi \Sigma^{1/2} \beta \rangle_{\ell^2}, \quad \alpha, \beta \in \ell^2, \tag{40} $$

whose diagonal is $B_\Phi(\alpha, \alpha) = Q_\Phi(\alpha)$.

We now upper bound $R_\Phi(\varepsilon)$ using a truncation argument. Define the set

$$ \mathcal{C} := \{ \alpha \in B_{\ell^2} \mid Q_\Phi(\alpha) \le \varepsilon^2 \}, \tag{41} $$

corresponding to the feasible set for the optimization problem (38). For each integer $p = 1, 2, \ldots$, consider the following truncated sequence spaces:

$$ \mathcal{T}_p := \{ \alpha \in \ell^2 \mid \alpha_i = 0 \text{ for all } i > p \}, \quad \text{and} \quad \mathcal{T}_p^\perp := \{ \alpha \in \ell^2 \mid \alpha_i = 0 \text{ for all } i = 1, 2, \ldots, p \}. $$

Note that $\ell^2$ is the direct sum of $\mathcal{T}_p$ and $\mathcal{T}_p^\perp$. Consequently, any fixed $\alpha \in \mathcal{C}$ can be decomposed as $\alpha = \xi + \gamma$ for some (unique) $\xi \in \mathcal{T}_p$ and $\gamma \in \mathcal{T}_p^\perp$. Since $\Sigma$ is a diagonal operator, we have

$$ Q_2(\alpha) = Q_2(\xi) + Q_2(\gamma). $$

Moreover, since any $\alpha \in \mathcal{C}$ is feasible for the optimization problem (38), we have

$$ Q_\Phi(\alpha) = Q_\Phi(\xi) + 2 B_\Phi(\xi, \gamma) + Q_\Phi(\gamma) \le \varepsilon^2. \tag{42} $$

Note that since $\gamma \in \mathcal{T}_p^\perp$, it can be written as $\gamma = (0_p, c)$, where $0_p$ is a vector of $p$ zeroes, and $c = (c_1, c_2, \ldots) \in \ell^2$. Similarly, we can write $\xi = (x, 0)$ where $x \in \mathbb{R}^p$. Then, each of the terms $Q_\Phi(\xi)$, $B_\Phi(\xi, \gamma)$, $Q_\Phi(\gamma)$ can be expressed in terms of block partitions of $\Sigma^{1/2} \Psi \Sigma^{1/2}$. For example,

$$ Q_\Phi(\xi) = \langle x, A x \rangle_{\mathbb{R}^p}, \qquad Q_\Phi(\gamma) = \langle c, D c \rangle_{\ell^2}, \tag{43} $$

where $A := \Sigma_p^{1/2} \Psi_p \Sigma_p^{1/2}$ and $D := \Sigma_{\tilde{p}}^{1/2} \Psi_{\tilde{p}} \Sigma_{\tilde{p}}^{1/2}$, in correspondence with the block partitioning notation of Appendix F. We now apply inequality (F.2) derived in Appendix F.
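Fix some $\rho^2 \in (0, 1)$ and take

$$\kappa^2 := \rho^2 \lambda_{\max}\big( \Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2} \big), \tag{44}$$

so that condition (F.5) is satisfied. Then, (F.2) implies

$$Q_\Phi(\xi) + 2 B_\Phi(\xi, \gamma) + Q_\Phi(\gamma) \ge \rho^2 Q_\Phi(\xi) - \frac{\kappa^2}{1 - \rho^2} \|\gamma\|_2^2. \tag{45}$$

Combining (42) and (45), we obtain

$$Q_\Phi(\xi) \le \frac{\varepsilon^2}{\rho^2} + \frac{\lambda_{\max}\big( \Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2} \big)}{1 - \rho^2} \|\gamma\|_2^2. \tag{46}$$

We further note that $\|\gamma\|_2^2 \le \|\gamma\|_2^2 + \|\xi\|_2^2 = \|\alpha\|_2^2 \le 1$. It follows that

$$Q_\Phi(\xi) \le \tilde\varepsilon^2, \quad \text{where} \quad \tilde\varepsilon^2 := \frac{\varepsilon^2}{\rho^2} + \frac{\lambda_{\max}\big( \Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2} \big)}{1 - \rho^2}. \tag{47}$$

Let us define

$$\tilde{\mathcal{C}} := \{ \xi \in B_{\ell_2} \cap \mathcal{T}_p \mid Q_\Phi(\xi) \le \tilde\varepsilon^2 \}. \tag{48}$$

Then, our arguments so far show that for $\alpha \in \mathcal{C}$,

$$Q_2(\alpha) = Q_2(\xi) + Q_2(\gamma) \le \underbrace{\sup_{\xi \in \tilde{\mathcal{C}}} Q_2(\xi)}_{S_p} + \underbrace{\sup_{\gamma \in B_{\ell_2} \cap \mathcal{T}_p^\perp} Q_2(\gamma)}_{S_p^\perp}. \tag{49}$$

Taking the supremum over $\alpha \in \mathcal{C}$ yields the upper bound

$$R_\Phi(\varepsilon) \le S_p + S_p^\perp.$$

It remains to bound each of the two terms on the right-hand side. Beginning with the term $S_p^\perp$ and recalling the decomposition $\gamma = (0_p, c)$, we have $Q_2(\gamma) = \sum_{k=1}^{\infty} \sigma_{k+p} c_k^2$, from which it follows that

$$S_p^\perp = \sup \Big\{ \sum_{k=1}^{\infty} \sigma_{k+p} c_k^2 \;\Big|\; \sum_{k=1}^{\infty} c_k^2 \le 1 \Big\} = \sigma_{p+1},$$

since $\{\sigma_k\}_{k=1}^{\infty}$ is a nonincreasing sequence by assumption.

We now control the term $S_p$. Recalling the decomposition $\xi = (x, 0)$ where $x \in \mathbb{R}^p$, we have

$$\begin{aligned}
S_p = \sup_{\xi \in \tilde{\mathcal{C}}} Q_2(\xi) &= \sup \big\{ \langle x, \Sigma_p x \rangle : \langle x, x \rangle \le 1, \; \langle x, \Sigma_p^{1/2} \Psi_p \Sigma_p^{1/2} x \rangle \le \tilde\varepsilon^2 \big\} \\
&= \sup_{\langle x, x \rangle \le 1} \inf_{t \ge 0} \Big\{ \langle x, \Sigma_p x \rangle + t \big( \tilde\varepsilon^2 - \langle x, \Sigma_p^{1/2} \Psi_p \Sigma_p^{1/2} x \rangle \big) \Big\} \\
&\overset{(a)}{\le} \inf_{t \ge 0} \Big\{ \sup_{\langle x, x \rangle \le 1} \langle x, \Sigma_p^{1/2} (I_p - t \Psi_p) \Sigma_p^{1/2} x \rangle + t \tilde\varepsilon^2 \Big\},
\end{aligned}$$

where inequality (a) follows by Lagrange (weak) duality. It is not hard to see that for any symmetric matrix $M$, one has

$$\sup \big\{ \langle x, M x \rangle : \langle x, x \rangle \le 1 \big\} = \max \big\{ 0, \lambda_{\max}(M) \big\}.$$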
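The last identity is easy to sanity-check in two dimensions. The following sketch, with hypothetical helper names, compares a scan of the unit disk against the closed-form largest eigenvalue of a symmetric 2-by-2 matrix; it is a numerical illustration under these assumptions, not part of the proof.

```python
import math

def lambda_max_2x2(a, b, d):
    """Largest eigenvalue of the symmetric matrix [[a, b], [b, d]]."""
    return 0.5 * (a + d) + math.sqrt(0.25 * (a - d) ** 2 + b * b)

def sup_quadratic_on_ball(a, b, d, n_angles=20000):
    """Approximate sup{<x, Mx> : <x, x> <= 1} for M = [[a, b], [b, d]].

    On a ray x = t * (cos th, sin th), 0 <= t <= 1, the form equals
    t^2 * q(th), so the supremum over the disk is max(0, max_th q(th)).
    """
    best = 0.0  # value attained at x = 0
    for i in range(n_angles):
        th = math.pi * i / n_angles  # q(th) has period pi
        c, s = math.cos(th), math.sin(th)
        best = max(best, a * c * c + 2 * b * c * s + d * s * s)
    return best

# Indefinite example: the supremum equals the largest eigenvalue.
print(sup_quadratic_on_ball(2.0, 1.0, -1.0), lambda_max_2x2(2.0, 1.0, -1.0))
# Negative definite example: the supremum is 0, attained at x = 0.
print(sup_quadratic_on_ball(-1.0, 0.5, -2.0), lambda_max_2x2(-1.0, 0.5, -2.0))
```

The inequality constraint $\langle x, x \rangle \le 1$ (rather than equality) is what produces the $\max\{0, \cdot\}$: when $M$ is negative definite, the best choice is $x = 0$.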
Putting the pieces together and optimizing over $\rho^2$, noting that

$$\inf_{r \in (0, 1)} \Big\{ \frac{a}{r} + \frac{b}{1 - r} \Big\} = \big( \sqrt{a} + \sqrt{b} \big)^2$$

for any $a, b > 0$, completes the proof of the bound (16).

We now prove bound (17), using the same decomposition and notation established above, but writing the upper bound on $Q_2(\alpha)$ in a slightly different form from (49). In particular, the argument leading to (49) also shows that

$$R_\Phi(\varepsilon) \le \sup_{\xi \in \mathcal{T}_p, \, \gamma \in \mathcal{T}_p^\perp} \big\{ Q_2(\xi) + Q_2(\gamma) \mid \xi + \gamma \in B_{\ell_2}, \; Q_\Phi(\xi) \le \tilde\varepsilon^2 \big\}. \tag{50}$$

Recalling the expression (39) for $Q_\Phi(\xi)$ and noting that $\Psi_p \succeq \lambda_{\min}(\Psi_p) I_p$ implies $A = \Sigma_p^{1/2} \Psi_p \Sigma_p^{1/2} \succeq \lambda_{\min}(\Psi_p) \Sigma_p$, we have

$$Q_\Phi(\xi) \ge \lambda_{\min}(\Psi_p) \, Q_2(\xi). \tag{51}$$

Now, since we are assuming $\lambda_{\min}(\Psi_p) > 0$, we have

$$R_\Phi(\varepsilon) \le \sup_{\xi \in \mathcal{T}_p, \, \gamma \in \mathcal{T}_p^\perp} \Big\{ Q_2(\xi) + Q_2(\gamma) \;\Big|\; \xi + \gamma \in B_{\ell_2}, \; Q_2(\xi) \le \frac{\tilde\varepsilon^2}{\lambda_{\min}(\Psi_p)} \Big\}. \tag{52}$$

The right-hand side above is an instance of the Fourier truncation problem with $\varepsilon^2$ replaced by $\tilde\varepsilon^2 / \lambda_{\min}(\Psi_p)$. That problem is worked out in detail in Appendix E. In particular, applying equation (E.1) in Appendix E with $\varepsilon^2$ changed to $\tilde\varepsilon^2 / \lambda_{\min}(\Psi_p)$ completes the proof of (17). Figure 3 provides a graphical representation of the geometry of the proof.

5. Conclusion

We considered the problem of bounding the (squared) $L^2$ norm of functions in a Hilbert unit ball, based on restrictions on an operator-induced norm acting as a surrogate for the $L^2$ norm. In particular, given that $f \in B_{\mathcal{H}}$ and $\|f\|_\Phi^2 \le \varepsilon^2$, our results enable us to obtain, by estimating norms of certain finite and infinite dimensional matrices, inequalities of the form

$$\|f\|_{L^2}^2 \le c_1 \varepsilon^2 + h_{\Phi, \mathcal{H}}(\sigma_n),$$

where $\{\sigma_n\}$ are the eigenvalues of the operator embedding $\mathcal{H}$ in $L^2$, $h_{\Phi, \mathcal{H}}(\cdot)$ is an increasing function (depending on $\Phi$ and $\mathcal{H}$) and $c_1 \ge 1$ is some constant.
We considered examples of operators $\Phi$ (uniform time sampling and Fourier truncation) and Hilbert spaces $\mathcal{H}$ (Sobolev, Fourier-type RKHSs) and showed that it is possible to obtain optimal scaling $h_{\Phi, \mathcal{H}}(\sigma_n) = O(\sigma_n)$ in most of those cases. We also considered random time sampling, under polynomial eigen-decay $\sigma_n = O(n^{-\alpha})$, and effectively showed that $h_{\Phi, \mathcal{H}}(\sigma_n) = O(n^{-\alpha/(\alpha+1)})$ (for $\varepsilon$ small enough), with high probability as $n \to \infty$. This last result complements those on related quantities obtained by techniques from empirical process theory, and we conjecture it to be sharp.

Figure 3: Geometry of the proof of (17). Display (a) is a plot of the set $\mathcal{Q} := \{ (Q_2(\alpha), Q_\Phi(\alpha)) : \|\alpha\|_{\ell_2} = 1 \} \subset \mathbb{R}^2$. This is a convex set as a consequence of the Hausdorff-Toeplitz theorem on convexity of the numerical range and preservation of convexity under projections. Display (b) shows the set $\tilde{\mathcal{Q}} := \operatorname{conv}(0, \mathcal{Q})$, i.e., the convex hull of $\{0\} \cup \mathcal{Q}$. Observe that $R_\Phi(\varepsilon) = \sup\{ x : (x, y) \in \tilde{\mathcal{Q}}, \; y \le \varepsilon^2 \}$. For any fixed $r \in (0, 1)$, the bound of (17) is a piecewise linear approximation to one side of $\tilde{\mathcal{Q}}$ as shown in Display (b).

Acknowledgements

AA and MJW were partially supported by NSF Grant CAREER-CCF-0545862 and AFOSR Grant 09NL184.

Appendix A. Analysis of random time sampling

This section is devoted to the proof of Corollary 1 on random time sampling in reproducing kernel Hilbert spaces. The proof is based on an auxiliary result, which we begin by stating. Fix some positive integer $m$ and define

$$\nu(\varepsilon) = \nu(\varepsilon; m) := \inf \Big\{ p : \sum_{k > p^m} \sigma_k \le \varepsilon^2 \Big\}. \tag{A.1}$$

With this notation, we have

Lemma 2. Assume $\varepsilon^2 < \sigma_1$ and $32 C_\psi^2 \, m \, \nu(\varepsilon) \log \nu(\varepsilon) \le n$. Then,

$$\mathbb{P}\Big\{ R_{S_{x_1^n}}(\varepsilon) > \tilde C_\psi \varepsilon^2 + \tilde C_\sigma \sigma_{\nu(\varepsilon)} \Big\} \le 2 \exp\Big( -\frac{1}{32 C_\psi^2} \frac{n}{\nu(\varepsilon)} \Big). \tag{A.2}$$

We prove this claim in Appendix A.2 below.

Appendix A.1. Proof of Corollary 1

To apply the lemma, recall that we assume that there exists $m$ such that for all (large) $p$, one has

$$\sum_{k > p^m} \sigma_k \le \sigma_p, \tag{A.3}$$

and we let $m_\sigma$ be the smallest such $m$. We define

$$\mu(\varepsilon) := \inf \{ p : \sigma_p \le \varepsilon^2 \}, \tag{A.4}$$

and note that by (A.3), we have $\nu(\varepsilon; m_\sigma) \le \mu(\varepsilon)$. Then, Lemma 2 states that as long as $\varepsilon^2 < \sigma_1$ and $32 C_\psi^2 \, m_\sigma \, \mu(\varepsilon) \log \mu(\varepsilon) \le n$, we have

$$\mathbb{P}\Big\{ R_{S_{x_1^n}}(\varepsilon) > (\tilde C_\psi + \tilde C_\sigma) \varepsilon^2 \Big\} \le 2 \exp\Big( -\frac{1}{32 C_\psi^2} \frac{n}{\mu(\varepsilon)} \Big). \tag{A.5}$$

Now by the definition of $\mu(\varepsilon)$, we have $\sigma_j > \varepsilon^2$ for $j < \mu(\varepsilon)$, and hence

$$G_n^2(\varepsilon) \ge \frac{1}{n} \sum_{j < \mu(\varepsilon)} \min\{ \sigma_j, \varepsilon^2 \} = \frac{\mu(\varepsilon) - 1}{n} \varepsilon^2 \ge \frac{\mu(\varepsilon)}{2n} \varepsilon^2,$$

since $\mu(\varepsilon) \ge 2$ when $\varepsilon^2 < \sigma_1$. One can argue that $\varepsilon \mapsto G_n(\varepsilon) / \varepsilon$ is nonincreasing. It follows from definition (26) that for $\varepsilon \ge r_n$, we have

$$\mu(\varepsilon) \le 2n \Big( \frac{G_n(\varepsilon)}{\varepsilon} \Big)^2 \le 2n \Big( \frac{G_n(r_n)}{r_n} \Big)^2 \le 2 n r_n^2,$$

which completes the proof of Corollary 1.

Appendix A.2. Proof of Lemma 2

For $\xi \in \mathbb{R}^p$, let $\xi \otimes \xi$ be the rank-one operator on $\mathbb{R}^p$ given by $\eta \mapsto \langle \xi, \eta \rangle_2 \, \xi$. For an operator $A$ on $\mathbb{R}^p$, let $|||A|||_2$ denote its usual operator norm, $|||A|||_2 := \sup_{\|x\|_2 \le 1} \|Ax\|_2$. Recall that for a symmetric (i.e., real self-adjoint) operator $A$ on $\mathbb{R}^p$, $|||A|||_2 = \sup\{ |\lambda| : \lambda \text{ an eigenvalue of } A \}$. It follows that $|||A|||_2 \le \alpha$ is equivalent to $-\alpha I_p \preceq A \preceq \alpha I_p$.

Our approach is to first show that $|||\Psi_p - I_p|||_2 \le \frac{1}{2}$ for some properly chosen $p$ with high probability. It then follows that $\lambda_{\min}(\Psi_p) \ge \frac{1}{2}$ and we can use bound (17) for that value of $p$. Then, we need to control $\lambda_{\max}\big( \Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2} \big)$. To do this, we further partition $\Psi_{\tilde p}$ into blocks. In order to have a consistent notation, we look at the whole matrix $\Psi$ and let $\Psi^{(k)}$ be the principal submatrix indexed by $\{ (k-1)p + 1, \dots, (k-1)p + p \}$, for $k = 1, 2, \dots, p^{m-1}$.
Throughout the proof, $m$ is assumed to be a fixed positive integer. Also, let $\Psi^{(\infty)}$ be the principal submatrix of $\Psi$ indexed by $\{ p^m + 1, p^m + 2, \dots \}$. This provides a full partitioning of $\Psi$ for which $\Psi^{(1)}, \dots, \Psi^{(p^{m-1})}$ and $\Psi^{(\infty)}$ are the diagonal blocks, the first $p^{m-1}$ of which are $p$-by-$p$ matrices and the last an infinite matrix. To connect with our previous notation, we note that $\Psi^{(1)} = \Psi_p$ and that $\Psi^{(2)}, \dots, \Psi^{(p^{m-1})}, \Psi^{(\infty)}$ are diagonal blocks of $\Psi_{\tilde p}$. Let us also partition the $\Sigma$ matrix and name its diagonal blocks similarly.

We will argue that, in fact, we have $|||\Psi^{(k)} - I_p|||_2 \le \frac{1}{2}$ for all $k = 1, \dots, p^{m-1}$, with high probability. Let $\mathcal{A}_p$ denote the event on which this claim holds. In particular, on event $\mathcal{A}_p$, we have $\Psi^{(k)} \preceq \frac{3}{2} I_p$ for $k = 2, \dots, p^{m-1}$; hence, we can write

$$\begin{aligned}
\lambda_{\max}\big( \Sigma_{\tilde p}^{1/2} \Psi_{\tilde p} \Sigma_{\tilde p}^{1/2} \big)
&\le \sum_{k=2}^{p^{m-1}} \lambda_{\max}\Big( \sqrt{\Sigma^{(k)}} \, \Psi^{(k)} \sqrt{\Sigma^{(k)}} \Big) + \lambda_{\max}\Big( \sqrt{\Sigma^{(\infty)}} \, \Psi^{(\infty)} \sqrt{\Sigma^{(\infty)}} \Big) \\
&\le \frac{3}{2} \sum_{k=2}^{p^{m-1}} \lambda_{\max}\big( \Sigma^{(k)} \big) + \operatorname{tr}\Big( \sqrt{\Sigma^{(\infty)}} \, \Psi^{(\infty)} \sqrt{\Sigma^{(\infty)}} \Big) \\
&= \frac{3}{2} \sum_{k=2}^{p^{m-1}} \sigma_{(k-1)p + 1} + \sum_{k > p^m} \sigma_k [\Psi]_{kk}.
\end{aligned} \tag{A.6}$$

Using assumptions (23) on the sequence $\{\sigma_k\}$, the first sum can be bounded as

$$\sum_{k=2}^{p^{m-1}} \sigma_{(k-1)p + 1} \le \sum_{k=2}^{p^{m-1}} \sigma_{(k-1)p} \le \sum_{k=2}^{p^{m-1}} C_\sigma \, \sigma_{k-1} \sigma_p \le C_\sigma \|\sigma\|_1 \sigma_p.$$

Using the uniform boundedness assumption (A.1), we have $[\Psi]_{kk} = n^{-1} \sum_{i=1}^{n} \psi_k^2(x_i) \le C_\psi^2$. Hence the second sum in (A.6) is bounded above by $C_\psi^2 \sum_{k > p^m} \sigma_k$. We can now apply Theorem 1. Assume for the moment that $\varepsilon^2 \ge \sum_{k > p^m} \sigma_k$ so that the right-hand side of (A.6) is bounded above by $\frac{3}{2} C_\sigma \|\sigma\|_1 \sigma_p + C_\psi^2 \varepsilon^2$. Applying bound (17), on event $\mathcal{A}_p$, with$^8$ $r = (1 + C_\psi)^{-1}$, we get

$$\begin{aligned}
R_{S_{x_1^n}}(\varepsilon^2)
&\le 2 \Big\{ r^{-1} \varepsilon^2 + (1 - r)^{-1} \Big( \frac{3}{2} C_\sigma \|\sigma\|_1 \sigma_p + C_\psi^2 \varepsilon^2 \Big) \Big\} + \sigma_{p+1} \\
&= 2 (1 + C_\psi)^2 \varepsilon^2 + 3 (1 + C_\psi^{-1}) C_\sigma \|\sigma\|_1 \sigma_p + \sigma_{p+1} \\
&\le \tilde C_\psi \varepsilon^2 + \tilde C_\sigma \sigma_p,
\end{aligned}$$

where $\tilde C_\psi := 2 (1 + C_\psi)^2$ and $\tilde C_\sigma := 3 (1 + C_\psi^{-1}) C_\sigma \|\sigma\|_1 + 1$. To summarize, we have shown the following:

$$\text{Event } \mathcal{A}_p \text{ and } \varepsilon^2 \ge \sum_{k > p^m} \sigma_k \;\Longrightarrow\; R_{S_{x_1^n}}(\varepsilon^2) \le \tilde C_\psi \varepsilon^2 + \tilde C_\sigma \sigma_p. \tag{A.7}$$

$^8$ We are using the alternate form of the bound based on $(\sqrt{A} + \sqrt{B})^2 = \inf_{r \in (0, 1)} \big\{ A r^{-1} + B (1 - r)^{-1} \big\}$.

It remains to control the probability of $\mathcal{A}_p := \bigcap_{k=1}^{p^{m-1}} \big\{ |||\Psi^{(k)} - I_p|||_2 \le \frac{1}{2} \big\}$. We start with the deviation bound on $\Psi^{(1)} - I_p$, and then extend by the union bound. We will use the following lemma which follows, for example, from the Ahlswede-Winter bound [8], or from [9]. (See also [10, 11, 12].)

Lemma 3. Let $\xi_1, \dots, \xi_n$ be i.i.d. random vectors in $\mathbb{R}^p$ with $\mathbb{E}\, \xi_1 \otimes \xi_1 = I_p$ and $\|\xi_1\|_2 \le C_p$ almost surely for some constant $C_p$. Then, for $\delta \in (0, 1)$,

$$\mathbb{P}\Big\{ \Big|\Big|\Big| n^{-1} \sum_{i=1}^{n} \xi_i \otimes \xi_i - I_p \Big|\Big|\Big|_2 > \delta \Big\} \le p \exp\Big( -\frac{n \delta^2}{4 C_p^2} \Big). \tag{A.8}$$

Recall that for the time sampling operator, $[\Phi \psi_k]_i = \frac{1}{\sqrt{n}} \psi_k(x_i)$ so that from (15),

$$\Psi_{k\ell} = \frac{1}{n} \sum_{i=1}^{n} \psi_k(x_i) \psi_\ell(x_i).$$

Let $\xi_i := (\psi_k(x_i), \, 1 \le k \le p) \in \mathbb{R}^p$ for $i = 1, \dots, n$. Then, $\{\xi_i\}$ satisfy the conditions of Lemma 3. In particular, letting $e_k$ denote the $k$-th standard basis vector of $\mathbb{R}^p$, we note that

$$\langle e_k, \mathbb{E}(\xi_i \otimes \xi_i) e_\ell \rangle_2 = \mathbb{E} \langle e_k, \xi_i \rangle_2 \langle e_\ell, \xi_i \rangle_2 = \langle \psi_k, \psi_\ell \rangle_{L^2} = \delta_{k\ell}$$

and $\|\xi_i\|_2 \le \sqrt{p} \, C_\psi$, where we have used uniform boundedness of $\{\psi_k\}$ as in (22). Furthermore, we have $\Psi^{(1)} = n^{-1} \sum_{i=1}^{n} \xi_i \otimes \xi_i$. Applying Lemma 3 with $C_p = \sqrt{p} \, C_\psi$ yields

$$\mathbb{P}\big\{ |||\Psi^{(1)} - I_p|||_2 > \delta \big\} \le p \exp\Big( -\frac{\delta^2}{4 C_\psi^2} \frac{n}{p} \Big). \tag{A.9}$$

Similar bounds hold for $\Psi^{(k)}$, $k = 2, \dots, p^{m-1}$. Applying the union bound, we get

$$\mathbb{P} \bigcup_{k=1}^{p^{m-1}} \big\{ |||\Psi^{(k)} - I_p|||_2 > \delta \big\} \le \exp\Big( m \log p - \frac{\delta^2}{4 C_\psi^2} \frac{n}{p} \Big).$$

For simplicity, let $A = A_{n,p} := n / (4 C_\psi^2 p)$.
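To make the concentration of $\Psi^{(1)} = n^{-1} \sum_i \xi_i \otimes \xi_i$ around $I_p$ concrete, here is a small simulation sketch. It assumes, purely for illustration, uniform sampling on $[0, 1]$ and the cosine system $\psi_k(x) = \sqrt{2} \cos(2\pi k x)$, which is orthonormal in $L^2[0,1]$ and uniformly bounded by $C_\psi = \sqrt{2}$; the function names and the seed are hypothetical, and the entrywise deviation is used as a simple proxy for the operator norm.

```python
import math
import random

def cosine_basis(k, x):
    """psi_k(x) = sqrt(2) cos(2 pi k x), orthonormal in L2[0, 1] for k >= 1."""
    return math.sqrt(2.0) * math.cos(2.0 * math.pi * k * x)

def empirical_gram(points, p):
    """Psi_{kl} = n^{-1} sum_i psi_k(x_i) psi_l(x_i) for 1 <= k, l <= p."""
    n = len(points)
    # Row i holds the vector xi_i = (psi_1(x_i), ..., psi_p(x_i)).
    feats = [[cosine_basis(k, x) for k in range(1, p + 1)] for x in points]
    return [[sum(f[k] * f[l] for f in feats) / n for l in range(p)]
            for k in range(p)]

def max_deviation_from_identity(gram):
    """Largest entrywise deviation from the identity (not the operator norm)."""
    p = len(gram)
    return max(abs(gram[k][l] - (1.0 if k == l else 0.0))
               for k in range(p) for l in range(p))

rng = random.Random(0)  # fixed seed, chosen arbitrarily
points = [rng.random() for _ in range(20000)]
gram = empirical_gram(points, p=4)
deviation = max_deviation_from_identity(gram)
print(deviation)
```

Since the operator norm of a $p$-by-$p$ matrix is at most $p$ times its largest entry in absolute value, an entrywise deviation below $1/(2p)$ is enough to guarantee $|||\Psi^{(1)} - I_p|||_2 \le \frac{1}{2}$ in this toy setting.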
We impose $m \log p \le \frac{A}{2} \delta^2$ so that the exponent in (A.9) is bounded above by $-\frac{A}{2} \delta^2$. Furthermore, for our purpose, it is enough to take $\delta = \frac{1}{2}$. It follows that

$$\mathbb{P}(\mathcal{A}_p^c) = \mathbb{P} \bigcup_{k=1}^{p^{m-1}} \Big\{ |||\Psi^{(k)} - I_p|||_2 > \frac{1}{2} \Big\} \le \exp\Big( -\frac{1}{32 C_\psi^2} \frac{n}{p} \Big), \tag{A.10}$$

if $32 C_\psi^2 \, m \, p \log p \le n$. Now, by (A.7), under $\varepsilon^2 \ge \sum_{k > p^m} \sigma_k$, the event $R_{S_{x_1^n}}(\varepsilon^2) > \tilde C_\psi \varepsilon^2 + \tilde C_\sigma \sigma_p$ implies $\mathcal{A}_p^c$. Thus, the exponential bound in (A.10) holds for $\mathbb{P}\{ R_{S_{x_1^n}}(\varepsilon^2) > \tilde C_\psi \varepsilon^2 + \tilde C_\sigma \sigma_p \}$ under the assumptions. We are free to choose $p$, and the bound is optimized by making $p$ as small as possible. Hence, we take $p$ to be $\nu(\varepsilon) := \inf\{ p : \varepsilon^2 \ge \sum_{k > p^m} \sigma_k \}$, which proves Lemma 2. (Note that, in general, $\nu(\varepsilon)$ takes its values in $\{0, 1, 2, \dots\}$. The assumption $\varepsilon^2 < \sigma_1$ guarantees that $\nu(\varepsilon) \ne 0$.)

Appendix B. Proof of Lemma 1

Assume $\sigma_k = C k^{-\alpha}$, for some $\alpha \ge 2$. First, note the following upper bound on the tail sum:

$$\sum_{k > p} \sigma_k \le C \int_p^\infty x^{-\alpha} \, dx = C_1(\alpha) \, p^{1 - \alpha}. \tag{B.1}$$

Furthermore, from the bounds (30a) and (30b), we have, for $k \ge n + 1$,

$$[\Psi]_{kk} \le \min\{ c_1, c_2 \}. \tag{B.2}$$

To simplify notation, let us define $I_n := \{1, 2, \dots, \gamma n\}$. Consider the case $\alpha > 2$. We will use the $\ell_\infty$-$\ell_\infty$ upper bound of (21), with $p = n$. Fix some $k \ge n + 1$. Note that $\sigma_k \le \sigma_{n+1}$. Then, recalling the assumptions on $\Psi$ and the definition of $S_k$, we have

$$\begin{aligned}
\sum_{\ell \ge n+1} \sqrt{\sigma_k} \sqrt{\sigma_\ell} \, \big| [\Psi]_{k,\ell} \big|
&\le \sqrt{\sigma_{n+1}} \sum_{q=0}^{\infty} \sum_{r=1}^{\gamma n} \sqrt{\sigma_{n + r + q\gamma n}} \, \big| [\Psi]_{k, n + r + q\gamma n} \big| \\
&= \sqrt{\sigma_{n+1}} \sum_{q=0}^{\infty} \sum_{r=1}^{\gamma n} \sqrt{\sigma_{n + r + q\gamma n}} \, \big| [\Psi]_{k, n + r} \big| \\
&\le \sqrt{\sigma_{n+1}} \sum_{q=0}^{\infty} \Big\{ c_1 \sum_{r \in S_k} \sqrt{\sigma_{n + r + q\gamma n}} + \frac{c_2}{n} \sum_{r \in I_n \setminus S_k} \sqrt{\sigma_{n + r + q\gamma n}} \Big\}.
\end{aligned} \tag{B.3}$$

Using (B.1), the second double sum in (B.3) is bounded by

$$\sum_{q=0}^{\infty} \sum_{r \in I_n \setminus S_k} \sqrt{\sigma_{n + r + q\gamma n}} \le \sum_{\ell > n} \sqrt{\sigma_\ell} \le C_2(\alpha) \, n^{1 - \alpha/2}. \tag{B.4}$$

Recalling that $S_k \subset I_n$ and $|S_k| = \eta$, the first double sum in (B.3) can be bounded as follows:

$$\begin{aligned}
\sum_{q=0}^{\infty} \sum_{r \in S_k} \sqrt{\sigma_{n + r + q\gamma n}}
&= \sqrt{C} \sum_{q=0}^{\infty} \sum_{r \in S_k} (n + r + q\gamma n)^{-\alpha/2} \\
&\le \sqrt{C} \sum_{q=0}^{\infty} \sum_{r \in S_k} (n + q\gamma n)^{-\alpha/2} \\
&\le \sqrt{C} \, \eta \sum_{q=0}^{\infty} (1 + q\gamma)^{-\alpha/2} \, n^{-\alpha/2} \\
&\le \sqrt{C} \, \eta \Big( 1 + \gamma^{-\alpha/2} \sum_{q=1}^{\infty} q^{-\alpha/2} \Big) n^{-\alpha/2} \\
&= C_3(\alpha, \gamma, \eta) \, n^{-\alpha/2},
\end{aligned} \tag{B.5}$$

where in the last line we have used $\sum_{q=1}^{\infty} q^{-\alpha/2} < \infty$ due to $\alpha/2 > 1$. Combining (B.3), (B.4) and (B.5) and noting that $\sqrt{\sigma_{n+1}} \le \sqrt{C} \, n^{-\alpha/2}$, we obtain

$$\sum_{\ell \ge n+1} \sqrt{\sigma_k} \sqrt{\sigma_\ell} \, \big| [\Psi]_{k,\ell} \big| \le \sqrt{C} \, n^{-\alpha/2} \Big\{ c_1 C_3(\alpha, \gamma, \eta) \, n^{-\alpha/2} + \frac{c_2}{n} C_2(\alpha) \, n^{1 - \alpha/2} \Big\} = C_4(\alpha, \eta, \gamma) \, n^{-\alpha}. \tag{B.6}$$

Taking the supremum over $k \ge 1$ and applying the $\ell_\infty$-$\ell_\infty$ bound of (21), with $p = n$, concludes the proof of part (a).

Now, consider the case $\alpha = 2$. The above argument breaks down in this case because $\sum_{q=1}^{\infty} q^{-\alpha/2}$ does not converge for $\alpha = 2$. A remedy is to further partition the matrix $\Sigma_{\tilde n}^{1/2} \Psi_{\tilde n} \Sigma_{\tilde n}^{1/2}$. Recall that the rows and columns of this matrix are indexed by $\{n + 1, n + 2, \dots\}$. Let $A$ be the principal submatrix indexed by $\{n + 1, n + 2, \dots, n^2\}$ and $D$ be the principal submatrix indexed by $\{n^2 + 1, n^2 + 2, \dots\}$. We will use a combination of the bounds (30a) and (30b), and the well-known perturbation bound $\lambda_{\max} \begin{pmatrix} A & C \\ C^T & D \end{pmatrix} \le \lambda_{\max}(A) + \lambda_{\max}(D)$, to write

$$\lambda_{\max}\big( \Sigma_{\tilde n}^{1/2} \Psi_{\tilde n} \Sigma_{\tilde n}^{1/2} \big) \le \lambda_{\max}(A) + \lambda_{\max}(D) \le |||A|||_\infty + \operatorname{tr}(D). \tag{B.7}$$

The second term is bounded as

$$\operatorname{tr}(D) = \sum_{k > n^2} \sigma_k [\Psi]_{kk} \le \min\{c_1, c_2\} \sum_{k > n^2} \sigma_k \le \min\{c_1, c_2\} \, C_1(2) \, (n^2)^{1 - 2} = C_5(\gamma) \, n^{-2}, \tag{B.8}$$

where we have used (B.1) and (B.2). To bound the first term, fix $k \in \{n + 1, \dots, n^2\}$.
By an argument similar to that of part (a) and noting that $\gamma \ge 1$, hence $\gamma n^2 \ge n^2$, we have

$$\begin{aligned}
\sum_{\ell = n+1}^{n^2} \sqrt{\sigma_k} \sqrt{\sigma_\ell} \, \big| [\Psi]_{k,\ell} \big|
&\le \sqrt{\sigma_{n+1}} \sum_{q=0}^{n} \sum_{r=1}^{\gamma n} \sqrt{\sigma_{n + r + q\gamma n}} \, \big| [\Psi]_{k, n + r} \big| \\
&\le \sqrt{\sigma_{n+1}} \sum_{q=0}^{n} \Big\{ c_1 \sum_{r \in S_k} \sqrt{\sigma_{n + r + q\gamma n}} + \frac{c_2}{n} \sum_{r \in I_n \setminus S_k} \sqrt{\sigma_{n + r + q\gamma n}} \Big\}.
\end{aligned} \tag{B.9}$$

Using $\gamma \ge 1$ again, the second double sum in (B.9) is bounded as

$$\sum_{q=0}^{n} \sum_{r \in I_n \setminus S_k} \sqrt{\sigma_{n + r + q\gamma n}} \le \sum_{\ell = n+1}^{3\gamma n^2} \sqrt{\sigma_\ell} \le \sqrt{C} \sum_{\ell = 2}^{3\gamma n^2} \frac{1}{\ell} \le \sqrt{C} \log(3\gamma n^2) \le C_6(\gamma) \log n, \tag{B.10}$$

for sufficiently large $n$. Note that we have used the bound $\sum_{\ell=2}^{p} \ell^{-1} \le \int_1^p x^{-1} \, dx = \log p$. The first double sum in (B.9) is bounded as follows:

$$\begin{aligned}
\sum_{q=0}^{n} \sum_{r \in S_k} \sqrt{\sigma_{n + r + q\gamma n}}
&= \sqrt{C} \sum_{q=0}^{n} \sum_{r \in S_k} (n + r + q\gamma n)^{-1} \\
&\le \sqrt{C} \, \eta \sum_{q=0}^{n} (1 + q\gamma)^{-1} \, n^{-1} \\
&\le \sqrt{C} \, \eta \Big( 1 + \gamma^{-1} + \gamma^{-1} \sum_{q=2}^{n} q^{-1} \Big) n^{-1} \\
&\le C_7(\gamma, \eta) \, n^{-1} \log n,
\end{aligned} \tag{B.11}$$

for $n$ sufficiently large. Combining (B.9), (B.10) and (B.11), taking the supremum over $k$ and using the simple bound $\sqrt{\sigma_{n+1}} \le \sqrt{C} \, n^{-1}$, we get

$$|||A|||_\infty \le \sqrt{C} \, n^{-1} \Big\{ c_1 C_7(\gamma, \eta) \frac{\log n}{n} + \frac{c_2}{n} C_6(\gamma) \log n \Big\} = C_8(\gamma, \eta) \frac{\log n}{n^2}, \tag{B.12}$$

which in view of (B.8) and (B.7) completes the proof of part (b).

Appendix C. Relationship between $R_\Phi(\varepsilon)$ and $T_\Phi(\varepsilon)$

In this appendix, we prove the claim made in Section 1 about the relation between the upper quantities $R_\Phi$ and $T_\Phi$ and the lower quantities $T_\Phi$ and $R_\Phi$. We only carry out the proof for $R_\Phi$; the dual version holds for $T_\Phi$. To simplify the argument, we look at slightly different versions of $R_\Phi$ and $T_\Phi$, defined as

$$R_\Phi^\circ(\varepsilon) := \sup \big\{ \|f\|_{L^2}^2 : f \in B_{\mathcal{H}}, \; \|f\|_\Phi^2 < \varepsilon^2 \big\}, \tag{C.1}$$

$$T_\Phi^\circ(\delta) := \inf \big\{ \|f\|_\Phi^2 : f \in B_{\mathcal{H}}, \; \|f\|_{L^2}^2 > \delta^2 \big\}, \tag{C.2}$$

and prove the following:

$$(R_\Phi^\circ)^{-1}(\delta) = T_\Phi^\circ(\delta), \tag{C.3}$$

where $(R_\Phi^\circ)^{-1}(\delta) := \inf\{ \varepsilon^2 : R_\Phi^\circ(\varepsilon) > \delta^2 \}$ is a generalized inverse of $R_\Phi^\circ$.
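The generalized inverse $q(s) := \inf\{ t : p(t) > s \}$ used here is straightforward to compute on a grid. The sketch below uses a hypothetical nondecreasing step function `step` with a flat piece and a jump, which are exactly the situations where an ordinary inverse does not exist; the grid and function names are assumptions made for illustration only.

```python
def generalized_inverse(p, s, grid):
    """q(s) = inf{t in grid : p(t) > s}; float('inf') if the set is empty.

    The grid must be sorted in increasing order.
    """
    for t in grid:
        if p(t) > s:
            return t
    return float("inf")

def step(t):
    """A nondecreasing function with flat pieces and jumps (hypothetical)."""
    if t < 1.0:
        return 0.0
    if t < 2.0:
        return 1.0
    return t

grid = [i / 100.0 for i in range(0, 401)]  # 0.00, 0.01, ..., 4.00

print(generalized_inverse(step, 0.5, grid))   # first t with step(t) > 0.5
print(generalized_inverse(step, 1.0, grid))   # first t with step(t) > 1.0
print(generalized_inverse(step, 10.0, grid))  # empty set on this grid
```

On this grid one can also observe that $q(p(t)) \ge t$ for every grid point $t$, which is a direct consequence of monotonicity of $p$.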
To see (C.3), we note that $R_\Phi^\circ(\varepsilon) > \delta^2$ iff there exists $f \in B_{\mathcal{H}}$ such that $\|f\|_\Phi^2 < \varepsilon^2$ and $\|f\|_{L^2}^2 > \delta^2$. But this last statement is equivalent to $T_\Phi^\circ(\delta) < \varepsilon^2$. Hence,

$$(R_\Phi^\circ)^{-1}(\delta) = \inf \{ \varepsilon^2 : T_\Phi^\circ(\delta) < \varepsilon^2 \}, \tag{C.4}$$

which proves (C.3). Using the following lemma, we can use relation (C.3) to convert upper bounds on $R_\Phi$ to lower bounds on $T_\Phi$.

Lemma 4. Let $t \mapsto p(t)$ be a nondecreasing function (defined on the real line with values in the extended real line). Let $q$ be its generalized inverse defined as $q(s) := \inf\{ t : p(t) > s \}$. Let $r$ be a properly invertible (i.e., one-to-one) function such that $p(t) \le r(t)$, for all $t$. Then,

(a) $q(p(t)) \ge t$, for all $t$,
(b) $q(s) \ge r^{-1}(s)$, for all $s$.

Proof. Assume (a) does not hold, that is, $\inf\{ \alpha : p(\alpha) > p(t) \} < t$. Then, there exists $\alpha_0$ such that $p(\alpha_0) > p(t)$ and $\alpha_0 < t$. But this contradicts $p$ being nondecreasing. For part (b), note that (a) implies $t \le q(p(t)) \le q(r(t))$, since $q$ is nondecreasing by definition. Letting $t := r^{-1}(s)$ and noting that $r(r^{-1}(s)) = s$, by assumption, proves (b).

Let $p = R_\Phi^\circ$, $q = T_\Phi^\circ$ and $r(t) = At + B$ for some constant $A > 0$. Noting that $R_\Phi^\circ \le R_\Phi$ and $T_\Phi(\cdot + \gamma) \ge T_\Phi^\circ$ for any $\gamma > 0$, we obtain from Lemma 4 and (C.3) that

$$R_\Phi(\varepsilon) \le A \varepsilon^2 + B \;\Longrightarrow\; T_\Phi(\delta+) \ge \frac{\delta^2}{A} - B, \tag{C.5}$$

where $T_\Phi(\delta+)$ denotes the right limit of $T_\Phi$ at $\delta$. This may be used to translate an upper bound of the form (17) on $R_\Phi$ to a corresponding lower bound on $T_\Phi$.

Appendix D. The 2 × 2 subproblem

The following subproblem arises in the proof of Theorem 1:

$$F(\varepsilon^2) := \sup \Big\{ \underbrace{\begin{pmatrix} r & s \end{pmatrix} \begin{pmatrix} u^2 & 0 \\ 0 & v^2 \end{pmatrix} \begin{pmatrix} r \\ s \end{pmatrix}}_{=: \, x(r, s)} \; : \; r^2 + s^2 \le 1, \; \underbrace{\begin{pmatrix} r & s \end{pmatrix} \begin{pmatrix} a^2 & 0 \\ 0 & d^2 \end{pmatrix} \begin{pmatrix} r \\ s \end{pmatrix}}_{=: \, y(r, s)} \le \varepsilon^2 \Big\}, \tag{D.1}$$

where $u^2$, $v^2$, $a^2$ and $d^2$ are given constants and the optimization is over $(r, s)$.
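Problem (D.1) is small enough to be explored by brute force. The sketch below substitutes $R = r^2$ and $S = s^2$, which turns (D.1) into a linear program over a triangle, and scans a grid; the function name, the grid resolution and the numerical values of $u^2, v^2, a^2, d^2$ are hypothetical and chosen only for illustration.

```python
def brute_force_F(eps_sq, u_sq, v_sq, a_sq, d_sq, n_grid=400):
    """Grid approximation of F(eps^2) in (D.1).

    With R = r^2 and S = s^2 the problem is linear in (R, S):
    maximise u^2 R + v^2 S subject to R, S >= 0, R + S <= 1,
    and a^2 R + d^2 S <= eps^2.
    """
    best = 0.0  # (R, S) = (0, 0) is feasible whenever eps^2 >= 0
    for i in range(n_grid + 1):
        big_r = i / n_grid
        for j in range(n_grid + 1 - i):  # enforces R + S <= 1
            big_s = j / n_grid
            # Small tolerance so points on the constraint boundary count.
            if a_sq * big_r + d_sq * big_s <= eps_sq + 1e-12:
                best = max(best, u_sq * big_r + v_sq * big_s)
    return best

# Hypothetical constants with u^2 >= v^2 and a^2 > d^2.
u_sq, v_sq, a_sq, d_sq = 1.0, 0.5, 0.8, 0.2
for eps_sq in (0.0, 0.1, 0.5, 0.8):
    print(eps_sq, brute_force_F(eps_sq, u_sq, v_sq, a_sq, d_sq))
```

For these particular constants the optimum of the linear program sits at a vertex that lies on the grid, so the scan returns the exact value; in general it is only an approximation from below.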
Here, we discuss the solution in some detail; in particular, we provide explicit formulas for $F(\varepsilon^2)$. Without loss of generality assume $u^2 \ge v^2$. Then, it is clear that $F(\varepsilon^2) \le u^2$ and $F(\varepsilon^2) = u^2$ for $\varepsilon^2 \ge u^2$. Thus, we are interested in what happens when $\varepsilon^2 < u^2$.

The problem is easily solved by drawing a picture. Let $x(r, s)$ and $y(r, s)$ be as denoted in the last display. Consider the set

$$\begin{aligned}
S &:= \big\{ \big( x(r, s), y(r, s) \big) : r^2 + s^2 \le 1 \big\} \\
&= \big\{ r^2 (u^2, a^2) + s^2 (v^2, d^2) + q^2 (0, 0) : r^2 + s^2 + q^2 = 1 \big\} \\
&= \operatorname{conv}\big\{ (u^2, a^2), (v^2, d^2), (0, 0) \big\}.
\end{aligned} \tag{D.2}$$

That is, $S$ is the convex hull of the three points $(u^2, a^2)$, $(v^2, d^2)$ and the origin $(0, 0)$. Then, two (or maybe three) different pictures arise depending on whether $a^2 > d^2$ (and whether $d^2 \ge v^2$ or $d^2 < v^2$) or $a^2 \le d^2$; see Fig. D.4. It follows that we have two (or three) different pictures for the function $\varepsilon^2 \mapsto F(\varepsilon^2)$. In particular, for $a^2 > d^2$ and $d^2 < v^2$,

$$F(\varepsilon^2) = v^2 \min\Big\{ \frac{\varepsilon^2}{d^2}, 1 \Big\} + (u^2 - v^2) \max\Big\{ 0, \frac{\varepsilon^2 - d^2}{a^2 - d^2} \Big\}, \tag{D.3}$$

for $a^2 > d^2$ and $d^2 \ge v^2$,

$$F(\varepsilon^2) = \varepsilon^2,$$

and for $a^2 \le d^2$,

$$F(\varepsilon^2) = u^2 \min\Big\{ \frac{\varepsilon^2}{a^2}, 1 \Big\}.$$

All the equations above are valid for $\varepsilon^2 \in [0, \sigma_1]$.

Figure D.4: Top plots illustrate the set $S$ as defined in (D.2), in various cases. The bottom plots are the corresponding $\varepsilon^2 \mapsto F(\varepsilon^2)$.

Appendix E. Details of the Fourier truncation example

Here we establish the claim that the bound (19) holds with equality. Recall that for the (generalized) Fourier truncation operator $T_{\psi_1^n}$, we have

$$R_{T_{\psi_1^n}}(\varepsilon^2) = \sup \Big\{ \sum_{k=1}^{\infty} \sigma_k \alpha_k^2 : \sum_{k=1}^{\infty} \alpha_k^2 \le 1, \; \sum_{k=1}^{n} \sigma_k \alpha_k^2 \le \varepsilon^2 \Big\}.$$

Let $\alpha = (t\xi, s\gamma)$, where $t, s \in \mathbb{R}$, $\xi = (\xi_1, \dots, \xi_n) \in \mathbb{R}^n$, $\gamma = (\gamma_1, \gamma_2, \dots) \in \ell_2$ and $\|\xi\|_2 = 1 = \|\gamma\|_2$. Let $u^2 = u^2(\xi) := \sum_{k=1}^{n} \sigma_k \xi_k^2$ and $v^2 = v^2(\gamma) := \sum_{k > n} \sigma_k \gamma_k^2$.
Let us fix $\xi$ and $\gamma$ for now and try to optimize over $t$ and $s$. That is, we look at

$$G(\varepsilon^2; \xi, \gamma) := \sup \big\{ t^2 u^2 + s^2 v^2 : t^2 + s^2 \le 1, \; t^2 u^2 \le \varepsilon^2 \big\}.$$

This is an instance of the 2-by-2 problem (D.1), with $a^2 = u^2$ and $d^2 = 0$. Note that our assumption that $u^2 \ge v^2$ holds in this case, for all $\xi$ and $\gamma$, because $\{\sigma_k\}$ is a nonincreasing sequence. Hence, we have, for $\varepsilon^2 \le \sigma_1$,

$$G(\varepsilon^2; \xi, \gamma) = v^2 + (u^2 - v^2) \frac{\varepsilon^2}{u^2} = v^2(\gamma) + \Big( 1 - \frac{v^2(\gamma)}{u^2(\xi)} \Big) \varepsilon^2.$$

Now we can maximize $G(\varepsilon^2; \xi, \gamma)$ over $\xi$ and then $\gamma$. Note that $G$ is increasing in $u^2$. Thus, the maximum is achieved by selecting $u^2$ to be $\sup_{\|\xi\|_2 = 1} u^2(\xi) = \sigma_1$. Thus,

$$\sup_\xi G(\varepsilon^2; \xi, \gamma) = \Big( 1 - \frac{\varepsilon^2}{\sigma_1} \Big) v^2(\gamma) + \varepsilon^2.$$

For $\varepsilon^2 < \sigma_1$, the above is increasing in $v^2$. Hence the maximum is achieved by setting $v^2$ to be $\sup_{\|\gamma\|_2 = 1} v^2(\gamma) = \sigma_{n+1}$. Hence, for $\varepsilon^2 \le \sigma_1$,

$$R_{T_{\psi_1^n}}(\varepsilon^2) := \sup_{\xi, \gamma} G(\varepsilon^2; \xi, \gamma) = \Big( 1 - \frac{\sigma_{n+1}}{\sigma_1} \Big) \varepsilon^2 + \sigma_{n+1}. \tag{E.1}$$

Appendix F. A quadratic inequality

In this appendix, we derive an inequality which will be used in the proof of Theorem 1. Consider a positive semidefinite matrix $M$ (possibly infinite-dimensional) partitioned as

$$M = \begin{pmatrix} A & C \\ C^T & D \end{pmatrix}.$$

Assume that there exist $\rho^2 \in (0, 1)$ and $\kappa^2 > 0$ such that

$$\begin{pmatrix} A & C \\ C^T & (1 - \rho^2) D + \kappa^2 I \end{pmatrix} \succeq 0. \tag{F.1}$$

Let $(x, y)$ be a vector partitioned to match the block structure of $M$. Then we have the following.

Lemma 5. Under (F.1), for all $x$ and $y$,

$$x^T A x + 2 x^T C y + y^T D y \ge \rho^2 x^T A x - \frac{\kappa^2}{1 - \rho^2} \|y\|_2^2. \tag{F.2}$$

Proof. By assumption (F.1), we have

$$\begin{pmatrix} \sqrt{1 - \rho^2} \, x^T & \frac{1}{\sqrt{1 - \rho^2}} \, y^T \end{pmatrix} \begin{pmatrix} A & C \\ C^T & (1 - \rho^2) D + \kappa^2 I \end{pmatrix} \begin{pmatrix} \sqrt{1 - \rho^2} \, x \\ \frac{1}{\sqrt{1 - \rho^2}} \, y \end{pmatrix} \ge 0. \tag{F.3}$$

Expanding the left-hand side gives $(1 - \rho^2) x^T A x + 2 x^T C y + y^T D y + \frac{\kappa^2}{1 - \rho^2} \|y\|_2^2 \ge 0$, and rearranging yields (F.2).

Writing (F.1) as a perturbation of the original matrix,

$$\begin{pmatrix} A & C \\ C^T & D \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & -\rho^2 D + \kappa^2 I \end{pmatrix} \succeq 0, \tag{F.4}$$

we observe that a sufficient condition for (F.1) to hold is $\rho^2 D \preceq \kappa^2 I$. That is, it is sufficient to have

$$\rho^2 \lambda_{\max}(D) \le \kappa^2. \tag{F.5}$$

Rewriting (F.1) differently, as

$$\begin{pmatrix} (1 - \rho^2) A & 0 \\ 0 & (1 - \rho^2) D \end{pmatrix} + \begin{pmatrix} \rho^2 A & C \\ C^T & \kappa^2 I \end{pmatrix} \succeq 0, \tag{F.6}$$

we find another sufficient condition for (F.1), namely, $\rho^2 A - \kappa^{-2} C C^T \succeq 0$. In particular, it is also sufficient to have

$$\kappa^{-2} \lambda_{\max}(C C^T) \le \rho^2 \lambda_{\min}(A). \tag{F.7}$$

References

[1] R. DeVore, Approximation of functions, in: Proc. Symp. Applied Mathematics, Vol. 36, 1986, pp. 1–20.
[2] A. Pinkus, N-Widths in Approximation Theory (Ergebnisse Der Mathematik Und Ihrer Grenzgebiete 3 Folge), Springer, 1985.
[3] A. Pinkus, N-widths and optimal recovery, in: Proc. Symp. Applied Mathematics, Vol. 36, 1986, pp. 51–66.
[4] S. A. van de Geer, Empirical Processes in M-Estimation, Cambridge University Press, 2000.
[5] F. Riesz, B. Sz.-Nagy, Functional Analysis, Dover Publications, 1990.
[6] D. J. H. Garling, Inequalities: a journey into linear analysis, Cambridge Univ Pr, 2007.
[7] R. V. Gamkrelidze, D. Newton, V. M. Tikhomirov, Analysis: Convex analysis and approximation theory, Birkhäuser, 1990.
[8] R. Ahlswede, A. Winter, Strong converse for identification via quantum channels, IEEE Transactions on Information Theory 48 (3) (2002) 569–579. doi:10.1109/18.985947.
[9] M. Rudelson, Random Vectors in the Isotropic Position, Journal of Functional Analysis 164 (1) (1999) 60–72. doi:10.1006/jfan.1998.3384.
[10] R. Vershynin, Introduction to the non-asymptotic analysis of random matrices, URL: http://www-personal.umich.edu/~romanv/papers/non-asymptotic-rmt-plain.pdf.
[11] J. A. Tropp, User-friendly tail bounds for sums of random matrices, URL: http://arxiv.org/abs/1004.4389.
[12] A. Wigderson, D. Xiao, Derandomizing the Ahlswede-Winter matrix-valued Chernoff bound using pessimistic estimators, and applications, Theory of Computing 4 (1) (2008) 53–76. doi:10.4086/toc.2008.v004a003.
