On the Complexity of Binary Samples
Consider a class $\mH$ of binary functions $h: X\to\{-1, +1\}$ on a finite interval $X=[0, B]\subset \Real$. Define the {\em sample width} of $h$ on a finite subset (a sample) $S\subset X$ as $\w_S(h) \equiv \min_{x\in S} |\w_h(x)|$, where $\w_h(x) =…
Authors: Joel Ratsaby
On the Complexity of Binary Samples

Joel Ratsaby
Electrical and Electronics Engineering Department, Ariel University Center of Samaria, ISRAEL
ratsaby@ariel.ac.il
September 24, 2007

Abstract

Consider a class $\mathcal{H}$ of binary functions $h: X \to \{-1,+1\}$ on a finite interval $X = [0,B] \subset \mathbb{R}$. Define the {\em sample width} of $h$ on a finite subset (a sample) $S \subset X$ as $\omega_S(h) \equiv \min_{x\in S} |\omega_h(x)|$, where $\omega_h(x) = h(x)\max\{a \ge 0 : h(z) = h(x),\ x-a \le z \le x+a\}$. Let $\mathcal{S}_\ell$ be the space of all samples in $X$ of cardinality $\ell$ and consider sets of wide samples, i.e., {\em hypersets}, which are defined as $A_{\beta,h} = \{S \in \mathcal{S}_\ell : \omega_S(h) \ge \beta\}$. Through an application of the Sauer-Shelah result on the density of sets, an upper estimate is obtained on the growth function (or trace) of the class $\{A_{\beta,h} : h \in \mathcal{H}\}$, $\beta > 0$, i.e., on the number of possible dichotomies obtained by intersecting all hypersets with a fixed collection of samples $S \in \mathcal{S}_\ell$ of cardinality $m$. The estimate is $2\sum_{i=0}^{2\lfloor B/(2\beta)\rfloor}\binom{m-\ell}{i}$.

Keywords: Binary functions, density of sets, VC-dimension

AMS Subject Classification: 06E30, 68Q32, 68Q25, 03C13, 68R05

1 Overview

Let $B > 0$ and define the domain as $X = [0,B]$. In this paper we consider the class $\mathcal{H}$ of all binary functions $h: X \to \{-1,+1\}$ which have only simple discontinuities, i.e., at any point $x$ the limit $h(x^+) \equiv \lim_{z\to x^+} h(z)$ from the right and, similarly, the limit $h(x^-)$ from the left exist (but are not necessarily equal). A main theme of our recent work has been to characterize binary functions based on their behavior on a finite subset of $X$. In [?] we showed that, when learning binary functions from a finite labeled sample, the generalization error bounds improve if the learner obtains a hypothesis which, in addition to minimizing the empirical sample error, is also `smooth' around elements of the sample. This notion of smoothness (used also in [??]
) is based on the simple notion of the width of $h$ at $x$, which is defined as
$$\omega_h(x) = h(x)\max\{a \ge 0 : h(z) = h(x),\ x-a \le z \le x+a\}.$$
For a finite subset (also called a {\em sample}) $S \subset X$, the {\em sample width} of $h$, denoted $\omega_S(h)$, is defined as $\omega_S(h) \equiv \min_{x\in S}|\omega_h(x)|$. This definition of width resembles the notion of {\em sample margin} of a real-valued function $f$ (see for instance [?]). We say that a sample $S$ is {\em wide} for $h$ if the width $\omega_S(h)$ is large. Wide samples implicitly contain more side information, for instance about a learning problem. The current paper aims at estimating the complexity of the class of wide samples for functions in $\mathcal{H}$. This complexity is related to a notion of description complexity, and knowing it makes it possible to compute the efficiency of the information that is implicit in samples for learning (see [?]).

2 Introduction

For any logical expression $A$ denote by $I\{A\}$ the indicator function, which takes the value 1 or 0 whenever the statement $A$ is true or false, respectively. Let $\ell$ be any fixed positive integer and define the space $\mathcal{S}_\ell$ of all samples $S \subset X$ of size $\ell$. On $\mathcal{S}_\ell$ consider sets of wide samples, i.e.,
$$A_{\beta,h} = \{S \in \mathcal{S}_\ell : \omega_S(h) \ge \beta\}, \quad \beta > 0.$$
We refer to such sets as {\em hypersets}. It will be convenient to associate with these sets the indicator functions on $\mathcal{S}_\ell$, which are denoted as $h'_{\beta,h}(S) = I_{A_{\beta,h}}(S)$. These are referred to as {\em hyperconcepts} and we may write $h'$ for brevity. For any fixed width parameter $\gamma > 0$ define the {\em hyperclass}
$$\mathcal{H}'_\gamma = \{h'_{\gamma,h} : h \in \mathcal{H}\}. \tag{1}$$
In words, $\mathcal{H}'_\gamma$ consists of all sets of subsets $S \subset X$ of cardinality $\ell$ on which the corresponding binary functions $h$ are wide by at least $\gamma$. The aim of the paper is to compute the complexity of the hyperclass $\mathcal{H}'_\gamma$ that corresponds to the class $\mathcal{H}$. Since the domain $X$ is infinite, so is $\mathcal{H}'_\gamma$; hence one cannot simply measure its cardinality.
Instead we apply a standard combinatorial measure of the complexity of a family of sets, as follows. Suppose $Y$ is a general domain and $\mathcal{G}$ is an infinite class of subsets of $Y$. For any subset $S = \{y_1,\dots,y_n\} \subset Y$ let
$$\Gamma_{\mathcal{G}}(S) \equiv |\mathcal{G}_{|S}| \tag{2}$$
where $\mathcal{G}_{|S} = \{[I_G(y_1),\dots,I_G(y_n)] : G \in \mathcal{G}\}$. The {\em growth function} (see for instance [?]) is defined as
$$\Gamma_{\mathcal{G}}(n) = \max_{\{S : S \subset Y,\ |S| = n\}} \Gamma_{\mathcal{G}}(S).$$
It measures the rate at which the number of dichotomies obtained by intersecting subsets $G$ of $\mathcal{G}$ with a finite set $S$ increases as a function of the cardinality $n$ of $S$ in the maximal case (it is also called the {\em trace} of $\mathcal{G}$ in [?]). Since we are interested in hypersets as opposed to simple sets $G$ (as above), we consider the trace on a finite collection $\zeta \subset \mathcal{S}_\ell$ of samples (instead of a finite sample $S$ as above). It will be convenient to define the cardinality of such a collection as the cardinality of the union of its component sets, i.e., for any given finite collection $\zeta \subset \mathcal{S}_\ell$ let
$$|\zeta| = \Big|\bigcup_{S : S \in \zeta} S\Big| \tag{3}$$
and we use $m$ to denote a possible value of $|\zeta|$. As a measure of complexity of $\mathcal{H}'_\gamma$ we compute the growth as a function of $m$, i.e.,
$$\Gamma_{\mathcal{H}'_\gamma}(m) = \max_{\zeta : \zeta \subset \mathcal{S}_\ell,\ |\zeta| = m} \Gamma_{\mathcal{H}'_\gamma}(\zeta).$$

3 Main result

Let us state the main result of the paper.

Theorem 1. Let $\ell, m > 0$ be finite integers and $B > 0$ a finite real number. Let $\mathcal{H}$ be the class of binary functions on $[0,B]$ (with only simple discontinuities). For a given width parameter value $\gamma > 0$, the corresponding hyperclass $\mathcal{H}'_\gamma$ on the space $\mathcal{S}_\ell$ has a growth which is bounded as
$$\Gamma_{\mathcal{H}'_\gamma}(m) \le 2\sum_{i=0}^{2\lfloor B/(2\gamma)\rfloor}\binom{m-\ell}{i}.$$

Remark 1. For $m > \ell + B/\gamma$, the following simpler bound holds:
$$\Gamma_{\mathcal{H}'_\gamma}(m) \le 2\left(\frac{e\gamma(m-\ell)}{B}\right)^{B/\gamma}.$$

Before proving this result we need some additional notation. We denote by $\langle a,b\rangle$ a generalized interval set of the form $[a,b]$, $(a,b)$, $[a,b)$ or $(a,b]$.
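As a numerical illustration of Theorem 1 and Remark 1, the following minimal sketch evaluates both bounds for given $m$, $\ell$, $B$ and $\gamma$. The function names `growth_bound` and `simple_bound` are hypothetical and chosen only for this example; the formulas are the two displayed above.

```python
from math import comb, e, floor


def growth_bound(m, ell, B, gamma):
    """Bound of Theorem 1: 2 * sum_{i=0}^{2*floor(B/(2*gamma))} C(m - ell, i)."""
    if m < ell:
        raise ValueError("m must be at least ell")
    K = floor(B / (2 * gamma))  # largest number of intervals of length >= 2*gamma
    return 2 * sum(comb(m - ell, i) for i in range(2 * K + 1))


def simple_bound(m, ell, B, gamma):
    """Bound of Remark 1: 2 * (e*gamma*(m - ell)/B)**(B/gamma), for m > ell + B/gamma."""
    if m <= ell + B / gamma:
        raise ValueError("Remark 1 requires m > ell + B/gamma")
    return 2 * (e * gamma * (m - ell) / B) ** (B / gamma)


# Illustrative values only (not taken from the paper).
exact = growth_bound(10, 4, 8.0, 2.0)
loose = simple_bound(10, 4, 8.0, 2.0)
```

For these illustrative values $K = 2$, so the sum in Theorem 1 runs over $i = 0,\dots,4$ with $m - \ell = 6$, and the simpler bound of Remark 1 is larger, as expected.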
For a set $R$ we write $I_R(x)$ to represent the indicator function of the statement $x \in R$. In case of an interval set $R = \langle a,b\rangle$ we write $I_{\langle a,b\rangle}$.

Proof: Any binary function $h$ may be represented by thresholding a real-valued function $f$ on $X$, i.e., $h(x) = \mathrm{sgn}(f(x))$ where for any $a \in \mathbb{R}$, $\mathrm{sgn}(a) = +1$ or $-1$ if $a > 0$ or $a \le 0$, respectively. The idea is to choose a class $\mathcal{F}$ of real-valued functions $f$ which is rich enough (it has to be infinite since there are infinitely many binary functions on $X$) but is as simple as we can find. This is important since, as we will show, the growth function of $\mathcal{H}'_\gamma$ is bounded from above by the complexity of a class that is a variant of $\mathcal{F}$.

We start by constructing such an $\mathcal{F}$. For a binary function $h$ on $X$ consider the corresponding set sequence $\{R_i\}_{i=1,2,\dots}$ which satisfies the following properties: (a) $[0,B] = \bigcup_{i=1,2,\dots} R_i$ and for any $i \ne j$, $R_i \cap R_j = \emptyset$, (b) $h$ alternates in sign over consecutive sets $R_i$, $R_{i+1}$, (c) $R_i$ is an interval set $\langle a,b\rangle$ with possibly $a = b$ (in which case $R_i = \{a\}$). Hence $h$ has the following general form
$$h(x) = \pm\sum_{i=1,2,\dots} (-1)^i I_{R_i}(x). \tag{4}$$
Thus there are exactly two functions $h$ corresponding uniquely to each sequence of sets $R_i$, $i = 1,2,\dots$. Unless explicitly specified, the end points of $X = [0,B]$ are not considered roots of $h$, i.e., the default behavior is that outside $X$, i.e., for $x < 0$ or $x > B$, the function `continues' with the same value it takes at the endpoint, $h(0)$ or $h(B)$, respectively. Now, associate with the set sequence $R_1, R_2, \dots$ the unique non-decreasing sequence of right-endpoints $a_1, a_2, \dots$ which define these sets (the sequence may have up to two consecutive repetitions except for 0 and $B$) according to
$$R_i = \langle a_{i-1}, a_i\rangle, \quad i = 1,2,\dots \tag{5}$$
with the first left end point being $a_0 = 0$.
Note that different choices for $\langle$ and $\rangle$ (see the earlier definition of a generalized interval $\langle a,b\rangle$) give different sets $R_i$ and hence different functions $h$. For instance, suppose $X = [0,7]$; then the set sequence $R_1 = [0,2.4)$, $R_2 = [2.4,3.6)$, $R_3 = [3.6,3.6] = \{3.6\}$, $R_4 = (3.6,7]$ has a corresponding end-point sequence $a_1 = 2.4$, $a_2 = 3.6$, $a_3 = 3.6$, $a_4 = 7$. Note that a singleton set introduces a repeated value in this sequence. As another example consider $R_1 = [0,0] = \{0\}$, $R_2 = (0,4.1)$, $R_3 = [4.1,7]$ with $a_1 = 0$, $a_2 = 4.1$, $a_3 = 7$. Next, define the corresponding sequence of midpoints
$$\mu_i = \frac{a_i + a_{i+1}}{2}, \quad i = 1,2,\dots.$$
Define the continuous real-valued function $f: X \to [-B,B]$ that corresponds to $h$ (via the end-point sequence) as follows:
$$f(x) = \pm\sum_{i=1,2,\dots} (-1)^{i+1}(x - a_i)\, I_{[\mu_{i-1},\mu_i]}(x) \tag{6}$$
where we take $\mu_0 = 0$ (see for instance Figure 1). Clearly, the value $f(x)$ equals the width $\omega_h(x)$.

Figure 1: $h$ (solid) and its corresponding $f$ (dashed) on $X = [0,B]$ with $B = 800$.

Note that for a fixed sequence of endpoints $a_i$, $i = 1,2,\dots$, the function $f$ is invariant to the type of intervals $R_i = \langle a_{i-1}, a_i\rangle$ that $h$ has. For instance, the set sequence $[0,a_1)$, $[a_1,a_2)$, $[a_2,a_3]$, $(a_3,B]$ and the sequence $[0,a_1]$, $(a_1,a_2]$, $(a_2,a_3]$, $(a_3,B]$ yield different binary functions $h$ but the same width function $f$. For convenience, when $h$ has a finite number $n$ of interval sets $R_i$, the sum in (4) has an upper limit of $n$ and we define $a_n = B$. Similarly, the sum in (6) goes up to $n-1$ and we define $\mu_{n-1} = B$. Let us denote by
$$\mathcal{F}_+ = \{|f| : f \in \mathcal{F}\}.$$
(7)

It follows that the hyperclass $\mathcal{H}'_\gamma$ may be represented in terms of the class $\mathcal{F}_+$ as follows: define the hypersets
$$A_{\beta,f} = \{S \in \mathcal{S}_\ell : f(x) \ge \beta,\ x \in S\}, \quad \beta > 0,\ f \in \mathcal{F}_+$$
with corresponding hyperconcepts $f'_{\beta,f} = I_{A_{\beta,f}}(S)$, let $\mathcal{F}'_\gamma = \{f'_{\gamma,f} : f \in \mathcal{F}_+\}$ and
$$\mathcal{H}'_\gamma = \mathcal{F}'_\gamma. \tag{8}$$
Hence, it suffices to compute the growth function $\Gamma_{\mathcal{F}'_\gamma}(m)$.

Let us now begin to analyze the hyperclass $\mathcal{F}'_\gamma$. By definition, $\mathcal{F}'_\gamma$ is a class of indicator functions of subsets of $\mathcal{S}_\ell$. Denote by $\zeta_N \subset \mathcal{S}_\ell$ a collection of $N$ such subsets. By a {\em generalized} collection we will mean a collection of subsets $S \subset X$ with cardinality $|S| \le \ell$. Henceforth we fix a value $m$ and consider only collections $\zeta_N$ such that
$$|\zeta_N| = m \tag{9}$$
where, recall, the cardinality is defined according to (3). Let us denote the individual components of $\zeta_N$ by $S^{(j)} \in \mathcal{S}_\ell$, $1 \le j \le N$, hence $\zeta_N = \{S^{(1)},\dots,S^{(N)}\}$. The growth function may be expressed as
$$\Gamma_{\mathcal{F}'_\gamma}(m) \equiv \max_{\zeta_N \subset \mathcal{S}_\ell,\ |\zeta_N| = m} \Gamma_{\mathcal{F}'_\gamma}(\zeta_N) \equiv \max_{\zeta_N \subset \mathcal{S}_\ell,\ |\zeta_N| = m} \Big|\Big\{[f'(S^{(1)}),\dots,f'(S^{(N)})] : f' \in \mathcal{F}'_\gamma\Big\}\Big|. \tag{10}$$
Denote by $S^{(j)}_i$ the $i$th element of the sample $S^{(j)}$ based on the ordering of the elements of $S^{(j)}$ (which is induced by the ordering on $X$). Then
$$\begin{aligned}
\Gamma_{\mathcal{F}'_\gamma}(\zeta_N) &= \Big|\Big\{\Big[I\Big(\min_{x \in S^{(1)}} f(x) > \gamma\Big),\dots,I\Big(\min_{x \in S^{(N)}} f(x) > \gamma\Big)\Big] : f \in \mathcal{F}_+\Big\}\Big| \\
&= \Big|\Big\{\Big[\prod_{j=1}^{\ell} I\big(f(S^{(1)}_j) > \gamma\big),\dots,\prod_{j=1}^{\ell} I\big(f(S^{(N)}_j) > \gamma\big)\Big] : f \in \mathcal{F}_+\Big\}\Big|.
\end{aligned} \tag{11}$$
Order the elements in each component of $\zeta_N$ by the underlying ordering on $X$. Then put the sets in lexical ordering starting with the first up to the $\ell$th element. For instance, suppose $m = 7$, $N = 3$, $\ell = 4$ and
$$\zeta_3 = \{\{2,8,9,10\},\{2,5,8,9\},\{3,8,10,13\}\};$$
then the ordered version is $\{\{2,5,8,9\},\{2,8,9,10\},\{3,8,10,13\}\}$. For any $x \in X$ let
$$\theta^\gamma_f(x) \equiv I(f(x) > \gamma) \tag{12}$$
(we will sometimes write $\theta_f(x)$ for short).
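The cardinality (3), the lexical ordering of a collection and one entry-vector of the kind counted in (11) can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, $|f(x)|$ is modelled as the distance from $x$ to the nearest sign change of $h$ (a non-empty list of interior breakpoints, with the end points of $X$ not counted as roots), and the breakpoint $6.0$ and $\gamma = 1.5$ are made-up values.

```python
def collection_cardinality(collection):
    """|zeta| as in (3): the size of the union of the component samples."""
    return len(set().union(*collection))


def lexical_order(collection):
    """Sort each sample by the order on X, then sort the samples lexically."""
    return sorted(tuple(sorted(S)) for S in collection)


def width(x, breakpoints):
    """Assumed model of |f(x)|: distance to the nearest sign change of h.

    `breakpoints` must be a non-empty list of interior points of X.
    """
    return min(abs(x - a) for a in breakpoints)


def theta(x, breakpoints, gamma):
    """theta_f^gamma(x) = I(f(x) > gamma), as in (12)."""
    return 1 if width(x, breakpoints) > gamma else 0


def dichotomy(collection, breakpoints, gamma):
    """One vector of (11): entry j is 1 iff every point of sample j is wide."""
    return [min(theta(x, breakpoints, gamma) for x in S) for S in collection]


# The collection from the example in the text (m = 7, N = 3, ell = 4).
zeta = [{2, 8, 9, 10}, {2, 5, 8, 9}, {3, 8, 10, 13}]
ordered = lexical_order(zeta)
vector = dichotomy(ordered, breakpoints=[6.0], gamma=1.5)
```

With the single assumed sign change at $6.0$, only the point $5$ has width at most $1.5$, so the first ordered sample is not wide while the other two are.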
For any sample $S^{(i)}$ of cardinality $|S^{(i)}| \ge 1$ let
$$e_{S^{(i)}}(f) = \prod_{j=1}^{|S^{(i)}|} \theta_f(S^{(i)}_j).$$
Then for $\zeta_N$ we denote by
$$v_{\zeta_N}(f) \equiv [e_{S^{(1)}}(f),\dots,e_{S^{(N)}}(f)]$$
where for brevity we sometimes write $v(f)$. Let $V_{\mathcal{F}_+}(\zeta_N) = \{v_{\zeta_N}(f) : f \in \mathcal{F}_+\}$, or simply $V(\zeta_N)$. Then from (11) we have
$$\Gamma_{\mathcal{F}'_\gamma}(\zeta_N) = |V_{\mathcal{F}_+}(\zeta_N)|. \tag{13}$$
Denote by $X'$ the union
$$\bigcup_{j=1}^{N} S^{(j)} = X' = \{x_i\}_{i=1}^{m} \subset X \tag{14}$$
and take the elements to be ordered as $x_i < x_{i+1}$, $1 \le i \le m-1$. The dependence of $X'$ on $\zeta_N$ is left implicit. We will need the following procedure, which maps $\zeta_N$ to a generalized collection.

Procedure G: Given $\zeta_N$, construct $\zeta_{\hat{N}}$ as follows: Let $\hat{S}^{(1)} = S^{(1)}$. For any $2 \le i \le N$, let
$$\hat{S}^{(i)} = S^{(i)} \setminus \bigcup_{k=1}^{i-1} \hat{S}^{(k)}.$$
Let $\hat{N}$ be the number of non-empty sets $\hat{S}^{(i)}$.

Note that $\hat{N}$ may be smaller than $N$ since there may be an element of $\zeta_N$ which is contained in the union of other elements of $\zeta_N$. It is easy to verify by induction that the sets of $\zeta_{\hat{N}}$ are mutually exclusive and their union equals that of the original sets in $\zeta_N$. We have the following:

Claim 1. $|V_{\mathcal{F}_+}(\zeta_N)| \le |V_{\mathcal{F}_+}(G(\zeta_N))|$.

Proof: We make repeated use of the following: let $A, B \subset X'$ be two non-empty sets and let $C = B \setminus A$. Then for any $f$ and any $b \in \{0,1\}$, if $[e_A(f), e_B(f)] = [b,0]$, then $[e_A(f), e_C(f)]$ may be either $[b,0]$ or $[b,1]$, since the elements in $B$ which caused the product $e_B(f)$ to be zero may or may not also be in $C$. In the other case, if $[e_A(f), e_B(f)] = [b,1]$ then $[e_A(f), e_C(f)] = [b,1]$. Hence
$$|\{[e_A(f), e_B(f)] : f \in \mathcal{F}_+\}| \le |\{[e_A(f), e_C(f)] : f \in \mathcal{F}_+\}|.$$
The same argument holds also for multiple $A_1,\dots,A_k$, $B$ and $C = B \setminus \bigcup_{i=1}^{k} A_i$. Let $\zeta_{\hat{N}} = G(\zeta_N)$. We now apply this to the following:
$$\begin{aligned}
&\big|\big\{[e_{S^{(1)}}(f), e_{S^{(2)}}(f), e_{S^{(3)}}(f),\dots,e_{S^{(N)}}(f)] : f \in \mathcal{F}_+\big\}\big| \\
&\quad= \big|\big\{[e_{\hat{S}^{(1)}}(f), e_{S^{(2)}}(f), e_{S^{(3)}}(f),\dots,e_{S^{(N)}}(f)] : f \in \mathcal{F}_+\big\}\big| && (15)\\
&\quad\le \big|\big\{[e_{\hat{S}^{(1)}}(f), e_{\hat{S}^{(2)}}(f), e_{S^{(3)}}(f),\dots,e_{S^{(N)}}(f)] : f \in \mathcal{F}_+\big\}\big| && (16)\\
&\quad\le \big|\big\{[e_{\hat{S}^{(1)}}(f), e_{\hat{S}^{(2)}}(f), e_{\hat{S}^{(3)}}(f), e_{S^{(4)}}(f),\dots,e_{S^{(N)}}(f)] : f \in \mathcal{F}_+\big\}\big| && (17)\\
&\quad\le \cdots \le \big|\big\{[e_{\hat{S}^{(1)}}(f), e_{\hat{S}^{(2)}}(f), e_{\hat{S}^{(3)}}(f), e_{\hat{S}^{(4)}}(f),\dots,e_{\hat{S}^{(N)}}(f)] : f \in \mathcal{F}_+\big\}\big| && (18)
\end{aligned}$$
where (15) follows since using $G$ we have $\hat{S}^{(1)} \equiv S^{(1)}$, (16) follows by applying the above with $A = \hat{S}^{(1)}$, $B = S^{(2)}$ and $C = \hat{S}^{(2)}$, and (17) follows by letting $A_1 = \hat{S}^{(1)}$, $A_2 = \hat{S}^{(2)}$, $B = S^{(3)}$ and $C = \hat{S}^{(3)}$. Finally, removing those sets $\hat{S}^{(i)}$ which are possibly empty leaves $\hat{N}$-dimensional vectors consisting only of the non-empty sets, so (18) becomes
$$\big|\big\{[e_{\hat{S}^{(1)}}(f),\dots,e_{\hat{S}^{(\hat{N})}}(f)] : f \in \mathcal{F}_+\big\}\big|. \qquad \Box$$

Hence (11) is bounded from above as
$$\Gamma_{\mathcal{F}'_\gamma}(\zeta_N) \le |V_{\mathcal{F}_+}(G(\zeta_N))|. \tag{19}$$
Denote by $N^* \equiv m - \ell + 1$ and define the following procedure, which maps a generalized collection of sets in $X$ to another.

Procedure Q: Given a generalized collection $\zeta_N = \{S^{(i)}\}_{i=1}^{N}$, $S^{(i)} \subset X$, construct $\zeta_{N^*}$ as follows: let $Y = \bigcup_{i=2}^{N} S^{(i)}$ and let the elements in $Y$ be ordered according to their ordering on $X'$ (we will refer to them as $y_1, y_2, \dots$). Let $S^{*(1)} = S^{(1)}$. For $2 \le i \le m - \ell + 1$, let $S^{*(i)} = \{y_{i-1}\}$.

We now have the following:

Claim 2. For any $\zeta_N \subset \mathcal{S}_\ell$ with $|\zeta_N| = m$, $|V_{\mathcal{F}_+}(G(\zeta_N))| \le |V_{\mathcal{F}_+}(Q(G(\zeta_N)))|$.

Proof: Let $\zeta_{\tilde{N}} \equiv Q(G(\zeta_N))$ and, as before, $\zeta_{\hat{N}} = G(\zeta_N)$. Note that by definition of Procedure Q, it follows that $\zeta_{\tilde{N}}$ consists of $\tilde{N} = N^*$ non-overlapping sets, the first, $\tilde{S}^{(1)}$, having cardinality $\ell$ and $\tilde{S}^{(i)}$, $2 \le i \le \tilde{N}$, each having a single distinct element of $X'$. Their union satisfies $\bigcup_{i=1}^{\tilde{N}} \tilde{S}^{(i)} = X'$.
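Procedures G and Q, as described above, can be sketched on the earlier example collection. The function names `procedure_G` and `procedure_Q` are hypothetical, and samples are represented as sorted lists of numbers purely for illustration.

```python
def procedure_G(collection):
    """Procedure G: remove from each sample the points already covered earlier.

    Empty results are dropped, so the output may contain fewer sets than the
    input; the outputs are pairwise disjoint and have the same union.
    """
    covered = set()
    result = []
    for S in collection:
        S_hat = set(S) - covered
        if S_hat:
            result.append(sorted(S_hat))
        covered |= S_hat
    return result


def procedure_Q(collection):
    """Procedure Q: keep the first set, split all the rest into singletons."""
    first = sorted(collection[0])
    Y = sorted(set().union(*collection[1:]))  # ordered as on X'
    return [first] + [[y] for y in Y]


# Ordered version of the example collection (m = 7, ell = 4).
ordered = [(2, 5, 8, 9), (2, 8, 9, 10), (3, 8, 10, 13)]
zeta_hat = procedure_G(ordered)
zeta_tilde = procedure_Q(zeta_hat)
```

On this example G leaves three disjoint sets and Q turns them into one set of cardinality $\ell = 4$ plus three singletons, i.e., $N^* = m - \ell + 1 = 4$ sets whose union is $X'$.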
Consider the sets $V_{\mathcal{F}_+}(\zeta_{\hat{N}})$, $V_{\mathcal{F}_+}(\zeta_{\tilde{N}})$ and denote them simply by $\hat{V}$ and $\tilde{V}$. For any $\hat{v} \in \hat{V}$ consider the following subset of $\mathcal{F}_+$,
$$B(\hat{v}) = \{f \in \mathcal{F}_+ : \hat{v}(f) = \hat{v}\}.$$
We consider two types of $\hat{v} \in \hat{V}$. The first does not have the following property: there exist functions $f_\alpha, f_\beta \in B(\hat{v})$ with $\theta^\gamma_{f_\alpha}(x) \ne \theta^\gamma_{f_\beta}(x)$ for at least one element $x \in X'$. Denote by
$$\theta^\gamma_f \equiv [\theta^\gamma_f(x_1),\dots,\theta^\gamma_f(x_m)].$$
Then in this case all $f \in B(\hat{v})$ have the same $\theta^\gamma_f = \hat{\theta}$, where $\hat{\theta} \in \{0,1\}^m$. This implies that
$$e_{\tilde{S}^{(1)}}(f) = e_{\hat{S}^{(1)}}(f) = \hat{v}_1$$
while for $2 \le j \le \tilde{N}$ we have
$$e_{\tilde{S}^{(j)}}(f) = \hat{\theta}_{k(j)}$$
where $k: [N^*] \to [m]$ maps from the index of a (singleton) set $\tilde{S}^{(j)}$ to the index of an element of $X'$ and $\hat{\theta}_{k(j)}$ denotes the $k(j)$th component of $\hat{\theta}$. Hence it follows that
$$|V_{B(\hat{v})}(\zeta_{\tilde{N}})| = |V_{B(\hat{v})}(\zeta_{\hat{N}})|.$$
Let the second type of $\hat{v}$ satisfy the complement condition, namely, there exist functions $f_\alpha, f_\beta \in B(\hat{v})$ with $\theta^\gamma_{f_\alpha}(x) \ne \theta^\gamma_{f_\beta}(x)$ for at least one point $x \in X'$. If such an $x$ is an element of $\hat{S}^{(1)}$ then the first part of the argument above holds and we still have
$$|V_{B(\hat{v})}(\zeta_{\tilde{N}})| = |V_{B(\hat{v})}(\zeta_{\hat{N}})|.$$
If however there is also such an $x$ in some set $\hat{S}^{(j)}$, $2 \le j \le \hat{N}$, then, since the sets $\tilde{S}^{(i)}$, $2 \le i \le \tilde{N}$, are singletons, there exists some $\tilde{S}^{(i)} \subseteq \hat{S}^{(j)}$ with
$$e_{\tilde{S}^{(i)}}(f_\alpha) \ne e_{\tilde{S}^{(i)}}(f_\beta).$$
Hence for this second type of $\hat{v}$ we have
$$|V_{B(\hat{v})}(\zeta_{\tilde{N}})| \ge |V_{B(\hat{v})}(\zeta_{\hat{N}})|. \tag{20}$$
Combining the above, (20) holds for any $\hat{v} \in \hat{V}$. Now, consider any two distinct $\hat{v}_\alpha, \hat{v}_\beta \in \hat{V}$. Clearly, $B(\hat{v}_\alpha) \cap B(\hat{v}_\beta) = \emptyset$ since every $f$ has a unique $\hat{v}(f)$.
Moreover, for any $f_a \in B(\hat{v}_\alpha)$ and $f_b \in B(\hat{v}_\beta)$ we have $\tilde{v}(f_a) \ne \tilde{v}(f_b)$ for the following reason: there must exist some set $\hat{S}^{(i)}$ and a point $x \in \hat{S}^{(i)}$ such that $\theta^\gamma_{f_a}(x) \ne \theta^\gamma_{f_b}(x)$ (since $\hat{v}_\alpha \ne \hat{v}_\beta$). If $i = 1$ then they must differ on $\tilde{S}^{(1)}$, i.e., $e_{\tilde{S}^{(1)}}(f_a) \ne e_{\tilde{S}^{(1)}}(f_b)$. If $2 \le i \le \hat{N}$, then such an $x$ is in some set $\tilde{S}^{(j)} \subseteq \hat{S}^{(i)}$ where $2 \le j \le \tilde{N}$ and therefore $e_{\tilde{S}^{(j)}}(f_a) \ne e_{\tilde{S}^{(j)}}(f_b)$. Hence no two distinct $\hat{v}_\alpha, \hat{v}_\beta$ map to the same $\tilde{v}$. We therefore have
$$\begin{aligned}
|V_{\mathcal{F}_+}(\zeta_{\hat{N}})| &= \sum_{\hat{v} \in \hat{V}} |V_{B(\hat{v})}(\zeta_{\hat{N}})| \\
&\le \sum_{\hat{v} \in \hat{V}} |V_{B(\hat{v})}(\zeta_{\tilde{N}})| && (21)\\
&= |V_{\mathcal{F}_+}(\zeta_{\tilde{N}})|
\end{aligned}$$
where (21) follows from (20), which proves the claim. $\Box$

Note that by construction of Procedure Q, the dimensionality of the elements of $V_{\mathcal{F}_+}(Q(G(\zeta_N)))$ is $N^*$, i.e., $m - \ell + 1$, which holds for any $\zeta_N$ (even maximally overlapping) and $X'$ as defined in (9) and (14). Let us denote by $\zeta_{N^*}$ any set obtained by applying Procedure G on any collection $\zeta_N$ followed by Procedure Q, i.e.,
$$\zeta_{N^*} \equiv \big\{S^{*(1)}, S^{*(2)},\dots,S^{*(N^*)}\big\}$$
with a set $S^{*(1)} \subset X'$ of cardinality $\ell$ and $S^{*(k)} = \{x_{i_k}\}$, where $x_{i_k} \in X' \setminus S^{*(1)}$, $k = 2,\dots,N^*$. Hence we have
$$\begin{aligned}
\max_{\zeta_N \subset \mathcal{S}_\ell,\ |\zeta_N| = m} \Gamma_{\mathcal{F}'_\gamma}(\zeta_N) &\le \max_{\zeta_N \subset \mathcal{S}_\ell,\ |\zeta_N| = m} |V_{\mathcal{F}_+}(Q(G(\zeta_N)))| && (22)\\
&\le \max_{\zeta_{N^*} : |\zeta_{N^*}| = m} |V_{\mathcal{F}_+}(\zeta_{N^*})| && (23)
\end{aligned}$$
where (22) follows from (11), (13) and Claims 1 and 2, while (23) follows by definition of $\zeta_{N^*}$. Now,
$$\begin{aligned}
|V_{\mathcal{F}_+}(\zeta_{N^*})| &= \big|\big\{[e_{S^{*(1)}}(f),\dots,e_{S^{*(N^*)}}(f)] : f \in \mathcal{F}_+\big\}\big| \\
&\le 2\,\big|\big\{[e_{S^{*(2)}}(f),\dots,e_{S^{*(N^*)}}(f)] : f \in \mathcal{F}_+\big\}\big| && (24)
\end{aligned}$$
where (24) follows trivially since $e_{S^{*(1)}}(f)$ is binary. So from (23) we have
$$\begin{aligned}
\max_{\zeta_N \subset \mathcal{S}_\ell,\ |\zeta_N| = m} \Gamma_{\mathcal{F}'_\gamma}(\zeta_N) &\le 2 \max_{\zeta_{N^*} : |\zeta_{N^*}| = m} \big|\big\{[e_{S^{*(2)}}(f),\dots,e_{S^{*(N^*)}}(f)] : f \in \mathcal{F}_+\big\}\big| \\
&\le 2 \max_{x_1,\dots,x_{m-\ell} \in X} \big|\big\{[\theta^\gamma_f(x_1),\dots,\theta^\gamma_f(x_{m-\ell})] : f \in \mathcal{F}_+\big\}\big| && (25)
\end{aligned}$$
where $x_1,\dots,x_{m-\ell}$ run over any $m - \ell$ points in $X$. Define the following infinite class of binary functions on $X$ by
$$\Theta^\gamma_{\mathcal{F}_+} = \{\theta^\gamma_f(x) : f \in \mathcal{F}_+\}$$
and for any finite subset $X'' = \{x_1,\dots,x_{m-\ell}\} \subset X$ let
$$\theta^\gamma_f(X'') = [\theta^\gamma_f(x_1),\dots,\theta^\gamma_f(x_{m-\ell})]$$
and
$$\Theta^\gamma_{\mathcal{F}_+}(X'') = \{\theta^\gamma_f(X'') : f \in \mathcal{F}_+\}.$$
We proceed to bound $|\Theta^\gamma_{\mathcal{F}_+}(X'')|$. The class $\Theta^\gamma_{\mathcal{F}_+}$ is in one-to-one correspondence with a class $\mathcal{C}^\gamma_{\mathcal{F}_+}$ of sets $C_f \subset X$ which are defined as
$$C_f = \{x : \theta^\gamma_f(x) = 1\}, \quad f \in \mathcal{F}_+.$$
We claim that any such set $C_f$ equals the union of at most $K \equiv \lfloor B/(2\gamma)\rfloor$ intervals. To see this, note that based on the general form of $f \in \mathcal{F}_+$ (see (6) and (7)), in order for $f(x) > \gamma$ to hold for every $x$ in an interval set $I \subset X$, $I$ must be contained in an interval set of the form (5) and of length at least $2\gamma$. Hence for any $f \in \mathcal{F}_+$ the corresponding set $C_f$ is comprised of no more than $K$ distinct intervals such as $I$. Hence the class $\mathcal{C}^\gamma_{\mathcal{F}_+}$ is a subset of the class $\mathcal{C}_K$ of all sets that are comprised of the union of at most $K$ interval subsets of $X$.

A class $\mathcal{H}$ is said to shatter a set $A$ of cardinality $k$ if $|\{h_{|A} : h \in \mathcal{H}\}| = 2^k$. The Vapnik-Chervonenkis dimension of $\mathcal{H}$, denoted $VC(\mathcal{H})$, is defined as the cardinality of the largest set shattered by $\mathcal{H}$. It is easy to show that the VC-dimension of $\mathcal{C}_K$ is $VC(\mathcal{C}_K) = 2K$. Hence it follows from the Sauer-Shelah lemma (see [?]) that the growth of $\mathcal{C}^\gamma_{\mathcal{F}_+}$ on any finite set $X'' \subset X$ of cardinality $m - \ell$ (see (2)) satisfies
$$\Gamma_{\mathcal{C}^\gamma_{\mathcal{F}_+}}(X'') \le \sum_{i=0}^{2K}\binom{m-\ell}{i}.$$
Since $|\Theta^\gamma_{\mathcal{F}_+}(X'')| = \Gamma_{\mathcal{C}^\gamma_{\mathcal{F}_+}}(X'')$, from (8) and (25) it follows that
$$\Gamma_{\mathcal{H}'_\gamma}(m) \le 2\sum_{i=0}^{2\lfloor B/(2\gamma)\rfloor}\binom{m-\ell}{i}$$
which proves the statement of the theorem.
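The Sauer-Shelah step can be checked by brute force on small cases. The sketch below counts the dichotomies that the larger class $\mathcal{C}_K$ (all unions of at most $K$ intervals, with no length constraint) induces on $n$ ordered points, and compares the count with $\sum_{i=0}^{2K}\binom{n}{i}$. It relies on the observation, assumed here rather than stated in the text, that a 0/1 labelling of ordered points is realizable by such a union exactly when it has at most $K$ maximal runs of ones; the function names are hypothetical.

```python
from itertools import product
from math import comb


def runs_of_ones(bits):
    """Number of maximal runs of consecutive ones in a 0/1 sequence."""
    runs, prev = 0, 0
    for b in bits:
        if b == 1 and prev == 0:
            runs += 1
        prev = b
    return runs


def count_dichotomies(n, K):
    """Labellings of n ordered points realizable by a union of at most K intervals."""
    return sum(1 for bits in product((0, 1), repeat=n) if runs_of_ones(bits) <= K)


def sauer_sum(n, d):
    """Sauer-Shelah bound sum_{i=0}^{d} C(n, i) for a class of VC-dimension d."""
    return sum(comb(n, i) for i in range(d + 1))


# Small illustrative case: n = m - ell = 6 points and K = 2 intervals.
counted = count_dichotomies(6, 2)
bound = sauer_sum(6, 2 * 2)
```

For $\mathcal{C}_K$ the count meets the bound in this case; the class $\mathcal{C}^\gamma_{\mathcal{F}_+}$ used in the proof is a subset of $\mathcal{C}_K$, so its count can only be smaller.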