Near-Optimal Bounds for Parameterized Euclidean k-means


Vincent Cohen-Addad (Google Research, cohenaddad@google.com), Karthik C. S.∗ (Rutgers University, karthik.cs@rutgers.edu), David Saulpic (Université Paris Cité, CNRS, david.saulpic@irif.fr), Chris Schwiegelshohn† (Aarhus University, schwiegelshohn@cs.au.dk)

Abstract

The $k$-means problem is a classic objective for modeling clustering in a metric space. Given a set of points in a metric space, the goal is to find $k$ representative points so as to minimize the sum of the squared distances from each point to its closest representative. In this work, we study the approximability of $k$-means in Euclidean spaces parameterized by the number of clusters, $k$. In seminal works, de la Vega, Karpinski, Kenyon, and Rabani [STOC'03] and Kumar, Sabharwal, and Sen [JACM'10] showed how to obtain a $(1+\varepsilon)$-approximation for high-dimensional Euclidean $k$-means in time $2^{(k/\varepsilon)^{O(1)}} \cdot d n^{O(1)}$.

In this work, we introduce a new fine-grained hypothesis called the Exponential Time for Expanders Hypothesis (XXH), which roughly asserts that there are no non-trivial exponential-time approximation algorithms for the vertex cover problem on near-perfect vertex expanders. Assuming XXH, we close the above long line of work on approximating Euclidean $k$-means by showing that there is no $2^{(k/\varepsilon)^{1-o(1)}} \cdot n^{O(1)}$-time algorithm achieving a $(1+\varepsilon)$-approximation for $k$-means in Euclidean space. This lower bound is tight, as it matches the algorithm given by Feldman, Monemizadeh, and Sohler [SoCG'07], whose runtime is $2^{\tilde{O}(k/\varepsilon)} + O(ndk)$. Furthermore, assuming XXH, we show that the seminal $O(n^{kd+1})$-runtime exact algorithm of Inaba, Katoh, and Imai [SoCG'94] for $k$-means is optimal for small values of $k$.
∗ This work was supported by the National Science Foundation under Grants CCF-2313372 and CCF-2443697, a grant from the Simons Foundation, Grant Number 825876, Awardee Thu D. Nguyen, and partially funded by the Ministry of Education and Science of Bulgaria's support for INSAIT, Sofia University "St. Kliment Ohridski" as part of the Bulgarian National Roadmap for Research Infrastructure.
† This work was partially supported by the Independent Research Fund Denmark (DFF) under a Sapere Aude Research Leader grant No. 1051-00106B and by a Google Research Award.

1 Introduction

The $k$-clustering problem represents a fundamental task in data mining and machine learning, providing a model for grouping data points based on similarity. Given a set of points $P$ in a metric space $(X, \Delta)$, the objective is to select a set $C \subseteq X$ of $k$ points, referred to as centers, so as to minimize an objective function typically defined as the sum of the $z$-th powers of the distances from each point $p \in P$ to its nearest center in $C$. This general formulation encompasses several widely studied problems: $k$-median corresponds to $z = 1$, $k$-means uses $z = 2$ (minimizing the sum of squared distances), and $k$-center arises as $z \to \infty$ (minimizing the maximum distance). The algorithmic exploration of $k$-means, arguably the most popular variant, gained significant traction with the seminal work of Lloyd [Llo82]. Since then, the problem has attracted substantial attention across diverse research communities, including operations research, machine learning, and theoretical computer science. The computational complexity of the $k$-means problem inherently depends on the structure of the underlying metric space $(X, \Delta)$, and this paper focuses on the setting where points reside in Euclidean space $\mathbb{R}^d$.
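The $(k, z)$-clustering objective just described is simple to state in code. The sketch below is purely illustrative (the function name is ours, not from the paper): it computes the cost of a candidate center set for a point set, with $z = 2$ recovering the $k$-means objective and $z = 1$ the $k$-median objective.

```python
import math

def clustering_cost(points, centers, z=2):
    """Sum over points of (distance to nearest center)^z.
    z=1 gives k-median, z=2 gives k-means; z -> infinity approaches k-center."""
    total = 0.0
    for p in points:
        nearest = min(math.dist(p, c) for c in centers)  # Euclidean distance
        total += nearest ** z
    return total

# Tiny example: two well-separated groups in the plane, one center per group.
P = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
C = [(0.0, 0.5), (10.0, 0.5)]
print(clustering_cost(P, C, z=2))  # k-means cost: 4 * 0.5^2 = 1.0
```

Each point sits at distance 0.5 from its nearest center, so the $k$-means cost is $4 \cdot 0.25 = 1$ while the $k$-median cost would be $4 \cdot 0.5 = 2$.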
From a complexity-theoretic perspective, Euclidean $k$-means is known to be NP-hard even under seemingly restrictive conditions, such as when the points lie in the Euclidean plane ($\mathbb{R}^2$) but $k$ is part of the input [MS84, MNV12], or when $k = 2$ but the dimension $d$ is large [DF09, AKP24]. On the algorithmic side, the best-known exact algorithm for Euclidean $k$-means was proposed by Inaba, Katoh, and Imai [IKI94], running in time $O(n^{kd+1})$. Remarkably, this has remained the state-of-the-art exact algorithm for over three decades.

Given the hardness of finding exact solutions, particularly in high dimensions, a significant line of research has focused on developing efficient $(1+\varepsilon)$-approximation algorithms. Early breakthroughs by Fernandez de la Vega, Karpinski, Kenyon, and Rabani [FdlVKKR03] and Kumar, Sabharwal, and Sen [KSS10] demonstrated that a $(1+\varepsilon)$-approximation for high-dimensional Euclidean $k$-means could be achieved in time $2^{(k/\varepsilon)^{O(1)}} \cdot d n^{O(1)}$. Subsequent improvements, leveraging techniques such as coresets by Feldman, Monemizadeh, and Sohler [FMS07], refined the runtime to $O(ndk) + 2^{\tilde{O}(k/\varepsilon)}$. Further work by Jaiswal, Kumar, and Sen [JKS14], followed by Jaiswal, Kumar, and Yadav [JKY15], utilized a simpler approach based on $D^2$-sampling to achieve an $O(nd \cdot 2^{\tilde{O}(k/\varepsilon)})$ runtime (see also [ABB+23, BGI25] for a generalized approach).

All these different approaches seem to have hit a running-time barrier at $2^{k/\varepsilon}$ (up to logarithmic factors in the exponent). This contrasts sharply with the related $k$-center problem, where Agarwal and Procopiuc [AP02] showed that a $(1+\varepsilon)$-approximation can be obtained much faster, in time $n \log k + (k/\varepsilon)^{O(k^{1-1/d})}$. This runtime for $k$-center is known to be essentially optimal under the Exponential Time Hypothesis (ETH) [CS22].
Furthermore, there exists a close relationship between $k$-means and the Partial Vertex Cover (PVC) problem; indeed, many known hard instances for $k$-means are derived from hard PVC instances [ACKS15, LSW17, CK19]. Interestingly, Manurangsi [Man19] demonstrated that PVC admits a $(1+\varepsilon)$-approximation in time $\varepsilon^{-O(k)} n^{O(1)}$ (where $k$ is the solution size). This implies that the specific $k$-means instances derived from PVC can be approximated more efficiently than the runtime of the current best $(1+\varepsilon)$-approximation algorithms for Euclidean $k$-means.

This discrepancy motivates the central question of our work: when parameterized by the number of clusters $k$, is the current exponential dependency on $k/\varepsilon$ for approximating Euclidean $k$-means inherent, signifying a fundamental computational gap compared to $k$-center and PVC-related instances? Or is it possible to devise a significantly faster approximation algorithm, potentially achieving a runtime closer to those known for related problems and offering a more unified algorithmic picture for $k$-clustering? More concretely:

Question 1.1. Is it possible to design a $(1+\varepsilon)$-approximation to $k$-means running in time $O(nd) + \varepsilon^{-O(k)}$?

A negative answer to the above question would also imply progress toward understanding the hardness of exact algorithms. Currently, we are not aware of any fine-grained lower bound that matches the algorithm of Inaba et al. [IKI94]. The closest result we know of is by Cohen-Addad, de Mesmay, Rotenberg, and Roytman [CdMRR18], who showed that if the centers must be selected from a prescribed set of "candidate centers", then no exact algorithm with a runtime of $n^{o(k)}$ exists for $k$-median or $k$-means, even when the dimension is as low as four.
It remains an open problem whether a similar result holds for the classic version where centers can be placed anywhere in $\mathbb{R}^d$, and whether an $n^{o(d)}$ lower bound also applies when $k$ is constant. A negative answer to Question 1.1 would answer both questions: modern dimension reduction [MMR19] and coreset computation [CSS21] reduce the dimension and the number of distinct input points while preserving the clustering cost within a $(1 \pm \varepsilon)$ factor. Thus, as we show formally in Section 8, any lower bound for approximation algorithms translates into a lower bound for exact algorithms, making progress toward demonstrating the optimality of the algorithm from [IKI94].

Negative answers to such questions typically arise from fine-grained complexity assumptions, such as the Exponential Time Hypothesis (ETH) [IP01, IPZ01], or the more modern Gap Exponential Time Hypothesis (Gap-ETH) [MR17, Din16]. Indeed, these two assumptions have been very fruitful in explaining the intractability of various important geometric optimization problems [Mar08, LMS11, dBBK+20, KNW21]. However, Question 1.1 involves two parameters, $k$ and $\varepsilon$. Suppose we aim to rule out $2^{(k/\varepsilon)^{1-o(1)}} \cdot \mathrm{poly}(n, d)$-time algorithms; then our parameter of interest is the quantity $k/\varepsilon$, where $k$ and $\varepsilon$ are free variables constrained only by the fixed ratio $k/\varepsilon$. To the best of our knowledge, such results are not known under ETH or Gap-ETH. In fact, proving such results entails several technical challenges, which we discuss in Section 2.1.

1.1 Our Results

In this paper we make substantial progress toward answering these questions. We introduce a hypothesis morally capturing a Gap Exponential Time Hypothesis for Vertex Cover on near-perfect vertex expanders, and we call it the Exponential Time for Expanders Hypothesis (denoted XXH).
This hypothesis is formally stated in Section 4, and an informal discussion of its statement is given in Section 2.2. Assuming XXH, we are able to answer Question 1.1 in the following way:

Theorem 1.2 (Answer to Question 1.1; informal statement). Assuming XXH, for every $\beta > 0$, there is no randomized algorithm running in time $2^{(k/\varepsilon)^{1-\beta}} \cdot \mathrm{poly}(n, d)$ that can $(1+\varepsilon)$-approximate the Euclidean $k$-means problem whenever $k \gg 1/\varepsilon$.

The formal statement of the lower bound is Theorem 5.1. This result might appear surprising due to the known close connection between vertex cover and $k$-means. Specifically, a partial vertex cover can be approximated within a $(1+\varepsilon)$ factor in $\varepsilon^{-O(k)} n^{O(1)}$ time [Man19] (where $k$ is the solution size). This might lead one to expect an approximation scheme for $k$-means with similar complexity. For example, reductions such as those in Cohen-Addad et al. [CdMRR18] and Awasthi et al. [ACKS15] (see also [LSW17]) transform the vertex cover instance into a $k$-means instance by creating a point for each edge of the input graph and positioning the points in space such that pairs of edges sharing a common vertex are close to each other. The hardness proof then relies on distinguishing between two kinds of instances: (1) instances derived from graphs admitting a vertex cover of size $k$, i.e., the edge set can be partitioned into $k$ stars; in this case, the corresponding point set can be partitioned into $k$ clusters such that all points within the same cluster are close (say at distance 1, representing edges covered by the same vertex from the cover); and (2) instances derived from graphs where any set of $k$ vertices leaves at least a constant fraction (say a $\delta$ fraction) of the edges uncovered.
Consequently, in any partitioning of the corresponding point set into $k$ clusters, either a constant fraction of these clusters contain points that are far apart (say at distance 3, representing pairs of edges that do not share one of the selected $k$ vertices), or a few clusters contain a lot of points (and most pairs are far apart). The hardness for $k$-means then follows from the difficulty of distinguishing between these two cases based on the clustering cost. Specifically, it involves separating instances admitting a lower cost (e.g., $n$, associated with Case 1) from those necessitating a higher cost (e.g., $(1-\delta)n + 2\delta n = n + \delta n$, associated with Case 2), where $\delta$ is related to the minimum fraction of uncovered edges in the latter case.

However, since partial vertex cover can be approximated within a $(1+\varepsilon)$ factor, for any $\varepsilon > 0$, in time $\varepsilon^{-O(k)} n^{O(1)}$, and since the objective scales linearly with the number of edges not covered (i.e., clients that are at distance 3, instead of 1, from their center), this type of instance can be solved in time $\varepsilon^{-O(k)} n^{O(1)}$. Therefore, one cannot expect to boost the lower bound on the running time from the partial vertex cover result to $2^{(k/\varepsilon)^{1-o(1)}}$. Thus, to show that the $k$-means problem requires an exponential dependency on $\varepsilon$, we need to develop a novel reduction framework.

To establish the above conditional lower bound for the Euclidean $k$-means problem, we first reduce the vertex cover problem in the non-parameterized setting to an intermediate graph problem, essentially in the parameterized setting, but with reduced structure so as to defeat standard algorithmic techniques for partial $k$-vertex cover, while retaining enough structure to embed the graph problem into the Euclidean $k$-means problem.
We direct the reader to Section 2 for further details, where we also try to clarify how vertex expansion in the vertex cover problem helps us overcome several technical difficulties.

Theorem 1.2 implies lower bounds for exact algorithms as well:

Corollary 1.3. Assuming XXH, for every $\beta > 0$, there is no algorithm that for any $k, d$ solves the Euclidean $k$-means problem in time $n^{(k\sqrt{d})^{1-\beta}}$, nor in time $n^{d^{1-\beta}}$.

Note that this result contrasts the running time of $k$-means clustering with that of $k$-center, which admits an exact algorithm running in time $O(n^{k^{1-1/d}})$ [AP02]. The current state-of-the-art algorithm enumerating over all Voronoi partitions in time $O(n^{kd+1})$ by Inaba, Katoh, and Imai [IKI94] is thus a likely candidate for being optimal; in particular, it is almost optimal for constant $k$.

1.2 Further Related Work

Hardness of Clustering. As mentioned previously, the $k$-means and $k$-median problems are NP-hard, even when $k = 2$ (and $d$ is large) [DF09, AKP24], or when $d = 2$ (and $k$ is large) [MS84, MNV12]. When both parameters are part of the input, the problems become APX-hard [GK99, JMS02, GI03, ACKS15, CKL21, CKL22]. Most techniques to show hardness of approximation are based on reducing from covering problems to clustering problems, for instance through structured instances of max $k$-coverage or set cover. Recent works have used different approaches: [CKL21] showed how to use hardness of some coloring problems to prove hardness of approximation for $k$-median and $k$-means in general metric spaces, and [CKL22] focused on Euclidean spaces and tried to pinpoint which combinatorial structures allow for gap-preserving embeddings into Euclidean space.
For general metrics, the connection between $k$-clustering and the set cover problem (or rather max-coverage) has been known since the fundamental work of Guha and Khuller [GK99], who established the best known hardness-of-approximation bounds. This connection was observed again when analyzing the parameterized complexity of the problem: Cohen-Addad, Gupta, Kumar, Lee, and Li [CGK+19] showed how to approximate $k$-median and $k$-means up to factors $1 + 2/e$ and $1 + 8/e$, respectively, and showed that this is tight assuming Gap-ETH (see also [CL19, Man20, ABM+19, CHX+23, HXDZ22]).

FPT Algorithms via Sketching. The past decades have seen the development of very powerful sketching and compression methods that allow reducing the dimension to $O(\log k \cdot \varepsilon^{-2})$ [MMR19] and the number of distinct input points to $\tilde{O}(k \varepsilon^{-z-2})$ ($z = 1$ for $k$-median, $z = 2$ for $k$-means) via the construction of coresets [FL11, BJKW21, HV20, CSS21, CLSS22]. Perhaps surprisingly, these bounds are independent of the original input size and dimension and can be computed in near-linear time, which allows for the construction of simple FPT algorithms. Applying the $O(n^{kd+1})$ algorithm of [IKI94] indeed gives a complexity of $2^{\tilde{O}(k/\varepsilon^2)}$ plus the near-linear time to sketch the input; naively enumerating all partitions yields a running time of $k^{\tilde{O}(k/\varepsilon^{z+2})}$, plus the time to sketch the input. We crucially remark that the dependency on $\varepsilon$ cannot be substantially improved: [CLSS22] showed a lower bound of $\Omega(k \varepsilon^{-2})$ for coresets, and [LN17] showed the optimality of the dimension reduction. Therefore, one cannot hope to go below $2^{\Omega(k/\varepsilon^2)}$ and answer Question 1.1 using only these techniques.

Other parameters have been studied for $k$-means clustering: most notably, the cost has been investigated by Fomin, Golovach, and Simonov [FGS21].
They presented a $D^{D} \cdot \mathrm{poly}(nd)$ exact algorithm for $k$-median, where $D$ is an upper bound on the cost.

From Continuous to Discrete Clustering. The other standard technique to design FPT algorithms is to find a small set of candidate centers that contains a near-optimal solution. This approach was used, for instance, by [BHI02] to obtain the first algorithm running in time $2^{(k/\varepsilon)^{O(1)}} d^{O(1)} n \log^{O(k)}(n)$, by [KSS10] to improve the running time, and by [BJK18] for the capacitated clustering problem.

Approximation Algorithms in Euclidean Spaces. To compute a $(1+\varepsilon)$-approximation in time polynomial in $n$ and $k$, any algorithm must have a running time at least doubly exponential in $d$, as the problem is APX-hard in dimension $\Omega(\log n)$. The best of these algorithms is from [CFS21], with a near-linear running time of $f(\varepsilon, d) \cdot n\,\mathrm{polylog}\,n$. If one sticks to algorithms polynomial in $n$, $k$, and $d$, the lower bounds on the approximation ratio are 1.06 for $k$-median and 1.015 for $k$-means, conditioned on P ≠ NP [CKL22]. The upper bounds are still quite far: 2.41 for $k$-median, and 5.96 for $k$-means [CEMN22].

1.3 Organization of the Paper

We provide the proof overview of Theorem 1.2 in Section 2, and then in Section 3 we provide some notation, definitions, and tools useful for this paper. We then present in Section 5 our formal proof of Theorem 1.2, based on the hypothesis XXH defined in Section 4. Sections 6 and 7 contain the completeness and soundness analyses of the reduction underlying Theorem 1.2, respectively. Finally, in Section 8 we prove Corollary 1.3.

2 Our Techniques

We would now like to convey the conceptual and technical ideas that went into proving the lower bound in Theorem 1.2.
As alluded to earlier, hard instances of Euclidean $k$-means are typically constructed from the Vertex Cover problem, where every edge is mapped to a client and the partition of the edge set (which is the client set) induced by an optimal vertex cover also yields the optimal clustering for the $k$-means objective.

2.1 Motivation and Technical Background

Current Understanding of the Landscape. Starting from Gap-ETH (for 3-SAT) [Din16, MR17], it is easy to show that there is no $2^{o(n)}$-time algorithm to $(1+\delta)$-approximate the Vertex Cover problem (on sparse graphs) for some small constant $\delta > 0$. By a standard reduction, this implies that Euclidean $k$-means cannot be approximated within $1+\delta$ in time $2^{o(n)}$, albeit when $k$ is linear in $n$ (the number of clients). On the other hand, starting from ETH [IP01, IPZ01], it is easy to show (for example, following the reduction in [DF09] or [AKP24]) that Euclidean 2-means cannot be solved exactly in $2^{o(n)}$ time. Thus, we can show that there is no $2^{o(k/\varepsilon)} \cdot \mathrm{poly}(n)$-time algorithm for $(1+\varepsilon)$-approximating the Euclidean $k$-means problem when either (i) $k = \Omega(n)$ and $\varepsilon = \Omega(1)$, or (ii) $k = 2$ and $\varepsilon = 1/\Omega(n)$.

Exploring Uncharted Territories. From the above discussion, we know that if $k$ were $\Omega(n)$ then we could not obtain a $2^{o(n)}$-time approximation algorithm, but what if $k$ were approximately $\sqrt{n}$? Then algorithmic techniques based on coresets yield a $(1+\varepsilon)$-approximation in time $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$. But is it possible to do better? We can try to answer this from the lower bound viewpoint. We can take the above-mentioned Gap-ETH hard instance, i.e., the setting $k = \Omega(n)$ and $\varepsilon = \Omega(1)$, and duplicate each point $n$ times to obtain a point set with $N = n^2$ points; then there is no $2^{\sqrt{N}}$-time algorithm that can approximate the objective within a constant factor. But we cannot even rule out the possibility of an exact algorithm running in time $2^{N^{0.51}}$ for this value of $k$. Ideally, we would like to be able to rule out algorithms running in time $2^{N^{0.5+\rho}}$ that provide a $(1 + \Omega(1/N^{\rho}))$-approximate solution, for every $\rho \in [0, 0.5]$. Thus, the result that we are shooting for is:

Rule out $2^{o(k/\varepsilon)} \cdot \mathrm{poly}(n)$-time algorithms when $k/\varepsilon$ is fixed; i.e., we want the lower bound to hold on the entire tradeoff curve between the number of clusters $k$ and the accuracy of clustering $\varepsilon$.

To the best of our knowledge, no such results are known in fine-grained complexity. Thus, we are motivated to develop a new framework to prove such results.

Technical Challenges. One approach is to start from the lower bound given in (ii), i.e., when $k = 2$ and $\varepsilon = 1/\Omega(n)$, and reduce it to a different instance of $k$-means where $k$ has increased (say to $\sqrt{n}$) but $\varepsilon$ has also increased to $1/\sqrt{n}$ (all this with a linear blowup in size). But this requires "gap creation", a notoriously challenging task, potentially much harder than even proving Gap-ETH from ETH! Therefore, we pursue the approach of starting from the lower bound given in (i), i.e., when $k = \Omega(n)$ and $\varepsilon = \Omega(1)$, and reducing it to a different instance of $k$-means where $k$ has decreased (say to $\sqrt{n}$) and $\varepsilon$ has also decreased to $1/\sqrt{n}$. This is the approach of trading off the gap to reduce the number of clusters.

A (Failed) Simple Approach. Let $G = ([n], E)$ be a vertex cover instance (where $|E| := m = O(n)$) which is hard to approximate within a $1+\delta$ factor (for some positive constant $\delta$) under Gap-ETH. Thus, the size of an optimal vertex cover of $G$, denoted $\alpha_G$, is $\Omega(n)$. Suppose our target $k$-means instance has $k = o(m)$ and $\varepsilon = \Omega(\delta k/\alpha_G)$: then we may simply look at the embedding where each edge $\{u, v\} \in E$ is mapped to the point (i.e., client) $e_u + e_v \in \mathbb{R}^n$, where $e_i$ is the standard basis vector which is 1 in the $i$-th coordinate and 0 everywhere else.
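To make the geometry of this embedding concrete, here is a small check of our own (not from the paper) that under the map $\{u, v\} \mapsto e_u + e_v$, two edges sharing a vertex land at squared Euclidean distance 2, while vertex-disjoint edges land at squared distance 4; this separation is what the hard instances exploit.

```python
def edge_point(edge, n):
    """Embed edge {u, v} of a graph on vertex set {0, ..., n-1} as e_u + e_v in R^n."""
    p = [0] * n
    for v in edge:
        p[v] = 1
    return p

def sq_dist(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

n = 5
shared = sq_dist(edge_point({0, 1}, n), edge_point({0, 2}, n))    # edges share vertex 0
disjoint = sq_dist(edge_point({0, 1}, n), edge_point({2, 3}, n))  # vertex-disjoint edges
print(shared, disjoint)  # 2 4
```

Points of edges covered by a common vertex thus sit strictly closer together than points of edges with no vertex in common, so a clustering induced by a vertex cover groups nearby points.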
If we were asked to cluster this client set into $\alpha_G$ clusters minimizing the $k$-means objective in the Euclidean metric, then the optimal solution would simply be the partition based on some optimal vertex cover of $G$. Thus, we could hope that even when asked to cluster the clients into $k$ parts, the optimal solution would be to first cluster the clients into $\alpha_G$ clusters (based on the vertex cover) and then merge clusters, so as to end up with only $k$ clusters in the end. However, since $k \ll \alpha_G$, we can have near-optimal cost from clusterings which do not correspond to any vertex cover of $G$. For example, a typical obstacle is the following clustering: $k - 1$ clusters each contain a single client, and one cluster contains all the remaining points. It is entirely possible that such clusterings also have low cost.

Embedding via Color Coding. To overcome the above issue (of imbalancedness) of the clustering, we introduce a color-coding-based embedding technique. Given $G$ and a target number of clusters $k$, we first color each vertex $v$ in $G$ uniformly and independently at random with a color in $[k]$ (and let the color of $v$ be denoted by $c_v$). Thus each edge (consisting of two vertices) also gets (at most) two colors. Now, consider the embedding where each edge $\{u, v\} \in E$ is mapped to the point (i.e., client) $e_u + e_v + e_{c_u} + e_{c_v} \in \mathbb{R}^{n+k}$. Consider the clustering of this client set into $k$ parts in the following way. Let $S \subset [n]$ be any optimal vertex cover of $G$ and let $S = S_1 \mathbin{\dot\cup} \cdots \mathbin{\dot\cup} S_k$ be the partition of the vertices in the vertex cover based on the coloring. Then the alleged optimal clusters would be given by the edges covered by $S_1, \ldots, S_k$; i.e., cluster $i$ would consist of all edges (i.e., the corresponding points of these edges) in $G$ covered by the vertices in $S_i$ (breaking ties arbitrarily).
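The color-coding embedding just described can be sketched in a few lines. The code below is illustrative only (the function names are ours): each vertex receives a uniformly random color in $[k]$, and each edge $\{u, v\}$ becomes the point $e_u + e_v + e_{c_u} + e_{c_v} \in \mathbb{R}^{n+k}$, with the color coordinates occupying the last $k$ dimensions.

```python
import random

def color_coded_points(n, k, edges, seed=0):
    """Map each edge {u, v} to e_u + e_v + e_{c_u} + e_{c_v} in R^{n+k},
    where c_w is a uniformly random color in {0, ..., k-1} for vertex w."""
    rng = random.Random(seed)
    color = [rng.randrange(k) for _ in range(n)]  # c_v for each vertex
    points = []
    for (u, v) in edges:
        p = [0] * (n + k)
        p[u] += 1
        p[v] += 1
        p[n + color[u]] += 1  # color coordinates live in the last k dimensions
        p[n + color[v]] += 1  # if c_u == c_v, this coordinate becomes 2
        points.append(p)
    return color, points

# Tiny example: a triangle on 3 vertices, k = 2 colors.
color, pts = color_coded_points(3, 2, [(0, 1), (1, 2), (0, 2)])
print(len(pts), len(pts[0]))  # 3 points in R^{3+2}
```

Every embedded point has total coordinate mass 4 (two vertex coordinates and two color contributions), so two points in a cluster are close exactly when their edges share both a vertex and a color.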
The embedding of the colors of the edges forces the optimal clusters to have a dominating color, and since the colors are uniformly spread, the obstacle mentioned in the above approach (without coloring) would yield a high clustering cost and thus can now be ruled out.

That said, there is a more serious obstacle that is not yet addressed: when we merge clusters in the completeness and soundness cases, the graph topology affects the $k$-means cost; we elaborate on this next.

Need for Vertex Expansion. Consider two graphs $G_1$ and $G_2$, both on $n$ vertices and both $d$-regular. Suppose $G_1$ looks like a random $d$-regular graph, and thus is a very good vertex expander: for every subset $S$ of vertices of size $O(n/d)$, the number of unique neighbors is about $|S| \cdot (d - O(1))$ [Vad12]. On the other hand, suppose that $G_2$ is obtained by first taking $n/(d-3)$ disjoint copies of $K_{d-3,d-3}$ and then adding a random 3-regular graph to connect these copies. We encounter the following problem when we merge clusters as described above: $G_1$ behaves in the way we expect it to, whereas $G_2$ has a very low cost even if it does not have a good vertex cover. This is because if we take the $(d-3)$-sized independent set in a $K_{d-3,d-3}$, then we do get a very good vertex cover of $K_{d-3,d-3}$; but in addition, if all (or even a large fraction of) the edges of $K_{d-3,d-3}$ were in the same cluster, then each edge has $2d - 2$ other edges adjacent to it in the cluster. On the other hand, if we take the edges of an independent set in $G_1$, then a typical edge has $d - 1$ other edges incident to it. This makes the analysis of completeness and soundness impossible without knowing more about the graph topology.

2.2 New Hypothesis: Exponential Time for Expanders Hypothesis

Exponential Time for Expanders Hypothesis (XXH).
To remedy the situation, we introduce a working hypothesis stating that the gap vertex cover problem cannot be solved in time $2^{n^{1-o(1)}}$ on random graphs. For the sake of keeping the proofs clean (to the extent possible), we state the hypothesis in Section 4 in terms of vertex expanders, which makes it directly usable. Under this hypothesis, we use the color-coding embedding described earlier and, with a lot of technical effort, are able to derive the conclusion given in Theorem 1.2.

Informally, XXH asserts that for some constants $\zeta, \delta$ with $\zeta \ll \delta$, no randomized algorithm running in $2^{n^{1-o(1)}}$ time can take as input a $d$-regular vertex expander $G := ([n], E)$ and distinguish between the following:

Completeness: There exist $n/2$ vertices that cover at least a $(1-\zeta)$ fraction of $E$.

Soundness: Every subset of $V$ of size $n/2$ leaves at least a $\delta$ fraction of the edges uncovered.

Plausibility of XXH. A key observation connecting random graphs and expander graphs is that random $d$-regular graphs are known to be very strong vertex expanders with constant probability (see Theorem 4.2 in [Vad12]). Consequently, making progress on understanding the inapproximability of the Vertex Cover problem on vertex expanders is closely related to the hardness of the vertex cover problem on random graphs. XXH formalizes this hardness on vertex expanders and, via our main reduction (Theorem 5.1), links it directly to the Euclidean $k$-means problem.

This connection implies we live in one of three possible worlds. The first is one where XXH is false and Euclidean $k$-means admits a much faster algorithm (such as one running in $2^{(k/\varepsilon)^{0.99}} \cdot \mathrm{poly}(n)$ time); this would yield new algorithms for computing Vertex Cover on random-like graphs. The second is one where XXH is false but there are no new faster algorithms for Euclidean $k$-means; this would force a deeper computational understanding of vertex expanders themselves.
The third world is one where XXH is true, and thus the current Euclidean $k$-means algorithms are nearly optimal. The reader is directed to Section 4.2 for more discussion of these three possibilities.

Investigating the truth of XXH (the third world) leads to natural questions of independent interest. For instance, Ramanujan graphs are known to be near-perfect vertex expanders for small sets [MM21], so we ask: does the Vertex Cover problem admit a PTAS when the input graph is a Ramanujan graph? Progress on this question would directly help us better understand XXH.

As partial evidence for XXH, in Corollary 4.5 we show that if we forego the expansion property, then under the Unique Games Conjecture it is possible to show from [AKS11] that no polynomial-time algorithm can take as input a $d$-regular graph $G := ([n], E)$ and distinguish between the following:

Completeness: There exist $n/2$ vertices that cover at least a $(1-\zeta)$ fraction of $E$.

Soundness: Every subset of $V$ of size $n/2$ fails to cover an $\Omega(\sqrt{\zeta})$ fraction of the edges.

Our Message. We wish to highlight two key aspects of the XXH hypothesis. First, it fundamentally acts as a fine-grained complexity assumption regarding the hardness of the Vertex Cover problem on random instances. Second, it establishes an important connection: developing better approximation algorithms for Euclidean $k$-means would provide non-trivial approaches to solving Vertex Cover on expander graphs, a problem of significant independent interest. In addition, even if only weaker versions of XXH are true (and proved in the future), this would still imply weaker lower bounds for $(1+\varepsilon)$-approximating the Euclidean $k$-means problem (see Remark 5.3 for details).

2.3 Proof Overview of Theorem 1.2

We now give an overview of the completeness and soundness cases.
Recall that we are given a $d$-regular graph $G = ([n], E)$, and we have constructed a point $e_u + e_v + e_{c_u} + e_{c_v} \in \{0,1\}^{n+k}$ for each edge $(u, v)$ whose endpoints have colors $c_u$ and $c_v$. In the completeness case, there are $n/2$ vertices covering a $(1-\zeta)$ fraction of the edges, so our clustering strategy is straightforward: first form $n/2$ clusters, each one corresponding to a vertex in the vertex cover solution, and then identify each cluster with a color by looking at the color of the common vertex in each cluster (which is a star graph). Then each color class forms a cluster, giving $k$ clusters in total, and we can show that the cost is $3|E| - (1 - 7\zeta)kd$.

However, our soundness analysis is highly non-trivial and involves several tools and arguments of different flavors. Suppose we have a clustering of $P$ whose $k$-means cost is about $3|E| - (1 - \delta^5)kd$ (where $\zeta \le \delta^5/10$). We first connect the cost of cluster $C_i$ (for $i \in [k]$) with certain properties of the graph $G_i$ in the following way:

Lemma 2.1 (Informal version of Lemma 7.1). For every cluster $C_i$, its cost is equal to
$$3|C_i| - 1 + (\gamma_i - \kappa_i)(|C_i| - 1) - \frac{1}{|C_i|} \sum_{v \in V_i} d_{i,v}^2,$$
where $d_{i,v}$ is the degree of $v$ in the graph $G_i$ (induced by the edges in $C_i$), $\gamma_i$ is the fraction of pairs of edges in $G_i$ that have no color in common, and $\kappa_i$ is the fraction of pairs of edges that have two colors in common.

Next, we show that if $\gamma_i$ (the fraction of pairs of edges with no color in common) and $\kappa_i$ (the fraction of pairs of edges with two colors in common) are not too large, then there is a single dominant color in the cluster $C_i$.

Lemma 2.2 (Informal version of Lemma 7.2). If $\kappa_i$ and $\gamma_i$ are bounded by some small constants, then a large fraction of the edges in $G_i$ have the same color.
Then, we show that we can identify a large subcollection of clusters for which both $\kappa_i$ and $\gamma_i$ are small. In addition, we assume that the cost of the clustering $C_1, \ldots, C_k$ is at most $3|E| - (1 - \delta^5)dk$. Then, for each cluster in this subcollection, we can relate a bound on the sum of the squared degrees of the vertices in the subcollection to the clustering cost appearing in Lemma 2.1.

Lemma 2.3 (Informal version of Lemma 7.3). There is some $I \subseteq [k]$ such that for all $i \in I$, it holds that $\kappa_i$ and $\gamma_i$ are small and $\sum_{i \in I} |E_i| \ge (1 - 38\delta^2) \cdot |E|$. In addition, we also have for all $i \in I$:
$$\frac{\sum_{v \in G_i} d_{i,v}^2}{\sum_{v \in G_i} d_{i,v}} \ge (1 - \delta^3)^2\, d.$$

Finally, we show that a clustering cost of $3|E| - (1 - \delta^5)dk$ implies that we can construct a set of vertices $S \subseteq [n]$ of size slightly more than $n/2$ that covers a $(1 - 10\delta^{1.5})$ fraction of the edges, contradicting the soundness assumption of XXH for small enough $\delta$. This step is quite involved, and we skip providing more details about it here.

3 Preliminaries

Notations. We consider the Euclidean space $\mathbb{R}^d$ with the $\ell_2$ metric: $\mathrm{dist}(p, q) := \sqrt{\sum_{i=1}^d (p_i - q_i)^2}$. The $k$-means cost of $P$ using a set of centers $S$ is defined as $\mathrm{cost}(P, S) := \sum_{p \in P} \mathrm{dist}(p, S)^2$.

Definition 3.1 (Continuous $k$-means). Given a set $P \subset \mathbb{R}^d$, the continuous $k$-means problem is the task of finding $k$ centers $S$ in $\mathbb{R}^d$ that minimize the cost function $\mathrm{cost}(P, S)$.

The continuous $k$-means problem has a surprising (and folklore) formula that expresses the cost of a clustering only in terms of pairwise distances between input points:

Lemma 3.2 (Folklore). For a given cluster $C \subset \mathbb{R}^d$, the optimal center is the average $\frac{\sum_{p \in C} p}{|C|}$, and the $k$-means cost is $\frac{\sum_{p_1, p_2 \in C} \|p_1 - p_2\|_2^2}{|C|}$.

Sometimes the points of $P$ are called clients. A solution is any subset of $\mathbb{R}^d$ of size $k$.
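The identity of Lemma 3.2 can be checked numerically. The sketch below (assuming the pairwise sum is over unordered pairs of points) compares the sum of squared distances to the mean against the pairwise form on a random cluster:

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.normal(size=(50, 3))          # a cluster of 50 points in R^3
mu = C.mean(axis=0)                   # optimal (continuous) center: the mean
direct = ((C - mu) ** 2).sum()        # sum of squared distances to the mean

# pairwise form: sum of squared distances over unordered pairs, divided by |C|
diffs = C[:, None, :] - C[None, :, :]
pair_sum = (diffs ** 2).sum() / 2.0   # the ordered-pair sum counts each pair twice
pairwise = pair_sum / len(C)

assert np.isclose(direct, pairwise)
```

The agreement holds for any cluster, which is what lets the paper reason about clustering cost purely through pairwise point distances.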
We say a solution is an $\alpha$-approximation if its cost is at most $\alpha$ times the minimal cost.

We use the following two Chernoff-type bounds, where the latter is suited to partially dependent variables:

Theorem 3.3 (Chernoff bounds). Suppose $X_1, \ldots, X_n$ are i.i.d. Bernoulli random variables with parameter $p$, and let $\mu = np$ be the expected value of their sum. Then the following bounds hold:

Upper tail: For any $0 < \varepsilon < 1 - p$,
$$\Pr\left[\sum_{i=1}^n X_i \ge (p + \varepsilon)n\right] \le \exp\big(-n D(p + \varepsilon \,\|\, p)\big), \quad \text{where } D(q \,\|\, p) = q \ln \frac{q}{p} + (1 - q) \ln \frac{1 - q}{1 - p}.$$

Lower tail: For any $0 < \lambda < 1$,
$$\Pr\left[\sum_{i=1}^n X_i \le (1 - \lambda)\mu\right] \le \exp\left(-\frac{\lambda^2 \mu}{2}\right).$$

4 Exponential Time for Expanders Hypothesis

In this section, we formally introduce the Exponential Time for Expanders Hypothesis, which is then used in the next section to prove conditional lower bounds for the Euclidean $k$-means problem for small $k$.

4.1 New Hypothesis: Exponential Time for Expanders Hypothesis

We first define the notion of vertex expansion relevant to this submission.

Definition 4.1 (Small Set Vertex Expanders). Given a constant $\alpha > 0$, a $d$-regular graph $G = (V, E)$ on $n$ vertices, and an integer $k := k(n)$, we say that $G$ is a $(k, \alpha)$-small set vertex expander if for every subset $S \subseteq V$ of size at most $n/k$ we have that
$$|\{u \in V : \exists \{u, v\} \in E \text{ and } v \in S\}| \ge (1 - \alpha) \cdot d \cdot |S|.$$

Now, we can define our new hypothesis.

Hypothesis 4.2 (Exponential Time for Expanders Hypothesis -- XXH$(\delta, \zeta, \alpha)$). Given constants $\delta, \zeta, \alpha \in (0, 1)$, the XXH$(\delta, \zeta, \alpha)$ assumption states that the following holds for all $\beta > 0$: No randomized algorithm running in $2^{n^{1-\beta}}$ time can, given as input a $d$-regular $(\mathrm{polylog}\, n, \alpha)$-small set vertex expander $G = (V = [n], E)$ with $d = (\log n)^L$ (for some $L > 1$), distinguish between the following with probability 0.9:

Completeness: There exist $n/2$ vertices that cover at least a $(1 - \zeta)$ fraction of $E$.
Soundness: For every $S \subseteq V$ of size $n/2$, there are at least $\delta \cdot |E|$ many edges which are not covered by any vertex in $S$.

4.2 Plausibility of XXH: Three Possible Worlds

For the sake of the discussion in this subsection, we refer to the event "Clustering Barrier is Breached" to simply denote the existence of an algorithm much faster than $2^{k/\varepsilon} \cdot \mathrm{poly}(n)$ for $(1 + \varepsilon)$-approximating the Euclidean $k$-means problem. In Section 5, we prove that assuming XXH, the clustering barrier cannot be breached. Therefore, we are living in one of three possible worlds. The first world is where XXH is false and the clustering barrier is breached. The second world is where XXH is false, but the clustering barrier cannot be breached. Finally, the third world is where XXH is true (and thus the clustering barrier cannot be breached, by Theorem 5.1).

The message we want to convey here is that only one of the above three worlds is possible, and regardless of which world is proven to be the one we live in, it will shed new light on problems of interest to the algorithmic community.

4.2.1 World I: XXH is False and the Clustering Barrier is Breached

XXH may be viewed as a fine-grained assumption for the vertex cover problem on random instances, which we elaborate on below. Thus, our reduction from XXH instances to Euclidean $k$-means instances, as given in Theorem 5.1, can be used to make oracle calls to the efficient algorithm for the Euclidean $k$-means problem (since the clustering barrier is breached in this world) to solve XXH instances efficiently (i.e., in mildly sub-exponential time). This would imply a separation in this world between worst-case and average-case instances of the gap vertex cover problem (brushing aside many details to make a succinct claim).

XXH as a Fine-Grained Assumption for Random Instances.
Theorem 4.2 in [Vad12] shows that for some large universal constant $C$, a random $d$-regular graph is a $(C, 2/d)$-small set vertex expander with probability 0.5. Thus, making progress on the inapproximability of the vertex cover problem for vertex expanders is morally similar to proving hardness of approximation for vertex cover on random $d$-regular graphs. Alternatively, in this world, we would make algorithmic progress on understanding vertex cover on random graphs through (hypothetical) clustering algorithms.

4.2.2 World II: XXH is False but the Clustering Barrier is not Breached

In this world, XXH is false, possibly because¹ the vertex cover problem is computationally easy on small set vertex expanders. For spectral expanders, it is sometimes possible to apply Hoffman's ratio bound [Hae21] to obtain non-trivial speedups when the spectral gap is large. However, vertex expanders, while intuitively similar to edge expanders (and spectral ones), are poorly understood. In fact, explicit constructions with parameters close to those obtained for random graphs were developed only recently [HLM+25]. Thus, falsifying XXH motivates a better understanding of vertex expanders from the computational viewpoint of optimization problems. Nevertheless, Theorem 11 in [AW23] shows that the exact vertex cover problem remains as hard on spectral expanders as it is on general graphs.

4.2.3 World III: XXH is True

The first evidence that XXH might be true is that the hard instances of vertex cover constructed by [DS02] are essentially built from random label cover instances, which have strong expansion properties after PCP composition. Moreover, XXH only promises expansion for small sets (sub-polynomial size sets), and thus it is unlikely that such a local structure can be algorithmically used.
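The expansion property of Definition 4.1, and the [Vad12]-style behavior of random regular graphs, can be probed empirically. The sketch below samples a random near-$d$-regular multigraph via the configuration model (stdlib only) and measures the vertex expansion of randomly sampled small sets; note that it samples sets rather than certifying all of them, so it is a heuristic probe, not a verification of the expander property:

```python
import random

random.seed(0)
n, d = 2000, 6
# configuration-model random d-regular multigraph: pair up d "stubs" per vertex
stubs = [v for v in range(n) for _ in range(d)]
random.shuffle(stubs)
adj = [set() for _ in range(n)]
for a, b in zip(stubs[::2], stubs[1::2]):
    if a != b:                       # drop self-loops; parallel edges collapse,
        adj[a].add(b)                # so degrees can dip slightly below d
        adj[b].add(a)

# probe (k, alpha)-small-set vertex expansion on random small sets S:
# the neighbourhood of S should have size close to d * |S| (i.e., alpha near 0)
k = 200                              # sets of size at most n/k = 10
worst = 1.0
for _ in range(200):
    s = random.randint(1, n // k)
    S = random.sample(range(n), s)
    N = set().union(*(adj[v] for v in S))
    worst = min(worst, len(N) / (d * s))
print("worst observed expansion ratio over sampled sets:", worst)
```

On random regular graphs the observed ratio stays close to 1 for small sets, which is the "near-perfect" vertex expansion regime that XXH is concerned with.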
¹ We are not addressing here the concern that XXH might be false in this world because of the setting of the parameters.

Also, it can be formally argued that on small sets, Ramanujan graphs are near-perfect vertex expanders [MM21]. Therefore, as a way to prove XXH, one can first ask whether the vertex cover problem admits a PTAS on Ramanujan graphs. It is possible that the answer to this question is negative, although there are no techniques to handle such questions, and thus XXH opens this new line of exploration.

4.3 Small Progress on XXH under the Unique Games Conjecture

In this subsection, we show that without the vertex expansion property, it is possible to obtain a weak version of XXH under the Unique Games Conjecture [Kho02]. Let $\Phi$ denote the cumulative distribution function of the standard normal distribution and, for any $\rho \in [-1, 1]$ and $\mu \in [0, 1]$, let $\Gamma_\rho(\mu)$ denote $\Pr[X \le \Phi^{-1}(\mu) \wedge Y \le \Phi^{-1}(\mu)]$, where $X, Y$ are normal random variables with means 0, variances 1, and covariance $\rho$. The main intermediate result of [AKS11] is the following:

Theorem 4.3 (Theorem 1 from [AKS11]). For any $q \in (0, 1/2)$ and any $\varepsilon > 0$, it is UG-hard, given a regular graph $G = (V, E)$, to distinguish between the following two cases.

• (Completeness) $G$ contains an independent set of size at least $q \cdot |V|$.

• (Soundness) For any subset $T \subseteq V$, the number of edges with both endpoints in $T$ is at least $|E| \cdot \big(\Gamma_{-q/(1-q)}(\mu) - \varepsilon\big)$, where $\mu = |T|/|V|$.

This means that, in the completeness case, there is a vertex cover of size at most $(1 - q) \cdot |V|$. On the other hand, in the soundness case, if we consider any subset $S \subseteq V$ of size at most $(1 - q) \cdot |V|$, then the number of edges not covered is exactly the number of edges with both endpoints in $V \setminus S$, which is at least $(\Gamma_{-q/(1-q)}(q) - \varepsilon) \cdot |E|$. We will evaluate this approximation factor for $q = \frac{1}{2} - \zeta$.

Claim 4.4.
Let $q := \frac{1}{2} - \zeta$ where $\zeta > 0$. Then for small enough $\zeta$, we have: $\Gamma_{-q/(1-q)}(q) \ge \frac{\sqrt{\zeta}}{3}$.

Assuming the above claim (which will be proved later), we have the following corollary from Theorem 4.3 by setting $\varepsilon \ll \sqrt{\zeta}$:

Corollary 4.5. There exist $\delta, \zeta > 0$ with $\delta^2 > \zeta/33$ such that it is UG-hard, given a regular graph $G = (V, E)$, to distinguish between the following two cases.

Completeness: There exist $|V|/2$ vertices that cover at least a $(1 - \zeta)$ fraction of $E$.

Soundness: For every $S \subseteq V$ of size $|V|/2$, there are at least $\delta \cdot |E|$ many edges which are not covered by any vertex in $S$.

Proof. Let $G = (V, E)$ be a $d$-regular hard instance given by Theorem 4.3, plugging in $q := \frac{1}{2} - \zeta_0$. Then, from Claim 4.4, we have that either $G$ has a vertex cover of size at most $\left(\frac{1}{2} + \zeta_0\right) \cdot |V|$, or for every subset $S \subseteq V$ of size at most $\left(\frac{1}{2} + \zeta_0\right) \cdot |V|$, the number of edges not covered by $S$ is at least $\left(\frac{\sqrt{\zeta_0}}{3} - \varepsilon\right) \cdot |E|$. Moreover, in the completeness case, by an averaging argument, there is a subset $S^* \subseteq V$ of size $|V|/2$ such that $S^*$ covers at least a $\frac{1/2}{1/2 + \zeta_0}$ fraction of the edges. Note that $\frac{1/2}{1/2 + \zeta_0} > 1 - 2\zeta_0$. On the other hand, in the soundness case, by setting $\varepsilon = o(\sqrt{\zeta_0})$, we can conclude that for every subset $S \subseteq V$ of size $|V|/2$, the number of edges not covered by $S$ is at least $\frac{\sqrt{\zeta_0}}{4} \cdot |E|$. Then, we define $\delta := \frac{\sqrt{\zeta_0}}{4}$ and $\zeta := 2\zeta_0$ to obtain the theorem statement. Note that $\delta^2 = \frac{\zeta}{32} > \frac{\zeta}{33}$.

This corollary is weaker than the promise given in XXH in the following ways. First, and most importantly, the hard instances of Corollary 4.5 need not be vertex expanders. Second, the conditional lower bound under the Unique Games Conjecture is only against polynomial-time algorithms, whereas XXH rules out sub-exponential time algorithms.
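The quantity $\Gamma_\rho(\mu)$ from Claim 4.4 can be sanity-checked numerically. The sketch below estimates it by Monte Carlo (generating correlated Gaussians with numpy and using the stdlib `NormalDist` for $\Phi^{-1}$) and compares against $\sqrt{\zeta}/3$ for a few small values of $\zeta$; consistent with the claim's "small enough $\zeta$" hypothesis, the inequality is only expected to hold once $\zeta$ is sufficiently small:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

def Gamma(rho, mu, samples=1_000_000):
    """Monte Carlo estimate of Pr[X <= a and Y <= a] for standard normals
    X, Y with correlation rho, where a = Phi^{-1}(mu)."""
    a = NormalDist().inv_cdf(mu)
    x = rng.standard_normal(samples)
    z = rng.standard_normal(samples)
    y = rho * x + np.sqrt(1 - rho * rho) * z   # Corr(x, y) = rho
    return np.mean((x <= a) & (y <= a))

for zeta in (0.002, 0.005, 0.01):
    q = 0.5 - zeta
    val = Gamma(-q / (1 - q), q)
    print(f"zeta={zeta}: Gamma={val:.5f}, sqrt(zeta)/3={np.sqrt(zeta)/3:.5f}")
    assert val >= np.sqrt(zeta) / 3
```

The estimates also approach the asymptotic $\sqrt{2\zeta}/\pi$ from the proof of Claim 4.4 as $\zeta \to 0$, though convergence is slow.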
Between the two remarks made above, the first one is the major obstacle, and this is mainly due to our poor understanding of vertex expanders. Once the toolkit on this topic develops, it is conceivable that some additional progress can be made on XXH. We close this section with the proof of Claim 4.4.

Proof of Claim 4.4. Let $a := \Phi^{-1}(1/2 - \zeta)$. As $\zeta \to 0$, we have the expansions:
$$a \sim -\sqrt{2\pi}\,\zeta \quad \text{and} \quad \delta := 1 + \rho = \frac{4\zeta}{1 + 2\zeta} \sim 4\zeta.$$
Since $a < 0$, we have $\Gamma_{-1}(q) = \Pr[X \le a \wedge -X \le a] = 0$. We compute $\Gamma_\rho(q)$ by integrating the bivariate Gaussian density $\phi_2(a, a; r)$ from $r = -1$ to $\rho$. Letting $u = 1 + r$, the integral is dominated by the behavior near $u = 0$:
$$\Gamma_\rho(q) = \int_{-1}^{\rho} \phi_2(a, a; r)\, dr \sim \int_0^{4\zeta} \frac{1}{2\pi\sqrt{2u}} \exp\left(-\frac{a^2}{u}\right) du.$$
Substituting $v = a^2/u$ and using the asymptotic $\int_x^\infty v^{-3/2} e^{-v}\, dv \sim 2x^{-1/2}$ for small $x$:
$$\Gamma_\rho(q) \sim \frac{|a|}{2\pi\sqrt{2}} \int_{a^2/(4\zeta)}^{\infty} v^{-3/2} e^{-v}\, dv \sim \frac{|a|}{2\pi\sqrt{2}} \cdot 2\left(\frac{|a|}{\sqrt{4\zeta}}\right)^{-1} = \frac{\sqrt{2\zeta}}{\pi}.$$
Finally, since $\frac{\sqrt{2}}{\pi} \approx 0.45 > \frac{1}{3}$, the claim holds.

5 Conditional Lower Bound for Euclidean $k$-means with Few Clusters

We prove in this section our main result, the hardness of approximating $k$-means when parameterized by $k$. The formal theorem is:

Theorem 5.1 (Fine-Grained Hardness of Approximation for Euclidean $k$-means from XXH). Suppose XXH$(\delta, \zeta, \alpha)$ is true for some constants $\delta, \zeta, \alpha \in (0, 1)$ satisfying $\delta \le 10^{-3}$, $\zeta \le \delta^5/10$, and $\alpha \le \delta^{10}$. Let $L > 1$ be the constant defining the degree $d = (\log |V|)^L$ in the XXH hypothesis. Let $\tilde{k} : \mathbb{N} \to \mathbb{N}$ be a non-decreasing function and $\tilde{\varepsilon} : \mathbb{N} \to (0, 1)$ be a non-increasing function. Define $f(n) := \frac{\tilde{k}(n)}{\tilde{\varepsilon}(n)}$. Assume that for all sufficiently large $n$, $f(n)$ satisfies the smoothness condition $f(n) \le C \cdot f(n-1)$ for some constant $C \ge 1$. Furthermore, assume the following asymptotic conditions hold:

• $\tilde{k}(1) \ge \frac{1}{\tilde{\varepsilon}(1)} \ge 2/\delta^5$.
• $f(n) = \omega(\log n)$ and $f(n) = o\left(\frac{n}{\log^L n}\right)$.

• $\tilde{\varepsilon}(n) = o\left(\frac{1}{(\log n)^{\omega(1)}}\right)$.

• $\tilde{k}(n) \cdot \tilde{\varepsilon}(n) = \omega\left((\log n)^{2L}\right)$.

Then the following holds for all $\beta > 0$: No randomized algorithm can, given as input exactly $n$ points in $\mathbb{R}^{\mathrm{poly}(n)}$ and an integer $\tilde{k}(n)$, run in $2^{f(n)^{1-\beta}} \cdot \mathrm{poly}(n)$ time and output a $(1 + \tilde{\varepsilon}(n))$-approximate estimate of the $\tilde{k}(n)$-means cost with probability at least 0.9.

In this section, we actually prove the following theorem, which applies for $\tilde{k}$ and $\tilde{\varepsilon}$ bounded in a specific way, and then show that it immediately implies Theorem 5.1 above.

Theorem 5.2. Let $\delta, \zeta, \alpha \in (0, 1)$ be constants satisfying $\delta \le 10^{-3}$, $\zeta \le \delta^5/10$, and $\alpha \le \delta^{10}$. There is a randomized algorithm, running in time linear in the input size, which takes as input an integer $k$ (where $k > \sqrt{|V|} \cdot d$ and $k = o(|V|/d^{\omega(1)})$) and a $d$-regular $(\mathrm{polylog}|V|, \alpha)$-small set vertex expander $G = (V, E)$ (where $d = (\log |V|)^L$, for some $L > 1$), and outputs a point-set $P \subseteq \mathbb{R}^{O(|V|)}$ of at most $|E|$ points such that with probability at least 0.95, the following holds:

Completeness: If there are $|V|/2$ vertices covering at least a $(1 - \zeta)$ fraction of $E$, then there is a clustering of $P$ such that the $k$-means cost is at most $3|E| - (1 - 7\zeta)kd$.

Soundness: If every $|V|/2$ vertices miss at least a $\delta$ fraction of $E$, then every clustering of $P$ has $k$-means cost at least $3|E| - (1 - \delta^5)kd$.

Now the proof of Theorem 5.1 follows as an immediate corollary by a simple duplication technique.

Proof of Theorem 5.1. Assume, for the sake of contradiction, that there exist a constant $\beta \in (0, 1)$ and a randomized algorithm $\mathcal{A}$ that, given $n$ points and a target cluster count $\tilde{k}(n)$, runs in time $2^{f(n)^{1-\beta}} \cdot \mathrm{poly}(n)$ and outputs a $(1 + \tilde{\varepsilon}(n))$-approximation to the $\tilde{k}(n)$-means cost with probability at least 0.95.
We use $\mathcal{A}$ to refute the XXH hypothesis in sub-exponential time. Let $G = (V, E)$ be an instance of XXH on $N = |V|$ vertices with degree $d = (\log N)^L$.

Define $C_{\mathrm{gap}} := \frac{3}{\delta^5 - 7\zeta}$. The condition $\zeta \le \delta^5/10$ ensures $\delta^5 - 7\zeta \ge 0.3\delta^5 > 0$, so $C_{\mathrm{gap}}$ is well-defined and strictly positive. Since $f(n) = \omega(\log n)$ diverges, we let $n^*$ be the minimal integer such that $f(n^*) \ge C_{\mathrm{gap}} N$. By minimality, $f(n^* - 1) < C_{\mathrm{gap}} N$. Applying the smoothness condition $f(n) \le C \cdot f(n-1)$, we obtain $C_{\mathrm{gap}} N \le f(n^*) < C \cdot C_{\mathrm{gap}} N$, establishing $f(n^*) = \Theta(N)$.

Let the target number of core clusters be $k_{\mathrm{core}} := \tilde{k}(n^*) - 1$. We verify that $k_{\mathrm{core}}$ satisfies the preconditions of Theorem 5.2 for $G$. Since $f(n^*) = \Theta(N)$ and $f(n) = o(n/\log^L n)$, we have $\log N = \Theta(\log f(n^*)) = O(\log n^*)$.

• Lower bound $k_{\mathrm{core}} > \sqrt{N}\, d$: We have $\frac{k_{\mathrm{core}}^2}{N} = \Theta\left(\frac{\tilde{k}(n^*)^2}{f(n^*)}\right) = \Theta\left(\tilde{k}(n^*)\, \tilde{\varepsilon}(n^*)\right)$. By the theorem's assumptions, this is $\omega\left((\log n^*)^{2L}\right) = \omega\left((\log N)^{2L}\right) = \omega(d^2)$. For sufficiently large $N$, this implies $k_{\mathrm{core}} > \sqrt{N}\, d$.

• Upper bound $k_{\mathrm{core}} = o\left(N/d^{\omega(1)}\right)$: We have $\frac{k_{\mathrm{core}}}{N} \le \frac{\tilde{k}(n^*)}{N} = \Theta(\tilde{\varepsilon}(n^*)) = o\left((\log n^*)^{-\omega(1)}\right) = o\left((\log N)^{-\omega(1)}\right) = o\left(d^{-\omega(1)}\right)$.

We apply Theorem 5.2 to $(G, k_{\mathrm{core}})$, producing a point set $P_{\mathrm{core}} \subset \mathbb{R}^{O(N)}$ of size $M \le |E| = Nd/2$. Since $N = \Theta(f(n^*))$ and $\log N = O(\log n^*)$, we have $M = O(f(n^*)(\log n^*)^L) = o(n^*)$. Thus, for sufficiently large $N$, $M \ll n^*$.

Because $\mathcal{A}$ requires exactly $n^*$ points, we construct a padded instance $P_{\mathrm{new}}$ as follows. Let $W := \lfloor (n^* - 1)/M \rfloor$. Since $M = o(n^*)$, we have $W \ge 1$. We insert exactly $W$ identical copies of every point in $P_{\mathrm{core}}$ into $P_{\mathrm{new}}$, yielding $WM$ points. The remaining $R := n^* - WM \ge 1$ points satisfy $R \le M$. We introduce a new orthogonal basis vector $e_{\mathrm{new}}$ and add $R$ identical dummy points at coordinate $x^* := \Lambda e_{\mathrm{new}}$.
We choose $\Lambda > 0$ to be sufficiently large (e.g., computable in polynomial time as $\Lambda^2 > n^* \cdot W \sum_{x \in P_{\mathrm{core}}} \|x\|^2$). Because $x^*$ lies on an orthogonal axis, this choice guarantees that any clustering placing a dummy point and a core point in the same cluster incurs a variance penalty strictly greater than the baseline cost of clustering all core points at the origin and placing a dedicated center at $x^*$. Thus, any optimal clustering dedicates exactly one center to $x^*$ (incurring zero cost for the identical dummy points) and uses the remaining $k_{\mathrm{core}}$ centers to optimally partition the $W$ copies of $P_{\mathrm{core}}$.²

The total number of points is $WM + R = n^*$. We set $K := \tilde{k}(n^*) = k_{\mathrm{core}} + 1$ and run $\mathcal{A}$ on $(P_{\mathrm{new}}, K)$. Since exact point duplication scales the $k$-means variance objective uniformly, the optimal $K$-means cost is exactly $W \cdot \mathrm{cost}_{k_{\mathrm{core}}}(P_{\mathrm{core}})$. By Theorem 5.2, we have:

Completeness: $\mathrm{cost}(P_{\mathrm{new}}) \le W \cdot C_{\mathrm{comp}} \le W\big(3|E| - (1 - 7\zeta)k_{\mathrm{core}}d\big)$.

Soundness: $\mathrm{cost}(P_{\mathrm{new}}) \ge W \cdot C_{\mathrm{sound}} \ge W\big(3|E| - (1 - \delta^5)k_{\mathrm{core}}d\big)$.

To successfully distinguish the two cases, the relative error of $\mathcal{A}$ must not bridge the gap: $(1 + \tilde{\varepsilon}(n^*))\, W C_{\mathrm{comp}} < W C_{\mathrm{sound}}$, which rearranges to:
$$\tilde{\varepsilon}(n^*)\, C_{\mathrm{comp}} < C_{\mathrm{sound}} - C_{\mathrm{comp}} = k_{\mathrm{core}}\, d\, (\delta^5 - 7\zeta).$$
Using the worst-case bound $C_{\mathrm{comp}} \le 3|E| = 1.5\, Nd$, it suffices to guarantee $1.5\, N\, \tilde{\varepsilon}(n^*) < k_{\mathrm{core}}(\delta^5 - 7\zeta)$, or equivalently $N < \frac{\delta^5 - 7\zeta}{1.5} \cdot \frac{k_{\mathrm{core}}}{\tilde{\varepsilon}(n^*)}$. By our initial choice of $n^*$, we have $N \le \frac{\delta^5 - 7\zeta}{3} f(n^*) = \frac{\delta^5 - 7\zeta}{3} \cdot \frac{\tilde{k}(n^*)}{\tilde{\varepsilon}(n^*)}$. Since $\tilde{k}(1) \ge 2/\delta^5 \ge 2000$ and $\tilde{k}$ is a non-decreasing function, $k_{\mathrm{core}} = \tilde{k}(n^*) - 1 \ge \frac{1}{2}\tilde{k}(n^*)$. This yields:
$$N \le \frac{\delta^5 - 7\zeta}{3} \cdot \frac{k_{\mathrm{core}} + 1}{\tilde{\varepsilon}(n^*)} < \frac{\delta^5 - 7\zeta}{1.5} \cdot \frac{k_{\mathrm{core}}}{\tilde{\varepsilon}(n^*)}.$$
Thus, $\mathcal{A}$ correctly distinguishes the instances. By the union bound, the reduction mapping and $\mathcal{A}$ succeed simultaneously with probability at least $1 - 0.05 - 0.05 = 0.90$.
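The duplication step above relies on the fact that repeating every input point $W$ times scales the optimal $k$-means cost by exactly $W$. This can be checked by brute force on a toy instance; `kmeans_opt` below is a hypothetical helper that exhaustively tries every assignment, so it is only viable for a handful of points:

```python
import itertools
import numpy as np

def kmeans_opt(points, k):
    """Brute-force optimal k-means cost: try every assignment of points to
    k clusters, using the centroid as each cluster's (optimal) center."""
    pts = np.asarray(points, dtype=float)
    best = float("inf")
    for labels in itertools.product(range(k), repeat=len(pts)):
        cost = 0.0
        for c in range(k):
            cluster = pts[[i for i, l in enumerate(labels) if l == c]]
            if len(cluster):
                cost += ((cluster - cluster.mean(axis=0)) ** 2).sum()
        best = min(best, cost)
    return best

rng = np.random.default_rng(0)
core = rng.normal(size=(5, 2))
W = 3
duplicated = np.repeat(core, W, axis=0)   # W identical copies of each point

base = kmeans_opt(core, 2)
assert np.isclose(kmeans_opt(duplicated, 2), W * base)
```

The intuition matches the proof: for any fixed set of centers, assigning all copies of a point to its nearest center multiplies the assignment cost by exactly $W$, so minimizing over centers scales the optimum by $W$ as well.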
The total time to resolve the XXH instance comprises the reduction overhead and the execution of $\mathcal{A}$. Because $\tilde{\varepsilon}(n) = o\big(1/(\log n)^{\omega(1)}\big)$, we have $f(n) = \omega((\log n)^c)$ for any constant $c > 0$. This implies $\log n^* = N^{o(1)}$, yielding $n^* = 2^{N^{o(1)}}$. Thus, the reduction time to compute bounds and construct $P_{\mathrm{new}}$ is $\mathrm{poly}(N, n^*) = 2^{o(N)}$. Algorithm $\mathcal{A}$ runs in $2^{f(n^*)^{1-\beta}} \cdot \mathrm{poly}(n^*) = 2^{O(N^{1-\beta})} \cdot 2^{N^{o(1)}}$ time. For any constant $\gamma \in (0, \beta)$ and sufficiently large $N$, the overall runtime is bounded by $2^{N^{1-\gamma}}$. This decides the XXH hypothesis in strictly sub-exponential time, producing a contradiction.

Remark 5.3. It is worth noting that the reduction in Theorem 5.2 applies even for weaker versions of XXH. For example, if one day in the future, XXH were proved against algorithms running in time $2^{\sqrt{n}}$ instead of the currently stated $2^{n^{1-o(1)}}$-runtime algorithms, then we can recover that there is no randomized algorithm running in time much faster than $2^{\sqrt{k/\varepsilon}} \cdot \mathrm{poly}(n, d)$ that can $(1 + \varepsilon)$-approximate the Euclidean $k$-means problem.

² If $\mathcal{A}$ requires distinct points, an infinitesimally small perturbation avoids multisets without meaningfully affecting the continuous objective gap.

The rest of this section is dedicated to proving Theorem 5.2. In Section 5.1 we provide the reduction from the instance of the vertex cover problem given by XXH to the Euclidean $k$-means problem with the parameters given in Theorem 5.2. Next, in Sections 6 and 7, we prove the completeness and soundness properties of the reduction, thus completing the proof of Theorem 5.2. Finally, in Section 8, we show how Theorem 5.1 implies Corollary 1.3.

5.1 Construction of the instance

Our starting point is a $d$-regular $(\mathrm{polylog}\, n, \alpha)$-small set vertex expander $G = (V = [n], E)$ (where $\alpha \le \delta^{10}$, $k > \sqrt{n} \cdot d$, and $d = (\log n)^L$, for some $L > 1$).
Let $\rho : [n] \to [k]$ be a uniform random function (also viewed as a coloring), where $\rho(j) = i$ with probability $1/k$, independently for all $j \in [n]$ and $i \in [k]$. Let $[n] = U_1 \,\dot\cup\, \cdots \,\dot\cup\, U_k$ be the partition induced by $\rho$, i.e., for all $i \in [k]$, we have $U_i := \{j \in [n] \mid \rho(j) = i\}$. Since $\rho$ is a uniform random function, applying the Chernoff bound, we have:

Proposition 5.4. With probability at least 0.99, for all $i \in [k]$, we have $|U_i| = \frac{n}{k} \pm 3\sqrt{\frac{n}{k} \cdot \log k}$.

Proof. Fix an index $i \in [k]$. For each vertex $j \in [n]$, let $X_{j,i}$ be the indicator random variable for the event that vertex $j$ is assigned to set $U_i$. By the definition of $\rho$, the variables $\{X_{j,i}\}_{j=1}^n$ are independent Bernoulli trials with parameter $p = 1/k$. The expected size of the set $U_i$ is:
$$\mathbb{E}[|U_i|] = \sum_{j=1}^n \mathbb{E}[X_{j,i}] = \frac{n}{k}.$$
We now apply the Chernoff bound (Theorem 3.3) with the deviation term set to $3\sqrt{\frac{n}{k}\log k}$ to obtain the following:
$$\Pr\left[\Big|\, |U_i| - \frac{n}{k} \,\Big| \ge 3\sqrt{\frac{n}{k}\log k}\right] \le 2\exp\left(-\frac{9\frac{n}{k}\log k}{3\frac{n}{k}}\right) = 2k^{-3} \le \frac{1}{100k},$$
when $k \ge 15$. By the union bound, the probability that the bound holds simultaneously for all $i \in [k]$ is at least $1 - \frac{k}{100k} = 0.99$.

We next note that $\rho$ induces a coloring on the vertices of $G$, simply by labeling the vertices with colors 1 to $k$. For every edge $e \in E$, let $\rho(e) \subset [k]$ be the set of colors of the two endpoints of $e$, formalized as follows: if $e$ connects $u$ and $v$, then $\rho(e) := \{\rho(u), \rho(v)\}$.

5.1.1 Properties of the random coloring

Call an edge $e$ bad if $|\rho(e)| = 1$. A simple probabilistic analysis bounds the number of bad edges:

Proposition 5.5. With probability at least 0.99, the fraction of bad edges is at most $100/k$.

Proof. Let $Y$ be the random variable counting the number of bad edges. We write $Y = \sum_{e \in E} I_e$, where $I_e$ is the indicator that edge $e = (u, v)$ is bad (i.e., $\rho(u) = \rho(v)$).
Since vertex colors are chosen uniformly and independently, for any edge $e = (u, v)$, $\Pr(I_e = 1) = \sum_{c=1}^k \Pr(\rho(u) = c \wedge \rho(v) = c) = \sum_{c=1}^k \frac{1}{k^2} = \frac{1}{k}$. By linearity of expectation, the expected fraction of bad edges is $\mathbb{E}[Y/|E|] = \frac{1}{|E|}\sum_{e \in E} \mathbb{E}[I_e] = \frac{1}{k}$. Applying Markov's inequality to the non-negative random variable $Y/|E|$:
$$\Pr\left[\frac{Y}{|E|} \ge \frac{100}{k}\right] \le \frac{\mathbb{E}[Y/|E|]}{100/k} = 0.01.$$

Next, we call a pair of edges non-representative if $|\rho(e) \cup \rho(e')| < |e \cup e'|$. On the other hand, we call a pair of edges representative if $|\rho(e) \cup \rho(e')| = |e \cup e'|$. Similarly to Proposition 5.5, we can bound the number of non-representative pairs:

Proposition 5.6. With probability 0.99, the fraction of non-representative pairs is at most $600/k$.

Proof. Let $Z$ be the random variable counting the number of non-representative pairs. For any pair of edges $\{e, e'\}$, let $S = e \cup e'$ be the set of vertices involved. Since $|S| \le 4$, the probability of a collision in the coloring of $S$ is bounded by $\binom{4}{2}\frac{1}{k} = \frac{6}{k}$. By linearity of expectation, the expected fraction of non-representative pairs is at most $6/k$. Applying Markov's inequality:
$$\Pr\left[\text{fraction of non-representative pairs} \ge \frac{600}{k}\right] \le \frac{6/k}{600/k} = 0.01.$$

A particular subset of non-representative pairs consists of those that have one vertex in common, and hence have two colors in common. We bound those more precisely in the next lemma.

Proposition 5.7. With probability 0.99, the number of edges that have one vertex and two colors in common with another edge is at most $\frac{100nd^2}{k}$.

Proof. Let $W$ be the random variable counting the number of such edges. Such edges come in pairs sharing a vertex. For any vertex $u$, there are $\binom{d}{2}$ pairs of incident edges. Let $e = \{u, v\}$ and $e' = \{u, w\}$ be such a pair.
They share two colors if $\rho(e) = \rho(e')$, which implies $\rho(v) = \rho(w)$ (occurring with probability $1/k$). Summing over all $n$ vertices, the expected number of such pairs is $n\binom{d}{2}\frac{1}{k} \le \frac{nd^2}{2k}$. Since each pair contributes at most 2 edges to the count $W$, we have $\mathbb{E}[W] \le 2 \cdot \frac{nd^2}{2k} = \frac{nd^2}{k}$. Applying Markov's inequality:
$$\Pr\left[W \ge \frac{100nd^2}{k}\right] \le \frac{\mathbb{E}[W]}{100nd^2/k} \le \frac{nd^2/k}{100nd^2/k} = 0.01.$$

Finally, we bound the number of non-representative pairs in any set $C$. For this, we use the Chernoff bounds given in Theorem 3.3.

Lemma 5.8. With probability at least 0.99, it holds that for every set $C$ of edges, the fraction of pairs of edges in $C$ having two colors in common is at most $\frac{36\log k}{|C| - 1}$.

Proof. For every pair of distinct colors $a, b \in [k]$ (with $a \neq b$), let $E_{ab}$ be the set of edges in the graph with colors $a$ and $b$. Note that monochromatic edges (where $a = b$) are considered "bad" edges and are excluded from the point set $P$ by definition in Section 5.1. Thus $E_{aa} \cap C = \emptyset$ for any cluster $C$. The number of pairs of edges in $C$ having two colors in common is therefore at most $N_{ab} := \sum_{a \neq b} \binom{|E_{ab} \cap C|}{2}$. Note that the total number of pairs of edges in $C$ is $\binom{|C|}{2} \le \frac{|C|^2}{2}$.

The quantity $N_{ab}$ is maximized when the mass is concentrated on as few sets $E_{ab} \cap C$ as possible. Indeed, by convexity, if $x \ge y$, then for any $\eta > 0$, $x^2 + y^2 \le (x + \eta)^2 + (y - \eta)^2$. Hence, we focus on a fixed pair of colors, and show that $|E_{ab}| \le 36\log k$ with high probability.

First, from Proposition 5.4, the number of vertices with color $a$ is at most $2n/k$ with probability $1 - \exp(-n/k)$. We show an additional property: every vertex has at most 4 neighbors with color $a$, with probability $1 - 1/k^2$. To see this, consider the probability that a vertex has at least five neighbors with color $a$.
This corresponds to a Binomial distribution with parameters $d$ and $1/k$; hence the probability is at most $\binom{d}{5}(1/k)^5 \le \frac{d^5}{k^5}$. The expected number of vertices with at least five neighbors of color $a$ is therefore at most $\frac{nd^5}{k^5}$. Markov's inequality ensures that, with probability $1 - 1/k^2$, this number is at most $\frac{nd^5}{k^3}$. Since $k \ge \sqrt{n}\, d$, we have $k^3 \ge n^{1.5} d^3 > nd^5$ (for sufficiently large $n$), implying this quantity is strictly less than 1, so no such vertex exists.

Recall that $U_a$ is the set of vertices with color $a$. Since the graph is $d$-regular and $|U_a| \le 2n/k$, the set $U_a$ has at most $2nd/k$ outgoing edges. For a vertex $v \in U_a$, each neighbor receives color $b$ independently with probability at most $1/(k-1)$. Applying Chernoff bounds (Theorem 3.3) with number of trials $N = 2nd/k$ and probability $p = 1/k$, the number of neighbors of $U_a$ that have color $b$ is bounded. Let this count be $S$. We bound $\Pr(S > Np + \beta N)$. We choose $\beta = \frac{4k\log k}{nd}$. Using the assumption $k \ge \sqrt{n}\, d$, we have $k^2 \ge nd^2$. Thus:
$$\beta k = \frac{4k^2\log k}{nd} \ge \frac{4nd^2\log k}{nd} = 4d\log k \ge 8,$$
since $d \ge 2$ and $k$ is large. This implies $\beta \ge 8/k = 8p$, and consequently $p + \beta \le \frac{9}{8}\beta$. Using the bound $D(p + \beta \,\|\, p) \ge \frac{\beta^2}{2(p + \beta)}$, the exponent is:
$$\frac{\beta^2 N}{2(p + \beta)} \ge \frac{\beta^2 N}{2 \cdot \frac{9}{8}\beta} = \frac{4}{9}\beta N = \frac{4}{9}\left(\frac{4k\log k}{nd} \cdot \frac{2nd}{k}\right) = \frac{32}{9}\log k > 3.5\log k.$$
Thus, $\Pr(S > Np + \beta N) \le k^{-3.5}$. The deviation term is $\beta N = 8\log k$, and the mean is $Np = \frac{2nd}{k^2} < 1$ (for large $n$). Thus, with probability at least $1 - k^{-3.5}$, there are at most $9\log k$ such vertices (rounding up conservatively), and thus $S \le 9\log k$. In addition, we established that each vertex has at most 4 neighbors in $U_a$. Thus, each vertex in $U_b$ adjacent to $U_a$ contributes at most 4 edges to $E_{ab}$. Consequently, $|E_{ab}| \le 4 \times 9\log k = 36\log k$.
A union bound over all $\binom{k}{2} < k^2$ pairs $a, b$ ensures this holds for all pairs simultaneously with probability at least $1 - k^2 \cdot k^{-3.5} = 1 - k^{-1.5} \ge 0.99$.

Finally, the sum $\sum_{a \neq b} \binom{|E_{ab} \cap C|}{2} \le \sum \frac{|E_{ab} \cap C|^2}{2}$ is maximized when $\frac{|C|}{36\log k}$ different sets $E_{ab}$ have the maximal size $36\log k$. In that case:
$$\text{Number of pairs} \le \frac{|C|}{36\log k} \cdot \frac{(36\log k)^2}{2} = 18|C|\log k.$$
Dividing by the total number of pairs $\frac{|C|(|C|-1)}{2}$, the fraction is at most $\frac{36\log k}{|C| - 1}$.

5.1.2 Construction of the point set

Let $E'$ be the set of all edges which are not bad. We construct the point-set $P \subseteq \{0, 1\}^{n+k}$, where for every $e \in E'$ we have a unique point $p_e \in P$. We identify the first $n$ coordinates with the set $V$ $(= [n])$ and the last $k$ coordinates with the set $[k]$. Let $e \in E'$ be incident on the vertices $i$ and $j$ in $V$. Then we define the point $p_e$ as follows:
$$p_e := e_i + e_j + e_{\rho(i)+n} + e_{\rho(j)+n},$$
where for any $j \in [n+k]$, $e_j$ denotes the standard basis vector which is 1 in the $j$-th coordinate and zero everywhere else.

Fix $e \in E'$. From the construction, we have:
$$\forall e \in E', \quad \|p_e\|_2^2 = 4. \quad (1)$$
We now have an upper bound on the diameter of the point-set we constructed. For any pair of points, we have:
$$\forall e, e' \in E', \quad \|p_e - p_{e'}\|_2^2 \le \|p_e\|_2^2 + \|p_{e'}\|_2^2 \le 8. \quad (2)$$
For a pair of edges that intersect, their corresponding pair of points satisfy:
$$2 \le \|p_e - p_{e'}\|_2^2 \le 4. \quad (3)$$
Those at squared distance 2 must have two colors and one vertex in common: from Proposition 5.7, the number of such pairs is at most $\frac{100nd^2}{k}$, which is a tiny fraction of the total number of pairs. For a pair of edges that do not intersect, their corresponding pair of points satisfy:
$$4 \le \|p_e - p_{e'}\|_2^2 \le 8. \quad (4)$$
In the following sections, we sometimes abuse notation and use $E$ instead of $E'$, whenever it is clear from context.
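The construction above, together with the distance classes (1)-(4), can be checked on a small instance. The sketch below uses a small random graph with a random coloring rather than a regular expander, which suffices for verifying the geometry of the embedding:

```python
import random
import numpy as np

random.seed(0)
n, k = 30, 6
rho = [random.randrange(k) for _ in range(n)]        # random coloring of [n]
# random edge set, keeping only non-bad edges (endpoints of distinct colors)
edges = [(i, j) for i in range(n) for j in range(i + 1, n)
         if random.random() < 0.2 and rho[i] != rho[j]]

def point(e):
    """p_e = e_i + e_j + e_{rho(i)+n} + e_{rho(j)+n} in {0,1}^{n+k}."""
    i, j = e
    p = np.zeros(n + k)
    p[i] = p[j] = 1.0
    p[n + rho[i]] = p[n + rho[j]] = 1.0
    return p

P = {e: point(e) for e in edges}
assert all(np.dot(p, p) == 4 for p in P.values())    # Eq. (1)

for a in range(len(edges)):
    for b in range(a + 1, len(edges)):
        e, e2 = edges[a], edges[b]
        d2 = np.sum((P[e] - P[e2]) ** 2)
        if set(e) & set(e2):
            assert 2 <= d2 <= 4                      # intersecting edges, Eq. (3)
        else:
            assert 4 <= d2 <= 8                      # disjoint edges, Eq. (4)
```

In particular, squared distance 2 occurs exactly for pairs sharing a vertex and both colors, and squared distance 8 for disjoint pairs sharing no color, matching the case analysis in the text.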
In addition, since $|E'| = (1 - o(1)) \cdot |E|$ (as $k = \omega(1)$), we use $|E|$ instead of $|E'|$ when measuring clustering cost.

From Propositions 5.4, 5.5, 5.6, and 5.7 and Lemma 5.8, with probability at least 0.95, the fraction of bad edges is at most $100/k$, the fraction of non-representative pairs is at most $600/k$, the number of edges that have one vertex and two colors in common with another edge is at most $\frac{100nd^2}{k}$, and for every set $C$ of edges, the fraction of pairs of edges in $C$ having two colors in common is at most $\frac{36\log k}{|C| - 1}$. Under the event that all of the above hold, we show in Section 6 that if there exist $n/2$ vertices in $G$ covering $(1 - \zeta) \cdot |E|$ edges, then the optimal $k$-means solution of $P$ has cost at most $3|E| - (1 - 7\zeta)kd$; and in Section 7 we show that if every $n/2$ vertices in $G$ cover at most $(1 - \delta) \cdot |E|$ edges, then the optimal $k$-means solution of $P$ has cost at least $3|E| - (1 - \delta^5)kd$.

6 Completeness Analysis

Suppose there exist $n/2$ vertices in $G$ that cover the subset of edges $\tilde{E}$, where $|\tilde{E}| = (1 - \zeta) \cdot |E|$. We show that, in that case, the cost of the optimal $k$-means solution in the instance constructed in Section 5.1 is cheap, namely:
$$k\text{-means}(C_1, \ldots, C_k) \le 3|E| - (1 - 7\zeta)kd.$$

6.1 Construction of the Clustering

Let $S = \{v_1, \ldots, v_{n/2}\}$ be the subset of vertices comprising the given cover. This implies there exists a partition of $\tilde{E}$ into $n/2$ parts such that for all $i \in [n/2]$, all the edges in the $i$-th part are covered by $v_i$. Let us capture this specific partitioning more formally. There exists a function $\pi : \tilde{E} \to [n/2]$ such that for all $e \in \tilde{E}$, the vertex $v_{\pi(e)}$ covers $e$. Let $\tilde{P}$ be the points (from $P$ as constructed in Section 5.1.2) corresponding to the edges $\tilde{E}$. We define a partition of $\tilde{P}$ into $k$ clusters $\tilde{C}_1, \ldots$
, $\tilde{C}_k$ based on the colors of the cover vertices: the point $p_e$ belongs to cluster $\tilde{C}_i$ if and only if $\rho(v_{\pi(e)}) = i$. We extend this clustering to points in $E \setminus \tilde{E}$: for each edge $e \in E \setminus \tilde{E}$, arbitrarily designate one of its endpoints as $u_e$, and assign $p_e$ to the cluster $\overline{C}_{\rho(u_e)}$. We finally define cluster $C_i := \tilde{C}_i \cup \overline{C}_i$. We start by showing a few properties of this partitioning. For a vertex $v_j$ from the vertex cover, define $d_j$ such that $d - d_j$ is the number of edges with $\pi(e) = j$. We have the following:

Fact 6.1. With the above notation, $\sum_{j=1}^{n/2} d_j = \zeta|E|$.

Proof. By definition of $d_j$, $v_j$ covers $d - d_j$ edges, therefore:

$(1 - \zeta)|E| = \sum_{j=1}^{n/2} (d - d_j) = \frac{nd}{2} - \sum_{j=1}^{n/2} d_j = |E| - \sum_{j=1}^{n/2} d_j.$

Thus, $\sum d_j = \zeta|E|$.

We can also show that the $\tilde{C}_i$ are roughly balanced:

Fact 6.2. With probability 0.99, each $\tilde{C}_i$ has size at least $\frac{nd}{6k}$.

Proof. Let $G = \{v_j \in S \mid d_j \leq d/2\}$ be the set of "good" cover vertices. Since each vertex has degree $d$, $v_j$ covers at most $d$ edges, meaning $d_j \geq 0$. Applying Markov's inequality to the non-negative values $d_j$, and using Fact 6.1, the number of vertices with $d_j > d/2$ is strictly less than $\frac{\zeta nd/2}{d/2} = \zeta n$. Consequently, the number of good vertices is bounded below by $|G| \geq n/2 - \zeta n = n(1/2 - \zeta)$. Since $\rho$ assigns each vertex in $V$ to a color in $[k]$ independently and uniformly at random, for a fixed color $i \in [k]$ the random variable $S_i := |G \cap U_i|$ follows a Binomial distribution $\mathrm{Bin}(|G|, 1/k)$. The expected value is $\mu := \mathbb{E}[S_i] = \frac{|G|}{k} \geq \frac{n(1/2 - \zeta)}{k}$. From the preconditions of Theorem 5.2, we know $\zeta \leq \delta^5/10 \leq 10^{-15}/10 < 1/100$, which implies $\mu \geq \frac{49n}{100k}$. We wish to bound the probability that $S_i < \frac{n}{3k}$.
We apply the Chernoff bound for the lower tail (Theorem 3.3), $\Pr(S_i \leq (1 - \lambda)\mu) \leq \exp\left(-\frac{\lambda^2 \mu}{2}\right)$, with the deviation parameter $\lambda = 1/4$:

$(1 - \lambda)\mu = \frac{3}{4}\mu \geq \frac{3}{4} \cdot \frac{49n}{100k} = \frac{147n}{400k}.$

Because $\frac{147n}{400k} = 0.3675\frac{n}{k} > \frac{n}{3k}$, any event where $S_i < \frac{n}{3k}$ implies that $S_i \leq \left(1 - \frac{1}{4}\right)\mu$. Thus, we can safely upper bound the probability:

$\Pr\left(S_i < \frac{n}{3k}\right) \leq \Pr\left(S_i \leq \left(1 - \frac{1}{4}\right)\mu\right) \leq \exp\left(-\frac{(1/4)^2 \mu}{2}\right) = \exp\left(-\frac{\mu}{32}\right) \leq \exp\left(-\frac{49n}{3200k}\right).$

Since $k = o(|V|/\mathrm{polylog}(|V|))$, the ratio $n/k$ grows strictly faster than any polylogarithmic function, so $\exp(-49n/(3200k))$ vanishes super-polynomially. Thus, for sufficiently large $n$, we have $\exp\left(-\frac{49n}{3200k}\right) \leq \frac{0.01}{k}$. Applying a union bound over all $k$ clusters, we guarantee that $S_i \geq \frac{n}{3k}$ holds for all $i \in [k]$ simultaneously with probability at least $1 - k \cdot \frac{0.01}{k} = 0.99$. Conditioned on this event, we can lower bound the size of each $\tilde{C}_i$. Recalling that $|\tilde{C}_i| = \sum_{v_j \in U_i \cap S} (d - d_j)$, we drop the non-negative contributions of vertices outside $G$ to obtain:

$|\tilde{C}_i| \geq \sum_{v_j \in G \cap U_i} (d - d_j) \geq \sum_{v_j \in G \cap U_i} \frac{d}{2} = \frac{d}{2} S_i \geq \frac{d}{2} \cdot \frac{n}{3k} = \frac{nd}{6k}.$

6.2 Cost of the Clustering

The cost of cluster $C_i$ is $\frac{1}{2|C_i|} \sum_{p_e, p_{e'} \in C_i} \|p_e - p_{e'}\|_2^2$. Hence, we bound the distance between pairs of points in $C_i$. First, as all edges in $C_i$ have at least one color in common (color $i$), the corresponding points are at squared distance at most 6 in the embedding. Some are at squared distance 4, when the edges share a vertex: we count their number more precisely, in terms of the $d_j$:

Fact 6.3. In cluster $C_i$, the number of ordered pairs of edges at squared distance at most 4 is at least

$|\tilde{C}_i| \left(d - 1 - \frac{6k}{n} \cdot \sum_{v_j \in S : \rho(v_j) = i} d_j\right).$

Proof. First, the number of ordered pairs of edges in $C_i$ that have $v_j$ in common is $(d - d_j)(d - d_j - 1)$.
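The numeric steps in the tail bound above can be verified mechanically. The following Python sketch checks the arithmetic; the ratio $n/k$ and the value of $k$ are illustrative placeholders, not values from the theorem.

```python
import math

# Sanity check of the Chernoff-bound arithmetic in Fact 6.2. The ratio
# n/k and the value of k below are illustrative placeholders.
lam = 0.25
# (1 - lam) * 49/100 = 147/400 = 0.3675 exceeds 1/3, so the event
# S_i < n/(3k) is contained in the event S_i <= (1 - lam)*mu.
assert math.isclose((1 - lam) * 49 / 100, 147 / 400)
assert 147 / 400 > 1 / 3

# lam^2 / 2 = 1/32, so the tail bound is exp(-mu/32) <= exp(-49n/(3200k)).
assert lam**2 / 2 == 1 / 32

# For n/k large enough, exp(-49n/(3200k)) drops below 0.01/k:
n_over_k, k = 10**4, 10**3
assert math.exp(-49 * n_over_k / 3200) <= 0.01 / k
```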
Hence, the total number of ordered pairs of edges in $\tilde{C}_i$ with one vertex in common is:

$\sum_{v_j \in S : \rho(v_j) = i} (d - d_j)(d - d_j - 1) = \sum_{v_j \in S : \rho(v_j) = i} (d - d_j)(d - 1) - \sum_{v_j \in S : \rho(v_j) = i} (d - d_j) d_j$
$= |\tilde{C}_i|(d - 1) - \sum_{v_j \in S : \rho(v_j) = i} (d - d_j) d_j$
$\geq |\tilde{C}_i|(d - 1) - d \sum_{v_j \in S : \rho(v_j) = i} d_j$
$\geq |\tilde{C}_i| \left(d - 1 - \frac{6k}{n} \cdot \sum_{v_j \in S : \rho(v_j) = i} d_j\right),$

where the last inequality holds because $|\tilde{C}_i| \geq \frac{nd}{6k}$ (Fact 6.2), implying $\frac{d}{|\tilde{C}_i|} \leq \frac{6k}{n}$. As all those pairs are at squared distance at most 4, this concludes the claim.

We can now compute the $k$-means cost.

Proposition 6.4. Suppose $S = \{v_1, \ldots, v_{n/2}\}$ covers the subset of edges $\tilde{E}$, where $|\tilde{E}| = (1 - \zeta) \cdot |E|$. Then,

$k\text{-means}(C_1, \ldots, C_k) \leq 3|E| - (1 - 7\zeta)kd.$

Proof. Recall that the $k$-means cost is $\sum_{i=1}^{k} \frac{1}{2|C_i|} \sum_{e, e' \in C_i} \|p_e - p_{e'}\|^2$, where the inner sum runs over all $|C_i|^2$ ordered pairs. The trivial distance bound is $\|p_e - p_{e'}\|^2 \leq 6$ for all $e \neq e'$. Specifically, the cost contribution of cluster $C_i$ is at most:

$\frac{1}{2|C_i|} \sum_{e \neq e'} 6 = \frac{6|C_i|(|C_i| - 1)}{2|C_i|} = 3(|C_i| - 1) = 3|C_i| - 3.$

Summing this base cost over all $k$ clusters gives $\sum_{i=1}^{k} (3|C_i| - 3) = 3|E| - 3k \leq 3|E|$. However, pairs sharing a vertex have squared distance at most 4, a reduction of at least $6 - 4 = 2$ per ordered pair compared to the base bound of 6. Let $M_i$ be the number of unordered pairs in $C_i$ at squared distance at most 4; this corresponds to $2M_i$ ordered pairs. The cost reduction is at least $\sum_{i=1}^{k} \frac{1}{2|C_i|}(2 \cdot 2M_i) = \sum_{i=1}^{k} \frac{2M_i}{|C_i|}$. Using Fact 6.3, let $\Delta_i = \sum_{v_j \in S : \rho(v_j) = i} d_j$. We have $M_i \geq \frac{1}{2}|\tilde{C}_i|\left(d - 1 - \frac{6k}{n}\Delta_i\right)$. Since $|C_i| = |\tilde{C}_i| + |\overline{C}_i|$, we can write the fraction $\frac{|\tilde{C}_i|}{|C_i|}$ as $\frac{1}{1 + |\overline{C}_i|/|\tilde{C}_i|}$.
The total cost reduction is bounded below by:

$\sum_{i=1}^{k} \frac{|\tilde{C}_i| \left(d - 1 - \frac{6k}{n}\Delta_i\right)}{|C_i|} = \sum_{i=1}^{k} \frac{d - 1 - \frac{6k}{n}\Delta_i}{1 + |\overline{C}_i|/|\tilde{C}_i|}.$

Using the algebraic identity $\frac{B - C}{1 + A} \geq B - C - AB$ (which is valid for all $A, B, C \geq 0$, since $\frac{B - C}{1 + A} - (B - C - AB) = \frac{AC + A^2 B}{1 + A} \geq 0$), we set $A = |\overline{C}_i|/|\tilde{C}_i|$, $B = d - 1$, and $C = \frac{6k}{n}\Delta_i$. We obtain:

$\sum_{i=1}^{k} \frac{d - 1 - \frac{6k}{n}\Delta_i}{1 + |\overline{C}_i|/|\tilde{C}_i|} \geq \sum_{i=1}^{k} \left[(d - 1) - \frac{6k}{n}\Delta_i - (d - 1)\frac{|\overline{C}_i|}{|\tilde{C}_i|}\right] \geq \sum_{i=1}^{k} \left[(d - 1) - \frac{6k}{n}\Delta_i - d\frac{|\overline{C}_i|}{|\tilde{C}_i|}\right].$

We bound each term in the summation. First, $\sum_{i=1}^{k} (d - 1) = k(d - 1) = kd - k$. Second, $\sum_{i=1}^{k} \frac{6k}{n}\Delta_i = \frac{6k}{n}\sum_i \Delta_i = \frac{6k}{n} \cdot \zeta|E| = \frac{6k}{n} \cdot \frac{\zeta nd}{2} = 3\zeta kd$, using Fact 6.1. Finally, from Fact 6.2, $|\tilde{C}_i| \geq \frac{nd}{6k}$, so $\frac{1}{|\tilde{C}_i|} \leq \frac{6k}{nd}$. Thus:

$\sum_{i=1}^{k} \frac{d|\overline{C}_i|}{|\tilde{C}_i|} \leq d \cdot \frac{6k}{nd} \sum_{i=1}^{k} |\overline{C}_i| = \frac{6k}{n} \cdot \zeta|E| = 3\zeta kd.$

Combining these, the total cost reduction is at least:

$(kd - k) - 3\zeta kd - 3\zeta kd = kd(1 - 6\zeta) - k.$

The total cost is therefore at most:

$(3|E| - 3k) - (kd(1 - 6\zeta) - k) = 3|E| - kd(1 - 6\zeta) - 2k \leq 3|E| - kd(1 - 7\zeta).$

7 Soundness Analysis

Let the soundness assumption be that for every $S \subseteq V$ of size $n/2$, there exists $F \subseteq E$ such that $|F| \geq \delta \cdot |E|$ and, for all $e \in F$, we have $e \cap S = \emptyset$. Consider an arbitrary partition of $P$ into $k$ clusters $C_1, \ldots, C_k$. If this clustering has cost at most $3|E| - (1 - \delta^5)dk$, then we will show that the soundness assumption is contradicted.³ For every $i \in [k]$, let $G_i(V_i, E_i \subseteq E)$ be the subgraph defined as follows: $e \in E_i$ if and only if $p_e \in C_i$ (and $G_i$ has no isolated vertices). In words, $E_i$ is the set of edges whose embedded point is in cluster $C_i$, and thus $|E_i| = |C_i|$. We start by removing all pairs of edges with one vertex and two colors in common.
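The algebraic identity invoked above is easy to confirm mechanically. The following Python sketch checks it on random non-negative triples (arbitrary illustrative ranges):

```python
import random

# Mechanical check of the identity (B - C)/(1 + A) >= B - C - A*B used
# above, on random non-negative triples (arbitrary illustrative ranges).
random.seed(0)
for _ in range(10_000):
    A, B, C = (random.uniform(0, 100) for _ in range(3))
    lhs = (B - C) / (1 + A)
    rhs = B - C - A * B
    gap = (A * C + A * A * B) / (1 + A)   # exact difference lhs - rhs
    assert lhs - rhs >= -1e-9
    assert abs((lhs - rhs) - gap) < 1e-6
```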
From Proposition 5.7, this removes at most $100nd^2/k = o(kd)$ many edges (as $k > \sqrt{n}$ and $d = (\log n)^L$). As this is a tiny fraction of the total number of edges, not covering them does not hurt the soundness analysis, and it only decreases the $k$-means cost. In addition, we assume that every cluster has size at least $\xi d$ for some $\xi > 0$. This is again because the total number of points in all clusters of size less than $\xi d$ is smaller than $\xi dk$ overall, and thus even removing them does not affect our analysis (assuming $\xi$ is sufficiently small, for example $\xi = 2^{-1/\delta}$). Our proof strategy can now be broken down into the following four steps. First, we connect the cost of cluster $C_i$ (for $i \in [k]$) with certain properties of the graph $G_i$. Formally, we show in Section 7.1 the following:

Lemma 7.1. Fix a cluster $C_i$ (for $i \in [k]$). Let $\gamma_i$ be the fraction of pairs of edges $e, e' \in E_i$ that have no color in common, and $\kappa_i$ be the fraction of pairs of edges that have two colors in common. Then, the cost of the cluster is equal to

$3|C_i| - 1 + (\gamma_i - \kappa_i)(|C_i| - 1) - \frac{1}{|C_i|} \cdot \sum_{v \in V_i} d_{i,v}^2,$

where $d_{i,v}$ is the degree of $v$ in the graph $G_i$.

Next, we show that if $\gamma_i$ (the fraction of pairs of edges with no color in common) and $\kappa_i$ (the fraction of pairs of edges with two colors in common) are not too large, then there is a dominant single color in the cluster $C_i$. Formally, we show in Section 7.2 the following:

Lemma 7.2. Let $1/1000 > \eta > 0$ be some constant to be specified later. Fix a cluster $C_i$ (for some $i \in [k]$). If $\kappa_i \leq \eta^3$ and $\gamma_i \leq \eta/3$, then there is a set of a $(1 - \eta)$ fraction of edges in $E_i$ that all have the same color $c_i$.

Then, we show that we can identify a large subcollection of clusters for which both $\kappa_i$ and $\gamma_i$ are small.
Moreover, for each cluster in this subcollection, we obtain a meaningful bound on the sum of the squared degrees of the vertices in the subcollection, and this bound is tied to the clustering cost appearing in Lemma 7.1. Formally, we show in Section 7.3 the following:

Lemma 7.3. There is some $I \subseteq [k]$ such that the following holds:

1. for all $i \in I$, $\kappa_i \leq \eta^3$ and $\gamma_i \leq \eta/3$.
2. $\sum_{i \in I} |E_i| \geq (1 - 38\delta^2) \cdot |E|$.

In addition, we also have for all $i \in I$:

$\frac{1}{|E_i|} \cdot \sum_{v \in G_i} d_{i,v}^2 \geq (1 - \delta^3) d. \quad (5)$

Finally, we show that a clustering cost of $3|E| - (1 - \delta^5)dk$ implies that we can construct a set of vertices $S \subseteq [n]$ of size $n/2$ that covers most of the edges.

Lemma 7.4. For sufficiently large $n$, if the clustering cost is at most $3|E| - (1 - \delta^5)kd$, then by setting $\eta = \delta^5$, there exists a subset of exactly $n/2$ vertices that covers at least a $1 - 10\delta^{1.5}$ fraction of $E$.

For small enough $\delta$, this implies our constructed $n/2$-sized subset misses strictly fewer than a $\delta$ fraction of the edges, contradicting the soundness assumption.

³ Sometimes in the analysis, for the sake of ease of presentation, we bring back the bad edges removed from the construction of the point set. This helps with using the regularity of the graph. However, this does not affect the conclusion: $k > \sqrt{nd}$ implies $|E_{\mathrm{bad}}| \leq 50nd/k < 50k = o(kd) \ll \tau kd$, so their impact on the clustering cost is negligible.

7.1 Proof of Lemma 7.1: Connecting Cluster Cost to Properties of the Graph

Lemma 7.1. Fix a cluster $C_i$ (for $i \in [k]$). Let $\gamma_i$ be the fraction of pairs of edges $e, e' \in E_i$ that have no color in common, and $\kappa_i$ be the fraction of pairs of edges that have two colors in common. Then, the cost of the cluster is equal to

$3|C_i| - 1 + (\gamma_i - \kappa_i)(|C_i| - 1) - \frac{1}{|C_i|} \cdot \sum_{v \in V_i} d_{i,v}^2,$

where $d_{i,v}$ is the degree of $v$ in the graph $G_i$.

Proof.
First note that from Lemma 3.2:

$\mathrm{Cost}(C_i) = \frac{1}{2|C_i|} \sum_{e \in E_i} \sum_{e' \in E_i} \|p_e - p_{e'}\|^2 = \sum_{e \in E_i} \|p_e\|^2 - \frac{1}{|C_i|} \Big\| \sum_{e \in E_i} p_e \Big\|^2. \quad (6)$

Since $\|p_e\|^2 = 4$ for all $e$, the first term is $4|C_i|$. Now consider the vector sum $S = \sum_{e \in E_i} p_e$. Recall $p_e = e_u + e_v + e_{\rho(u)+n} + e_{\rho(v)+n}$. We can write $S$ in terms of the basis vectors: the coordinate $e_v$ appears $d_{i,v}$ times (once for each incident edge in $E_i$), and the coordinate $e_{c+n}$ appears $\kappa_{i,c}$ times, where $\kappa_{i,c}$ is the number of edges in $E_i$ that have an endpoint of color $c$. Thus:

$\|S\|^2 = \sum_{v \in V_i} d_{i,v}^2 + \sum_{c \in [k]} \kappa_{i,c}^2.$

The term $\sum_c \kappa_{i,c}^2$ counts the number of pairs $(e, e')$ that share a specific color $c$, summed over all colors:

$\sum_c \kappa_{i,c}^2 = \sum_c \Big( \sum_{e \in E_i} \mathbf{1}_{c \in \rho(e)} \Big)^2 = \sum_{e, e' \in E_i} \sum_c \mathbf{1}_{c \in \rho(e) \cap \rho(e')} = \sum_{e, e'} |\rho(e) \cap \rho(e')|.$

We split the sum into $e = e'$ and $e \neq e'$. For $e = e'$, we have $|\rho(e) \cap \rho(e')| = 2$; there are $|C_i|$ such terms, so their total contribution is $2|C_i|$. For $e \neq e'$, let $\gamma_i$ be the fraction with no color in common and $\kappa_i$ the fraction with two colors in common; the remaining fraction $(1 - \gamma_i - \kappa_i)$ have exactly one color in common. The sum is:

$\sum_{e \neq e'} |\rho(e) \cap \rho(e')| = |C_i|(|C_i| - 1)\big[0 \cdot \gamma_i + 1 \cdot (1 - \gamma_i - \kappa_i) + 2\kappa_i\big] = |C_i|(|C_i| - 1)(1 - \gamma_i + \kappa_i).$

Thus, $\|S\|^2 = \sum d_{i,v}^2 + 2|C_i| + |C_i|(|C_i| - 1)(1 - \gamma_i + \kappa_i)$. Substituting back into (6), we have:

$\mathrm{Cost}(C_i) = 4|C_i| - \frac{1}{|C_i|}\Big[\sum_{v \in V_i} d_{i,v}^2 + 2|C_i| + |C_i|(|C_i| - 1)(1 - \gamma_i + \kappa_i)\Big]$
$= 4|C_i| - 2 - (|C_i| - 1)(1 - \gamma_i + \kappa_i) - \frac{1}{|C_i|} \sum d_{i,v}^2$
$= 4|C_i| - 2 - (|C_i| - 1) + (|C_i| - 1)(\gamma_i - \kappa_i) - \frac{1}{|C_i|} \sum d_{i,v}^2$
$= 3|C_i| - 1 + (|C_i| - 1)(\gamma_i - \kappa_i) - \frac{1}{|C_i|} \sum d_{i,v}^2.$
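The cost identity of Lemma 7.1 can be verified by brute force. The following Python sketch does so on a tiny toy instance: a hypothetical 5-vertex graph with 3 colors, chosen only for illustration (its parameters are not those of the paper's construction).

```python
# Brute-force check of the cost identity of Lemma 7.1 on a tiny toy
# instance (a hypothetical 5-vertex graph with 3 colors, not the
# paper's construction parameters).
n, k = 5, 3
rho = [0, 1, 2, 0, 1]                              # vertex colors
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]   # all bichromatic
m = len(edges)

def embed(u, v):
    """p_e = e_u + e_v + e_{rho(u)+n} + e_{rho(v)+n} in {0,1}^{n+k}."""
    p = [0.0] * (n + k)
    p[u] = p[v] = 1.0
    p[n + rho[u]] = p[n + rho[v]] = 1.0
    return p

pts = [embed(u, v) for (u, v) in edges]

# Direct k-means cost of the single cluster containing every edge point:
centroid = [sum(p[j] for p in pts) / m for j in range(n + k)]
direct = sum(sum((p[j] - c) ** 2 for j, c in enumerate(centroid)) for p in pts)

# Formula: 3|C| - 1 + (gamma - kappa)(|C| - 1) - (1/|C|) * sum_v d_v^2
deg = [0] * n
for u, v in edges:
    deg[u] += 1
    deg[v] += 1
shared = [len({rho[a], rho[b]} & {rho[c], rho[d]})
          for (a, b) in edges for (c, d) in edges if (a, b) != (c, d)]
gamma = shared.count(0) / (m * (m - 1))
kappa = shared.count(2) / (m * (m - 1))
formula = 3 * m - 1 + (gamma - kappa) * (m - 1) - sum(x * x for x in deg) / m

assert abs(direct - formula) < 1e-9  # both equal 8.8 on this instance
```

On this instance $\gamma = 0$, $\kappa = 0.2$, $\sum_v d_v^2 = 22$, and both sides evaluate to $8.8$, matching the centroid identity of Lemma 3.2.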
7.2 Proof of Lemma 7.2: Typical Clusters are Monochromatic

Lemma 7.2. Let $1/1000 > \eta > 0$ be some constant to be specified later. Fix a cluster $C_i$ (for some $i \in [k]$). If $\kappa_i \leq \eta^3$ and $\gamma_i \leq \eta/3$, then there is a set of a $(1 - \eta)$ fraction of edges in $E_i$ that all have the same color $c_i$.

Proof. Assume towards contradiction that there is no color intersecting more than a $(1 - \eta)$ fraction of the edges. Let $f_c$ be the fraction of edges intersecting color $c$. We first provide an upper bound on the probability that two distinct edges $e, e'$ drawn uniformly at random overlap in at least one color. Let $\tilde{f}_c = \frac{f_c|C_i| - 1}{|C_i| - 1} \leq f_c$: for a pair of distinct random edges $e, e'$, we have $\Pr[c \in \rho(e') \mid c \in \rho(e)] = \tilde{f}_c \leq f_c$. Therefore, the probability of overlap is at most:

$\Pr_{e \neq e'}[\exists c \in \rho(e) \cap \rho(e')] \leq \sum_c \Pr_{e \neq e'}[c \in \rho(e) \text{ and } c \in \rho(e')] \leq \sum_c f_c^2.$

Since the probability of overlap is $1 - \gamma_i$, we get

$1 - \eta/3 \leq 1 - \gamma_i \leq \sum_c f_c^2. \quad (7)$

Since each edge gets exactly 2 colors, $\sum f_c = 2$. Order the colors by decreasing frequency, $f_1 \geq f_2 \geq \ldots$. Let $f_{j,\ell}$ be the fraction of edges with colors $j$ and $\ell$. Note that $\kappa_i$ is exactly the fraction of distinct pairs of edges sharing two colors. Let $X_{j,\ell} = f_{j,\ell}|C_i|$ be the number of edges with colors $j$ and $\ell$. We have $\sum_{j < \ell} \frac{X_{j,\ell}(X_{j,\ell} - 1)}{|C_i|(|C_i| - 1)} = \kappa_i$. Since $\frac{X(X-1)}{N(N-1)} \geq \frac{X^2 - X}{N^2} = f^2 - \frac{f}{N}$ for $f = X/N$, we obtain $f_{j,\ell}^2 \leq \kappa_i + \frac{f_{j,\ell}}{|C_i|} \leq \kappa_i + \frac{1}{|C_i|}$. Since we pruned clusters smaller than $\xi d$, and $d = \omega(1)$, the term $1/|C_i|$ vanishes for large $n$. Thus, we can define $\tilde{\kappa}_i := \kappa_i + 1/|C_i| \leq 2\eta^3$, yielding $f_{j,\ell} \leq \sqrt{\tilde{\kappa}_i}$. By the inclusion-exclusion principle, for any $t$ colors we have $\sum_{j \in [t]} f_j - \sum_{j < \ell \in [t]} f_{j,\ell} \leq 1$, which implies:

$\sum_{j=1}^{t} f_j \leq 1 + \binom{t}{2} \sqrt{\tilde{\kappa}_i}. \quad (8)$

We seek to maximize $\sum f_c^2$ under these constraints.
The sum is maximized when the mass is concentrated on as few colors as possible. Define $L = \{c \geq 2 : f_c = f_2\}$. As long as $f_1 \neq 1 - \eta$ and $|L| \leq 3$, continuously transfer an arbitrarily small mass $\varepsilon > 0$ from each color in $L$ to $f_1$. Update $L$ by adding colors whose frequency drops to equal the next largest frequency. This strictly increases $\sum f_c^2$ (by convexity), preserves the non-increasing order, and maintains the partial sum $\sum_{j=1}^{1+|L|} f_j$, satisfying Eq. (8) for $t = 1 + |L|$.

Case 1: $f_1 = 1 - \eta$ and $|L| \leq 3$. Applying Eq. (8) for $t = 1 + |L|$ ensures $f_1 + |L|f_2 \leq 1 + \binom{|L|+1}{2}\sqrt{\tilde{\kappa}_i}$. Substituting $f_1 = 1 - \eta$:

$f_2 \leq \frac{\eta}{|L|} + \frac{|L| + 1}{2}\sqrt{\tilde{\kappa}_i}.$

Since $1 \leq |L| \leq 3$, bounding each term by its maximum over this range gives $f_2 \leq \eta + 2\sqrt{\tilde{\kappa}_i}$. The sum of squares is upper bounded by concentrating the remaining mass on $f_2$:

$\sum f_c^2 \leq f_1^2 + f_2(2 - f_1) \leq (1 - \eta)^2 + (\eta + 2\sqrt{\tilde{\kappa}_i})(1 + \eta).$

Since $\tilde{\kappa}_i \leq 2\eta^3$, we have $\sqrt{\tilde{\kappa}_i} \leq 1.5\eta^{1.5}$. The sum expands to:

$1 - 2\eta + \eta^2 + \eta + \eta^2 + 3\eta^{1.5}(1 + \eta) \leq 1 - \eta + 2\eta^2 + 6\eta^{1.5}.$

For $\eta \leq 1/1000$, the quantity $2\eta^2 + 6\eta^{1.5}$ is less than $\frac{2}{3}\eta$. Thus, the sum is strictly less than $1 - \eta/3$, contradicting Eq. (7).

Case 2: $|L| = 4$. If the procedure stops because $|L|$ reaches 4, then $f_2 = f_3 = f_4 = f_5$. Evaluating Eq. (8) for $t = 5$ implies $f_1 + 4f_2 \leq 1 + \binom{5}{2}\sqrt{\tilde{\kappa}_i} = 1 + 10\sqrt{\tilde{\kappa}_i}$. Hence, $f_2 \leq \frac{1 + 10\sqrt{\tilde{\kappa}_i} - f_1}{4}$. The sum of squares is bounded by:

$\sum f_c^2 \leq f_1^2 + 2f_2 \leq f_1^2 + \frac{1 - f_1}{2} + 5\sqrt{\tilde{\kappa}_i}.$

The quadratic function mapping $x$ to $x^2 - x/2$ is maximized on $[1/4, 1 - \eta]$ at the boundary $x = 1 - \eta$, so the value is bounded by:

$(1 - \eta)^2 + \frac{\eta}{2} + 5(1.5\eta^{1.5}) = 1 - 1.5\eta + \eta^2 + 7.5\eta^{1.5}.$

We require this to be less than $1 - \eta/3$, which amounts to $\frac{7}{6}\eta \approx 1.16\eta > \eta^2 + 7.5\eta^{1.5}$. Dividing by $\eta$, we need $1.16 > \eta + 7.5\sqrt{\eta}$. For $\eta \leq 1/1000$, $\sqrt{\eta} \approx 0.0316$, making the right side roughly $0.238 < 1.16$.
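The closing arithmetic of both cases depends only on $\eta \leq 1/1000$; the following Python sketch checks it at the extreme allowed value:

```python
import math

# Numeric check of the eta-arithmetic that closes both cases, at the
# extreme allowed value eta = 1/1000.
eta = 1 / 1000

# Case 1 needs 2*eta^2 + 6*eta^1.5 < (2/3)*eta:
case1_slack = (2 / 3) * eta - (2 * eta**2 + 6 * eta**1.5)
assert case1_slack > 0

# Case 2 needs eta + 7.5*sqrt(eta) < 1.16 (it is roughly 0.238):
case2_value = eta + 7.5 * math.sqrt(eta)
assert case2_value < 1.16
```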
This provides the final contradiction.

7.3 Proof of Lemma 7.3: Identifying large monochromatic clusters

Lemma 7.3. There is some $I \subseteq [k]$ such that the following holds:

1. for all $i \in I$, $\kappa_i \leq \eta^3$ and $\gamma_i \leq \eta/3$.
2. $\sum_{i \in I} |E_i| \geq (1 - 38\delta^2) \cdot |E|$.

In addition, we also have for all $i \in I$:

$\frac{1}{|E_i|} \cdot \sum_{v \in G_i} d_{i,v}^2 \geq (1 - \delta^3) d. \quad (5)$

For the sake of presentation, we use $\tau := \delta^5$ and $\omega := \delta^3$. Moreover, recall that $\alpha = \delta^{10}$, and we set $\eta = \delta^5$. We first apply a pre-processing step to discard small clusters: we discard all clusters satisfying $|E_i| < 10\tau^{-2}d$. The total number of edges discarded is bounded by $k \cdot 10\tau^{-2}d = 10\tau^{-2}kd$. Because $k = o(n)$ and $|E| = nd/2$, this quantity is $o(|E|)$, so removing them does not significantly affect the total edge mass. From now on, we assume $|E_i| \geq 10\tau^{-2}d = 10\delta^{-10}d$ for all remaining clusters. Because $d = (\log n)^L$, for sufficiently large $n$ this size guarantees that $\frac{36 \log k}{|E_i| - 1} \leq \eta^3$. Thus, from Lemma 5.8, $\kappa_i \leq \eta^3$ is satisfied for all surviving clusters. For readability, we break the proof of Lemma 7.3 into the following subsections.

7.3.1 Large Clusters are Bad

Lemma 7.5. Let $\mathcal{C} = \{C_1, \ldots, C_k\}$ be any clustering of the point set $P$. Suppose there exists a cluster $C \in \mathcal{C}$ such that $|C| \geq 16kd$. Assuming $n$ is sufficiently large, the cost of the clustering is greater than $3|E| - (1 - \delta^5)kd$.

Proof. For any cluster $A \in \mathcal{C}$, define its relative benefit as $\Delta(A) = 3|A| - \mathrm{Cost}(A)$. Using Lemma 7.1:

$\Delta(A) = 1 - (\gamma_A - \kappa_A)(|A| - 1) + \frac{1}{|A|} \sum_{v \in V_A} d_{A,v}^2.$

Since the maximum degree in the induced subgraph is $d$, we have $\frac{1}{|A|} \sum d_{A,v}^2 \leq d \cdot \frac{1}{|A|} \sum d_{A,v} = \frac{d \cdot 2|A|}{|A|} = 2d$. From Lemma 5.8, $\kappa_A(|A| - 1) \leq 36 \log k$. Because $\gamma_A \geq 0$, we have $-(\gamma_A - \kappa_A)(|A| - 1) \leq 36 \log k$. Thus, $\Delta(A) \leq 1 + 36 \log k + 2d$.
Summing over all $k$ clusters, the maximum possible total benefit is $B_{\mathrm{all}} \leq k(2d + 36 \log k + 1) \leq 3kd$. Now, consider the giant cluster $C$ with $|C| \geq 16kd$. Since $k > \sqrt{n}$, we have $k^2 > n$, hence $kd > nd/k$ and thus $|C| \geq \frac{16nd}{k}$. Let $f_c$ be the fraction of edges in $C$ incident to color $c$. From Proposition 5.4, the maximum number of edges incident to color $c$ is at most $d|U_c| \leq \frac{1.1nd}{k}$. Thus, the maximum frequency is $\mu = \max_c f_c \leq \frac{1.1nd/k}{|C|} \leq \frac{1.1nd/k}{16nd/k} = \frac{1.1}{16} < 0.07$. The fraction of pairs sharing at least one color is bounded by $1 - \gamma_C \leq \sum f_c^2 \leq \mu \sum f_c = 2\mu \leq 0.14$. Thus, $\gamma_C \geq 0.86$. Similarly, $\kappa_C \leq 1 - \gamma_C \leq 0.14$. Therefore, the penalty coefficient is firmly bounded:

$\gamma_C - \kappa_C \geq 0.86 - 0.14 = 0.72 \geq \frac{1}{2}.$

The benefit of the giant cluster is bounded by:

$\Delta(C) \leq 1 - \frac{1}{2}(|C| - 1) + 2d \leq 2d + 1.5 - \frac{|C|}{2}.$

We refine our global bound on the total benefit $B$ by using this penalty for $C$ alongside the upper bound for the remaining $k - 1$ clusters:

$B \leq \Delta(C) + \sum_{A \neq C} \Delta(A) \leq 2d + 1.5 - \frac{|C|}{2} + 3kd.$

Since $|C| \geq 16kd$, we have $-\frac{|C|}{2} \leq -8kd$. Thus, $B \leq 3kd + 2d + 1.5 - 8kd < -4kd < 0$. Consequently, the total cost is at least $3|E| - B > 3|E| > 3|E| - (1 - \delta^5)kd$.

7.3.2 Universal Expansion Bound on Squared Degrees

Before evaluating the global cost, we establish an upper bound on $S_i := \frac{1}{|E_i|} \sum_{v \in V_i} d_{i,v}^2$ for every surviving cluster $i$. We use the approach from [CK19]. Consider the set of low-degree vertices $W_i = \{v \in V_i \mid d_{i,v} \leq \sqrt{\alpha}d\}$. The contribution of these vertices to the sum of squares is bounded by their maximum degree: $\sum_{v \in W_i} d_{i,v}^2 \leq \sqrt{\alpha}d \sum_{v \in W_i} d_{i,v} \leq 2\sqrt{\alpha}d|E_i|$. Now, consider the set $W_i' = V_i \setminus W_i$ of vertices with $d_{i,v} > \sqrt{\alpha}d$. By the contrapositive of Lemma 7.5, our global cost assumption $\Phi(\mathcal{C}) \leq 3|E| - (1 - \tau)kd$ guarantees that no cluster exceeds size $16kd$.
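The constants in the giant-cluster argument can be checked numerically. The following Python sketch does so; the values of $k$ and $d$ are illustrative placeholders (any regime with $kd \gg d$ behaves the same way).

```python
# Numeric check of the giant-cluster arithmetic in Lemma 7.5, with
# illustrative (hypothetical) values of k and d.
assert 1.1 / 16 < 0.07                    # maximum color frequency mu
assert 2 * (1.1 / 16) <= 0.14             # 1 - gamma_C <= 2*mu
assert abs((0.86 - 0.14) - 0.72) < 1e-12  # gamma_C - kappa_C >= 0.72 >= 1/2

k, d = 1000, 100
C_size = 16 * k * d
B = 3 * k * d + 2 * d + 1.5 - C_size / 2  # refined total-benefit bound
assert B < -4 * k * d < 0                 # -499798.5 < -400000
```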
Consequently, the number of high-degree vertices in the induced subgraph is bounded by $|W_i'| \leq \frac{2|E_i|}{\sqrt{\alpha}d} \leq \frac{32k}{\sqrt{\alpha}}$. Because $k = o(|V|/d^{\omega(1)})$, and $d^{\omega(1)}$ grows strictly faster than any polylogarithmic function, this guarantees that $|W_i'| \ll n/\mathrm{polylog}\,n$, which is small enough to apply the $(\mathrm{polylog}\,n, \alpha)$-small set vertex expansion property of Hypothesis 4.2. By expansion, the number of internal edges evaluated in the base graph $G$ is bounded by $2e_G(W_i') \leq \alpha d|W_i'| + |W_i'|$. The sum of degrees inside $G_i$ for these vertices is exactly twice the internal edges plus the edges crossing to $W_i$ (which is bounded by $|E_i|$). Thus:

$\sum_{v \in W_i'} d_{i,v} \leq \alpha d|W_i'| + |W_i'| + |E_i| \leq \alpha d \cdot \frac{2|E_i|}{\sqrt{\alpha}d} + \frac{2|E_i|}{\sqrt{\alpha}d} + |E_i| \leq (1 + 3\sqrt{\alpha})|E_i|,$

where the final inequality holds for large $d$. Their contribution to the sum of squares is $\sum_{v \in W_i'} d_{i,v}^2 \leq d \sum_{v \in W_i'} d_{i,v} \leq d(1 + 3\sqrt{\alpha})|E_i|$. Combining both sets, we conclude that for every cluster:

$S_i = \frac{1}{|E_i|} \sum_{v \in V_i} d_{i,v}^2 \leq 2\sqrt{\alpha}d + d(1 + 3\sqrt{\alpha}) \leq d(1 + 5\sqrt{\alpha}). \quad (9)$

7.3.3 Restricting to clusters with small $\gamma_i$

Claim 7.6. Let $B_\gamma = \{i \in [k] \mid \gamma_i > \eta/3\}$ be the set of clusters with high color disagreement. The total number of edges in $B_\gamma$ is bounded by $\sum_{i \in B_\gamma} |E_i| \leq 22kd = o(|E|)$.

Proof. Summing the cost formula from Lemma 7.1 over all clusters gives:

$3|E| - (1 - \tau)kd \geq 3|E| - k + \sum_{i=1}^{k} \gamma_i(|E_i| - 1) - \sum_{i=1}^{k} \kappa_i(|E_i| - 1) - \sum_{i=1}^{k} S_i.$

Rearranging to isolate the $\gamma_i$ terms yields:

$\sum_{i=1}^{k} \gamma_i(|E_i| - 1) \leq \tau kd - kd + k + \sum_{i=1}^{k} S_i + \sum_{i=1}^{k} \kappa_i(|E_i| - 1).$

Substituting our bounds $\sum S_i \leq kd(1 + 5\sqrt{\alpha})$ and $\sum \kappa_i(|E_i| - 1) \leq 36k \log k$:

$\sum_{i=1}^{k} \gamma_i(|E_i| - 1) \leq \tau kd + 5\sqrt{\alpha}kd + k + 36k \log k.$

By our choice of $\tau = \delta^5$ aligning with $\sqrt{\alpha} = \delta^5$, the right-hand side is bounded by $\tau kd + 5\tau kd + \tau kd = 7\tau kd$.
Restricting the sum to $B_\gamma$ (where $\gamma_i > \eta/3$) yields $\frac{\eta}{3} \sum_{i \in B_\gamma} (|E_i| - 1) \leq 7\tau kd$. Because we defined $\eta = \tau$, this simplifies to:

$\sum_{i \in B_\gamma} |E_i| \leq 21kd + |B_\gamma| \leq 22kd = o(|E|).$

7.3.4 Identifying clusters with optimal degree sums

We return to the clustering cost equation to bound the clusters with poor degree concentration. Let $I_{\mathrm{bad}} = \{i \in [k] \mid S_i < d(1 - \omega)\}$. Since Equation (9) establishes $S_i \leq d(1 + 5\sqrt{\alpha})$ for all $k$ clusters, we partition the global sum into $I_{\mathrm{bad}}$ and $[k] \setminus I_{\mathrm{bad}}$:

$\sum_{i=1}^{k} S_i = \sum_{i \in I_{\mathrm{bad}}} S_i + \sum_{i \in [k] \setminus I_{\mathrm{bad}}} S_i \leq |I_{\mathrm{bad}}|d(1 - \omega) + (k - |I_{\mathrm{bad}}|)d(1 + 5\sqrt{\alpha}).$

Isolating $\sum S_i$ from the initial cost equation gave $\sum_{i=1}^{k} S_i \geq kd(1 - 1.1\tau)$. Substituting this lower bound yields:

$kd(1 - 1.1\tau) \leq |I_{\mathrm{bad}}|d(1 - \omega) + (k - |I_{\mathrm{bad}}|)d(1 + 5\sqrt{\alpha}).$

Dividing by $kd$ and letting $x = |I_{\mathrm{bad}}|/k$ be the fraction of such clusters, we have:

$1 - 1.1\tau \leq x(1 - \omega) + (1 - x)(1 + 5\sqrt{\alpha}) = 1 + 5\sqrt{\alpha} - x(\omega + 5\sqrt{\alpha}).$

Subtracting 1 and isolating the $x$ terms on the left:

$x(\omega + 5\sqrt{\alpha}) \leq 1.1\tau + 5\sqrt{\alpha}.$

Since $\omega = \delta^3$, $\tau = \delta^5$, and $\sqrt{\alpha} = \delta^5$, we have:

$x(\delta^3 + 5\delta^5) \leq 1.1\delta^5 + 5\delta^5 = 6.1\delta^5.$

Thus, $x \leq \frac{6.1\delta^5}{\delta^3} = 6.1\delta^2$. This bounds the total number of sub-optimal clusters as $|I_{\mathrm{bad}}| \leq 6.1\delta^2 k$.

7.3.5 Bounding the edges of the good clusters

We define our final good set of clusters as $I = [k] \setminus (P_{\mathrm{small}} \cup B_\gamma \cup I_{\mathrm{bad}})$, where $P_{\mathrm{small}}$ is the set of pruned clusters. By definition, every $i \in I$ satisfies $\gamma_i \leq \eta/3$, $\kappa_i \leq \eta^3$, and $S_i \geq d(1 - \omega)$. We now show Item 2, namely that most edges are retained in $I$. The omitted edges belong to $P_{\mathrm{small}} \cup B_\gamma \cup I_{\mathrm{bad}}$. We established that $\sum_{i \in P_{\mathrm{small}}} |E_i| = o(|E|)$ and $\sum_{i \in B_\gamma} |E_i| \leq 22kd = o(|E|)$. For the clusters in $I_{\mathrm{bad}} \setminus B_\gamma$, we leverage the fact that they satisfy $\gamma_i \leq \eta/3$. Fix any such cluster $i$.
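The $\delta$-arithmetic bounding the fraction of sub-optimal clusters can be checked mechanically; the following Python sketch verifies it for a few small (illustrative) values of $\delta$:

```python
# Numeric check of the fraction bound on I_bad for a few small delta values.
for delta in (1e-2, 1e-3, 1e-4):
    omega, tau, sqrt_alpha = delta**3, delta**5, delta**5
    # Right-hand side: 1.1*tau + 5*sqrt(alpha) = 6.1*delta^5.
    rhs = 1.1 * tau + 5 * sqrt_alpha
    assert abs(rhs - 6.1 * delta**5) < 1e-12 * rhs
    # x*(omega + 5*sqrt(alpha)) <= rhs, so x <= rhs/(omega + 5*sqrt(alpha)),
    # and dropping the 5*delta^5 term in the denominator only loosens it.
    x_max = rhs / (omega + 5 * sqrt_alpha)
    assert x_max < 6.1 * delta**2
```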
By Proposition 5.4, with high probability no color is assigned to more than $1.1n/k$ vertices. The maximum number of edges incident to any single color $c$ is therefore at most $1.1nd/k$. Let $e \in E_i$ with colors $\rho(e) = \{c_1, c_2\}$. The number of edges in $E_i$ sharing at least one color with $e$ is at most $2.2nd/k$. Thus, $e$ shares no color with the remaining set of edges, which has size at least $|E_i| - \frac{2.2nd}{k}$. Summing this over all $e \in E_i$ counts each color-disjoint pair exactly twice, so the total number of disjoint pairs is at least $\frac{1}{2}|E_i|\left(|E_i| - \frac{2.2nd}{k}\right)$. Dividing by the total number of pairs $\binom{|E_i|}{2} \leq \frac{1}{2}|E_i|^2$, we get:

$\gamma_i \geq \frac{\frac{1}{2}|E_i|(|E_i| - 2.2nd/k)}{\frac{1}{2}|E_i|^2} = 1 - \frac{2.2nd}{k|E_i|}.$

If $|E_i| > \frac{3nd}{k}$, then $\gamma_i \geq 1 - \frac{2.2}{3} \geq 0.26$. However, we know $\gamma_i \leq \eta/3 < 0.01$, a contradiction. Thus, every cluster in $I_{\mathrm{bad}} \setminus B_\gamma$ must satisfy $|E_i| \leq \frac{3nd}{k}$. Since there are at most $|I_{\mathrm{bad}}| \leq 6.1\delta^2 k$ such clusters, their maximum total edge mass is bounded by:

$6.1\delta^2 k \cdot \frac{3nd}{k} = 18.3\delta^2 nd = 36.6\delta^2|E|.$

Adding the limits together, the total number of edges omitted from $I$ is bounded by $36.6\delta^2|E| + o(|E|) \leq 38\delta^2|E|$ for sufficiently large $n$. Thus, $\sum_{i \in I} |E_i| \geq (1 - 38\delta^2)|E|$.

7.4 A cheap clustering admits a small vertex cover

Lemma 7.4. For sufficiently large $n$, if the clustering cost is at most $3|E| - (1 - \delta^5)kd$, then by setting $\eta = \delta^5$, there exists a subset of exactly $n/2$ vertices that covers at least a $1 - 10\delta^{1.5}$ fraction of $E$.

Proof. In any cluster $C_i$ with $i \in I$, Lemma 7.2 guarantees the existence of a dominant color $c_i$ shared by at least a $(1 - \eta)$ fraction of the edges in $E_i$. Let $\tilde{E}_i \subseteq E_i$ be this subset of edges. Let $\tilde{V}_i \subseteq V_i$ be the set of endpoints of edges in $\tilde{E}_i$ that have color $c_i$.
Since bad edges (those with monochromatic endpoints) were removed from $P$ in Section 5.1, no edge connects two vertices of the same color. Thus, $\tilde{V}_i$ is an independent set in $G$, and each edge in $\tilde{E}_i$ has exactly one endpoint in $\tilde{V}_i$. Before proceeding, we apply a secondary pruning step: let $I' \subseteq I$ be the subset of good clusters satisfying $|E_i| \geq \frac{\tau nd}{k}$. Discarding the clusters with $|E_i| < \frac{\tau nd}{k}$ removes at most $k \cdot \frac{\tau nd}{k} = \tau nd = 2\tau|E|$ edges in total. Thus, the pruned set $I'$ retains at least $(1 - 38\delta^2 - 2\tau)|E|$ edges. Because $\tau = \delta^5 \ll \delta^2$, we bound this retained mass by $(1 - 39\delta^2)|E|$. Fix $i \in I'$. Let $S_i$ be the neighborhood of $\tilde{V}_i$ in $G_i$, and let $\tilde{S}_i \subseteq S_i$ be the set of unique neighbors of $\tilde{V}_i$ in $G_i$, meaning each vertex in $\tilde{S}_i$ has exactly one neighbor in $\tilde{V}_i$ within $G_i$.

Claim 7.7. $|\tilde{S}_i| \geq |\tilde{E}_i| - 2\alpha d|\tilde{V}_i|$.

Proof. Let $R \subseteq V$ be the set of vertices in the graph $G$ which have more than one neighbor in $\tilde{V}_i$, and let $m_R$ be the number of edges between $\tilde{V}_i$ and $R$ in $G$. Since every vertex in $R$ has at least two neighbors in $\tilde{V}_i$, we have $m_R \geq 2|R|$. Because $\tilde{V}_i \subseteq U_{c_i}$, Proposition 5.4 guarantees $|\tilde{V}_i| \leq \frac{1.1n}{k}$ with high probability. Thus, by the $(\mathrm{polylog}\,n, \alpha)$-small set vertex expansion property (Definition 4.1), the neighborhood size in $G$ satisfies $|N_G(\tilde{V}_i)| \geq d|\tilde{V}_i|(1 - \alpha)$. The number of unique neighbors of $\tilde{V}_i$ in $G$ is therefore at least $d|\tilde{V}_i|(1 - \alpha) - |R|$. Since the total number of outgoing edges from $\tilde{V}_i$ in $G$ is exactly $d|\tilde{V}_i|$, we have:

$m_R + \big(d|\tilde{V}_i|(1 - \alpha) - |R|\big) \leq d|\tilde{V}_i| \implies m_R - |R| \leq \alpha d|\tilde{V}_i|.$

Using $m_R \geq 2|R|$, this gives $m_R/2 \leq \alpha d|\tilde{V}_i|$, and consequently $m_R \leq 2\alpha d|\tilde{V}_i|$. In $G_i$, the set of edges incident to $\tilde{V}_i$ is precisely $\tilde{E}_i$. The number of those edges connecting to $R$ is at most $m_R$.
Thus, the number of unique neighbors in $G_i$ is $|\tilde{S}_i| \geq |\tilde{E}_i| - m_R \geq |\tilde{E}_i| - 2\alpha d|\tilde{V}_i|$.

Let $\tilde{G}_i$ be the bipartite subgraph of $G_i$ induced by the edges between $\tilde{V}_i$ and $\tilde{S}_i$, and let $\tilde{d}_{i,v}$ be the degree of vertex $v$ in $\tilde{G}_i$. For all $v \in \tilde{G}_i$, $\tilde{d}_{i,v} \leq d_{i,v} \leq d$. Since every vertex in $\tilde{S}_i$ has exactly one neighbor in $\tilde{V}_i$, $\tilde{G}_i$ is a collection of disjoint stars centered at $\tilde{V}_i$. Thus, the total number of edges in $\tilde{G}_i$ is exactly $|\tilde{A}_i| = |\tilde{S}_i|$. We establish the following relationship comparing the squared degrees of $G_i$ and $\tilde{G}_i$:

Claim 7.8. For $i \in I'$,

$\frac{\sum_{v \in V_i} d_{i,v}^2}{|E_i|} \leq \frac{\sum_{v \in \tilde{V}_i} \tilde{d}_{i,v}^2}{|\tilde{A}_i|} + 1 + 4d\left(\eta + \frac{3\alpha}{\tau}\right).$

Proof. For any $v \in V_i$, we have $0 \leq \tilde{d}_{i,v} \leq d_{i,v} \leq d$. Expanding the square gives:

$\tilde{d}_{i,v}^2 = \big(d_{i,v} - (d_{i,v} - \tilde{d}_{i,v})\big)^2 \geq d_{i,v}^2 - 2d_{i,v}(d_{i,v} - \tilde{d}_{i,v}) \geq d_{i,v}^2 - 2d(d_{i,v} - \tilde{d}_{i,v}).$

Summing over all $v \in V_i$: the sum of degrees in $G_i$ is $2|E_i|$ and in $\tilde{G}_i$ is $2|\tilde{A}_i|$, so $\sum (d_{i,v} - \tilde{d}_{i,v}) = 2(|E_i| - |\tilde{A}_i|)$. Substituting this, we obtain $\sum \tilde{d}_{i,v}^2 \geq \sum d_{i,v}^2 - 4d(|E_i| - |\tilde{A}_i|)$. Dividing by $|E_i|$ yields:

$\frac{\sum d_{i,v}^2}{|E_i|} \leq \frac{\sum \tilde{d}_{i,v}^2}{|E_i|} + 4d \cdot \frac{|E_i| - |\tilde{A}_i|}{|E_i|}.$

Because $|\tilde{A}_i| \leq |E_i|$, replacing the denominator only increases the first fraction, yielding:

$\frac{\sum d_{i,v}^2}{|E_i|} \leq \frac{\sum \tilde{d}_{i,v}^2}{|\tilde{A}_i|} + 4d \cdot \frac{|E_i| - |\tilde{A}_i|}{|E_i|}.$

In the bipartite graph $\tilde{G}_i$, all edges are exclusively between $\tilde{V}_i$ and $\tilde{S}_i$. Since every vertex in $\tilde{S}_i$ has degree exactly 1, we have $\sum_{v \in \tilde{S}_i} \tilde{d}_{i,v}^2 = |\tilde{S}_i| = |\tilde{A}_i|$. Thus:

$\frac{\sum \tilde{d}_{i,v}^2}{|\tilde{A}_i|} = \frac{\sum_{v \in \tilde{V}_i} \tilde{d}_{i,v}^2 + |\tilde{A}_i|}{|\tilde{A}_i|} = \frac{\sum_{v \in \tilde{V}_i} \tilde{d}_{i,v}^2}{|\tilde{A}_i|} + 1.$

For the error term, Claim 7.7 implies $|\tilde{A}_i| = |\tilde{S}_i| \geq |\tilde{E}_i| - 2\alpha d|\tilde{V}_i|$. Since $|\tilde{E}_i| \geq (1 - \eta)|E_i|$, we have $|E_i| - |\tilde{A}_i| \leq \eta|E_i| + 2\alpha d|\tilde{V}_i|$.
Because we restricted to clusters $i \in I'$, which satisfy $|E_i| \geq \frac{\tau nd}{k}$, and we bounded $|\tilde{V}_i| \leq \frac{1.1n}{k}$, we can bound the fraction:

$\frac{|E_i| - |\tilde{A}_i|}{|E_i|} \leq \eta + \frac{2\alpha d(1.1n/k)}{\tau nd/k} = \eta + \frac{2.2\alpha}{\tau} \leq \eta + \frac{3\alpha}{\tau}.$

Substituting this completes the proof.

From Lemma 7.3, since $I' \subseteq I$, for all $i \in I'$ we have $\frac{\sum_{v \in V_i} d_{i,v}^2}{|E_i|} \geq d(1 - \omega)$. Substituting our reduction from Claim 7.8, and noting that $|\tilde{A}_i| = \sum_{v \in \tilde{V}_i} \tilde{d}_{i,v}$, we isolate the ratio:

$\frac{\sum_{\tilde{V}_i} \tilde{d}_{i,v}^2}{\sum_{\tilde{V}_i} \tilde{d}_{i,v}} \geq d(1 - \omega) - 1 - 4d\left(\eta + \frac{3\alpha}{\tau}\right) \geq d\left(1 - \omega - 4\eta - \frac{12\alpha}{\tau} - \frac{1}{d}\right).$

Let $\theta := \omega + 4\eta + \frac{12\alpha}{\tau} + \frac{1}{d}$, so the ratio is lower bounded by $d(1 - \theta)$. This directly implies:

$\sum_{v \in \tilde{V}_i} \tilde{d}_{i,v}\big(\tilde{d}_{i,v} - d(1 - \theta)\big) \geq 0. \quad (10)$

We partition $\tilde{V}_i = \tilde{V}_i^+ \,\dot\cup\, L_i \,\dot\cup\, M_i \,\dot\cup\, R_i$ based on induced degree in $\tilde{G}_i$:

• $\tilde{V}_i^+$: $\tilde{d}_v \geq d(1 - \theta)$.
• $L_i$: $d(1 - 2\sqrt{\theta}) \leq \tilde{d}_v < d(1 - \theta)$.
• $M_i$: $2\sqrt{\theta}d < \tilde{d}_v < d(1 - 2\sqrt{\theta})$.
• $R_i$: $\tilde{d}_v \leq 2\sqrt{\theta}d$.

Equation (10) requires the positive contributions to outweigh the negative ones:

$\sum_{v \in \tilde{V}_i^+} \tilde{d}_v\big(\tilde{d}_v - d(1 - \theta)\big) \geq \sum_{v \in L_i \cup M_i \cup R_i} \tilde{d}_v\big(d(1 - \theta) - \tilde{d}_v\big). \quad (11)$

Since $\tilde{d}_v \leq d$, the maximum positive contribution per vertex in $\tilde{V}_i^+$ is $d(d - d(1 - \theta)) = \theta d^2$. Thus, the LHS is bounded by $|\tilde{V}_i^+|\theta d^2$. For $v \in M_i$, the quadratic $\tilde{d}_v(d(1 - \theta) - \tilde{d}_v)$ is a downward-facing parabola, minimized over the interval $[2\sqrt{\theta}d, d(1 - 2\sqrt{\theta})]$ at its endpoints; at either endpoint, the value is at least $\frac{1}{2}\sqrt{\theta}d^2$. Thus, the penalty from $M_i$ is at least $|M_i|\frac{1}{2}\sqrt{\theta}d^2$. Comparing this to the LHS bound yields $|M_i|\frac{1}{2}\sqrt{\theta}d^2 \leq |\tilde{V}_i^+|\theta d^2$, hence $|M_i| \leq 2\sqrt{\theta}|\tilde{V}_i^+|$. We define $|\tilde{A}(S)| = \sum_{v \in S} \tilde{d}_v$ as the number of edges in $\tilde{G}_i$ incident to a subset $S$. To bound $|\tilde{A}(M_i)|$, we choose $S_i \subseteq \tilde{V}_i^+$ to be the $|M_i|$ vertices in $\tilde{V}_i^+$ with the smallest degrees.
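The endpoint claim for the parabola can be checked numerically. The following Python sketch does so; the values of $\theta$ and $d$ are illustrative, and any sufficiently small $\theta$ behaves the same way.

```python
import math

# Numeric check that the penalty q(x) = x*(d*(1 - theta) - x) is at least
# (1/2)*sqrt(theta)*d^2 at both endpoints of the interval
# [2*sqrt(theta)*d, d*(1 - 2*sqrt(theta))]. Values are illustrative.
theta, d = 1e-3, 10**6
s = math.sqrt(theta)

def q(x):
    return x * (d * (1 - theta) - x)

lo, hi = 2 * s * d, d * (1 - 2 * s)
for x in (lo, hi):
    assert q(x) >= 0.5 * s * d * d
```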
Because $S_i \subseteq \widetilde{V}_i^+$, every $u \in S_i$ satisfies $\widetilde{d}_u \geq d(1-\theta) > \widetilde{d}_v$ for all $v \in M_i$. Matching $M_i$ to $S_i$ by a bijection, and noting that the average degree of the smallest-degree elements is at most the overall average, we obtain:
$$|\widetilde{A}(M_i)| = \sum_{v \in M_i} \widetilde{d}_v < \sum_{u \in S_i} \widetilde{d}_u \leq |S_i| \cdot \frac{\sum_{v \in \widetilde{V}_i^+} \widetilde{d}_v}{|\widetilde{V}_i^+|} = \frac{|M_i|}{|\widetilde{V}_i^+|}\, |\widetilde{A}(\widetilde{V}_i^+)| \leq 2\sqrt{\theta}\, |\widetilde{A}(\widetilde{V}_i^+)|.$$
For $R_i$, the penalty per vertex is $\widetilde{d}_v(d(1-\theta) - \widetilde{d}_v) \geq \widetilde{d}_v\, d(1-\theta-2\sqrt{\theta}) \geq d(1-3\sqrt{\theta})\, \widetilde{d}_v$. Summing over $R_i$ gives $d(1-3\sqrt{\theta})\, |\widetilde{A}(R_i)|$. The left-hand side of (11) is upper bounded by $d\theta \sum_{v \in \widetilde{V}_i^+} \widetilde{d}_v = \theta d\, |\widetilde{A}(\widetilde{V}_i^+)|$. Thus:
$$d(1-3\sqrt{\theta})\, |\widetilde{A}(R_i)| \leq \theta d\, |\widetilde{A}(\widetilde{V}_i^+)| \implies |\widetilde{A}(R_i)| \leq \frac{\theta}{1-3\sqrt{\theta}}\, |\widetilde{A}(\widetilde{V}_i^+)| \leq 2\theta\, |\widetilde{A}(\widetilde{V}_i^+)|.$$
Summing the components, the total number of edges in $\widetilde{G}_i$ is bounded by:
$$|\widetilde{A}_i| = |\widetilde{A}(\widetilde{V}_i^+)| + |\widetilde{A}(L_i)| + |\widetilde{A}(M_i)| + |\widetilde{A}(R_i)| \leq |\widetilde{A}(\widetilde{V}_i^+)| + |\widetilde{A}(L_i)| + 2\sqrt{\theta}\, |\widetilde{A}(\widetilde{V}_i^+)| + 2\theta\, |\widetilde{A}(\widetilde{V}_i^+)| \leq (1+4\sqrt{\theta})\left(|\widetilde{A}(\widetilde{V}_i^+)| + |\widetilde{A}(L_i)|\right).$$
Define $U_i = \widetilde{V}_i^+ \cup L_i$, and let $\widetilde{U} = \bigcup_{i \in I'} U_i$. The number of edges in $\widetilde{G}_i$ incident to $U_i$ is exactly $|\widetilde{A}(U_i)| = |\widetilde{A}(\widetilde{V}_i^+)| + |\widetilde{A}(L_i)|$. Thus, $|\widetilde{A}_i| \leq (1+4\sqrt{\theta})\, |\widetilde{A}(U_i)|$.

We now relate the edge mass of $\widetilde{U}$ to the global graph. Because $\widetilde{G}_i$ is a disjoint star forest centered at $\widetilde{V}_i$, and the edge sets $\widetilde{E}_i$ are disjoint subsets of the global edge set $E$ (due to the distinct colors), no edge is double-counted across the sets $U_i$. Thus $\widetilde{U}$ covers at least $\sum_{i \in I'} |\widetilde{A}(U_i)|$ distinct edges of $G$. Summing $|\widetilde{A}_i|$ over $I'$ and applying the bound established in Claim 7.8 (where $\frac{|E_i| - |\widetilde{A}_i|}{|E_i|} \leq \eta + \frac{3\alpha}{\tau}$ implies $|\widetilde{A}_i| \geq |E_i|\left(1 - \eta - \frac{3\alpha}{\tau}\right)$), we obtain:
$$(1+4\sqrt{\theta})\, |\widetilde{A}(\widetilde{U})| \geq \sum_{i \in I'} |\widetilde{A}_i| \geq \sum_{i \in I'} |E_i| \left(1 - \eta - \frac{3\alpha}{\tau}\right).$$
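The two error-absorption steps used here, $\frac{\theta}{1-3\sqrt{\theta}} \leq 2\theta$ and $1 + 2\sqrt{\theta} + 2\theta \leq 1 + 4\sqrt{\theta}$, both hold once $\sqrt{\theta} \leq 1/6$ (i.e. $\theta \leq 1/36$), which the parameter setting below guarantees. A quick check:

```python
import math

def absorption_steps_hold(theta):
    """Verify theta/(1 - 3*sqrt(theta)) <= 2*theta and
    1 + 2*sqrt(theta) + 2*theta <= 1 + 4*sqrt(theta)."""
    s = math.sqrt(theta)
    return (theta / (1 - 3 * s) <= 2 * theta) and (2 * s + 2 * theta <= 4 * s)

for theta in (1 / 36, 1e-3, 1e-6):
    assert absorption_steps_hold(theta)
```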
Using the edge-mass bound $\sum_{i \in I'} |E_i| \geq (1 - 39\delta^2)|E|$ established at the beginning of this proof, we bound the sum:
$$\sum_{i \in I'} |\widetilde{A}_i| \geq \left(1 - \eta - \frac{3\alpha}{\tau}\right)(1 - 39\delta^2)|E| \geq \left(1 - \eta - \frac{3\alpha}{\tau} - 39\delta^2\right)|E|.$$
Dividing by $(1+4\sqrt{\theta})$ and using the inequality $\frac{1-X}{1+Y} \geq 1 - X - Y$ (valid for $X, Y \geq 0$), we lower bound the number of covered edges:
$$|\widetilde{A}(\widetilde{U})| \geq \frac{1 - \eta - 3\alpha/\tau - 39\delta^2}{1 + 4\sqrt{\theta}}\, |E| \geq \left(1 - \eta - \frac{3\alpha}{\tau} - 39\delta^2 - 4\sqrt{\theta}\right)|E|.$$
Finally, we upper bound the size of $\widetilde{U}$. For every $i \in I'$ and $v \in U_i$, the induced degree in $\widetilde{G}_i$ satisfies $\widetilde{d}_{i,v} \geq d(1-2\sqrt{\theta})$. Since $\theta \approx \delta^3 \ll 1$, we have $1 - 2\sqrt{\theta} > 1/2$. Because the total degree of any vertex in the base graph $G$ is exactly $d$, no vertex can have strictly more than $d/2$ incident edges assigned to each of two different clusters. Thus, the sets $U_i$ are pairwise disjoint. This guarantees that $|\widetilde{U}| = \sum_{i \in I'} |U_i|$ and that the total number of edges covered by $\widetilde{U}$ across all $\widetilde{G}_i$ is exactly $\sum_{v \in \widetilde{U}} \widetilde{d}_{i,v} = |\widetilde{A}(\widetilde{U})|$. Since the total number of edges in $G$ is $|E| = nd/2$, we have:
$$d(1-2\sqrt{\theta})\, |\widetilde{U}| \leq \sum_{v \in \widetilde{U}} \widetilde{d}_{i,v} = |\widetilde{A}(\widetilde{U})| \leq |E| = \frac{nd}{2}.$$
Rearranging yields:
$$|\widetilde{U}| \leq \frac{n}{2(1-2\sqrt{\theta})} \leq \frac{n}{2}(1 + 4\sqrt{\theta}) = \frac{n}{2} + 2\sqrt{\theta}\, n.$$
Thus, we have found a subset $\widetilde{U}$ of size at most $\frac{n}{2} + 2\sqrt{\theta}\, n$ that covers at least $(1 - \eta - 3\alpha/\tau - 39\delta^2 - 4\sqrt{\theta})|E|$ edges. To extract a subset of size exactly $n/2$, we arbitrarily discard at most $2\sqrt{\theta}\, n$ vertices from $\widetilde{U}$. Since the maximum degree in the base graph $G$ is $d$, discarding them sacrifices at most $2\sqrt{\theta}\, nd = 4\sqrt{\theta}\, |E|$ edges. If $|\widetilde{U}| < n/2$, we pad it arbitrarily without losing edges. The resulting subset of size exactly $n/2$ covers at least $(1 - \eta - 3\alpha/\tau - 39\delta^2 - 8\sqrt{\theta})|E|$ edges.
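Both elementary inequalities invoked in this step can be verified directly: $\frac{1-X}{1+Y} \geq 1 - X - Y$ for $X, Y \geq 0$ (since $(1-X) - (1-X-Y)(1+Y) = Y(X+Y) \geq 0$), and $\frac{1}{1-2\sqrt{\theta}} \leq 1 + 4\sqrt{\theta}$ whenever $\sqrt{\theta} \leq 1/4$. A short randomized check:

```python
import random

def identities_hold(samples=1000):
    """Spot-check the two elementary inequalities on random inputs."""
    for _ in range(samples):
        # (1 - X)/(1 + Y) >= 1 - X - Y for X, Y >= 0.
        X, Y = random.uniform(0, 2), random.uniform(0, 2)
        if (1 - X) / (1 + Y) < 1 - X - Y - 1e-12:
            return False
        # 1/(1 - 2s) <= 1 + 4s for s = sqrt(theta) in [0, 1/4].
        s = random.uniform(0, 0.25)
        if 1 / (1 - 2 * s) > 1 + 4 * s + 1e-12:
            return False
    return True
```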
As established earlier, $\tau = \delta^5$, $\eta = \delta^5$, $\alpha = \delta^{10}$, and $\omega = \delta^3$. Thus, $\alpha/\tau = \delta^5$, and $\theta = \delta^3 + 4\delta^5 + 12\delta^5 + 1/d$, so $\theta \leq \delta^3 + 17\delta^5$. We bound the square root by $\sqrt{\theta} \leq \delta^{1.5}\sqrt{1 + 17\delta^2} \leq \delta^{1.5}(1 + 8.5\delta^2) = \delta^{1.5} + 8.5\delta^{3.5}$. Then $8\sqrt{\theta} \leq 8\delta^{1.5} + 68\delta^{3.5}$. The total missed fraction is bounded by:
$$\text{Missed} \leq \delta^5 + 3\delta^5 + 39\delta^2 + 8\delta^{1.5} + 68\delta^{3.5} \leq 10\delta^{1.5},$$
where the final inequality holds because Theorem 5.2 enforces $\delta \leq 10^{-3}$.

8  Lower Bounds for Exact Algorithms: Proof of Corollary 1.3

In this section, we prove Corollary 1.3. Suppose that there is an algorithm solving exact $k$-means in $\mathbb{R}^d$ in time $T_{\text{exact}}(n, k, d)$. The reduction proceeds as follows. Given a point set $P$, we compute an $\alpha$-coreset $\Omega$ consisting of $\text{poly}(k/\alpha)$ points. The definition of an $\alpha$-coreset ensures that, for any possible solution $S$, $\text{cost}(P, S) = (1 \pm \alpha)\,\text{cost}(\Omega, S)$. In particular, an optimal clustering of $\Omega$ is a $(1+\alpha)$-approximate clustering for $P$. The running time of this procedure is $T_{\text{core}}(n, d, k, \alpha^{-1}) = O(nkd)$ [CSS21]. Next, we apply a Johnson-Lindenstrauss transformation to $\Omega$. Specifically, we use Theorem 1.3 of [MMR19], which states that any set of $n$ points may be embedded into $O(\alpha^{-2}\log(k/\alpha))$ dimensions such that, for any partition into $k$ clusters, the $k$-means cost of the clustering is preserved up to a $(1 \pm \alpha)$ factor. The running time of this dimension reduction is $T_{JL}(n, d, k, \alpha^{-1}) = O(nd\log(k/\alpha))$. Thus, by applying a coreset algorithm in time $T_{\text{core}}(n, d, k, \alpha^{-1})$, then a Johnson-Lindenstrauss transform in time $T_{JL}(|\Omega|, d, k, \alpha^{-1})$, and solving the resulting $k$-means instance exactly in time $T_{\text{exact}}(|\Omega|, k, \alpha^{-2}\log(k/\alpha))$, the overall running time of the algorithm is $T_{\text{exact}}(|\Omega|, k, \alpha^{-2}\log(k/\alpha)) + T_{JL}(|\Omega|, d, k) + T_{\text{core}}(n, d, k)$.
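The three-stage reduction can be sketched end to end. The sketch below is illustrative only: `toy_coreset` uses uniform sampling as a stand-in for the actual [CSS21] coreset construction, `toy_jl` is a plain Gaussian projection rather than the [MMR19] transform, and `exact_kmeans` brute-forces all assignments, playing the role of the hypothetical exact solver; all three names are invented for this sketch.

```python
import itertools
import math
import random

def toy_coreset(P, m):
    """Stand-in for an alpha-coreset: uniform subsample (NOT the [CSS21] construction)."""
    return random.sample(P, min(m, len(P)))

def toy_jl(P, target_dim):
    """Simplified Gaussian random projection down to target_dim dimensions."""
    d = len(P[0])
    M = [[random.gauss(0, 1) / math.sqrt(target_dim) for _ in range(d)]
         for _ in range(target_dim)]
    return [tuple(sum(row[j] * p[j] for j in range(d)) for row in M) for p in P]

def exact_kmeans(P, k):
    """Exact k-means by brute force over all k^|P| cluster assignments."""
    best = float("inf")
    for labels in itertools.product(range(k), repeat=len(P)):
        cost = 0.0
        for c in range(k):
            pts = [p for p, lab in zip(P, labels) if lab == c]
            if not pts:
                continue
            mu = [sum(coord) / len(pts) for coord in zip(*pts)]
            cost += sum(sum((x - m) ** 2 for x, m in zip(p, mu)) for p in pts)
        best = min(best, cost)
    return best

# The reduction's pipeline: coreset, then dimension reduction, then exact solve.
P = [tuple(random.gauss(0, 1) for _ in range(5)) for _ in range(8)]
omega = toy_coreset(P, 6)     # |Omega| = poly(k/alpha) points (here: 6)
reduced = toy_jl(omega, 3)    # embed into O(alpha^-2 log(k/alpha)) dims (here: 3)
opt_cost = exact_kmeans(reduced, 2)
```

The exact solver dominates the runtime, which is precisely why shrinking both the number of points (via the coreset) and the dimension (via the projection) before calling it yields the stated runtime composition.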
We have $T_{\text{core}}(n, d, k, \alpha^{-1}) + T_{JL}(|\Omega|, d, k, \alpha^{-1}) = O(nkd)$. Thus, from Theorem 5.1, it must hold that $T_{\text{exact}}(|\Omega|, k, \alpha^{-2}\log(k/\alpha)) = 2^{(k/\alpha)^{1-o(1)}}$.

Assume there exists an algorithm running in time $n^{(k\sqrt{d})^{1-\beta}}$. Running this algorithm on $\Omega$ would result in a running time of
$$|\Omega|^{(k\sqrt{d})^{1-\beta}} = 2^{(k\sqrt{d})^{1-\beta} \cdot \log|\Omega|} = 2^{(k/\alpha)^{1-\beta} \cdot \text{polylog}(k/\alpha)} \leq 2^{(k/\alpha)^{1-\beta/2}}.$$
This contradicts Theorem 5.1, and therefore the running time of the exact algorithm must be $n^{(k\sqrt{d})^{1-o(1)}}$. This proves the first part of Corollary 1.3.

To prove the second part, we use a different dimension reduction: in time $\text{poly}(n, d)$, one can reduce the dimension to $O(k/\alpha)$ [CEM+15] while preserving the cost of any clustering. Therefore, the same argument as above enforces that $T_{\text{exact}}(|\Omega|, k, k/\alpha) \geq 2^{(k/\alpha)^{1-o(1)}}$. Suppose that $T_{\text{exact}}(n, k, d) = n^{d^{1-\beta}}$. Then,
$$T_{\text{exact}}(|\Omega|, k, k/\alpha) = |\Omega|^{(k/\alpha)^{1-\beta}} = 2^{(k/\alpha)^{1-\beta} \cdot \log|\Omega|} = 2^{(k/\alpha)^{1-\beta} \cdot \text{polylog}(k/\alpha)} \leq 2^{(k/\alpha)^{1-\beta/2}},$$
which again contradicts Theorem 5.1. This concludes the proof of Corollary 1.3.

Acknowledgements

We thank Dor Minzer, Euiwoong Lee, and Pasin Manurangsi for several discussions that helped conceptualize the proof approach in this paper.

References

[ABB+23] Fateme Abbasi, Sandip Banerjee, Jaroslaw Byrka, Parinya Chalermsook, Ameet Gadekar, Kamyar Khodamoradi, Dániel Marx, Roohani Sharma, and Joachim Spoerhase. Parameterized approximation schemes for clustering with general norm objectives. In 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023, Santa Cruz, CA, USA, November 6-9, 2023, pages 1377–1399. IEEE, 2023.

[ABM+19] Marek Adamczyk, Jaroslaw Byrka, Jan Marcinkowski, Syed Mohammad Meesum, and Michal Wlodarczyk. Constant-factor FPT approximation for capacitated k-median. In Michael A.
Bender, Ola Svensson, and Grzegorz Herman, editors, 27th Annual European Symposium on Algorithms, ESA 2019, September 9-11, 2019, Munich/Garching, Germany, volume 144 of LIPIcs, pages 1:1–1:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.

[ACKS15] Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The hardness of approximation of Euclidean k-means. In Lars Arge and János Pach, editors, 31st International Symposium on Computational Geometry, SoCG 2015, June 22-25, 2015, Eindhoven, The Netherlands, volume 34 of LIPIcs, pages 754–767. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2015.

[AKP24] Enver Aman, Karthik C. S., and Sharath Punna. On connections between k-coloring and Euclidean k-means. In Timothy M. Chan, Johannes Fischer, John Iacono, and Grzegorz Herman, editors, 32nd Annual European Symposium on Algorithms, ESA 2024, September 2-4, 2024, Royal Holloway, London, United Kingdom, volume 308 of LIPIcs, pages 9:1–9:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2024.

[AKS11] Per Austrin, Subhash Khot, and Muli Safra. Inapproximability of vertex cover and independent set in bounded degree graphs. Theory of Computing, 7(1):27–43, 2011.

[AP02] Pankaj K. Agarwal and Cecilia Magdalena Procopiuc. Exact and approximation algorithms for clustering. Algorithmica, 33:201–226, 2002.

[AW23] Amir Abboud and Nathan Wallheimer. Worst-case to expander-case reductions. In Yael Tauman Kalai, editor, 14th Innovations in Theoretical Computer Science Conference, ITCS 2023, January 10-13, 2023, MIT, Cambridge, Massachusetts, USA, volume 251 of LIPIcs, pages 1:1–1:23. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.

[BGI25] Sujoy Bhore, Ameet Gadekar, and Tanmay Inamdar. Coreset strikes back: Improved parameterized approximation schemes for (constrained) k-median/means, 2025.
[BHI02] Mihai Badoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In John H. Reif, editor, Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pages 250–257. ACM, 2002.

[BJK18] Anup Bhattacharya, Ragesh Jaiswal, and Amit Kumar. Faster algorithms for the constrained k-means problem. Theory Comput. Syst., 62(1):93–115, 2018.

[BJKW21] Vladimir Braverman, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Xuan Wu. Coresets for clustering in excluded-minor graphs and beyond. In Dániel Marx, editor, Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10-13, 2021, pages 2679–2696. SIAM, 2021. Consulted on arXiv in May 2022.

[CdMRR18] Vincent Cohen-Addad, Arnaud de Mesmay, Eva Rotenberg, and Alan Roytman. The bane of low-dimensionality clustering. In Proceedings of the 2018 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2018.

[CEM+15] Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu. Dimensionality reduction for k-means clustering and low rank approximation. In Proceedings of the forty-seventh annual ACM Symposium on Theory of Computing, pages 163–172, 2015.

[CEMN22] Vincent Cohen-Addad, Hossein Esfandiari, Vahab S. Mirrokni, and Shyam Narayanan. Improved approximations for Euclidean k-means and k-median, via nested quasi-independent sets. In STOC '22. ACM, 2022.

[CFS21] Vincent Cohen-Addad, Andreas Emil Feldmann, and David Saulpic. Near-linear time approximation schemes for clustering in doubling metrics. J. ACM, 68(6):44:1–44:34, 2021.

[CGK+19] Vincent Cohen-Addad, Anupam Gupta, Amit Kumar, Euiwoong Lee, and Jason Li. Tight FPT approximations for k-median and k-means.
In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece, volume 132 of LIPIcs, pages 42:1–42:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.

[CHX+23] Xianrun Chen, Lu Han, Dachuan Xu, Yicheng Xu, and Yong Zhang. k-median/means with outliers revisited: A simple FPT approximation. In Weili Wu and Guangmo Tong, editors, Computing and Combinatorics - 29th International Conference, COCOON 2023, Hawaii, HI, USA, December 15-17, 2023, Proceedings, Part II, volume 14423 of Lecture Notes in Computer Science, pages 295–302. Springer, 2023.

[CK19] Vincent Cohen-Addad and Karthik C. S. Inapproximability of clustering in $L_p$ metrics. In David Zuckerman, editor, 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, pages 519–539. IEEE Computer Society, 2019.

[CKL21] Vincent Cohen-Addad, Karthik C. S., and Euiwoong Lee. On approximability of clustering problems without candidate centers. In Dániel Marx, editor, Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10-13, 2021, pages 2635–2648. SIAM, 2021.

[CKL22] Vincent Cohen-Addad, Karthik C. S., and Euiwoong Lee. Johnson coverage hypothesis: Inapproximability of k-means and k-median in $\ell_p$-metrics. In Joseph (Seffi) Naor and Niv Buchbinder, editors, Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, Virtual Conference / Alexandria, VA, USA, January 9-12, 2022, pages 1493–1530. SIAM, 2022.

[CL19] Vincent Cohen-Addad and Jason Li. On the fixed-parameter tractability of capacitated clustering.
In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece, volume 132 of LIPIcs, pages 41:1–41:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.

[CLSS22] Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, and Chris Schwiegelshohn. Towards optimal lower bounds for k-median and k-means coresets. In Stefano Leonardi and Anupam Gupta, editors, STOC '22: 54th Annual ACM SIGACT Symposium on Theory of Computing, Rome, Italy, June 20-24, 2022, pages 1038–1051. ACM, 2022.

[CS22] Rajesh Chitnis and Nitin Saurabh. Tight lower bounds for approximate & exact k-center in $\mathbb{R}^d$. In Xavier Goaoc and Michael Kerber, editors, 38th International Symposium on Computational Geometry, SoCG 2022, June 7-10, 2022, Berlin, Germany, volume 224 of LIPIcs, pages 28:1–28:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022.

[CSS21] Vincent Cohen-Addad, David Saulpic, and Chris Schwiegelshohn. A new coreset framework for clustering. In Samir Khuller and Virginia Vassilevska Williams, editors, STOC '21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, Virtual Event, Italy, June 21-25, 2021, pages 169–182. ACM, 2021.

[dBBK+20] Mark de Berg, Hans L. Bodlaender, Sándor Kisfaludi-Bak, Dániel Marx, and Tom C. van der Zanden. A framework for exponential-time-hypothesis-tight algorithms and lower bounds in geometric intersection graphs. SIAM J. Comput., 49(6):1291–1331, 2020.

[DF09] Sanjoy Dasgupta and Yoav Freund. Random projection trees for vector quantization. IEEE Transactions on Information Theory, 55(7):3229–3242, 2009.

[Din16] Irit Dinur. Mildly exponential reduction from gap 3SAT to polynomial-gap label-cover. Electron. Colloquium Comput. Complex., TR16-128, 2016.

[DS02] Irit Dinur and Shmuel Safra.
The importance of being biased. In John H. Reif, editor, Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pages 33–42. ACM, 2002.

[FdlVKKR03] Wenceslas Fernandez de la Vega, Marek Karpinski, Claire Kenyon, and Yuval Rabani. Approximation schemes for clustering problems. In Lawrence L. Larmore and Michel X. Goemans, editors, Proceedings of the 35th Annual ACM Symposium on Theory of Computing, June 9-11, 2003, San Diego, CA, USA, pages 50–58. ACM, 2003.

[FGS21] Fedor V. Fomin, Petr A. Golovach, and Kirill Simonov. Parameterized k-clustering: Tractability island. J. Comput. Syst. Sci., 117:50–74, 2021.

[FL11] Dan Feldman and Michael Langberg. A unified framework for approximating and clustering data. In Proceedings of the 43rd ACM Symposium on Theory of Computing, STOC 2011, San Jose, CA, USA, 6-8 June 2011, pages 569–578, 2011.

[FMS07] Dan Feldman, Morteza Monemizadeh, and Christian Sohler. A PTAS for k-means clustering based on weak coresets. In Jeff Erickson, editor, Proceedings of the 23rd ACM Symposium on Computational Geometry, Gyeongju, South Korea, June 6-8, 2007, pages 11–18. ACM, 2007.

[GI03] Venkatesan Guruswami and Piotr Indyk. Embeddings and non-approximability of geometric problems. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 12-14, 2003, Baltimore, Maryland, USA, pages 537–538, 2003.

[GK99] Sudipto Guha and Samir Khuller. Greedy strikes back: Improved facility location algorithms. J. Algorithms, 31(1):228–248, 1999.

[Hae21] Willem H. Haemers. Hoffman's ratio bound. Linear Algebra and its Applications, 617:215–219, 2021.

[HLM+25] Jun-Ting Hsieh, Alexander Lubotzky, Sidhanth Mohanty, Assaf Reiner, and Rachel Yun Zhang. Explicit lossless vertex expanders.
In 66th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2025, Sydney, Australia. IEEE, 2025.

[HV20] Lingxiao Huang and Nisheeth K. Vishnoi. Coresets for clustering in Euclidean spaces: importance sampling is nearly optimal. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 1416–1429. ACM, 2020.

[HXDZ22] Lu Han, Dachuan Xu, Donglei Du, and Dongmei Zhang. An approximation algorithm for the uniform capacitated k-means problem. J. Comb. Optim., 44(3):1812–1823, 2022.

[IKI94] Mary Inaba, Naoki Katoh, and Hiroshi Imai. Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering (extended abstract). In Proceedings of the Tenth Annual Symposium on Computational Geometry, Stony Brook, New York, USA, June 6-8, 1994, pages 332–339, 1994.

[IP01] Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-SAT. J. Comput. Syst. Sci., 62(2):367–375, 2001.

[IPZ01] Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which problems have strongly exponential complexity? J. Comput. Syst. Sci., 63(4):512–530, 2001.

[JKS14] Ragesh Jaiswal, Amit Kumar, and Sandeep Sen. A simple $D^2$-sampling based PTAS for k-means and other clustering problems. Algorithmica, 70(1):22–46, 2014.

[JKY15] Ragesh Jaiswal, Mehul Kumar, and Pulkit Yadav. Improved analysis of $D^2$-sampling based PTAS for k-means and other clustering problems. Information Processing Letters, 115(2):100–103, 2015.

[JMS02] Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems. In Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pages 731–740, 2002.

[Kho02] Subhash Khot.
On the power of unique 2-prover 1-round games. In Proceedings of the thirty-fourth annual ACM Symposium on Theory of Computing, pages 767–775, 2002.

[KNW21] Sándor Kisfaludi-Bak, Jesper Nederlof, and Karol Wegrzycki. A gap-ETH-tight approximation scheme for Euclidean TSP. In 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2021, Denver, CO, USA, February 7-10, 2022, pages 351–362. IEEE, 2021.

[KSS10] Amit Kumar, Yogish Sabharwal, and Sandeep Sen. Linear-time approximation schemes for clustering problems in any dimensions. J. ACM, 57(2), 2010.

[Llo82] Stuart Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.

[LMS11] Daniel Lokshtanov, Dániel Marx, and Saket Saurabh. Lower bounds based on the exponential time hypothesis. Bull. EATCS, 105:41–72, 2011.

[LN17] Kasper Green Larsen and Jelani Nelson. Optimality of the Johnson-Lindenstrauss lemma. In Chris Umans, editor, 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 633–638. IEEE Computer Society, 2017.

[LSW17] Euiwoong Lee, Melanie Schmidt, and John Wright. Improved and simplified inapproximability for k-means. Inf. Process. Lett., 120:40–43, 2017.

[Man19] Pasin Manurangsi. A note on max k-vertex cover: Faster FPT-AS, smaller approximate kernel and improved approximation. In Jeremy T. Fineman and Michael Mitzenmacher, editors, 2nd Symposium on Simplicity in Algorithms, SOSA 2019, January 8-9, 2019, San Diego, CA, USA, volume 69 of OASIcs, pages 15:1–15:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.

[Man20] Pasin Manurangsi. Tight running time lower bounds for strong inapproximability of maximum k-coverage, unique set cover and related problems (via t-wise agreement testing theorem).
In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 62–81. SIAM, 2020.

[Mar08] Dániel Marx. Parameterized complexity and approximation algorithms. Comput. J., 51(1):60–78, 2008.

[MM21] Theo McKenzie and Sidhanth Mohanty. High-Girth Near-Ramanujan Graphs with Lossy Vertex Expansion. In Nikhil Bansal, Emanuela Merelli, and James Worrell, editors, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021), volume 198 of Leibniz International Proceedings in Informatics (LIPIcs), pages 96:1–96:15, Dagstuhl, Germany, 2021. Schloss Dagstuhl - Leibniz-Zentrum für Informatik.

[MMR19] Konstantin Makarychev, Yury Makarychev, and Ilya P. Razenshteyn. Performance of Johnson-Lindenstrauss transform for k-means and k-medians clustering. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, pages 1027–1038, 2019.

[MNV12] Meena Mahajan, Prajakta Nimbhorkar, and Kasturi R. Varadarajan. The planar k-means problem is NP-hard. Theor. Comput. Sci., 442:13–21, 2012.

[MR17] Pasin Manurangsi and Prasad Raghavendra. A birthday repetition theorem and complexity of approximating dense CSPs. In Ioannis Chatzigiannakis, Piotr Indyk, Fabian Kuhn, and Anca Muscholl, editors, 44th International Colloquium on Automata, Languages, and Programming, ICALP 2017, July 10-14, 2017, Warsaw, Poland, volume 80 of LIPIcs, pages 78:1–78:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017.

[MS84] Nimrod Megiddo and Kenneth J. Supowit. On the complexity of some common geometric location problems. SIAM Journal on Computing, 13(1):182–196, 1984.

[Vad12] Salil P. Vadhan. Pseudorandomness. Found. Trends Theor. Comput. Sci., 7(1-3):1–336, 2012.
