Learning Functions of Halfspaces

Notice: This research summary and analysis were generated automatically using AI. For full accuracy, please refer to the original arXiv source.

We give an algorithm that learns arbitrary Boolean functions of $k$ arbitrary halfspaces over $\mathbb{R}^n$, in the challenging distribution-free Probably Approximately Correct (PAC) learning model, running in time $2^{\sqrt{n} \cdot (\log n)^{O(k)}}$. This is the first algorithm that can PAC learn even intersections of two halfspaces in time $2^{o(n)}$.


💡 Research Summary

The paper tackles the long‑standing open problem of learning arbitrary Boolean functions of a constant number k of halfspaces in the distribution‑free PAC model. While a single halfspace can be learned in polynomial time, even the intersection of two halfspaces was known to require polynomial‑threshold functions of degree Ω(n), precluding sub‑exponential algorithms via the classic polynomial method. The authors break this barrier by abandoning polynomial representations and instead exploiting geometric margin properties of the unknown halfspaces.

The core technical tool is an algorithmic “Förster transform” (from recent work of Diakonikolas et al.) which, given a halfspace, produces a sample containing an Ω(1/√n)‑margin point in polynomial time. Using this, the algorithm repeatedly draws random Gaussian vectors g ∼ N(0, I_n) and looks for one that has a large inner product with the unknown normal vector w of a target halfspace (w·g ≥ α, where α≈√log n·n^{1/4}). With probability 2^{‑Θ(√n)} such a g exists, and after O(2^{Θ(√n)}) trials one is found with high probability. The halfspace of interest is then “fixed” on the region R₊(g) = { x : g·x ≥ α/10 } (or R₋(g) for the negative side). Within this region, the original halfspace’s label is almost constant; the opposite label occurs with probability only 2^{‑Θ(√n)}.
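The Gaussian-search step above can be sketched in a toy simulation. This is a hypothetical illustration, not the paper's algorithm: we plant a known normal w and tilt g toward it so the demo terminates quickly (rather than retrying exponentially many independent draws), and we use Gaussian data instead of the Förster-transformed margin distribution, so the bias inside R₊(g) is visible but far weaker than the 2^{−Θ(√n)} guarantee quoted in the summary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # dimension (toy scale)

# Planted unit normal of the "unknown" halfspace. Hypothetical: the real
# algorithm never sees w; it simply retries ~2^{Theta(sqrt n)} random g's.
w = rng.standard_normal(n)
w /= np.linalg.norm(w)

alpha = np.sqrt(np.log(n)) * n ** 0.25  # margin threshold from the summary

# Cheat for the demo: tilt g toward w so that w.g is about alpha immediately.
g = alpha * w + rng.standard_normal(n)

# Region R_+(g) = {x : g.x >= alpha/10}. Inside it, sign(w.x) is biased
# toward +1; the paper's Forster-transform margin condition is what upgrades
# this bias to "almost constant", which the toy Gaussian data cannot show.
X = rng.standard_normal((50_000, n))
in_region = X @ g >= alpha / 10
frac_positive_in_R = np.mean(X[in_region] @ w > 0)
frac_positive_overall = np.mean(X @ w > 0)
print(frac_positive_in_R, frac_positive_overall)
```

On Gaussian data the conditional positive fraction inside R₊(g) is noticeably above the unconditional one-half, illustrating the fixing phenomenon qualitatively.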

For learning the intersection of two halfspaces, the algorithm first fixes one halfspace using the above procedure, then repeats the process on the remaining points to fix the second halfspace. To avoid the “credit assignment” problem (uncertainty about which halfspace caused a negative example), the algorithm strengthens the margin requirement in the first step to α·log n, which yields a region where the first halfspace is fixed with error n^{‑log n}. This stronger guarantee forces the points that remain ambiguous to have a large margin with respect to the second halfspace, allowing the second fixing step to succeed with even higher confidence. The resulting region contains a 2^{‑Θ(√n)} fraction of the sample and is labeled correctly on a ½ + γ fraction of examples, where γ = 2^{‑Θ(√n)}.

The method generalizes to any constant k. The algorithm iteratively fixes each halfspace, each time using a Gaussian vector whose inner product with the current unknown normal is at least α·log n. After k iterations, the intersection of the k regions yields a set R on which all k halfspaces are simultaneously fixed (each taking a predetermined sign s_i). Consequently the target Boolean function f(x) = F(h₁(x),…,h_k(x)), for an arbitrary Boolean combining function F, is almost constant on R, taking the value F(s₁,…,s_k) on the overwhelming majority of points in R. A hypothesis that outputs this constant on R and the majority label elsewhere achieves accuracy ½ + γ with γ = 2^{‑Θ(√n·log^{O(k)} n)}.
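The resulting weak hypothesis can be sketched directly from the summary's description. This is a minimal illustrative sketch: the names, the α/10 threshold, and the use of the positive-side region for every halfspace are assumptions for simplicity; the directions g_1,…,g_k, the region label, and the majority label would come from the Gaussian search and the sample.

```python
import numpy as np

def weak_hypothesis(gs, alpha, region_label, majority_label):
    """Predict a fixed label on R = intersection of the k regions
    R_+(g_i) = {x : g_i.x >= alpha/10}, and the majority label elsewhere.
    Hypothetical sketch of the weak learner described in the summary."""
    def predict(x):
        in_R = all(g @ x >= alpha / 10 for g in gs)
        return region_label if in_R else majority_label
    return predict

# Toy usage with two fixed directions (illustrative values only).
h = weak_hypothesis([np.array([1.0, 0.0]), np.array([0.0, 1.0])],
                    alpha=1.0, region_label=+1, majority_label=-1)
print(h(np.array([1.0, 1.0])), h(np.array([-1.0, 1.0])))
```

The point of the construction is that R, though exponentially small, is where the hypothesis earns its 2^{‑Θ(√n·…)} advantage over random guessing.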

A standard boosting reduction turns this weak learner into a full PAC learner. The total running time is $\mathrm{poly}\big(2^{\sqrt{n} \cdot (\log n)^{O(k)}},\ 1/\varepsilon,\ \log(1/\delta)\big)$,
which is sub‑exponential in n for any constant k. Notably, this is the first algorithm that learns even the intersection of two halfspaces in time 2^{o(n)} under arbitrary distributions, let alone arbitrary Boolean combinations of k halfspaces.
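As a rough illustration of why a tiny advantage still suffices, one can use the standard AdaBoost round bound (an assumption for illustration; the summary does not specify which booster the paper uses): a γ-advantage weak learner needs on the order of log(1/ε)/γ² rounds, so an advantage of 2^{‑Θ(√n·log^{O(k)} n)} inflates the running time only into the quoted sub‑exponential form.

```python
import math

def boosting_rounds(gamma, eps):
    # AdaBoost's training-error bound exp(-2 * gamma**2 * T) <= eps gives
    # T = ln(1/eps) / (2 * gamma**2) rounds for a gamma-advantage weak learner.
    return math.ceil(math.log(1.0 / eps) / (2.0 * gamma ** 2))

print(boosting_rounds(0.1, 0.01))
```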

The paper’s significance lies in bypassing the polynomial‑threshold‑function lower bounds that have dominated prior work, introducing a novel combination of margin‑based region construction and the Förster transform. The techniques open a new avenue for tackling other depth‑2 neural‑network models, multi‑index functions, and related high‑dimensional Boolean learning problems where previous approaches were stymied by exponential degree requirements.

