Covering Points by Disjoint Boxes with Outliers


For a set of n points in the plane, we consider the axis-aligned (p,k)-Box Covering problem: Find p axis-aligned, pairwise-disjoint boxes that together contain n-k points. In this paper, we consider the boxes to be either squares or rectangles, and we want to minimize the area of the largest box. For general p we show that the problem is NP-hard for both squares and rectangles. For a small, fixed number p, we give algorithms that find the solution in the following running times: For squares we have O(n+k log k) time for p=1, and O(n log n+k^p log^p k) time for p = 2,3. For rectangles we get O(n + k^3) for p = 1 and O(n log n+k^{2+p} log^{p-1} k) time for p = 2,3. In all cases, our algorithms use O(n) space.

💡 Research Summary

The paper studies a geometric covering problem that blends clustering with outlier removal. Given a set P of n points in the plane, two integers p > 0 and k ≥ 0, the goal is to place p axis‑aligned squares (or rectangles) that are pairwise interior‑disjoint (their boundaries may touch) such that at least n − k points of P are covered. The objective is to minimize the area of the largest box among the p boxes. This is called the (p, k)‑Box Covering problem.
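To make the problem statement concrete, here is a minimal Python sketch that checks whether a candidate set of boxes is feasible and, if so, reports the objective value. The tuple layout `(x1, y1, x2, y2)` and the function names are assumptions made for illustration; they do not come from the paper.

```python
from itertools import combinations

# Hypothetical representation: a box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.

def interiors_overlap(a, b):
    """True if two boxes of positive width and height share interior points.
    Boxes that only touch along a boundary do not count as overlapping."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def covers(box, pt):
    """True if pt lies in the closed box (boundary included)."""
    x1, y1, x2, y2 = box
    return x1 <= pt[0] <= x2 and y1 <= pt[1] <= y2

def evaluate(points, boxes, k):
    """Return the area of the largest box if `boxes` is a feasible
    (p, k) covering of `points`, otherwise None."""
    if any(interiors_overlap(a, b) for a, b in combinations(boxes, 2)):
        return None
    covered = sum(1 for pt in points if any(covers(b, pt) for b in boxes))
    if covered < len(points) - k:
        return None
    return max((b[2] - b[0]) * (b[3] - b[1]) for b in boxes)
```

For example, with two unit squares, four nearby points, and one far-away point, the covering is feasible with k = 1 but not with k = 0.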

Hardness.
The authors first prove that when p is part of the input (i.e., not a fixed constant), the decision version of the problem is NP‑hard even for a fixed k. The reduction is from planar 3‑SAT. They construct variable gadgets consisting of 4N grid points that can be covered by exactly 2N unit squares in two distinct ways, encoding a true/false assignment. Clause gadgets consist of 4M + 1 points arranged linearly with three “link points” (switches). Connecting each clause gadget to its three literals are odd‑length chains of points ending in a switch. A switch is covered by a variable gadget only if the corresponding literal is set to true. Lemma 2 shows that covering a clause gadget (except one link point) requires exactly 2M unit squares; Lemma 3 shows that covering the clause together with its connections needs 2M + ⌈e_i/2⌉ squares per connection, and this is possible iff at least one switch is on. Consequently, a planar 3‑SAT instance is satisfiable iff the constructed point set can be covered with a specific number of unit squares. This establishes NP‑hardness for both the square and rectangle versions. Moreover, they derive a hardness‑of‑approximation result: if the formula is unsatisfiable, any covering using the same number of squares must contain a square of area at least 9/4, i.e., side length ≥ 3/2.

Lower bound.
In the algebraic decision‑tree model they prove an Ω(n log n) lower bound even for fixed p, by reducing the 1‑dimensional duplicate‑detection problem to a covering instance where a zero‑area square exists iff a duplicate is present.
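The reduction can be illustrated with a small Python sketch. The summary does not spell out the parameters of the covering instance, so the choice below (p = 1 and k = n − 2, with every number placed on the x‑axis) is an assumption, one plausible way to realize the idea and not necessarily the paper's construction. The brute‑force solver is only there to make the example runnable.

```python
from itertools import combinations

def to_covering_instance(values):
    """Map numbers to points on the x-axis (assumed construction).
    Returns (points, p, k) with p = 1 and k = n - 2, so that the single
    square only has to cover two of the points."""
    points = [(v, 0) for v in values]
    return points, 1, len(points) - 2

def optimal_side_two_points(points):
    """Brute force: side of the smallest axis-aligned square that covers
    some pair of points. Quadratic time, for illustration only."""
    return min(max(abs(a[0] - b[0]), abs(a[1] - b[1]))
               for a, b in combinations(points, 2))

def has_duplicate(values):
    """A zero-area optimum exists exactly when two input values coincide."""
    points, p, k = to_covering_instance(values)
    return optimal_side_two_points(points) == 0
```

Any algorithm that solves the covering instance faster than sorting would therefore also detect duplicates faster than the known lower bound allows.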

Algorithms for constant p.
The main technical contribution is a suite of exact algorithms when p is a small constant (1, 2, 3). All algorithms run in linear space.

Single box (p = 1).
For squares the problem is equivalent to the rectilinear 1‑center (minimum‑enclosing square) with outliers, an LP‑type problem of dimension 3. Using known LP‑type techniques they achieve O(n + k log k) time, improving over the previous O(n log n) bound. For rectangles the best known algorithm runs in O(n + k³) time (citing Atanassov et al.).
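As a point of comparison for the single‑rectangle case, the following Python sketch is a brute‑force baseline, not the O(n + k³) algorithm cited above. It relies on one observation: since at most k points are discarded, each side of an optimal rectangle passes through one of the k + 1 most extreme points in that direction. The function name and return format are assumptions for illustration, and the input is assumed to satisfy 0 ≤ k < n.

```python
def min_rectangle_with_outliers(points, k):
    """Smallest-area axis-aligned rectangle covering at least n - k points.
    Returns (area, (x1, y1, x2, y2)). Runs in O(k^4 * n) time."""
    n = len(points)
    xs = sorted(p[0] for p in points)
    ys = sorted(p[1] for p in points)
    m = k + 1                      # candidates per side
    lefts, rights = xs[:m], xs[-m:]
    bottoms, tops = ys[:m], ys[-m:]
    best = None
    for l in lefts:
        for r in rights:
            if r < l:
                continue
            for b in bottoms:
                for t in tops:
                    if t < b:
                        continue
                    # Count points inside the closed rectangle [l, r] x [b, t].
                    inside = sum(1 for (x, y) in points
                                 if l <= x <= r and b <= y <= t)
                    if inside >= n - k:
                        area = (r - l) * (t - b)
                        if best is None or area < best[0]:
                            best = (area, (l, b, r, t))
    return best
```

The sketch trades speed for clarity; its only purpose is to show why the number of relevant candidate boundaries depends on k and not on n.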

Two and three boxes (p = 2, 3) – squares.
The algorithm proceeds by binary searching on the candidate maximum side length λ. For a fixed λ, the decision problem reduces to checking whether the points can be covered by p disjoint λ‑by‑λ squares after discarding at most k points. They sort points by x‑coordinate, enumerate O(n) possible left/right boundaries for each square, and use a dynamic‑programming style sweep to allocate squares while counting uncovered points. The decision test runs in O(n log n) time. The outer binary search contributes a factor of O(log k). The combinatorial explosion due to choosing which points become outliers is bounded by O(k^p), leading to total times O(n log n + k² log² k) for p = 2 and O(n log n + k³ log³ k) for p = 3.
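The search‑plus‑decision pattern described here can be sketched in Python for the simplest case, a single square (p = 1). This is an illustrative brute force under stated assumptions, not the paper's O(n log n) decision procedure: the decision test tries every placement whose left and bottom edges pass through input coordinates, and the search runs over the sorted coordinate differences, since the optimal side length is always one of them. The function names are hypothetical, coordinates are assumed to be integers (or otherwise exactly representable), and 0 ≤ k < n.

```python
def can_cover(points, k, side):
    """Decision test: can one axis-aligned square with the given side
    length cover at least n - k points? Brute force, O(n^3) time."""
    need = len(points) - k
    xs = {p[0] for p in points}
    ys = {p[1] for p in points}
    for x0 in xs:                  # candidate left edge
        for y0 in ys:              # candidate bottom edge
            inside = sum(1 for (x, y) in points
                         if x0 <= x <= x0 + side and y0 <= y <= y0 + side)
            if inside >= need:
                return True
    return False

def min_square_side(points, k):
    """Binary search over candidate side lengths (pairwise coordinate
    differences); feasibility is monotone in the side length."""
    xs = sorted({p[0] for p in points})
    ys = sorted({p[1] for p in points})
    candidates = sorted({b - a
                         for coords in (xs, ys)
                         for i, a in enumerate(coords)
                         for b in coords[i:]})
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if can_cover(points, k, candidates[mid]):
            hi = mid               # feasible: try smaller sides
        else:
            lo = mid + 1           # infeasible: need a larger side
    return candidates[lo]
```

The area of the optimal square is the square of the returned side length. For p = 2 and p = 3 the decision test must additionally keep the squares disjoint, which this sketch does not attempt.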

Rectangles (p = 2, 3).
Rectangles have two independent side lengths, so the decision test must consider both width and height. The authors adapt the square algorithm by enumerating candidate widths and heights separately, which multiplies the combinatorial factor by an extra k for each additional degree of freedom. Consequently the running times follow the general bound O(n log n + k^{2+p} log^{p-1} k): O(n log n + k⁴ log k) for p = 2 and O(n log n + k⁵ log² k) for p = 3.

All algorithms use only O(n) additional memory (arrays for sorted points, a few counters, and recursion stacks).

Implications and future work.
The results close a gap in the literature: prior work handled only p = 1 with outliers, while the classic p‑center problem (allowing overlapping squares) is known to be NP‑hard when p is part of the input. By proving NP‑hardness for the disjoint version with outliers and providing near‑optimal exact algorithms for constant p, the paper offers both theoretical insight and practical tools for applications such as facility location, image segmentation, and GIS where non‑overlapping clusters and robustness to noise are required. Future directions include approximation algorithms for larger p, extensions to higher dimensions, and empirical evaluation on real‑world datasets.
