IS-LABEL: an Independent-Set based Labeling Scheme for Point-to-Point Distance Querying on Large Graphs
We study the problem of computing shortest path or distance between two query vertices in a graph, which has numerous important applications. Quite a number of indexes have been proposed to answer such distance queries. However, all of these indexes can only process graphs of size barely up to 1 million vertices, which is rather small in view of many of the fast-growing real-world graphs today such as social networks and Web graphs. We propose an efficient index, which is a novel labeling scheme based on the independent set of a graph. We show that our method can handle graphs of size three orders of magnitude larger than those existing indexes.
💡 Research Summary
The paper introduces IS‑LABEL, a novel indexing scheme for exact point‑to‑point (P2P) shortest‑distance queries on massive sparse graphs. Existing approaches, such as 2‑hop labeling, tree‑decomposition based methods, and highway‑oriented techniques, either rely on structural properties specific to road networks or suffer from prohibitive preprocessing time, memory consumption, and label size, which limits them to graphs of barely up to 1 million vertices.
IS‑LABEL overcomes these limitations by exploiting the concept of an independent set (IS) to construct a multi‑level vertex hierarchy. Starting from the original weighted undirected graph G, the algorithm iteratively extracts a maximal independent set L₁, removes its vertices, and forms a reduced subgraph G₂ consisting of the remaining vertices. To preserve all pairwise distances between vertices that survive to the next level, the method adds “augmenting edges” between every pair of neighbors (u, w) of a removed vertex v, assigning the weight ω(u,w)=ω(u,v)+ω(v,w) (or the minimum if an edge already exists). This process is repeated, producing a sequence of graphs G₁=G, G₂,…,G_h, each maintaining the distance‑preservation property with respect to its predecessor. The independent‑set property guarantees that each removed vertex has no edges to other removed vertices, so augmenting edges can be generated by a simple 2‑hop join on its adjacency list, keeping the construction cost low.
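One level of this reduction can be sketched as follows. This is a minimal in-memory illustration, not the authors' implementation: the function names `greedy_independent_set` and `reduce_level` are hypothetical, the graph is assumed to be a dict of dicts mapping each vertex to its weighted neighbors, and the low-degree-first selection order is an assumption made only to keep the number of augmenting edges small.

```python
def greedy_independent_set(adj):
    """Greedy maximal independent set, visiting low-degree vertices first
    (the ordering is an assumption of this sketch)."""
    chosen, blocked = set(), set()
    for v in sorted(adj, key=lambda x: (len(adj[x]), x)):
        if v not in blocked:
            chosen.add(v)
            blocked.add(v)
            blocked.update(adj[v])  # neighbors can no longer be chosen
    return chosen


def reduce_level(adj, independent):
    """Remove the vertices in `independent` and add augmenting edges so that
    distances between the surviving vertices are preserved."""
    reduced = {
        v: {u: w for u, w in nbrs.items() if u not in independent}
        for v, nbrs in adj.items()
        if v not in independent
    }
    for v in independent:
        nbrs = list(adj[v].items())
        # 2-hop join on v's adjacency list: connect every pair of neighbors.
        for i, (u, wu) in enumerate(nbrs):
            for x, wx in nbrs[i + 1:]:
                cand = wu + wx
                # Keep the minimum if an edge (u, x) already exists.
                if cand < reduced[u].get(x, float("inf")):
                    reduced[u][x] = cand
                    reduced[x][u] = cand
    return reduced


# Path a-b-c-d-e with edge weights 1, 2, 3, 4.
path = {
    "a": {"b": 1},
    "b": {"a": 1, "c": 2},
    "c": {"b": 2, "d": 3},
    "d": {"c": 3, "e": 4},
    "e": {"d": 4},
}
level_1 = greedy_independent_set(path)
g2 = reduce_level(path, level_1)
```

On this path the sketch picks a, c, and e as the first independent set, and removing c creates the augmenting edge (b, d) with weight 2 + 3 = 5, which equals the original distance between b and d.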
After the hierarchy is built, each vertex x receives a label consisting of distance entries to the independent‑set vertices that lie at its level or higher. Formally, L_out(x) contains (v, dist_G(x,v)) for every v in the independent sets that are “above” x, while L_in(x) is defined symmetrically; on an undirected graph the two coincide, so a single label per vertex suffices. Because the independent sets are relatively small and well‑distributed, label sizes remain modest (a few dozen entries on average).
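One plausible way to realize such labels on an undirected graph is to peel independent sets until the graph is empty and then label vertices top-down, merging the labels of the neighbors each vertex had when it was removed. The sketch below rests on assumptions: `build_labels` is a hypothetical name, the whole graph is held in memory, and each entry stores the length of a path that only climbs the hierarchy. That length is an upper bound on the true distance, which is still enough for the minimum over common label vertices to be exact.

```python
def build_labels(adj):
    """Peel independent sets level by level, then label vertices top-down.
    `adj` maps each vertex to a dict {neighbor: weight} (undirected)."""
    graph = {v: dict(nbrs) for v, nbrs in adj.items()}
    removed = []  # (vertex, its neighbors at removal time), in removal order

    while graph:
        # Greedy independent set of the current graph (low degree first).
        chosen, blocked = [], set()
        for v in sorted(graph, key=lambda x: (len(graph[x]), x)):
            if v not in blocked:
                chosen.append(v)
                blocked.add(v)
                blocked.update(graph[v])
        for v in chosen:
            nbrs = graph.pop(v)
            removed.append((v, nbrs))
            items = list(nbrs.items())
            for u, _ in items:
                del graph[u][v]
            # Augmenting edges between every pair of v's neighbors.
            for i, (u, wu) in enumerate(items):
                for x, wx in items[i + 1:]:
                    cand = wu + wx
                    if cand < graph[u].get(x, float("inf")):
                        graph[u][x] = graph[x][u] = cand

    labels = {}
    # Vertices removed last sit highest, so they are labeled first.
    for v, nbrs in reversed(removed):
        label = {v: 0}
        for u, w in nbrs.items():
            for hub, d in labels[u].items():
                if w + d < label.get(hub, float("inf")):
                    label[hub] = w + d
        labels[v] = label
    return labels


# Path a-b-c-d-e with edge weights 1, 2, 3, 4.
path = {
    "a": {"b": 1},
    "b": {"a": 1, "c": 2},
    "c": {"b": 2, "d": 3},
    "d": {"c": 3, "e": 4},
    "e": {"d": 4},
}
labels = build_labels(path)
```

Here vertex a ends up with entries for itself, b (distance 1), and d (distance 6), while the top vertex d stores only itself.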
Query processing is straightforward: given a source s and target t, the algorithm computes the intersection of L_out(s) and L_in(t). For each common vertex v in the intersection, it evaluates dist(s,v)+dist(v,t) and returns the minimum. Since both labels are stored as lists sorted by vertex ID, the intersection is a single merge pass, linear in the combined label sizes, yielding sub‑millisecond response times even on billion‑edge graphs.
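The label intersection amounts to a merge join over two sorted lists. The following is a minimal sketch, assuming each label is a list of (vertex, distance) pairs sorted by vertex ID; the function name `query_distance` and the example labels are illustrative only.

```python
def query_distance(label_s, label_t):
    """Return min over common vertices v of dist(s, v) + dist(v, t).
    Both labels must be sorted by vertex ID; returns inf if they share none."""
    i = j = 0
    best = float("inf")
    while i < len(label_s) and j < len(label_t):
        hub_s, dist_s = label_s[i]
        hub_t, dist_t = label_t[j]
        if hub_s == hub_t:
            best = min(best, dist_s + dist_t)
            i += 1
            j += 1
        elif hub_s < hub_t:
            i += 1
        else:
            j += 1
    return best


# Hypothetical labels for three vertices of the path a-b-c-d-e
# with edge weights 1, 2, 3, 4.
label_a = [("a", 0), ("b", 1), ("d", 6)]
label_c = [("b", 2), ("c", 0), ("d", 3)]
label_e = [("d", 4), ("e", 0)]

dist_ac = query_distance(label_a, label_c)  # common vertices: b and d
dist_ae = query_distance(label_a, label_e)  # common vertex: d
```

Each pointer only moves forward, so the loop touches every entry at most once, which is where the linear bound comes from.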
A major contribution of the work is the design of an I/O‑efficient construction pipeline that scales to graphs that cannot fit entirely in main memory. The independent‑set extraction uses a greedy streaming approach, and augmenting edges are generated on‑the‑fly while scanning adjacency lists. Labels are written sequentially to disk, and a buffered merge step assembles the final index without requiring random disk accesses. This external‑memory algorithm dramatically reduces peak memory usage (often below 30 GB) while keeping preprocessing time within practical limits (tens of hours for 10⁸‑vertex graphs).
Experimental evaluation on several real‑world datasets (including web crawls, the DBLP co‑authorship network, and social networks such as LiveJournal, Twitter, and Facebook) and on synthetic scale‑free graphs demonstrates that IS‑LABEL outperforms state‑of‑the‑art exact methods across four dimensions: (1) preprocessing time (5–20× faster), (2) memory footprint during construction (order‑of‑magnitude lower), (3) label size (average 20–50 entries versus hundreds in 2‑hop labeling), and (4) query latency (average 0.1–0.5 ms). The authors also show that the method scales linearly with graph size and can handle graphs three orders of magnitude larger than those processed by prior exact indexes.
The paper discusses extensions to directed graphs (by adapting the independent‑set definition and augmenting‑edge creation) and dynamic updates (incremental maintenance of augmenting edges and affected labels). Moreover, the authors note that approximate variants—such as storing only a subset of independent‑set vertices or using sketch‑based distance estimates—can be layered on top of IS‑LABEL to further reduce space at the cost of bounded error.
In summary, IS‑LABEL introduces a clean theoretical framework (independent‑set hierarchy with distance preservation) and a practical engineering solution (disk‑based construction, compact labeling) that together enable exact shortest‑distance querying on graphs with hundreds of millions of vertices and billions of edges—far beyond the reach of previous exact indexing techniques. This work opens a new direction for scalable graph indexing, bridging the gap between theoretical optimality and real‑world applicability.