Sketching, Moment Estimation, and the Lévy-Khintchine Representation Theorem
In the $d$-dimensional turnstile streaming model, a frequency vector $\mathbf{x}=(\mathbf{x}(1),\ldots,\mathbf{x}(n))\in (\mathbb{R}^d)^n$ is updated entry-wise over a stream. We consider the problem of $f$-moment estimation, in which one wants to estimate $$f(\mathbf{x})=\sum_{v\in[n]}f(\mathbf{x}(v))$$ with a small-space sketch. In this work we present a simple and generic scheme for constructing sketches, based on the novel idea of hashing indices to Lévy processes. From such a sketch one can estimate the $f$-moment $f(\mathbf{x})$, where $f$ is the characteristic exponent of the Lévy process. The fundamental Lévy-Khintchine representation theorem completely characterizes the space of all possible characteristic exponents, which in turn characterizes the set of $f$-moments that can be estimated by this generic scheme. The new scheme has strong explanatory power: it unifies the construction of many existing sketches, and it implies the tractability of many nearly periodic functions that were previously unclassified. Furthermore, the scheme generalizes naturally to the multidimensional case ($d\geq 2$) by using multidimensional Lévy processes. It generalizes further to heterogeneous moments by projecting different indices with different Lévy processes. We conjecture that the set of tractable functions can be characterized using the Lévy-Khintchine representation theorem via what we call the Fourier-Hahn-Lévy method.
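As a concrete illustration of the idea, here is a minimal sketch of one simple instance, assuming $d=1$. Each index is hashed to a symmetric Cauchy (1-stable) Lévy process, whose characteristic exponent is $f(z)=|z|$, so the sketch targets $F_1=\sum_v|\mathbf{x}(v)|$.

Several pieces are our own illustrative assumptions, not the paper's construction:
- the class name `CauchyLevySketch`;
- the seeded PRNG standing in for a hash function;
- the empirical characteristic-function (log-cosine) estimator;
- the choice of the scaling parameter `t`.

```python
import math
import random


class CauchyLevySketch:
    """Hypothetical F_1 sketch built by hashing indices to Cauchy processes.

    For each repetition j, index v is mapped to an independent standard
    Cauchy variable C_j(v): the time-1 value of a symmetric 1-stable Levy
    process with characteristic exponent |z|. The sketch stores
    Y_j = sum_v x(v) * C_j(v), so E[cos(t * Y_j)] = exp(-t * sum_v |x(v)|).
    """

    def __init__(self, m: int, seed: int = 0) -> None:
        self.m = m          # number of independent repetitions
        self.seed = seed
        self.y = [0.0] * m  # the entire sketch: m real counters

    def _cauchy(self, j: int, v: int) -> float:
        # Stand-in for a hash function (assumption): a PRNG seeded by
        # (seed, j, v), so the same pair always yields the same variable.
        u = random.Random(f"{self.seed}:{j}:{v}").random()
        return math.tan(math.pi * (u - 0.5))  # inverse-CDF Cauchy sample

    def update(self, v: int, delta: float) -> None:
        # Turnstile update x(v) += delta; the sketch is linear in x.
        for j in range(self.m):
            self.y[j] += delta * self._cauchy(j, v)

    def estimate(self, t: float) -> float:
        # Multiplying Y_j by t equals reading each Cauchy process at time t
        # (1-self-similarity), which multiplies the exponent by t.
        mean_cos = sum(math.cos(t * yj) for yj in self.y) / self.m
        if mean_cos <= 0:
            raise ValueError("t too large for this m; choose t near 1/F_1")
        return -math.log(mean_cos) / t


sk = CauchyLevySketch(m=2000, seed=1)
for v, delta in [(1, 3.0), (2, -2.0), (3, 5.0), (1, -1.0), (4, 0.5)]:
    sk.update(v, delta)
# Now x = {1: 2.0, 2: -2.0, 3: 5.0, 4: 0.5}, so F_1 = 9.5.
print(sk.estimate(t=1 / 9.5))  # a noisy estimate of 9.5
```

Here `t` is assumed to come from a rough guess of $F_1$. Keeping $t\cdot F_1$ near 1 keeps the mean cosine well away from zero, so the logarithm stays stable. Swapping in a different Lévy process changes the exponent, and with it the moment being estimated.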
Research Summary
The paper establishes a unified framework for moment estimation and weighted sampling in streaming algorithms by exploiting the deep connection between Lévy processes and data sketches. It begins by introducing the M‑turnstile model, a highly general streaming setting where updates occur over any commutative monoid (M,+). This abstraction simultaneously captures the classic turnstile model (integers or reals), multidimensional vectors, and purely incremental (non‑negative) streams.
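To make the M‑turnstile abstraction concrete, here is a minimal sketch of a frequency vector parameterized by a commutative monoid. It assumes the monoid is supplied as an identity element plus a binary operation. The class `MTurnstile` and its methods are hypothetical and are not the paper's notation.

```python
import operator
from collections.abc import Hashable
from dataclasses import dataclass, field
from typing import Callable, Dict, Generic, TypeVar

M = TypeVar("M")


@dataclass
class MTurnstile(Generic[M]):
    """Hypothetical frequency vector over a commutative monoid (M, +).

    Each index starts at the identity `zero`; an update (v, delta) applies
    x(v) <- x(v) + delta using the monoid operation `add`. Because the
    operation is commutative and associative, the final vector does not
    depend on update order (up to floating-point rounding for real M).
    """
    zero: M
    add: Callable[[M, M], M]
    x: Dict[Hashable, M] = field(default_factory=dict)

    def update(self, v: Hashable, delta: M) -> None:
        self.x[v] = self.add(self.x.get(v, self.zero), delta)

    def value(self, v: Hashable) -> M:
        return self.x.get(v, self.zero)


# Classic turnstile: (Z, +), so negative updates (deletions) are allowed.
ints = MTurnstile(0, operator.add)
for v, delta in [("a", 3), ("b", 1), ("a", -2)]:
    ints.update(v, delta)
print(ints.value("a"), ints.value("b"))  # 1 1

# R^2-turnstile: vectors added component-wise.
vecs = MTurnstile((0.0, 0.0), lambda p, q: (p[0] + q[0], p[1] + q[1]))
vecs.update(7, (1.0, 2.0))
vecs.update(7, (0.5, -2.0))
print(vecs.value(7))  # (1.5, 0.0)

# Incremental stream: (R_+, +); only non-negative deltas are fed in.
inc = MTurnstile(0.0, operator.add)
inc.update("u", 2.5)
inc.update("u", 0.5)
print(inc.value("u"))  # 3.0
```

Only the monoid changes between the three settings; the update rule stays the same. This is the sense in which one model covers all of them.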
Three canonical problems are defined: (1) f‑moment estimation in an ℝ^d‑turnstile, where the goal is to approximate ∑_v f(x(v)) for a function f; (2) G‑moment estimation in an ℝ+‑turnstile, a special case where f is non‑negative; and (3) G‑sampling, i.e., returning an index v with probability proportional to G(x(v)). The authors survey prior work, noting that many existing sketches (AMS for F₂, Indyk’s stable sketches for F_p, PCSA/HyperLogLog for distinct counting, etc.) handle only specific families of functions, often with ad‑hoc analyses.
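The targets of the three problems can be pinned down exactly when the whole frequency vector is stored, which is what a small-space sketch avoids. The sketch below is a minimal, non-streaming reference that assumes the vector is already materialized as a dict. The helper names `f_moment` and `g_sample` are hypothetical. G‑moment estimation is the same computation with a non-negative G on non-negative entries.

```python
import random
from collections.abc import Hashable
from typing import Callable, Dict, Optional


def f_moment(x: Dict[Hashable, float], f: Callable[[float], float]) -> float:
    """Exact f-moment sum_v f(x(v)); a sketch only approximates this value."""
    return sum(f(xv) for xv in x.values())


def g_sample(x: Dict[Hashable, float], G: Callable[[float], float],
             rng: random.Random) -> Optional[Hashable]:
    """Exact G-sampling: return v with probability G(x(v)) / sum_u G(x(u)).

    Assumes G >= 0; returns None when every weight is zero.
    """
    keys = list(x)
    weights = [G(x[v]) for v in keys]
    if sum(weights) <= 0:
        return None
    return rng.choices(keys, weights=weights, k=1)[0]


x = {"a": 3.0, "b": -1.0, "c": 0.0}
print(f_moment(x, abs))              # F_1 = 4.0
print(f_moment(x, lambda z: z * z))  # F_2 = 10.0
print(g_sample(x, abs, random.Random(0)))  # "a" w.p. 3/4, "b" w.p. 1/4
```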
The central technical insight is that any Lévy process X_t on ℝ^d possesses a characteristic exponent f_X(z) defined by E[exp(i⟨z, X_t⟩)] = exp(−t·f_X(z)) for all z ∈ ℝ^d and t ≥ 0.
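The characteristic exponent is easiest to see on a compound Poisson process. Under the convention E[exp(i⟨z, X_t⟩)] = exp(−t·f_X(z)) (texts differ on the sign), consider a process that jumps by +1 or −1 (each with probability 1/2) at rate λ. Its exponent is f(z) = λ(1 − cos z), which is 2π‑periodic.

Below is a Monte Carlo check. It assumes d = 1 and our own choices λ = 2, t = 1, z = 1; the helper names are hypothetical.

```python
import math
import random


def poisson(rng: random.Random, mean: float) -> int:
    """Poisson sample by Knuth's multiplication method (fine for small means)."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def sample_X(rng: random.Random, lam: float, t: float) -> int:
    """X_t for a compound Poisson process: rate lam, jumps +1/-1 w.p. 1/2 each.

    By Poisson thinning, X_t is the difference of two independent
    Poisson(lam * t / 2) counts (up-jumps minus down-jumps).
    """
    return poisson(rng, lam * t / 2) - poisson(rng, lam * t / 2)


def char_exponent(lam: float, z: float) -> float:
    """f(z) = lam * E[1 - cos(z J)] with J = +/-1, i.e. lam * (1 - cos z)."""
    return lam * (1.0 - math.cos(z))


rng = random.Random(7)
lam, t, z, n = 2.0, 1.0, 1.0, 20_000
# The jumps are symmetric, so E[exp(i z X_t)] = E[cos(z X_t)] is real.
empirical = sum(math.cos(z * sample_X(rng, lam, t)) for _ in range(n)) / n
print(empirical, math.exp(-t * char_exponent(lam, z)))  # should be close
print(char_exponent(lam, 2 * math.pi))  # f vanishes at multiples of 2*pi
```

Now suppose independent copies X_v are combined linearly as Y = Σ_v x(v)·X_v(1). Then E[cos Y] = exp(−Σ_v λ(1 − cos x(v))), so a sketch of this form would target the periodic moment Σ_v λ(1 − cos x(v)).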