FPGA Implementation of Sketched LiDAR for a 192 × 128 SPAD Image Sensor

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

This study presents an efficient field-programmable gate array (FPGA) implementation of a polynomial-spline-based statistical compression algorithm designed to address the critical challenge of massive data-transfer bandwidth in emerging high-spatial-resolution single-photon avalanche diode (SPAD) arrays, where data rates can reach tens of gigabytes per second. In our experiments, the proposed hardware implementation achieves a compression ratio of 512× compared with conventional histogram-based outputs, with potential for further improvement. The algorithm is first optimized in software using fixed-point (FXP) arithmetic and look-up tables (LUTs) to eliminate explicit additions, multiplications, and non-linear operations, enabling a careful balance between accuracy and hardware resource utilization. Guided by this trade-off analysis, online sketch processing elements (SPEs) are implemented on an FPGA to process time-stamp streams directly from the SPAD sensor. The implementation is validated using a customized LiDAR setup with a 192 × 128-pixel SPAD array. This work demonstrates histogram-free online depth reconstruction with high fidelity, effectively alleviating the time-stamp transfer bottleneck of SPAD arrays and offering scalability as pixel counts continue to increase in future SPAD sensors.


💡 Research Summary

This paper addresses the critical data‑transfer bottleneck inherent in emerging high‑resolution single‑photon avalanche diode (SPAD) image sensors, which can generate tens of gigabytes per second of timestamp streams. The authors present a field‑programmable gate array (FPGA) implementation of a polynomial‑spline‑based statistical compression algorithm—referred to as a “sketch”—that processes timestamps directly on‑chip without constructing per‑pixel histograms.

The core algorithm computes, for each pixel, M sketch values ẑ_i from the n detected photon timestamps X_k using a spline function φ_p(·) (Equation 1 of the paper). In hardware, the authors fix M = 4 to balance accuracy against resource usage, and they adopt a linear spline (p = 1) because non-linear splines would require multiplications and additional clock cycles. To eliminate all multiplications and non-linear operations, the spline values are pre-computed and stored in a read-only memory (ROM) look-up table (LUT). Each 12-bit timestamp X (range 0–4095) is first reduced modulo the histogram bin count (4096) and then scaled to an address space of 2^N entries; the authors choose N = 8 (a 256-entry LUT), so addressing reduces to a single-clock-cycle bit-shift.
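The LUT-based update above can be sketched in software. This is a hedged illustration, not the paper's implementation: the parameters (12-bit timestamps, 4096-bin period, N = 8, M = 4) come from the summary, but the exact spline shape stored in the ROM is an assumption here; a periodic triangular (linear, p = 1) spline per sketch index is used as a stand-in.

```python
# Illustrative model of one sketch processing element (SPE) update.
# Assumed parameters from the summary; the spline shape itself is a guess.
T = 4096          # histogram bin count / timestamp period (12-bit)
N = 8             # LUT address width -> 256 entries
SHIFT = 12 - N    # bit-shift from a 12-bit timestamp to a LUT address
M = 4             # sketch values per pixel

def build_lut(m, depth=1 << N):
    """Pre-compute a periodic linear ("hat") spline for sketch index m."""
    lut = []
    for a in range(depth):
        t = a / depth                         # normalized position in [0, 1)
        x = (m * t) % 1.0                     # m-th harmonic of the period
        lut.append(1.0 - abs(2.0 * x - 1.0))  # triangular spline in [0, 1]
    return lut

LUTS = [build_lut(m + 1) for m in range(M)]   # one ROM LUT per sketch index

def update_sketch(acc, n, timestamp):
    """One photon event: accumulate M LUT values, no multiply needed."""
    addr = (timestamp % T) >> SHIFT           # single-cycle bit-shift addressing
    for m in range(M):
        acc[m] += LUTS[m][addr]
    return acc, n + 1

acc, n = [0.0] * M, 0
for ts in (100, 2048, 4095, 100):             # made-up timestamp stream
    acc, n = update_sketch(acc, n, ts)
z_hat = [a / n for a in acc]                  # division by n deferred in hardware
```

The key point mirrored here is that per-photon processing is reduced to one shift, one table read, and M additions.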

The FPGA design, written in parameterized Verilog, targets an Opal Kelly XEM7310 board (Xilinx Artix-7). Timestamps are serialized from the 192 × 128 SPAD array via a parallel-in-serial-out (PISO) interface, requiring 192 × 128 clock cycles per frame. Four parallel sketch processing elements (SPEs) receive the stream; each SPE contains its own LUT for φ_p and computes four spline values per cycle. The LUT outputs are accumulated in 16-bit block RAM (BRAM) entries, while photon counts n are accumulated in a separate BRAM. After a configurable number of frames (512 in the experiments), the 64-bit accumulated sketch values are read out through two 32-bit FIFOs over USB 3.0, achieving a 512-fold reduction in data volume. The division by n required for final normalization is deferred to offline software, avoiding costly on-chip division.
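The deferred-division readout can be modeled as follows. This is a hedged behavioral sketch, not the Verilog design: the frame count (512) and the idea of separate sketch and photon-count accumulators come from the summary, while the per-photon values fed in are placeholders.

```python
# Behavioral model of per-pixel accumulation with division deferred to
# software readout, as described in the summary. Photon data is made up.
FRAMES = 512
M = 4

class PixelAccumulator:
    """One pixel's BRAM state: M sketch accumulators plus a photon count."""
    def __init__(self):
        self.sketch = [0] * M   # accumulated LUT outputs
        self.count = 0          # photon-count BRAM entry

    def add_photon(self, phi_values):
        for m, v in enumerate(phi_values):
            self.sketch[m] += v
        self.count += 1

    def read_out(self):
        # Normalization by n happens only here, offline -- the FPGA never
        # needs a divider, only adders.
        return [s / self.count for s in self.sketch], self.count

pix = PixelAccumulator()
for _ in range(FRAMES):
    pix.add_photon([3, 1, 4, 1])   # placeholder fixed-point phi values

z, n = pix.read_out()
# n == 512, z == [3.0, 1.0, 4.0, 1.0]
```

Keeping only running sums on-chip is also what makes the 64-bit accumulated words a complete readout: the host can reconstruct the normalized sketch from the sums and counts alone.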

Resource utilization shows that LUTs and registers consume roughly 10% of the device, whereas BRAM accounts for 61% of the available memory, reflecting the design's heavy reliance on on-chip storage for sketch accumulation. The authors note that external SRAM could be leveraged in future work to increase the sketch dimension M and improve reconstruction fidelity.

Simulation results explore the impact of LUT depth and fixed‑point (FXP) format on reconstruction error. With a LUT depth of 256 and FXP format ⟨16, 7⟩, the average log‑error is 0.47 bins, i.e., sub‑bin accuracy. Experimental validation uses a 192 × 128 Quanticam SPAD sensor with 40 ps temporal resolution, a 2 ns pulsed laser, and a 5 ms exposure. The authors acquire 512 frames, compress them on‑chip, and compare the resulting sketches with those derived offline from full histograms. Depth maps reconstructed from the online sketches exhibit root‑mean‑square errors (RMSE) and regression slopes nearly identical to the offline baseline, confirming that the histogram‑free approach does not sacrifice depth accuracy. Additional tests with varying STOP‑signal delays (simulating different distances) and with a 3‑D‑printed object further demonstrate comparable performance between online and offline processing.
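The role of the FXP format in the accuracy trade-off can be made concrete with a small quantization example. This is an assumption-laden sketch: ⟨16, 7⟩ is taken here to mean a 16-bit word with 7 fractional bits, which may not match the paper's exact convention.

```python
# Illustrative <16, 7> fixed-point quantization (assumed: 16-bit word,
# 7 fractional bits). Shows the rounding error a spline value picks up
# when it is stored in the ROM LUT.
FRAC_BITS = 7
WORD_BITS = 16
SCALE = 1 << FRAC_BITS          # 128 quantization steps per unit

def to_fxp(x):
    """Quantize a real value to signed fixed point, saturating on overflow."""
    q = round(x * SCALE)
    lo, hi = -(1 << (WORD_BITS - 1)), (1 << (WORD_BITS - 1)) - 1
    return max(lo, min(hi, q))

def from_fxp(q):
    return q / SCALE

phi = 0.7071                    # example spline value (made up)
q = to_fxp(phi)
err = abs(from_fxp(q) - phi)
# The rounding error is bounded by half an LSB: 1 / (2 * SCALE)
assert err <= 0.5 / SCALE
```

A deeper LUT or more fractional bits shrinks this per-entry error, which is the knob behind the reported trade-off between reconstruction accuracy and resource usage.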

In summary, the paper delivers a practical, low‑latency FPGA solution for real‑time 3D imaging with large SPAD arrays. By exploiting fixed‑point arithmetic, LUT‑based spline evaluation, and on‑chip accumulation, the authors achieve a 512× compression ratio while preserving depth reconstruction quality. The work opens pathways for scaling to larger sensor formats, integrating spatial regularization, and moving toward video‑rate (≥30 fps) depth reconstruction through parallel hardware pipelines and external memory extensions.

