Event-Driven Clustering Algorithm
This paper introduces a novel asynchronous, event-driven algorithm for real-time detection of small event clusters in event-camera data. Like other hierarchical agglomerative clustering algorithms, it groups events by their spatio-temporal distance. However, by exploiting the asynchronous structure of the event stream together with a simple, efficient per-event decision procedure, it achieves linear complexity $O(n)$, where $n$ is the number of events. In addition, the algorithm's run time is independent of the dimensions of the pixel array.
💡 Research Summary
The paper presents a novel asynchronous, event‑driven clustering algorithm designed specifically for the data stream produced by neuromorphic (event) cameras. Unlike conventional frame‑based or batch clustering methods, which typically require multiple passes over the data and exhibit at best O(n log n) or even O(n²) computational complexity, the proposed method processes each incoming event exactly once, achieving linear time complexity O(n) where n is the total number of events.
Problem Context
Event cameras output a continuous, time‑ordered list of asynchronous events vᵢ = (tᵢ, xᵢ, yᵢ, pᵢ), where each event records the timestamp, pixel coordinates, and polarity (sign of brightness change). Because events are sparse and temporally precise (microsecond resolution), they are well suited for high‑speed vision tasks, but traditional computer‑vision pipelines that rely on fixed‑rate frames cannot exploit this asynchronous nature efficiently. The authors argue that a clustering algorithm that works directly on the event stream, without binning or buffering, is essential for real‑time small‑object detection, motion segmentation, and related applications.
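The event tuple described above can be sketched as a small Python structure; the field names follow the paper's notation vᵢ = (tᵢ, xᵢ, yᵢ, pᵢ), while the concrete types and units are assumptions:

```python
from typing import NamedTuple

class Event(NamedTuple):
    """One asynchronous event v_i = (t_i, x_i, y_i, p_i)."""
    t: float  # timestamp (assumed microseconds)
    x: int    # pixel column
    y: int    # pixel row
    p: int    # polarity: +1 (brightness increase) or -1 (decrease)

# Events arrive as a continuous, time-ordered stream.
stream = [Event(10.0, 5, 7, 1), Event(12.5, 5, 8, -1)]
assert stream[0].t < stream[1].t
```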
Algorithmic Core
The method builds an implicit directed graph G = (V, E) where each event is a vertex. Edges are added according to two temporal‑spatial criteria:
- Exact‑pixel temporal proximity – If a new event occurs at the same pixel as an existing cluster and its timestamp differs by no more than a predefined temporal window δ, it is attached to that cluster.
- Neighbourhood temporal proximity – If there exists any pixel within a spatial radius d whose most recent event (the “time surface”) is within δ of the new event’s timestamp, the new event is attached to the cluster rooted at that neighbour.
If neither condition holds, a new cluster is started, and the current event becomes the root of that cluster. The graph formed in this way is a poly‑forest: each connected component is a rooted tree, and the root is simply the earliest event in that component.
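The two attachment criteria can be sketched as follows; `DELTA` and `D` stand for the paper's δ and d, and the array layout and chosen units are assumptions, not the authors' implementation:

```python
import numpy as np

DELTA, D = 1_000.0, 2          # assumed: temporal window (µs), spatial radius (px)
H, W = 480, 640                # sensor resolution (rows, cols)
time_surface = np.full((H, W), -np.inf)  # latest timestamp per pixel

def find_attachment(t, x, y):
    """Return (row, col) of a compatible pixel, or None to start a new cluster."""
    # Criterion 1: the same pixel fired within the temporal window δ.
    if t - time_surface[y, x] <= DELTA:
        return (y, x)
    # Criterion 2: some pixel within radius d has a recent-enough time surface.
    y0, y1 = max(0, y - D), min(H, y + D + 1)
    x0, x1 = max(0, x - D), min(W, x + D + 1)
    window = time_surface[y0:y1, x0:x1]
    if np.any(t - window <= DELTA):
        dy, dx = np.unravel_index(np.argmax(window), window.shape)
        return (y0 + dy, x0 + dx)
    return None  # neither criterion holds: this event roots a new cluster
```

Returning `None` corresponds to the "start a new cluster" branch; the caller would then make the current event the root of a fresh component.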
Data Structures
To achieve O(1) per‑event processing, the algorithm maintains several arrays, each of size r × s (the sensor resolution):
- TimeSurface – timestamp of the latest event at each pixel.
- PointerX / PointerY – coordinates of the root of the cluster currently associated with the pixel (‑1 if none).
- grade – number of events already assigned to the cluster whose root is stored in the pixel.
- pixels – number of distinct pixels contributing to that cluster.
- ClusterBegin / ClusterEnd – timestamps of the first and most recent events of the cluster.
- ClusterID – index of the cluster in the output list (‑1 if not yet output).
- Compatibility – timestamp of the last cluster that the pixel was compatible with (used to detect when a pixel’s previous cluster has been superseded).
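The per-pixel state above can be laid out as one array per field; the field names follow the summary, while the row-major layout and element types are assumptions:

```python
import numpy as np

R, S = 480, 640  # sensor resolution r × s

state = {
    "TimeSurface":   np.full((R, S), -np.inf),        # latest event timestamp
    "PointerX":      np.full((R, S), -1, dtype=int),  # root x (-1 = none)
    "PointerY":      np.full((R, S), -1, dtype=int),  # root y (-1 = none)
    "grade":         np.zeros((R, S), dtype=int),     # events in the cluster
    "pixels":        np.zeros((R, S), dtype=int),     # distinct pixels in it
    "ClusterBegin":  np.full((R, S), -np.inf),        # first event timestamp
    "ClusterEnd":    np.full((R, S), -np.inf),        # latest event timestamp
    "ClusterID":     np.full((R, S), -1, dtype=int),  # output index (-1 = none)
    "Compatibility": np.full((R, S), -np.inf),        # staleness-check timestamp
}
```

Every lookup and update against these arrays is a constant-time indexed access, which is what makes O(1) per-event work possible.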
When a new event arrives, the algorithm follows a decision‑tree (illustrated in the paper’s block diagram):
- Pointer initialization if the pixel has never been seen.
- Check for “stale” root – whether the pixel still points to a cluster that has already been replaced by a newer one (using Compatibility).
- Temporal‑spatial check on the current root – if the time difference to the root’s last event ≤ δ, attach the event to that cluster and update grade and ClusterEnd.
- Neighbourhood search – scan the d‑radius neighbourhood for the pixel with the smallest (tᵢ − TimeSurface) value; if such a pixel exists, attach the event to its cluster (updating the new pixel’s pointers).
- Create a new cluster if none of the above conditions hold.
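The decision tree can be compressed into a hypothetical per-event routine. This sketch collapses pointer initialization and staleness handling into dictionary defaults and omits the `Compatibility` bookkeeping, so it is a simplification of the paper's block diagram, not a reproduction of it:

```python
import math

DELTA, D = 1000.0, 1  # assumed δ (µs) and d (px)
ts = {}      # pixel -> latest timestamp (TimeSurface)
root = {}    # pixel -> root pixel of its cluster
grade = {}   # root pixel -> event count

def process(t, x, y):
    """Route one event through the decision tree; return its cluster root."""
    # Steps 1-3: if this pixel's cluster is still fresh, reuse its root.
    if (x, y) in ts and t - ts[(x, y)] <= DELTA:
        r = root[(x, y)]
    else:
        # Step 4: neighbourhood search for the most recent compatible pixel.
        best = None
        for nx in range(x - D, x + D + 1):
            for ny in range(y - D, y + D + 1):
                nt = ts.get((nx, ny), -math.inf)
                if t - nt <= DELTA and (best is None or nt > ts[best]):
                    best = (nx, ny)
        # Step 5: attach to the neighbour's cluster, or start a new one.
        r = root[best] if best is not None else (x, y)
    ts[(x, y)] = t
    root[(x, y)] = r
    grade[r] = grade.get(r, 0) + 1
    return r
```

Each call touches only a bounded (2d + 1)² neighbourhood, so the per-event cost is constant, consistent with the O(n) claim.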
Output Policy
Two static thresholds are defined: n (minimum number of events; not to be confused with the total event count in the complexity bound) and m (minimum number of distinct pixels). A cluster is emitted to the final “Clusters” list only when both thresholds are satisfied. The list stores, for each emitted cluster, the root timestamp, root coordinates, last‑event timestamp, total event count, and pixel count. The algorithm can optionally emit clusters incrementally as soon as they cross the thresholds, providing immediate feedback for downstream tasks.
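A hypothetical emission check for the two thresholds might look like this; the record layout follows the summary's description, while the threshold values and function name are illustrative assumptions:

```python
N_MIN, M_MIN = 5, 3   # assumed values for the thresholds n and m

clusters = []  # the output "Clusters" list

def maybe_emit(root_t, root_xy, last_t, n_events, n_pixels):
    """Append the cluster once both thresholds are met; return its ClusterID."""
    if n_events >= N_MIN and n_pixels >= M_MIN:
        clusters.append({
            "root_t": root_t, "root_xy": root_xy,
            "last_t": last_t, "events": n_events, "pixels": n_pixels,
        })
        return len(clusters) - 1   # index in the output list
    return -1                      # not yet emitted
```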
Complexity and Independence from Resolution
Because each event triggers a constant‑time series of array look‑ups and updates, the overall runtime scales linearly with the number of events, regardless of the sensor’s spatial resolution. Memory consumption is O(r·s) due to the per‑pixel arrays; this is acceptable for typical event cameras (e.g., 640 × 480) but may become a consideration for ultra‑high‑resolution sensors.
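A back-of-envelope estimate makes the O(r·s) memory cost concrete; the assumption of nine arrays at 4 bytes per entry is illustrative (actual element widths depend on the implementation):

```python
def per_pixel_memory_mb(rows, cols, n_arrays=9, bytes_per_entry=4):
    """Total size of the per-pixel state arrays, in mebibytes."""
    return rows * cols * n_arrays * bytes_per_entry / 2**20

print(per_pixel_memory_mb(480, 640))    # ~10.5 MB for a 640 × 480 sensor
print(per_pixel_memory_mb(720, 1280))   # ~31.6 MB for a 1280 × 720 sensor
```

Under these assumptions the footprint is modest on a desktop but already non-trivial for memory-constrained embedded targets.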
Demonstration
The authors illustrate the algorithm using a dataset where a 100 Hz sinusoidal power line drives an incandescent lamp, generating a periodic burst of events. A 3‑D plot visualizes how clusters emerge over time, showing that the root of each cluster is reported as soon as the event count exceeds the predefined threshold. No quantitative benchmarks (e.g., processing latency, CPU/GPU usage, comparison with DBSCAN‑style event clustering) are provided.
Critical Assessment
Strengths
- True event‑driven processing: By never buffering or binning events, the method fully exploits the microsecond temporal resolution of event cameras.
- Linear time complexity: The O(n) claim is well‑justified by the constant‑time per‑event operations.
- Immediate root reporting: Useful for real‑time applications where early detection of a small moving object is critical (e.g., fast‑moving drones, industrial inspection).
- Simple parameter set: Only four static parameters (δ, d, n, m) are required, making the algorithm easy to configure.
Weaknesses
- Lack of empirical evaluation: No runtime measurements, memory profiling, or accuracy analysis are presented, making it difficult to assess practical performance.
- Static thresholds: δ and d are fixed a priori; in dynamic lighting or motion conditions, adaptive thresholds would likely improve robustness.
- Polarity ignored: The algorithm discards the polarity field, missing an opportunity to differentiate between positive and negative brightness changes, which can be valuable for edge‑oriented tasks.
- Memory scaling: While runtime is resolution‑independent, memory scales with r·s; for 1280 × 720 sensors the per‑pixel arrays could become a bottleneck on embedded platforms.
- No comparison to state‑of‑the‑art: The paper does not benchmark against recent event‑based clustering methods (e.g., asynchronous DBSCAN, mean‑shift on time surfaces, graph‑cut approaches), leaving the relative merit unclear.
- Single‑threaded assumption: The algorithm is described as a sequential loop; modern neuromorphic processing pipelines often exploit parallelism (GPU, FPGA, or multi‑core CPUs). Extending the method to parallel execution is non‑trivial because of the shared per‑pixel structures.
Potential Extensions
- Adaptive δ and d: Implement a feedback loop that adjusts temporal and spatial windows based on observed event rates or scene dynamics.
- Polarity‑aware clustering: Incorporate pᵢ into the distance metric, allowing separate clusters for positive and negative changes or weighting them differently.
- GPU/FPGA acceleration: Map the per‑pixel arrays to shared memory on a GPU and process batches of events in parallel, while preserving the “no look‑back” property via atomic updates.
- Dynamic memory management: Use sparse hash tables for the per‑pixel structures to reduce memory for high‑resolution, low‑activity scenes.
- Benchmark suite: Evaluate on standard event‑camera datasets (e.g., N‑MNIST, DVS128 Gesture) and compare detection latency, false‑positive/negative rates, and resource usage against asynchronous DBSCAN and mean‑shift baselines.
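As one possible shape for the adaptive-δ feedback loop suggested above, the temporal window could track an exponential moving average of the inter-event gap. This is purely a sketch of the proposed extension; none of the constants or design choices come from the paper:

```python
class AdaptiveDelta:
    """Scale δ as a multiple of the smoothed inter-event interval (assumed design)."""

    def __init__(self, delta0=1000.0, k=5.0, alpha=0.01):
        self.delta = delta0        # current temporal window δ
        self.k = k                 # δ as a multiple of the mean inter-event gap
        self.alpha = alpha         # EMA smoothing factor
        self.mean_gap = delta0 / k
        self.last_t = None

    def update(self, t):
        """Fold one event timestamp into the estimate; return the new δ."""
        if self.last_t is not None:
            gap = t - self.last_t
            self.mean_gap += self.alpha * (gap - self.mean_gap)
            self.delta = self.k * self.mean_gap
        self.last_t = t
        return self.delta
```

High event rates (small gaps) shrink δ, suppressing spurious merges; sparse activity widens it, keeping slow clusters connected.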
Conclusion
The paper introduces a conceptually elegant, linear‑time clustering algorithm that directly leverages the asynchronous nature of event cameras. Its primary contribution lies in demonstrating that small‑scale clusters can be detected in a single pass without any temporal buffering, and that the root of each cluster can be reported immediately after a simple count threshold is crossed. However, the lack of quantitative experiments, adaptive mechanisms, and exploitation of the full event payload (polarity) limits the immediate applicability of the method. Future work that addresses these gaps and provides rigorous benchmarks will be essential to establish the algorithm as a practical tool for real‑time neuromorphic vision systems.