Helios 2.0: A Robust, Ultra-Low Power Gesture Recognition System Optimised for Event-Sensor based Wearables
We present an advance in wearable technology: a mobile-optimized, real-time, ultra-low-power event camera system that enables natural hand gesture control for smart glasses, dramatically improving user experience. While hand gesture recognition in computer vision has advanced significantly, critical challenges remain in creating systems that are intuitive, adaptable across diverse users and environments, and energy-efficient enough for practical wearable applications. Our approach tackles these challenges through carefully selected microgestures: lateral thumb swipes across the index finger (in both directions) and a double pinch between thumb and index fingertips. These human-centered interactions leverage natural hand movements, ensuring intuitive usability without requiring users to learn complex command sequences. To overcome variability in users and environments, we developed a novel simulation methodology that enables comprehensive domain sampling without extensive real-world data collection. Our power-optimized architecture maintains exceptional performance, achieving F1 scores above 80% on benchmark datasets featuring diverse users and environments. The resulting models operate at just 6-8 mW when exploiting the Qualcomm Snapdragon Hexagon DSP, with our 2-channel implementation exceeding a 70% F1 score and our 6-channel model surpassing an 80% F1 score across all gesture classes in user studies. These results were achieved using only synthetic training data. This improves on the state-of-the-art F1 score by 20%, with a 25x power reduction when using the DSP. This advancement brings ultra-low-power vision systems in wearable devices closer to practical deployment and opens new possibilities for seamless human-computer interaction.
💡 Research Summary
Helios 2.0 presents a complete end‑to‑end solution for ultra‑low‑power, real‑time hand‑gesture recognition on smart‑glass wearables, leveraging event‑camera technology and a DSP‑optimized neural network. The authors first define a set of three micro‑gestures—left and right thumb swipes across the index finger and a double pinch between thumb and index fingertips—that are natural, require minimal hand movement, and generate strong contrast events suitable for asynchronous sensing. To avoid costly real‑world data collection, they build a custom simulation pipeline that combines the ESIM event‑generation framework with a Unity‑based rendering engine. The simulator creates long, multi‑gesture sequences (≈2 s) using a Markov transition model, class‑balanced sampling, and kinematic blending, thereby producing diverse synthetic datasets that faithfully mimic variations in hand pose, speed, lighting, and background.
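The Markov transition model with class-balanced sampling described above can be sketched as follows. This is a minimal illustration, not the paper's actual simulator: the state names, transition probabilities, and the use of a "background" (no-gesture) state between gestures are assumptions made for the example.

```python
import random

# Hypothetical states: the paper's micro-gestures plus an assumed
# "background" (no-gesture) state separating consecutive gestures.
STATES = ["background", "swipe_left", "swipe_right", "double_pinch"]

# Illustrative transition matrix (assumed, not from the paper):
# from background, each gesture is equally likely (class-balanced
# sampling); after any gesture, the sequence returns to background.
TRANSITIONS = {
    "background":   {"swipe_left": 1 / 3, "swipe_right": 1 / 3, "double_pinch": 1 / 3},
    "swipe_left":   {"background": 1.0},
    "swipe_right":  {"background": 1.0},
    "double_pinch": {"background": 1.0},
}

def sample_sequence(n_steps: int, start: str = "background", seed: int = 0):
    """Sample a gesture-state sequence by walking the Markov chain."""
    rng = random.Random(seed)
    state, seq = start, [start]
    for _ in range(n_steps - 1):
        choices, weights = zip(*TRANSITIONS[state].items())
        state = rng.choices(choices, weights=weights, k=1)[0]
        seq.append(state)
    return seq

seq = sample_sequence(8)
print(seq)  # alternates between background and gesture states
```

In a full simulator, each sampled gesture state would then be rendered as a short hand-motion clip, with kinematic blending smoothing the transitions through the background state.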
The core of Helios 2.0 is a five‑stage neural architecture designed for quantization‑aware training (QAT) and for execution almost entirely on the Qualcomm Snapdragon Hexagon DSP. Over 99.8% of the compute is quantized to 8‑bit and mapped to DSP kernels, while only a tiny fraction runs on the main CPU. The network processes multi‑channel event representations: a 2‑channel version consumes time‑surface maps and achieves >70% F1 across all gesture classes; a 6‑channel version adds event‑volume features and reaches >80% F1. Training proceeds on the large synthetic corpus, followed by fine‑tuning with rotational augmentation to improve robustness to hand‑pose variations.
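To make the 2-channel time-surface representation concrete, here is a minimal sketch of one common formulation: each pixel stores an exponentially decayed value based on the timestamp of its most recent event, with one channel per polarity. The event tuple layout, resolution, and decay constant `tau` are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

def time_surface(events, shape=(120, 160), t_ref=None, tau=0.05):
    """Build a 2-channel time-surface from an event stream.

    events: iterable of (t, x, y, p) tuples with polarity p in {0, 1}
    (an assumed format). Each pixel of the output holds
    exp(-(t_ref - t_last) / tau) for its polarity channel, so recent
    events are bright and stale pixels decay toward zero.
    """
    h, w = shape
    surface = np.zeros((2, h, w), dtype=np.float32)
    last_t = np.full((2, h, w), -np.inf, dtype=np.float32)

    # Keep only the most recent timestamp per pixel and polarity.
    for t, x, y, p in events:
        last_t[int(p), int(y), int(x)] = t

    if t_ref is None:
        t_ref = max(e[0] for e in events)

    # Apply exponential decay only where at least one event landed.
    valid = np.isfinite(last_t)
    surface[valid] = np.exp(-(t_ref - last_t[valid]) / tau)
    return surface

# Two events of opposite polarity on a small 10x10 sensor.
events = [(0.00, 5, 5, 0), (0.05, 6, 6, 1)]
ts = time_surface(events, shape=(10, 10), tau=0.05)
```

A network consuming this representation sees a dense 2-channel image per time step, which is what allows standard quantized convolution kernels on the DSP to process inherently sparse, asynchronous event data.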
Power consumption is measured on a Snapdragon XR2‑Gen2 development board using a high‑resolution ADC to capture current and voltage across a shunt resistor. By subtracting the baseline (no‑model) power from the total and dividing by the number of inference iterations, the authors isolate the model’s energy draw, which is consistently 6–8 mW during inference. This represents a 25× reduction compared to prior event‑camera systems (≈150 mW) and a 20 % boost in accuracy over the earlier Helios 1.0. Latency is reduced from 60 ms to 2.34 ms, enabling truly responsive interaction.
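The measurement arithmetic above is straightforward to express in code. The sketch below shows both steps: recovering instantaneous power from the shunt-resistor readings, and isolating per-inference energy by subtracting the baseline draw. Function names and all numeric values are illustrative assumptions, not figures from the paper's setup.

```python
def instantaneous_power_mw(v_bus: float, v_shunt: float, r_shunt_ohm: float) -> float:
    """P = V_bus * I, with current recovered from the shunt voltage drop."""
    current_a = v_shunt / r_shunt_ohm
    return v_bus * current_a * 1e3  # W -> mW

def model_energy_per_inference(
    total_power_mw: float,
    baseline_power_mw: float,
    duration_s: float,
    n_iterations: int,
) -> float:
    """Energy attributable to the model per inference, in millijoules.

    Subtract the no-model baseline power from the measured total,
    multiply by the measurement window to get energy (mW * s = mJ),
    then divide by the number of inferences run in that window.
    """
    model_power_mw = total_power_mw - baseline_power_mw
    total_energy_mj = model_power_mw * duration_s
    return total_energy_mj / n_iterations

# Illustrative numbers only: a 7 mW model draw sustained over a 10 s
# window of back-to-back inference at ~2.34 ms per inference.
e_mj = model_energy_per_inference(157.0, 150.0, 10.0, 4273)
```

Averaging over many iterations in this way smooths out sampling noise from the ADC and avoids attributing the board's idle power to the model.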
Extensive real‑world evaluation is performed on three benchmark datasets covering user variability (multiple users, fixed environment), environmental variability (single expert across diverse lighting and background conditions), and outdoor scenarios. In all cases, the models trained solely on synthetic data generalize well, confirming the effectiveness of the simulation methodology. The paper also discusses related work in frame‑based vision, sEMG, ultrasound, and other event‑camera approaches, highlighting how Helios 2.0 uniquely combines micro‑gesture design, synthetic data generation, and DSP‑centric quantized inference.
Future directions include expanding the gesture vocabulary, incorporating user‑specific adaptation, fusing additional modalities such as inertial measurement units or voice commands, and refining the noise model in the simulator for even higher fidelity. By demonstrating that high‑accuracy, low‑latency gesture recognition can be achieved at sub‑10 mW power budgets, Helios 2.0 pushes event‑based vision toward practical deployment in everyday wearable devices, opening new avenues for seamless human‑computer interaction.