TsetlinWiSARD: On-Chip Training of Weightless Neural Networks using Tsetlin Automata on FPGAs


Increasing demands for adaptability, privacy, and security at the edge have persistently pushed the frontiers for a new generation of machine learning (ML) algorithms with on-chip training and inference capabilities. The Weightless Neural Network (WNN) is one such algorithm, built on simple lookup-table-based neuron structures. As a result, it offers architectural benefits such as low-latency, low-complexity inference compared to deep neural networks, which depend heavily on multiply-accumulate operations. However, traditional WNNs rely on memorization-based one-shot training, which either leads to overfitting and reduced accuracy or requires tedious post-training adjustments, limiting their effectiveness for efficient on-chip training. In this work, we propose TsetlinWiSARD, a training approach for WNNs that leverages Tsetlin Automata (TAs) to enable probabilistic, feedback-driven learning. It overcomes the overfitting of WiSARD’s one-shot training with iterative optimization, while maintaining simple, continuous binary feedback for efficient on-chip training. Central to our approach is a field-programmable gate array (FPGA)-based training architecture that delivers state-of-the-art accuracy while significantly improving hardware efficiency. Our approach trains over 1000x faster than the traditional WiSARD implementation of WNNs. Further, we demonstrate 22% lower resource usage, 93.3% lower latency, and 64.2% lower power consumption compared to FPGA-based training accelerators implementing other ML algorithms.


💡 Research Summary

The paper introduces TsetlinWiSARD, a novel on‑chip training framework that combines the classic WiSARD weightless neural network (WNN) with Tsetlin Automata (TA). Traditional WiSARD relies on a one‑shot memorization scheme: each LUT (lookup table) entry is set to 1 when a training pattern activates it, and remains unchanged thereafter. While this yields ultra‑low‑latency inference, it suffers from severe over‑fitting, saturation of LUT entries on large datasets, and an inability to adapt after deployment. Existing remedies such as B‑bleaching still require post‑training thresholding and are unsuitable for real‑time edge learning.
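The one-shot memorization scheme described above can be sketched as follows. This is a minimal illustrative model, not the paper's code; the class and method names are our own:

```python
import random

class WiSARD:
    """Minimal sketch of classic one-shot WiSARD (illustrative, simplified)."""

    def __init__(self, num_classes, input_bits, tuple_size, seed=0):
        assert input_bits % tuple_size == 0
        rng = random.Random(seed)
        # Random but fixed mapping of input bits into n-bit tuples.
        order = list(range(input_bits))
        rng.shuffle(order)
        self.tuples = [order[i:i + tuple_size]
                       for i in range(0, input_bits, tuple_size)]
        # One set of LUTs (RAM nodes) per class discriminator; entries start at 0.
        self.luts = [[{} for _ in self.tuples] for _ in range(num_classes)]

    def _addresses(self, x):
        # Each tuple of input bits forms the address into its LUT.
        return [tuple(x[i] for i in t) for t in self.tuples]

    def train(self, x, label):
        # One-shot memorization: set every addressed entry of the true class to 1,
        # and never change it again.
        for lut, addr in zip(self.luts[label], self._addresses(x)):
            lut[addr] = 1

    def predict(self, x):
        addrs = self._addresses(x)
        # Each discriminator's score = number of its LUTs that output 1.
        scores = [sum(lut.get(a, 0) for lut, a in zip(class_luts, addrs))
                  for class_luts in self.luts]
        return scores.index(max(scores))
```

Because entries can only ever be set to 1, large datasets eventually saturate the LUTs, which is exactly the overfitting problem TsetlinWiSARD targets.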

Algorithmic contribution
TsetlinWiSARD replaces every LUT entry with a dedicated TA. Each TA has 2N internal states, split into two zones that correspond to binary outputs 0 and 1. At initialization all TAs are placed at the midpoint (states N or N+1), providing maximal uncertainty. Training proceeds epoch‑wise over the dataset. For each sample, the model predicts a label ŷ; if the prediction matches the true label y, no feedback is issued. When a misclassification occurs, two probabilistic feedback actions are triggered with a preset probability P:

  1. Increment the TAs belonging to the correct discriminator y at the addresses indexed by the sample. This pushes those entries toward state > N, i.e., output 1, strengthening the correct class’s vote.
  2. Decrement the TAs belonging to the incorrectly predicted discriminator ŷ at the same addresses, moving them toward output 0 and weakening the wrong class’s vote. Other discriminators remain untouched. Each TA makes an independent random decision based on a lightweight pseudo‑random generator, ensuring stochastic exploration and avoiding local minima.
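The feedback steps above can be sketched as follows. This is a hedged software sketch of the update rule, with hypothetical variable names; in the paper the equivalent logic runs in FPGA fabric:

```python
import random

N = 64  # half the number of TA states (2N states in total); illustrative value

def ta_output(state):
    # States N+1 .. 2N vote 1; states 1 .. N vote 0.
    return 1 if state > N else 0

def feedback(ta_states, y_true, y_pred, addresses, p, rng):
    """Probabilistic TA update for one sample.

    ta_states[c][i] is a dict mapping LUT addresses to TA states for
    class c's i-th LUT; every state starts at the midpoint N.
    """
    if y_true == y_pred:
        return  # correct prediction: no feedback is issued
    for i, addr in enumerate(addresses):
        # Reward: push the correct class's addressed TAs toward output 1.
        if rng.random() < p:
            ta_states[y_true][i][addr] = min(2 * N, ta_states[y_true][i][addr] + 1)
        # Penalty: push the wrongly predicted class's TAs toward output 0.
        if rng.random() < p:
            ta_states[y_pred][i][addr] = max(1, ta_states[y_pred][i][addr] - 1)
```

Saturating at states 1 and 2N keeps each TA inside its fixed state budget, and clamping instead of wrapping preserves the learned direction of each entry.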

The authors systematically study four key hyper‑parameters:

  • Number of LUTs (hence number of TAs) – accuracy improves with more LUTs but exhibits diminishing returns.
  • Inputs per LUT – increasing the LUT fan‑in (e.g., from 3‑input to 9‑input) dramatically expands the address space, yielding higher accuracy, though FPGA hardware often fixes this fan‑in.
  • Number of TA states – performance stabilizes once the state count exceeds ~128, indicating sufficient resolution for noise robustness.
  • Feedback probability P – lower P (e.g., 0.1) yields smoother convergence; higher P accelerates learning but can cause larger fluctuations. All values eventually converge given enough epochs.

Hardware architecture
Implemented on a Xilinx XC7Z020, the design maps logical LUTs directly onto the FPGA’s configurable RAM (LUTRAM). A group of 2ⁿ TAs is compacted into a single n‑input physical LUT, drastically reducing resource consumption. Training operations consist only of simple logical comparisons, increment/decrement counters, and a lightweight linear‑feedback shift register (LFSR) for random number generation. No processor or external memory is required; the entire learning loop runs in pure FPGA fabric, enabling truly “processor‑free” on‑chip training.
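As an illustration of how cheaply the random feedback decisions can be generated, here is a 16-bit Fibonacci LFSR modeled in Python. The taps come from a standard maximal-length polynomial (x¹⁶ + x¹⁴ + x¹³ + x¹¹ + 1); the paper does not specify its exact LFSR configuration, so this is an assumption:

```python
def lfsr16_step(state):
    # One step of a 16-bit Fibonacci LFSR, taps at bits 16, 14, 13, 11.
    # With any nonzero seed it cycles through all 2**16 - 1 nonzero states.
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def fires(state, p):
    # Hardware-style Bernoulli(p) draw: compare the LFSR output against a
    # fixed threshold instead of computing a floating-point random number.
    return state < int(p * 0xFFFF)
```

In hardware this reduces each probabilistic feedback decision to one shift, a few XOR gates, and a single comparator, which is consistent with the paper's claim that training uses only comparisons, increment/decrement counters, and an LFSR.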

Experimental results
Six benchmark datasets (including MNIST and several UCI classification tasks) were used to compare TsetlinWiSARD against:

  • Standard WiSARD (one‑shot)
  • B‑bleaching (post‑training thresholding)
  • Tsetlin Machine (TA‑based conjunctive clause learning)
  • CNN‑based FPGA accelerators

When matched for model size (identical number of LUTs), TsetlinWiSARD consistently achieved 4–8 % higher classification accuracy. Training was over 1,000× faster than the only existing WiSARD accelerator, thanks to the fully parallel, binary‑feedback learning pipeline. Resource utilization dropped by 22 %, latency was reduced by 93.3 %, and power consumption fell by 64.2 % relative to other FPGA training accelerators (TM and CNN). The architecture also retains compatibility with existing WiSARD inference optimizations (e.g., hash‑based LUT compression), because the inference path remains unchanged.

Significance and future work
TsetlinWiSARD demonstrates that weightless neural networks can be equipped with an iterative, stochastic learning rule that preserves their hardware friendliness while overcoming the classic over‑fitting limitation. The TA‑driven feedback is inherently binary, making it ideal for FPGA LUT‑based implementations and for ultra‑low‑power edge devices that must adapt locally without cloud connectivity. The paper suggests extensions to multi‑layer WiSARD structures, exploration of ASIC implementations, and integration with other reconfigurable platforms (e.g., RISC‑V SoCs) to broaden applicability. Overall, the work provides a compelling blueprint for on‑chip, real‑time learning in resource‑constrained environments.

