Learning Compact Boolean Networks
Floating-point neural networks dominate modern machine learning but incur substantial inference cost, motivating interest in Boolean networks for resource-constrained settings. However, learning compact and accurate Boolean networks is challenging due to their combinatorial nature. In this work, we address this challenge from three different angles: learned connections, compact convolutions, and adaptive discretization. First, we propose a novel strategy to learn efficient connections with no additional parameters and negligible computational overhead. Second, we introduce a novel convolutional Boolean architecture that exploits spatial locality with fewer Boolean operations than existing methods. Third, we propose an adaptive discretization strategy to reduce the accuracy drop when converting a continuous-valued network into a Boolean one. Extensive results on standard vision benchmarks demonstrate that the Pareto front of accuracy vs. computation of our method significantly outperforms the prior state of the art, achieving better accuracy with up to 37x fewer Boolean operations.
💡 Research Summary
The paper tackles the problem of learning compact yet accurate Boolean networks, which are attractive for resource‑constrained inference because Boolean operations are far cheaper than floating‑point arithmetic. Existing Boolean‑network methods suffer from three major drawbacks: (1) connections between layers are either fixed random samples or learned via large auxiliary weight matrices, leading to poor utilization of the ultra‑sparse two‑input neuron structure or prohibitive memory overhead; (2) convolutional Boolean layers are built on binary tree kernels, requiring an exponential number of Boolean operations with depth and introducing sequential dependencies that limit parallelism; (3) after training with a differentiable relaxation, the final discretization step typically incurs a noticeable accuracy drop because the continuous and discrete models diverge.
To overcome these issues, the authors propose three complementary techniques.
Efficient connection learning – Each neuron is parameterized as a weighted mixture of 16 possible binary Boolean functions, but unlike prior work the two inputs to each Boolean function are allowed to be different. A triple (k, p, q) denotes the Boolean operation index k and the indices p and q of the two source neurons. During training the weight vector w determines a soft selection over the 16 triples. The authors monitor the entropy of the normalized weights (h(w)) with an exponential moving average (EMA). When the EMA stabilizes, three cases are distinguished: (i) a dominant triple (weight ≥ 0.95) – the neuron is essentially decided; (ii) dispersed (all weights ≤ 0.25) – the neuron has failed to find a useful pattern; (iii) intermediate – still learning. In case (i) the non‑dominant triples are resampled; in case (ii) all triples are resampled; case (iii) leaves the neuron untouched. This adaptive resampling lets the network explore new connections without storing any extra parameters; the only overhead is a scalar EMA per neuron and occasional resampling, which is negligible compared to overall training cost. Empirically, more than 90 % of neurons converge to a single dominant triple, confirming that the method discovers compact, meaningful connections.
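The entropy-EMA bookkeeping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the candidate count of 16, and the thresholds 0.95 / 0.25 follow the description in the summary, while everything else (method names, EMA decay) is a hypothetical choice for the sketch.

```python
import numpy as np

def entropy(w):
    """Shannon entropy of a normalized weight vector."""
    p = w / w.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

class NeuronSelector:
    """Soft selection over candidate (k, p, q) triples for one Boolean
    neuron, tracking an EMA of the weight entropy h(w). When the EMA has
    stabilized, `decide` classifies the neuron to drive resampling.
    (Sketch only; names and the EMA decay are assumptions.)"""

    def __init__(self, n_candidates=16, ema_decay=0.99):
        self.w = np.full(n_candidates, 1.0 / n_candidates)
        self.ema = entropy(self.w)  # scalar EMA: the only per-neuron overhead
        self.decay = ema_decay

    def step(self, new_w):
        """Update the (normalized) weights and the entropy EMA."""
        self.w = new_w / new_w.sum()
        self.ema = self.decay * self.ema + (1 - self.decay) * entropy(self.w)

    def decide(self, dominant=0.95, dispersed=0.25):
        """Case (i): one triple dominates -> resample the non-dominant ones.
        Case (ii): all weights are dispersed -> resample all triples.
        Case (iii): otherwise -> leave the neuron untouched."""
        if self.w.max() >= dominant:
            return "dominant"
        if self.w.max() <= dispersed:
            return "dispersed"
        return "intermediate"
```

Because the state is one scalar EMA per neuron plus the weight vector the network already stores, the resampling logic adds no learned parameters, matching the paper's claim of negligible overhead.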
Compact convolutional architecture – Prior Boolean convolutions use a binary tree of depth d, consuming 2^d − 1 Boolean operations per output pixel and creating a deep sequential dependency chain. The new design replaces the tree with a single Boolean operation kernel. Thanks to the flexible connection learning, a kernel can aggregate information from up to 32 distinct inputs within its receptive field during training, effectively capturing spatial patterns while keeping the inference cost at a single Boolean gate per output. This eliminates the exponential blow‑up and enables full parallel execution of convolutional layers.
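A toy sketch of the cost difference: a tree kernel of depth d evaluates 2^d − 1 gates per output pixel, whereas the single-gate design below applies exactly one Boolean operation to two positions inside the receptive field. The gate set, the fixed-offset parameterization, and the function name are illustrative assumptions, not the paper's actual kernel.

```python
import numpy as np

# Hypothetical gate table: the three ops shown are examples from the 16
# binary Boolean functions a neuron may select.
BOOL_OPS = {
    "and": lambda a, b: a & b,
    "or":  lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}

def single_gate_conv(x, op, offset_p, offset_q, k=3):
    """Slide a k x k window over binary map x. Each output pixel costs a
    single Boolean gate applied to two (learned) positions offset_p and
    offset_q inside the receptive field, vs. 2**d - 1 gates for a
    depth-d tree kernel. (Sketch under assumed parameterization.)"""
    h, w = x.shape
    out = np.zeros((h - k + 1, w - k + 1), dtype=x.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            a = x[i + offset_p[0], j + offset_p[1]]
            b = x[i + offset_q[0], j + offset_q[1]]
            out[i, j] = BOOL_OPS[op](a, b)
    return out
```

Since each output depends on one gate with no intermediate tree levels, all output pixels (and all such kernels) can be evaluated fully in parallel, which is the point of the tree-free design.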
Adaptive discretization – The authors observe that convolutional layers converge much faster than fully‑connected layers. They measure the proportion of neurons whose dominant (k, p, q) triple matches the final discretized network. When a layer reaches a predefined convergence threshold (e.g., 80 % for the first conv layer), that layer is discretized and its parameters are frozen, while deeper layers continue to train in the continuous domain. This progressive “freeze‑as‑you‑go” strategy reduces the mismatch between the relaxed and Boolean models, mitigating the typical post‑discretization accuracy loss.
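The freeze-as-you-go schedule can be sketched as a front-to-back scan that discretizes each layer once enough of its neurons have settled on a dominant triple. The convergence criterion (fraction of neurons whose top weight exceeds a cutoff) and all names below are assumptions for illustration; the 0.8 threshold echoes the 80% example in the text.

```python
import numpy as np

def convergence_fraction(soft_weights, cutoff=0.95):
    """Fraction of a layer's neurons whose top candidate weight already
    exceeds `cutoff`, i.e. whose discretized (k, p, q) choice is settled.
    `soft_weights` is a (neurons, candidates) matrix (assumed layout)."""
    w = np.asarray(soft_weights)
    return float((w.max(axis=1) >= cutoff).mean())

def progressive_freeze(layers, thresholds):
    """Discretize and freeze layers front-to-back once each reaches its
    convergence threshold; stop at the first unconverged layer so deeper
    layers keep training in the continuous domain. `layers` is a list of
    (name, soft_weights) pairs, shallowest first (hypothetical structure)."""
    frozen = []
    for (name, w), t in zip(layers, thresholds):
        if convergence_fraction(w) >= t:
            frozen.append(name)  # e.g. discretize + stop gradients here
        else:
            break  # deeper layers stay continuous for now
    return frozen
```

Freezing shallow layers first matches the observation that convolutional layers converge faster, and it shrinks the gap between the relaxed model and the Boolean model it must eventually become.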
Experimental evaluation – The method is evaluated on CIFAR‑10, MNIST and other standard vision benchmarks. Compared with the previous state‑of‑the‑art Boolean network (Petersen et al., 2024), the proposed approach achieves higher accuracy (≈0.5–1 % improvement on CIFAR‑10) while using dramatically fewer Boolean operations—up to 37× fewer on MNIST. Training overhead remains modest, as the extra bookkeeping (EMA, resampling) is lightweight. Ablation studies confirm that each component (connection learning, tree‑free convolution, adaptive discretization) contributes to the overall gains.
Limitations and future work – The current work focuses on image classification; extending the approach to sequence models, graph neural networks, or larger‑scale vision tasks remains open. The resampling hyper‑parameters (stability threshold, patience) may need dataset‑specific tuning, and the method’s robustness to highly imbalanced data has not been explored.
In summary, the paper delivers a coherent framework that jointly optimizes network topology, convolutional design, and the training‑to‑inference transition for Boolean networks, pushing them closer to practical deployment on edge hardware where inference efficiency is paramount.