Enhancing Optimization Robustness in 1-bit Neural Networks through Stochastic Sign Descent

NianHui Guo*, Hong Guo, Christoph Meinel, Haojin Yang

Abstract


"Binary Neural Networks (BNNs) offer a promising avenue toward achieving efficient deep-learning models but are hindered by the inherent challenge of aligning noisy floating-point gradients with binary parameters. To address this, we introduce Diode, a groundbreaking optimizer designed explicitly for BNNs that bridges this gap by utilizing the gradient’s sign information in a unique, latent-weight-free approach. By focusing on the gradient sign’s lower-order moment estimate for parameter updates, Diode uniformly fine-tunes binary parameters, significantly enhancing model convergence without the dependency on 32-bit latent weights or embedding buffers. This paper showcases Diode’s superior performance through comprehensive evaluations on a variety of vision and Natural Language Processing (NLP) tasks. Remarkably, Diode advances the state-of-the-art by increasing BNext-18 Top-1 accuracy on ImageNet ILSVRC2012 by 0.96% with eightfold fewer training iterations. In the case of ReActNet, Diode not only matches but slightly exceeds previous benchmarks without resorting to complex multi-stage optimization strategies, effectively halving the training duration. Additionally, Diode proves its robust generalization capability on the binary BERT architecture within the GLUE benchmark, outperforming the existing BiT design by 3.3% without data augmentation and establishing a new SOTA accuracy of 78.8% with augmentation. The implementation of Diode is available at: https://github.com/GreenBitAI/bitorch-engine."
