Quantization-Friendly Winograd Transformations for Convolutional Neural Networks

Vladimir Protsenko*, Vladimir Kryzhanovskiy, Alexander Filippov

Abstract


"Efficient deployment of modern deep convolutional neural networks on resource-constrained devices suffers from demanding computational requirements of convolution operations. Quantization and use of Winograd convolutions operating on sufficiently large-tile inputs are two powerful strategies to speed up convolution operations. However, their combination results in numerical instability, which manifests itself in a strong quality performance degradation. We present an efficient learning scenario that either completely overcomes or strongly reduces the accuracy degradation of full 8-bit quantized F (4, 3) and F (6, 3) Winograd convolutions. Within the global particle swarm optimization (PSO), we derived a set of quantization-friendly Winograd transformations. Following the state-of-the-art (SOTA) training pipeline [J. Fernandez-Marques , Proc. Mach. Learn. Syst. 2, 14-29 (2020)], we treat Winograd transformations as learnable parameters during network training. Evolving transformations starting from our PSO-derived ones rather than the standard Winograd transformations results in significant numerical error reduction and accuracy improvement. As a consequence, our approach significantly outperforms SOTA methods on various tasks. Efficient deployment of modern deep convolutional neural networks on resource-constrained devices suffers from demanding computational requirements of convolution operations. Quantization and use of Winograd convolutions operating on sufficiently large-tile inputs are two powerful strategies to speedup convolution operations. However, their combination results in numerical instability, which manifests itself in a strong quality performance degradation. To solve this issue, we derived a set of quantization-friendly Winograd transformations (matrices A, B and G). Following SOTA training pipeline, we treat Winograd transformations as learnable parameters during network training. Initialization from our transformation matrices instead of the standard ones for quantization-aware training results in a significant numerical error reduction, a training stabilizing and accuracy improvement. As a consequence, our approach significantly outperforms SOTA methods on various tasks. Especially, we made the quantized F(4, 3) Winograd algorith ready for industrial use (?). The code will be publicly available at https://github.com/add. Efficient deployment of modern deep convolutional neural networks (CNNs) on resource-constrained devices (e.g., mobile devices) suffers from the demanding computational requirements of convolution operations. Quantization and use of Winograd convolutions are two powerful strategies for accelerating convolution operations. While quantization reduces computational intensity by mapping network parameters into their low-precision representations, Winograd convolutions achieve speedup by using a computationally efficient algorithm to perform convolution operations. Theoretically, quantization and Winograd convolutions are independent paths to optimize CNNs and their advantages can be joined. However, quantization of the Winograd convolution results in their numerical instability, which manifests itself in a strong quality performance degradation. Especially this challenge is severe for the most promising Winograd algorithms F (4, 3) and F (3, 6) allowing to operate on sufficiently large tile inputs, which significantly reduces the computational complexity of convolution operations. 
In this paper, we present an efficient learning scenario that either completely overcomes or strongly reduces the accuracy degradation of fully 8-bit quantized F(4, 3) and F(6, 3) Winograd convolutions. To this end, prior to network training, we derive a set of model- and data-free quantization-friendly Winograd transformation matrices (A, B, and G) via global particle swarm optimization (PSO). We demonstrate that Winograd convolutions built on our PSO-derived transformation matrices benefit significantly, in terms of both numerical error reduction and accuracy, compared to those using the standard Winograd matrices. We then integrate our transformations into the state-of-the-art (SOTA) training pipeline of Fernandez-Marques et al. [Proc. Mach. Learn. Syst. 2, 14-29 (2020)], which treats Winograd transformations as learnable parameters during network training. We show that letting the Winograd transformations evolve starting from our PSO-derived matrices, rather than from the standard Winograd matrices as in the original pipeline, stabilizes training and yields further substantial performance improvements. As a consequence, our approach significantly outperforms SOTA methods in accuracy on various tasks, including classification, super-resolution, and semantic segmentation, while retaining the same inference speedup. The code will be publicly available at https://github.com/add.
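The sketch below illustrates, under simplifying assumptions, what treating the Winograd transformation matrices as learnable parameters can look like in PyTorch; the class name, the 1D F(2, 3) instance, and the omission of quantization and of the PSO-derived initialization are illustrative choices, not the paper's actual implementation.

```python
# Hedged sketch: Winograd transformation matrices as learnable parameters.
# The F(2, 3) instance and all names are illustrative; the paper's setup uses
# F(4, 3)/F(6, 3) tiles, 8-bit quantization, and PSO-derived initial matrices.
import torch
import torch.nn as nn

class LearnableWinograd1d(nn.Module):
    def __init__(self, B_T_init, G_init, A_T_init, kernel_init):
        super().__init__()
        # The transforms are ordinary parameters and receive gradients, so
        # they can evolve away from their (standard or PSO-derived) init.
        self.B_T = nn.Parameter(torch.as_tensor(B_T_init, dtype=torch.float32))
        self.G = nn.Parameter(torch.as_tensor(G_init, dtype=torch.float32))
        self.A_T = nn.Parameter(torch.as_tensor(A_T_init, dtype=torch.float32))
        self.g = nn.Parameter(torch.as_tensor(kernel_init, dtype=torch.float32))

    def forward(self, d):  # d: one input tile, shape (tile_size,)
        U = self.G @ self.g        # filter transform
        V = self.B_T @ d           # input transform
        return self.A_T @ (U * V)  # element-wise product + output transform

B_T = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0.0, 0.0, 1.0]]
A_T = [[1, 1, 1, 0], [0, 1, -1, -1]]

layer = LearnableWinograd1d(B_T, G, A_T, kernel_init=[0.5, -1.0, 2.0])
y = layer(torch.tensor([1.0, 2.0, 3.0, 4.0]))
y.sum().backward()  # gradients now flow into B_T, G, and A_T as well
print(y, layer.G.grad is not None)
```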

Related Material


[pdf] [supplementary material] [DOI]