6693. K-Quant: a non-uniform post-training quantization algorithm
Authors (first author is the speaker)
|Department of Information Engineering, Università degli Studi di Firenze|
Quantization is a simple yet effective way to deploy deep neural networks on resource-limited hardware. Post-training quantization algorithms are particularly interesting because they do not require the full training dataset to run. In this work we explore a way to perform non-uniform post-training quantization, using an optimization algorithm to minimize the output differences between each compressed layer and the original one. The proposed method significantly reduces the memory required by the neural network without affecting its accuracy.
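The abstract does not specify the optimization procedure, but the idea it describes, a non-uniform codebook per layer fitted so that the quantized layer's output matches the original layer's output, can be sketched as follows. This is a minimal illustration under assumed details (Lloyd-style assignment plus a least-squares refit of the codebook on calibration inputs), not the paper's actual algorithm; `kquant_layer`, the layer sizes, and the calibration data are all hypothetical.

```python
import numpy as np

def kquant_layer(W, X, n_levels=16, iters=5):
    """Sketch of non-uniform post-training quantization for one linear
    layer: learn a shared codebook of levels and refit it so that the
    quantized output X @ Wq.T stays close to the original X @ W.T.
    Hypothetical illustration, not the paper's exact method."""
    # Start from quantiles of the weights: a non-uniform initial grid.
    levels = np.quantile(W, np.linspace(0.0, 1.0, n_levels))
    for _ in range(iters):
        # Assignment step: snap every weight to its nearest level.
        idx = np.abs(W[..., None] - levels).argmin(axis=-1)
        # Refit step: the layer output is linear in the level values, so
        # the levels minimizing ||X @ Wq.T - X @ W.T||^2 solve a
        # least-squares problem with one column per codebook level.
        cols = [(X @ (idx == k).astype(W.dtype).T).ravel()
                for k in range(n_levels)]
        A = np.stack(cols, axis=1)
        b = (X @ W.T).ravel()
        levels, *_ = np.linalg.lstsq(A, b, rcond=None)
    return levels[idx], levels

# Toy calibration run: one 128->64 linear layer, 256 calibration inputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))
X = rng.normal(size=(256, 128))
Wq, levels = kquant_layer(W, X, n_levels=16)
rel_err = np.linalg.norm(X @ Wq.T - X @ W.T) / np.linalg.norm(X @ W.T)
print(f"{np.unique(Wq).size} unique weight values, "
      f"relative output error {rel_err:.3f}")
```

With 16 shared levels (4 bits per weight instead of 32), the quantized layer reproduces the original outputs closely on the calibration batch, which is the kind of layer-wise output-matching objective the abstract describes.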
- Artificial intelligence based optimization methods and applications
- Linear and nonlinear optimization