arXiv:2603.19559v1 Announce Type: cross
Abstract: We study entrywise scalar quantization of two matrices prior to multiplication. Given $A\in\mathbb{R}^{m\times k}$ and $B\in\mathbb{R}^{k\times n}$, we quantize entries of $A$ and $B$ independently using scalar quantizers with $K_X$ and $K_Y$ levels per entry, and form $\widehat C=\widehat A\,\widehat B$. The objective is to minimize the matrix multiplication mean-squared error (MSE) $\mathbb{E}[\|AB-\widehat A\widehat B\|_F^2]$ under a pair-i.i.d. inner-product model. In the high-resolution regime $K_X,K_Y\to\infty$, we derive a sharp $K^{-2}$ asymptotic expansion for $\mathcal{E}$, identify the exact optimal leading constants, and characterize asymptotically optimal quantization center densities in terms of conditional second moments. We then specialize to correlated Gaussian multiplicative pairs, obtaining a closed-form optimal point density \[ \lambda^\star(u) \propto \exp\!\left(-\frac{u^2}{6}\right)\bigl((1-\rho^2)+\rho^2u^2\bigr)^{1/3}, \qquad u=\frac{x}{\sigma_X}, \] with the same form for $y/\sigma_Y$, and prove a correlation-driven phase transition: the density is unimodal at the origin for $|\rho|\leq 1/\sqrt{3}$ and becomes bimodal for $|\rho|>1/\sqrt{3}$ with peaks at $u_{\mathrm{peak}}=\pm\sqrt{3-1/\rho^2}$. We show our method’s applicability in synthetic experiments such as matrix multiplication quantization and least squares optimization, as well as quantization of large language model key and query activations.
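
The phase transition stated in the abstract can be checked numerically. The sketch below is illustrative only and is not the authors' code: the function names `point_density`, `predicted_peaks`, and `numerical_peak` are hypothetical, and the grid range and resolution are arbitrary choices. It evaluates the unnormalized density $\lambda^\star(u)$ on a grid and compares its maximizer with the closed-form peak location.

```python
import numpy as np


def point_density(u, rho):
    """Unnormalized lambda*(u) as given in the abstract, with u = x / sigma_X."""
    u = np.asarray(u, dtype=float)
    return np.exp(-u**2 / 6.0) * ((1.0 - rho**2) + rho**2 * u**2) ** (1.0 / 3.0)


def predicted_peaks(rho):
    """Peak locations implied by the stated phase transition at |rho| = 1/sqrt(3)."""
    if abs(rho) <= 1.0 / np.sqrt(3.0):
        return (0.0,)  # unimodal regime: single peak at the origin
    u_peak = float(np.sqrt(3.0 - 1.0 / rho**2))
    return (-u_peak, u_peak)  # bimodal regime: symmetric pair of peaks


def numerical_peak(rho, grid=None):
    """Grid argmax over u >= 0 (the density is even in u, so this suffices)."""
    if grid is None:
        grid = np.linspace(0.0, 6.0, 60001)  # assumed range and step of 1e-4
    return float(grid[np.argmax(point_density(grid, rho))])


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        print(f"rho={rho}: predicted={predicted_peaks(rho)}, "
              f"numerical={numerical_peak(rho):.4f}")
```

The closed-form peak follows from setting the derivative of $\log\lambda^\star(u)$ to zero, which gives either $u=0$ or $(1-\rho^2)+\rho^2u^2=2\rho^2$, that is, $u^2=3-1/\rho^2$. Only the second solution is real and nonzero when $|\rho|>1/\sqrt{3}$.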
