arXiv:2603.19559v1 Announce Type: cross
Abstract: We study entrywise scalar quantization of two matrices prior to multiplication. Given $A\in\mathbb{R}^{m\times k}$ and $B\in\mathbb{R}^{k\times n}$, we quantize entries of $A$ and $B$ independently using scalar quantizers with $K_X$ and $K_Y$ levels per entry, and form $\widehat{C}=\widehat{A}\,\widehat{B}$. The objective is to minimize the matrix multiplication mean-squared error (MSE) $\mathbb{E}[\|AB-\widehat{A}\widehat{B}\|_F^2]$ under a pair-i.i.d. inner-product model. In the high-resolution regime $K_X,K_Y\to\infty$, we derive a sharp $K^{-2}$ asymptotic expansion for $\mathcal{E}$, identify the exact optimal leading constants, and characterize asymptotically optimal quantization center densities in terms of conditional second moments. We then specialize to correlated Gaussian multiplicative pairs, obtaining a closed-form optimal point density \[ \lambda^\star(u) \propto \exp\!\left(-\frac{u^2}{6}\right)\bigl((1-\rho^2)+\rho^2u^2\bigr)^{1/3}, \qquad u=\frac{x}{\sigma_X}, \] with the same form for $y/\sigma_Y$, and prove a correlation-driven phase transition: the density is unimodal at the origin for $|\rho|\leq 1/\sqrt{3}$ and becomes bimodal for $|\rho|>1/\sqrt{3}$ with peaks at $u_{\mathrm{peak}}=\pm\sqrt{3-1/\rho^2}$. We show our method’s applicability in synthetic experiments such as matrix multiplication quantization and least squares optimization, as well as quantization of large language model key and query activations.
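The closed-form density and its phase transition can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: it evaluates the unnormalized density from the abstract and compares the stated peak location against a brute-force grid search. The function names (`optimal_density_unnormalized`, `peak_locations`, `grid_argmax`) and the grid parameters are hypothetical, chosen only for illustration.

```python
import math


def optimal_density_unnormalized(u: float, rho: float) -> float:
    """Unnormalized point density from the abstract, with u = x / sigma_X.

    lambda*(u) is proportional to exp(-u^2 / 6) * ((1 - rho^2) + rho^2 u^2)^(1/3).
    """
    if abs(rho) > 1.0:
        raise ValueError("rho must be a correlation coefficient in [-1, 1]")
    base = (1.0 - rho * rho) + rho * rho * u * u
    return math.exp(-u * u / 6.0) * base ** (1.0 / 3.0)


def peak_locations(rho: float) -> list[float]:
    """Modes stated in the abstract.

    A single mode at 0 for |rho| <= 1/sqrt(3); otherwise two modes at
    +/- sqrt(3 - 1/rho^2).
    """
    if 3.0 * rho * rho <= 1.0:
        return [0.0]
    u_peak = math.sqrt(max(0.0, 3.0 - 1.0 / (rho * rho)))
    return [-u_peak, u_peak]


def grid_argmax(rho: float, u_max: float = 4.0, steps: int = 8001) -> float:
    """Brute-force maximizer on a uniform grid over [0, u_max].

    The density is even in u, so searching u >= 0 is enough.
    The grid size is an illustrative assumption, not taken from the paper.
    """
    best_u, best_val = 0.0, -1.0
    for i in range(steps):
        u = u_max * i / (steps - 1)
        val = optimal_density_unnormalized(u, rho)
        if val > best_val:
            best_u, best_val = u, val
    return best_u


if __name__ == "__main__":
    for rho in (0.3, 0.9):
        print(rho, peak_locations(rho), grid_argmax(rho))
```

The closed form follows from setting the derivative of $\log\lambda^\star(u)$ to zero, which gives either $u=0$ or $(1-\rho^2)+\rho^2u^2=2\rho^2$; the second solution is real only when $3\rho^2>1$, matching the threshold $|\rho|>1/\sqrt{3}$.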


