arXiv:2512.19367v2 Announce Type: replace-cross
Abstract: We introduce Sprecher Networks (SNs), a family of trainable architectures derived from David Sprecher’s 1965 constructive form of the Kolmogorov-Arnold representation. Each SN block implements a “sum of shifted univariate functions” using only two shared learnable splines per block, a monotone inner spline $\phi$ and a general outer spline $\Phi$, together with a learnable shift parameter $\eta$ and a mixing vector $\lambda$ shared across all output dimensions. Stacking these blocks yields deep, compositional models; for vector-valued outputs we append an additional non-summed output block.
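As a rough illustration of the block structure, here is a minimal NumPy sketch assuming the classical Sprecher form $y_q = \Phi\big(\sum_p \lambda_p\, \phi(x_p + q\eta)\big)$ for output channels $q = 0, \dots, d_\mathrm{out}-1$. The class name, the piecewise-linear stand-ins for the learnable splines, and all hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class SprecherBlock:
    """Illustrative Sprecher-style block: two shared splines, one shift, one mixing vector."""
    def __init__(self, d_in, d_out, num_knots=32, rng=None):
        rng = rng or np.random.default_rng(0)
        self.d_in, self.d_out = d_in, d_out
        # Shared inner spline phi (kept monotone via cumulative positive steps)
        # and shared outer spline Phi, each parameterized by G = num_knots knot values.
        self.grid = np.linspace(-2.0, 2.0, num_knots)
        steps = np.abs(rng.normal(size=num_knots))
        self.phi_vals = np.cumsum(steps) / steps.sum()      # monotone, in (0, 1]
        self.Phi_vals = rng.normal(size=num_knots)
        self.eta = 0.1                                       # learnable shift
        self.lam = rng.normal(size=d_in)                     # mixing vector, shared over outputs

    def _phi(self, t):
        # Piecewise-linear interpolation stands in for the learnable monotone spline.
        return np.interp(t, self.grid, self.phi_vals)

    def _Phi(self, t):
        return np.interp(t, self.grid, self.Phi_vals)

    def forward(self, x):
        # x: (d_in,). Naive version: materialize the (d_in, d_out) shifted-input tensor.
        shifts = self.eta * np.arange(self.d_out)            # (d_out,)
        shifted = x[:, None] + shifts[None, :]               # (d_in, d_out)
        inner = self._phi(shifted)                           # shared phi applied elementwise
        mixed = self.lam @ inner                             # (d_out,) sum over input dims
        return self._Phi(mixed)                              # shared outer spline
```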
We also propose an optional lateral mixing operator enabling intra-block communication between output channels with only $O(d_\mathrm{out})$ additional parameters. Owing to the vector (not matrix) mixing weights and spline sharing, SNs scale linearly in width, with approximately $O(\sum_\ell (d_{\ell-1} + d_\ell + G))$ parameters for $G$ spline knots, versus $O(\sum_\ell d_{\ell-1} d_\ell)$ for dense MLPs and $O(G \sum_\ell d_{\ell-1} d_\ell)$ for edge-spline KANs. This linear width-scaling is particularly attractive for extremely wide, shallow models, where low depth can translate into low inference latency. Finally, we describe a sequential forward implementation that avoids materializing the $d_\mathrm{in} \times d_\mathrm{out}$ shifted-input tensor, reducing peak forward-intermediate memory from quadratic to linear in layer width, which is relevant for memory-constrained settings such as on-device/edge inference; we demonstrate deployability via fixed-point real-time digit classification on a resource-constrained embedded device with only 4 MB of RAM. We provide empirical demonstrations on supervised regression, Fashion-MNIST classification (including stable training at 25 hidden layers with residual connections and normalization), and a Poisson PINN, with controlled comparisons to MLP and KAN baselines.
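To make the memory claim concrete, the following hedged sketch of a sequential forward pass reuses the illustrative SprecherBlock above: looping over output channels keeps the peak intermediate at a single length-$d_\mathrm{in}$ vector instead of the full $d_\mathrm{in} \times d_\mathrm{out}$ shifted-input tensor. The function name and looping strategy are assumptions consistent with the abstract, not the authors' code.

```python
import numpy as np  # uses the SprecherBlock sketch above

def forward_sequential(block, x):
    """Compute the block output one channel at a time, with O(d_in) peak intermediates."""
    y = np.empty(block.d_out)
    for q in range(block.d_out):
        shifted_q = x + block.eta * q                        # (d_in,) -- one column at a time
        y[q] = block._Phi(block.lam @ block._phi(shifted_q)) # same math as the naive forward
    return y

# The sequential pass matches the naive forward while never allocating a (d_in, d_out) array.
blk = SprecherBlock(d_in=256, d_out=256)
x = np.zeros(256)
assert np.allclose(blk.forward(x), forward_sequential(blk, x))
```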



