arXiv:2511.11041v2 Announce Type: replace-cross
Abstract: We find that current sentence-embedding models produce outputs with a consistent bias: every embedding $e$ decomposes as $\tilde e + \mu$, where the mean $\mu$ is near-identical across all sentences. We study two training-free corrections, subtracting $\mu$ directly (R1) or projecting each embedding off the mean direction (R2), and show, via a first-order error-propagation argument, that R2 cancels the parallel component of mean-estimation error that R1 retains. Across 38 models on the Massive Multilingual Text Embedding Benchmark (MMTEB)~\citep{MMTEB}, R2 yields consistent classification gains (paired $\bar t = 3.31$, 29 of 38 models with $t>2$, zero losses), and the per-model mean norm $\Vert\mu\Vert$ correlates with which models benefit most. A nine-method dose-response ablation on five models further reveals that mild single-direction removal helps, but full principal component analysis (PCA) whitening hurts every model we test, and that R2 and All-but-the-Top with depth one agree within $0.18$ pp downstream despite weak geometric alignment between $\hat\mu$ and the centered top principal component.
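The two corrections named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give exact formulas, so the sketch assumes R1 is $e - \hat\mu$ and R2 is $e - (e \cdot u)\,u$ with $u = \hat\mu / \Vert\hat\mu\Vert$. The function names and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np


def estimate_mean(E):
    """Estimate the shared mean vector from a matrix of embeddings (rows)."""
    return E.mean(axis=0)


def r1_subtract_mean(E, mu_hat):
    """R1 (assumed form): subtract the estimated mean from every embedding."""
    return E - mu_hat


def r2_project_off_mean(E, mu_hat):
    """R2 (assumed form): remove each embedding's component along mu_hat."""
    u = mu_hat / np.linalg.norm(mu_hat)  # unit vector in the mean direction
    return E - np.outer(E @ u, u)        # e - (e . u) u, applied row by row


# Synthetic embeddings: zero-mean noise plus a shared offset (illustrative only).
rng = np.random.default_rng(0)
mu = np.full(8, 2.0)
E = rng.normal(size=(100, 8)) + mu

mu_hat = estimate_mean(E)
E_r1 = r1_subtract_mean(E, mu_hat)
E_r2 = r2_project_off_mean(E, mu_hat)

# An estimation error parallel to mu_hat only rescales it. The direction u is
# unchanged, so R2's output is unchanged, while R1's output shifts.
mu_hat_scaled = 1.1 * mu_hat
r1_changed = not np.allclose(r1_subtract_mean(E, mu_hat_scaled), E_r1)
r2_unchanged = np.allclose(r2_project_off_mean(E, mu_hat_scaled), E_r2)
```

Under these assumptions, rescaling $\hat\mu$ leaves the R2 output untouched because only the direction of $\hat\mu$ enters the projection. This is one simple way to see why an error component parallel to the mean would not affect R2 but would carry through R1.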
Crisis support teams’ technological openness and learning attitudes toward the AI-based virtual patient system crisis support VR
Background: Against the backdrop of escalating global humanitarian crises, innovative didactic simulations are becoming increasingly important. A promising alternative to traditional classroom-based didactics for learning psychological