arXiv:2601.09566v4 Announce Type: replace-cross
Abstract: In this work, we study whether rendering Chinese characters as visual glyph images, rather than discrete token IDs as mainstream LLMs do, providing an inductive bias for character-level language modeling. Our central finding gives a double-edged insight: visual inputs produce a pronounced hot-start effect, more than doubling early-stage accuracy within the first epoch (at 0.4% of total training steps) (12.3% visual inputs vs. 5.8% index-based baseline), yet both approaches converge to essentially identical final accuracy (39%). This pattern holds across resolutions as low as 8×8 pixels, partial cropping up to 50%, and model scales from 110M to 1.78B parameters. The mechanism we identify is that glyph rendering pre-encodes radical-based structure into embedding space before any training (cosine similarity 0.27 vs. 0.002 for random embeddings), enabling faster alignment but not higher final capacity. Our results clarify both the promise and fundamental limitation of visual representations as inductive biases for Chinese language modeling.
Crisis support teams’ technological openness and learning attitudes toward the AI based virtual patient system crisis support VR
BackgroundAgainst the backdrop of escalating global humanitarian crises, innovative didactic simulations are becoming increasingly important. A promising alternative to traditional classroom-based didactics for learning psychological