arXiv:2603.06570v1 Announce Type: cross Abstract: Surgeons don’t just see — they interpret. When an expert observes a surgical scene, they understand not only what instrument is being used, but why it was chosen, what risk it poses, and what comes next. Current surgical AI cannot answer such questions, largely because training data that explicitly encodes […]
Make VLM Recognize Visual Hallucination on Cartoon Character Image with Pose Information
arXiv:2403.15048v4 Announce Type: replace-cross Abstract: Leveraging large-scale Text-to-Image (TTI) models has become a common technique for generating exemplar or training datasets in the fields of image synthesis, video editing, and 3D reconstruction. However, semantic structural visual hallucinations involving perceptually severe defects remain a concern, especially in the domain of non-photorealistic rendering (NPR) such as cartoons and […]
COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics
arXiv:2603.06495v1 Announce Type: cross Abstract: Activation steering methods enable inference-time control of large language model (LLM) behavior without retraining, but current approaches face a fundamental trade-off: sample-efficient methods suboptimally capture steering signals from labeled examples, while methods that better extract these signals require hundreds to thousands of examples. We introduce COLD-Steer, a training-free framework that […]
Whisper-RIR-Mega: A Paired Clean-Reverberant Speech Benchmark for ASR Robustness to Room Acoustics
arXiv:2603.02252v2 Announce Type: replace-cross Abstract: We introduce Whisper-RIR-Mega, a benchmark dataset of paired clean and reverberant speech for evaluating automatic speech recognition (ASR) robustness to room acoustics. Each sample pairs a clean LibriSpeech utterance with the same utterance convolved with a real room impulse response from the RIR-Mega corpus, with stratified splits by reverberation time […]
FourierSpecNet: Neural Collision Operator Approximation Inspired by the Fourier Spectral Method for Solving the Boltzmann Equation
arXiv:2504.20408v2 Announce Type: replace-cross Abstract: The Boltzmann equation, a fundamental model in kinetic theory, describes the evolution of particle distribution functions through a nonlinear, high-dimensional collision operator. However, its numerical solution remains computationally demanding, particularly for inelastic collisions and high-dimensional velocity domains. In this work, we propose the Fourier Neural Spectral Network (FourierSpecNet), a hybrid […]
What Topological and Geometric Structure Do Biological Foundation Models Learn? Evidence from 141 Hypotheses
arXiv:2602.22289v2 Announce Type: replace Abstract: When biological foundation models such as scGPT and Geneformer process single-cell gene expression, what geometric and topological structure forms in their internal representations? Is that structure biologically meaningful or a training artifact, and how confident should we be in such claims? We address these questions through autonomous large-scale hypothesis screening: […]
FragFM: Hierarchical Framework for Efficient Molecule Generation via Fragment-Level Discrete Flow Matching
arXiv:2502.15805v4 Announce Type: replace-cross Abstract: We introduce FragFM, a novel hierarchical framework via fragment-level discrete flow matching for efficient molecular graph generation. FragFM generates molecules at the fragment level, leveraging a coarse-to-fine autoencoder to reconstruct details at the atom level. Together with a stochastic fragment bag strategy to effectively handle a large fragment space, our […]
Do Foundation Models Know Geometry? Probing Frozen Features for Continuous Physical Measurement
arXiv:2603.06459v1 Announce Type: cross Abstract: Vision-language models encode continuous geometry that their text pathway fails to express: a 6,000-parameter linear probe extracts hand joint angles at 6.1 degrees MAE from frozen features, while the best text output achieves only 20.0 degrees — a 3.3x bottleneck. LoRA fine-tuning (r=16, 2,000 images) narrows this gap to 6.5 […]
Immersive competence as a source of bias in virtual reality clinical assessment
npj Digital Medicine, Published online: 09 March 2026; doi:10.1038/s41746-026-02482-z
Personalised health plan development using agentic AI in Singapore’s national preventive care programme: a pilot study
npj Digital Medicine, Published online: 09 March 2026; doi:10.1038/s41746-026-02514-8