arXiv:2605.10181v2 Announce Type: replace-cross Abstract: Out-of-distribution (OOD) detection is essential for building reliable AI systems, as models that produce outputs for invalid inputs cannot be trusted. Although deep learning (DL) is often assumed to outperform traditional machine learning (ML), medical imaging data are typically acquired under standardized protocols, leading to relatively constrained image variability in […]
Under Pressure: Emotional Framing Induces Measurable Behavioral Shifts and Structured Internal Geometry in Small Language Models
arXiv:2605.20202v1 Announce Type: cross Abstract: I study whether emotionally framed evaluation follow-ups change both the behavior and the calm-relative internal representations of small, locally deployed language models. Our main benchmark uses Qwen 3.5 0.8B on four impossible-constraint coding tasks and eight follow-up framings: calm, pressure, urgency, approval, shame, curiosity, encouragement, and threat. In the 0.8B […]
From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach
arXiv:2605.21303v1 Announce Type: cross Abstract: Mechanistic interpretability produces circuit-level causal analyses of neural network behaviour, but discovered circuits often remain isolated experimental artefacts: there is no shared formal representation for what circuits compute, how they relate, or when two findings provide evidence for the same mechanism. This work provides a formal infrastructure for cumulative mechanistic […]
RealUserSim: Bridging the Reality Gap in Agent Benchmarking via Grounded User Simulation
arXiv:2605.20204v1 Announce Type: cross Abstract: LLM-based user simulation is the primary mechanism for end-to-end agent evaluation, yet simulated users are poor proxies for real humans: unconstrained LLM defaults produce a Formalism Ceiling (style match rates of 6-8% against real users), while hand-crafted behavioral directives trigger Directive Amplification, where models hyper-interpret instructions into unnatural behavioral extremes […]
HiRes: Inspectable Precedent Memory for Reaction Condition Recommendation
arXiv:2605.21420v1 Announce Type: cross Abstract: Reaction condition recommendation sits immediately after retrosynthetic disconnection selection, and in practice, chemists require both accurate predictions and the precedents that justify them. We present HiRes (Hierarchical Reaction Representations), a retrieval-augmented condition recommendation system whose learned reaction space serves as both a classifier feature and an inspectable precedent memory. The […]
Artificial Pancreas Implantables — How Healthcare Professionals May Deal With DIY Bio Cases
arXiv:2605.20208v1 Announce Type: cross Abstract: Automated insulin delivery (AID) and artificial pancreas systems increasingly serve as safety-critical cyber-physical technologies in clinical care, integrating sensors, algorithms, software, and insulin-delivery hardware to automate a life-sustaining therapy. While regulated commercial systems are supported by formal approval pathways, manufacturer governance, and post-market surveillance, clinicians are also encountering patients who […]
Task-Agnostic Noisy Label Detection via Standardized Loss Aggregation
arXiv:2605.10165v2 Announce Type: replace-cross Abstract: Noisy labels are common in large-scale medical imaging datasets due to inter-observer variability and ambiguous cases. We propose a statistically grounded and task-agnostic framework, Standardized Loss Aggregation (SLA), for detecting noisy labels at the sample level. SLA quantifies label reliability by aggregating standardized fold-level validation losses across repeated cross-validation runs. […]
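As a rough illustration of the aggregation idea in this abstract, the sketch below z-scores each sample's validation loss within its held-out fold and averages those values across repeated cross-validation runs. The abstract is truncated, so the details are assumptions rather than the authors' method: the function name `sla_scores`, the input layout, the use of a per-fold z-score as the standardization, and the plain mean as the aggregate are all hypothetical.

```python
import numpy as np

def sla_scores(losses, fold_ids):
    """Aggregate standardized validation losses per sample (illustrative only).

    losses:   (n_repeats, n_samples) validation loss of each sample in each
              repeat, taken from the fold in which the sample was held out.
    fold_ids: (n_repeats, n_samples) integer index of that held-out fold.
    Returns one score per sample; higher suggests a less reliable label.
    """
    losses = np.asarray(losses, dtype=float)
    fold_ids = np.asarray(fold_ids)
    z = np.zeros_like(losses)
    for r in range(losses.shape[0]):
        for f in np.unique(fold_ids[r]):
            mask = fold_ids[r] == f
            mu = losses[r, mask].mean()
            sigma = losses[r, mask].std()
            # Assumed standardization: z-score within the fold.
            z[r, mask] = (losses[r, mask] - mu) / sigma if sigma > 0 else 0.0
    # Assumed aggregation: mean of the standardized losses over repeats.
    return z.mean(axis=0)

# Toy data: sample 0 has a consistently high loss in every repeat.
losses = [[5.0, 1.0, 1.2, 1.0, 1.1, 0.9]] * 3
fold_ids = [[0, 0, 0, 1, 1, 1],
            [0, 1, 0, 1, 0, 1],
            [0, 0, 1, 1, 0, 1]]
scores = sla_scores(losses, fold_ids)
suspect = int(np.argmax(scores))  # index of the most suspicious sample
```

Standardizing within each fold keeps a fold that happens to be harder overall from inflating the scores of all its samples, which is one plausible reason to standardize before aggregating.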
Leveraging Vision-Language Models to Detect Attention in Educational Videos
arXiv:2605.20211v1 Announce Type: cross Abstract: Educational videos are a cornerstone of remote and blended learning. However, learners’ fluctuating attention remains a significant barrier to effective information retention. Prior research has attempted to mitigate this by detecting and reacting to attention loss at runtime using eye tracking. Such detection has been based so far on classical […]
SAM-Sode: Towards Faithful Explanations for Tiny Bacteria Detection
arXiv:2605.21186v1 Announce Type: cross Abstract: Interpretability in object detection provides crucial confidence support for clinical auxiliary diagnosis. However, in tiny bacteria detection, traditional explanation methods often suffer from blurred foreground boundaries and diffuse feature attribution due to the extreme sparsity of target morphological features and severe interference from complex backgrounds. Such limitations hinder the provision […]
AI-Assisted Competency Assessment from Egocentric Video in Simulation-Based Nursing Education
arXiv:2605.20233v1 Announce Type: cross Abstract: Assessing learner competency in clinical simulation requires expert observation that is time-intensive, difficult to scale, and subject to inter-rater variability. Vision-language models have emerged as a promising tool for understanding complex visual behavior. In this work, we investigate whether visual observations can provide educationally meaningful signals for competency assessment through […]
STM3: Mixture of Multiscale Mamba for Long-Term Spatio-Temporal Time-Series Prediction
arXiv:2508.12247v2 Announce Type: replace-cross Abstract: Recently, spatio-temporal time-series prediction has developed rapidly, yet existing deep learning methods struggle with learning complex long-term spatio-temporal dependencies efficiently. The long-term spatio-temporal dependency learning brings two new challenges: 1) The long-term temporal sequence naturally includes multiscale information, which is hard to extract efficiently; 2) The multiscale temporal information from […]
Provably Learning Diffusion Models under the Manifold Hypothesis: Collapse and Refine
arXiv:2605.20235v1 Announce Type: cross Abstract: Diffusion models generate high-dimensional data with remarkable quality, yet how their training efficiently learns the score function, bypassing the curse of dimensionality when data is supported on low-dimensional manifolds, remains theoretically unexplained. We identify a collapse-and-refine mechanism driven by the geometry of the score function itself: at small noise scales, […]