arXiv:2512.18792v1 Announce Type: new Abstract: In a striking neuroscience study, the authors placed a dead salmon in an MRI scanner and showed it images of humans in social situations. Astonishingly, standard analyses of the time reported brain regions predictive of social emotions. The explanation, of course, was not supernatural cognition but a cautionary tale about […]
CrashChat: A Multimodal Large Language Model for Multitask Traffic Crash Video Analysis
arXiv:2512.18878v1 Announce Type: cross Abstract: Automating crash video analysis is essential to leverage the growing availability of driving video data for traffic safety research and accountability attribution in autonomous driving. Crash video analysis is a challenging multitask problem due to the complex spatiotemporal dynamics of crash events in video data and the diverse analytical requirements […]
Multimodal Bayesian Network for Robust Assessment of Casualties in Autonomous Triage
arXiv:2512.18908v1 Announce Type: new Abstract: Mass Casualty Incidents can overwhelm emergency medical systems, and resulting delays or errors in the assessment of casualties can lead to preventable deaths. We present a decision support framework that fuses outputs from multiple computer vision models, estimating signs of severe hemorrhage, respiratory distress, physical alertness, or visible trauma, into […]
MTTR-A: Measuring Cognitive Recovery Latency in Multi-Agent Systems
arXiv:2511.20663v4 Announce Type: replace-cross Abstract: Reliability in multi-agent systems (MAS) built on large language models is increasingly limited by cognitive failures rather than infrastructure faults. Existing observability tools describe failures but do not quantify how quickly distributed reasoning recovers once coherence is lost. We introduce MTTR-A (Mean Time-to-Recovery for Agentic Systems), a runtime reliability metric […]
Recontextualization Mitigates Specification Gaming without Modifying the Specification
arXiv:2512.19027v1 Announce Type: new Abstract: Developers often struggle to specify correct training labels and rewards. Perhaps they don’t need to. We propose recontextualization, which reduces how often language models “game” training signals, performing misbehaviors those signals mistakenly reinforce. We show recontextualization prevents models from learning to 1) prioritize evaluation metrics over chat response quality; 2) […]
Tool-Augmented Hybrid Ensemble Reasoning with Distillation for Bilingual Mathematical Problem Solving
arXiv:2512.19093v1 Announce Type: new Abstract: Bilingual mathematical problem solving needs a clear link between language reasoning and symbolic calculation. Large language models often handle language well but are weak in accurate computation. This paper presents HERALD (Hybrid Ensemble Reasoning with Adaptive Learning and Distillation), a framework that joins reasoning and calculation using NuminaMath-7B-TIR, GPT-4o, and […]
GateRA: Token-Aware Modulation for Parameter-Efficient Fine-Tuning
arXiv:2511.17582v3 Announce Type: replace-cross Abstract: Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, DoRA, and HiRA, enable lightweight adaptation of large pre-trained models via low-rank updates. However, existing PEFT approaches apply static, input-agnostic updates to all tokens, disregarding the varying importance and difficulty of different inputs. This uniform treatment can lead to overfitting on trivial content […]
Can We Test Consciousness Theories on AI? Ablations, Markers, and Robustness
arXiv:2512.19155v1 Announce Type: new Abstract: The search for reliable indicators of consciousness has fragmented into competing theoretical camps (Global Workspace Theory (GWT), Integrated Information Theory (IIT), and Higher-Order Theories (HOT)), each proposing distinct neural signatures. We adopt a synthetic neuro-phenomenology approach: constructing artificial agents that embody these mechanisms to test their functional consequences through precise […]
DSTED: Decoupling Temporal Stabilization and Discriminative Enhancement for Surgical Workflow Recognition
arXiv:2512.19387v1 Announce Type: cross Abstract: Purpose: Surgical workflow recognition enables context-aware assistance and skill assessment in computer-assisted interventions. Despite recent advances, current methods suffer from two critical challenges: prediction jitter across consecutive frames and poor discrimination of ambiguous phases. This paper aims to develop a stable framework by selectively propagating reliable historical information and explicitly […]
Vibe Reasoning: Eliciting Frontier AI Mathematical Capabilities — A Case Study on IMO 2025 Problem 6
arXiv:2512.19287v1 Announce Type: new Abstract: We introduce Vibe Reasoning, a human-AI collaborative paradigm for solving complex mathematical problems. Our key insight is that frontier AI models already possess the knowledge required to solve challenging problems — they simply do not know how, what, or when to apply it. Vibe Reasoning transforms AI’s latent potential into […]