arXiv:2603.11369v1 Announce Type: cross Abstract: Antimicrobial resistance (AMR) poses a global health threat, reducing the effectiveness of antibiotics and complicating clinical decision-making. To address this challenge, we introduce abx_amr_simulator, a Python-based simulation package designed to model antibiotic prescribing and AMR dynamics within a controlled, reinforcement learning (RL)-compatible environment. The simulator allows users to specify patient […]
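The abstract describes an RL-compatible prescribing environment but is truncated before API details. A minimal, hypothetical gym-style sketch of such an environment (class name, drug names, and dynamics are illustrative assumptions, not the actual abx_amr_simulator API):

```python
import random

class AMREnv:
    """Hypothetical sketch: an agent picks an antibiotic each step,
    and resistance prevalence drifts under selection pressure.
    The real abx_amr_simulator interface may differ."""

    ANTIBIOTICS = ["drug_a", "drug_b", "no_treatment"]

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.resistance = {"drug_a": 0.1, "drug_b": 0.1}

    def reset(self):
        self.resistance = {"drug_a": 0.1, "drug_b": 0.1}
        return dict(self.resistance)  # observation

    def step(self, action):
        # Prescribing a drug nudges its resistance level upward;
        # withholding it lets resistance decay slightly.
        for drug in self.resistance:
            drift = 0.02 if drug == action else -0.005
            self.resistance[drug] = min(1.0, max(0.0, self.resistance[drug] + drift))
        # Treatment succeeds with probability 1 - resistance.
        if action == "no_treatment":
            reward = 0.0
        else:
            reward = 1.0 if self.rng.random() > self.resistance[action] else -1.0
        return dict(self.resistance), reward, False, {}

env = AMREnv(seed=1)
env.reset()
obs, reward, done, info = env.step("drug_a")
```

The gym-style `reset`/`step` loop is what makes such a simulator directly usable with standard RL training code.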
Realizing Common Random Numbers: Event-Keyed Hashing for Causally Valid Stochastic Models
arXiv:2603.11084v1 Announce Type: cross Abstract: Agent-based models (ABMs) are widely used to estimate causal treatment effects via paired counterfactual simulation. A standard variance reduction technique is common random numbers (CRNs), which couples replicates across intervention scenarios by sharing the same random inputs. In practice, CRNs are implemented by reusing the same base seed, but this […]
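The core idea of event-keyed hashing, as the title suggests, can be sketched as deriving each random draw from a hash of the base seed plus a stable event identity, so paired scenarios receive identical inputs for the same event regardless of execution order. A minimal illustration (the key format and function name are assumptions, not the paper's implementation):

```python
import hashlib
import struct

def event_uniform(base_seed: int, event_key: str) -> float:
    """Derive a uniform [0, 1) draw from a hash of the base seed and
    a stable event identity, so the same simulated event gets the
    same random input in every counterfactual scenario."""
    digest = hashlib.sha256(f"{base_seed}:{event_key}".encode()).digest()
    # Interpret the first 8 bytes as an unsigned 64-bit integer, then scale.
    (value,) = struct.unpack(">Q", digest[:8])
    return value / 2**64

# The same (seed, event) pair yields identical draws in both arms of a
# paired counterfactual run, even if the scenarios diverge elsewhere.
u_control = event_uniform(42, "patient-17/infection-check/day-3")
u_treated = event_uniform(42, "patient-17/infection-check/day-3")
```

Unlike seed reuse with a shared sequential stream, this coupling survives when an intervention changes how many random draws occur before a given event.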
Attention Gathers, MLPs Compose: A Causal Analysis of an Action-Outcome Circuit in VideoViT
arXiv:2603.11142v1 Announce Type: cross Abstract: The paper explores how video models trained for classification tasks represent nuanced, hidden semantic information that may not affect the final outcome, a key challenge for Trustworthy AI models. Through Explainable and Interpretable AI methods, specifically mechanistic interpretability techniques, the internal circuit responsible for representing the action’s outcome is reverse-engineered […]
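A standard mechanistic-interpretability tool for this kind of causal circuit analysis is activation patching: cache an activation from a clean run and splice it into a corrupted run to test which component carries the information. A toy feed-forward illustration of the technique in general (not the paper's specific VideoViT setup):

```python
import numpy as np

def run_with_patch(layers, x, patch_layer=None, patch_value=None):
    """Run a toy layer stack, optionally overwriting one layer's
    activation with a value cached from another input
    (activation patching)."""
    acts = []
    for i, w in enumerate(layers):
        x = np.tanh(x @ w)
        if i == patch_layer:
            x = patch_value
        acts.append(x)
    return x, acts

rng = np.random.default_rng(0)
layers = [rng.standard_normal((4, 4)) * 0.5 for _ in range(3)]

clean_out, clean_acts = run_with_patch(layers, rng.standard_normal(4))
corrupt_in = rng.standard_normal(4)
# Splicing the clean layer-1 activation into the corrupted run restores
# the clean output exactly, since everything downstream is determined
# by that activation: evidence the patched site carries the signal.
patched_out, _ = run_with_patch(layers, corrupt_in,
                                patch_layer=1, patch_value=clean_acts[1])
```

Measuring how much of the clean behavior each patch restores is what localizes a circuit to specific attention heads or MLP blocks.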
Locating Demographic Bias at the Attention-Head Level in CLIP’s Vision Encoder
arXiv:2603.11793v1 Announce Type: cross Abstract: Standard fairness audits of foundation models establish that a model is biased, but not where within the network the bias resides. We propose a mechanistic fairness audit that combines projected residual-stream decomposition, zero-shot Concept Activation Vectors, and bias-augmented TextSpan analysis to locate demographic bias at the level of individual attention […]
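A Concept Activation Vector, in its simplest mean-difference form, is a direction in activation space pointing from neutral examples toward concept examples; projecting a component's output onto it scores concept alignment. A generic sketch of the idea (synthetic activations; not the paper's zero-shot CLIP-specific construction):

```python
import numpy as np

def concept_activation_vector(concept_acts, neutral_acts):
    """Mean-difference CAV: unit direction in activation space from
    neutral examples toward concept examples."""
    v = concept_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
concept_axis = np.array([1.0, 0.0, 0.0, 0.0])
# Simulated activations: concept examples are shifted along one axis.
concept_acts = rng.standard_normal((50, 4)) + 3.0 * concept_axis
neutral_acts = rng.standard_normal((50, 4))

cav = concept_activation_vector(concept_acts, neutral_acts)
# Projection onto the CAV scores how strongly an activation (e.g. an
# attention head's output) expresses the concept.
concept_scores = concept_acts @ cav
neutral_scores = neutral_acts @ cav
```

Ranking attention heads by such projection scores is one way to localize where a demographic concept is represented inside an encoder.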
Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling
arXiv:2603.11971v1 Announce Type: cross Abstract: Emotion recognition in in-the-wild video data remains a challenging problem due to large variations in facial appearance, head pose, illumination, background noise, and the inherently dynamic nature of human affect. Relying on a single modality, such as facial expressions or speech, is often insufficient to capture these complex emotional cues. […]
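Bi-directional cross-attention, in general, lets each modality query the other: visual features attend over audio features and vice versa, and the two attended streams are then fused. A minimal single-head numpy sketch of that pattern (dimensions and fusion choice are illustrative, not the paper's architecture):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention where `queries` from
    one modality attend over another modality's features (which serve
    as both keys and values here)."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ keys_values

rng = np.random.default_rng(0)
visual = rng.standard_normal((10, 16))  # 10 video frames, 16-dim features
audio = rng.standard_normal((20, 16))   # 20 audio frames, 16-dim features

# Bi-directional coupling: each stream is enriched by the other.
audio_informed_visual = cross_attention(visual, audio)   # shape (10, 16)
visual_informed_audio = cross_attention(audio, visual)   # shape (20, 16)
```

A temporal model (e.g. a recurrent or transformer layer) would then run over these fused sequences to track how affect evolves across frames.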
Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights
arXiv:2603.12228v1 Announce Type: cross Abstract: Pretraining produces a learned parameter vector that is typically treated as a starting point for further iterative adaptation. In this work, we instead view the outcome of pretraining as a distribution over parameter vectors, whose support already contains task-specific experts. We show that in small models such expert solutions occupy […]
Human-Centred LLM Privacy Audits: Findings and Frictions
arXiv:2603.12094v1 Announce Type: cross Abstract: Large language models (LLMs) learn statistical associations from massive training corpora and user interactions, and deployed systems can surface or infer information about individuals. Yet people lack practical ways to inspect what a model associates with their name. We report interim findings from an ongoing study and introduce LMP2, a […]
INFACT: A Diagnostic Benchmark for Induced Faithfulness and Factuality Hallucinations in Video-LLMs
arXiv:2603.11481v1 Announce Type: cross Abstract: Despite rapid progress, Video Large Language Models (Video-LLMs) remain unreliable due to hallucinations, which are outputs that contradict either video evidence (faithfulness) or verifiable world knowledge (factuality). Existing benchmarks provide limited coverage of factuality hallucinations and predominantly evaluate models only in clean settings. We introduce INFACT, a diagnostic benchmark comprising […]
SemBench: A Universal Semantic Framework for LLM Evaluation
arXiv:2603.11687v1 Announce Type: cross Abstract: Recent progress in Natural Language Processing (NLP) has been driven by the emergence of Large Language Models (LLMs), which exhibit remarkable generative and reasoning capabilities. However, despite their success, evaluating the true semantic understanding of these models remains a persistent challenge. Traditional benchmarks such as Word-in-Context (WiC) effectively probe this […]
OA-NBV: Occlusion-Aware Next-Best-View Planning for Human-Centered Active Perception on Mobile Robots
arXiv:2603.11072v1 Announce Type: cross Abstract: When our view is blocked, we naturally step sideways or lean to see around the obstacle and recover a more informative observation. Enabling robots to make the same kind of viewpoint choice is critical for human-centered operations, including search, triage, and disaster response, where cluttered environments and partial visibility frequently […]
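Next-best-view planning is typically framed as choosing the candidate viewpoint with the highest expected information gain. A minimal greedy sketch of that general framing (set-based visibility is a toy abstraction; the paper's occlusion-aware formulation is surely richer):

```python
def next_best_view(candidates, visibility, seen):
    """Greedy NBV: pick the candidate viewpoint revealing the most
    target cells not yet observed. `visibility[v]` is the set of
    cells visible from viewpoint v, with occlusion already applied."""
    def gain(v):
        return len(visibility[v] - seen)
    return max(candidates, key=gain)

# Toy scene: sidestepping sees around the occluder; leaning helps less.
visibility = {
    "stay": {"c1"},
    "step_left": {"c1", "c2", "c3"},
    "lean_right": {"c1", "c4"},
}
seen = {"c1"}
best = next_best_view(list(visibility), visibility, seen)
# best == "step_left": it reveals two unseen cells vs. one or zero
```

In a real system the gain term would come from ray-casting through an occupancy map, and motion cost would typically be subtracted from it.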
ResWM: Residual-Action World Model for Visual RL
arXiv:2603.11110v1 Announce Type: cross Abstract: Learning predictive world models from raw visual observations is a central challenge in reinforcement learning (RL), especially for robotics and continuous control. Conventional model-based RL frameworks directly condition future predictions on absolute actions, which makes optimization unstable: the optimal action distributions are task-dependent, unknown a priori, and often lead to […]
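The residual-action idea named in the title can be sketched as the policy emitting a small correction to the previous action rather than an absolute command, which keeps the learned output distribution centered near zero across tasks. A minimal illustration (clipping bounds and values are assumptions):

```python
def residual_step(prev_action, delta, low=-1.0, high=1.0):
    """Residual-action parameterization: apply a small correction
    `delta` to the previous action and clip to the valid range,
    rather than predicting an absolute action from scratch."""
    return [min(high, max(low, a + d)) for a, d in zip(prev_action, delta)]

action = [0.0, 0.0]
for delta in [[0.3, -0.1], [0.3, -0.1], [0.6, -0.1]]:
    action = residual_step(action, delta)
# action drifts smoothly toward roughly [1.0, -0.3], with the first
# component saturating at the clip bound
```

Conditioning a world model on such deltas means the inputs it must predict from stay in a narrow, task-independent range, which is the stability argument the abstract gestures at.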
Markovian Generation Chains in Large Language Models
arXiv:2603.11228v1 Announce Type: cross Abstract: The widespread use of large language models (LLMs) raises an important question: how do texts evolve when they are repeatedly processed by LLMs? In this paper, we define this iterative inference process as Markovian generation chains, where each step takes a specific prompt template and the previous output as input, […]
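The Markov property the abstract defines, namely that each step consumes only the previous output under a fixed prompt template, can be shown with a trivial loop. The rewrite operator below is a deterministic stand-in for an LLM call, purely for illustration:

```python
def generation_chain(initial_text, rewrite, steps):
    """Iterate a rewrite operator: each state depends only on the
    previous output (the Markov property), never on earlier history."""
    states = [initial_text]
    for _ in range(steps):
        states.append(rewrite(states[-1]))
    return states

def toy_rewrite(text):
    """Hypothetical stand-in for an LLM rewrite with a fixed prompt:
    lowercase the text and drop adjacent duplicate words."""
    out = []
    for word in text.lower().split():
        if not out or out[-1] != word:
            out.append(word)
    return " ".join(out)

chain = generation_chain("The the Cat cat sat", toy_rewrite, 3)
# The chain hits a fixed point after one step: "the cat sat"
```

Studying such chains amounts to asking whether repeated LLM processing converges to fixed points, cycles, or drifts, which is exactly what the Markov framing makes tractable.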