arXiv:2406.08334v2 Announce Type: replace-cross Abstract: Memory pressure has emerged as a dominant constraint in scaling the training of large language models (LLMs), particularly in resource-constrained environments. While modern frameworks incorporate various memory-saving techniques, they often expose low-level configuration knobs that require manual tuning and specialized system expertise. This not only adds engineering overhead but also […]
In-Context Symbolic Regression for Robustness-Improved Kolmogorov-Arnold Networks
arXiv:2603.15250v2 Announce Type: replace-cross Abstract: Symbolic regression aims to replace black-box predictors with concise analytical expressions that can be inspected and validated in scientific machine learning. Kolmogorov-Arnold Networks (KANs) are well suited to this goal because each connection between adjacent units (an “edge”) is parametrised by a learnable univariate function that can, in principle, be […]
MMErroR: A Benchmark for Erroneous Reasoning in Vision-Language Models
arXiv:2601.03331v2 Announce Type: replace-cross Abstract: Recent advances in Vision-Language Models (VLMs) have improved performance in multi-modal learning, raising the question of whether these models truly understand the content they process. Crucially, can VLMs detect when a reasoning process is wrong and identify its error type? To answer this, we present MMErroR, a multi-modal benchmark of […]
SpecPylot: Python Specification Generation using Large Language Models
arXiv:2604.16560v1 Announce Type: cross Abstract: Automatically generating formal specifications could reduce the effort needed to improve program correctness, but in practice, this is still challenging. Many developers avoid writing contracts by hand, which limits the use of automated verification tools. Recent large language models (LLMs) can generate specifications from code, but these specifications often fail […]
Healthcare AI for Automation or Allocation? A Transaction Cost Economics Framework
arXiv:2604.16465v1 Announce Type: new Abstract: Healthcare productivity is shaped not only by clinical complexity but by the costs of coordinating work under uncertainty. Transaction-cost economics offers a theory of these coordination frictions, yet has rarely been operationalised at task level across health occupations. Using task statements and frequency weights from the O*NET occupational database, we […]
Real-Time Visual Attribution Streaming in Thinking Model
arXiv:2604.16587v1 Announce Type: cross Abstract: We present an amortized framework for real-time visual attribution streaming in multimodal thinking models. When these models generate code from a screenshot or solve math problems from images, their long reasoning traces should be grounded in visual evidence. However, verifying this reliance is challenging: faithful causal methods require costly repeated […]
Support Sufficiency as Consequence-Sensitive Compression in Belief Arbitration
arXiv:2604.16434v1 Announce Type: new Abstract: When a system commits to a hypothesis, much of the evidential structure behind that commitment is lost to compression. Standard accounts assume that selected content and scalar confidence suffice for downstream control. This paper argues that they do not, and that determining what must survive compression is itself a consequence-sensitive […]
Representation Before Training: A Fixed-Budget Benchmark for Generative Medical Event Models
arXiv:2604.16775v1 Announce Type: cross Abstract: Every prediction from a generative medical event model is bounded by how clinical events are tokenized, yet input representation is rarely isolated from other system and architectural choices. We evaluate how representation decisions affect downstream prediction after a shared one-epoch pretraining budget. We train 28 matched transformers on MIMIC-IV and […]
Heterogeneous Self-Play for Realistic Highway Traffic Simulation
arXiv:2604.16406v1 Announce Type: new Abstract: Realistic highway simulation is critical for scalable safety evaluation of autonomous vehicles, particularly for interactions that are too rare to study from logged data alone. Yet highway traffic generation remains challenging because it requires broad coverage across speeds and maneuvers, controllable generation of rare safety-critical scenarios, and behavioral credibility in […]
NaviFormer: A Deep Reinforcement Learning Transformer-like Model to Holistically Solve the Navigation Problem
arXiv:2604.16967v1 Announce Type: cross Abstract: Path planning is usually solved by addressing either the (high-level) route planning problem (waypoint sequencing to achieve the final goal) or the (low-level) path planning problem (trajectory prediction between two waypoints avoiding collisions). However, real-world problems usually require simultaneous solutions to the route and path planning subproblems with a holistic […]
Computational Hermeneutics: Evaluating generative AI as a cultural technology
arXiv:2604.16403v1 Announce Type: new Abstract: Generative AI systems are increasingly recognized as cultural technologies, yet current evaluation frameworks often treat culture as a variable to be measured rather than as fundamental to the system’s operation. Drawing on hermeneutic theory from the humanities, we argue that GenAI systems function as “context machines” that must inherently address three […]
Persona-Based Requirements Engineering for Explainable Multi-Agent Educational Systems: A Scenario Simulator for Clinical Reasoning Training
arXiv:2604.17186v1 Announce Type: cross Abstract: As Artificial Intelligence (AI) and Agentic AI become increasingly integrated across sectors such as education and healthcare, it is critical to ensure that a Multi-Agent Education System (MAES) is explainable from the early stages of requirements engineering (RE) within the AI software development lifecycle. Explainability is essential to build trust, promote […]