arXiv:2605.27190v1 Announce Type: cross Abstract: Recent advances in Large Audio-Language Models (LALMs) have made real-time, streaming spoken interaction increasingly practical. In this setting, reasoning quality and responsiveness are tightly coupled: delaying reasoning until the speech endpoint can improve answer quality but moves deliberation into user-visible response delay, while answering too early risks committing before decisive […]
Lessons from Penetration Tests on Large-Scale Agent Systems
arXiv:2605.27042v1 Announce Type: cross Abstract: As AI systems gain increasing autonomy and execution capability, the number of discovered security vulnerabilities continues to rise. However, many of these vulnerabilities are not fundamentally novel, but instead reflect recurring classes of weaknesses long observed in prior computing systems. Execution-capable AI agents are effectively unbounded, self-modifying programs that interact […]
From Static Context to Calibrated Interactive RL: Mitigating Distribution Shift in Multi-turn Dialogue with Aligned Simulator
arXiv:2605.26403v1 Announce Type: new Abstract: A long-standing goal of the research community is to develop highly interactive LLM-based dialogue agents. Recent research focuses on optimizing policies based on fixed offline logs (Static Context RL) or using a prompt-based simulator (Interactive RL). In this work, we theoretically show that both paradigms are fundamentally limited by context […]
High-Quality Synthetic Financial Time-Series using a GAN-Diffusion Framework
arXiv:2605.27113v1 Announce Type: cross Abstract: In recent years, financial institutions and firms have increasingly adopted synthetic data to address data scarcity and to generate counterfactual market scenarios. However, reproducing all the statistical properties of financial time series, commonly known as stylized facts, remains an open challenge for many existing general-purpose architectures. In this paper, we […]
Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders
arXiv:2605.27354v1 Announce Type: cross Abstract: Model internals encode rich information about how a large language model (LLM) processes its training data; however, post-training data engineering largely relies on external signals and ignores the intrinsic signals in model internals. We propose SAERL, a data engineering framework for LLM reinforcement learning (RL). It models three intrinsic […]
BioFact-MoE: Biologically Factorized Mixture of Experts for Vision-Language Prognostic Modeling in Hepatocellular Carcinoma
arXiv:2605.26376v1 Announce Type: cross Abstract: Hepatocellular carcinoma (HCC) is biologically heterogeneous, shaped by the interplay between hepatic functional reserve and tumor-related oncologic factors; thus, similar survival outcomes may reflect fundamentally different underlying biological processes. Prognostic modeling in HCC is informed by rich multimodal information from multiparametric MRI and radiology reports from routine clinical practice. Existing […]
Fixation location in structured populations
arXiv:2605.26411v1 Announce Type: new Abstract: In stochastic evolutionary dynamics, the replacement of an existing genotype or cultural trait by a newly introduced mutant is typically characterized by two quantities: fixation probability and fixation time. But in a structured population, the disappearance of a lineage occurs at a specific place. For evolutionary dynamics on graphs, […]
Plans for Evaluating Structured Generative Search Summaries
arXiv:2605.26400v1 Announce Type: cross Abstract: We propose a framework for evaluating structured generative search summaries that are placed atop organic web search results. A structured summary, generated by a large language model, typically consists of an overview, several sections with section titles, and a list of source documents that are cited within the summary. We […]
Joint economic and epidemiological modelling of alternative pandemic response strategies
arXiv:2512.08355v2 Announce Type: replace Abstract: In an emerging pandemic, policymakers need to make important decisions with limited information, for example choosing between a mitigation, suppression or elimination strategy. These strategies may require trade-offs to be made between the health impact of the pandemic and the economic costs of the interventions introduced in response. Mathematical models […]
When Does Deep RL Beat Calibrated Baselines? A Benchmark Study on Adaptive Resource Control
arXiv:2605.26418v1 Announce Type: cross Abstract: A properly calibrated rule-based autoscaler can beat every one of six mainstream deep reinforcement learning (DRL) algorithms on cost across every workload we test – so when, if ever, does DRL actually help? We study this in RLScale-Bench, a reproducible benchmark and evaluation protocol for DRL on adaptive resource control, […]
Reasoning, Code, or Both? How Large Language Models Handle Variations in Math Questions
arXiv:2605.26414v1 Announce Type: new Abstract: Large Language Models (LLMs) achieve impressive accuracy on mathematical reasoning benchmarks, yet their performance drops when problems are modified with simple changes like different names or numbers. Code execution methods, which let models generate and run Python code instead of reasoning in natural language, have been proposed as a solution, […]
Targeted Remasking: Replacing Token Editing with Token-to-Mask Refinement in Discrete Diffusion Language Models
arXiv:2605.26436v1 Announce Type: cross Abstract: Discrete masked diffusion language models such as LLaDA generate text through iterative denoising, where mask tokens are progressively replaced with predicted tokens. LLaDA2.1 introduced a Token-to-Token (T2T) editing mechanism that accelerates generation by directly replacing committed tokens suspected of being incorrect. However, we identify fundamental limitations of T2T editing: it […]