arXiv:2605.11538v1 Announce Type: cross Abstract: Group Relative Policy Optimization (GRPO) has emerged as a promising approach for improving the reasoning capabilities of large language models. However, it struggles to effectively balance the tradeoff between exploration and exploitation during training, often resulting in suboptimal performance. Motivated by the theoretical insight that changes in entropy are governed […]
Few-Shot Truly Benign DPO Attack for Jailbreaking LLMs
arXiv:2605.10998v1 Announce Type: cross Abstract: Fine-tuning APIs make frontier LLMs easy to customize, but they can also weaken safety alignment during fine-tuning. While prior work shows that benign supervised fine-tuning (SFT) can reduce refusal behavior, deployed fine-tuning pipelines increasingly support preference-based objectives, whose safety risks remain less understood. We show that Direct Preference Optimization (DPO) […]
An Executable Benchmarking Suite for Tool-Using Agents
arXiv:2605.11030v1 Announce Type: cross Abstract: Closed-loop tool-using agents are increasingly evaluated in executable web, code, and micro-task environments, but benchmark reports often conflate workloads, action-generating drivers, and the evidence admitted for systems-facing claims. We present an executable benchmarking suite that makes these objects explicit under a shared evidence-admission contract. The suite connects WebArena Verified, a […]
Deploying Self-Supervised Learning for Real Seismic Data Denoising
arXiv:2605.11109v1 Announce Type: cross Abstract: Self-supervised learning (SSL) has emerged as a promising approach to seismic data denoising as it does not require clean reference data. In this work, the deployment of the Noisy-as-Clean (NaC) method was evaluated for real seismic data denoising under controlled conditions. Two independent seismic acquisitions, each comprising noisy and filtered […]
Exploring Token-Space Manipulation in Latent Audio Tokenizers
arXiv:2605.11192v1 Announce Type: cross Abstract: Neural audio codecs provide compact discrete representations for speech generation and manipulation. However, most codecs organize tokens as frame-level sequences, making it difficult to study or intervene on global factors of variation. In this work, we propose the Latent Audio Tokenizer for Token-space Editing (LATTE) that appends a fixed set […]
OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents
arXiv:2605.11169v1 Announce Type: new Abstract: Large language model agents interleave reasoning, action selection, and observation to solve sequential decision-making tasks. In deployed settings where agents repeatedly handle related multi-step tasks, small action-selection errors can accumulate into wasted tool calls, latency, and reduced reliability. Despite this need for deployment-time improvement, existing inference-time adaptation methods for LLM […]
Rethinking external validation for the target population: Capturing patient-level similarity with a generative model
arXiv:2605.11284v1 Announce Type: cross Abstract: Background: External validation is essential for assessing the transportability of predictive models. However, its interpretation is often confounded by differences between external and development populations. This study introduces a framework to distinguish model deficiencies from case-mix effects. Method: We propose a framework that quantifies each external patient’s similarity to the […]
Architecture Determines Observability of Transformers
arXiv:2604.24801v3 Announce Type: replace-cross Abstract: Autoregressive transformers make confident errors that output-confidence monitoring cannot catch. Activation monitors catch them only when training leaves a decision-quality signal beyond what the output already exposes. This signal is an architectural property of the trained model, fixed upstream of any monitor. Controlling for output confidence removes 60.3% of the […]
LPDP: Inference-Time Reward Control for Variable-Length DNA Generation with Edit Flows
arXiv:2605.11368v1 Announce Type: cross Abstract: We study the application of recent Edit Flows for inference-time reward control for DNA sequence generation. Unlike most reward-guided DNA generation frameworks, which operate on fixed-length sequence spaces, Edit Flows have the potential to generate variable-length DNA through biologically plausible insertion, deletion, and substitution operations. In particular, we propose Local […]
The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes
arXiv:2605.11182v1 Announce Type: new Abstract: On-policy distillation (OPD) and on-policy self-distillation (OPSD) have emerged as promising post-training methods for large language models, offering dense token-level supervision on trajectories sampled from the model’s own policy. However, existing results on their effectiveness remain mixed: while OP(S)D has shown promise in system prompt and knowledge internalization, recent studies […]
Drop the Act: Probe-Filtered RL for Faithful Chain-of-Thought Reasoning
arXiv:2605.11467v1 Announce Type: cross Abstract: Reasoning models post-hoc rationalize answers they have already committed to internally, producing chains of *reasoning theater*: deliberative-looking steps that contribute nothing to correctness. This wastes inference tokens, pollutes interpretability, and obscures what the model actually computed. We introduce **ProFIL** (**Pro**be-**Fil**tered Reinforcement Learning) to reduce theater, increase chain-of-thought faithfulness, and shrink […]
Modulation Consistency-based Contrastive Learning for Self-Supervised Automatic Modulation Classification
arXiv:2605.11875v1 Announce Type: cross Abstract: Deep learning-based AMC methods have achieved remarkable performance, but their practical deployment remains constrained by the high cost of labeled data. Although self-supervised learning (SSL) reduces the reliance on labels, existing SSL-based AMC methods often rely on task-agnostic pretext objectives misaligned with modulation classification, leading to representations entangled with nuisance […]