arXiv:2605.16356v1 Announce Type: new Abstract: Exhaled breath condensate (EBC) contains volatile metabolites and is promising for non-invasive disease diagnosis, but after decades of research spanning over 100 biomarkers and 10 diseases, no EBC-based test has reached clinical use. The measurement variability that can span orders of magnitude, far exceeding the clinically required 10%, has long […]
ReTAMamba: Reliability-Aware Temporal Aggregation with Mamba for Irregular Clinical Time Series Prediction
arXiv:2605.16380v1 Announce Type: cross Abstract: Clinical time-series data are difficult to model with methods designed for regular sequences because they exhibit irregular sampling, frequent missing values, and heterogeneous observation patterns across variables. Existing approaches commonly use observation masks and time-gap information, but they do not continuously capture the decaying reliability of past observations or consistently […]
When Does Non-Uniform Replay Matter in Reinforcement Learning?
arXiv:2605.10236v3 Announce Type: replace-cross Abstract: Modern off-policy reinforcement learning algorithms often rely on simple uniform replay sampling, and it remains unclear when and why non-uniform replay improves over this strong baseline. Across diverse RL settings, we show that the effectiveness of non-uniform replay is governed by three factors: replay volume, the number of replayed transitions […]
ClawGym: A Scalable Framework for Building Effective Claw Agents
arXiv:2604.26904v3 Announce Type: replace-cross Abstract: Claw-style environments support multi-step workflows over local files, tools, and persistent workspace states. However, scalable development around these environments remains constrained by the absence of a systematic framework, especially one for synthesizing verifiable training data and integrating it with agent training and diagnostic evaluation. To address this challenge, we present […]
A More Word-like Image Tokenization for MLLMs
arXiv:2605.17954v1 Announce Type: cross Abstract: Modern multimodal large language models (MLLMs) typically keep the language model fixed and train a visual projector that maps the pixels into a sequence of tokens in its embedding space, so that images can be presented in essentially the same form as text. However, the language model has been optimized […]
Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training
arXiv:2604.18966v2 Announce Type: replace-cross Abstract: Tabular language models can generate synthetic tables by modeling rows as token sequences, but they are typically trained once with supervised fine-tuning and then used as static synthesizers. This is limiting because next-token likelihood does not directly optimize the distributional, utility, and indistinguishability properties used to evaluate synthetic data. We […]
PAREDA: A Multi-Accent Speech Dataset of Natural Language Processing Research Discussions
arXiv:2605.17860v1 Announce Type: cross Abstract: While modern Automatic Speech Recognition (ASR) systems achieve high accuracy on benchmark corpora, their performance often degrades when there is real-world variability. This work focuses on variability arising due to accented, spontaneous, and domain-specific speech. In particular, we introduce PAper REading DAtaset (PAREDA), a first-of-its-kind multi-accent speech dataset consisting of […]
StreamPro: From Reactive Perception to Proactive Decision-Making in Streaming Video
arXiv:2605.16381v1 Announce Type: cross Abstract: Proactive streaming video understanding requires models to continuously process video streams and decide when to respond, rather than merely what to respond. This naturally introduces a decision-making problem under partial observations, where models must balance early prediction against sufficient evidence. However, existing benchmarks largely follow a “see-then-answer” paradigm, where responses […]
Fre-Res: Frequency-Residual Video Token Compression for Efficient Video MLLMs
arXiv:2605.16366v1 Announce Type: cross Abstract: Video MLLMs face a persistent tension between spatial fidelity and temporal coverage: preserving fine-grained visual details requires many spatial tokens, while capturing short-lived events requires dense temporal sampling. We propose Fre-Res, a budget-adaptive dual-track video-token compression framework that separates these two forms of evidence. Fre-Res preserves sparse high-fidelity spatial anchors […]
LAST-RAG: Literature-Anchored Stochastic Trajectory Retrieval-Augmented Generation for Knowledge-Conditioned Degradation Model Selection
arXiv:2605.17902v1 Announce Type: new Abstract: Stochastic-process-based degradation modeling is a core approach for estimating the distribution of remaining useful life (RUL); however, the selection of an appropriate stochastic process has not been sufficiently addressed. Existing model selection methods mainly rely on the statistical fit of the observed health indicator (HI) trajectory, but this approach may […]
ANVIL: Analogies and Videos for Lecturers
arXiv:2605.16295v1 Announce Type: cross Abstract: We present ANVIL, a multimodal generative system that automates the production of analogy-based instructional animations for computer science topics. Given a concept definition, ANVIL generates a textual analogy, compiles it into a structured visual screenplay, and produces executable manim code to render an animation, with an automated repair mechanism to […]
LoopQ: Quantization for Recursive Transformers
arXiv:2605.16343v1 Announce Type: cross Abstract: Looped language models (LoopLMs) improve parameter efficiency by recursively reusing Transformer blocks, enabling deeper computation under a fixed model size. However, this reuse makes LoopLMs more fragile under post-training quantization (PTQ). We present the first systematic study of quantization in LoopLMs and identify three challenges: distribution shift across roles, state […]