arXiv:2605.17879v1 Announce Type: cross Abstract: Training frontier-scale foundation models involves coordinating tens of thousands of GPUs over multi-month runs, where even minor performance degradations can accumulate into substantial efficiency losses. Existing health-check mechanisms, such as NCCL tests or GPU burn-in, primarily focus on functional correctness and often fail to detect fail-slow behaviors that silently degrade […]
Modality vs. Morphology: A Framework for Time Series Classification for Biological Signals
arXiv:2605.18483v1 Announce Type: cross Abstract: Time series classification (TSC) of biological signals has progressed from handcrafted, modality-specific approaches to deep architectures capable of representing the diverse waveform structures of underlying physiological processes (i.e., morphology). This review introduces a unified morphology–modality framework that connects waveform structure to methodological design, revealing how spikes, bursts, oscillations, slow […]
WriteSAE: Sparse Autoencoders for Recurrent State
arXiv:2605.12770v3 Announce Type: replace-cross Abstract: We introduce WriteSAE, the first sparse autoencoder that decomposes and edits the matrix cache write of state-space and hybrid recurrent language models, where residual SAEs cannot reach. Existing SAEs read residual streams, but Gated DeltaNet, Mamba-2, and RWKV-7 write to a $d_k \times d_v$ cache through rank-1 updates $k_t v_t^\top$ […]
HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents
arXiv:2605.17873v1 Announce Type: cross Abstract: Training long-horizon LLM agents with reinforcement learning is challenging because sparse outcome rewards reveal whether a task succeeds, but not which intermediate actions caused the outcome or how they should be corrected. Recent methods alleviate this issue by generating rewards or textual hints from turn-level action-output signals, or by using […]
An AI system to help scientists write expert-level empirical software
arXiv:2509.06503v2 Announce Type: replace Abstract: The cycle of scientific discovery is frequently bottlenecked by the slow, manual creation of software to support computational experiments \cite{hannay2009how}. To address this, we present Empirical Research Assistance (ERA), an AI system that creates expert-level scientific software whose goal is to maximize a quality metric. The system uses a Large Language […]
Stable Audio 3
arXiv:2605.17991v1 Announce Type: cross Abstract: Stable Audio 3 is a family of fast latent diffusion models (small, medium, large) for variable-length audio generation and editing. Since our models can generate several minutes of audio, variable-length generations are key to avoid the cost of producing full-length generations for short sounds. We also support inpainting, enabling targeted […]
Does Your Reasoning Model Implicitly Know When to Stop Thinking?
arXiv:2602.08354v5 Announce Type: replace Abstract: Recent advancements in large reasoning models (LRMs) have greatly improved their capabilities on complex reasoning tasks through Long Chains of Thought (CoTs). However, this approach often results in substantial redundancy, impairing computational efficiency and causing significant delays in real-time applications. Recent studies show that longer reasoning chains are frequently uncorrelated […]
SkillJect: Effectively Automating Skill-Based Prompt Injection for Skill-Enabled Agents
arXiv:2602.14211v2 Announce Type: replace-cross Abstract: Agent skills are increasingly used to extend LLM agents with task-specific instructions, executable scripts, and auxiliary resources. While improving reusability, this modular design also introduces a new supply-chain attack surface: a malicious or compromised skill may be repeatedly loaded as trusted guidance and steer an agent’s tool use during downstream […]
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
arXiv:2605.06638v3 Announce Type: replace Abstract: Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how training scales with task difficulty has been hampered by the lack of controlled, scalable environments. Observed LLM shortcomings in long-horizon reasoning have raised the prospect that they are fundamental to the […]
Investigation into In-Context Learning Capabilities of Transformers
arXiv:2604.25858v2 Announce Type: replace-cross Abstract: Transformers have demonstrated a strong ability for in-context learning (ICL), enabling models to solve previously unseen tasks using only example input-output pairs provided at inference time. While prior theoretical work has established conditions under which transformers can perform linear classification in-context, the empirical scaling behavior governing when this mechanism […]
Experimentally validated quantum-secure federated learning over a multi-user quantum network
arXiv:2501.12709v2 Announce Type: replace-cross Abstract: Federated learning enables decentralized, privacy-preserving training but remains vulnerable to privacy leakage in the quantum era. Quantum federated learning (QFL) offers a promising path towards enhanced security and efficiency. However, a practical and experimentally validated QFL protocol utilizing near-term quantum techniques to address data privacy has been lacking. Here we […]
Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement
arXiv:2409.02428v4 Announce Type: replace-cross Abstract: Achieving the effective design and improvement of reward functions in reinforcement learning (RL) tasks with complex custom environments and multiple requirements presents considerable challenges. In this paper, we propose ERFSL, an efficient reward function searcher using LLMs, which enables LLMs to be effective white-box searchers and highlights their advanced semantic […]