arXiv:2604.02598v2 Announce Type: replace-cross Abstract: LLM-generated explanations can make technical content more accessible, but there is a ceiling on what they can support interactively. Because LLM outputs are static text, they cannot be executed or stepped through. We argue that grounding explanations in a formalized representation enables interactive affordances beyond what static text supports. We […]
Structuring versus Problematizing: How LLM-based Agents Scaffold Learning in Diagnostic Reasoning
arXiv:2604.09158v1 Announce Type: cross Abstract: Supporting students in developing diagnostic reasoning is a key challenge across educational domains. Novices often face cognitive biases such as premature closure and over-reliance on heuristics, and they struggle to transfer diagnostic strategies to new cases. Scenario-based learning (SBL) enhanced by Learning Analytics (LA) and large language models (LLM) offers […]
Towards Context-Aware Image Anonymization with Multi-Agent Reasoning
arXiv:2603.27817v3 Announce Type: replace-cross Abstract: Street-level imagery contains personally identifiable information (PII), some of which is context-dependent. Existing anonymization methods either over-process images or miss subtle identifiers, while API-based solutions compromise data sovereignty. We present an agentic framework, CAIAMAR (Context-Aware Image Anonymization with Multi-Agent Reasoning), for context-aware PII segmentation with diffusion-based anonymization, combining pre-defined processing […]
CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation
arXiv:2604.09155v1 Announce Type: cross Abstract: Graphical user interface (GUI) agents powered by vision language models (VLMs) are rapidly moving from passive assistance to autonomous operation. However, this unrestricted action space exposes users to severe and irreversible financial, privacy or social harm. Existing safeguards that rely on prompt engineering, brittle heuristics and VLM-as-critic lack formal verification and […]
RAM: Recover Any 3D Human Motion in-the-Wild
arXiv:2603.19929v2 Announce Type: replace-cross Abstract: RAM incorporates a motion-aware semantic tracker with adaptive Kalman filtering to achieve robust identity association under severe occlusions and dynamic interactions. A memory-augmented Temporal HMR module further enhances human motion reconstruction by injecting spatio-temporal priors for consistent and smooth motion estimation. Moreover, a lightweight Predictor module forecasts future poses to […]
EquiformerV3: Scaling Efficient, Expressive, and General SE(3)-Equivariant Graph Attention Transformers
arXiv:2604.09130v1 Announce Type: cross Abstract: As $SE(3)$-equivariant graph neural networks mature as a core tool for 3D atomistic modeling, improving their efficiency, expressivity, and physical consistency has become a central challenge for large-scale applications. In this work, we introduce EquiformerV3, the third generation of the $SE(3)$-equivariant graph attention Transformer, designed to advance all three dimensions: […]
You’ve Got a Golden Ticket: Improving Generative Robot Policies With A Single Noise Vector
arXiv:2603.15757v2 Announce Type: replace-cross Abstract: What happens when a pretrained generative robot policy is provided a constant initial noise as input, rather than repeatedly sampling it from a Gaussian? We demonstrate that the performance of a pretrained, frozen diffusion or flow matching policy can be improved with respect to a downstream reward by swapping the […]
Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition
arXiv:2604.09121v1 Announce Type: cross Abstract: Recent years have witnessed remarkable progress in automatic speech recognition (ASR), driven by advances in model architectures and large-scale training data. However, two important aspects remain underexplored. First, Word Error Rate (WER), the dominant evaluation metric for decades, treats all words equally and often fails to reflect the semantic correctness […]
Memory-efficient Continual Learning with Prototypical Exemplar Condensation
arXiv:2603.13804v2 Announce Type: replace-cross Abstract: Rehearsal-based continual learning (CL) mitigates catastrophic forgetting by maintaining a subset of samples from previous tasks for replay. Existing studies primarily focus on optimizing memory storage through coreset selection strategies. While these methods are effective, they typically require storing a substantial number of samples per class (SPC), often exceeding 20, […]
PS-TTS: Phonetic Synchronization in Text-to-Speech for Achieving Natural Automated Dubbing
arXiv:2604.09111v1 Announce Type: cross Abstract: Recently, artificial intelligence-based dubbing technology has advanced, enabling automated dubbing (AD) to convert the source speech of a video into target speech in different languages. However, natural AD still faces synchronization challenges such as duration and lip-synchronization (lip-sync), which are crucial for preserving the viewer experience. Therefore, this paper proposes […]
Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails
arXiv:2603.03099v5 Announce Type: replace-cross Abstract: Despite Adam demonstrating faster empirical convergence than SGD in many applications, much of the existing theory yields guarantees essentially comparable to those of SGD, leaving the empirical performance gap insufficiently explained. In this paper, we uncover a key second-moment normalization in Adam and develop a stopping-time/martingale analysis that provably distinguishes […]
TensorHub: Scalable and Elastic Weight Transfer for LLM RL Training
arXiv:2604.09107v1 Announce Type: cross Abstract: Modern LLM reinforcement learning (RL) workloads require a highly efficient weight transfer system to scale training across heterogeneous computational resources. However, existing weight transfer approaches either fail to provide flexibility for dynamically scaling clusters or incur fundamental data movement overhead, resulting in poor performance. We introduce Reference-Oriented Storage (ROS), a […]