arXiv:2604.11557v1 Announce Type: new Abstract: Tool-use capability is a fundamental component of LLM agents, enabling them to interact with external systems through structured function calls. However, existing research exhibits inconsistent interaction representations, largely overlooks the structural distribution of tool-use trajectories, and relies on incompatible evaluation benchmarks. We present UniToolCall, a unified framework for tool learning […]
SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding
arXiv:2604.09557v1 Announce Type: cross Abstract: Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Unlike deterministic system optimizations, SD performance is inherently data-dependent, meaning that diverse and representative workloads are essential for accurately measuring its effectiveness. Existing benchmarks suffer from limited task diversity, inadequate support for throughput-oriented evaluation, […]
Tuning Qwen2.5-VL to Improve Its Web Interaction Skills
arXiv:2604.09571v1 Announce Type: cross Abstract: Recent advances in vision-language models (VLMs) have sparked growing interest in using them to automate web tasks, yet their feasibility as independent agents that reason and act purely from visual input remains underexplored. We investigate this setting using Qwen2.5-VL-32B, one of the strongest open-source VLMs available, and focus on improving […]
Generative UI: LLMs are Effective UI Generators
arXiv:2604.09577v1 Announce Type: cross Abstract: AI models excel at creating content, but typically render it with static, predefined interfaces. Specifically, the output of LLMs is often a markdown “wall of text”. Generative UI is a long-standing promise, where the model generates not just the content, but the interface itself. Until now, Generative UI was […]
MeloTune: On-Device Arousal Learning and Peer-to-Peer Mood Coupling for Proactive Music Curation
arXiv:2604.10815v2 Announce Type: cross Abstract: MeloTune is an iPhone-deployed music agent that instantiates the Mesh Memory Protocol (MMP) and Symbolic-Vector Attention Fusion (SVAF) as a production system for affect-aware music curation with peer-to-peer mood coupling. Each device runs two closed-form continuous-time (CfC) networks: a private listener-level CfC that predicts a short-horizon affective trajectory on Russell’s […]
Beyond Offline A/B Testing: Context-Aware Agent Simulation for Recommender System Evaluation
arXiv:2604.09549v1 Announce Type: cross Abstract: Recommender systems are central to online services, enabling users to navigate through massive amounts of content across various domains. However, their evaluation remains challenging due to the disconnect between offline metrics and online performance. The emergence of Large Language Model-powered agents offers a promising solution, yet existing studies model users […]
Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition
arXiv:2604.09121v3 Announce Type: replace-cross Abstract: Recent years have witnessed remarkable progress in automatic speech recognition (ASR), driven by advances in model architectures and large-scale training data. However, two important aspects remain underexplored. First, Word Error Rate (WER), the dominant evaluation metric for decades, treats all words equally and often fails to reflect the semantic correctness […]
WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark
arXiv:2604.10988v1 Announce Type: new Abstract: Existing browser agent benchmarks face a fundamental trilemma: real-website benchmarks lack reproducibility due to content drift, controlled environments sacrifice realism by omitting real-web noise, and both require costly manual curation that limits scalability. We present WebForge, the first fully automated framework that resolves this trilemma through a four-agent pipeline — […]
EvoNash-MARL: A Closed-Loop Multi-Agent Reinforcement Learning Framework for Medium-Horizon Equity Allocation
arXiv:2604.10911v2 Announce Type: new Abstract: Medium- to long-horizon equity allocation is challenging due to weak predictive structure, non-stationary market regimes, and the degradation of signals under realistic trading constraints. Conventional approaches often rely on single predictors or loosely coupled pipelines, which limit robustness under distributional shift. This paper proposes EvoNash-MARL, a closed-loop framework that integrates […]
The origin of the genetic code is encrypted in the structure of present-day transfer RNAs
arXiv:2604.11696v1 Announce Type: new Abstract: Background/Objectives: Resolving the origin of the genetic code is fundamental to understanding how life began its journey out of the chemical world. Some 60 years after the code was deciphered, there is still no general theory of its emergence. My objectives are to bring some unique […]
Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization
arXiv:2604.11259v1 Announce Type: new Abstract: Mobile GUI agents powered by Multimodal Large Language Models (MLLMs) can execute complex tasks on mobile devices. Despite this progress, most existing systems still optimize task success or efficiency, neglecting users’ privacy personalization. In this paper, we study the often-overlooked problem of agent personalization. We observe that personalization can induce […]
OOM-RL: Out-of-Money Reinforcement Learning Market-Driven Alignment for LLM-Based Multi-Agent Systems
arXiv:2604.11477v1 Announce Type: new Abstract: The alignment of Multi-Agent Systems (MAS) for autonomous software engineering is constrained by evaluator epistemic uncertainty. Current paradigms, such as Reinforcement Learning from Human Feedback (RLHF) and AI Feedback (RLAIF), frequently induce model sycophancy, while execution-based environments suffer from adversarial “Test Evasion” by unconstrained agents. In this paper, we introduce […]