arXiv:2604.15271v2 Announce Type: replace-cross Abstract: Reliable uncertainty estimation is critical for medical image segmentation, where automated contours feed downstream quantification and clinical decision support. Many strong uncertainty methods require repeated inference, while efficient single-forward-pass alternatives often provide weaker failure ranking or rely on restrictive feature-space assumptions. We present SegWithU, a post-hoc framework that augments a […]
GeGS-PCR: Effective and Robust 3D Point Cloud Registration with Two-Stage Color-Enhanced Geometric-3DGS Fusion
arXiv:2604.17721v1 Announce Type: cross Abstract: We address the challenge of point cloud registration using color information, where traditional methods relying solely on geometric features often struggle in low-overlap and incomplete scenarios. To overcome these limitations, we propose GeGS-PCR, a novel two-stage method that combines geometric, color, and Gaussian information for robust registration. Our approach incorporates […]
Bayesian Active Learning with Gaussian Processes Guided by LLM Relevance Scoring for Dense Passage Retrieval
arXiv:2604.17906v1 Announce Type: cross Abstract: While Large Language Models (LLMs) exhibit exceptional zero-shot relevance modeling, their high computational cost necessitates framing passage retrieval as a budget-constrained global optimization problem. Existing approaches passively rely on first-stage dense retrievers, which leads to two limitations: (1) failing to retrieve relevant passages in semantically distinct clusters, and (2) failing […]
Before You Interpret the Profile: Validity Scaling for LLM Metacognitive Self-Report
arXiv:2604.17707v1 Announce Type: cross Abstract: Clinical personality assessment screens response validity before interpreting substantive scales. LLM evaluation does not. We apply the validity scaling framework from the PAI and MMPI-3 to metacognitive probe data from 20 frontier models across 524 items. Six validity indices are operationalised: L (maintaining confidence on errors), K (betting on errors), […]
Scalable Neighborhood-Based Multi-Agent Actor-Critic
arXiv:2604.18190v1 Announce Type: cross Abstract: We propose MADDPG-K, a scalable extension to Multi-Agent Deep Deterministic Policy Gradient (MADDPG) that addresses the computational limitations of centralized critic approaches. Centralized critics, which condition on the observations and actions of all agents, have demonstrated significant performance gains in cooperative and competitive multi-agent settings. However, their critic networks grow […]
VeriGraphi: A Multi-Agent Framework of Hierarchical RTL Generation for Large Hardware Designs
arXiv:2604.14550v2 Announce Type: replace-cross Abstract: Generating synthesizable Verilog for large, hierarchical hardware designs remains a significant challenge for large language models (LLMs), which struggle to replicate the structured reasoning that human experts employ when translating complex specifications into RTL. When tasked with producing hierarchical Verilog, LLMs frequently lose context across modules, hallucinate interfaces, fabricate inter-module […]
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
arXiv:2509.02547v5 Announce Type: replace Abstract: The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL), reframing LLMs from passive sequence generators into autonomous, decision-making agents embedded in complex, dynamic worlds. This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov […]
WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference
arXiv:2604.17701v1 Announce Type: cross Abstract: While distributed device-edge speculative decoding enhances resource utilization across heterogeneous nodes, its performance is often bottlenecked by conventional token-level verification strategies. Such rigid alignment leads to excessive rejections, significantly diminishing the accepted sequence length and increasing interaction rounds under fluctuating wireless conditions. In this paper, we propose WISV (Wireless-Informed Semantic […]
The World Leaks the Future: Harness Evolution for Future Prediction Agents
arXiv:2604.15719v2 Announce Type: replace Abstract: Many consequential decisions must be made before the relevant outcome is known. Such problems are commonly framed as future prediction, where an LLM agent must form a prediction for an unresolved question using only the public information available at the prediction time. The setting is difficult because public evidence evolves […]
Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization
arXiv:2604.14267v2 Announce Type: replace-cross Abstract: Search agents extend Large Language Models (LLMs) beyond static parametric knowledge by enabling access to up-to-date and long-tail information unavailable during pretraining. While reinforcement learning has been widely adopted for training such agents, existing approaches face key limitations: process supervision often suffers from unstable value estimation, whereas outcome supervision struggles […]
EyeMulator: Improving Code Language Models by Mimicking Human Visual Attention
arXiv:2508.16771v2 Announce Type: replace-cross Abstract: Code Language Models (CodeLLMs) traditionally learn attention based solely on statistical input-output token correlations (“machine attention”). In contrast, human developers rely on intuition, selectively fixating on semantically salient tokens during program comprehension. We present EyeMulator, a model-agnostic technique to align CodeLLM attention with human visual attention without architectural changes. By […]
CAPO: Counterfactual Credit Assignment in Sequential Cooperative Teams
arXiv:2604.17693v1 Announce Type: cross Abstract: In cooperative teams where agents act in a fixed order and share a single team reward, it is hard to know how much each agent contributed, and harder still when agents are updated one at a time because data collected earlier no longer reflects the new policies. We introduce the […]