arXiv:2603.27346v2 Announce Type: replace-cross Abstract: Robotic manipulation remains challenging for reinforcement learning due to contact-rich dynamics, long horizons, and training instability. Although off-policy actor-critic algorithms such as SAC and TD3 perform well in simulation, they often suffer from policy oscillations and performance collapse in realistic settings, partly due to experience replay strategies that ignore the […]
ATBench: A Diverse and Realistic Trajectory Benchmark for Long-Horizon Agent Safety
arXiv:2604.02022v1 Announce Type: new Abstract: Evaluating the safety of LLM-based agents is increasingly important because risks in realistic deployments often emerge over multi-step interactions rather than isolated prompts or final responses. Existing trajectory-level benchmarks remain limited by insufficient interaction diversity, coarse observability of safety failures, and weak long-horizon realism. We introduce ATBench, a trajectory-level benchmark […]
A deep learning pipeline for PAM50 subtype classification using histopathology images and multi-objective patch selection
arXiv:2604.01798v1 Announce Type: cross Abstract: Breast cancer is a highly heterogeneous disease with diverse molecular profiles. The PAM50 gene signature is widely recognized as a standard for classifying breast cancer into intrinsic subtypes, enabling more personalized treatment strategies. In this study, we introduce a novel optimization-driven deep learning framework that aims to reduce reliance on […]
Do Large Language Models Mentalize When They Teach?
arXiv:2604.01594v1 Announce Type: new Abstract: How do LLMs decide what to teach next: by reasoning about a learner’s knowledge, or by using simpler rules of thumb? We test this in a controlled task previously used to study human teaching strategies. On each trial, a teacher LLM sees a hypothetical learner’s trajectory through a reward-annotated directed […]
Generation Is Compression: Zero-Shot Video Coding via Stochastic Rectified Flow
arXiv:2603.26571v2 Announce Type: replace-cross Abstract: Recent advances in generative modeling have enabled perceptual video compression at ultra-low bitrates, yet existing methods predominantly treat the generative model as a refinement or reconstruction module attached to a separately designed codec backbone. We propose Generative Video Codebook Codec (GVCC), a zero-shot framework that turns a pretrained video generative […]
MM-ReCoder: Advancing Chart-to-Code Generation with Reinforcement Learning and Self-Correction
arXiv:2604.01600v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have recently demonstrated promising capabilities in multimodal coding tasks such as chart-to-code generation. However, existing methods primarily rely on supervised fine-tuning (SFT), which requires the model to learn code patterns through chart-code pairs but does not expose the model to a code execution environment. Moreover, […]
FSKD: Monocular Forest Structure Inference via LiDAR-to-RGBI Knowledge Distillation
arXiv:2604.01766v1 Announce Type: cross Abstract: Very High Resolution (VHR) forest structure data at individual-tree scale is essential for carbon, biodiversity, and ecosystem monitoring. Yet airborne LiDAR remains costly and infrequent despite being the reference for forest structure metrics like Canopy Height Model (CHM), Plant Area Index (PAI), and Foliage Height Diversity (FHD). We propose FSKD: […]
From Multi-Agent to Single-Agent: When Is Skill Distillation Beneficial?
arXiv:2604.01608v1 Announce Type: new Abstract: Multi-agent systems (MAS) tackle complex tasks by distributing expertise, though this often comes at the cost of heavy coordination overhead, context fragmentation, and brittle phase ordering. Distilling a MAS into a single-agent skill can bypass these costs, but this conversion lacks a principled answer for when and what to distill. […]
Beyond Via: Analysis and Estimation of the Impact of Large Language Models in Academic Papers
arXiv:2603.25638v2 Announce Type: replace-cross Abstract: Through an analysis of arXiv papers, we report several shifts in word usage that are likely driven by large language models (LLMs) but have not previously received sufficient attention, such as the increased frequency of “beyond” and “via” in titles and the decreased frequency of “the” and “of” in abstracts. […]
Analysis of LLM Performance on AWS Bedrock: Receipt-item Categorisation Case Study
arXiv:2604.01615v1 Announce Type: new Abstract: This paper presents a systematic, cost-aware evaluation of large language models (LLMs) for receipt-item categorisation within a production-oriented classification framework. We compare four instruction-tuned models available through AWS Bedrock: Claude 3.7 Sonnet, Claude 4 Sonnet, Mixtral 8x7B Instruct, and Mistral 7B Instruct. The aim of the study was (1) to […]
DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning
arXiv:2604.01765v1 Announce Type: cross Abstract: Recently, world-action models (WAM) have emerged to bridge vision-language-action (VLA) models and world models, unifying the reasoning and instruction-following capabilities of the former with the spatio-temporal world modeling of the latter. However, existing WAM approaches often focus on modeling 2D appearance or latent representations, with limited geometric grounding, an essential element for embodied systems operating in the physical […]
Exploring Robust Multi-Agent Workflows for Environmental Data Management
arXiv:2604.01647v1 Announce Type: new Abstract: Embedding LLM-driven agents into environmental FAIR data management is compelling – they can externalize operational knowledge and scale curation across heterogeneous data and evolving conventions. However, replacing deterministic components with probabilistic workflows changes the failure mode: LLM pipelines may generate plausible but incorrect outputs that pass superficial checks and propagate […]