MICA: Multi-granularity Intertemporal Credit Assignment for Long-Horizon Emotional Support Dialogue

arXiv:2603.06194v2 Announce Type: replace-cross Abstract: Reinforcement learning (RL) for large language models (LLMs) has shown strong performance in single-turn tasks, but extending it to multi-turn interaction remains challenging due to sparse rewards and poor per-turn credit assignment. In emotional support dialogues, responses shape future user states, so matched-state step-wise comparison is unavailable, while trajectory-level supervision […]

Enhancing Agent Safety Judgment: Controlled Benchmark Rewriting and Analogical Reasoning for Deceptive Out-of-Distribution Scenarios

arXiv:2605.03242v1 Announce Type: new Abstract: Tool-using agent systems powered by large language models (LLMs) are increasingly deployed across web, app, operating-system, and transactional environments. Yet existing safety benchmarks still emphasize explicit risks, potentially overstating a model’s ability to judge deceptive or ambiguous trajectories. To address this gap, we introduce ROME (Red-team Orchestrated Multi-agent Evolution), a […]

Deepfake Audio Detection Using Self-supervised Fusion Representations

arXiv:2605.03420v1 Announce Type: cross Abstract: This paper describes a submission to the Environment-Aware Speech and Sound Deepfake Detection Challenge (ESDD2) 2026, which addresses component-level deepfake detection using the CompSpoofV2 dataset, where speech and environmental sounds may be independently manipulated. To address this challenge, a dual-branch deepfake detection framework is proposed to jointly model speech and […]

FINER-SQL: Boosting Small Language Models for Text-to-SQL

arXiv:2605.03465v1 Announce Type: cross Abstract: Large language models have driven major advances in Text-to-SQL generation. However, they suffer from high computational cost, long latency, and data privacy concerns, which make them impractical for many real-world applications. A natural alternative is to use small language models (SLMs), which enable efficient and private on-premise deployment. Yet, SLMs […]

Donor-Aware scRNA-seq Benchmarks for IBD Classification

arXiv:2605.03281v1 Announce Type: new Abstract: Donor-level disease classification from single-cell RNA sequencing (scRNA-seq) requires strict donor-aware cross-validation: naive pipelines that split cells randomly conflate training and test donors, inflating reported performance through pseudoreplication. We present a donor-aware benchmark evaluating three feature representations across two independent IBD cohorts: centered log-ratio (CLR) transformed cell-type composition, GatedStructuralCFN dependency […]

MHPR: Multidimensional Human Perception and Reasoning Benchmark for Large Vision-Language Models

arXiv:2605.03485v1 Announce Type: cross Abstract: Multidimensional human understanding is essential for real-world applications such as film analysis and virtual digital humans, yet current LVLM benchmarks largely focus on single-task settings and lack fine-grained, human-centric evaluation. In this work, we introduce MHPR, a comprehensive benchmark for joint perception-reasoning over human-centric scenes spanning individual, multi-person, and human-object […]

Parametrizing Convex Sets Using Sublinear Neural Networks

arXiv:2605.03520v1 Announce Type: cross Abstract: We propose a neural parameterization of convex sets by learning sublinear (positively homogeneous and convex) functions. Our networks implicitly represent both the support and gauge functions of a convex body. We prove a universal approximation theorem for convex sets under this parametrization. Empirically, we demonstrate the method on shape optimization […]

Revisiting the Travel Planning Capabilities of Large Language Models

arXiv:2605.03308v1 Announce Type: new Abstract: Travel planning serves as a critical task for long-horizon reasoning, exposing significant deficits in LLMs. However, existing benchmarks and evaluations primarily assess final plans in an end-to-end manner, which lacks interpretability and makes it difficult to analyze the root causes of failures. To bridge this gap, we decompose travel planning […]

Healthcare AI GYM for Medical Agents

arXiv:2605.02943v1 Announce Type: cross Abstract: Clinical reasoning demands multi-step interactions (gathering patient history, ordering tests, interpreting results, and making safe treatment decisions), yet a unified training environment that provides the breadth of clinical domains and specialized tools to train generalizable medical AI agents through reinforcement learning remains elusive. We present a comprehensive empirical study […]

PatRe: A Full-Stage Office Action and Rebuttal Generation Benchmark for Patent Examination

arXiv:2605.03571v1 Announce Type: cross Abstract: Patent examination is a complex, multi-stage process requiring both technical expertise and legal reasoning, increasingly challenged by rising application volumes. Prior benchmarks predominantly view patent examination as discriminative classification or static extraction, failing to capture its inherently interactive and iterative nature, similar to the peer review and rebuttal process in […]

RouteHijack: Routing-Aware Attack on Mixture-of-Experts LLMs

arXiv:2605.02946v1 Announce Type: cross Abstract: Safety alignment is critical for the responsible deployment of large language models (LLMs). As Mixture-of-Experts (MoE) architectures are increasingly adopted to scale model capacity, understanding their safety robustness becomes essential. Existing adversarial attacks, however, have notable limitations. Prompt-based jailbreaks rely on heuristic search and transfer poorly, model intervention methods require […]

Automated Large-scale CVRP Solver Design via LLM-assisted Flexible MCTS

arXiv:2605.03339v1 Announce Type: new Abstract: Solving large-scale CVRP (LSCVRP) with hundreds to thousands of nodes remains difficult for even state-of-the-art solvers. Divide-and-conquer can scale by decomposing the instance into size-reduced subproblems, but designing decomposition logic and configuring sub-solvers is highly expertise- and labor-intensive. Large Language Models (LLMs) have emerged as promising tools for automated algorithm […]

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844