HDPO: Hybrid Distillation Policy Optimization via Privileged Self-Distillation

arXiv:2603.23871v1 Announce Type: cross Abstract: Large language models trained with reinforcement learning (RL) for mathematical reasoning face a fundamental challenge: on problems the model cannot solve at all – “cliff” prompts – the RL gradient vanishes entirely, preventing any learning signal from reaching these failure modes. We introduce Hybrid Distillation Policy Optimization (HDPO), which augments […]
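The vanishing-gradient pathology on all-failure prompts is easy to see with group-relative advantage estimation (a GRPO-style normalization is assumed here purely for illustration; the abstract does not specify HDPO's estimator):

```python
import numpy as np

def group_advantages(rewards, eps=1e-8):
    """GRPO-style group-relative advantages: (r - mean) / (std + eps)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Mixed group: some sampled completions succeed, so advantages are non-zero
# and a learning signal flows back into the policy.
mixed = group_advantages([1.0, 0.0, 0.0, 1.0])

# "Cliff" prompt: every sampled completion fails, all rewards are identical,
# so every advantage is exactly zero and no policy gradient reaches the model.
cliff = group_advantages([0.0, 0.0, 0.0, 0.0])
print(mixed, cliff)
```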

Revealing Multi-View Hallucination in Large Vision-Language Models

arXiv:2603.23934v1 Announce Type: cross Abstract: Large vision-language models (LVLMs) are increasingly being applied to multi-view image inputs captured from diverse viewpoints. However, despite this growing use, current LVLMs often confuse or mismatch visual information originating from different instances or viewpoints, a phenomenon we term multi-view hallucination. To systematically analyze this problem, we construct MVH-Bench, a […]

Understanding the Challenges in Iterative Generative Optimization with LLMs

arXiv:2603.23994v1 Announce Type: cross Abstract: Generative optimization uses large language models (LLMs) to iteratively improve artifacts (such as code, workflows or prompts) using execution feedback. It is a promising approach to building self-improving agents, yet in practice remains brittle: despite active research, only 9% of surveyed agents used any automated optimization. We argue that this […]

Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization

arXiv:2603.24093v1 Announce Type: cross Abstract: Recently, reinforcement learning (RL) has become an important approach for improving the capabilities of large language models (LLMs). In particular, reinforcement learning from verifiable rewards (RLVR) has emerged as a promising paradigm for reasoning tasks. However, existing RL-based training remains only a rough approximation of human learning. Human learners leverage both external […]

StateLinFormer: Stateful Training Enhancing Long-term Memory in Navigation

arXiv:2603.23571v1 Announce Type: cross Abstract: Effective navigation intelligence relies on long-term memory to support both immediate generalization and sustained adaptation. However, existing approaches face a dilemma: modular systems rely on explicit mapping but lack flexibility, while Transformer-based end-to-end models are constrained by fixed context windows, limiting persistent memory across extended interactions. We introduce StateLinFormer, a […]

Wafer-Level Etch Spatial Profiling for Process Monitoring from Time-Series with Time-LLM

arXiv:2603.23576v1 Announce Type: cross Abstract: Understanding wafer-level spatial variations from in-situ process signals is essential for advanced plasma etching process monitoring. While most data-driven approaches focus on scalar indicators such as average etch rate, actual process quality is determined by complex two-dimensional spatial distributions across the wafer. This paper presents a spatial regression model that […]

LLMLOOP: Improving LLM-Generated Code and Tests through Automated Iterative Feedback Loops

arXiv:2603.23613v1 Announce Type: cross Abstract: Large Language Models (LLMs) show remarkable performance in generating source code, yet their output often contains compilation errors or incorrect logic. Researchers and developers waste effort implementing checks and refining LLM-generated code, frequently duplicating one another's work. This paper presents LLMLOOP, a framework that […]
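The general pattern such frameworks automate can be sketched as a generate/check/refine loop (the helpers `generate`, `run_checks`, and `refine` below are hypothetical stand-ins, not LLMLOOP's actual API):

```python
def iterative_repair(prompt, generate, run_checks, refine, max_rounds=3):
    """Generate code, run automated checks, and feed failures back
    to the model until the checks pass or the budget is exhausted."""
    code = generate(prompt)
    for _ in range(max_rounds):
        errors = run_checks(code)            # e.g. compile + run unit tests
        if not errors:
            return code                      # all checks passed
        code = refine(prompt, code, errors)  # failures drive the next attempt
    return code

# Toy stand-ins so the loop is runnable end to end.
fixed = iterative_repair(
    "add two numbers",
    generate=lambda p: "def add(a, b): return a - b",          # buggy draft
    run_checks=lambda c: [] if "a + b" in c else ["add(1, 2) != 3"],
    refine=lambda p, c, errs: "def add(a, b): return a + b",
)
print(fixed)
```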

lambdaSplit: Self-Supervised Content-Aware Spectral Unmixing for Fluorescence Microscopy

arXiv:2603.23647v1 Announce Type: cross Abstract: In fluorescence microscopy, spectral unmixing aims to recover individual fluorophore concentrations from spectral images that capture mixed fluorophore emissions. Since classical methods operate pixel-wise and rely on least-squares fitting, their performance degrades with increasingly overlapping emission spectra and higher levels of noise, suggesting that a data-driven approach that can learn […]
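The classical baseline the abstract refers to can be sketched in a few lines: with known reference emission spectra, pixel-wise unmixing is an ordinary least-squares fit (illustrative spectra and noise levels, not values from the paper):

```python
import numpy as np

# Each pixel's concentrations c solve min_c ||S c - y||^2, where S holds the
# reference emission spectra (channels x fluorophores) and y is the measured
# mixed spectrum at that pixel.
rng = np.random.default_rng(0)
S = np.array([[0.9, 0.3],
              [0.4, 0.8],
              [0.1, 0.5]])                       # 3 channels, 2 fluorophores
c_true = np.array([2.0, 1.0])                    # true concentrations
y = S @ c_true + 0.01 * rng.standard_normal(3)   # measured mixed spectrum

c_hat, *_ = np.linalg.lstsq(S, y, rcond=None)
print(c_hat)  # close to [2.0, 1.0]; accuracy degrades as the spectra
              # overlap more strongly or the noise level rises
```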

Prototype Fusion: A Training-Free Multi-Layer Approach to OOD Detection

arXiv:2603.23677v1 Announce Type: cross Abstract: Deep learning models are increasingly deployed in safety-critical applications, where reliable out-of-distribution (OOD) detection is essential to ensure robustness. Existing methods predominantly rely on the penultimate-layer activations of neural networks, assuming they encapsulate the most informative in-distribution (ID) representations. In this work, we revisit this assumption to show that intermediate […]
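A minimal sketch of a training-free prototype score on a single feature layer: class prototypes are per-class feature means, and the OOD score is the distance to the nearest prototype (illustrative synthetic features; the paper's contribution is fusing such signals across multiple layers, which is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 0.1, (50, 4)),    # class 0 features
                   rng.normal(3, 0.1, (50, 4))])   # class 1 features
labels = np.array([0] * 50 + [1] * 50)

# One prototype per class: the mean feature vector of that class.
prototypes = np.stack([feats[labels == k].mean(axis=0) for k in (0, 1)])

def ood_score(x):
    """Distance to the nearest class prototype; larger = more OOD."""
    return np.linalg.norm(prototypes - x, axis=1).min()

id_score = ood_score(rng.normal(0, 0.1, 4))   # in-distribution, near class 0
ood = ood_score(np.full(4, 10.0))             # far from both classes
print(id_score, ood)
```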

The Diminishing Returns of Early-Exit Decoding in Modern LLMs

arXiv:2603.23701v1 Announce Type: cross Abstract: In Large Language Model (LLM) inference, early-exit refers to stopping computation at an intermediate layer once the prediction is sufficiently confident, thereby reducing latency and cost. However, recent LLMs adopt improved pretraining recipes and architectures that reduce layer redundancy, potentially limiting early-exit opportunities. We re-evaluate layer-wise early-exit in modern LLMs […]
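The mechanism being re-evaluated can be sketched with a simple confidence-threshold rule: stop at the first layer whose intermediate prediction exceeds a softmax-confidence threshold (an illustrative criterion; the paper may evaluate different exit rules):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit(logits_per_layer, threshold=0.9):
    """Return (layer_index, token) at the first layer whose intermediate
    prediction is confident enough; otherwise use the final layer."""
    for i, logits in enumerate(logits_per_layer):
        p = softmax(logits)
        if p.max() >= threshold:
            return i, int(p.argmax())
    return len(logits_per_layer) - 1, int(softmax(logits_per_layer[-1]).argmax())

# Toy per-layer logits: confidence first crosses the threshold at layer 2 of 4,
# so the last layer's computation is skipped.
layers = [np.array([1.0, 1.1, 0.9]),
          np.array([2.0, 0.5, 0.2]),
          np.array([6.0, 0.1, 0.0]),
          np.array([8.0, 0.1, 0.0])]
exit_layer, token = early_exit(layers)
print(exit_layer, token)
```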

AI-driven Intent-Based Networking Approach for Self-configuration of Next Generation Networks

arXiv:2603.23772v1 Announce Type: cross Abstract: Intent-Based Networking (IBN) aims to simplify operating heterogeneous infrastructures by translating high-level intents into enforceable policies and assuring compliance. However, dependable automation remains difficult because (i) realizing intents from ambiguous natural language into controller-ready policies is brittle and prone to conflicts and unintended side effects, and (ii) assurance is often […]

Object Search in Partially-Known Environments via LLM-informed Model-based Planning and Prompt Selection

arXiv:2603.23800v1 Announce Type: cross Abstract: We present a novel LLM-informed model-based planning framework, and a novel prompt selection method, for object search in partially-known environments. Our approach uses an LLM to estimate statistics about the likelihood of finding the target object when searching various locations throughout the scene that, combined with travel costs extracted from […]
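The ingredients the abstract names (LLM-estimated find likelihoods and travel costs) can be combined in a simple greedy heuristic: rank candidate locations by probability of success per unit travel cost (illustrative numbers; the paper uses a full model-based planner, not this heuristic):

```python
# (P(object is there), travel cost) per candidate location -- made-up values.
locations = {"kitchen": (0.60, 5.0),
             "garage":  (0.30, 12.0),
             "bedroom": (0.25, 4.0)}

# Greedy choice: most probability mass gained per unit of travel cost.
best = max(locations, key=lambda loc: locations[loc][0] / locations[loc][1])
print(best)
```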


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK; registration number 16808844.