Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits

arXiv:2605.14553v1 Announce Type: cross Abstract: Prompt engineering has become central to eliciting the capabilities of large language models (LLMs). At its core lies prompt selection — efficiently identifying the most effective prompts. However, most prior investigations overlook a key challenge: the inherently multi-faceted nature of prompt performance, which cannot be captured by a single metric. […]

OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling

arXiv:2601.19924v2 Announce Type: replace-cross Abstract: We investigate the capabilities and scalability of Large Language Models (LLMs) in optimization modeling, a domain requiring structured reasoning and precise formulation. To this end, we introduce OPT-ENGINE, an extensible benchmark framework with quantifiable and controllable complexity. OPT-ENGINE spans ten canonical Operations Research problems, systematically scaling from Linear Programming to […]

MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models

arXiv:2602.23798v2 Announce Type: replace-cross Abstract: Machine unlearning for large language models often faces a privacy dilemma in which strict constraints prohibit sharing either the server’s parameters or the client’s forget set. To address this dual non-disclosure constraint, we propose MPU, an algorithm-agnostic privacy-preserving Multiple Perturbed Copies Unlearning framework that primarily introduces two server-side modules: Pre-Process […]

PinpointQA: A Dataset and Benchmark for Small Object-Centric Spatial Understanding in Indoor Videos

arXiv:2604.08991v2 Announce Type: replace-cross Abstract: Small object-centric spatial understanding in indoor videos remains a significant challenge for multimodal large language models (MLLMs), despite its practical value for object search and assistive applications. Although existing benchmarks have advanced video spatial intelligence, embodied reasoning, and diagnostic perception, no existing benchmark directly evaluates whether a model can localize […]

PREPING: Building Agent Memory without Tasks

arXiv:2605.13880v1 Announce Type: new Abstract: Agent memory is typically constructed either offline from curated demonstrations or online from post-deployment interactions. However, regardless of how it is built, an agent faces a cold-start gap when first introduced to a new environment without any task-specific experience available. In this paper, we study pre-task memory construction: whether an […]

Quantifying Cyber-Vulnerability in Power Electronics Systems via an Impedance-Based Attack Reachable Domain

arXiv:2605.14502v1 Announce Type: cross Abstract: Power electronics systems are increasingly exposed to cyber threats due to their integration with digital controllers and communication networks. However, an attacker-oriented metric is still lacking to quantify the extent to which a node can be pushed toward instability within a privilege-constrained action space. This letter proposes an impedance-based Attack […]

Action-Inspired Generative Models

arXiv:2605.14631v1 Announce Type: cross Abstract: We introduce Action-Inspired Generative Models (AGMs), a dual-network generative framework motivated by the observation that existing bridge-matching methods assign uniform regression weight to every stochastic transition in the transport landscape, regardless of whether a given bridge sample lies along a structurally coherent trajectory or a degenerate one. We address this […]

Efficient Generative Retrieval for E-commerce Search with Semantic Cluster IDs and Expert-Guided RL

arXiv:2605.14434v1 Announce Type: cross Abstract: Generative retrieval offers a promising alternative by unifying the fragmented multi-stage retrieval process into a single end-to-end model. However, its practical adoption in industrial e-commerce search remains challenging, given the massive and dynamic product catalogs, strict latency requirements, and the need to align retrieval with downstream ranking goals. In this […]

Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy

arXiv:2605.14558v1 Announce Type: cross Abstract: Agentic reinforcement learning trains large language models using multi-turn trajectories that interleave long reasoning traces with short environment-facing actions. Common policy-gradient methods, such as PPO and GRPO, treat each token in a trajectory equally, leading to uniform credit assignment. In this paper, we critically demonstrate that such uniform credit assignment […]

GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration

arXiv:2605.13848v1 Announce Type: new Abstract: Agentic LLM frameworks that rely on prompted orchestration, where the model itself determines workflow transitions, often suffer from hallucinated routing, infinite loops, and non-reproducible execution. We introduce GraphBit, an engine-orchestrated framework that defines workflows explicitly and deterministically as a directed acyclic graph (DAG). Unlike prompted orchestration, agents in GraphBit operate […]

Head Forcing: Long Autoregressive Video Generation via Head Heterogeneity

arXiv:2605.14487v1 Announce Type: cross Abstract: Autoregressive video diffusion models support real-time synthesis but suffer from error accumulation and context loss over long horizons. We discover that attention heads in AR video diffusion transformers serve functionally distinct roles as local heads for detail refinement, anchor heads for structural stabilization, and memory heads for long-range context aggregation, […]

PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media

arXiv:2605.14534v1 Announce Type: cross Abstract: Evaluating object removal in images and videos remains challenging because the task is inherently one-to-many, yet existing metrics frequently disagree with human perception. Full-reference metrics reward copy-paste behaviors over genuine erasure; no-reference metrics suffer from systematic biases such as favoring blurry results; and global temporal metrics are insensitive to localized […]


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.