When Do We Need LLMs? A Diagnostic for Language-Driven Bandits

arXiv:2604.05859v1 Announce Type: new Abstract: We study Contextual Multi-Armed Bandits (CMABs) for non-episodic sequential decision-making problems where the context includes both textual and numerical information (e.g., recommendation systems, dynamic portfolio adjustments, offer selection; all frequent problems in finance). While Large Language Models (LLMs) are increasingly applied to these settings, utilizing LLMs for reasoning at […]

How LLMs Follow Instructions: Skillful Coordination, Not a Universal Mechanism

arXiv:2604.06015v1 Announce Type: new Abstract: Instruction tuning is commonly assumed to endow language models with a domain-general ability to follow instructions, yet the underlying mechanism remains poorly understood. Does instruction-following rely on a universal mechanism or compositional skill deployment? We investigate this through diagnostic probing across nine diverse tasks in three instruction-tuned models. Our analysis […]

CRFT: Consistent-Recurrent Feature Flow Transformer for Cross-Modal Image Registration

arXiv:2604.05689v1 Announce Type: cross Abstract: We present Consistent-Recurrent Feature Flow Transformer (CRFT), a unified coarse-to-fine framework based on feature flow learning for robust cross-modal image registration. CRFT learns a modality-independent feature flow representation within a transformer-based architecture that jointly performs feature alignment and flow estimation. The coarse stage establishes global correspondences through multi-scale feature correlation, […]

Artificial Intelligence and the Structure of Mathematics

arXiv:2604.06107v1 Announce Type: new Abstract: Recent progress in artificial intelligence (AI) is unlocking transformative capabilities for mathematics. There is great hope that AI will help solve major open problems and autonomously discover new mathematical concepts. In this essay, we further consider how AI may open a grand perspective on mathematics by forging a new route, […]

Generative AI for Video Trailer Synthesis: From Extractive Heuristics to Autoregressive Creativity

arXiv:2604.04953v1 Announce Type: cross Abstract: The domain of automatic video trailer generation is currently undergoing a profound paradigm shift, transitioning from heuristic-based extraction methods to deep generative synthesis. While early methodologies relied heavily on low-level feature engineering, visual saliency, and rule-based heuristics to select representative shots, recent advancements in Large Language Models (LLMs), Multimodal Large […]

Flowr — Scaling Up Retail Supply Chain Operations Through Agentic AI in Large Scale Supermarket Chains

arXiv:2604.05987v1 Announce Type: new Abstract: Retail supply chain operations in supermarket chains involve continuous, high-volume manual workflows spanning demand forecasting, procurement, supplier coordination, and inventory replenishment, processes that are repetitive, decision-intensive, and difficult to scale without significant human effort. Despite growing investment in data analytics, the decision-making and coordination layers of these workflows remain predominantly […]

Contextuality as an External Bookkeeping Cost under Fixed Shared-State Semantics

arXiv:2601.20167v2 Announce Type: cross Abstract: Contextuality is a central feature distinguishing quantum from classical probability theories, but its operational meaning is often stated only qualitatively. In this Letter, we study a simple information-theoretic question: how much additional contextual information must a classical simulation introduce when it tries to keep a shared internal description fixed across […]

Automatic Image-Level Morphological Trait Annotation for Organismal Images

arXiv:2604.01619v2 Announce Type: replace-cross Abstract: Morphological traits are physical characteristics of biological organisms that provide vital clues on how organisms interact with their environment. Yet extracting these traits remains a slow, expert-driven process, limiting their use in large-scale ecological studies. A major bottleneck is the absence of high-quality datasets linking biological images to trait-level annotations. […]

Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents

arXiv:2604.06132v1 Announce Type: new Abstract: Large language models are increasingly deployed as autonomous agents executing multi-step workflows in real-world software environments. However, existing agent benchmarks suffer from three critical limitations: (1) trajectory-opaque grading that checks only final outputs, (2) underspecified safety and robustness evaluation, and (3) narrow modality coverage and interaction paradigms. We introduce Claw-Eval, […]

Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective Retrieval-Augmented Generation Systems

arXiv:2604.04936v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) systems critically depend on effective document chunking strategies to balance retrieval quality, latency, and operational cost. Traditional chunking approaches, such as fixed-size, rule-based, or fully agentic chunking, often suffer from high token consumption, redundant text generation, limited scalability, and poor debuggability, especially for large-scale web content ingestion. […]

ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces

arXiv:2604.05172v2 Announce Type: new Abstract: Large language model (LLM) agents are increasingly deployed to automate productivity tasks (e.g., email, scheduling, document management), but evaluating them on live services is risky due to potentially irreversible changes. Existing benchmarks rely on simplified environments and fail to capture realistic, stateful, multi-service workflows. We introduce ClawsBench, a benchmark for […]

ShadowNPU: System and Algorithm Co-design for NPU-Centric On-Device LLM Inference

arXiv:2508.16703v4 Announce Type: replace-cross Abstract: Running Large Language Models (LLMs) on-device is now a critical enabler of user privacy. We observe that the attention operator falls back from the special-purpose NPU to the general-purpose CPU/GPU because of quantization sensitivity in state-of-the-art frameworks. This fallback results in a degraded user experience and increased complexity in […]


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK. Registration number 16808844.