Three Regimes of Context-Parametric Conflict: A Predictive Framework and Empirical Validation

arXiv:2605.11574v1 Announce Type: cross Abstract: The literature on how large language models handle conflict between their training knowledge and a contradicting document presents a persistent empirical contradiction: some studies find models stubbornly retain their trained answers, ignoring provided documents nearly half the time, while others find models readily defer to the document, following context approximately […]

Template-as-Ontology: Configurable Synthetic Data Infrastructure for Cross-Domain Manufacturing AI Validation

arXiv:2605.11259v1 Announce Type: new Abstract: Large language model (LLM)-based AI agents deployed in manufacturing environments require populated, schema-correct data for validation, yet production MES data is proprietary, privacy-encumbered, and vendor-specific. This paper introduces the Template-as-Ontology principle: a single Python configuration module (700-770 lines, 45 validated exports) serves simultaneously as the specification for a time-stepped manufacturing […]

Trade-offs in Decentralized Agentic AI Discovery Across the Compute Continuum

arXiv:2605.11839v1 Announce Type: cross Abstract: Agentic systems deployed across the compute continuum need discovery mechanisms that remain effective across cloud, edge, and intermittently connected domains. In some emerging agentic architectures, decentralized discovery is already an active design direction, placing DHT-based lookup on the path toward agent directories. This paper studies the trade-offs among major structured-overlay […]
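The DHT-based lookup the abstract places on the path toward agent directories can be illustrated with a minimal consistent-hashing ring (a toy, Chord-style successor lookup; the node names and the capability key below are invented for illustration, not taken from the paper):

```python
import hashlib

def ring_position(name: str, ring_bits: int = 16) -> int:
    """Map a name onto a 2**ring_bits identifier circle via SHA-1."""
    digest = hashlib.sha1(name.encode()).hexdigest()
    return int(digest, 16) % (2 ** ring_bits)

def lookup(key: str, nodes: list[str]) -> str:
    """Return the node whose ring position is the first at or after the
    key's position, wrapping around -- Chord-style successor lookup."""
    key_pos = ring_position(key)
    placed = sorted(nodes, key=ring_position)
    for node in placed:
        if ring_position(node) >= key_pos:
            return node
    return placed[0]  # wrap around to the first node on the ring

nodes = ["agent-cloud-1", "agent-edge-a", "agent-edge-b"]
owner = lookup("capability:image-captioning", nodes)
```

The appeal for the compute continuum is that ownership is deterministic given the node set, so any reachable node can resolve the same key without a central directory; the trade-offs the paper studies arise when that node set churns or partitions.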

LatentRouter: Can We Choose the Right Multimodal Model Before Seeing Its Answer?

arXiv:2605.11301v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) have heterogeneous strengths across OCR, chart understanding, spatial reasoning, visual question answering, cost, and latency. Effective MLLM routing therefore requires more than estimating query difficulty: a router must match the multimodal requirements of the current image-question input with the capabilities of each candidate model. We […]
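The routing problem the abstract describes, matching a query's multimodal requirements against per-model capabilities rather than estimating difficulty alone, can be sketched with capability vectors (a toy illustration; the model names, capability axes, scores, and cost weighting below are all invented, not the paper's method):

```python
# Hypothetical capability profiles per axis (OCR, charts, spatial, VQA).
MODELS = {
    "mllm-small": {"ocr": 0.4, "charts": 0.3, "spatial": 0.5, "vqa": 0.6},
    "mllm-ocr":   {"ocr": 0.9, "charts": 0.7, "spatial": 0.4, "vqa": 0.6},
    "mllm-large": {"ocr": 0.8, "charts": 0.8, "spatial": 0.8, "vqa": 0.9},
}

def route(requirements: dict[str, float], costs: dict[str, float],
          cost_weight: float = 0.1) -> str:
    """Pick the model maximizing requirement-weighted capability minus cost."""
    def score(name: str) -> float:
        caps = MODELS[name]
        fit = sum(requirements.get(axis, 0.0) * caps[axis] for axis in caps)
        return fit - cost_weight * costs[name]
    return max(MODELS, key=score)

costs = {"mllm-small": 1.0, "mllm-ocr": 2.0, "mllm-large": 5.0}
choice = route({"ocr": 1.0}, costs)  # OCR-heavy query
```

With a low cost weight the OCR-heavy query routes to the OCR specialist; raising `cost_weight` pushes the same query toward the cheap model, which is the cost/capability tension the abstract lists.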

The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures

arXiv:2605.11999v1 Announce Type: cross Abstract: Power capping is the standard GPU energy lever in LLM serving, and it appears to work: throughput drops, power readings fall, and energy budgets are met. We show the appearance is illusory for the phase that dominates production serving: autoregressive decode. Across four attention paradigms — GQA, MLA, Gated DeltaNet, […]

AESOP: Adversarial Execution-path Selection to Overload Deep Learning Pipelines

arXiv:2605.10987v1 Announce Type: cross Abstract: Modern machine learning deployments increasingly compose specialized models into dynamic inference pipelines, where upstream components produce intermediate predictions that determine the workload and inputs of downstream components. The cost of processing an input is therefore not determined by any single model, but by two coupled factors: the per-inference cost of […]

Constraint-Data-Value-Maximization: Utilizing Data Attribution for Effective Data Pruning in Low-Data Environments

arXiv:2605.11312v1 Announce Type: new Abstract: Attributing model behavior to training data is an evolving research field. A common benchmark is data removal, which involves eliminating data instances with either low or high values, then assessing a model’s performance trained on the modified dataset. Many existing studies leverage Shapley-based data values for this task. In this […]
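The data-removal benchmark the abstract describes, scoring instances by a data value, dropping the lowest- or highest-valued ones, then retraining, can be sketched with leave-one-out values for a one-dimensional mean model (a stand-in for Shapley-based values, purely for illustration; the data are invented):

```python
def mse(points: list[float], mean: float) -> float:
    """Mean squared error of a constant (mean) model on the points."""
    return sum((x - mean) ** 2 for x in points) / len(points)

def loo_value(points: list[float], i: int) -> float:
    """Leave-one-out value of point i: loss without it minus loss with it.
    Higher means the point helps; negative means it hurts the fit."""
    rest = points[:i] + points[i + 1:]
    full_loss = mse(points, sum(points) / len(points))
    rest_loss = mse(rest, sum(rest) / len(rest))
    return rest_loss - full_loss

data = [1.0, 1.2, 0.9, 1.1, 8.0]  # 8.0 is an outlier
values = [loo_value(data, i) for i in range(len(data))]
worst = min(range(len(data)), key=values.__getitem__)
pruned = [x for i, x in enumerate(data) if i != worst]
```

Here pruning the lowest-value point removes the outlier; the benchmark question is whether a value method reliably picks such points out in low-data regimes, where each removal shifts the model substantially.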

Learning What Matters: Adaptive Information-Theoretic Objectives for Robot Exploration

arXiv:2605.12084v1 Announce Type: cross Abstract: Designing learnable information-theoretic objectives for robot exploration remains challenging. Such objectives aim to guide exploration toward data that reduces uncertainty in model parameters, yet it is often unclear what information the collected data can actually reveal. Although reinforcement learning (RL) can optimize a given objective, constructing objectives that reflect parametric […]

MT-JailBench: A Modular Benchmark for Understanding Multi-Turn Jailbreak Attacks

arXiv:2605.11002v1 Announce Type: cross Abstract: Multi-turn jailbreaks exploit the ability of large language models to accumulate and act on conversational context. Instead of stating a harmful request directly, an attacker can gradually steer the conversation toward an unsafe answer. Recent methods demonstrate this risk, but they are usually evaluated as black-box pipelines with different budgets, […]

Rethinking Evaluation for LLM Hallucination Detection: A Desiderata, A New RAG-based Benchmark, New Insights

arXiv:2605.11330v1 Announce Type: new Abstract: Hallucination, broadly referring to unfaithful, fabricated, or inconsistent content generated by LLMs, has wide-ranging implications. Therefore, a large body of effort has been devoted to detecting LLM hallucinations, as well as designing benchmark datasets for evaluating these detectors. In this work, we first establish a desiderata of properties for hallucination […]

RT-Transformer: The Transformer Block as a Spherical State Estimator

arXiv:2605.11007v1 Announce Type: cross Abstract: We show that the core components of the Transformer block — attention, residual connections, and normalization — arise naturally from a single geometric estimation problem. Modeling the latent state as a direction on the hypersphere, with noise defined in the tangent plane at the current estimate, yields a precision-weighted directional […]
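The geometric picture in the abstract, a latent state as a unit direction updated by precision-weighted evidence and re-projected onto the sphere, can be sketched as follows (a loose toy analogue of attention weights + residual + normalization; the vectors and precisions are invented and this is not the paper's derivation):

```python
import math

def normalize(v: list[float]) -> list[float]:
    """Project a vector back onto the unit sphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def precision_weighted_direction(current: list[float],
                                 observations: list[list[float]],
                                 precisions: list[float]) -> list[float]:
    """Add precision-weighted observed directions to the current estimate
    (residual-style), then renormalize -- a directional state update."""
    combined = list(current)
    for obs, w in zip(observations, precisions):
        combined = [c + w * o for c, o in zip(combined, obs)]
    return normalize(combined)

state = normalize([1.0, 0.0, 0.0])
obs = [normalize([0.0, 1.0, 0.0]), normalize([1.0, 1.0, 0.0])]
new_state = precision_weighted_direction(state, obs, [0.5, 0.25])
```

The renormalization step plays the role the abstract assigns to normalization: after the weighted residual update, the estimate is projected back onto the hypersphere so it remains a direction.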

Harness Engineering as Categorical Architecture

arXiv:2605.12239v1 Announce Type: cross Abstract: The agent harness, the system layer comprising prompts, tools, memory, and orchestration logic that surrounds the model, has emerged as the central engineering abstraction for LLM-based agents. Yet harness design remains ad hoc, with no formal theory governing composition, preservation of properties under compilation, or systematic comparison across frameworks. We […]


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK; registration number 16808844.