Safety, Security, and Cognitive Risks in World Models

arXiv:2604.01346v2 Announce Type: replace-cross Abstract: World models – learned internal simulators of environment dynamics – are rapidly becoming foundational to autonomous decision-making in robotics, autonomous vehicles, and agentic AI. By predicting future states in compressed latent spaces, they enable sample-efficient planning and long-horizon imagination without direct environment interaction. Yet this predictive power introduces a distinctive […]

An Innovative Next Activity Prediction Using Process Entropy and Dynamic Attribute-Wise-Transformer in Predictive Business Process Monitoring

arXiv:2502.10573v2 Announce Type: replace-cross Abstract: Next activity prediction in predictive business process monitoring is crucial for operational efficiency and informed decision-making. While machine learning and Artificial Intelligence have achieved promising results, challenges remain in balancing interpretability and accuracy, particularly due to the complexity and evolving nature of event logs. This paper presents two contributions: (i) […]

On the Role of Fault Localization Context for LLM-Based Program Repair

arXiv:2604.05481v1 Announce Type: cross Abstract: Fault Localization (FL) is a key component of Large Language Model (LLM)-based Automated Program Repair (APR), yet its impact remains underexplored. In particular, it is unclear how much localization is needed, whether additional context beyond the predicted buggy location is beneficial, and how such context should be retrieved. We conduct […]

Hypothesis-Driven Feature Manifold Analysis in LLMs via Supervised Multi-Dimensional Scaling

arXiv:2510.01025v2 Announce Type: replace Abstract: The linear representation hypothesis states that language models (LMs) encode concepts as directions in their latent space, forming organized, multidimensional manifolds. Prior work has largely focused on identifying specific geometries for individual features, limiting its ability to generalize. We introduce Supervised Multi-Dimensional Scaling (SMDS), a model-agnostic method for evaluating and […]

UnWeaving the knots of GraphRAG — turns out VectorRAG is almost enough

arXiv:2603.29875v2 Announce Type: replace-cross Abstract: One of the key problems in Retrieval-augmented generation (RAG) systems is that chunk-based retrieval pipelines represent the source chunks as atomic objects, mixing the information contained within such a chunk into a single vector. These vector representations are then fundamentally treated as isolated, independent and self-sufficient, with no attempt to […]

Hedging and Non-Affirmation: Quantifying LLM Alignment on Questions of Human Rights

arXiv:2502.19463v2 Announce Type: replace-cross Abstract: Hedging and non-affirmation are behaviors exhibited by large language models (LLMs) that limit the clear endorsement of specific statements. While these behaviors are desirable in subjective contexts, they are undesirable in the context of human rights – which apply unambiguously to all groups. We present a systematic framework to measure […]

LLM Evaluation as Tensor Completion: Low Rank Structure and Semiparametric Efficiency

arXiv:2604.05460v1 Announce Type: cross Abstract: Large language model (LLM) evaluation platforms increasingly rely on pairwise human judgments. These data are noisy, sparse, and non-uniform, yet leaderboards are reported with limited uncertainty quantification. We study this as semiparametric inference for a low-rank latent score tensor observed through pairwise comparisons under Bradley-Terry-Luce-type models. This places LLM evaluation […]

PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation

arXiv:2512.23994v2 Announce Type: replace-cross Abstract: Text-to-audio-video (T2AV) generation is central to applications such as filmmaking and world modeling. However, current models often fail to produce physically plausible sounds. Previous benchmarks primarily focus on audio-video temporal synchronization, while largely overlooking explicit evaluation of audio-physics grounding, thereby limiting the study of physically plausible audio-visual generation. To address […]

DVGT-2: Vision-Geometry-Action Model for Autonomous Driving at Scale

arXiv:2604.00813v2 Announce Type: replace-cross Abstract: End-to-end autonomous driving has evolved from the conventional paradigm based on sparse perception into vision-language-action (VLA) models, which focus on learning language descriptions as an auxiliary task to facilitate planning. In this paper, we propose an alternative Vision-Geometry-Action (VGA) paradigm that advocates dense 3D geometry as the critical cue for […]

MA-IDS: Multi-Agent RAG Framework for IoT Network Intrusion Detection with an Experience Library

arXiv:2604.05458v1 Announce Type: cross Abstract: Network Intrusion Detection Systems (NIDS) face important limitations. Signature-based methods are effective for known attack patterns, but they struggle to detect zero-day attacks and often miss modified variants of previously known attacks, while many machine learning approaches offer limited interpretability. These challenges become even more severe in IoT environments because […]

LLM Reasoning as Trajectories: Step-Specific Representation Geometry and Correctness Signals

arXiv:2604.05655v1 Announce Type: cross Abstract: This work characterizes large language models’ chain-of-thought generation as a structured trajectory through representation space. We show that mathematical reasoning traverses functionally ordered, step-specific subspaces that become increasingly separable with layer depth. This structure already exists in base models, while reasoning training primarily accelerates convergence toward termination-related subspaces rather than […]

Code Review Agent Benchmark

arXiv:2603.23448v3 Announce Type: replace-cross Abstract: Software engineering agents have shown significant promise in writing code. As AI agents permeate code writing, and generate huge volumes of code automatically — the matter of code quality comes front and centre. As the automatically generated code gets integrated into huge code-bases — the issue of code review and […]

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844