Fine-Grained Graph Generation through Latent Mixture Scheduling

arXiv:2605.02780v1 Announce Type: new Abstract: Structure-aware graph generation aims to generate graphs that satisfy given topological properties. It has applications in domains such as drug discovery, social network modeling, and knowledge graph construction. Unlike existing methods that only provide coarse control over graph properties, we introduce a novel conditional variational autoencoder for fine-grained structural […]

Intervention Complexity as a Canonical Reward and a Measure of Intelligence

arXiv:2605.02175v1 Announce Type: new Abstract: The Legg–Hutter universal intelligence measure provides a rigorous scalar assessment of general intelligence as expected reward across all computable environments, weighted by simplicity. However, the measure presupposes an externally specified reward function, raising the question of whether the reward primitive is inherently arbitrary or whether a canonical choice exists. We […]

PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments

arXiv:2605.02240v1 Announce Type: new Abstract: We introduce PhysicianBench, a benchmark for evaluating LLM agents on physician tasks grounded in real clinical settings within electronic health record (EHR) environments. Existing medical agent benchmarks primarily focus on static knowledge recall, single-step atomic actions, or action intent without verifiable execution against the environment. As a result, they fail […]

Can Causal Discovery Algorithms Help in Generating Legal Arguments?

arXiv:2605.02318v1 Announce Type: new Abstract: In 2011, Judea Pearl received the Turing Award, often described as the Nobel Prize of computing, for fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning. This work included pioneering the development of causal discovery algorithms. These computer algorithms can analyze large multivariate datasets and automatically […]

Measuring AI Reasoning: A Guide for Researchers

arXiv:2605.02442v1 Announce Type: new Abstract: In this paper, we offer a guide for researchers on evaluating reasoning in language models, building the case that reasoning should be assessed through evidence of adaptive, multi-step search rather than final-answer accuracy alone. Under an evaluation-oriented definition, reasoning requires selecting intermediate steps and halting according to input-dependent conditions, which […]

Double Rectified Linear Unit-based Modular Semantics for Quantitative Bipolar Argumentation Framework

arXiv:2605.02551v1 Announce Type: new Abstract: Quantitative Bipolar Argumentation Frameworks (QBAFs) provide an alternative approach to computing argument acceptability in Bipolar Argumentation Frameworks (BAFs). Each argument is assigned an initial strength, which is then updated to a final strength by considering the influence of both its attackers and supporters. Over the years, several semantics have been […]
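The update described above (an initial strength revised by attackers and supporters) can be sketched as a generic modular semantics: an aggregation step combines the strengths of an argument's attackers and supporters, and an influence step maps the base score plus that aggregate to a final strength. This is a minimal illustration under stated assumptions, not the DReLU semantics proposed in the paper, whose definitions the abstract does not give. The sum aggregation, the clip-to-[0, 1] influence, and every function and parameter name here are hypothetical choices.

```python
from typing import Dict, List, Tuple


def relu(x: float) -> float:
    return max(0.0, x)


def clip01(x: float) -> float:
    # Clamp to [0, 1], written as a difference of two ReLUs (illustrative choice only).
    return relu(x) - relu(x - 1.0)


def modular_strengths(
    base: Dict[str, float],
    attacks: List[Tuple[str, str]],   # hypothetical format: (attacker, target)
    supports: List[Tuple[str, str]],  # hypothetical format: (supporter, target)
    iterations: int = 100,
    tol: float = 1e-9,
) -> Dict[str, float]:
    """Iterate aggregation + influence until strengths stop changing."""
    strength = dict(base)
    for _ in range(iterations):
        new: Dict[str, float] = {}
        for arg in base:
            # Aggregation: supporters' strengths minus attackers' strengths.
            energy = sum(strength[s] for s, t in supports if t == arg) - sum(
                strength[s] for s, t in attacks if t == arg
            )
            # Influence: shift the initial strength by the aggregate, then clip.
            new[arg] = clip01(base[arg] + energy)
        if max(abs(new[a] - strength[a]) for a in base) < tol:
            return new
        strength = new
    return strength  # cap reached; cyclic graphs need not converge


if __name__ == "__main__":
    base = {"a": 0.5, "b": 0.4, "c": 0.3}
    print(modular_strengths(base, attacks=[("b", "a")], supports=[("c", "a")]))
```

In this toy example, `a` starts at 0.5, is supported by `c` (0.3) and attacked by `b` (0.4), and ends near 0.4. On acyclic graphs the parallel update settles after a number of rounds bounded by the graph's depth; the iteration cap only guards against cyclic inputs.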

AcademiClaw: When Students Set Challenges for AI Agents

arXiv:2605.02661v1 Announce Type: new Abstract: Benchmarks within the OpenClaw ecosystem have thus far evaluated exclusively assistant-level tasks, leaving the academic-level capabilities of OpenClaw largely unexamined. We introduce AcademiClaw, a bilingual benchmark of 80 complex, long-horizon tasks sourced directly from university students’ real academic workflows — homework, research projects, competitions, and personal projects — that they […]

First-Order Efficiency for Probabilistic Value Estimation via A Statistical Viewpoint

arXiv:2605.02827v1 Announce Type: new Abstract: Probabilistic values, including Shapley values and semivalues, provide a model-agnostic framework to attribute the behavior of a black-box model to data points or features, with a wide range of applications including explainable artificial intelligence and data valuation. However, their exact computation requires utility evaluations over exponentially many coalitions, making Monte […]
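To make the exponential cost concrete, the sketch below contrasts the exact Shapley formula, which enumerates every coalition, with the standard permutation-sampling Monte Carlo estimator, which averages marginal contributions along random player orderings. This is a common baseline, not the first-order-efficient estimator the paper proposes. The function names and the toy utility are hypothetical, and permutation sampling as written applies to Shapley values only, since other semivalues weight coalitions differently.

```python
import math
import random
from itertools import combinations
from typing import Callable, FrozenSet, List

Utility = Callable[[FrozenSet[int]], float]


def shapley_exact(n: int, utility: Utility) -> List[float]:
    """Exact Shapley values: sums over all 2^(n-1) coalitions per player."""
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size r.
            weight = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
            for subset in combinations(others, r):
                s = frozenset(subset)
                values[i] += weight * (utility(s | {i}) - utility(s))
    return values


def shapley_permutation_mc(
    n: int, utility: Utility, num_perms: int, seed: int = 0
) -> List[float]:
    """Monte Carlo estimate: average marginal contributions over random orderings."""
    rng = random.Random(seed)
    totals = [0.0] * n
    players = list(range(n))
    for _ in range(num_perms):
        rng.shuffle(players)
        coalition: set = set()
        prev = utility(frozenset(coalition))
        for p in players:
            coalition.add(p)
            cur = utility(frozenset(coalition))
            totals[p] += cur - prev  # marginal contribution of p in this ordering
            prev = cur
    return [t / num_perms for t in totals]


if __name__ == "__main__":
    # Toy "majority" utility: a coalition is worth 1 once it has at least 2 members.
    majority = lambda s: 1.0 if len(s) >= 2 else 0.0
    print(shapley_exact(3, majority))
    print(shapley_permutation_mc(3, majority, num_perms=2000))
```

In the symmetric toy game each exact value is 1/3. Each sampled ordering costs only n utility calls, and its marginal contributions always sum to v(N) - v(∅), so the estimates preserve efficiency exactly even while individual values are still noisy.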

Generative-AI and the transformation of workforce. A job postings-driven analysis

arXiv:2605.00843v1 Announce Type: cross Abstract: This paper investigates how generative artificial intelligence (AI) is reshaping job requirements, skill compositions, and sectoral dynamics across global labor markets. It examines the evolving frequency and framing of AI-related competencies in job postings, exploring whether generative AI functions primarily as an augmentative or substitutive force in the workplace. A large-scale, multi-source […]

NORA: A Harness-Engineered Autonomous Research Agent for End-to-End Spatial Data Science

arXiv:2605.02092v1 Announce Type: new Abstract: The automation of scientific research workflows has emerged as a transformative frontier in artificial intelligence, yet existing autonomous research agents remain largely domain-agnostic, lacking the specialized reasoning, method selection, and data acquisition capabilities required for rigorous spatial data science. This paper introduces NORA (Night Owl Research Agent), a harness-engineered, multi-agent […]

Planner Matters! An Efficient and Unbalanced Multi-agent Collaboration Framework for Long-horizon Planning

arXiv:2605.02168v1 Announce Type: new Abstract: Language model (LM)-based agents have demonstrated promising capabilities in automating complex tasks from natural language instructions, yet they continue to struggle with long-horizon planning and reasoning. To address this, we propose an enhanced multi-agent framework that decomposes automation into three roles: a planner for high-level decision-making, an actor for task […]

MEMAUDIT: An Exact Package-Oracle Evaluation Protocol for Budgeted Long-Term LLM Memory Writing

arXiv:2605.02199v1 Announce Type: new Abstract: Long-term LLM agents must compress streams of past interactions into persistent memory before future queries are known. Existing evaluations usually measure final question-answering accuracy, which entangles memory writing with retrieval, prompting, and reader reasoning. We introduce MEMAUDIT, an exact package-oracle evaluation protocol for budgeted long-term memory writing. A MEMAUDIT package […]


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.