RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies

arXiv:2604.09860v3 Announce Type: replace-cross Abstract: The pursuit of general-purpose robotics has yielded impressive foundation models, yet simulation-based benchmarking remains a bottleneck due to rapid performance saturation and a lack of true generalization testing. Existing benchmarks often exhibit significant domain overlap between training and evaluation, trivializing success rates and obscuring insights into robustness. We introduce RoboLab, […]

GraphFlow: An Architecture for Formally Verifiable Visual Workflows Enabling Reliable Agentic AI Automation

arXiv:2605.14968v1 Announce Type: new Abstract: GraphFlow is a visual workflow system designed to improve the reliability of agentic AI automation in multi-step, mission-critical processes. In these workflows, small errors compound rapidly: under an idealized model of independent steps, a ten-step process with 90% per-step reliability completes successfully only 35% of the time. Existing workflow platforms […]
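The 35% figure follows from multiplying per-step success probabilities under the independence assumption stated in the abstract: 0.9 raised to the tenth power is roughly 0.349. A minimal sketch of that arithmetic is below; the function name `chain_success_probability` is a hypothetical helper for illustration and is not part of GraphFlow.

```python
def chain_success_probability(per_step_reliability: float, steps: int) -> float:
    """Probability that every step succeeds, assuming steps fail independently.

    Hypothetical helper for illustration only; not taken from GraphFlow.
    """
    if not 0.0 <= per_step_reliability <= 1.0:
        raise ValueError("per_step_reliability must lie in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # Independent events: the joint success probability is the product.
    return per_step_reliability ** steps


if __name__ == "__main__":
    p = chain_success_probability(0.9, 10)
    print(f"{p:.4f}")  # prints 0.3487, i.e. about 35%
```

Under the same idealized model, raising per-step reliability has a large effect on the end-to-end rate, which is why compounding error matters so much in long workflows.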

Phylogenetic Tree Inference with Tropical Axial Attention

arXiv:2605.13894v1 Announce Type: new Abstract: In this work, we introduce a Tropical Axial Attention neural reasoning architecture that replaces vanilla softmax dot-product attention with max-plus operators, inducing a piecewise-linear structure aligned with dynamic programming formulations. From multi-species sequence alignments, our model learns all possible pairwise distances and is trained using a combination of $\ell_1$ and […]
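To make the max-plus operator concrete: in the max-plus (tropical) semiring, "multiplication" is ordinary addition and "addition" is the maximum, so a matrix product becomes a maximum over sums, which is piecewise linear in its inputs. The sketch below shows only this generic operator and one plausible way to chain it in an attention-like pattern. The function names and the attention wiring are assumptions for illustration, not the architecture described in the paper.

```python
from typing import List

Matrix = List[List[float]]


def maxplus_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Max-plus matrix product: c[i][j] = max over k of (a[i][k] + b[k][j])."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [
        [max(row[k] + b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def transpose(m: Matrix) -> Matrix:
    """Plain matrix transpose."""
    return [list(col) for col in zip(*m)]


def maxplus_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Assumed illustration: scores and aggregation both use max-plus products
    in place of the dot product and the softmax-weighted sum."""
    scores = maxplus_matmul(q, transpose(k))
    return maxplus_matmul(scores, v)


if __name__ == "__main__":
    print(maxplus_matmul([[0, 1], [2, 3]], [[1, 0], [0, 2]]))  # [[1, 3], [3, 5]]
    print(maxplus_attention([[0, 1]], [[1, 0], [0, 2]], [[5, 0], [0, 7]]))  # [[6, 10]]
```

The same max-over-sums recurrence appears in dynamic programs such as longest-path computations, which is the kind of alignment with dynamic programming the abstract refers to.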

Explainable Detection of Depression Status Shifts from User Digital Traces

arXiv:2605.14995v1 Announce Type: new Abstract: Every day, users generate digital traces (e.g., social media posts, chats, and online interactions) that are inherently timestamped and may reflect aspects of their mental state. These traces can be organized into temporal trajectories that capture how a user’s mental health signals evolve, including phases of improvement, deterioration, or stability. […]

SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring

arXiv:2604.25855v2 Announce Type: replace-cross Abstract: Multimodal large language models (MLLMs) achieve ever-stronger performance on visual-language tasks. Even as traditional visual question answering (VQA) benchmarks approach saturation, reliable deployment requires satisfying low error tolerances in real-world, out-of-distribution (OOD) scenarios. Precisely, selective prediction aims to improve coverage, i.e. the share of inputs the system answers, while adhering […]

Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image

arXiv:2605.14984v1 Announce Type: cross Abstract: Generating a street-level 3D scene from a single satellite image is a crucial yet challenging task. Current methods present a stark trade-off: geometry-colorization models achieve high geometric fidelity but are typically building-focused and lack semantic diversity. In contrast, proxy-based models use feed-forward image-to-3D frameworks to generate holistic scenes by jointly […]

Attention-Based Multimodal Survival Prediction with Cross-Modal Bilinear Fusion

arXiv:2605.13897v1 Announce Type: new Abstract: We propose a novel multimodal deep learning framework for patient-level survival prediction, which integrates whole-slide histology features, RNA-seq expression profiles, and clinical variables. Our architecture combines an ABMIL module~\cite{ilse2018attention} for slide-level representation with feedforward encoders for RNA and clinical data. These embeddings are then integrated through low-rank bilinear cross-modal fusion~\cite{liu2018efficient} […]

FutureSim: Replaying World Events to Evaluate Adaptive Agents

arXiv:2605.15188v1 Announce Type: cross Abstract: AI agents are being increasingly deployed in dynamic, open-ended environments that require adapting to new information as it arrives. To efficiently measure this capability for realistic use-cases, we propose building grounded simulations that replay real-world events in the order they occurred. We build FutureSim, where agents forecast world events beyond […]

Frequency-Space Mechanics: A Sequence and Coordinate-Free Representation for Protein Function Prediction

arXiv:2605.13899v1 Announce Type: new Abstract: Protein function prediction is dominated by representations grounded in sequence and static structure, neither of which captures the collective vibrational dynamics through which proteins act. Here we introduce frequency-space mechanics, a representational framework in which a protein is encoded as a mechanical harmonics graph (MHG): nodes are vibrational modes derived […]

Learning Developmental Scaffoldings to Guide Self-Organisation

arXiv:2605.14998v1 Announce Type: new Abstract: From subcellular structures to entire organisms, many natural systems generate complex organisation through self-organisation: local interactions that collectively give rise to global structure without any blueprint of the outcome. Yet a significant portion of the information driving such processes is not produced by self-organisation itself; instead, it is often offloaded […]

Small, Private Language Models as Teammates for Educational Assessment Design

arXiv:2605.15015v1 Announce Type: new Abstract: Generative AI increasingly supports educational design tasks, e.g., through Large Language Models (LLMs), demonstrating the capability to design assessment questions that are aligned with pedagogical frameworks (e.g., Bloom’s taxonomy). However, they often rely on subjective or limited evaluation methods; focus primarily on proprietary models; or rarely systematically examine generation, evaluation, […]

Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

arXiv:2605.03596v4 Announce Type: replace Abstract: Workspace learning requires AI agents to identify, reason over, exploit, and update explicit and implicit dependencies among heterogeneous files in a worker’s workspace, enabling them to complete both routine and advanced tasks effectively. Despite its importance, existing relevant benchmarks largely evaluate agents on pre-specified or synthesized files with limited real-world […]

