ATLAS: An Annotation Tool for Long-horizon Robotic Action Segmentation

arXiv:2604.26637v1 Announce Type: cross Abstract: Annotating long-horizon robotic demonstrations with precise temporal action boundaries is crucial for training and evaluating action segmentation and manipulation policy learning methods. Existing annotation tools, however, are often limited: they are designed primarily for vision-only data, do not natively support synchronized visualization of robot-specific time-series signals (e.g., gripper state or […]

Momentum-Conserving Graph Neural Networks for Deformable Objects

arXiv:2604.26097v1 Announce Type: cross Abstract: Graph neural networks (GNNs) have emerged as a versatile and efficient option for modeling the dynamic behavior of deformable materials. While GNNs generalize readily to arbitrary shapes, mesh topologies, and material parameters, existing architectures struggle to correctly predict the temporal evolution of key physical quantities such as linear and angular […]

reward-lens: A Mechanistic Interpretability Library for Reward Models

arXiv:2604.26130v1 Announce Type: cross Abstract: Every RLHF-trained language model is shaped by a reward model, yet the mechanistic interpretability toolkit — logit lens, direct logit attribution, activation patching, sparse autoencoders — was built for generative LLMs whose primitives all project onto a vocabulary unembedding. Reward models replace that with a scalar regression head, breaking each […]

Ceci n’est pas une explication: Evaluating Explanation Failures as Explainability Pitfalls in Language Learning Systems

arXiv:2604.26145v1 Announce Type: cross Abstract: AI-powered language learning tools increasingly provide instant, personalised feedback to millions of learners worldwide. However, this feedback can fail in ways that are difficult for learners, and even teachers, to detect, potentially reinforcing misconceptions and eroding learning outcomes over extended use. We present a portion of L2-Bench, a benchmark for evaluating AI […]

Structural Generalization on SLOG without Hand-Written Rules

arXiv:2604.26157v1 Announce Type: cross Abstract: Structural generalization in semantic parsing requires systems to apply learned compositional rules to novel structural combinations. Existing approaches either rely on hand-written algebraic rules (AM-Parser) or fail to generalize structurally (Transformer-based models). We present an alternative requiring no hand-written compositional rules, based on a neural cellular automaton (NCA) with a […]

Co-Learning Port-Hamiltonian Systems and Optimal Energy-Shaping Control

arXiv:2604.26172v1 Announce Type: cross Abstract: We develop a physics-informed learning framework for energy-shaping control of port-Hamiltonian (pH) systems from trajectory data. The proposed approach co-learns a pH system model and an optimal energy-balancing passivity-based controller (EB-PBC) through alternating optimization with policy-aware data collection. At each iteration, the system model is refined using trajectory data collected […]

Evaluating Strategic Reasoning in Forecasting Agents

arXiv:2604.26106v1 Announce Type: new Abstract: Forecasting benchmarks produce accuracy leaderboards but little insight into why some forecasters are more accurate than others. We introduce Bench to the Future 2 (BTF-2), 1,417 pastcasting questions with a frozen 15M-document research corpus in which agents reproducibly research and forecast offline, producing full reasoning traces. BTF-2 detects accuracy differences […]

Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification

arXiv:2601.15808v2 Announce Type: replace Abstract: Recent advances in Deep Research Agents (DRAs) are transforming automated knowledge discovery and problem-solving. While the majority of existing efforts focus on enhancing policy capabilities via post-training, we propose an alternative paradigm: self-evolving the agent’s ability by iteratively verifying the policy model’s outputs, guided by meticulously crafted rubrics. This approach […]

Text Style Transfer with Machine Translation for Graphic Designs

arXiv:2604.26361v1 Announce Type: cross Abstract: Globalization of graphic designs such as those used in marketing materials and magazines is increasingly important for communication to broad audiences. To accomplish this, the textual content in the graphic designs needs to be accurately translated and have the text styling preserved in order to fit visually into the design. […]

Evergreen: Efficient Claim Verification for Semantic Aggregates

arXiv:2604.26180v1 Announce Type: cross Abstract: With recent semantic query processing engines, semantic aggregation has become a primitive operator, enabling the reduction of a relation into a natural language aggregate using an LLM. However, the resulting semantic aggregate may contain claims that are not grounded in the underlying relation. Verifying such claims is challenging: they often […]

Hierarchical Multi-Persona Induction from User Behavioral Logs: Learning Evidence-Grounded and Truthful Personas

arXiv:2604.26120v1 Announce Type: new Abstract: Behavioral logs provide rich signals for user modeling, but are noisy and interleaved across diverse intents. Recent work uses LLMs to generate interpretable natural-language personas from user logs, yet evaluation often emphasizes downstream utility, providing limited assurance of persona quality itself. We propose a hierarchical framework that aggregates user actions […]

Option-Order Randomisation Reveals a Distributional Position Attractor in Prompted Sandbagging

arXiv:2604.26206v1 Announce Type: cross Abstract: A predecessor pilot (Cacioli, 2026) found that Llama-3-8B implements prompted sandbagging as positional collapse rather than answer avoidance. However, fixed option ordering in MMLU-Pro left open whether this reflected a model-level position-dominant policy or dataset-level distractor structure. This pre-registered follow-up (3 models, 2,000 MMLU-Pro items, 4 conditions, 24,000 primary trials) […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.