Open-o3-Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

arXiv:2510.20579v2 Announce Type: replace-cross Abstract: Most video reasoning models only generate textual reasoning traces without indicating when and where key evidence appears. Recent models such as OpenAI-o3 have sparked wide interest in evidence-centered reasoning for images, yet extending this ability to videos is more challenging due to the need for joint temporal tracking and spatial […]

SCISSR: Scribble-Conditioned Interactive Surgical Segmentation and Refinement

arXiv:2603.18544v1 Announce Type: cross Abstract: Accurate segmentation of tissues and instruments in surgical scenes is annotation-intensive due to irregular shapes, thin structures, specularities, and frequent occlusions. While SAM models support point, box, and mask prompts, points are often too sparse and boxes too coarse to localize such challenging targets. We present SCISSR, a scribble-promptable framework […]

Krause Synchronization Transformers

arXiv:2602.11534v2 Announce Type: replace-cross Abstract: Self-attention in Transformers relies on globally normalized softmax weights, causing all tokens to compete for influence at every layer. When composed across depth, this interaction pattern induces strong synchronization dynamics that favor convergence toward a dominant mode, a behavior associated with representation collapse and attention sink phenomena. We introduce Krause […]

Scaling Sim-to-Real Reinforcement Learning for Robot VLAs with Generative 3D Worlds

arXiv:2603.18532v1 Announce Type: cross Abstract: The strong performance of large vision-language models (VLMs) trained with reinforcement learning (RL) has motivated similar approaches for fine-tuning vision-language-action (VLA) models in robotics. Many recent works fine-tune VLAs directly in the real world to avoid addressing the sim-to-real gap. While real-world RL circumvents sim-to-real issues, it inherently limits the […]

Memento-Skills: Let Agents Design Agents

arXiv:2603.18743v1 Announce Type: new Abstract: We introduce emphMemento-Skills, a generalist, continually-learnable LLM agent system that functions as an emphagent-designing agent: it autonomously constructs, adapts, and improves task-specific agents through experience. The system is built on a memory-based reinforcement learning framework with emphstateful prompts, where reusable skills (stored as structured markdown files) serve as persistent, evolving […]

D-Mem: A Dual-Process Memory System for LLM Agents

arXiv:2603.18631v1 Announce Type: new Abstract: Driven by the development of persistent, self-adapting autonomous agents, equipping these systems with high-fidelity memory access for long-horizon reasoning has emerged as a critical requirement. However, prevalent retrieval-based memory frameworks often follow an incremental processing paradigm that continuously extracts and updates conversational memories into vector databases, relying on semantic retrieval […]

MANAR: Memory-augmented Attention with Navigational Abstract Conceptual Representation

arXiv:2603.18676v1 Announce Type: new Abstract: MANAR (Memory-augmented Attention with Navigational Abstract Conceptual Representation), contextualization layer generalizes standard multi-head attention (MHA) by instantiating the principles of Global Workspace Theory (GWT). While MHA enables unconstrained all-to-all communication, it lacks the functional bottleneck and global integration mechanisms hypothesized in cognitive models of consciousness. MANAR addresses this by implementing […]

Interplay: Training Independent Simulators for Reference-Free Conversational Recommendation

arXiv:2603.18573v1 Announce Type: new Abstract: Training conversational recommender systems (CRS) requires extensive dialogue data, which is challenging to collect at scale. To address this, researchers have used simulated user-recommender conversations. Traditional simulation approaches often utilize a single large language model (LLM) that generates entire conversations with prior knowledge of the target items, leading to scripted […]

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

arXiv:2603.18886v1 Announce Type: new Abstract: The ability to precisely derive mathematical objects is a core requirement for downstream STEM applications, including mathematics, physics, and chemistry, where reasoning must culminate in formally structured expressions. Yet, current LM evaluations of mathematical and scientific reasoning rely heavily on simplified answer formats such as numerical values or multiple choice […]

Evaluating Game Difficulty in Tetris Block Puzzle

arXiv:2603.18994v1 Announce Type: new Abstract: Tetris Block Puzzle is a single player stochastic puzzle in which a player places blocks on an 8 x 8 grid to complete lines; its popular variants have amassed tens of millions of downloads. Despite this reach, there is little principled assessment of which rule sets are more difficult. Inspired […]

DEAF: A Benchmark for Diagnostic Evaluation of Acoustic Faithfulness in Audio Language Models

arXiv:2603.18048v1 Announce Type: new Abstract: Recent Audio Multimodal Large Language Models (Audio MLLMs) demonstrate impressive performance on speech benchmarks, yet it remains unclear whether these models genuinely process acoustic signals or rely on text-based semantic inference. To systematically study this question, we introduce DEAF (Diagnostic Evaluation of Acoustic Faithfulness), a benchmark of over 2,700 conflict […]

Interplay between evolutionary and epidemic time scales challenges the outcome of control policies

arXiv:2603.18801v1 Announce Type: new Abstract: The SIR model is the cornerstone model for mathematical epidemiology, explaining key epidemic features such as the second-order transition between disease-free and epidemic states, the initial exponential growth of outbreaks or the short-term benefits of control measures. Nonetheless, the classical SIR model assumes that pathogen traits remain fixed, thus neglecting […]

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844