WIND: Weather Inverse Diffusion for Zero-Shot Atmospheric Modeling

arXiv:2602.03924v2 Announce Type: replace-cross Abstract: Deep learning has revolutionized weather forecasting, but many challenges remain, including climate modeling. Moreover, the current landscape remains fragmented: highly specialized models are typically trained individually for distinct tasks. To unify this landscape, we introduce WIND, a single pre-trained foundation model capable of replacing specialized baselines across a vast array […]

ContextFlow: Hierarchical Task-State Alignment for Long-Horizon Embodied Agents

arXiv:2605.19314v1 Announce Type: cross Abstract: Long-horizon embodied agents increasingly delegate navigation, search, approach, and manipulation to specialist executors. As these executors become stronger, the main bottleneck shifts from local skill execution to maintaining a coherent task frontier across planning, monitoring, memory, and execution. We study task-state misalignment, a task-level consistency failure in which the planner’s […]

SpecX: A Large-Scale Benchmark for Multi-Modal Spectroscopy and Cross-Paradigm Evaluation

arXiv:2605.18791v1 Announce Type: cross Abstract: Existing spectral benchmarks are limited in scale, modality alignment, and evaluation scope, and typically focus on either specialized models or multimodal language models (MLLMs). We introduce SpecX, a large-scale benchmark for multi-modal spectroscopy with cross-paradigm evaluation. SpecX contains 1.7M molecules with diverse spectral modalities, including NMR (1H, 13C, HSQC), IR, […]

VCR: Learning Valid Contextual Representation for Incomplete Wearable Signals

arXiv:2605.18837v1 Announce Type: cross Abstract: Wearable devices enable continuous health monitoring from multimodal signals, but real-world deployment is hindered by limited labeled data and pervasive sensor incompleteness. While large-scale self-supervised pretraining reduces label dependence, most existing methods assume full modality availability. Current approaches for handling modality missingness often reconstruct entire absent signals, which can encourage […]

DualView: Adaptive Local-Global Fusion for Multi-Hop Document Reranking

arXiv:2605.18767v1 Announce Type: cross Abstract: Multi-hop question answering requires aggregating information from multiple documents, a critical capability for knowledge-intensive applications. A fundamental challenge lies in efficiently identifying the minimal relevant document set from retrieved candidates while maintaining high recall. We present an efficient dual-view cascaded reranking framework for multi-hop document reranking. Operating as a lightweight […]

A Reproducibility Analysis of PO4ISR: Diagnosing and Mitigating Semantic Drift in LLM-Based Session Recommendation

arXiv:2605.18780v1 Announce Type: cross Abstract: Reasoning-based Large Language Models (LLMs) like PO4ISR have set new benchmarks in session-based recommendation. However, the reproducibility of their reasoning capabilities across diverse semantic domains remains unexplored. In this work, we conduct a rigorous reproducibility study of PO4ISR to assess its generalization limits. Our analysis reveals a critical failure mode: […]

Face morphometric profiles of groups as early markers for certain diseases?

arXiv:2605.20103v1 Announce Type: new Abstract: Background: Face morphometry has been shown to work as a diagnostic tool for a set of syndromes. Face similarities usually indicate more complete genetic similarities. Purpose: To show preliminary results on the face morphometry profile of the Cuban population and to argue that it could be used to […]

OmniGUI: Benchmarking GUI Agents in Omni-Modal Smartphone Environments

arXiv:2605.18758v1 Announce Type: cross Abstract: Current benchmarks for graphical user interface (GUI) agents predominantly rely on static screenshots. However, real-world smartphone interaction routinely requires agents to process transient audio cues and temporal video dynamics that are tightly coupled with the moment of action. To bridge this gap, we introduce OmniGUI, the first step-level benchmark designed […]

From SGD to Muon: Adaptive Optimization via Schatten-p Norms

arXiv:2605.19781v1 Announce Type: new Abstract: Modern optimizers, like Muon, impose matrix-wise geometry constraints on their updates. These matrix-wise constraints can be unified under Linear Minimization Oracle (LMO) theory. However, all current methods impose fixed LMO geometries for the update rules, chosen by design or empirically, which are not necessarily optimal for the problem’s geometry. We […]

Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains

arXiv:2605.19940v1 Announce Type: new Abstract: Foundation models are increasingly deployed in socially sensitive domains such as education, mental health, and caregiving, where failures are often cumulative and context-dependent. Existing guardrail approaches — ranging from training-time alignment to prompting, decoding constraints, and post-hoc moderation — primarily provide empirical risk reduction rather than enforceable behavioral guarantees, and […]

What’s Holding Back Latent Visual Reasoning?

arXiv:2605.18445v2 Announce Type: replace-cross Abstract: Humans can approach complex visual problems by mentally simulating intermediate visual steps, rather than reasoning through language alone. Inspired by this, several works on Vision-Language Models have recently explored chain-of-thought reasoning with continuous latent tokens as intermediate visual imagination steps. In this work, we investigate how recent models leverage such […]

EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design

arXiv:2605.19743v1 Announce Type: new Abstract: Large Language Model (LLM) agents are increasingly applied to engineering design tasks, yet existing evaluation frameworks do not adequately address multi-agent systems that combine simulation, retrieval, and manufacturing preparation. We introduce a benchmark suite with three evaluation dimensions: (1) a workflow benchmark with seven prompt styles targeting distinct cognitive demands, including […]


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK; registration number 16808844.