Land cover and flood type govern the detection limits of satellite-based flood mapping across diverse global flood events

arXiv:2606.07780v1 Announce Type: new Abstract: Floods are among the most destructive natural hazards, and their increasing frequency under climate change makes satellite-based inundation mapping essential for disaster response. Geospatial foundation models pretrained on satellite archives offer geographic transferability, but their operational reliability across diverse, unseen events remains uncharacterized. Here we deploy Prithvi-EO-2.0 across 19 out-of-distribution […]

SLMJury: Can Small Language Models Judge as Well as Large Ones?

arXiv:2606.07810v1 Announce Type: cross Abstract: Large language models (LLMs) are widely used as judges for evaluating model outputs, but their high cost, latency, and opacity limit scalability. We introduce SLMJury, a framework for evaluating small language models (SLMs) as judges across two paradigms: closed-ended binary correctness and open-ended quality scoring. We benchmark 16 SLM judges […]

MIRAGE: Mobile Agents with Implicit Reasoning and Generative World Models

arXiv:2606.04627v2 Announce Type: replace Abstract: Mobile agents are increasingly expected to operate everyday applications from screenshots and language goals, where reliable control requires reasoning over screen affordances, multi-step navigation, and future state changes. However, many agents externalize this computation as long textual chains of thought, which slows interaction, increases supervision cost, and complicates deployment. We […]

Decoupling Semantics and Logic: A Training-Free Coarse-to-Fine Pipeline for Video Retrieval-Augmented Generation

arXiv:2606.07924v1 Announce Type: cross Abstract: This paper presents our system description for the 2nd Workshop on Multimodal Augmented Generation via MultimodAl Retrieval (MAGMaR). Addressing the critical challenges of cross-lingual long-video comprehension, strict persona adherence, and zero-hallucination temporal grounding, we propose a fully training-free, two-stage cascaded Video RAG pipeline. Our architecture strategically decouples semantic retrieval from […]

Reconstructing and forecasting disease trajectories of patients with Alzheimer’s disease using routine data in resource-constrained settings

arXiv:2606.07798v1 Announce Type: new Abstract: Alzheimer’s disease is a progressive neurodegenerative disorder, and its progression varies substantially across patients. Existing work aims to forecast patients’ future cognitive state, with minimal focus on reconstructing the state from past visits. Furthermore, in current research, quantifying predictive uncertainty remains underexplored and relies on costly modalities such as MRI, […]

Semantic Quorum Assurance: Collective Certification for Non-Deterministic AI Infrastructure

arXiv:2606.08021v1 Announce Type: cross Abstract: As large language model (LLM) agents are integrated into autonomous cloud operations, distributed systems face a semantic reliability problem: proposer agents can generate production mutations, such as modifying IAM policies, opening firewall security groups, or executing data exports, that are syntactically valid and statically authorized but operationally unsafe. Classical distributed […]

Supracompetitive Pricing Under AI Monoculture

arXiv:2601.01279v3 Announce Type: replace-cross Abstract: When competing sellers delegate pricing to a shared AI model, such as a large language model, correlated recommendations combined with performance-driven updates aggregating seller feedback raise a key question: can standard AI deployment practices inadvertently produce supracompetitive pricing? We develop a stylized duopoly model in which two sellers receive pricing […]

Human-Centered Benchmarking of Driver Monitoring Models

arXiv:2606.08123v1 Announce Type: cross Abstract: Vision-based driver monitoring systems are increasingly deployed in safety-critical intelligent transportation settings, yet they are almost always compared on classification accuracy alone. This paper argues that accuracy is insufficient to characterize a model’s fitness for real-world deployment, and proposes the Human-Centered Benchmarking Framework (HCBF), which evaluates models across four dimensions: […]

Improving Multimodal Reasoning via Worst Dimension Optimization

arXiv:2606.07801v1 Announce Type: new Abstract: Multimodal reasoning requires a path that retains integrity over a wide range of constraints, from visual grounding to logical consistency. However, current Process Reward Models focus on heuristically defined rewards that weigh these factors equally, which may lead to the concealment of individual dimension failures by the dominating factors, […]

Contemporary AI lacks the imagination to diverge or negate in science

arXiv:2606.08251v1 Announce Type: cross Abstract: Bold projections that artificial intelligence will accelerate scientific discovery have raced ahead of evidence from working scientists, and the field still lacks large-scale, scientist-in-the-loop tests of these claims. Here we mount the largest such evaluation to date and map what AI cannot yet do for science. We invited authors of […]

Watts-per-Intelligence Part II: Algorithmic Catalysis

arXiv:2604.20897v2 Announce Type: replace-cross Abstract: We develop a thermodynamic theory of algorithmic catalysis within the watts-per-intelligence framework, identifying reusable computational structures that reduce irreversible operations for a task class while satisfying bounded restoration and structural selectivity constraints. We prove that any class-specific speed-up is upper-bounded by the algorithmic mutual information between the […]

Impacts of Histories and Models on LLM Grading: A Study in Advanced Software Engineering Courses

arXiv:2606.08400v1 Announce Type: cross Abstract: Graduate-level research reading report assessment creates a substantial labor burden for educators. While large language models (LLMs) hold great potential for automating academic grading, their reliability for this specialized task remains understudied, particularly regarding grading consistency, the lack of which represents a primary obstacle to educational fairness. This paper proposes […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK. Registration number 16808844.