arXiv:2603.25025v1 Announce Type: new Abstract: Autoregressive neural PDE simulators predict the evolution of physical fields one step at a time from a finite history, but low-cost context-window selection for such simulators remains an unformalized problem. Existing approaches to context-window selection in time-series forecasting include exhaustive validation, direct low-cost search, and system-theoretic memory estimation, but they […]
WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing
arXiv:2603.25226v1 Announce Type: cross Abstract: The emergence of Large Language Models (LLMs) has catalyzed a paradigm shift in programming, giving rise to “vibe coding”, where users can build complete projects and even control computers using natural language instructions. This paradigm has driven automated webpage development, but it introduces a new requirement about how to automatically […]
Sparse Visual Thought Circuits in Vision-Language Models
arXiv:2603.25075v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) improve interpretability in multimodal models, but it remains unclear whether SAE features form modular, composable units for reasoning, an assumption underlying many intervention-based steering methods. We test this modularity hypothesis and find it often fails: intervening on a task-selective feature set can modestly improve reasoning accuracy, while intervening […]
Gastric-X: A Multimodal Multi-Phase Benchmark Dataset for Advancing Vision-Language Models in Gastric Cancer Analysis
arXiv:2603.19516v2 Announce Type: replace-cross Abstract: Recent vision-language models (VLMs) have shown strong generalization and multimodal reasoning abilities in natural domains. However, their application to medical diagnosis remains limited by the lack of comprehensive and structured datasets that capture real clinical workflows. To advance the development of VLMs for clinical applications, particularly in gastric cancer, we […]
UniAI-GraphRAG: Synergizing Ontology-Guided Extraction, Multi-Dimensional Clustering, and Dual-Channel Fusion for Robust Multi-Hop Reasoning
arXiv:2603.25152v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) systems face significant challenges in complex reasoning, multi-hop queries, and domain-specific QA. While existing GraphRAG frameworks have made progress in structural knowledge organization, they still have limitations in cross-industry adaptability, community report integrity, and retrieval performance. This paper proposes UniAI-GraphRAG, an enhanced framework built upon open-source GraphRAG. […]
A Wireless World Model for AI-Native 6G Networks
arXiv:2603.25216v1 Announce Type: cross Abstract: Integrating AI into the physical layer is a cornerstone of 6G networks. However, current data-driven approaches struggle to generalize across dynamic environments because they lack an intrinsic understanding of electromagnetic wave propagation. We introduce the Wireless World Model (WWM), a multi-modal foundation framework predicting the spatiotemporal evolution of wireless channels […]
The Self-Replication Phase Diagram: Mapping Where Life Becomes Possible in Cellular Automata Rule Space
arXiv:2603.25239v1 Announce Type: new Abstract: What substrate features allow life? We exhaustively classify all 262,144 outer-totalistic binary cellular automata rules with Moore neighbourhood for self-replication and produce phase diagrams in the $(\lambda, F)$ plane, where $\lambda$ is Langton’s rule density and $F$ is a background-stability parameter. Of these rules, 20,152 (7.69%) support pattern proliferation, concentrated […]
When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Robotic Decision-Making
arXiv:2603.16673v2 Announce Type: replace-cross Abstract: Embodied robotic systems increasingly rely on large language model (LLM)-based agents to support high-level reasoning, planning, and decision-making during interactions with the environment. However, invoking LLM reasoning introduces substantial computational latency and resource overhead, which can interrupt action execution and reduce system reliability. Excessive reasoning may delay actions, while insufficient […]
A Gait Foundation Model Predicts Multi-System Health Phenotypes from 3D Skeletal Motion
arXiv:2603.25283v1 Announce Type: new Abstract: Gait is increasingly recognized as a vital sign, yet current approaches treat it as a symptom of specific pathologies rather than a systemic biomarker. We developed a gait foundation model for 3D skeletal motion from 3,414 deeply phenotyped adults, recorded via a depth camera during five motor tasks. Learned embeddings […]
Free-Lunch Long Video Generation via Layer-Adaptive O.O.D Correction
arXiv:2603.25209v1 Announce Type: cross Abstract: Generating long videos using pre-trained video diffusion models, which are typically trained on short clips, presents a significant challenge. Directly applying these models for long-video inference often leads to a notable degradation in visual quality. This paper identifies that this issue primarily stems from two out-of-distribution (O.O.D) problems: frame-level relative […]
Macroscopic Characteristics of Mixed Traffic Flow with Deep Reinforcement Learning Based Automated and Human-Driven Vehicles
arXiv:2603.25328v1 Announce Type: new Abstract: Automated Vehicle (AV) control in mixed traffic, where AVs coexist with human-driven vehicles, poses significant challenges in balancing safety, efficiency, comfort, fuel efficiency, and compliance with traffic rules while capturing heterogeneous driver behavior. Traditional car-following models, such as the Intelligent Driver Model (IDM), often struggle to generalize across diverse traffic […]
Seeking Physics in Diffusion Noise
arXiv:2603.14294v2 Announce Type: replace-cross Abstract: Do video diffusion models encode signals predictive of physical plausibility? We probe intermediate denoising representations of a pretrained Diffusion Transformer (DiT) and find that physically plausible and implausible videos are partially separable in mid-layer feature space across noise levels. This separability cannot be fully attributed to visual quality or generator […]