arXiv:2511.03724v3 Announce Type: replace Abstract: AI researchers have long focused on poker-like games as a testbed for environments characterized by multi-player dynamics, imperfect information, and reasoning under uncertainty. While recent breakthroughs have matched elite human play at no-limit Texas hold’em, the multi-player dynamics are subdued: most hands converge quickly with only two players engaged through […]
Beyond Structure: Revolutionising Materials Discovery via AI-Driven Synthesis Protocol-Property Relationships
arXiv:2605.00313v1 Announce Type: cross Abstract: The current structure-centric paradigm in artificial intelligence (AI)-driven materials discovery, despite delivering thousands of candidate structures, is stalling at a critical barrier: the synthesizability gap. We argue that closing this gap demands a pivot to a synthesis-first paradigm in which executable synthesis protocols, not just atomic configurations, are treated as […]
Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling
arXiv:2605.00412v1 Announce Type: new Abstract: World models have recently re-emerged as a central paradigm for embodied intelligence, robotics, autonomous driving, and model-based reinforcement learning. However, current world model research is often dominated by three partially separated routes: 2D video-generative models that emphasize visual future synthesis, 3D scene-centric models that emphasize spatial reconstruction, and JEPA-like latent […]
Unbox Responsible GeoAI: Navigating Climate Extreme and Disaster Mapping
arXiv:2605.00315v1 Announce Type: cross Abstract: As climate extreme and disaster events become more frequent and intense, Geospatial Artificial Intelligence (GeoAI) has emerged as a transformative approach for large-scale disaster mapping and risk reduction. However, the purely mechanical, performance-driven deployment of GeoAI models can result in amplifying inherent spatial inequalities, preventing effective emergency decision-making, and producing […]
Budget-Aware Routing for Long Clinical Text
arXiv:2605.00336v1 Announce Type: cross Abstract: A key challenge for large language models is token cost per query and overall deployment cost. Clinical inputs are long, heterogeneous, and often redundant, while downstream tasks are short and high stakes. We study budgeted context selection, where a subset of document units is chosen under a strict token budget […]
Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning
arXiv:2605.00347v1 Announce Type: cross Abstract: Given the rapidly growing capabilities of vision-language models (VLMs), extending them to interactive decision-making tasks such as video games has emerged as a promising frontier. However, existing approaches either rely on large-scale supervised fine-tuning (SFT) on human trajectories or apply reinforcement learning (RL) only in relatively short-horizon settings (typically around […]
Pedagogical Promise and Peril of AI: A Text Mining Analysis of ChatGPT Research Discussions in Programming Education
arXiv:2605.00361v1 Announce Type: cross Abstract: GenAI systems such as ChatGPT are increasingly discussed in programming education, but the ways in which the research literature conceptualizes and frames their role remain unclear. This chapter applies text mining to publications indexed in a leading academic database to map scholarly discourse on ChatGPT in programming education. Term frequency […]
GaMMA: Towards Joint Global-Temporal Music Understanding in Large Multimodal Models
arXiv:2605.00371v1 Announce Type: cross Abstract: In this paper, we propose GaMMA, a state-of-the-art (SoTA) large multimodal model (LMM) designed to achieve comprehensive musical content understanding. GaMMA inherits the streamlined encoder-decoder design of LLaVA, enabling effective cross-modal learning between music and language. By incorporating audio encoders in a mixture-of-experts manner, GaMMA effectively unifies both time-series and […]
SIMON: Saliency-aware Integrative Multi-view Object-centric Neural Decoding
arXiv:2605.00401v1 Announce Type: cross Abstract: Recent EEG-to-image retrieval methods leverage pretrained vision encoders and foveation-inspired priors, but typically assume a fixed, center-focused view. This center bias conflicts with content-driven human attention, creating a geometric-semantic dissociation between visual features and EEG responses. We propose SIMON, a saliency-aware multi-view framework for zero-shot EEG-to-image retrieval. SIMON combines foreground […]
Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines
arXiv:2605.00410v1 Announce Type: cross Abstract: A multi-agent pipeline with N agents typically issues N LLM calls per run. Merging agents into fewer calls (compound execution) promises token savings, but naively merged calls silently degrade quality through tool loss and prompt compression. We present Agent Capsules, an adaptive execution runtime that treats multi-agent pipeline execution as […]
RadLite: Multi-Task LoRA Fine-Tuning of Small Language Models for CPU-Deployable Radiology AI
arXiv:2605.00421v1 Announce Type: cross Abstract: Large language models (LLMs) show promise in radiology but their deployment is limited by computational requirements that preclude use in resource-constrained clinical environments. We investigate whether small language models (SLMs) of 3-4 billion parameters can achieve strong multi-task radiology performance through LoRA fine-tuning, enabling deployment on consumer-grade CPUs. We train […]
Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes
arXiv:2605.00424v1 Announce Type: cross Abstract: Agent skills — structured packages of instructions, scripts, and references that augment a large language model (LLM) without modifying the model itself — have moved from convenience to first-class deployment artifact. The runtime that loads them inherits the same problem package managers and operating systems have always faced: a piece […]