arXiv:2605.21902v1 Announce Type: new Abstract: Growing attention to intelligent agents has put a spotlight on one of their central capabilities: planning. Early attempts to leverage large language models (LLMs) for planning relied on single-shot plan generation, followed by hybrid approaches that coupled LLMs with limited external search. These methods, unsound and incomplete by their very […]
ACE: Self-Evolving LLM Coding Framework via Adversarial Unit Test Generation and Preference Optimization
arXiv:2605.16299v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) excel at code generation but remain heavily reliant on large-scale annotated solutions and verification-based supervision, which constrains scalability and hinders sustained self-improvement. Recent solver–verifier frameworks exploit program execution as an automatic supervision signal, but their effectiveness degrades as solvers become moderately strong: verifier-generated tests increasingly confirm […]
A Characterization of Level-k Realizability for Clustering Systems
arXiv:2605.21945v1 Announce Type: new Abstract: We give a Hasse-diagram characterization of when a clustering system $\mathcal C$ on a finite taxa set $X$ is the hardwired clustering system $C_N$ of a rooted level-$k$ network. For each non-trivial block $B$ of $H=\mathcal H[\mathcal C]$, we define a parameter $\mu(B)$ using minimum families of clusters that generate […]
Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions
arXiv:2605.22321v1 Announce Type: cross Abstract: As autonomous agents (e.g., OpenClaw) increasingly operate with deep system-level privileges to execute complex tasks, they introduce severe, unmitigated security risks. Current vulnerability analyses overwhelmingly focus on single-turn, stateless behaviors, overlooking the expanded attack surface inherent in stateful, multi-turn interactions and dynamic tool invocations. In this paper, we propose a […]
Canonical Functionalism: Defining Functional Structure without Observer-Relative Semantic Maps
arXiv:2605.21506v1 Announce Type: new Abstract: Computational functionalism about consciousness is often criticized for relying on observer-relative interpretations of physical systems. This paper proposes a mathematical refinement of functionalism that avoids this problem. The central idea is that consciousness-relevant functional organization should be identified not with arbitrary input-output mappings, semantic labels, or externally imposed computational descriptions, […]
Bernini: Latent Semantic Planning for Video Diffusion
arXiv:2605.22344v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) and diffusion models have each reached remarkable maturity: MLLMs excel at reasoning over heterogeneous multimodal inputs with strong semantic grounding, while diffusion models synthesize images and videos with photorealistic fidelity. We argue that these two families can be unified through a simple division of labor: […]
AI-Enabled Serious Games: Integrating Intelligence and Adaptivity in Training Systems
arXiv:2605.21962v1 Announce Type: new Abstract: Serious games are widely used for learning and training across domains such as healthcare, defense, and education. Persistent challenges remain, however, including static scenario design, authoring bottlenecks, limited learner modeling, and difficulty implementing meaningful real-time instructional adaptation. Recent advances in artificial intelligence (AI) introduce novel capabilities such as dynamic scenario […]
Making the Discrete Continuous: Synthetic RAW Augmentations for Fine-Grained Evaluation of Person Detection Performance in Low Light
arXiv:2605.22455v1 Announce Type: cross Abstract: Real-world deployment of AI vision models is both fueled and limited by the data available for training and testing. Real datasets are sparse and uneven: long-tailed or unbalanced distributions hinder generalization, and the low number of samples in low-density regions makes it hard to run evaluations. Synthetic data can […]
EmoTrack: Robust Depression Tracking from Counseling Transcripts across Session Regimes
arXiv:2605.22286v1 Announce Type: cross Abstract: Text-based counseling is an important interface for AI mental-health support, where transcripts may be used to monitor depression severity and flag sessions requiring timely human review. However, robust PHQ-8 prediction across session regimes remains challenging: fine-tuning-based methods can exploit richer supervision but may generalize poorly under data scarcity, while prompt-based […]
HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools
arXiv:2605.22733v1 Announce Type: new Abstract: Every Python function deployed as an LLM tool must today exist in two forms: an HTTP endpoint for human-facing clients and CI pipelines, and an MCP tool registration for agent runtimes such as Claude and Cursor. These representations share business logic yet diverge in all the surrounding machinery (routing, validation, […]
On the Wasserstein Gradient Flow Interpretation of Drifting Models
arXiv:2605.05118v2 Announce Type: replace-cross Abstract: Recently, Deng et al. (2026) proposed Generative Modeling via Drifting (GMD), a novel framework for generative tasks. This note presents an analysis of GMD through the lens of Wasserstein Gradient Flows (WGF), i.e., the path of steepest descent for a functional in the space of probability measures, equipped with the […]
MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering
arXiv:2605.22269v1 Announce Type: cross Abstract: Long streaming video QA remains challenging due to growing visual tokens and limited reasoning length of large language models (LLMs). KV-caching stores the Key-Value (KV) of the historical tokens via LLM prefill and enables more efficient streaming QA. However, existing methods cache every one or two frames, causing redundant memory […]