arXiv:2605.18094v1 Announce Type: new Abstract: We study the Compositional Geometry Routing Problem (CGRP), a unified superclass of traditional routing problems that covers point-only, line-only, area-only, and arbitrary hybrid task geometries, providing a broad abstraction for real-world routing scenarios. Beyond standard point-based routing, CGRP with non-point tasks can be inherently asymmetric, tightly coupled travel routes with […]
TRACE: Trajectory Correction from Cross-layer Evidence for Hallucination Reduction
arXiv:2605.18163v1 Announce Type: new Abstract: Hallucination correction is not a one-direction problem. We show that intermediate layers are neither uniformly more truthful than final layers nor uniformly less trustworthy. Yet hallucination reduction is usually instantiated through one fixed intervention form: contrast one layer against another, steer along a truthfulness direction, or defer to external evidence. […]
QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi
arXiv:2605.18380v1 Announce Type: new Abstract: We introduce an extensive qualitative spatial and temporal reasoning (QSTR) benchmark for evaluating large language models (LLMs). We pose questions concerning compositional reasoning (using composition tables, CT), converse relations, and conceptual neighbourhoods (CN) for QSTR calculi, Point Algebra (PA), Allen’s Interval Algebra, Interval and Duration (INDU), Region Connection Calculus (RCC-5, […]
When Outcome Looks Right But Discipline Fails: Trace-Based Evaluation Under Hidden Competitor State
arXiv:2605.18580v1 Announce Type: new Abstract: Outcome-only evaluation can certify economically unsafe agents: a policy can hit a business KPI while violating deployable behavioral discipline. In hotel pricing with hidden competitor state, a learner can achieve plausible revenue per available room while failing to preserve the rate discipline of a rule-based revenue-management competitor. We introduce discipline […]
Efficient Lookahead Encoding and Abstracted Width for Learning General Policies in Classical Planning
arXiv:2605.18674v1 Announce Type: new Abstract: Generalized planning aims to learn policies that generalize across collections of instances within a classical planning domain. Recent Graph Neural Network (GNN) approaches have learned nearly perfect policies for several domains. This work improves on the recently published idea of Iterated Width (IW) policies. Therein, the policy broadens its successor […]
Systematic Optimization of Real-Time Diffusion Model Inference on Apple M3 Ultra
arXiv:2605.16259v1 Announce Type: cross Abstract: While real-time image generation using diffusion models has advanced rapidly on NVIDIA GPUs, systematic optimization research on non-CUDA platforms such as Apple Silicon remains extremely limited. In this study, we conducted comprehensive optimization experiments across 10 phases targeting the Apple M3 Ultra (60-core GPU, 512 GB unified memory) with the […]
Generative AI and Two-Tiered Online Mental Health Communities
arXiv:2605.16279v1 Announce Type: cross Abstract: Online mental health communities (OMHCs) are tiered platforms that connect patients with licensed counselors through public Q&A forums and paid private consultations. Their two-tier structure creates a strategic dilemma for genAI integration. Conversational agents can provide scalable and timely responses to a broader set of patients, alleviating persistent supply shortages, […]
AI of the People, by the People, for the People: A Social Choice Approach to Collective Control of Artificial Intelligence
arXiv:2605.16291v1 Announce Type: cross Abstract: With the growing adoption of AI systems, reasoning about how society can exert control over AI becomes an increasingly urgent problem. Existing work on democratic control largely focuses on macro-level governance. In contrast, we propose a new approach grounded in social choice theory, which we term collective control of artificial […]
Consent Chain Degradation in Embodied Multi-Agent Systems: Bridging the Gap Between AI Agent Governance and Robot Ethics
arXiv:2605.16300v1 Announce Type: cross Abstract: Robotic systems are moving from isolated platforms to interconnected multi-agent ecosystems that operate in human environments. This shift raises a governance problem that existing frameworks do not address: how does consent propagate, degrade, and break down across chains of delegation between embodied autonomous agents? The AI ethics community has begun […]
Phase Transitions in Driven Informational Systems: A Two-Field Perspective on Learning Theory and Non-Equilibrium Chemistry
arXiv:2605.16325v1 Announce Type: cross Abstract: Phase-transition phenomena in deep learning (grokking, emergent capabilities, and ontological reorganization under context shift) have been studied through several lenses, including representational compression, singular learning theory, and information-theoretic progress measures. Independently, non-equilibrium statistical physics has identified phase transitions in driven chemical reaction networks underlying prebiotic selection, with empirical signatures that […]
Federated Nested Learning: Collaborative Training of Self-Referential Memories for Test-Time Adaptation
arXiv:2605.16350v1 Announce Type: cross Abstract: We rethink Federated Learning (FL) from a nested learning perspective, framing the core challenge as how to collaboratively learn optimization rules, not just static models, to tackle Non-IID client data. To address this, we propose Federated Nested Learning (FedNL), a novel framework that reformulates FL as a three-level nested optimization […]
ProxyKV: Cross-Model Proxy Pruning for Efficient Long-Context LLM Inference
arXiv:2605.16360v1 Announce Type: cross Abstract: Efficient long-context inference in Large Language Models (LLMs) is severely constrained by the Key-Value (KV) cache memory wall, yet existing pruning methods force a choice between low-latency heuristics that sacrifice precision and high-precision reconstruction methods that incur prohibitive prefilling overhead. To bridge this scoring-cost–accuracy gap, we propose ProxyKV, a cross-model […]