arXiv:2604.21155v1 Announce Type: new Abstract: Intrinsic motivations, i.e. behavioral incentives that are not engineered but emerge from the interaction of an agent with its surroundings, are receiving increasing attention. In this work we study the emergence of behaviors driven by one such incentive, empowerment, specifically in the context of more than one agent. We formulate […]
Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms
arXiv:2604.21131v1 Announce Type: cross Abstract: AI-agent guardrails are memoryless: each message is judged in isolation, so an adversary who spreads a single attack across dozens of sessions slips past every session-bound detector because only the aggregate carries the payload. We make three contributions to cross-session threat detection. (1) Dataset. CSTM-Bench is 26 executable attack taxonomies […]
ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models
arXiv:2509.24239v4 Announce Type: replace-cross Abstract: Recent large language models (LLMs) have shown strong reasoning capabilities. However, a critical question remains: do these models possess genuine strategic reasoning, or do they primarily excel at pattern recognition? To address this, we present ChessArena, a chess-based testbed for evaluating LLMs. Chess demands strategic reasoning, precise rule adherence, and […]
Dialect vs Demographics: Quantifying LLM Bias from Implicit Linguistic Signals vs. Explicit User Profiles
arXiv:2604.21152v1 Announce Type: cross Abstract: As state-of-the-art Large Language Models (LLMs) have become ubiquitous, ensuring equitable performance across diverse demographics is critical. However, it remains unclear whether performance disparities arise from the explicitly stated identity itself or from the way identity is signaled. In real-world interactions, users’ identity is often conveyed implicitly through a complex […]
Trust but Verify: Introducing DAVinCI — A Framework for Dual Attribution and Verification in Claim Inference for Language Models
arXiv:2604.21193v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable fluency and versatility across a wide range of NLP tasks, yet they remain prone to factual inaccuracies and hallucinations. This limitation poses significant risks in high-stakes domains such as healthcare, law, and scientific communication, where trust and verifiability are paramount. In this paper, […]
Doubly Saturated Ramsey Graphs: A Case Study in Computer-Assisted Mathematical Discovery
arXiv:2604.21187v1 Announce Type: cross Abstract: Ramsey-good graphs are graphs that contain neither a clique of size $s$ nor an independent set of size $t$. We study doubly saturated Ramsey-good graphs, defined as Ramsey-good graphs in which the addition or removal of any edge necessarily creates an $s$-clique or a $t$-independent set. We present a method […]
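The definition in this abstract can be checked by brute force on small graphs. The sketch below is only an illustration of the stated definition, not the method the paper presents (the abstract is truncated before describing it); all function names and the adjacency-dictionary representation are assumptions made for this example.

```python
from itertools import combinations

def has_clique(adj, k):
    """True if some k vertices are pairwise adjacent."""
    return any(all(v in adj[u] for u, v in combinations(group, 2))
               for group in combinations(adj, k))

def has_independent_set(adj, k):
    """True if some k vertices are pairwise non-adjacent."""
    return any(all(v not in adj[u] for u, v in combinations(group, 2))
               for group in combinations(adj, k))

def is_ramsey_good(adj, s, t):
    """No clique of size s and no independent set of size t."""
    return not has_clique(adj, s) and not has_independent_set(adj, t)

def toggled(adj, u, v):
    """Copy of the graph with edge {u, v} added if absent, removed if present."""
    new = {w: set(nbrs) for w, nbrs in adj.items()}
    if v in new[u]:
        new[u].discard(v)
        new[v].discard(u)
    else:
        new[u].add(v)
        new[v].add(u)
    return new

def is_doubly_saturated(adj, s, t):
    """Ramsey-good, and toggling any single vertex pair breaks Ramsey-goodness."""
    if not is_ramsey_good(adj, s, t):
        return False
    return all(not is_ramsey_good(toggled(adj, u, v), s, t)
               for u, v in combinations(adj, 2))

def cycle(n):
    """Cycle graph on vertices 0..n-1 as an adjacency dictionary."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

if __name__ == "__main__":
    # The 5-cycle with s = t = 3: adding any chord closes a triangle,
    # and removing any edge leaves three mutually non-adjacent vertices.
    print(is_doubly_saturated(cycle(5), 3, 3))
```

The exhaustive search over vertex subsets is exponential, so this is practical only for very small graphs.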
AI for software engineering: from probable to provable
arXiv:2511.23159v2 Announce Type: replace-cross Abstract: Vibe coding, the much-touted use of AI techniques for programming, faces two overwhelming obstacles: the difficulty of specifying goals (“prompt engineering” is a form of requirements engineering, one of the toughest disciplines of software engineering); and the hallucination phenomenon. Programs are only useful if they are correct or very close […]
Post-AGI Economies: Autonomy and the First Fundamental Theorem of Welfare Economics
arXiv:2604.21216v1 Announce Type: cross Abstract: The First Fundamental Theorem of Welfare Economics assumes that welfare-bearing agents are autonomous and implicitly relies on a binary distinction between autonomy and instrumentality. Welfare subjects are those who have autonomy and therefore the capacity to choose and enter into utility comparisons, while everything else does not. In post-AGI economies […]
Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management
arXiv:2604.21209v1 Announce Type: new Abstract: Online reviews have played a pivotal role in consumers’ decision-making processes. Existing research has highlighted the significant impact of managerial review responses on customer relationship management and firm performance. However, a large portion of online reviews remains unaddressed due to the considerable human labor required to respond to the rapid […]
CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors
arXiv:2604.21241v1 Announce Type: cross Abstract: Vision–Language–Action (VLA) models often use intermediate representations to connect multimodal inputs with continuous control, yet spatial guidance is often injected implicitly through latent features. We propose CorridorVLA, which predicts sparse spatial anchors as incremental physical changes (e.g., $\Delta$-positions) and uses them to impose an explicit tolerance region in the training […]
An Overlay Multicast Routing Method Based on Network Situational Awareness and Hierarchical Multi-Agent Reinforcement Learning
arXiv:2602.13211v2 Announce Type: replace-cross Abstract: Compared with IP multicast, Overlay Multicast (OM) offers better compatibility and flexible deployment in heterogeneous, cross-domain networks. However, traditional OM struggles to adapt to dynamic traffic due to unawareness of physical resource states, and existing reinforcement learning methods fail to decouple OM’s tightly coupled multi-objective nature, leading to high complexity, […]
Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding
arXiv:2604.21268v1 Announce Type: cross Abstract: Graphical User Interface (GUI) grounding requires mapping natural language instructions to precise pixel coordinates. However, due to visually homogeneous elements and dense layouts, models typically grasp semantic intent yet struggle with achieving precise localization. While scaling sampling attempts (Pass@k) reveals potential gains, static self-consistency strategies derived from geometric clustering often […]