arXiv:2603.28959v1 Announce Type: cross Abstract: The exploration-exploitation trade-off is central to sequential decision-making and black-box optimization, yet how Large Language Models (LLMs) reason about and manage this trade-off remains poorly understood. Unlike Bayesian Optimization, where exploration and exploitation are explicitly encoded through acquisition functions, LLM-based optimization relies on implicit, prompt-based reasoning over historical evaluations, making […]
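The contrast the abstract draws can be made concrete: in Bayesian Optimization, an acquisition function such as Upper Confidence Bound (UCB) encodes the exploration-exploitation trade-off explicitly, rather than leaving it to implicit prompt-based reasoning. A minimal sketch (the arrays and the kappa value are illustrative, not from the paper):

```python
import numpy as np

def ucb(mu, sigma, kappa=2.0):
    """Upper Confidence Bound acquisition: exploitation term (posterior
    mean mu) plus an explicit exploration bonus (kappa * posterior std)."""
    return mu + kappa * sigma

# Hypothetical posterior over three candidate points:
mu = np.array([1.0, 0.9, 0.5])      # predicted values
sigma = np.array([0.1, 0.5, 0.05])  # predictive uncertainty
scores = ucb(mu, sigma)
best = int(np.argmax(scores))  # the uncertain point 1 wins over the
                               # slightly higher-mean but certain point 0
```

Raising `kappa` shifts the balance toward exploration; an LLM-based optimizer has no such explicit dial, which is the opacity the paper investigates.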
Bethe Ansatz with a Large Language Model
arXiv:2603.29932v1 Announce Type: cross Abstract: We explore the capability of a Large Language Model (LLM) to perform specific computations in mathematical physics: the task is to compute the coordinate Bethe Ansatz solution of selected integrable spin chain models. We select three integrable Hamiltonians for which the solutions were unpublished; two of the Hamiltonians are actually […]
AI-Generated Prior Authorization Letters: Strong Clinical Content, Weak Administrative Scaffolding
arXiv:2603.29366v1 Announce Type: new Abstract: Prior authorization remains one of the most burdensome administrative processes in U.S. healthcare, consuming billions of dollars and thousands of physician hours each year. While large language models have shown promise across clinical text tasks, their ability to produce submission-ready prior authorization letters has received only limited attention, with existing […]
Design Principles for the Construction of a Benchmark Evaluating Security Operation Capabilities of Multi-agent AI Systems
arXiv:2603.28998v1 Announce Type: cross Abstract: As Large Language Models (LLMs) and multi-agent AI systems are demonstrating increasing potential in cybersecurity operations, organizations, policymakers, model providers, and researchers in the AI and cybersecurity communities are interested in quantifying the capabilities of such AI systems to achieve more autonomous SOCs (security operation centers) and reduce manual effort. […]
Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks
arXiv:2603.30016v1 Announce Type: cross Abstract: AI agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions embedded in untrusted data can trigger dangerous agent actions. This position paper discusses our vision for system-level defenses against indirect prompt injection attacks. We articulate three positions: (1) dynamic replanning and […]
Which Similarity-Sensitive Entropy (Sentropy)?
arXiv:2511.03849v4 Announce Type: replace-cross Abstract: Shannon entropy is not the only entropy that is relevant to machine-learning datasets, nor possibly even the most important one. Traditional entropies such as Shannon entropy capture information represented by elements’ frequencies but not the richer information encoded by their similarities and differences. Capturing the latter requires similarity-sensitive entropy (“sentropy”). […]
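The distinction the abstract draws can be illustrated with the Leinster–Cobbold similarity-sensitive entropy (a sketch assuming that is the "sentropy" family intended; at order q=1 it is H_Z(p) = -Σ_i p_i ln((Zp)_i), where Z is a pairwise-similarity matrix). With Z the identity, it reduces to Shannon entropy; with highly similar elements, it reports less diversity than frequencies alone suggest:

```python
import numpy as np

def sentropy(p, Z):
    """Similarity-sensitive entropy (Leinster-Cobbold, q=1).
    (Z @ p)[i] is the expected similarity of element i to a random draw;
    Z = identity recovers ordinary Shannon entropy."""
    zp = Z @ p
    return -np.sum(p * np.log(zp))

p = np.array([0.5, 0.5])
shannon = sentropy(p, np.eye(2))        # ln 2: two fully distinct elements
degenerate = sentropy(p, np.ones((2, 2)))  # 0: two identical elements
```

Shannon entropy sees the same 50/50 frequencies in both cases; only the similarity-sensitive version distinguishes them, which is the extra information the paper argues these entropies capture.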
Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead
arXiv:2603.10062v2 Announce Type: replace-cross Abstract: As LLM agents evolve into collaborative multi-agent systems, their memory requirements grow rapidly in complexity. This position paper frames multi-agent memory as a computer architecture problem. We distinguish shared and distributed memory paradigms, propose a three-layer memory hierarchy (I/O, cache, and memory), and identify two critical protocol gaps: cache sharing […]
ContractSkill: Repairable Contract-Based Skills for Multimodal Web Agents
arXiv:2603.20340v2 Announce Type: replace-cross Abstract: Self-generated skills for web agents are often unstable and can even hurt performance relative to acting directly. We argue that the key bottleneck is not only skill generation quality, but the fact that web skills remain implicit and therefore cannot be checked or locally repaired. To address this, we present […]
KEditVis: A Visual Analytics System for Knowledge Editing of Large Language Models
arXiv:2603.29689v1 Announce Type: cross Abstract: Large Language Models (LLMs) demonstrate exceptional capabilities in factual question answering, yet they sometimes provide incorrect responses. To address this issue, knowledge editing techniques have emerged as effective methods for correcting factual information in LLMs. However, typical knowledge editing workflows struggle with identifying the optimal set of model layers for […]
TTA-DAME: Test-Time Adaptation with Domain Augmentation and Model Ensemble for Dynamic Driving Conditions
arXiv:2508.12690v2 Announce Type: replace-cross Abstract: Test-time Adaptation (TTA) poses a challenge, requiring models to dynamically adapt and perform optimally on shifting target domains. This task is particularly emphasized in real-world driving scenes, where weather domain shifts occur frequently. To address such dynamic changes, our proposed method, TTA-DAME, augments source-domain data toward the target domains. […]
Evaluation of Generative Models for Emotional 3D Animation Generation in VR
arXiv:2512.16081v2 Announce Type: replace-cross Abstract: Social interactions incorporate nonverbal signals to convey emotions alongside speech, including facial expressions and body gestures. Generative models have demonstrated promising results in creating full-body nonverbal animations synchronized with speech; however, evaluations using statistical metrics in 2D settings fail to fully capture user-perceived emotions, limiting our understanding of model effectiveness. […]
AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models
arXiv:2506.09082v4 Announce Type: replace-cross Abstract: The rise of vision foundation models (VFMs) calls for systematic evaluation. A common approach pairs VFMs with large language models (LLMs) as general-purpose heads, followed by evaluation on broad Visual Question Answering (VQA) benchmarks. However, this protocol has two key blind spots: (i) the instruction tuning data may not align […]