arXiv:2603.13189v1 Announce Type: cross Abstract: Large Language Models (LLMs) can generate persuasive influence strategies that shift cooperative behavior in multi-agent populations, but a critical question remains: does the resulting cooperation reflect genuine prosocial alignment, or does it mask erosion of agent autonomy, epistemic integrity, and distributional fairness? We introduce Constitutional Multi-Agent Governance (CMAG), a two-stage […]
OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
arXiv:2509.26495v3 Announce Type: replace Abstract: Large Language Model (LLM) safety is one of the most pressing challenges for enabling wide-scale deployment. While most studies and global discussions focus on generic harms, such as models assisting users in harming themselves or others, enterprises face a more fundamental concern: whether LLM-based agents are safe for their intended […]
XSkill: Continual Learning from Experience and Skills in Multimodal Agents
arXiv:2603.12056v2 Announce Type: replace Abstract: Multimodal agents can now tackle complex reasoning tasks with diverse tools, yet they still suffer from inefficient tool use and inflexible orchestration in open-ended settings. A central challenge is enabling such agents to continually improve without parameter updates by learning from past trajectories. We identify two complementary forms of reusable […]
Towards AI Search Paradigm
arXiv:2506.17188v2 Announce Type: replace-cross Abstract: In this paper, we introduce the AI Search Paradigm, a comprehensive blueprint for next-generation search systems capable of emulating human information processing and decision-making. The paradigm employs a modular architecture of four LLM-powered agents (Master, Planner, Executor and Writer) that dynamically adapt to the full spectrum of information needs, from […]
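The four-agent architecture named in the abstract (Master, Planner, Executor, Writer) can be sketched as a simple routing pipeline. This is a hypothetical illustration of such a modular design, not the paper's actual API: all class and method names (`Master.answer`, `Planner.plan`, etc.) and the stubbed tool calls are assumptions.

```python
# Hypothetical sketch of a Master -> Planner -> Executor -> Writer loop.
# Names and behaviors are illustrative assumptions, not the paper's code.

class Planner:
    def plan(self, query: str) -> list[str]:
        # Decompose the information need into sub-tasks.
        return [f"search: {query}", f"summarize: {query}"]

class Executor:
    def execute(self, task: str) -> str:
        # Stand-in for tool calls (retrieval, calculators, ...).
        return f"result of ({task})"

class Writer:
    def write(self, query: str, results: list[str]) -> str:
        # Compose the final answer from the intermediate results.
        return f"Answer to '{query}' based on {len(results)} results."

class Master:
    """Routes a query through Planner -> Executor -> Writer."""
    def __init__(self):
        self.planner, self.executor, self.writer = Planner(), Executor(), Writer()

    def answer(self, query: str) -> str:
        tasks = self.planner.plan(query)
        results = [self.executor.execute(t) for t in tasks]
        return self.writer.write(query, results)

print(Master().answer("What is the AI Search Paradigm?"))
```

In a real system the Master would presumably also decide, per query, whether planning and tool execution are needed at all, which is the "dynamic adaptation" the abstract refers to.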
NeuCo-Bench: A Novel Benchmark Framework for Neural Embeddings in Earth Observation
arXiv:2510.17914v2 Announce Type: replace-cross Abstract: We introduce NeuCo-Bench, a novel benchmark framework for evaluating (lossy) neural compression and representation learning in the context of Earth Observation (EO). Our approach builds on fixed-size embeddings that act as compact, task-agnostic representations applicable to a broad range of downstream tasks. NeuCo-Bench comprises three components: (i) an evaluation pipeline […]
DeCode: Decoupling Content and Delivery for Medical QA
arXiv:2601.02123v3 Announce Type: replace-cross Abstract: Large language models (LLMs) exhibit strong medical knowledge and can generate factually accurate responses. However, existing models often fail to account for individual patient contexts, producing answers that are clinically correct yet poorly aligned with patients’ needs. In this work, we introduce DeCode (Decoupling Content and Delivery), a training-free, model-agnostic […]
Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models
arXiv:2603.05773v2 Announce Type: replace-cross Abstract: Safety alignment is often conceptualized as a monolithic process wherein harmfulness detection automatically triggers refusal. However, the persistence of jailbreak attacks suggests a fundamental mechanistic decoupling. We propose the Disentangled Safety Hypothesis (DSH), positing that safety computation operates on two distinct subspaces: a Recognition Axis ($\mathbf{v}_H$, "Knowing") and an Execution […]
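The two-subspace picture can be illustrated by projecting a hidden state onto two orthogonal directions. A minimal sketch, assuming random unit vectors stand in for the learned axes: the Recognition Axis $\mathbf{v}_H$ comes from the abstract, while the execution direction `v_E` and the idea of scoring a state by two dot products are illustrative assumptions.

```python
# Minimal sketch of the two-subspace view: a hidden state is scored
# separately along a Recognition Axis v_H ("knowing the input is harmful")
# and an Execution axis v_E ("acting on it, e.g. refusing").
# v_E and the example state h are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 16
v_H = rng.standard_normal(d)
v_H /= np.linalg.norm(v_H)
# Make the execution direction orthogonal to the recognition direction.
v_E = rng.standard_normal(d)
v_E -= (v_E @ v_H) * v_H
v_E /= np.linalg.norm(v_E)

def safety_scores(h: np.ndarray) -> tuple[float, float]:
    """Project a hidden state onto the two disentangled safety axes."""
    return float(h @ v_H), float(h @ v_E)

# A state can score high on recognition ("knowing") yet low on
# execution ("acting") -- the decoupling a jailbreak would exploit.
h = 3.0 * v_H + 0.1 * v_E
rec, exe = safety_scores(h)
print(f"recognition={rec:.2f}, execution={exe:.2f}")
```

Because the two directions are orthonormal here, the example state scores 3.0 on recognition but only 0.1 on execution, mirroring the "knowing without acting" failure mode in the title.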
Team RAS in 10th ABAW Competition: Multimodal Valence and Arousal Estimation Approach
arXiv:2603.13056v1 Announce Type: cross Abstract: Continuous emotion recognition in terms of valence and arousal under in-the-wild (ITW) conditions remains a challenging problem due to large variations in appearance, head pose, illumination, occlusions, and subject-specific patterns of affective expression. We present a multimodal method for valence-arousal estimation under ITW conditions. Our method combines three complementary modalities: face, behavior, […]
ESG-Bench: Benchmarking Long-Context ESG Reports for Hallucination Mitigation
arXiv:2603.13154v1 Announce Type: cross Abstract: As corporate responsibility increasingly incorporates environmental, social, and governance (ESG) criteria, ESG reporting is becoming a legal requirement in many regions and a key channel for documenting sustainability practices and assessing firms’ long-term and ethical performance. However, the length and complexity of ESG disclosures make them difficult to interpret and […]
Active Causal Structure Learning with Latent Variables: Towards Learning to Detour in Autonomous Robots
arXiv:2410.20894v2 Announce Type: replace Abstract: Artificial General Intelligence (AGI) agents and robots must be able to cope with ever-changing environments and tasks. They must be able to actively construct new internal causal models of their interactions with the environment when new structural changes take place in it. Thus, we claim that active causal structure […]
CRAFT-GUI: Curriculum-Reinforced Agent For GUI Tasks
arXiv:2508.11360v2 Announce Type: replace Abstract: As autonomous agents become adept at understanding and interacting with graphical user interface (GUI) environments, a new era of automated task execution is emerging. Recent studies have demonstrated that Reinforcement Learning (RL) can effectively enhance agents’ performance in dynamic interactive GUI environments. However, these methods face two key limitations: (1) […]
Do LLMs Share Human-Like Biases? Causal Reasoning Under Prior Knowledge, Irrelevant Context, and Varying Compute Budgets
arXiv:2602.02983v2 Announce Type: replace Abstract: Large language models (LLMs) are increasingly used in domains where causal reasoning matters, yet it remains unclear whether their judgments reflect normative causal computation, human-like shortcuts, or brittle pattern matching. We benchmark 20+ LLMs against a matched human baseline on 11 causal judgment tasks formalized by a collider structure ($C_1 […]