arXiv:2508.13998v2 Announce Type: replace-cross Abstract: Generalization in embodied AI is hindered by the “seeing-to-doing gap,” which stems from data scarcity and embodiment heterogeneity. To address this, we pioneer “pointing” as a unified, embodiment-agnostic intermediate representation, defining four core embodied pointing abilities that bridge high-level vision-language comprehension with low-level action primitives. We introduce Embodied-R1, a 3B […]
SkillX: Automatically Constructing Skill Knowledge Bases for Agents
arXiv:2604.04804v1 Announce Type: cross Abstract: Learning from experience is critical for building capable large language model (LLM) agents, yet prevailing self-evolving paradigms remain inefficient: agents learn in isolation and repeatedly rediscover similar behaviors from limited experience, resulting in redundant exploration and poor generalization. To address this problem, we propose SkillX, a fully automated framework for constructing […]
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges
arXiv:2510.23883v3 Announce Type: replace Abstract: Agentic AI systems, powered by large language models (LLMs) and endowed with planning, tool use, memory, and autonomy, are emerging as powerful, flexible platforms for automation. Their ability to autonomously execute tasks across web, software, and physical environments creates new and amplified security risks, distinct from both traditional AI safety […]
A Persistent Homology Design Space for 3D Point Cloud Deep Learning
arXiv:2604.04299v1 Announce Type: cross Abstract: Persistent Homology (PH) offers stable, multi-scale descriptors of intrinsic shape structure by capturing connected components, loops, and voids that persist across scales, providing invariants that complement purely geometric representations of 3D data. Yet, despite strong theoretical guarantees and increasing empirical adoption, its integration into deep learning for point clouds remains […]
StableTTA: Training-Free Test-Time Adaptation that Improves Model Accuracy on ImageNet1K to 96%
arXiv:2604.04552v1 Announce Type: cross Abstract: Ensemble methods are widely used to improve predictive performance, but their effectiveness often comes at the cost of increased memory usage and computational complexity. In this paper, we identify a conflict in aggregation strategies that negatively impacts prediction stability. We propose StableTTA, a training-free method to improve aggregation stability and […]
Commercial Persuasion in AI-Mediated Conversations
arXiv:2604.04263v1 Announce Type: cross Abstract: As Large Language Models (LLMs) become a primary interface between users and the web, companies face growing economic incentives to embed commercial influence into AI-mediated conversations. We present two preregistered experiments (N = 2,012) in which participants selected a book to receive from a large eBook catalog using either a […]
Teaching Machine Learning Fundamentals with LEGO Robotics
arXiv:2601.19376v2 Announce Type: replace-cross Abstract: This paper presents the web-based platform Machine Learning with Bricks and an accompanying two-day course designed to teach machine learning concepts to students aged 12 to 17 through programming-free robotics activities. Machine Learning with Bricks is an open-source platform that combines interactive visualizations with LEGO robotics to teach three […]
IC3-Evolve: Proof-/Witness-Gated Offline LLM-Driven Heuristic Evolution for IC3 Hardware Model Checking
arXiv:2604.03232v1 Announce Type: new Abstract: IC3, also known as property-directed reachability (PDR), is a commonly used algorithm for hardware safety model checking. It checks whether a state transition system complies with a given safety property. IC3 either returns UNSAFE (indicating property violation) with a counterexample trace, or SAFE with a checkable inductive invariant as the proof […]
Beyond Message Passing: A Semantic View of Agent Communication Protocols
arXiv:2604.02369v2 Announce Type: replace-cross Abstract: Agent communication protocols are becoming critical infrastructure for large language model (LLM) systems that must use tools, coordinate with other agents, and operate across heterogeneous environments. This work presents a human-inspired perspective on this emerging landscape by organizing agent communication into three layers: communication, syntactic, and semantic. Under this framework, […]
Good Rankings, Wrong Probabilities: A Calibration Audit of Multimodal Cancer Survival Models
arXiv:2604.04239v1 Announce Type: cross Abstract: Multimodal deep learning models that fuse whole-slide histopathology images with genomic data have achieved strong discriminative performance for cancer survival prediction, as measured by the concordance index. Yet whether the survival probabilities derived from these models – either directly from native outputs or via standard post-hoc reconstruction – are calibrated […]
Task-Centric Personalized Federated Fine-Tuning of Language Models
arXiv:2604.00050v2 Announce Type: replace-cross Abstract: Federated Learning (FL) has emerged as a promising technique for training language models on distributed and private datasets of diverse tasks. However, aggregating models trained on heterogeneous tasks often degrades the overall performance of individual clients. To address this issue, Personalized FL (pFL) aims to create models tailored for each […]
SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios
arXiv:2512.18470v5 Announce Type: replace-cross Abstract: Existing benchmarks for AI coding agents focus on isolated, single-issue tasks such as fixing a bug or adding a small feature. However, real-world software engineering is a long-horizon endeavor: developers interpret high-level requirements, coordinate changes across many files, and evolve codebases over multiple iterations while preserving functionality. We introduce SWE-EVO, […]