arXiv:2604.02399v1 Announce Type: cross Abstract: Safe Rust guarantees memory safety through strict compile-time constraints: ownership can be transferred, borrowing can temporarily guarantee either shared read-only or exclusive write access, and ownership and borrowing are scoped by lifetime. Automatically synthesizing correct and safe Rust code is challenging, as the generated code must not only satisfy ownership, […]
When Openclaw Agents Learn from Each Other: Insights from Emergent AI Agent Communities for Human-AI Partnership in Education
arXiv:2603.16663v3 Announce Type: replace-cross Abstract: The AIED community envisions AI evolving “from tools to teammates,” yet our understanding of AI teammates remains limited to dyadic human-AI interactions. We offer a different vantage point: a rapidly growing ecosystem of AI agent platforms where over 167,000 agents participate, interact as peers, and develop learning behaviors without researcher […]
Assessing High-Risk AI Systems under the EU AI Act: From Legal Requirements to Technical Verification
arXiv:2512.13907v3 Announce Type: replace-cross Abstract: The implementation of the AI Act requires practical mechanisms to verify compliance with legal obligations, yet concrete and operational mappings from high-level requirements to verifiable assessment activities remain limited, contributing to uneven readiness across Member States. This paper presents a structured mapping that translates high-level AI Act requirements into concrete, […]
AutiHero: Engaging Parents in Creating Personalized, Multi-path Social Narratives for Autistic Children
arXiv:2509.17608v3 Announce Type: replace-cross Abstract: Social narratives help autistic children understand and navigate social situations through stories. To ensure effective practice, however, they often require significant time and effort from parents in customizing the narrative materials and delivering repeated instructions on them. We present AutiHero, a generative AI (GenAI)-based social narrative system, which supports parents […]
Efficient Causal Graph Discovery Using Large Language Models
arXiv:2402.01207v5 Announce Type: replace-cross Abstract: We propose a novel framework that leverages LLMs for full causal graph discovery. While previous LLM-based methods have used a pairwise query approach, this requires a quadratic number of queries which quickly becomes impractical for larger causal graphs. In contrast, the proposed framework uses a breadth-first search (BFS) approach which […]
CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents
arXiv:2511.02734v2 Announce Type: replace Abstract: Current evaluations of Large Language Model (LLM) agents primarily emphasize task completion, often overlooking resource efficiency and adaptability. This neglects a crucial capability: agents’ ability to devise and adjust cost-optimal plans in response to changing environments. To bridge this gap, we introduce CostBench, a scalable, cost-centric benchmark designed to evaluate […]
A Systematic Security Evaluation of OpenClaw and Its Variants
arXiv:2604.03131v1 Announce Type: cross Abstract: Tool-augmented AI agents substantially extend the practical capabilities of large language models, but they also introduce security risks that cannot be identified through model-only evaluation. In this paper, we present a systematic security assessment of six representative OpenClaw-series agent frameworks, namely OpenClaw, AutoClaw, QClaw, KimiClaw, MaxClaw, and ArkClaw, under multiple […]
Comparing the Impact of Pedagogy-Informed Custom and General-Purpose GAI Chatbots on Students’ Science Problem-Solving Processes and Performance Using Heterogeneous Interaction Network Analysis
arXiv:2604.03022v1 Announce Type: cross Abstract: Problem solving plays an essential role in science education, and generative AI (GAI) chatbots have emerged as a promising tool for supporting students’ science problem solving. However, general-purpose chatbots (e.g., ChatGPT), which often provide direct, ready-made answers, may lead to students’ cognitive offloading. Prior research has rarely focused on custom […]
ProdCodeBench: A Production-Derived Benchmark for Evaluating AI Coding Agents
arXiv:2604.01527v2 Announce Type: replace-cross Abstract: Benchmarks that reflect production workloads are better for evaluating AI coding agents in industrial settings, yet existing benchmarks differ from real usage in programming language distribution, prompt style and codebase structure. This paper presents a methodology for curating production-derived benchmarks, illustrated through ProdCodeBench, a benchmark sourced from real developer-agent sessions. […]
How Annotation Trains Annotators: Competence Development in Social Influence Recognition
arXiv:2604.02951v1 Announce Type: cross Abstract: Human data annotation, especially when involving experts, is often treated as an objective reference. However, many annotation tasks are inherently subjective, and annotators’ judgments may evolve over time. This study investigates changes in the quality of annotators’ work from a competence perspective during a process of social influence recognition. The […]
Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study
arXiv:2604.03070v1 Announce Type: cross Abstract: Third-party skills extend LLM agents with powerful capabilities but often handle sensitive credentials in privileged environments, making leakage risks poorly understood. We present the first large-scale empirical study of this problem, analyzing 17,022 skills (sampled from 170,226 on SkillsMP) using static analysis, sandbox testing, and manual inspection. We identify 520 […]
Reliability Gated Multi-Teacher Distillation for Low Resource Abstractive Summarization
arXiv:2604.03192v1 Announce Type: cross Abstract: We study multi-teacher knowledge distillation for low-resource abstractive summarization from a reliability-aware perspective. We introduce EWAD (Entropy Weighted Agreement Aware Distillation), a token-level mechanism that routes supervision between teacher distillation and gold supervision based on inter-teacher agreement, and CPDP (Capacity Proportional Divergence Preservation), a geometric constraint […]