arXiv:2604.06448v1 Announce Type: cross Abstract: Prime Video regularly conducts load tests to simulate the viewer traffic spikes seen during live events such as Thursday Night Football as well as video-on-demand (VOD) events such as Rings of Power. While these stress tests validate system capacity, they can sometimes miss service behaviors unique to real event traffic. […]
A Study of LLMs’ Preferences for Libraries and Programming Languages
arXiv:2503.17181v3 Announce Type: replace-cross Abstract: Despite the rapid progress of large language models (LLMs) in code generation, existing evaluations focus on functional correctness or syntactic validity, overlooking how LLMs make critical design choices such as which library or programming language to use. To fill this gap, we perform the first empirical study of LLMs’ preferences […]
Inference-Time Code Selection via Symbolic Equivalence Partitioning
arXiv:2604.06485v1 Announce Type: cross Abstract: “Best-of-N” selection is a popular inference-time scaling method for code generation using Large Language Models (LLMs). However, to reliably identify correct solutions, existing methods often depend on expensive or stochastic external verifiers. In this paper, we propose Symbolic Equivalence Partitioning, a selection framework that uses symbolic execution to group candidate […]
Reasoning Fails Where Step Flow Breaks
arXiv:2604.06695v1 Announce Type: new Abstract: Large reasoning models (LRMs) that generate long chains of thought now perform well on multi-step math, science, and coding tasks. However, their behavior is still unstable and hard to interpret, and existing analysis tools struggle with such long, structured reasoning traces. We introduce Step-Saliency, which pools attention–gradient scores into step-to-step […]
Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees
arXiv:2604.06515v1 Announce Type: cross Abstract: Sparse Mixture-of-Experts (MoE) allows scaling of language and vision models efficiently by activating only a small subset of experts per input. While this reduces computation, the large number of parameters still incurs substantial memory overhead during inference. Post-training quantization has been explored to address this issue. Because uniform quantization suffers […]
Leveraging Wireless Sensor Networks for Real-Time Monitoring and Control of Industrial Environments
arXiv:2510.13820v3 Announce Type: replace-cross Abstract: This research proposes an extensive technique for monitoring and controlling the industrial parameters using Internet of Things (IoT) technology based on wireless communication. We proposed a system based on NRF transceivers to establish a strong Wireless Sensor Network (WSN), enabling transfer of real-time data from multiple sensors to a central […]
SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills
arXiv:2604.06550v1 Announce Type: cross Abstract: OpenClaw’s ClawHub marketplace hosts over 13,000 community-contributed agent skills, and between 13% and 26% of them contain security vulnerabilities according to recent audits. Regex scanners miss obfuscated payloads; formal static analyzers cannot read the natural language instructions in SKILL.md files where prompt injection and social engineering attacks hide. Neither approach […]
AgentGate: A Lightweight Structured Routing Engine for the Internet of Agents
arXiv:2604.06696v1 Announce Type: new Abstract: The rapid development of AI agent systems is leading to an emerging Internet of Agents, where specialized agents operate across local devices, edge nodes, private services, and cloud platforms. Although recent efforts have improved agent naming, discovery, and interaction, efficient request dispatch remains an open systems problem under latency, privacy, […]
Scientific Knowledge-driven Decoding Constraints Improving the Reliability of LLMs
arXiv:2604.06603v1 Announce Type: cross Abstract: Large language models (LLMs) have shown strong knowledge reserves and task-solving capabilities, but still face the challenge of severe hallucination, hindering their practical application. Though scientific theories and rules can efficiently direct the behaviors of human manipulators, LLMs still do not utilize these highly-condensed knowledge sufficiently through training or prompting. […]
Adaptive Replay Buffer for Offline-to-Online Reinforcement Learning
arXiv:2512.10510v2 Announce Type: replace-cross Abstract: Offline-to-Online Reinforcement Learning (O2O RL) faces a critical dilemma in balancing the use of a fixed offline dataset with newly collected online experiences. Standard methods, often relying on a fixed data-mixing ratio, struggle to manage the trade-off between early learning stability and asymptotic performance. To overcome this, we introduce the […]
Logical Robots: Declarative Multi-Agent Programming in Logica
arXiv:2604.06629v1 Announce Type: cross Abstract: We present Logical Robots, an interactive multi-agent simulation platform where autonomous robot behavior is specified declaratively in the logic programming language Logica. Robot behavior is defined by logical predicates that map observations from simulated radar arrays and shared memory to desired motor outputs. This approach allows low-level reactive control and […]
ATANT: An Evaluation Framework for AI Continuity
arXiv:2604.06710v1 Announce Type: new Abstract: We present ATANT (Automated Test for Acceptance of Narrative Truth), an open evaluation framework for measuring continuity in AI systems: the ability to persist, update, disambiguate, and reconstruct meaningful context across time. While the AI industry has produced memory components (RAG pipelines, vector databases, long context windows, profile layers), no […]