arXiv:2604.14556v1 Announce Type: cross Abstract: Video object insertion is a critical task for dynamically inserting new objects into existing environments. Previous video generation methods focus primarily on synthesizing entire scenes while struggling to ensure consistent object appearance, spatial alignment, and temporal coherence when inserting objects into existing videos. In this paper, we propose a novel […]
Don’t Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG
arXiv:2604.14572v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) grounds LLM responses in external evidence but treats the model as a passive consumer of search results: it never sees how the corpus is organized or what it has not yet retrieved, limiting its ability to backtrack or combine scattered evidence. We present Corpus2Skill, which distills a […]
From Feelings to Metrics: Understanding and Formalizing How Users Vibe-Test LLMs
arXiv:2604.14137v2 Announce Type: replace-cross Abstract: Evaluating LLMs is challenging, as benchmark scores often fail to capture models’ real-world usefulness. Instead, users often rely on “vibe-testing”: informal experience-based evaluation, such as comparing models on coding tasks related to their own workflow. While prevalent, vibe-testing is often too ad hoc and unstructured to analyze or reproduce at […]
NuHF Claw: A Risk Constrained Cognitive Agent Framework for Human Centered Procedure Support in Digital Nuclear Control Rooms
arXiv:2604.14160v1 Announce Type: new Abstract: The rapid digitization of nuclear power plant main control rooms has fundamentally reshaped operator interaction patterns, introducing complex soft-control behaviors and elevated cognitive risks that are not adequately addressed by existing human reliability analysis approaches. Although recent advances in large language models and autonomous agents offer new opportunities for intelligent […]
Robust Evaluation of Neural Encoding Models via ground-truth approximation
arXiv:2604.14694v1 Announce Type: new Abstract: Encoding models enable measurement of how our brains represent sensory inputs using electro- and magneto-encephalography (MEEG). Evaluating how closely encoding models reflect the underlying brain functions is a crucial premise for model interpretation and hypothesis testing. However, the ground-truth neural activity is unknown, preventing model evaluation with respect to the target […]
VeriGraphi: A Multi-Agent Framework of Hierarchical RTL Generation for Large Hardware Designs
arXiv:2604.14550v1 Announce Type: cross Abstract: Generating synthesizable Verilog for large, hierarchical hardware designs remains a significant challenge for large language models (LLMs), which struggle to replicate the structured reasoning that human experts employ when translating complex specifications into RTL. When tasked with producing hierarchical Verilog, LLMs frequently lose context across modules, hallucinate interfaces, fabricate inter-module […]
Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents
arXiv:2604.14717v1 Announce Type: new Abstract: Persistent language-model agents increasingly combine tool use, tiered memory, reflective prompting, and runtime adaptation. In such systems, behavior is shaped not only by current prompts but by mutable internal conditions that influence future action. This paper introduces layered mutability, a framework for reasoning about that process across five layers: pretraining, […]
A Comparative Study of CNN Optimization Methods for Edge AI: Exploring the Role of Early Exits
arXiv:2604.14789v1 Announce Type: new Abstract: Deploying deep neural networks on edge devices requires balancing accuracy, latency, and resource constraints under realistic execution conditions. To fit models within these constraints, two broad strategies have emerged: static compression techniques such as pruning and quantization, which permanently reduce model size, and dynamic approaches such as early-exit mechanisms, which […]
CSRA: Controlled Spectral Residual Augmentation for Robust Sepsis Prediction
arXiv:2604.14532v1 Announce Type: cross Abstract: Accurate prediction of future risk and disease progression in sepsis is clinically important for early warning and timely intervention in intensive care. However, short-window sepsis prediction remains challenging, because shorter observation windows provide limited historical evidence, whereas longer prediction horizons reduce the number of patient trajectories with valid future supervision. […]
The Missing Knowledge Layer in AI: A Framework for Stable Human-AI Reasoning
arXiv:2604.14881v1 Announce Type: new Abstract: Large language models are increasingly integrated into decision-making in areas such as healthcare, law, finance, engineering, and government. Yet they share a critical limitation: they produce fluent outputs even when their internal reasoning has drifted. A confident answer can conceal uncertainty, speculation, or inconsistency, and small changes in phrasing can […]
Critical-CoT: A Robust Defense Framework against Reasoning-Level Backdoor Attacks in Large Language Models
arXiv:2604.10681v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs), despite their impressive capabilities across domains, have been shown to be vulnerable to backdoor attacks. Prior backdoor strategies predominantly operate at the token level, where an injected trigger causes the model to generate a specific target word, choice, or class (depending on the task). Recent advances, […]
Discovering Novel LLM Experts via Task-Capability Coevolution
arXiv:2604.14969v1 Announce Type: new Abstract: Frontier model developers aim to train models continually to possess emergent, diverse capabilities. To extend capabilities, the current pre-training and post-training paradigm requires manually starting training runs with static datasets or reward functions every time. Addressing this limitation, our work pursues the insight that open-endedness (via the coevolution of models […]