arXiv:2603.14833v1 Announce Type: cross Abstract: Multi-stream transformer architectures have recently been proposed as a promising direction for managing representation collapse and the vanishing gradient problem in residual connections, yet their internal mechanisms remain unexplored. In particular, the recently introduced Manifold-Constrained Hyper-Connections (mHC) architecture posits multiple residual streams with constrained interaction, but lacks in-depth mechanistic analysis. […]
A Loss Landscape Visualization Framework for Interpreting Reinforcement Learning: An ADHDP Case Study
arXiv:2603.14600v1 Announce Type: cross Abstract: Reinforcement learning algorithms have been widely used in dynamic and control systems. However, interpreting their internal learning behavior remains a challenge. In the authors’ previous work, a critic match loss landscape visualization method was proposed to study critic training. This study extends that method into a framework which provides a […]
DOVA: Deliberation-First Multi-Agent Orchestration for Autonomous Research Automation
arXiv:2603.13327v1 Announce Type: new Abstract: Large language model (LLM) agents have demonstrated remarkable capabilities in tool use, reasoning, and code generation, yet single-agent systems exhibit fundamental limitations when confronted with complex research tasks demanding multi-source synthesis, adversarial verification, and personalized delivery. We present DOVA (Deep Orchestrated Versatile Agent), a multi-agent platform introducing three key innovations: […]
A mechanical bifurcation constrains the evolution of cell sheet folding in the family Volvocaceae
arXiv:2603.15171v1 Announce Type: cross Abstract: The processes of morphogenesis that give rise to the shapes of organs and organisms during development are often driven by mechanical instabilities. Can such mechanical bifurcations also drive or constrain the evolution of these processes in the first place? We discover an instance of these constraints in the green algae […]
VisualLeakBench: Auditing the Fragility of Large Vision-Language Models against PII Leakage and Social Engineering
arXiv:2603.13385v1 Announce Type: cross Abstract: As Large Vision-Language Models (LVLMs) are increasingly deployed in agent-integrated workflows and other deployment-relevant settings, their robustness against semantic visual attacks remains under-evaluated; alignment is typically tested on explicit harmful content rather than on privacy-critical multimodal scenarios. We introduce VisualLeakBench, an evaluation suite to audit LVLMs against OCR Injection and […]
Why Grokking Takes So Long: A First-Principles Theory of Representational Phase Transitions
arXiv:2603.13331v1 Announce Type: new Abstract: Grokking is the sudden generalization that appears long after a model has perfectly memorized its training data. Although this phenomenon has been widely observed, there is still no quantitative theory explaining the length of the delay between memorization and generalization. Prior work has noted that weight decay plays an important […]
GPrune-LLM: Generalization-Aware Structured Pruning for Large Language Models
arXiv:2603.13418v1 Announce Type: cross Abstract: Structured pruning is widely used to compress large language models (LLMs), yet its effectiveness depends heavily on neuron importance estimation. Most existing methods estimate neuron importance from activation statistics on a single calibration dataset, which introduces calibration bias and degrades downstream cross-task generalization. We observe that neurons exhibit heterogeneous distribution […]
CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents
arXiv:2603.15421v1 Announce Type: cross Abstract: Large language model agents rely heavily on external memory to support knowledge reuse and complex reasoning tasks. Yet most memory systems store experiences in a single global retrieval pool, which can gradually dilute or corrupt stored knowledge. This problem is especially pronounced for small language models (SLMs), which are highly […]
CHIMERA-Bench: A Benchmark Dataset for Epitope-Specific Antibody Design
arXiv:2603.13431v1 Announce Type: cross Abstract: Computational antibody design has seen rapid methodological progress, with dozens of deep generative methods proposed in the past three years, yet the field lacks a standardized benchmark for fair comparison and model development. These methods are evaluated on different SAbDab snapshots, non-overlapping test sets, and incompatible metrics, and the literature […]
DyACE: Dynamic Algorithm Co-evolution for Online Automated Heuristic Design with Large Language Model
arXiv:2603.13344v1 Announce Type: new Abstract: The prevailing paradigm in Automated Heuristic Design (AHD) typically relies on the assumption that a single, fixed algorithm can effectively navigate the shifting dynamics of a combinatorial search. This static approach often proves inadequate for Perturbative Heuristics, where the optimal algorithm for escaping local optima depends heavily on the specific […]
MGMAR: Metal-Guided Metal Artifact Reduction for X-ray Computed Tomography
arXiv:2603.13447v1 Announce Type: cross Abstract: In X-ray computed tomography (CT), metal artifact reduction (MAR) remains a major challenge because metallic implants violate standard CT forward-model assumptions, producing severe streaking and shadowing artifacts that degrade diagnostic quality. We propose MGMAR, a metal-guided MAR method that explicitly leverages metal-related information throughout the reconstruction pipeline. MGMAR first generates […]
Benchmarking LLM-based agents for single-cell omics analysis
arXiv:2508.13201v3 Announce Type: replace Abstract: Background: The surge in single-cell omics data exposes limitations in traditional, manually defined analysis workflows. AI agents offer a paradigm shift, enabling adaptive planning, executable code generation, traceable decisions, and real-time knowledge fusion. However, the lack of a comprehensive benchmark critically hinders progress. Results: We introduce a novel benchmarking evaluation […]