arXiv:2603.18373v1 Announce Type: cross Abstract: When VLMs answer correctly, do they genuinely rely on visual information or exploit language shortcuts? We introduce the Tri-Layer Diagnostic Framework, which disentangles hallucination sources via three metrics: Latent Anomaly Detection (perceptual awareness), Visual Necessity Score (visual dependency, measured via KL divergence), and Competition Score (conflict between visual grounding and […]
Benchmarking PDF Parsers on Table Extraction with LLM-based Semantic Evaluation
arXiv:2603.18652v1 Announce Type: cross Abstract: Reliably extracting tables from PDFs is essential for large-scale scientific data mining and knowledge base construction, yet existing evaluation approaches rely on rule-based metrics that fail to capture semantic equivalence of table content. We present a benchmarking framework based on synthetically generated PDFs with precise LaTeX ground truth, using tables […]
LICA: Layered Image Composition Annotations for Graphic Design Research
arXiv:2603.16098v2 Announce Type: replace-cross Abstract: We introduce LICA (Layered Image Composition Annotations), a large scale dataset of 1,550,244 multi-layer graphic design compositions designed to advance structured understanding and generation of graphic layouts. In addition to rendered PNG images, LICA represents each design as a hierarchical composition of typed components including text, image, vector, and group […]
Multiscale Switch for Semi-Supervised and Contrastive Learning in Medical Ultrasound Image Segmentation
arXiv:2603.18655v1 Announce Type: cross Abstract: Medical ultrasound image segmentation faces significant challenges due to limited labeled data and characteristic imaging artifacts including speckle noise and low-contrast boundaries. While semi-supervised learning (SSL) approaches have emerged to address data scarcity, existing methods suffer from suboptimal unlabeled data utilization and lack robust feature representation mechanisms. In this paper, […]
Embodied Foundation Models at the Edge: A Survey of Deployment Constraints and Mitigation Strategies
arXiv:2603.16952v2 Announce Type: replace-cross Abstract: Deploying foundation models in embodied edge systems is fundamentally a systems problem, not just a problem of model compression. Real-time control must operate within strict size, weight, and power constraints, where memory traffic, compute latency, timing variability, and safety margins interact directly. The Deployment Gauntlet organizes these constraints into eight […]
SCALE:Scalable Conditional Atlas-Level Endpoint transport for virtual cell perturbation prediction
arXiv:2603.17380v2 Announce Type: replace-cross Abstract: Virtual cell models aim to enable in silico experimentation by predicting how cells respond to genetic, chemical, or cytokine perturbations from single-cell measurements. In practice, however, large-scale perturbation prediction remains constrained by three coupled bottlenecks: inefficient training and inference pipelines, unstable modeling in high-dimensional sparse expression space, and evaluation protocols […]
Cognitive Amplification vs Cognitive Delegation in Human-AI Systems: A Metric Framework
arXiv:2603.18677v1 Announce Type: cross Abstract: Artificial intelligence is increasingly embedded in human decision-making, where it can either enhance human reasoning or induce excessive cognitive dependence. This paper introduces a conceptual and mathematical framework for distinguishing cognitive amplification, in which AI improves hybrid human-AI performance while preserving human expertise, from cognitive delegation, in which reasoning is […]
TDAD: Test-Driven Agentic Development – Reducing Code Regressions in AI Coding Agents via Graph-Based Impact Analysis
arXiv:2603.17973v2 Announce Type: replace-cross Abstract: AI coding agents can resolve real-world software issues, yet they frequently introduce regressions — breaking tests that previously passed. Current benchmarks focus almost exclusively on resolution rate, leaving regression behavior under-studied. This paper presents TDAD (Test-Driven Agentic Development), an open-source tool that performs pre-change impact analysis for AI coding agents. […]
MOSS-TTS Technical Report
arXiv:2603.18090v1 Announce Type: cross Abstract: This technical report presents MOSS-TTS, a speech generation foundation model built on a scalable recipe: discrete audio tokens, autoregressive modeling, and large-scale pretraining. Built on MOSS-Audio-Tokenizer, a causal Transformer tokenizer that compresses 24 kHz audio to 12.5 fps with variable-bitrate RVQ and unified semantic-acoustic representations, we release two complementary generators: […]
Beyond TVLA: Anderson-Darling Leakage Assessment for Neural Network Side-Channel Leakage Detection
arXiv:2603.18647v1 Announce Type: cross Abstract: Test Vector Leakage Assessment (TVLA) based on Welch’s $t$-test has become a standard tool for detecting side-channel leakage. However, its mean-based nature can limit sensitivity when leakage manifests primarily through higher-order distributional differences. As our experiments show, this property becomes especially crucial when it comes to evaluating neural network implementations. […]
Intellectual Stewardship: Re-adapting Human Minds for Creative Knowledge Work in the Age of AI
arXiv:2603.18117v1 Announce Type: cross Abstract: Background: Amid the opportunities and risks introduced by generative AI, learning research needs to envision how human minds and responsibilities should re-adapt as AI continues to augment or automate various tasks. Approach: Drawing on theories of learning, intelligence, and knowledge creation, this conceptual paper proposes intellectual stewardship as a human-centered, […]
To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation
arXiv:2603.15159v3 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have shown strong potential for code generation, yet they remain limited in private-library-oriented code generation, where the goal is to generate code using APIs from private libraries. Existing approaches mainly rely on retrieving private-library API documentation and injecting relevant knowledge into the context at inference time. […]