arXiv:2603.12606v1 Announce Type: cross Abstract: Current vision-language detection and grounding models predominantly focus on prompts with positive semantics and often struggle to accurately interpret and ground complex expressions containing negative semantics. A key reason for this limitation is the lack of high-quality training data that explicitly captures discriminative negative samples and negation-aware language descriptions. To […]
Are General-Purpose Vision Models All We Need for 2D Medical Image Segmentation? A Cross-Dataset Empirical Study
arXiv:2603.13044v1 Announce Type: cross Abstract: Medical image segmentation (MIS) is a fundamental component of computer-assisted diagnosis and clinical decision support systems. Over the past decade, numerous architectures specifically tailored to medical imaging have emerged to address domain-specific challenges such as low contrast, small anatomical structures, and limited annotated data. In parallel, rapid progress in computer […]
Proof-Carrying Materials: Falsifiable Safety Certificates for Machine-Learned Interatomic Potentials
arXiv:2603.12183v2 Announce Type: replace-cross Abstract: Machine-learned interatomic potentials (MLIPs) are deployed for high-throughput materials screening without formal reliability guarantees. We show that a single MLIP used as a stability filter misses 93% of density functional theory (DFT)-stable materials (recall 0.07) on a 25,000-material benchmark. Proof-Carrying Materials (PCM) closes this gap through three stages: adversarial falsification […]
Beyond Convolution: A Taxonomy of Structured Operators for Learning-Based Image Processing
arXiv:2603.12067v2 Announce Type: replace-cross Abstract: The convolution operator is the fundamental building block of modern convolutional neural networks (CNNs), owing to its simplicity, translational equivariance, and efficient implementation. However, its structure as a fixed, linear, locally-averaging operator limits its ability to capture structured signal properties such as low-rank decompositions, adaptive basis representations, and non-uniform spatial […]
Predictive Analytics for Foot Ulcers Using Time-Series Temperature and Pressure Data
arXiv:2603.12278v1 Announce Type: new Abstract: Diabetic foot ulcers (DFUs) are a severe complication of diabetes, often resulting in significant morbidity. This paper presents a predictive analytics framework utilizing time-series data captured by wearable foot sensors — specifically NTC thin-film thermocouples for temperature measurement and FlexiForce pressure sensors for plantar load monitoring. Data was collected from […]
Shattering the Shortcut: A Topology-Regularized Benchmark for Multi-hop Medical Reasoning in LLMs
arXiv:2603.12458v1 Announce Type: cross Abstract: While Large Language Models (LLMs) achieve expert-level performance on standard medical benchmarks through single-hop factual recall, they severely struggle with the complex, multi-hop diagnostic reasoning required in real-world clinical settings. A primary obstacle is “shortcut learning”, where models exploit highly connected, generic hub nodes (e.g., “inflammation”) in knowledge graphs to […]
SvfEye: A Semantic-Visual Fusion Framework with Multi-Scale Visual Context for Multimodal Reasoning
arXiv:2603.00171v2 Announce Type: replace-cross Abstract: Multimodal Large Language Models (MLLMs) often struggle to accurately perceive fine-grained visual details, especially when targets are tiny or visually subtle. This challenge can be addressed through semantic-visual information fusion, which integrates global image context with fine-grained local evidence for multi-scale visual understanding. Recently, a paradigm termed “Thinking with Images” […]
Fair Lung Disease Diagnosis from Chest CT via Gender-Adversarial Attention Multiple Instance Learning
arXiv:2603.12988v1 Announce Type: cross Abstract: We present a fairness-aware framework for multi-class lung disease diagnosis from chest CT volumes, developed for the Fair Disease Diagnosis Challenge at the PHAROS-AIF-MIH Workshop (CVPR 2026). The challenge requires classifying CT scans into four categories — Healthy, COVID-19, Adenocarcinoma, and Squamous Cell Carcinoma — with performance measured as the […]
Depth Charge: Jailbreak Large Language Models from Deep Safety Attention Heads
arXiv:2603.05772v2 Announce Type: replace-cross Abstract: Currently, open-sourced large language models (OSLLMs) have demonstrated remarkable generative performance. However, as their structure and weights are made public, they are exposed to jailbreak attacks even after alignment. Existing attacks operate primarily at shallow levels, such as the prompt or embedding level, and often fail to expose vulnerabilities rooted […]
ARL-Tangram: Unleash the Resource Efficiency in Agentic Reinforcement Learning
arXiv:2603.13019v1 Announce Type: cross Abstract: Agentic reinforcement learning (RL) has emerged as a transformative workload in cloud clusters, enabling large language models (LLMs) to solve complex problems through interactions with the real world. However, unlike traditional RL, agentic RL demands substantial external cloud resources, e.g., CPUs for code execution and GPUs for reward models, that exist […]
Variation-aware Flexible 3D Gaussian Editing
arXiv:2602.11638v3 Announce Type: replace-cross Abstract: Indirect editing methods for 3D Gaussian Splatting (3DGS) have recently witnessed significant advancements. These approaches operate by first applying edits in the rendered 2D space and subsequently projecting the modifications back into 3D. However, this paradigm inevitably introduces cross-view inconsistencies and constrains both the flexibility and efficiency of the editing […]
Automatic In-Domain Exemplar Construction and LLM-Based Refinement of Multi-LLM Expansions for Query Expansion
arXiv:2602.08917v2 Announce Type: replace-cross Abstract: Query expansion with large language models is promising but often relies on hand-crafted prompts, manually chosen exemplars, or a single LLM, making it non-scalable and sensitive to domain shift. We present an automated, domain-adaptive QE framework that builds in-domain exemplar pools by harvesting pseudo-relevant passages using a BM25-MonoT5 pipeline. A […]