arXiv:2603.08605v3 Announce Type: replace-cross Abstract: Background and objectives: Colorectal cancer histopathological grading depends on accurate segmentation of glandular structures. Current deep learning approaches rely on large-scale, pixel-level annotations that are labor-intensive and difficult to obtain in routine clinical practice. Weakly supervised semantic segmentation offers a promising alternative. However, class activation map based […]
SortScrews: A Dataset and Baseline for Real-time Screw Classification
arXiv:2603.13027v1 Announce Type: cross Abstract: Automatic identification of screw types is important for industrial automation, robotics, and inventory management. However, publicly available datasets for screw classification are scarce, particularly for controlled single-object scenarios commonly encountered in automated sorting systems. In this work, we introduce SortScrews, a dataset for casewise visual classification of screws. The dataset […]
Purify Once, Edit Freely: Breaking Image Protections under Model Mismatch
arXiv:2603.13028v1 Announce Type: cross Abstract: Diffusion models enable high-fidelity image editing but can also be misused for unauthorized style imitation and harmful content generation. To mitigate these risks, proactive image protection methods embed small, often imperceptible adversarial perturbations into images before sharing to disrupt downstream editing or fine-tuning. However, in realistic post-release scenarios, content owners […]
One Supervisor, Many Modalities: Adaptive Tool Orchestration for Autonomous Queries
arXiv:2603.11545v2 Announce Type: replace-cross Abstract: We present an agentic AI framework for autonomous multimodal query processing that coordinates specialized tools across text, image, audio, video, and document modalities. A central Supervisor dynamically decomposes user queries, delegates subtasks to modality-appropriate tools (e.g., object detection, OCR, speech transcription), and synthesizes results through adaptive routing strategies rather than […]
Interrogating Design Homogenization in Web Vibe Coding
arXiv:2603.13036v1 Announce Type: cross Abstract: Generative AI is known for its tendency to homogenize, often reproducing dominant style conventions found in training data. However, it remains unclear how these homogenizing effects extend to complex structural tasks like web design. As lay creators increasingly turn to LLMs to ‘vibe-code’ websites — prompting for aesthetic and functional […]
MovieTeller: Tool-augmented Movie Synopsis with ID Consistent Progressive Abstraction
arXiv:2602.23228v2 Announce Type: replace-cross Abstract: With the explosive growth of digital entertainment, automated video summarization has become indispensable for applications such as content indexing, personalized recommendation, and efficient media archiving. Automatic synopsis generation for long-form videos, such as movies and TV series, presents a significant challenge for existing Vision-Language Models (VLMs). While proficient at single-image […]
SAW: Toward a Surgical Action World Model via Controllable and Scalable Video Generation
arXiv:2603.13024v1 Announce Type: cross Abstract: A surgical world model capable of generating realistic surgical action videos with precise control over tool-tissue interactions can address fundamental challenges in surgical AI and simulation — from data scarcity and rare event synthesis to bridging the sim-to-real gap for surgical automation. However, current video generation methods, the very core […]
Is Human Annotation Necessary? Iterative MBR Distillation for Error Span Detection in Machine Translation
arXiv:2603.12983v1 Announce Type: cross Abstract: Error Span Detection (ESD) is a crucial subtask in Machine Translation (MT) evaluation, aiming to identify the location and severity of translation errors. While fine-tuning models on human-annotated data improves ESD performance, acquiring such data is expensive and prone to inconsistencies among annotators. To address this, we propose a novel […]
Narrative Weaver: Towards Controllable Long-Range Visual Consistency with Multi-Modal Conditioning
arXiv:2603.06688v2 Announce Type: replace-cross Abstract: We present “Narrative Weaver”, a novel framework that addresses a fundamental challenge in generative AI: achieving multi-modal controllable, long-range, and consistent visual content generation. While existing models excel at generating high-fidelity short-form visual content, they struggle to maintain narrative coherence and visual consistency across extended sequences – a critical limitation […]
Efficient Real-World Autonomous Racing via Attenuated Residual Policy Optimization
arXiv:2603.12960v1 Announce Type: cross Abstract: Residual policy learning (RPL), in which a learned policy refines a static base policy using deep reinforcement learning (DRL), has shown strong performance across various robotic applications. Its effectiveness is particularly evident in autonomous racing, a domain that serves as a challenging benchmark for real-world DRL. However, deploying RPL-based controllers […]
daVinci-Env: Open SWE Environment Synthesis at Scale
arXiv:2603.13023v1 Announce Type: cross Abstract: Training capable software engineering (SWE) agents demands large-scale, executable, and verifiable environments that provide dynamic feedback loops for iterative code editing, test execution, and solution refinement. However, existing open-source datasets remain limited in scale and repository diversity, while industrial solutions are opaque with unreleased infrastructure, creating a prohibitive barrier for […]
Delta1 with LLM: symbolic and neural integration for credible and explainable reasoning
arXiv:2603.12953v1 Announce Type: cross Abstract: Neuro-symbolic reasoning increasingly demands frameworks that unite the formal rigor of logic with the interpretability of large language models (LLMs). We introduce an end-to-end, explainability-by-construction pipeline integrating the Automated Theorem Generator Delta1 based on the full triangular standard contradiction (FTSC) with LLMs. Delta1 deterministically constructs minimal […]