Weakly Supervised Teacher-Student Framework with Progressive Pseudo-mask Refinement for Gland Segmentation

arXiv:2603.08605v3 Announce Type: replace-cross Abstract: Background and objectives: Colorectal cancer histopathological grading depends on accurate segmentation of glandular structures. Current deep learning approaches rely on large-scale pixel-level annotations that are labor-intensive and difficult to obtain in routine clinical practice. Weakly supervised semantic segmentation offers a promising alternative. However, class activation map based […]

SortScrews: A Dataset and Baseline for Real-time Screw Classification

arXiv:2603.13027v1 Announce Type: cross Abstract: Automatic identification of screw types is important for industrial automation, robotics, and inventory management. However, publicly available datasets for screw classification are scarce, particularly for controlled single-object scenarios commonly encountered in automated sorting systems. In this work, we introduce $\textbf{SortScrews}$, a dataset for casewise visual classification of screws. The dataset […]

Purify Once, Edit Freely: Breaking Image Protections under Model Mismatch

arXiv:2603.13028v1 Announce Type: cross Abstract: Diffusion models enable high-fidelity image editing but can also be misused for unauthorized style imitation and harmful content generation. To mitigate these risks, proactive image protection methods embed small, often imperceptible adversarial perturbations into images before sharing to disrupt downstream editing or fine-tuning. However, in realistic post-release scenarios, content owners […]

One Supervisor, Many Modalities: Adaptive Tool Orchestration for Autonomous Queries

arXiv:2603.11545v2 Announce Type: replace-cross Abstract: We present an agentic AI framework for autonomous multimodal query processing that coordinates specialized tools across text, image, audio, video, and document modalities. A central Supervisor dynamically decomposes user queries, delegates subtasks to modality-appropriate tools (e.g., object detection, OCR, speech transcription), and synthesizes results through adaptive routing strategies rather than […]

Interrogating Design Homogenization in Web Vibe Coding

arXiv:2603.13036v1 Announce Type: cross Abstract: Generative AI is known for its tendency to homogenize, often reproducing dominant style conventions found in training data. However, it remains unclear how these homogenizing effects extend to complex structural tasks like web design. As lay creators increasingly turn to LLMs to ‘vibe-code’ websites — prompting for aesthetic and functional […]

MovieTeller: Tool-augmented Movie Synopsis with ID Consistent Progressive Abstraction

arXiv:2602.23228v2 Announce Type: replace-cross Abstract: With the explosive growth of digital entertainment, automated video summarization has become indispensable for applications such as content indexing, personalized recommendation, and efficient media archiving. Automatic synopsis generation for long-form videos, such as movies and TV series, presents a significant challenge for existing Vision-Language Models (VLMs). While proficient at single-image […]

SAW: Toward a Surgical Action World Model via Controllable and Scalable Video Generation

arXiv:2603.13024v1 Announce Type: cross Abstract: A surgical world model capable of generating realistic surgical action videos with precise control over tool-tissue interactions can address fundamental challenges in surgical AI and simulation — from data scarcity and rare event synthesis to bridging the sim-to-real gap for surgical automation. However, current video generation methods, the very core […]

Is Human Annotation Necessary? Iterative MBR Distillation for Error Span Detection in Machine Translation

arXiv:2603.12983v1 Announce Type: cross Abstract: Error Span Detection (ESD) is a crucial subtask in Machine Translation (MT) evaluation, aiming to identify the location and severity of translation errors. While fine-tuning models on human-annotated data improves ESD performance, acquiring such data is expensive and prone to inconsistencies among annotators. To address this, we propose a novel […]

Narrative Weaver: Towards Controllable Long-Range Visual Consistency with Multi-Modal Conditioning

arXiv:2603.06688v2 Announce Type: replace-cross Abstract: We present “Narrative Weaver”, a novel framework that addresses a fundamental challenge in generative AI: achieving multi-modal controllable, long-range, and consistent visual content generation. While existing models excel at generating high-fidelity short-form visual content, they struggle to maintain narrative coherence and visual consistency across extended sequences – a critical limitation […]

Efficient Real-World Autonomous Racing via Attenuated Residual Policy Optimization

arXiv:2603.12960v1 Announce Type: cross Abstract: Residual policy learning (RPL), in which a learned policy refines a static base policy using deep reinforcement learning (DRL), has shown strong performance across various robotic applications. Its effectiveness is particularly evident in autonomous racing, a domain that serves as a challenging benchmark for real-world DRL. However, deploying RPL-based controllers […]

daVinci-Env: Open SWE Environment Synthesis at Scale

arXiv:2603.13023v1 Announce Type: cross Abstract: Training capable software engineering (SWE) agents demands large-scale, executable, and verifiable environments that provide dynamic feedback loops for iterative code editing, test execution, and solution refinement. However, existing open-source datasets remain limited in scale and repository diversity, while industrial solutions are opaque with unreleased infrastructure, creating a prohibitive barrier for […]

Delta1 with LLM: symbolic and neural integration for credible and explainable reasoning

arXiv:2603.12953v1 Announce Type: cross Abstract: Neuro-symbolic reasoning increasingly demands frameworks that unite the formal rigor of logic with the interpretability of large language models (LLMs). We introduce an end-to-end, explainability-by-construction pipeline integrating the Automated Theorem Generator Delta1, based on the full triangular standard contradiction (FTSC), with LLMs. Delta1 deterministically constructs minimal […]


Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844