Topology-Preserving Data Augmentation for Ring-Type Polygon Annotations

arXiv:2603.14764v1 Announce Type: cross Abstract: Geometric data augmentation is widely used in segmentation pipelines and typically assumes that polygon annotations represent simply connected regions. However, in structured domains such as architectural floorplan analysis, ring-type regions are often encoded as a single cyclic polygon chain connecting outer and inner boundaries. During augmentation, clipping operations may remove […]

MURE: Hierarchical Multi-Resolution Encoding via Vision-Language Models for Visual Document Retrieval

arXiv:2603.13349v1 Announce Type: cross Abstract: Visual Document Retrieval (VDR) requires representations that capture both fine-grained visual details and global document structure to ensure retrieval efficacy while maintaining computational efficiency. Existing VDR models struggle to balance effectiveness and efficiency when processing high-resolution documents: they often either lose fine-grained information or generate an excessive number of visual […]

EvoDriveVLA: Evolving Autonomous Driving Vision-Language-Action Model via Collaborative Perception-Planning Distillation

arXiv:2603.09465v2 Announce Type: replace-cross Abstract: Vision-Language-Action models have shown great promise for autonomous driving, yet they suffer from degraded perception after unfreezing the visual encoder and struggle with accumulated instability in long-term planning. To address these challenges, we propose EvoDriveVLA, a novel collaborative perception-planning distillation framework that integrates self-anchored perceptual constraints and oracle-guided trajectory optimization. Specifically, […]

KEPo: Knowledge Evolution Poison on Graph-based Retrieval-Augmented Generation

arXiv:2603.11501v2 Announce Type: replace-cross Abstract: Graph-based Retrieval-Augmented Generation (GraphRAG) constructs the Knowledge Graph (KG) from external databases to enhance the timeliness and accuracy of Large Language Model (LLM) generations. However, this reliance on external data introduces new attack surfaces. Attackers can inject poisoned texts into databases to manipulate LLMs into producing harmful target responses for […]

Disentangling Prompt Dependence to Evaluate Segmentation Reliability in Gynecological MRI

arXiv:2603.13369v1 Announce Type: cross Abstract: Promptable segmentation models (e.g., the Segment Anything Models) enable generalizable, zero-shot segmentation across diverse domains. Although predictions are deterministic for a fixed image-prompt pair, the robustness of these models to variations in user prompts, referred to as prompt dependence, remains underexplored. In safety-critical workflows with substantial inter-user variability, interpretable and […]

DDS-UDA: Dual-Domain Synergy for Unsupervised Domain Adaptation in Joint Segmentation of Optic Disc and Optic Cup

arXiv:2603.13345v1 Announce Type: cross Abstract: Convolutional neural networks (CNNs) have achieved exciting performance in joint segmentation of optic disc and optic cup on single-institution datasets. However, their clinical translation is hindered by two major challenges: limited availability of large-scale, high-quality annotations and performance degradation caused by domain shift during deployment across heterogeneous imaging protocols and […]

HindSight: Evaluating Research Idea Generation via Future Impact

arXiv:2603.15164v1 Announce Type: cross Abstract: Evaluating AI-generated research ideas typically relies on LLM judges or human panels, both of which are subjective and disconnected from actual research impact. We introduce HindSight, a time-split evaluation framework that measures idea quality by matching generated ideas against real future publications and scoring them by citation impact and venue acceptance. Using […]

Truth as a Compression Artifact in Language Model Training

arXiv:2603.11749v2 Announce Type: replace-cross Abstract: Why do language models trained on contradictory data prefer correct answers? In controlled experiments with small transformers (3.5M–86M parameters), we show that this preference tracks the compressibility structure of errors rather than truth per se. We train GPT-2 style models on corpora where each mathematical problem appears with both correct […]

RAZOR: Ratio-Aware Layer Editing for Targeted Unlearning in Vision Transformers and Diffusion Models

arXiv:2603.14819v1 Announce Type: cross Abstract: Transformer-based diffusion and vision-language models have achieved remarkable success, yet efficiently removing undesirable or sensitive information without retraining remains a central challenge for model safety and compliance. We introduce Ratio-Aware Zero/One-step Optimized Retentive unlearning (RAZOR), a lightweight, model-agnostic unlearning framework that generalizes forgetting updates to coordinated multi-layer and multi-head […]

Composing Concepts from Images and Videos via Concept-prompt Binding

arXiv:2512.09824v2 Announce Type: replace-cross Abstract: Visual concept composition, which aims to integrate different elements from images and videos into a single, coherent visual output, still falls short in accurately extracting complex concepts from visual inputs and flexibly combining concepts from both images and videos. We introduce Bind & Compose, a one-shot method that enables flexible […]

Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models

arXiv:2601.08955v2 Announce Type: replace-cross Abstract: Recent advances in world models have shown promise for modeling future dynamics of environmental states, enabling agents to reason and act without accessing real environments. Current methods mainly perform single-step or fixed-horizon rollouts, leaving their potential for complex task planning under-exploited. We propose Imagine-then-Plan (ITP), a unified framework for agent […]

A Framework for Assessing AI Agent Decisions and Outcomes in AutoML Pipelines

arXiv:2602.22442v2 Announce Type: replace Abstract: Agent-based AutoML systems rely on large language models to make complex, multi-stage decisions across data processing, model selection, and evaluation. However, existing evaluation practices remain outcome-centric, focusing primarily on final task performance. Through a review of prior work, we find that none of the surveyed agentic AutoML systems report structured, […]

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844