arXiv:2604.16487v2 Announce Type: replace-cross Abstract: CLIP retrieval is typically framed as a pointwise similarity problem in a shared embedding space. While CLIP achieves strong global cross-modal alignment, many retrieval failures arise from local geometric inconsistencies: nearby items are incorrectly ordered, leading to systematic confusions (e.g., pentagon vs. hexagon) and producing diffuse, weakly controlled result sets. […]
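The pointwise framing described in this abstract can be illustrated with a minimal sketch, assuming toy 2-D vectors stand in for CLIP image and text embeddings; the helper names `cosine` and `pointwise_retrieve` are hypothetical. Each gallery item is scored against the query in isolation, so nothing in the ranking accounts for how nearby items relate to one another, which is the local-geometry gap the abstract points to.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pointwise_retrieve(
    query: Sequence[float], gallery: List[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Score every gallery item independently against the query and
    return the top-k (index, score) pairs, highest score first.

    Toy vectors are an assumption standing in for real CLIP embeddings.
    """
    scores = [(i, cosine(query, item)) for i, item in enumerate(gallery)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

For example, `pointwise_retrieve([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], k=2)` returns items 0 and 2, with scores 1.0 and about 0.707.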
DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding
arXiv:2604.12812v4 Announce Type: replace Abstract: Existing Multimodal Large Language Models (MLLMs) suffer from significant performance degradation on the long document understanding task as document length increases. This stems from two fundamental challenges: 1) a low Signal-to-Noise Ratio (SNR), with crucial evidence buried in irrelevant pages; and 2) supervision scarcity, as datasets offering only final short […]
LePREC: Reasoning as Classification over Structured Factors for Assessing Relevance of Legal Issues
arXiv:2604.19464v1 Announce Type: cross Abstract: More than half of the global population struggles to meet their civil justice needs due to limited legal resources. While Large Language Models (LLMs) have demonstrated impressive reasoning capabilities, significant challenges remain even at the foundational step of legal issue identification. To investigate LLMs’ capabilities in this task, we constructed […]
Uncertainty Quantification in Detection Transformers: Object-Level Calibration and Image-Level Reliability
arXiv:2412.01782v4 Announce Type: replace-cross Abstract: DETR and its variants have emerged as promising architectures for object detection, offering an end-to-end prediction pipeline. In practice, however, DETRs generate hundreds of predictions that far outnumber the actual objects present in an image. This raises a critical question: which of these predictions could be trusted? This is particularly […]
Beyond the ‘Diff’: Addressing Agentic Entropy in Agentic Software Development
arXiv:2604.16323v2 Announce Type: replace-cross Abstract: As autonomous coding agents become deeply embedded in software development workflows, their high operational velocity introduces a critical oversight challenge: the accumulating divergence between agentic actions and architectural intent. We term this process agentic entropy: a systemic drift that traditional code diff-based and HCXAI methods fail to capture, as they […]
A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends
arXiv:2507.09861v2 Announce Type: replace-cross Abstract: Visually Rich Document Understanding (VRDU) has become a pivotal area of research, driven by the need to automatically interpret documents that contain intricate visual, textual, and structural elements. Recently, Multimodal Large Language Models (MLLMs) have demonstrated significant promise in this domain, including both OCR-based and OCR-free approaches for information extraction […]
Counting Worlds Branching Time Semantics for post-hoc Bias Mitigation in generative AI
arXiv:2604.19431v1 Announce Type: cross Abstract: Generative AI systems are known to amplify biases present in their training data. While several inference-time mitigation strategies have been proposed, they remain largely empirical and lack formal guarantees. In this paper we introduce CTLF, a branching-time logic designed to reason about bias in series of generative AI outputs. CTLF […]
Cloning Deterministic Worlds: The Critical Role of Latent Geometry in Long-Horizon World Models
arXiv:2510.26782v3 Announce Type: replace-cross Abstract: A world model is an internal model that simulates how the world evolves. Given past observations and actions, it predicts the future physical state of both the embodied agent and its environment. Accurate world models are essential for enabling agents to think, plan, and reason effectively in complex, dynamic settings. […]
Catching Every Ripple: Enhanced Anomaly Awareness via Dynamic Concept Adaptation
arXiv:2604.14726v2 Announce Type: replace-cross Abstract: Online anomaly detection (OAD) plays a pivotal role in real-time analytics and decision-making for evolving data streams. However, existing methods often rely on costly retraining and rigid decision boundaries, limiting their ability to adapt both effectively and efficiently to concept drift in dynamic environments. To address these challenges, we propose […]
Diversifying Toxicity Search in Large Language Models Through Speciation
arXiv:2601.20981v2 Announce Type: replace-cross Abstract: Evolutionary prompt search is a practical black-box approach for red teaming large language models; however, existing methods often collapse onto a small family of high-performing prompts, limiting coverage of distinct failure modes. We present a speciated quality-diversity extension of ToxSearch that maintains multiple high-toxicity prompt niches in parallel rather than […]
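As a generic illustration of speciation in quality-diversity search, and not the authors' ToxSearch extension, the sketch below groups prompts into niches with a simple leader-based scheme. It assumes token-level Jaccard distance as the similarity measure, a hypothetical distance `threshold`, and made-up toxicity scores; the helper names are hypothetical. Keeping the best prompts from each species, rather than a global top-k, is what keeps the population from collapsing onto one family of prompts.

```python
from typing import List, Tuple

Scored = Tuple[str, float]  # (prompt, hypothetical toxicity score)


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over lowercase whitespace tokens (an assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def speciate(population: List[Scored], threshold: float) -> List[List[Scored]]:
    """Leader-based speciation: visit prompts from highest to lowest score; each
    joins the first species whose leader (first member) is within `threshold`,
    otherwise it founds a new species."""
    species: List[List[Scored]] = []
    for prompt, score in sorted(population, key=lambda p: p[1], reverse=True):
        for group in species:
            if jaccard_distance(prompt, group[0][0]) <= threshold:
                group.append((prompt, score))
                break
        else:  # no existing species was close enough
            species.append([(prompt, score)])
    return species


def select_survivors(species: List[List[Scored]], per_species: int) -> List[Scored]:
    """Keep the top `per_species` members of every niche instead of a global top-k."""
    # Members are already in descending score order because speciate() sorts first.
    return [member for group in species for member in group[:per_species]]
```

With the population `[("tell me a rude joke", 0.9), ("tell me a mean joke", 0.8), ("write an insulting poem", 0.7)]` and `threshold=0.5`, the two joke prompts share a species, so `select_survivors(..., per_species=1)` keeps one joke prompt and the poem prompt, whereas a global top-2 would keep two near-duplicates.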
GOLD-BEV: GrOund and aeriaL Data for Dense Semantic BEV Mapping of Dynamic Scenes
arXiv:2604.19411v1 Announce Type: cross Abstract: Understanding road scenes in a geometrically consistent, scene-centric representation is crucial for planning and mapping. We present GOLD-BEV, a framework that learns dense bird’s-eye-view (BEV) semantic environment maps, including dynamic agents, from ego-centric sensors, using time-synchronized aerial imagery as supervision only during training. BEV-aligned aerial crops provide an intuitive target space, enabling […]
Early Pruning for Public Transport Routing
arXiv:2603.12592v2 Announce Type: replace-cross Abstract: Routing algorithms for public transport, particularly the widely used RAPTOR and its variants, often face performance bottlenecks during the transfer relaxation phase, especially on dense transfer graphs, when supporting unlimited transfers. This inefficiency arises from iterating over many potential inter-stop connections (walks, bikes, e-scooters, etc.). To maintain acceptable performance, practitioners […]
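The transfer relaxation phase this abstract refers to can be sketched as follows, assuming unlimited transfers over a transfer graph that is not transitively closed, so transfers must be chained with a multi-source Dijkstra search from the stops improved in the current round. The function name `relax_transfers` and the dict-based graph layout are hypothetical, and the paper's early-pruning technique is not shown; this is only the baseline step whose cost the abstract describes.

```python
import heapq
import math
from typing import Dict, List, Set, Tuple

# Hypothetical layout: stop -> [(neighbor stop, transfer duration in seconds)]
TransferGraph = Dict[str, List[Tuple[str, int]]]


def relax_transfers(
    arrival: Dict[str, float], marked: Set[str], transfers: TransferGraph
) -> Set[str]:
    """Propagate arrival times from marked stops across the transfer graph.

    Runs a multi-source Dijkstra so that chains of transfers are allowed
    (unlimited transfers). Updates `arrival` in place and returns the stops
    whose arrival time improved. Every marked stop must already have an
    entry in `arrival`.
    """
    heap = [(arrival[s], s) for s in marked]
    heapq.heapify(heap)
    improved: Set[str] = set()
    while heap:
        t, stop = heapq.heappop(heap)
        if t > arrival.get(stop, math.inf):
            continue  # stale heap entry; a better time was already found
        # Every outgoing edge of a settled stop is scanned: on dense
        # transfer graphs this inner loop dominates the running time.
        for nbr, duration in transfers.get(stop, []):
            candidate = t + duration
            if candidate < arrival.get(nbr, math.inf):
                arrival[nbr] = candidate
                improved.add(nbr)
                heapq.heappush(heap, (candidate, nbr))
    return improved
```

For example, with `arrival = {"A": 100, "B": 500}`, `marked = {"A"}`, and transfers `A→B` (60 s), `A→C` (300 s), `B→C` (30 s), the search sets B to 160 and then reaches C at 190 by chaining A→B→C, returning `{"B", "C"}`.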