REZE: Representation Regularization for Domain-adaptive Text Embedding Pre-finetuning

arXiv:2604.17257v2 Announce Type: replace-cross Abstract: Recent text embedding models are often adapted to specialized domains via contrastive pre-finetuning (PFT) on a naive collection of scattered, heterogeneous tasks. However, this approach often introduces task-induced bias alongside domain knowledge, leading to uncontrolled representation shifts that distort the pretrained embedding geometry and cause substantial performance degradation. To address […]

Memory Assignment for Finite-Memory Strategies in Adversarial Patrolling Games

arXiv:2505.14137v2 Announce Type: replace Abstract: Adversarial Patrolling games form a subclass of Security games where a Defender moves between locations, guarding vulnerable targets. The main algorithmic problem is constructing a strategy for the Defender that minimizes the worst damage an Attacker can cause. We focus on the class of finite-memory (also known as regular) Defender’s […]

Beyond Itinerary Planning: A Real-World Benchmark for Multi-Turn and Tool-Using Travel Tasks

arXiv:2512.22673v3 Announce Type: replace Abstract: Travel planning is a natural real-world task to test large language models’ (LLMs) planning and tool-use abilities. Although prior work has studied LLM performance on travel planning, existing settings still differ from real-world needs, mainly due to limited domain coverage, insufficient modeling of users’ implicit preferences in multi-turn conversations, and […]

Geometry-Aware CLIP Retrieval via Local Cross-Modal Alignment and Steering

arXiv:2604.16487v2 Announce Type: replace-cross Abstract: CLIP retrieval is typically framed as a pointwise similarity problem in a shared embedding space. While CLIP achieves strong global cross-modal alignment, many retrieval failures arise from local geometric inconsistencies: nearby items are incorrectly ordered, leading to systematic confusions (e.g., pentagon vs. hexagon) and producing diffuse, weakly controlled result sets. […]

DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding

arXiv:2604.12812v4 Announce Type: replace Abstract: Existing Multimodal Large Language Models (MLLMs) suffer from significant performance degradation on the long document understanding task as document length increases. This stems from two fundamental challenges: 1) a low Signal-to-Noise Ratio (SNR), with crucial evidence buried in irrelevant pages; and 2) supervision scarcity, as datasets offering only final short […]

LePREC: Reasoning as Classification over Structured Factors for Assessing Relevance of Legal Issues

arXiv:2604.19464v1 Announce Type: cross Abstract: More than half of the global population struggles to meet their civil justice needs due to limited legal resources. While Large Language Models (LLMs) have demonstrated impressive reasoning capabilities, significant challenges remain even at the foundational step of legal issue identification. To investigate LLMs’ capabilities in this task, we constructed […]

Uncertainty Quantification in Detection Transformers: Object-Level Calibration and Image-Level Reliability

arXiv:2412.01782v4 Announce Type: replace-cross Abstract: DETR and its variants have emerged as promising architectures for object detection, offering an end-to-end prediction pipeline. In practice, however, DETRs generate hundreds of predictions that far outnumber the actual objects present in an image. This raises a critical question: which of these predictions could be trusted? This is particularly […]

Beyond the ‘Diff’: Addressing Agentic Entropy in Agentic Software Development

arXiv:2604.16323v2 Announce Type: replace-cross Abstract: As autonomous coding agents become deeply embedded in software development workflows, their high operational velocity introduces a critical oversight challenge: the accumulating divergence between agentic actions and architectural intent. We term this process agentic entropy: a systemic drift that traditional code diff-based and HCXAI methods fail to capture, as they […]

A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends

arXiv:2507.09861v2 Announce Type: replace-cross Abstract: Visually Rich Document Understanding (VRDU) has become a pivotal area of research, driven by the need to automatically interpret documents that contain intricate visual, textual, and structural elements. Recently, Multimodal Large Language Models (MLLMs) have demonstrated significant promise in this domain, including both OCR-based and OCR-free approaches for information extraction […]

Catching Every Ripple: Enhanced Anomaly Awareness via Dynamic Concept Adaptation

arXiv:2604.14726v2 Announce Type: replace-cross Abstract: Online anomaly detection (OAD) plays a pivotal role in real-time analytics and decision-making for evolving data streams. However, existing methods often rely on costly retraining and rigid decision boundaries, limiting their ability to adapt both effectively and efficiently to concept drift in dynamic environments. To address these challenges, we propose […]

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844