Turning Generators into Retrievers: Unlocking MLLMs for Natural Language-Guided Geo-Localization

arXiv:2604.10721v1 Announce Type: cross Abstract: Natural-language Guided Cross-view Geo-localization (NGCG) aims to retrieve geo-tagged satellite imagery using textual descriptions of ground scenes. While recent NGCG methods commonly rely on CLIP-style dual-encoder architectures, they often suffer from weak cross-modal generalization and require complex architectural designs. In contrast, Multimodal Large Language Models (MLLMs) offer powerful semantic reasoning […]

Adoption and Effectiveness of AI-Based Anomaly Detection for Cross Provider Health Data Exchange

arXiv:2604.09630v1 Announce Type: cross Abstract: This study investigates the adoption and effectiveness of AI-based anomaly detection in cross-provider electronic health record (EHR) environments. It aims to (1) identify the organisational and digital capabilities required for successful implementation and (2) evaluate the performance and interpretability of lightweight anomaly detection approaches using contextual audit data. A semi-systematic […]

Digital hybridity and relics in cultural heritage: using corpus linguistics to inform design in emerging technologies from AI to VR

arXiv:2604.09669v1 Announce Type: cross Abstract: Hybrid technologies enable the blending of physical and digital elements, creating new ways to experience and interact with the world. Such technologies can transform engagement with relics, both secular and sacred, but they present challenges for capturing faith, belief, and representation responsibly. Given the complexities of digital representation and the […]

Identity-Aware U-Net: Fine-grained Cell Segmentation via Identity-Aware Representation Learning

arXiv:2604.09702v1 Announce Type: cross Abstract: Precise segmentation of objects with highly similar shapes remains a challenging problem in dense prediction, especially in scenarios with ambiguous boundaries, overlapping instances, and weak inter-instance visual differences. While conventional segmentation models are effective at localizing object regions, they often lack the discriminative capacity required to reliably distinguish a target […]

A-IO: Adaptive Inference Orchestration for Memory-Bound NPUs

arXiv:2604.09752v1 Announce Type: cross Abstract: During the deployment of Large Language Models (LLMs), the autoregressive decoding phase on heterogeneous NPU platforms (e.g., Ascend 910B) faces severe memory-bound challenges. This study reveals the “Model Scaling Paradox” caused by the static deployment of single-sized models. It also points out the kernel synchronization overhead of fine-grained speculative decoding […]

Diffusion Denoiser Achievable Analysis for Finite Blocklength Unsourced Random Access

arXiv:2604.09904v1 Announce Type: cross Abstract: Polyanskiy proposed a framework for the unsourced multiple access channel (MAC) problem where users employ a common codebook in the finite blocklength regime. However, existing approaches handle channel noise before the joint decoder. In this work, we introduce a decoder-compatible diffusion denoiser as a lightweight analysis within joint decoding. […]

Adapting 2D Multi-Modal Large Language Model for 3D CT Image Analysis

arXiv:2604.10233v1 Announce Type: cross Abstract: 3D medical image analysis is of great importance in disease diagnosis and treatment. Recently, multimodal large language models (MLLMs) have exhibited robust perceptual capacity, strong cross-modal alignment, and promising generalizability. Therefore, they have great potential to improve the performance of medical report generation (MRG) and medical visual question answering (MVQA), […]

From Query to Counsel: Structured Reasoning with a Multi-Agent Framework and Dataset for Legal Consultation

arXiv:2604.10470v1 Announce Type: cross Abstract: Legal consultation question answering (Legal CQA) presents unique challenges compared to traditional legal QA tasks, including the scarcity of high-quality training data, complex task composition, and strong contextual dependencies. To address these, we construct JurisCQAD, a large-scale dataset of over 43,000 real-world Chinese legal queries annotated with expert-validated positive and […]

Early Decisions Matter: Proximity Bias and Initial Trajectory Shaping in Non-Autoregressive Diffusion Language Models

arXiv:2604.10567v1 Announce Type: cross Abstract: Diffusion-based language models (dLLMs) have emerged as a promising alternative to autoregressive language models, offering the potential for parallel token generation and bidirectional context modeling. However, harnessing this flexibility for fully non-autoregressive decoding remains an open question, particularly for reasoning and planning tasks. In this work, we investigate non-autoregressive decoding […]

Can LLMs Reason About Attention? Towards Zero-Shot Analysis of Multimodal Classroom Behavior

arXiv:2604.03401v2 Announce Type: replace-cross Abstract: Understanding student engagement usually requires time-consuming manual observation or invasive recording that raises privacy concerns. We present a privacy-preserving pipeline that analyzes classroom videos to extract insights about student attention, without storing any identifiable footage. Our system runs on a single GPU, using OpenPose for skeletal extraction and Gaze-LLE for […]

Learning and Enforcing Context-Sensitive Control for LLMs

arXiv:2604.10667v1 Announce Type: cross Abstract: Controlling the output of Large Language Models (LLMs) through context-sensitive constraints has emerged as a promising approach to overcome the limitations of Context-Free Grammars (CFGs) in guaranteeing generation validity. However, such constraints typically require manual specification — a significant barrier demanding specialized expertise. We introduce a framework that automatically learns […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.