Case-Aware Medical Image Classification with Multimodal Knowledge Graphs and Reliability-Guided Refinement

arXiv:2605.22547v2 Announce Type: replace-cross Abstract: Deep learning has brought significant progress to medical image classification, yet most existing methods still rely on isolated visual evidence and cannot effectively leverage similar cases or external knowledge. In clinical practice, diagnosis is typically supported by similar historical cases and their associated symptoms. To explicitly model this evidence-based diagnostic […]

Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions

arXiv:2604.08304v2 Announce Type: replace-cross Abstract: Retrieval-augmented generation (RAG) extends large language models (LLMs) with external knowledge, but this access path also introduces security risks that existing work often conflates with inherent LLM flaws. We frame secure RAG as securing external knowledge access and organize the literature with SLOT, a taxonomy along four axes: the attack […]

Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations

arXiv:2605.27958v1 Announce Type: cross Abstract: Linear probes trained on LLM activations are increasingly proposed as deception-detection metrics, yet report AUROC exceeding 0.96 on clean benchmarks while collapsing under distributional shift. This paper systematically pressure-tests probe-based metrics across the Gemma 3 model family (1B-27B parameters), diagnosing why they fail rather than merely documenting that they fail. […]

Treatment Effect Estimation with Differentiated Networked Effect on Graph Data

arXiv:2605.24358v2 Announce Type: replace-cross Abstract: Estimating individual treatment effect (ITE) from observational graph data is crucial for decision-making in fields such as commerce and medicine. This task is challenging due to interference, where individual outcomes can be influenced by the treatments and covariates of their neighbors. Existing methods attempt to model such interference for […]

UniMaia: Steering Chess Policies with Language for Human-like Play

arXiv:2605.27767v1 Announce Type: cross Abstract: Recent advances in large language models have enabled natural language to serve as a flexible interface for controlling complex systems, but often at the cost of large-scale multimodal training or weakened domain-specific inductive biases. In structured decision-making domains such as chess, specialized policy networks achieve strong performance but lack semantic […]

LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models

arXiv:2605.19729v3 Announce Type: replace-cross Abstract: We demonstrate that in knowledge distillation for diffusion models, the teacher network’s highly complex denoising process, stemming from its substantially larger capacity, poses a significant challenge for the student model to faithfully mimic. To address this problem, we propose a coarse-to-fine distillation framework with LInear FiTting-based distillation (LIFT) […]

On the Learnability of Test-Time Adaptation: A Recovery Complexity Perspective

arXiv:2605.28057v1 Announce Type: cross Abstract: Test-time adaptation (TTA) aims to adapt models to maintain reliable performance on non-stationary test streams without requiring labeled data. Despite its empirical success, the learnability of TTA under non-stationary streams remains unexplored. A key challenge is the lack of a principled theoretical framework that simultaneously aligns with the TTA objective […]

VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild

arXiv:2605.27882v1 Announce Type: cross Abstract: LLM-based agents score well on search benchmarks, yet real users consistently find results unsatisfying, revealing a persistent evaluation-experience gap. We attribute this gap to existing benchmarks’ reliance on over-specified queries, single-turn interactions, and fixed-schema evaluation, none of which reflect real search behavior where users and agents collaboratively refine vague intent […]

When Think-with-Image Meets Safety: What Determines Multimodal Jailbreak Robustness?

arXiv:2605.27932v1 Announce Type: cross Abstract: Think-with-image reasoning is emerging as a new inference paradigm for large vision-language models, but its safety implications remain poorly understood. Existing systems already span multiple process designs, including direct response generation, text-only prior turn, visual-state manipulation, and explicit external image-tool invocation. In this paper, we ask which of these evaluated […]

FLORO: A Multimodal Geospatial Foundation Model for Ecological Remote Sensing Across Sensors and Scales

arXiv:2605.28174v1 Announce Type: cross Abstract: Foundation models offer a promising route to transferable remote sensing representations, but many current approaches depend on very large pretraining datasets and fixed sensor configurations, limiting their suitability for ecological and environmental applications, where observations often vary across platforms, spatial and spectral resolutions, and available modalities. We introduce FLORO, a […]

Misalignment Between Backpropagation and the Hierarchy of Brain Responses to Images

arXiv:2605.28693v1 Announce Type: new Abstract: Backpropagation is the core learning mechanism underlying deep learning. However, whether and how this algorithm is implemented in the brain remains highly debated. In particular, while forward activations of pretrained models reliably map onto the cortical hierarchy of visual processing, it is unknown whether backpropagated gradients exhibit a similar correspondence. […]

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

arXiv:2605.18740v3 Announce Type: replace-cross Abstract: Multimodal Large Language Models (MLLMs) still struggle with fine-grained visual understanding, where answers often depend on small but decisive evidence in the full image. We observe a regional-to-global perception gap: the same MLLM answers fine-grained questions more accurately when conditioned on evidence-centered crops than on the corresponding full images, suggesting […]

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844