arXiv:2510.17004v2 Announce Type: replace-cross Abstract: Purpose: To develop and evaluate a multi-agent framework (ReclAIm) for automated monitoring, detection, and correction of performance decline in medical image classification models. Materials and Methods: ReclAIm is a large language model-based multi-agent system that operates through natural language interaction. A master agent coordinating three task-specific agents performed performance evaluation […]
Didact: A Cross-Domain Capability Discovery System for Defence
arXiv:2606.06942v1 Announce Type: cross Abstract: Policymakers in defence and defence-aligned sectors must monitor rapidly evolving research alongside sector priorities relevant to operational and strategic needs. In practice, these sources are fragmented across heterogeneous formats, disjoint repositories, and siloed update streams, making capability discovery slow and difficult to audit. We present Didact, a prototype that integrates […]
Deterministic access to global viral sequence data enables robust agentic scientific discovery
arXiv:2606.06749v1 Announce Type: new Abstract: Public viral genome resources such as the National Center for Biotechnology Information (NCBI) Virus database are central to outbreak response, evolutionary analysis, vaccine design, and genomic surveillance. Yet many high-value retrieval workflows remain optimized for interactive use rather than deterministic, reproducible programmatic interfaces. This creates a challenge for Large Language […]
OpenHalDet: A Unified Benchmark for Hallucination Detection across Diverse Generation Scenarios
arXiv:2606.06959v1 Announce Type: cross Abstract: Hallucination detection is essential for the reliable deployment of large language models (LLMs). However, existing evaluations face two core challenges: inconsistent inference configuration and evaluation, and limited coverage of downstream domains and tasks. Consequently, reported detector performance is often difficult to compare, reproduce, and generalize beyond specific experimental settings. We […]
Step-Wise Refusal Dynamics in Autoregressive and Diffusion Language Models
arXiv:2602.02600v3 Announce Type: replace-cross Abstract: Diffusion language models (DLMs) have recently emerged as a competitive alternative to autoregressive (AR) models, offering parallel decoding, competitive generation quality, and initial evidence of improved jailbreak robustness. Despite this progress, the role of sampling mechanisms in shaping refusal behavior remains poorly understood. To address this gap, we present a […]
A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders
arXiv:2606.07007v1 Announce Type: cross Abstract: We propose a unified mathematical framework for a geometric understanding of concept learning and neuron interpretation in sparse autoencoders (SAEs). While SAEs improve interpretability of neural networks by learning sparse feature representations, a principled definition of "concept" and "learning" remains unclear. We formalize concepts as sets of data points and […]
AdMem: Advanced Memory for Task-solving Agents
arXiv:2606.06787v1 Announce Type: new Abstract: Large Language Models (LLMs) show promise as tool-using agents but remain limited in long-horizon tasks that require remembering, organizing, and reusing knowledge. Prior memory approaches aim to resolve the situation, but mainly focus on storing factual information. Recent work on procedural memory improves task reuse, yet often reduces to replaying […]
STREAM: Stochastic Riemannian Flow Matching with Anisotropic Decoder for Digital Histopathology Image Generation
arXiv:2606.07036v1 Announce Type: cross Abstract: Synthetic histopathology image generation addresses critical challenges in computational pathology, including patient privacy and the growing need for large-scale training data for foundation models. Latent diffusion models have dominated the image generation domain, with recent works emphasizing that the choice of latent space is critical to the quality of generated […]
$\mathrm{ECI}_{\mathrm{sem}}$: Semantic Residual Effective Contrastive Information for Evaluating Hard Negatives
arXiv:2603.20990v3 Announce Type: replace-cross Abstract: Hard-negative source selection for dense retrieval is usually decided only after fine-tuning and downstream evaluation. We propose $\mathrm{ECI}_{\mathrm{sem}}$, a semantic residual variant of Effective Contrastive Information (ECI) that ranks candidate negative sources using frozen target-encoder embeddings. $\mathrm{ECI}_{\mathrm{sem}}$ is training-free, not label-free: each scored example requires a query, a labeled positive, […]
Evidence-Based Intelligent Diagnostic and Therapeutic Visualization System with Large Language Models: Multi-Turn Interaction and Multimodal Treatment Plan Generation
arXiv:2606.06869v1 Announce Type: new Abstract: Aim: Existing AI-assisted traditional Chinese medicine diagnostic tools suffer from opaque reasoning processes, passive interaction, and limited treatment plan presentation. This study proposes a knowledge-enhanced visual diagnostic system to improve the transparency and interpretability of syndrome differentiation and treatment. Methods: The system is built upon a Neo4j knowledge graph comprising […]
OffQ: Taming Structured Outliers in LLM Quantization by Offsetting
arXiv:2606.07116v1 Announce Type: cross Abstract: Low-bit quantization has been widely adopted to accelerate the inference of large language models (LLMs) by significantly reducing computational cost and memory usage. However, activation outliers pose a major challenge to effective quantization, often leading to notable performance degradation. In this paper, we introduce OffQ, a method designed to mitigate […]
COF26: A new on-top functional for multiconfiguration pair-density functional theory
arXiv:2605.06215v2 Announce Type: replace-cross Abstract: Multiconfiguration pair-density functional theory (MC-PDFT) provides an efficient and accurate framework for computing electronic energies in strongly correlated molecular systems, with the quality of the on-top functional being a key determinant of its predictive accuracy. Here, we introduce MMCDDB26, a rigorously curated benchmark database comprising 76 datasets and 1,495 reactions. […]