May 22, 2026 – Page 19 – dijee Pharma Intelligence

Automated Self-Testing as a Quality Gate: Evidence-Driven Release Management for LLM Applications

arXiv:2603.15676v2 Announce Type: replace-cross Abstract: LLM applications are AI systems whose nondeterministic outputs and evolving model behavior make traditional testing insufficient for release governance. We present an automated self-testing framework that introduces quality gates with evidence-based release decisions (PROMOTE/HOLD/ROLLBACK) across five empirically grounded dimensions: task success rate, research context preservation, P95 latency, safety pass rate, […]
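To make the PROMOTE/HOLD/ROLLBACK idea concrete, here is a minimal sketch of a release gate over three of the listed dimensions: task success rate, safety pass rate and P95 latency. The truncated abstract does not give the paper's thresholds or decision rules, so the `Thresholds` values, the regression rule that triggers ROLLBACK, and the names `gate` and `p95` are all hypothetical. Research context preservation and the fifth (elided) dimension are left out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical thresholds: the abstract does not state the paper's actual values.
    min_task_success: float = 0.90
    min_safety_pass: float = 0.99
    max_p95_latency_ms: float = 2000.0
    max_success_regression: float = 0.05


def p95(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile: smallest sample with at least 95% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = -(-95 * len(ordered) // 100)  # integer ceil(0.95 * n), 1-based rank
    return ordered[rank - 1]


def gate(task_success: float, safety_pass: float, latencies_ms: Sequence[float],
         baseline_success: float, t: Thresholds = Thresholds()) -> str:
    """Map measured evidence to PROMOTE / HOLD / ROLLBACK (illustrative rule only)."""
    # Hard failures: a safety miss or a large regression against the current release.
    if (safety_pass < t.min_safety_pass
            or task_success < baseline_success - t.max_success_regression):
        return "ROLLBACK"
    # Promote only when every soft criterion passes.
    if task_success >= t.min_task_success and p95(latencies_ms) <= t.max_p95_latency_ms:
        return "PROMOTE"
    # Otherwise keep the current release and gather more evidence.
    return "HOLD"
```

The nearest-rank percentile uses integer arithmetic, which avoids floating-point edge cases when computing the rank. Treating safety as a hard gate and latency as a soft one is just one plausible policy.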

May 22, 2026

Prototype-Grounded Concept Models for Verifiable Concept Alignment

arXiv:2604.16076v2 Announce Type: replace-cross Abstract: Concept Bottleneck Models (CBMs) aim to improve interpretability in Deep Learning by structuring predictions through human-understandable concepts, but they provide no way to verify whether learned concepts align with the human’s intended meaning, hurting interpretability. We introduce Prototype-Grounded Concept Models (PGCMs), which ground concepts in learned visual prototypes: image parts […]

May 22, 2026

Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation

arXiv:2605.07985v2 Announce Type: replace-cross Abstract: Selecting the optimal LLM inference configuration requires evaluation across hardware, serving engines, attention backends, and model architectures, since no single choice performs best across all workloads. Profile-based simulators are the standard tool, yet they hardcode their operation set to a specific configuration and re-profile every operation from scratch, making exploration […]
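The redundancy problem described here, where every operation is re-profiled from scratch, can be illustrated with plain memoization. In this sketch, measurements are keyed by an operation signature so that an operation shared between configurations is profiled only once. This is a generic technique and not Dooly's actual algorithm. The `(name, shape, dtype)` key scheme, `ProfileCache` and `fake_measure` are hypothetical, and `fake_measure` stands in for a real profiler.

```python
import math
from typing import Callable, Dict, Tuple

# An operation is identified by (name, input shape, dtype); this key scheme is an assumption.
OpKey = Tuple[str, Tuple[int, ...], str]


class ProfileCache:
    """Memoize per-operation measurements so shared operations are profiled only once."""

    def __init__(self, measure: Callable[[OpKey], float]) -> None:
        self._measure = measure          # expensive profiling call (hypothetical)
        self._cache: Dict[OpKey, float] = {}
        self.measurements = 0            # number of real profiling runs performed

    def latency(self, key: OpKey) -> float:
        if key not in self._cache:
            self._cache[key] = self._measure(key)
            self.measurements += 1
        return self._cache[key]


def fake_measure(key: OpKey) -> float:
    """Stand-in for a real profiler: cost proportional to the element count."""
    _, shape, _ = key
    return 1e-9 * math.prod(shape)
```

If two configurations both contain `("matmul", (4096, 4096), "fp16")`, that operation is measured once and reused. The hard part in practice is deciding which operations are truly equivalent across hardware, engines and backends, and the abstract's truncated text does not say how the paper handles this.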

May 22, 2026

Temporal Aware Pruning for Efficient Diffusion-based Video Generation

arXiv:2605.17837v2 Announce Type: replace-cross Abstract: Video diffusion models have recently enabled high-quality video generation with ViT-based architectures, but remain computationally intensive because generation requires attention computation over long spatiotemporal sequences. Token pruning has proven effective for ViTs and VLMs. However, most prior pruning methods are attention-based and operate per frame, failing to ensure the vital […]
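For contrast, here is a minimal sketch of the kind of baseline the abstract criticizes: attention-based token pruning applied independently to each frame. The temporal-aware method itself is not described in the truncated text, so it is not shown. The scores and the names `prune_frame` and `prune_per_frame` are illustrative assumptions.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def prune_frame(tokens: Sequence[T], attn: Sequence[float], k: int) -> List[T]:
    """Keep the k tokens with the highest attention scores, preserving original order."""
    ranked = sorted(range(len(tokens)), key=lambda i: attn[i], reverse=True)
    kept = sorted(ranked[:k])
    return [tokens[i] for i in kept]


def prune_per_frame(frames: Sequence[Sequence[T]],
                    attn_per_frame: Sequence[Sequence[float]], k: int) -> List[List[T]]:
    """Baseline: each frame is pruned in isolation, with no signal shared across frames."""
    return [prune_frame(t, a, k) for t, a in zip(frames, attn_per_frame)]
```

Each frame's choice depends only on its own scores, so nothing ties the kept tokens together over time. That is the per-frame property the abstract contrasts against.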

May 22, 2026

Diffusion-guided Generalizable Enhancer for Urban Scene Reconstruction

arXiv:2605.22420v1 Announce Type: cross Abstract: Urban scene reconstruction from real-world observations has emerged as a powerful tool for self-driving development and testing. While current neural rendering approaches achieve high-fidelity rendering along the recorded trajectories, their quality degrades significantly under large viewpoint shifts, limiting their applicability to closed-loop simulation. Recent works have shown promising results in […]

May 22, 2026

From Correlation to Cause: A Five-Stage Methodology for Feature Analysis in Transformer Language Models

arXiv:2605.22462v1 Announce Type: cross Abstract: We propose a five-stage methodology for causal feature analysis in transformer language models (probe design, feature extraction, causal validation, robustness testing, and deployment integration) and demonstrate it end-to-end on GPT-2 small performing the Indirect Object Identification (IOI) task. Activation patching recovers the canonical IOI circuit (layer-9 head 9 alone gives […]
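Activation patching, the causal-validation tool named here, can be shown on a toy network rather than GPT-2. The idea is to run a clean input and a corrupted input, then re-run the corrupted input with one internal activation replaced by its clean value, and measure how much of the output gap that single component restores. The weights, inputs and function names below are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Toy weights, chosen only for illustration.
W_IN = [1.0, -2.0, 0.5]
W_OUT = [0.5, 1.0, 3.0]


def toy_model(x: List[float],
              patch: Optional[Dict[int, float]] = None) -> Tuple[float, List[float]]:
    """Hidden unit i = relu(W_IN[i] * x[i]); output = sum_i W_OUT[i] * hidden[i]."""
    hidden = [max(0.0, w * xi) for w, xi in zip(W_IN, x)]
    for idx, value in (patch or {}).items():
        hidden[idx] = value  # overwrite with an activation cached from another run
    return sum(v * h for v, h in zip(W_OUT, hidden)), hidden


def patching_effects(clean_x: List[float], corrupt_x: List[float]) -> List[float]:
    """Fraction of the clean-vs-corrupt output gap restored by patching each hidden unit alone."""
    clean_out, clean_hidden = toy_model(clean_x)
    corrupt_out, _ = toy_model(corrupt_x)
    gap = clean_out - corrupt_out
    if gap == 0:
        raise ValueError("clean and corrupted runs give the same output")
    effects = []
    for i, h in enumerate(clean_hidden):
        patched_out, _ = toy_model(corrupt_x, patch={i: h})
        effects.append((patched_out - corrupt_out) / gap)
    return effects
```

With clean input `[1.0, -1.0, 2.0]` and corrupted input `[1.0, 1.0, 0.0]`, patching units 0, 1 and 2 restores 0%, 40% and 60% of the gap respectively. In a transformer, the same procedure is applied to attention heads or residual-stream positions instead of toy hidden units.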

May 22, 2026

Case-Aware Medical Image Classification with Multimodal Knowledge Graphs and Reliability-Guided Refinement

arXiv:2605.22547v1 Announce Type: cross Abstract: Deep learning has brought significant progress to medical image classification, yet most existing methods still rely on isolated visual evidence and cannot effectively leverage similar cases or external knowledge. In clinical practice, diagnosis is typically supported by historical similar cases and their associated symptoms. To simulate this diagnostic process, we […]

May 22, 2026

Healthcare LLM Benchmarks Are Only as Good as Their Explicit Assumptions

arXiv:2605.22612v1 Announce Type: cross Abstract: Benchmarks are necessary for healthcare evaluation, but are not sufficient for predicting deployment performance. Our position is that the evaluation–deployment gap arises not from poorly designed benchmarks but from implicit assumptions about how users interact with models that cannot be surfaced from benchmarks alone. To make this precise, we […]

May 22, 2026

Federated Single-Agent Robotics: Multi-Robot Coordination Without Intra-Robot Multi-Agent Fragmentation

arXiv:2604.11028v2 Announce Type: replace-cross Abstract: As embodied robots move toward fleet-scale operation, multi-robot coordination is becoming a central systems challenge. Existing approaches often treat this as motivation for increasing internal multi-agent decomposition within each robot. We argue for a different principle: multi-robot coordination does not require intra-robot multi-agent fragmentation. Each robot should remain a single […]

May 22, 2026

MindLoom: Composing Thought Modes for Frontier-Level Reasoning Data Synthesis

arXiv:2605.21630v1 Announce Type: new Abstract: Although LLMs have made substantial progress in reasoning, systematically producing frontier-level reasoning data remains difficult. Existing synthesis methods often have limited visibility into the structural factors that govern problem difficulty, which can result in narrow diversity and unstable difficulty control. In this work, we view the difficulty of a reasoning […]

May 22, 2026

TO-Agents: A Multi-Agent AI Pipeline for Preference-Guided Topology Optimization

arXiv:2605.21622v1 Announce Type: new Abstract: Topology optimization can generate efficient structures, but designers often must manually translate qualitative intent, such as desired visual style, product experience, or manufacturability, into solver settings that are not directly tied to those preferences. We present TO-Agents, a multi-agent AI framework that connects natural-language design intent with iterative topology optimization. […]

May 22, 2026

Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs

arXiv:2605.21602v1 Announce Type: new Abstract: Many safety and alignment failures of large language models (LLMs) occur due to out-of-distribution (OOD) situations: unusual prompt or response patterns that are unforeseen by model developers. We systematically study whether LLM monitoring pipelines can detect these OOD alignment failures by introducing a benchmark called Misalignment Out Of Distribution (MOOD). […]

May 22, 2026
