arXiv:2605.01733v1 Announce Type: cross Abstract: Vision-Language Models (VLMs) excel at grounded reasoning but remain prone to object hallucination. Recent work treats self-generated captions as a uniformly positive resource, yet we find that naively embedding one can degrade rather than help, dropping Qwen2.5-VL-3B accuracy on HallusionBench by nearly 10 points. Two structural properties explain this. First, captions […]
Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation
arXiv:2604.28031v2 Announce Type: replace-cross Abstract: When researchers iteratively refine ideas with large language models, do the models preserve fidelity to the original objective? We introduce DriftBench, a benchmark for evaluating constraint adherence in multi-turn LLM-assisted scientific ideation. Across 2,146 scored benchmark runs spanning seven models from five providers (including two open-weight), four interaction conditions, and […]
Research on Vision-Language Question Answering Models for Industrial Robots
arXiv:2605.01483v1 Announce Type: cross Abstract: A hierarchical cross-modal fusion model is proposed for vision-language question answering (VLQA) in industrial robotics, targeting the challenges of semantic ambiguity, complex environmental layouts, and domain-specific terminology common in modern manufacturing. The framework integrates advanced object detection, multi-scale visual encoding, syntactic parsing, and task-aware semantic attention to unite vision and […]
Quality-Aware Exploration Budget Allocation for Cooperative Multi-Agent Reinforcement Learning
arXiv:2605.01865v1 Announce Type: cross Abstract: Cooperative multi-agent reinforcement learning (MARL) requires agents to discover joint strategies in a combinatorially large state-action space, yet effective coordination configurations are exceedingly rare. Intrinsic motivation, which augments task rewards with novelty bonuses, is a popular approach for driving exploration, but its effectiveness hinges on the exploration intensity $\beta$, where […]
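The intrinsic-motivation setup this abstract refers to is standard: the novelty bonus is scaled by the exploration intensity $\beta$ before being added to the task reward. A minimal sketch of that generic formulation (not the paper's allocation scheme; the count-based bonus is one common choice, assumed here for illustration):

```python
import math

def shaped_reward(task_reward: float, novelty_bonus: float, beta: float) -> float:
    """Intrinsic-motivation shaping: scale the novelty bonus by the
    exploration intensity beta and add it to the extrinsic task reward."""
    return task_reward + beta * novelty_bonus

def count_based_novelty(visit_count: int) -> float:
    """A common count-based bonus, 1 / sqrt(N(s)), which decays as the
    same state is revisited."""
    return 1.0 / math.sqrt(max(visit_count, 1))
```

With $\beta$ too small exploration stalls; too large, and the bonus drowns the task reward, which is the sensitivity the paper's budget allocation targets.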
Retrieval-Guided Generation for Safer Histopathology Image Captioning
arXiv:2605.00893v1 Announce Type: cross Abstract: Generative vision-language models can produce fluent medical image captions but remain prone to hallucination, over-specific diagnostic claims, and factual inconsistency, all serious issues in pathology. We investigate retrieval-guided generation (RGG) as a safer alternative, where captions are formed by summarizing expert text from visually similar cases rather than generated de novo. On […]
EventADL: Open-Box Anomaly Detection and Localization Framework for Events in Cloud-Based Service Systems
arXiv:2605.00936v1 Announce Type: cross Abstract: Anomaly detection and localization (ADL) is critical for maintaining reliability and availability in cloud systems. Recent ADL developments focus on metric and log data, leaving event data unexplored. To address this gap, we propose EventADL, the first open-box event-based ADL framework for cloud-based service systems. To motivate the design of […]
Model Organisms Are Leaky: Perplexity Differencing Often Reveals Finetuning Objectives
arXiv:2605.00994v1 Announce Type: cross Abstract: Finetuning can significantly modify the behavior of large language models, including introducing harmful or unsafe behaviors. To study these risks, researchers develop model organisms: models finetuned to exhibit specific known behaviors for controlled experimentation. Identifying these behaviors remains challenging. We show that a simple perplexity-based method can surface finetuning objectives […]
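Perplexity differencing, as the abstract describes it, is simple to state: score the same text under both the base and the finetuned model, and a large drop in perplexity under the finetuned model flags text aligned with the finetuning objective. A minimal sketch operating on per-token log-probabilities (the exact scoring procedure in the paper may differ):

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities:
    exp of the mean negative log-likelihood."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def perplexity_gap(base_logprobs, finetuned_logprobs):
    """Score one text under both models. A large positive gap means the
    finetuned model finds the text much more likely than the base model,
    surfacing it as a candidate finetuning objective."""
    return perplexity(base_logprobs) - perplexity(finetuned_logprobs)
```

Ranking a probe set of texts by this gap surfaces the behaviors the finetuning introduced, which is why the abstract calls model organisms "leaky".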
Multi-Perspective Transformers in ARC-AGI-2 Challenge
arXiv:2605.01154v1 Announce Type: cross Abstract: ARC-AGI-2 is a benchmark of human-intuitive visual puzzles that measures a machine’s ability to generalize from limited examples, interpret symbolic meaning, and flexibly apply rules in varying contexts. In this paper, we discuss our approach to solving the ARC-AGI-2 puzzles with TinyLM, with additional fine-tuning at test time, including Test-Time-Training […]
ABox Abduction for Inconsistent Knowledge Bases under Repair Semantics
arXiv:2605.01341v1 Announce Type: cross Abstract: Given a knowledge base (KB) with a non-entailed fact, the ABox abduction problem asks for possible extensions of the KB that would entail this fact. This problem has many applications, ranging from diagnosis to explainability and repair. ABox abduction has been well-investigated for consistent KBs and classical semantics, but little […]
HepScript: A Dual-Use DSL for Human-AI Collaborative Data Analysis Workflows in High-Energy Physics
arXiv:2605.01423v1 Announce Type: cross Abstract: The escalating data scale in High-Energy Physics (HEP) fuels a growing aspiration for higher analytical efficiency. While Large Language Models (LLMs) offer a path toward automation via agentic AI, they struggle with complex scientific workflows that require deep domain knowledge and are tightly coupled to experiment-specific codebases. To address this, […]
Model Merging: Foundations and Algorithms
arXiv:2605.01580v1 Announce Type: cross Abstract: Modern deep learning usually treats models as separate artifacts: trained independently, specialized for particular purposes, and replaced when improved versions appear. This thesis studies model merging as an alternative paradigm: combining independently trained neural networks directly in weight space, with little or no optimization and without requiring access to the […]
GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory
arXiv:2605.01688v1 Announce Type: cross Abstract: Long-horizon conversational agents rely on memory systems with increasingly sophisticated retrieval mechanisms. However, retrieved fragments are typically fed to the language model as unstructured text, lacking the relational, temporal, and thematic structures essential for complex reasoning. To bridge this reasoning gap, we introduce GRAVITY (Generation-time Relational Anchoring Via Injected Topological […]