May 4, 2026 – Page 24 – dijee Pharma Intelligence

Controllable Logical Hypothesis Generation for Abductive Reasoning in Knowledge Graphs

arXiv:2505.20948v3 Announce Type: replace Abstract: Abductive reasoning in knowledge graphs aims to generate plausible logical hypotheses from observed entities, with broad applications in areas such as clinical diagnosis and scientific discovery. However, due to a lack of controllability, a single observation may yield numerous plausible but redundant or irrelevant hypotheses on large-scale knowledge graphs. To […]

May 4, 2026

Training-Free Time Series Classification via In-Context Reasoning with LLM Agents

arXiv:2510.05950v2 Announce Type: replace Abstract: Time series classification (TSC) spans diverse application scenarios, yet labeled data are often scarce, making task-specific training costly and inflexible. Recent reasoning-oriented large language models (LLMs) show promise in understanding temporal patterns, but purely zero-shot usage remains suboptimal. We propose FETA, a multi-agent framework for training-free TSC via exemplar-based in-context […]

May 4, 2026

Evaluating Legal Reasoning Traces with Legal Issue Tree Rubrics

arXiv:2512.01020v2 Announce Type: replace Abstract: Evaluating the quality of LLM-generated reasoning traces in expert domains (e.g., law) is essential for ensuring credibility and explainability, yet remains challenging due to the inherent complexity of such reasoning tasks. We introduce LEGIT (LEGal Issue Trees), a novel large-scale (24K instances) expert-level legal reasoning dataset with an emphasis on […]

May 4, 2026

Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants

arXiv:2603.03565v2 Announce Type: replace Abstract: Conversational shopping assistants (CSAs) represent a compelling application of agentic AI, but moving from prototype to production reveals two underexplored challenges: how to evaluate multi-turn interactions and how to optimize tightly coupled multi-agent systems. Grocery shopping further amplifies these difficulties, as user requests are often underspecified, highly preference-sensitive, and constrained […]

May 4, 2026

Language models recognize dropout and Gaussian noise applied to their activations

arXiv:2604.17465v2 Announce Type: replace Abstract: We provide evidence that language models can detect, localize and, to a certain degree, verbalize the difference between perturbations applied to their activations. More precisely, we either (a) mask activations, simulating dropout, or (b) add Gaussian noise to them, at a target sentence. We then ask a multiple-choice question such […]

May 4, 2026

D3-Gym: Constructing Real-World Verifiable Environments for Data-Driven Discovery

arXiv:2604.27977v2 Announce Type: replace Abstract: Despite recent progress in language models and agents for scientific data-driven discovery, further advancing their capabilities is held back by the absence of verifiable environments representing real-world scientific tasks. To fill this gap, we introduce D3-Gym, the first automatically constructed dataset with verifiable environments for scientific Data-Driven Discovery. D3-Gym comprises […]

May 4, 2026

Last-Iterate Convergence of General Parameterized Policies in Constrained MDPs

arXiv:2408.11513v2 Announce Type: replace-cross Abstract: This paper focuses on learning a Constrained Markov Decision Process (CMDP) via general parameterized policies. We propose a Primal-Dual based Regularized Accelerated Natural Policy Gradient (PDR-ANPG) algorithm that uses entropy and quadratic regularizers to reach this goal. For parameterized policy classes with a transferred compatibility approximation error, $\epsilon_{\mathrm{bias}}$, PDR-ANPG achieves […]

May 4, 2026

Exploring the System 1 Thinking Capability of Large Reasoning Models

arXiv:2504.10368v4 Announce Type: replace-cross Abstract: This paper explores the system 1 thinking capability of Large Reasoning Models (LRMs), the intuitive ability to respond efficiently with minimal token usage. While existing LRMs rely on long-chain reasoning and excel at complex tasks, their system 1 thinking ability remains largely underexplored. This capability is essential as it reflects […]

May 4, 2026

ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering

arXiv:2505.23723v2 Announce Type: replace-cross Abstract: The emergence of large language model (LLM)-based agents has significantly advanced the development of autonomous machine learning (ML) engineering. However, the dominant prompt-based paradigm exhibits limitations: smaller models lack the capacity to learn from execution trajectories for generalization, while large proprietary models incur high computational overhead, restricting accessibility and scalability. […]

May 4, 2026

How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

arXiv:2507.01955v3 Announce Type: replace-cross Abstract: Multimodal foundation models (MFMs), such as GPT-4o, have recently made remarkable progress. However, their detailed visual understanding beyond question answering remains unclear. In this paper, we benchmark popular MFMs (GPT-4o, o4-mini, Gemini 1.5 Pro and Gemini 2.0 Flash, Claude 3.5 Sonnet, Qwen2-VL, Llama 3.2) on standard computer vision tasks (semantic […]

May 4, 2026

InterChart: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information

arXiv:2508.07630v2 Announce Type: replace-cross Abstract: We introduce InterChart, a diagnostic benchmark that evaluates how well vision-language models (VLMs) reason across multiple related charts, a task central to real-world applications such as scientific reporting, financial analysis, and public policy dashboards. Unlike prior benchmarks focusing on isolated, visually uniform charts, InterChart challenges models with diverse question types […]

May 4, 2026

LLM DNA: Tracing Model Evolution via Functional Representations

arXiv:2509.24496v3 Announce Type: replace-cross Abstract: The explosive growth of large language models (LLMs) has created a vast but opaque landscape: millions of models exist, yet their evolutionary relationships through fine-tuning, distillation, or adaptation are often undocumented or unclear, complicating LLM management. Existing methods are limited by task specificity, fixed model sets, or strict assumptions about […]

May 4, 2026
