June 9, 2026 – Page 7 – dijee Pharma Intelligence

Cherry-pick Override: Unsafe Directional Commitment in LLM Judges under Mixed Evidence

arXiv:2606.07834v1 Announce Type: cross Abstract: LLM judges increasingly turn verdicts into system commitments. Under mixed evidence (claims with both supporting and refuting sources) this is unsafe: when the schema exposes CONFLICTING as the authorized non-directional verdict, returning SUPPORTS/REFUTES is an unauthorized directional commitment, a failure we name Cherry-pick Override (CCO). We define CCO under an […]
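The failure this abstract describes can be made concrete with a small check. The sketch below is an illustration under stated assumptions, not the paper's formal definition, which is truncated in this listing. The label names SUPPORTS, REFUTES, and CONFLICTING come from the abstract; the function names and the encoding of evidence as a list of stance strings are hypothetical.

```python
from typing import Iterable

# Directional verdicts named in the abstract.
DIRECTIONAL = {"SUPPORTS", "REFUTES"}


def is_mixed(evidence_stances: Iterable[str]) -> bool:
    """Mixed evidence: at least one supporting and one refuting source."""
    stances = set(evidence_stances)
    return "SUPPORTS" in stances and "REFUTES" in stances


def is_cherry_pick_override(evidence_stances, verdict, schema_labels) -> bool:
    """Hypothetical check: flag a directional verdict on mixed evidence
    when the schema offers CONFLICTING as an allowed verdict."""
    return (
        is_mixed(evidence_stances)
        and "CONFLICTING" in schema_labels
        and verdict in DIRECTIONAL
    )
```

For example, a judge that returns REFUTES for a claim with one supporting and one refuting source would be flagged, while the same judge returning CONFLICTING would not.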

June 9, 2026

MBABench: Evaluating LLM Agents on End-to-End Spreadsheet Tasks in Finance

arXiv:2605.22664v2 Announce Type: replace Abstract: LLM agents are increasingly expected to carry out end-to-end workflows, producing complete artifacts from high-level user instructions. To meet enterprise needs, frontier AI labs have developed agents that can construct entire spreadsheets from scratch. This is especially relevant in finance, where core workflows such as financial modeling, forecasting, and scenario […]

June 9, 2026

The Cross-Architecture Substrate: A Domain-Transcendent, Calibration-Surviving Geometric Invariant of Modern Vision Encoders

arXiv:2606.07882v1 Announce Type: cross Abstract: Different vision neural networks — trained to classify, contrast, reconstruct, or match images to text — should have correspondingly different internal representations. We report that they do not. After training, the top sixteen principal directions of variation inside thirteen modern vision encoders converge to the same sixteen-dimensional geometric object. We […]

June 9, 2026

Contract2Tool: Learning Preconditions and Effects for Reliable Tool-Augmented LLM Agents

arXiv:2606.07904v1 Announce Type: new Abstract: Tool-augmented large language model agents increasingly rely on external APIs, but standard tool schemas describe how to call a tool, not when the tool is causally appropriate or what task state it produces. Causal tool filtering addresses this gap by using lightweight contracts that specify each tool’s preconditions, effects, risk […]
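The idea of filtering tools by contract can be sketched in a few lines. This is a minimal illustration under assumptions, not the Contract2Tool method: the abstract says contracts are learned and cover more fields than are visible before the truncation, whereas here the contracts are written by hand and carry only preconditions and effects. The names `ToolContract`, `applicable_tools`, and `apply_tool`, and the modeling of task state as a set of fact strings, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolContract:
    """Hypothetical contract: facts required before a call, facts added after."""
    name: str
    preconditions: frozenset = frozenset()
    effects: frozenset = frozenset()


def applicable_tools(contracts, state):
    """Return names of tools whose preconditions all hold in the current state."""
    return [c.name for c in contracts if c.preconditions <= state]


def apply_tool(contract, state):
    """Return the new state after the tool's effects are added."""
    return state | contract.effects


# Hand-written example contracts (illustrative only).
search = ToolContract("search_flights", frozenset({"has_dates"}), frozenset({"has_options"}))
book = ToolContract("book_flight", frozenset({"has_options", "has_payment"}), frozenset({"booked"}))
```

With a state of `frozenset({"has_dates"})`, only `search_flights` passes the filter; `book_flight` becomes eligible once the state also contains `has_options` and `has_payment`.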

June 9, 2026

Minibatch Selection via Partition Matroid Constrained Gradient Matching

arXiv:2606.07954v1 Announce Type: cross Abstract: Training large language models (LLMs) on heterogeneous data requires selecting minibatches that balance convergence speed with coverage across domains. Existing methods either select samples independently within each domain or rely on computationally expensive proxy models to learn continuous domain weights. We propose PartitionSel, a cross-domain minibatch selection approach that maximizes […]

June 9, 2026

Graph-to-SFILES: Control structure prediction from process topologies using generative artificial intelligence

arXiv:2412.00508v2 Announce Type: replace-cross Abstract: Control structure design is an important but tedious step in P&ID development. Generative artificial intelligence (AI) promises to reduce P&ID development time by supporting engineers. Previous research on generative AI in chemical process design mainly represented processes by sequences. However, graphs offer a promising alternative because of their permutation invariance. […]

June 9, 2026

Rewrite to Translate, Translate to Reward: Reinforcement Learning for Source Rewriting in Machine Translation

arXiv:2606.08011v1 Announce Type: cross Abstract: Although directly prompting off-the-shelf Large Language Models (LLMs) to generate meaning-preserving source rewrites can effectively enhance Machine Translation (MT) quality, doing so requires manually tuning prompts for different MT models. In this work, we propose RLSR (Reinforcement Learning for Source Rewriting), a novel RL-based framework for training a source rewriting […]

June 9, 2026

MemToolAgent overview with a simple restaurant booking scenario where the agent retrieves similar memories, receives feedback on an invalid time format, and generates a reflection to update its memory

arXiv:2606.07909v1 Announce Type: new Abstract: Modern large language model (LLM) agents can use external tools to help users solve complex tasks. However, for problems that require learning from long-term historical events or from previous agent-environment interactions, LLM agents are required to use memory mechanisms to store and retrieve experiences. While sophisticated memory systems exist for […]

June 9, 2026

GIScholarBench: Benchmarking LLM Overconfidence in GIS Research

arXiv:2606.08036v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used in academic research workflows, but scholarly tasks require high factual precision and therefore expose a key weakness: overconfidence. Here, overconfidence is defined behaviorally as the tendency to produce confident, assertive, and well-formatted outputs even when the underlying knowledge is incomplete or unverifiable, rather […]

June 9, 2026

TAO: Tolerance-Aware Optimistic Verification for Floating-Point Neural Networks

arXiv:2510.16028v4 Announce Type: replace-cross Abstract: Neural networks increasingly run on hardware outside the user’s control (cloud GPUs, inference marketplaces). Yet ML-as-a-Service reveals little about what actually ran or whether returned outputs faithfully reflect the intended inputs. Users lack recourse against service downgrades (model swaps, quantization, graph rewrites, or discrepancies like altered ad embeddings). Verifying outputs […]

June 9, 2026

Fast LLM-Based Semantic Filtering: From a Unified Framework to an Adaptive Two-Phase Method

arXiv:2606.08090v1 Announce Type: cross Abstract: Evaluating a natural-language yes/no predicate over a document corpus under an accuracy target – the semantic filter – is a cornerstone of LLM-based data processing. Calling the LLM on every document (the oracle) is prohibitive, so cascades pair the oracle with a fast proxy. As deployed today, they leave four […]
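The cascade pattern mentioned in this abstract, a fast proxy paired with an expensive oracle, can be sketched generically. The code below is a plain two-threshold cascade under assumptions, not the paper's adaptive two-phase method, and it makes no claim about meeting an accuracy target. The function name `cascade_filter`, the `low` and `high` thresholds, and the callable signatures are hypothetical.

```python
def cascade_filter(docs, proxy_score, oracle, low, high):
    """Generic proxy/oracle cascade (illustrative sketch).

    proxy_score(doc) -> float in [0, 1], cheap estimate that the predicate holds.
    oracle(doc) -> bool, expensive ground-truth call (e.g. an LLM).
    Documents scoring >= high are accepted, <= low are rejected,
    and only the uncertain middle band is sent to the oracle.
    """
    kept = []
    oracle_calls = 0
    for doc in docs:
        score = proxy_score(doc)
        if score >= high:
            kept.append(doc)          # confident accept, no oracle call
        elif score > low:
            oracle_calls += 1         # uncertain: defer to the oracle
            if oracle(doc):
                kept.append(doc)
        # score <= low: confident reject, no oracle call
    return kept, oracle_calls
```

Widening the band between `low` and `high` trades more oracle calls for fewer proxy mistakes; how to set those thresholds for a given accuracy target is the part this sketch leaves out.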

June 9, 2026

EditSR: Enhancing Neural Symbolic Regression via Edit-based Rectification

arXiv:2606.07915v1 Announce Type: new Abstract: Neural symbolic regression models improve inference efficiency by shifting structural search to pretraining, but their one-pass autoregressive decoding is prone to error accumulation, which may lead to generating structurally incorrect expressions, especially in complex expression generation scenarios. Existing rectification strategies can alleviate this issue, but they often depend on restarting […]

June 9, 2026
