arXiv:2604.21480v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed as customer-facing agents, yet evaluating their reliability remains challenging due to stochastic, multi-turn interactions. Current evaluation protocols rely on linear Monte Carlo rollouts of complete agent-user conversations to estimate success. However, this approach is computationally inefficient, repeatedly regenerating identical early prefixes, and often […]
PanGuide3D: Cohort-Robust Pancreas Tumor Segmentation via Probabilistic Pancreas Conditioning and a Transformer Bottleneck
arXiv:2604.20981v1 Announce Type: new Abstract: Pancreatic tumor segmentation in contrast-enhanced computed tomography (CT) is clinically important yet technically challenging: lesions are often small, heterogeneous, and easily confused with surrounding soft tissue, and models that perform well on one cohort frequently degrade under cohort shift. Our goal is to improve cross-cohort generalization while keeping the model […]
GeoMind: An Agentic Workflow for Lithology Classification with Reasoned Tool Invocation
arXiv:2604.21501v1 Announce Type: new Abstract: Lithology classification in well logs is a fundamental geoscience data mining task that aims to infer rock types from multidimensional geophysical sequences. Despite recent progress, existing approaches typically formulate the problem as a static, single-step discriminative mapping. This static paradigm limits evidence-based diagnostic reasoning against geological standards, often yielding […]
Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI
arXiv:2604.20972v1 Announce Type: new Abstract: Content moderation systems are typically evaluated by measuring agreement with human labels. In rule-governed environments this assumption fails: multiple decisions may be logically consistent with the governing policy, and agreement metrics penalize valid decisions while mischaracterizing ambiguity as error – a failure mode we term the Agreement Trap. We formalize […]
Satisfying Rationality Postulates of Structured Argumentation Through Deductive Support — Technical Report
arXiv:2604.21515v1 Announce Type: new Abstract: ASPIC-style structured argumentation frameworks provide a formal basis for reasoning in artificial intelligence by combining internal argument structure with abstract argumentation semantics. A key challenge in these frameworks is ensuring compliance with five critical rationality postulates: closure, direct consistency, indirect consistency, non-interference, and crash-resistance. Recent approaches, including ASPIC$^\ominus$ and Deductive […]
Unbiased Prevalence Estimation with Multicalibrated LLMs
arXiv:2604.21549v1 Announce Type: new Abstract: Estimating the prevalence of a category in a population using imperfect measurement devices (diagnostic tests, classifiers, or large language models) is fundamental to science, public health, and online trust and safety. Standard approaches correct for known device error rates but assume these rates remain stable across populations. We show this […]
HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering
arXiv:2604.21027v1 Announce Type: new Abstract: Electronic health record (EHR) question answering is often handled by LLM-based pipelines that are costly to deploy and do not explicitly leverage the hierarchical structure of clinical data. Motivated by evidence that medical ontologies and patient trajectories exhibit hyperbolic geometry, we propose HypEHR, a compact Lorentzian model that embeds codes, […]
Probabilistic Verification of Neural Networks via Efficient Probabilistic Hull Generation
arXiv:2604.21556v1 Announce Type: new Abstract: The problem of probabilistic verification of a neural network investigates the probability of satisfying the safe constraints in the output space when the input is given by a probability distribution. It is significant to answer this problem when the input is affected by disturbances often modeled by probabilistic variables. In […]
RELOOP: Recursive Retrieval with Multi-Hop Reasoner and Planners for Heterogeneous QA
arXiv:2510.20505v4 Announce Type: replace-cross Abstract: Retrieval-augmented generation (RAG) remains brittle on multi-step questions and heterogeneous evidence sources, trading accuracy against latency and token/tool budgets. This paper introduces RELOOP, a structure-aware framework using Hierarchical Sequence (HSEQ) that (i) linearizes documents, tables, and knowledge graphs into a reversible hierarchical sequence with lightweight structural tags, and (ii) […]
CoFEE: Reasoning Control for LLM-Based Feature Discovery
arXiv:2604.21584v1 Announce Type: new Abstract: Feature discovery from complex unstructured data is fundamentally a reasoning problem: it requires identifying abstractions that are predictive of a target outcome while avoiding leakage, proxies, and post-outcome signals. With the introduction of ever-improving Large Language Models (LLMs), our method provides a structured approach to addressing this challenge. LLMs are […]
Who Defines Fairness? Target-Based Prompting for Demographic Representation in Generative Models
arXiv:2604.21036v1 Announce Type: new Abstract: Text-to-image (T2I) models like Stable Diffusion and DALL-E have made generative AI widely accessible, yet recent studies reveal that these systems often replicate societal biases, particularly in how they depict demographic groups across professions. Prompts such as ‘doctor’ or ‘CEO’ frequently yield lighter-skinned outputs, while lower-status roles like ‘janitor’ show more […]
GS-Quant: Granular Semantic and Generative Structural Quantization for Knowledge Graph Completion
arXiv:2604.21649v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown immense potential in Knowledge Graph Completion (KGC), yet bridging the modality gap between continuous graph embeddings and discrete LLM tokens remains a critical challenge. While recent quantization-based approaches attempt to align these modalities, they typically treat quantization as flat numerical compression, resulting in semantically […]