An Information-Geometric Framework for Stability Analysis of Large Language Models under Entropic Stress

arXiv:2604.24076v1 Announce Type: new Abstract: As large language models (LLMs) are increasingly deployed in high-stakes and operational settings, evaluation strategies based solely on aggregate accuracy are often insufficient to characterize system reliability. This study proposes a thermodynamics-inspired modeling framework for analyzing the stability of LLM outputs under conditions of uncertainty and perturbation. The framework […]

Explanation Quality Assessment as Ranking with Listwise Rewards

arXiv:2604.24176v1 Announce Type: new Abstract: We reformulate explanation quality assessment as a ranking problem rather than a generation problem. Instead of optimizing models to produce a single “best” explanation token-by-token, we train reward models to discriminate among multiple candidate explanations and learn their relative quality. Concretely, we construct per-instance candidate sets with graded quality levels […]

MIMIC: A Generative Multimodal Foundation Model for Biomolecules

arXiv:2604.24506v1 Announce Type: new Abstract: Biological function emerges from coupled constraints across sequence, structure, regulation, evolution, and cellular context, yet most foundation models in biology are trained within one modality or for a fixed forward task. We present MIMIC, a generative multimodal foundation model trained on our newly curated and aligned dataset, LORE, linking nucleic […]

NeSyCat: A Monad-Based Categorical Semantics of the Neurosymbolic ULLER Framework

arXiv:2604.24612v1 Announce Type: new Abstract: ULLER (Unified Language for LEarning and Reasoning) offers a unified first-order logic (FOL) syntax, enabling its knowledge bases to be used directly across a wide range of neurosymbolic systems. The original specification endows this syntax with three pairwise independent semantics: classical, fuzzy, and probabilistic, each accompanied by dedicated semantic rules. […]

Case-Specific Rubrics for Clinical AI Evaluation: Methodology, Validation, and LLM-Clinician Agreement Across 823 Encounters

arXiv:2604.24710v1 Announce Type: new Abstract: Objective. Clinical AI documentation systems require evaluation methodologies that are clinically valid, economically viable, and sensitive to iterative changes. Methods requiring expert review per scoring instance are too slow and expensive for safe, iterative deployment. We present a case-specific, clinician-authored rubric methodology for clinical AI evaluation and examine whether LLM-generated […]

RedParrot: Accelerating NL-to-DSL for Business Analytics via Query Semantic Caching

arXiv:2604.22758v1 Announce Type: cross Abstract: Recently, at Xiaohongshu, the rapid expansion of e-commerce and advertising demands real-time business analytics with high accuracy and low latency. To meet this demand, systems typically rely on converting natural language (NL) queries into Domain-Specific Languages (DSLs) to ensure semantic consistency, validation, and portability. However, existing multi-stage LLM pipelines for […]

Learning in Blocks: A Multi Agent Debate Assisted Personalized Adaptive Learning Framework for Language Learning

arXiv:2604.22770v1 Announce Type: cross Abstract: Most digital language learning curricula rely on discrete-item quizzes that test recall rather than applied conversational proficiency. When progression is driven by quiz performance, learners can advance despite persistent gaps in using grammar and vocabulary during interaction. Recent work on LLM-based judging suggests a path toward scoring open-ended conversations, but […]

Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing

arXiv:2604.22782v1 Announce Type: cross Abstract: Serving transformer language models with high throughput requires caching Key-Values (KVs) to avoid redundant computation during autoregressive generation. The memory footprint of KV caching is significant and heavily impacts serving costs. This work proposes to lessen these memory requirements. While recent work has largely addressed KV cache reduction via compression […]

Representation Homogeneity and Systemic Instability in AI-Dominated Financial Markets: A Structural Approach

arXiv:2604.22818v1 Announce Type: cross Abstract: This paper investigates how similarity in the informational representation of market states among Artificial Intelligence (AI) trading agents can generate systemic instability in financial markets. We construct a structural multi-agent market model calibrated using high-frequency microstructural moments. AI agents are modeled through a two-layer decision architecture consisting of a nonlinear […]

Failure-Centered Runtime Evaluation for Deployed Trilingual Public-Space Agents

arXiv:2604.23990v1 Announce Type: new Abstract: This paper presents PSA-Eval, a failure-centered runtime evaluation framework for deployed trilingual public-space agents. The central claim is that, when the evaluation object shifts from a static input-output mapping to a runtime system, the basic unit of analysis should shift from score to failure. PSA-Eval extends the conventional chain Question […]

A2DEPT: Large Language Model-Driven Automated Algorithm Design via Evolutionary Program Trees

arXiv:2604.24043v1 Announce Type: new Abstract: Designing heuristics for combinatorial optimization problems (COPs) is a fundamental yet challenging task that traditionally requires extensive domain expertise. Recently, Large Language Model (LLM)-based Automated Heuristic Design (AHD) has shown promise in autonomously generating heuristic components with minimal human intervention. However, most existing LLM-based AHD methods enforce fixed algorithmic templates […]

SemML 2.0: Synthesizing Controllers for LTL

arXiv:2604.24102v1 Announce Type: new Abstract: Synthesizing a reactive system from specifications given in linear temporal logic (LTL) is a classical problem, finding its applications in safety-critical systems design. These systems are typically represented using either Mealy machines or AIGER circuits. We present the second version of SemML, which outperforms all state-of-the-art tools for finding either […]

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844