FinRule-Bench: A Benchmark for Joint Reasoning over Financial Tables and Principles

arXiv:2603.11339v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly applied to financial analysis, yet their ability to audit structured financial statements under explicit accounting principles remains poorly explored. Existing benchmarks primarily evaluate question answering, numerical reasoning, or anomaly detection on synthetically corrupted data, making it unclear whether models can reliably verify or localize […]

Stop Listening to Me! How Multi-turn Conversations Can Degrade Diagnostic Reasoning

arXiv:2603.11394v1 Announce Type: cross Abstract: Patients and clinicians are increasingly using chatbots powered by large language models (LLMs) for healthcare inquiries. While state-of-the-art LLMs exhibit high performance on static diagnostic reasoning benchmarks, their efficacy across multi-turn conversations, which better reflect real-world usage, has been understudied. In this paper, we evaluate 17 LLMs across three clinical […]

Can Theoretical Physics Research Benefit from Language Agents?

arXiv:2506.06214v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) are rapidly advancing across diverse domains, yet their application in theoretical physics remains inadequate. While current models show competence in mathematical reasoning and code generation, we identify critical gaps in physical intuition, constraint satisfaction, and reliable reasoning that cannot be addressed through prompting alone. Physics demands […]

Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI

arXiv:2603.11340v1 Announce Type: new Abstract: In this paper, we present a novel black-box online controller that uses only end-to-end measurements over short segments, without internal instrumentation, and hill climbing to maximize goodput, defined as the throughput of requests that satisfy the service-level objective. We provide empirical evidence that this design is well-founded. Using this advance […]

Leveraging Phytolith Research using Artificial Intelligence

arXiv:2603.11476v1 Announce Type: cross Abstract: Phytolith analysis is a crucial tool for reconstructing past vegetation and human activities, but traditional methods are severely limited by labour-intensive, time-consuming manual microscopy. To address this bottleneck, we present Sorometry: a comprehensive end-to-end artificial intelligence pipeline for the high-throughput digitisation, inference, and interpretation of phytoliths. Our workflow processes z-stacked […]

A Foundational Theory of Quantitative Abstraction: Adjunctions, Duality, and Logic for Probabilistic Systems

arXiv:2510.19444v3 Announce Type: replace-cross Abstract: The analysis and control of stochastic dynamical systems rely on probabilistic models such as (continuous-space) Markov decision processes, but large or continuous state spaces make exact analysis intractable and call for principled quantitative abstraction. This work develops a unified theory of such abstraction by integrating category theory, coalgebra, quantitative logic, […]

OrthoEraser: Coupled-Neuron Orthogonal Projection for Concept Erasure

arXiv:2603.11493v1 Announce Type: cross Abstract: Text-to-image (T2I) models face significant safety risks from adversarial induction, yet current concept erasure methods often cause collateral damage to benign attributes when suppressing selected neurons entirely. This occurs because sensitive and benign semantics exhibit non-orthogonal superposition, sharing activation subspaces where their respective vectors are inherently entangled. To address this […]

Human Navigation Behaviour and Brain Dynamics in Real-world Contexts

arXiv:2603.11347v1 Announce Type: new Abstract: The study of navigation behaviour and the associated brain dynamics have been a focus increasing research over the last decades. Coinciding with this has been an increased focus on a more ecological understanding of cognition. Here we review recent research seeking to provide a more naturalistic, ecological understanding of human […]

EReCu: Pseudo-label Evolution Fusion and Refinement with Multi-Cue Learning for Unsupervised Camouflage Detection

arXiv:2603.11521v1 Announce Type: cross Abstract: Unsupervised Camouflaged Object Detection (UCOD) remains a challenging task due to the high intrinsic similarity between target objects and their surroundings, as well as the reliance on noisy pseudo-labels that hinder fine-grained texture learning. While existing refinement strategies aim to alleviate label noise, they often overlook intrinsic perceptual cues, leading […]

TimeSqueeze: Dynamic Patching for Efficient Time Series Forecasting

arXiv:2603.11352v1 Announce Type: new Abstract: Transformer-based time series foundation models face a fundamental trade-off in choice of tokenization: point-wise embeddings preserve temporal fidelity but scale poorly with sequence length, whereas fixed-length patching improves efficiency by imposing uniform boundaries that may disrupt natural transitions and blur informative local dynamics. In order to address these limitations, we […]

Performance Evaluation of Open-Source Large Language Models for Assisting Pathology Report Writing in Japanese

arXiv:2603.11597v1 Announce Type: cross Abstract: The performance of large language models (LLMs) for supporting pathology report writing in Japanese remains unexplored. We evaluated seven open-source LLMs from three perspectives: (A) generation and information extraction of pathology diagnosis text following predefined formats, (B) correction of typographical errors in Japanese pathology reports, and (C) subjective evaluation of […]

Evaluating LLM-Based Grant Proposal Review via Structured Perturbations

arXiv:2603.08281v2 Announce Type: replace-cross Abstract: As AI-assisted grant proposals outpace manual review capacity in a kind of “Malthusian trap” for the research ecosystem, this paper investigates the capabilities and limitations of LLM-based grant reviewing for high-stakes evaluation. Using six EPSRC proposals, we develop a perturbation-based framework probing LLM sensitivity across six quality axes: funding, timeline, […]

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844