May 20, 2026 – Page 7 – dijee Pharma Intelligence

The Growing Pains of Frontier Models: When Leaderboards Stop Separating and What to Measure Next

arXiv:2605.18840v1 Announce Type: cross Abstract: Leaderboards rank frontier models on independent axes but do not reveal whether capabilities reinforce or trade off across releases — and at the frontier, this interaction is the more informative signal. We decompose paired SWE-bench and GPQA Diamond scores into a population coupling trend and per-release residual ($h$-field) that diagnoses […]

May 20, 2026

TERGAD: Structure-Aware Text-Enhanced Representations for Graph Anomaly Detection

arXiv:2605.19738v1 Announce Type: cross Abstract: Graph Anomaly Detection (GAD) aims to identify atypical graph entities, such as nodes, edges, or substructures, that deviate significantly from the majority. While existing text-rich approaches typically integrate structural context into the data representation pipeline using raw textual features, they often neglect the structural context of nodes. This limitation hinders […]

May 20, 2026

Transformers Linearly Represent Highly Structured World Models

arXiv:2605.18847v1 Announce Type: cross Abstract: Do transformers, when trained on sequential reasoning traces, build internal models of the underlying task? And if so, does the structure of those internal representations mirror the structure of the domain? We train an 8-layer transformer on Sudoku solving traces and perform a mechanistic analysis of its internal computation. We […]

May 20, 2026

How Far Are We From True Auto-Research?

arXiv:2605.19156v1 Announce Type: new Abstract: Recent auto-research systems can produce complete papers, but feasibility is not the same as quality, and the field still lacks a systematic study of how good agent-generated papers actually are. We introduce ResearchArena, a minimal scaffold that lets off-the-shelf agents (Claude Code using Opus 4.6, Codex using GPT-5.4, and Kimi […]

May 20, 2026

Robust Checkpoint Selection for Multimodal LLMs via Agentic Evaluation and Stability-Aware Ranking

arXiv:2605.18852v1 Announce Type: cross Abstract: Checkpoint selection for multimodal large language models (MLLMs) presents significant challenges when performance differentials are marginal and evaluation signals are prone to noise. Existing methodologies rely heavily on static benchmarks or pointwise scoring, which frequently misalign with in-the-wild usage and lack robust uncertainty estimation, particularly in OCR-heavy scenarios. In this […]

May 20, 2026

FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

arXiv:2605.19846v1 Announce Type: cross Abstract: Vision-Language Models (VLMs) have demonstrated remarkable capabilities in general video understanding, yet they often struggle with the fine-grained comprehension crucial for real-world applications requiring nuanced interpretation of human actions and interactions. While some recent human-centric benchmarks evaluate aspects of model behaviour such as fairness/ethics, emotion perception, and broader human-centric metrics, […]

May 20, 2026

Towards Family-Grouped Hierarchical Federated Learning on Sub-5KB Models: A Feasibility Study of Privacy-Preserving ECG Monitoring for Ultra-Resource-Constrained Wearables

arXiv:2605.18862v1 Announce Type: cross Abstract: Cardiovascular disease remains the leading cause of death worldwide, and early detection of arrhythmias through continuous ECG monitoring on wearable devices can prevent life-threatening events. Federated Learning (FL) enables privacy-preserving collaborative training by keeping raw ECG data on device, yet standard FL incurs prohibitive communication overhead and standard deep learning […]

May 20, 2026

A putative model of the gut-muscle axis in aged livestock

arXiv:2605.19171v1 Announce Type: new Abstract: The gut-muscle axis has been proposed to link gut microbiota with skeletal muscle physiology, yet its universality across livestock species remains unclear. Using aged laying hens, a livestock model with a relatively short digestive tract, we examined the gut microbiota, faecal metabolome, and breast-muscle metabolome by integrative multi-omics analyses in […]

May 20, 2026

EVA-0: Test-Time Model Evolution with Only Two Forward Passes per Sample

arXiv:2605.18867v1 Announce Type: cross Abstract: Test-time model evolution offers a promising way for deployed models to improve from unlabeled test-time experience, yet most existing methods depend on backpropagation (BP), which incurs substantial memory overhead and makes them difficult to deploy on edge devices, quantized models, specialized accelerators, or black-box models. In this work, we study […]

May 20, 2026

A Case for Agentic Tuning: From Documentation to Action in PostgreSQL

arXiv:2605.19988v1 Announce Type: cross Abstract: Documentation has long guided computer system tuning by distilling expert knowledge into per-parameter recommendations. Yet such guides capture only what experts conclude, discarding how they reason. This fundamental gap manifests in three concrete deficiencies: documentation grows stale as software evolves, fails under heterogeneous workloads, and ignores inter-parameter dependencies. We propose […]

May 20, 2026

EUPHORIA: Efficient Universal Planning via Hybrid Optimization for Robust Industrial Robotic Assembly

arXiv:2605.18872v1 Announce Type: cross Abstract: Robotic assembly in architectural construction faces a persistent bottleneck: existing planners are either highly specialized, requiring prohibitive retraining for every new geometric design, or operationally inefficient, treating structural sequencing and kinematic motion as disjoint processes. We present EUPHORIA, a unified framework that achieves universal few-shot adaptability and dynamic efficiency through […]

May 20, 2026

Discoverable Agent Knowledge — A Formal Framework for Agentic KG Affordances (Extended Version)

arXiv:2605.19186v1 Announce Type: new Abstract: Two decades ago, the Semantic Web Services community was asked how agents with different ontological commitments could discover, compose, and invoke web services coherently. The response was OWL-S and WSMO: formally grounded capability descriptions specifying what a service could do, what the agent must already know for invocation to be […]

May 20, 2026
