Large Language Models Meet Biomedical Knowledge Graphs for Mechanistically Grounded Therapeutic Prioritization

arXiv:2604.19815v1 Announce Type: new Abstract: Drug repurposing is often framed as a candidate identification task, but existing approaches provide limited guidance for distinguishing biologically plausible candidates from historically well-connected ones. Here we introduce DrugKLM, a hybrid framework that integrates biomedical knowledge graph structure with large language model-based mechanistic reasoning to enable mechanistically grounded therapeutic prioritization. […]

LLM Agents Predict Social Media Reactions but Do Not Outperform Text Classifiers: Benchmarking Simulation Accuracy Using 120K+ Personas of 1511 Humans

arXiv:2604.19787v1 Announce Type: cross Abstract: Social media platforms mediate how billions form opinions and engage with public discourse. As autonomous AI agents increasingly participate in these spaces, understanding their behavioral fidelity becomes critical for platform governance and democratic resilience. Previous work demonstrates that LLM-powered agents can replicate aggregate survey responses, yet few studies test whether […]

Exploiting LLM-as-a-Judge Disposition on Free Text Legal QA via Prompt Optimization

arXiv:2604.20726v1 Announce Type: cross Abstract: This work explores the role of prompt design and judge selection in LLM-as-a-Judge evaluations of free text legal question answering. We examine whether automatic task prompt optimization improves over human-centered design, whether optimization effectiveness varies by judge feedback style, and whether optimized prompts transfer across judges. We systematically address these […]

Emergence Transformer: Dynamical Temporal Attention Matters

arXiv:2604.19816v1 Announce Type: new Abstract: The Transformer, a breakthrough architecture in artificial intelligence, owes its success to the attention mechanism, which utilizes long-range interactions in sequential data, enabling the emergent coherence between large language models (LLMs) and data distributions. However, temporal attention, that is, different forms of long-range interactions in temporal sequences, has rarely been […]

Utterance-Level Methods for Identifying Reliable ASR-Output for Child Speech

arXiv:2604.19801v1 Announce Type: cross Abstract: Automatic Speech Recognition (ASR) is increasingly used in applications involving child speech, such as language learning and literacy acquisition. However, the effectiveness of such applications is limited by high ASR error rates. The negative effects can be mitigated by identifying in advance which ASR-outputs are reliable. This work aims to […]

OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Models

arXiv:2604.20806v1 Announce Type: cross Abstract: Large vision-language models (LVLMs) have made substantial advances in reasoning tasks at the Olympiad level. Nevertheless, current Olympiad-level multimodal reasoning benchmarks for these models often emphasize single-image analysis and fail to exploit contextual information across multiple images. We present OMIBench, a benchmark designed to evaluate Olympiad-level reasoning when the required […]

Model Capability Assessment and Safeguards for Biological Weaponization

arXiv:2604.19811v1 Announce Type: cross Abstract: AI leaders and safety reports increasingly warn that advances in model reasoning may enable biological misuse, including by low-expertise users, while major labs describe safeguards as expanding but still evolving rather than settled. This study benchmarks ChatGPT 5.2 Auto, Gemini 3 Pro Thinking, Claude Opus 4.5 and Meta’s Muse Spark […]

JTPRO: A Joint Tool-Prompt Reflective Optimization Framework for Language Agents

arXiv:2604.19821v1 Announce Type: new Abstract: Large language model (LLM) agents augmented with external tools often struggle as tool inventories grow large and domain-specific. In such settings, ambiguous tool descriptions and under-specified agent instructions frequently lead to tool mis-selection and incorrect slot/value instantiation. We hypothesize that this is due to two root causes: generic, […]

SolidCoder: Bridging the Mental-Reality Gap in LLM Code Generation through Concrete Execution

arXiv:2604.19825v1 Announce Type: cross Abstract: State-of-the-art code generation frameworks rely on mental simulation, where LLMs internally trace execution to verify correctness. We expose a fundamental limitation: the Mental-Reality Gap — where models hallucinate execution traces and confidently validate buggy code. This gap manifests along two orthogonal dimensions: the Specification Gap (overlooking edge cases during planning) […]

A Survey of Scaling in Large Language Model Reasoning

arXiv:2504.02181v2 Announce Type: replace Abstract: The rapid advancements in large language models (LLMs) have significantly enhanced their reasoning capabilities, driven by various strategies such as multi-agent collaboration. However, unlike the well-established performance improvements achieved through scaling data and model size, the scaling of reasoning in LLMs is more complex and can even negatively impact reasoning […]

More Is Different: Toward a Theory of Emergence in AI-Native Software Ecosystems

arXiv:2604.19827v1 Announce Type: cross Abstract: Software engineering faces a fundamental challenge: multi-agent AI systems fail in ways that defy explanation by traditional theories. While individual agents perform correctly, their interactions degrade entire ecosystems, revealing a gap in our understanding of software evolution. This paper argues that AI-native software ecosystems must be studied as complex adaptive […]

Forage V2: Knowledge Evolution and Transfer in Autonomous Agent Organizations

arXiv:2604.19837v1 Announce Type: new Abstract: Autonomous agents operating in open-world tasks — where the completion boundary is not given in advance — face denominator blindness: they systematically underestimate the scope of the target space. Forage V1 addressed this through co-evolving evaluation (an independent Evaluator discovers what “complete” means) and method isolation (Evaluator and Planner cannot […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK; registration number 16808844.