Computer Environments Elicit General Agentic Intelligence in LLMs

arXiv:2601.16206v3 Announce Type: replace-cross Abstract: Agentic intelligence in large language models (LLMs) requires not only model intrinsic capabilities but also interactions with external environments. Equipping LLMs with computers now represents a prevailing trend. However, the computer environment’s intrinsic value has not been systematically investigated, particularly its potential to elicit general capabilities. Here we introduce LLM-in-Sandbox, […]

Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge

arXiv:2510.18196v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) are commonly used as evaluators in various applications, but the reliability of the outcomes remains a challenge. One such challenge is using LLMs-as-judges for direct assessment, i.e., assigning scores from a specified range without any references. Focusing on summarization, we first show that this challenge stems […]

Evaluating LLM-Based 0-to-1 Software Generation in End-to-End CLI Tool Scenarios

arXiv:2604.06742v1 Announce Type: cross Abstract: Large Language Models (LLMs) are driving a shift towards intent-driven development, where agents build complete software from scratch. However, existing benchmarks fail to assess this 0-to-1 generation capability due to two limitations: reliance on predefined scaffolds that ignore repository structure planning, and rigid white-box unit testing that lacks end-to-end behavioral […]

Illocutionary Explanation Planning for Source-Faithful Explanations in Retrieval-Augmented Language Models

arXiv:2604.06211v1 Announce Type: cross Abstract: Natural language explanations produced by large language models (LLMs) are often persuasive, but not necessarily scrutable: users cannot easily verify whether the claims in an explanation are supported by evidence. In XAI, this motivates a focus on faithfulness and traceability, i.e., the extent to which an explanation’s claims can be […]

Cross-Lingual Transfer and Parameter-Efficient Adaptation in the Turkic Language Family: A Theoretical Framework for Low-Resource Language Models

arXiv:2604.06202v1 Announce Type: cross Abstract: Large language models (LLMs) have transformed natural language processing, yet their capabilities remain uneven across languages. Most multilingual models are trained primarily on high-resource languages, leaving many languages with large speaker populations underrepresented in both training data and evaluation benchmarks. This imbalance is particularly visible in the Turkic language family. […]

The Human Condition as Reflected in Contemporary Large Language Models

arXiv:2604.06206v1 Announce Type: cross Abstract: This study seeks to uncover evidence of a latent structure in evolved human culture as it is refracted through contemporary large language models (LLMs). Drawing on parallel responses from six leading generative models to a prompt which asks directly what their training corpora reveal about human culture and behavior, we […]

Thinking in Graphs with CoMAP: A Shared Visual Workspace for Designing Project-Based Learning

arXiv:2604.06200v1 Announce Type: cross Abstract: Designing project-based learning (PBL) demands managing highly interdependent components, a task that both traditional linear tools and purely conversational AI struggle with. Traditional tools fail to capture the non-linear nature of creative design, while conversational systems lack the persistent, shared context necessary for reflective collaboration. Grounded in theories of distributed […]

SensorPersona: An LLM-Empowered System for Continual Persona Extraction from Longitudinal Mobile Sensor Streams

arXiv:2604.06204v1 Announce Type: cross Abstract: Personalization is essential for Large Language Model (LLM)-based agents to adapt to users’ preferences and improve response quality and task performance. However, most existing approaches infer personas from chat histories, which capture only self-disclosed information rather than users’ everyday behaviors in the physical world, limiting the ability to infer comprehensive […]

Extracting Breast Cancer Phenotypes from Clinical Notes: Comparing LLMs with Classical Ontology Methods

arXiv:2604.06208v1 Announce Type: cross Abstract: A significant amount of the data held in Oncology Electronic Medical Records (EMRs) is contained in unstructured provider notes, including, but not limited to, chemotherapy (or cancer treatment) outcomes, biomarkers, and a patient's tumor location, size, and growth pattern. Clinical studies show that the majority of […]

Invisible Influences: Investigating Implicit Intersectional Biases through Persona Engineering in Large Language Models

arXiv:2604.06213v1 Announce Type: cross Abstract: Large Language Models (LLMs) excel at human-like language generation but often embed and amplify implicit, intersectional biases, especially under persona-driven contexts. Existing bias audits rely on static, embedding-based tests (CEAT, I-WEAT, I-SEAT) that quantify absolute association strengths. We show that they have limitations in capturing dynamic shifts when models adopt […]

ToxReason: A Benchmark for Mechanistic Chemical Toxicity Reasoning via Adverse Outcome Pathway

arXiv:2604.06264v1 Announce Type: new Abstract: Recent advances in large language models (LLMs) have enabled molecular reasoning for property prediction. However, toxicity arises from complex biological mechanisms beyond chemical structure, necessitating mechanistic reasoning for reliable prediction. Despite its importance, current benchmarks fail to systematically evaluate this capability. LLMs can generate fluent but biologically unfaithful explanations, making […]

MAT-Cell: A Multi-Agent Tree-Structured Reasoning Framework for Batch-Level Single-Cell Annotation

arXiv:2604.06269v1 Announce Type: new Abstract: Automated cellular reasoning faces a core dichotomy: supervised methods fall into the Reference Trap and fail to generalize to out-of-distribution cell states, while large language models (LLMs), without grounded biological priors, suffer from a Signal-to-Noise Paradox that produces spurious associations. We propose MAT-Cell, a neuro-symbolic reasoning framework that reframes single-cell […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK; registration number 16808844.