arXiv:2604.18237v1 Announce Type: cross Abstract: In large-scale distributed scenarios, increasingly complex tasks demand more intelligent collaboration across networks, requiring the joint extraction of structural representations from data samples. However, conventional task-specific approaches often result in nonstructural embeddings, leading to collapsed variability among data samples within the same class, particularly in classification tasks. To address this […]
ArgBench: Benchmarking LLMs on Computational Argumentation Tasks
arXiv:2604.17366v1 Announce Type: cross Abstract: Argumentation skills are an essential toolkit for large language models (LLMs). These skills are crucial in various use cases, including self-reflection, debating collaboratively for diverse answers, and countering hate speech. In this paper, we create the first benchmark for a standardized evaluation of LLM-based approaches to computational argumentation, encompassing 33 […]
Agentic Frameworks for Reasoning Tasks: An Empirical Study
arXiv:2604.16646v1 Announce Type: new Abstract: Recent advances in agentic frameworks have enabled AI agents to perform complex reasoning and decision-making. However, evidence comparing their reasoning performance, efficiency, and practical suitability remains limited. To address this gap, we empirically evaluate 22 widely used agentic frameworks across three reasoning benchmarks: BBH, GSM8K, and ARC. The frameworks were […]
Polarization and Integration in Global AI Research
arXiv:2604.17602v1 Announce Type: cross Abstract: The AI race amplifies security risks and international tensions. While the US restricts mobility and knowledge flows and challenges regulatory efforts to protect its advantage, China leads initiatives of global governance. Both strategies depend on cross-country relationships in AI innovation; yet, how this system evolves is unclear. Here, we measure the […]
A multimodal and temporal foundation model for virtual patient representations at healthcare system scale
arXiv:2604.18570v1 Announce Type: cross Abstract: Modern medicine generates vast multimodal data across siloed systems, yet no existing model integrates the full breadth and temporal depth of the clinical record into a unified patient representation. We introduce Apollo, a multimodal temporal foundation model trained and evaluated on over three decades of longitudinal hospital records from a […]
NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions
arXiv:2604.16493v1 Announce Type: cross Abstract: Natural Language to SQL (NL2SQL) technology empowers non-expert users to query relational databases without requiring SQL expertise. While large language models (LLMs) have greatly improved NL2SQL algorithms, their rapid development outpaces systematic evaluation, leaving a critical gap in understanding their effectiveness, efficiency, and limitations. To this end, we present NL2SQLBench, […]
From Subsumption to Satisfiability: LLM-Assisted Active Learning for OWL Ontologies
arXiv:2604.16672v1 Announce Type: new Abstract: In active learning, membership queries (MQs) allow a learner to pose questions to a teacher, such as “Is every apple a fruit?”, to which the teacher responds correctly with yes or no. These MQs can be viewed as subsumption tests with respect to the target ontology. Inspired by the standard […]
Understanding Tool-Augmented Agents for Lean Formalization: A Factorial Analysis
arXiv:2604.16538v1 Announce Type: cross Abstract: Automatic translation of natural language mathematics into faithful Lean 4 code is hindered by the fundamental dissonance between informal set-theoretic intuition and strict formal type theory. This gap often causes LLMs to hallucinate non-existent library definitions, resulting in code that fails to compile or lacks semantic fidelity. In this work, […]
Offline Materials Optimization with CliqueFlowmer
arXiv:2603.06082v4 Announce Type: replace Abstract: Recent advances in deep learning inspired neural network-based approaches to computational materials discovery (CMD). A plethora of problems in this field involve finding materials that optimize a target property. Nevertheless, the increasingly popular generative modeling methods are ineffective at boldly exploring attractive regions of the materials space due to their […]
Towards Trustworthy Depression Estimation via Disentangled Evidential Learning
arXiv:2604.16579v1 Announce Type: cross Abstract: Automated depression estimation is highly vulnerable to signal corruption and ambient noise in real-world deployment. Prevailing deterministic methods produce uncalibrated point estimates, exposing safety-critical clinical systems to the severe risk of overconfident misdiagnoses. To establish a highly resilient and trustworthy assessment paradigm, we propose EviDep, an evidential learning framework that […]
Agentic Risk-Aware Set-Based Engineering Design
arXiv:2604.16687v1 Announce Type: new Abstract: This paper introduces a multi-agent framework guided by Large Language Models (LLMs) to assist in the early stages of engineering design, a phase often characterized by vast parameter spaces and inherent uncertainty. Operating under a human-in-the-loop paradigm and demonstrated on the canonical problem of aerodynamic airfoil design, the framework employs […]
Aligning Backchannel and Dialogue Context Representations via Contrastive LLM Fine-Tuning
arXiv:2604.16622v1 Announce Type: cross Abstract: Backchannels (e.g., ‘yeah’, ‘mhm’, and ‘right’) are short, non-interruptive feedback signals whose lexical form and prosody jointly convey pragmatic meaning. While prior computational research has largely focused on predicting backchannel timing, the relationship between lexico-prosodic form and meaning remains underexplored. We propose a two-stage framework: first, fine-tuning large language models […]