arXiv:2512.16310v2 Announce Type: replace-cross Abstract: Driven by Large Language Models, the single-agent, multi-tool architecture has become a popular paradigm for autonomous agents. However, this architecture introduces a severe privacy risk, which we term Tools Orchestration Privacy Risk (TOP-R): an agent, to achieve a benign user goal, autonomously aggregates non-sensitive fragments from multiple tools and synthesizes […]
Neural Signals Generate Clinical Notes in the Wild
arXiv:2601.22197v2 Announce Type: replace-cross Abstract: Generating clinical reports that summarize abnormal patterns, diagnostic findings, and clinical interpretations from long-term EEG recordings remains labor-intensive. We curate a large-scale clinical EEG dataset with 9,922 reports paired with approximately 11,000 hours of EEG recordings from 9,048 patients. We therefore develop CELM, the first clinical EEG-to-Language foundation model capable […]
IntelliAsk: Learning to Ask High-Quality Research Questions via RLVR
arXiv:2602.15849v2 Announce Type: replace-cross Abstract: Peer review relies on substantive, evidence-based questions, yet current LLMs generate surface-level queries that perform worse than human reviewer questions in expert evaluation. To address this gap, we curate a high-quality dataset of reviewer questions from OpenReview and conduct a human preference study where expert annotators evaluate question-paper pairs across […]
Theory of Code Space: Do Code Agents Understand Software Architecture?
arXiv:2603.00601v3 Announce Type: replace-cross Abstract: AI code agents excel at isolated tasks yet struggle with multi-file software engineering requiring architectural understanding. We introduce Theory of Code Space (ToCS), a benchmark that evaluates whether agents can construct, maintain, and update coherent architectural beliefs during codebase exploration. Agents explore procedurally generated codebases under partial observability — opening […]
Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations
arXiv:2603.03332v2 Announce Type: replace-cross Abstract: Chain-of-Thought (CoT) prompting has emerged as a foundational technique for eliciting reasoning from Large Language Models (LLMs), yet the robustness of this approach to corruptions in intermediate reasoning steps remains poorly understood. This paper presents a comprehensive empirical evaluation of LLM robustness to a structured taxonomy of 5 CoT perturbation […]
Abductive Reasoning with Syllogistic Forms in Large Language Models
arXiv:2603.06428v1 Announce Type: cross Abstract: Research in AI using Large Language Models (LLMs) is rapidly evolving, and the comparison of their performance with human reasoning has become a key concern. Prior studies have indicated that LLMs and humans share similar biases, such as dismissing logically valid inferences that contradict common beliefs. However, criticizing LLMs for these […]
PONTE: Personalized Orchestration for Natural Language Trustworthy Explanations
arXiv:2603.06485v1 Announce Type: cross Abstract: Explainable Artificial Intelligence (XAI) seeks to enhance the transparency and accountability of machine learning systems, yet most methods follow a one-size-fits-all paradigm that neglects user differences in expertise, goals, and cognitive needs. Although Large Language Models can translate technical explanations into natural language, they introduce challenges related to faithfulness and […]
RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering
arXiv:2603.06542v1 Announce Type: cross Abstract: Conversational generative AI is rapidly entering healthcare, where general-purpose models must integrate heterogeneous patient signals and support diverse interaction styles while producing clinically meaningful outputs. In respiratory care, non-invasive audio, such as recordings captured via mobile microphones, enables scalable screening and longitudinal monitoring, but the heterogeneity challenge is particularly acute: […]
BInD: Bond and Interaction-generating Diffusion Model for Multi-objective Structure-based Drug Design
arXiv:2405.16861v3 Announce Type: replace Abstract: Recent remarkable advancements in geometric deep generative models, coupled with accumulated structural data, enable structure-based drug design (SBDD) using only target protein information. However, existing models often struggle to balance multiple objectives, excelling only in specific tasks. BInD, a diffusion model with knowledge-based guidance, is introduced to address this limitation […]
VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs
arXiv:2506.06727v4 Announce Type: replace Abstract: Large Multimodal Models have achieved remarkable progress in integrating vision and language, enabling strong performance across perception, reasoning, and domain-specific tasks. However, their capacity to reason over multiple, visually similar inputs remains insufficiently explored. Such fine-grained comparative reasoning is central to real-world tasks, especially in mathematics and education, where learners […]
A Multi-Agent System Enables Versatile Information Extraction from the Chemical Literature
arXiv:2507.20230v3 Announce Type: replace Abstract: High-quality chemical databases are the foundation for fully expediting AI-powered chemical research. Automatic extraction of chemical information from the literature is essential for constructing reaction databases, but it is currently limited by the multimodality and style variability of chemical information. In this work, we developed a multimodal large language model […]
Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning
arXiv:2601.15160v3 Announce Type: replace Abstract: Large language models have achieved near-expert performance in structured reasoning domains like mathematics and programming, yet their ability to perform compositional multi-hop reasoning in specialized scientific fields remains limited. We propose a bottom-up learning paradigm in which models are grounded in axiomatic domain facts and compose them to solve complex, […]