arXiv:2604.01449v2 Announce Type: replace Abstract: Artificial intelligence (AI) systems are increasingly integrated into healthcare and pharmacy workflows, supporting tasks such as medication recommendations, dosage determination, and drug interaction detection. While these systems often demonstrate strong performance under standard evaluation metrics, their reliability in real-world decision-making remains insufficiently understood. In high-risk domains such as medication management, […]
Zero-shot Concept Bottleneck Models
arXiv:2502.09018v2 Announce Type: replace-cross Abstract: Concept bottleneck models (CBMs) are inherently interpretable and intervenable neural network models, which explain their final label prediction by the intermediate prediction of high-level semantic concepts. However, they require target task training to learn input-to-concept and concept-to-label mappings, incurring the costs of target dataset collection and training resources. In this paper, we present […]
SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
arXiv:2507.22264v2 Announce Type: replace-cross Abstract: Contrastive Language-Image Pre-training (CLIP) (Radford et al., 2021) has emerged as a pivotal model in computer vision and multimodal learning, achieving state-of-the-art performance at aligning visual and textual representations through contrastive learning. However, CLIP struggles with potential information misalignment in many image-text datasets and suffers from entangled representation. On the one hand, short captions […]
Attribution Gradients: Incrementally Unfolding Citations for Critical Examination of Attributed AI Answers
arXiv:2510.00361v2 Announce Type: replace-cross Abstract: AI answer engines are a relatively new kind of information search tool: rather than returning a ranked list of documents, they generate an answer to a search question with inline citations to sources. But reading the cited sources is costly, and citation links themselves offer little guidance about what evidence […]
The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment
arXiv:2511.21331v2 Announce Type: replace-cross Abstract: Learning joint representations across multiple modalities remains a central challenge in multimodal machine learning. Prevailing approaches predominantly operate in pairwise settings, aligning two modalities at a time. While some recent methods aim to capture higher-order interactions among multiple modalities, they often overlook or insufficiently preserve pairwise relationships, limiting their effectiveness […]
Autonomous Computational Catalysis Research via Agentic Systems
arXiv:2601.13508v2 Announce Type: replace-cross Abstract: Fully automating the scientific process is a transformative ambition in materials science, yet current artificial intelligence masters isolated workflow fragments. In computational catalysis, a system autonomously navigating the entire research lifecycle from conception to a scientifically meaningful manuscript remains an open challenge. Here we present CatMaster, a catalysis-native multi-agent framework […]
SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond
arXiv:2603.01589v2 Announce Type: replace-cross Abstract: The success of large language models (LLMs) in scientific domains has heightened safety concerns, prompting numerous benchmarks to evaluate their scientific safety. Existing benchmarks often suffer from limited risk coverage and a reliance on subjective evaluation. To address these problems, we introduce SafeSci, a comprehensive framework for safety evaluation and […]
JointFM-0.1: A Foundation Model for Multi-Target Joint Distributional Prediction
arXiv:2603.20266v2 Announce Type: replace-cross Abstract: Despite the rapid advancements in Artificial Intelligence (AI), Stochastic Differential Equations (SDEs) remain the gold-standard formalism for modeling systems under uncertainty. However, applying SDEs in practice is fraught with challenges: modeling risk is high, calibration is often brittle, and high-fidelity simulations are computationally expensive. This technical report introduces JointFM, a […]
Learn2Fold: Structured Origami Generation with World Model Planning
arXiv:2603.29585v2 Announce Type: replace-cross Abstract: The ability to transform a flat sheet into a complex three-dimensional structure is a fundamental test of physical intelligence. Unlike cloth manipulation, origami is governed by strict geometric axioms and hard kinematic constraints, where a single invalid crease or collision can invalidate the entire folding sequence. As a result, origami […]
Council Mode: Mitigating Hallucination and Bias in LLMs via Multi-Agent Consensus
arXiv:2604.02923v1 Announce Type: cross Abstract: Large Language Models (LLMs), particularly those employing Mixture-of-Experts (MoE) architectures, have achieved remarkable capabilities across diverse natural language processing tasks. However, these models frequently suffer from hallucinations — generating plausible but factually incorrect content — and exhibit systematic biases that are amplified by uneven expert activation during inference. In this […]
Prompt Compression in the Wild: Measuring Latency, Rate Adherence, and Quality for Faster LLM Inference
arXiv:2604.02985v1 Announce Type: cross Abstract: With the wide adoption of language models for IR, and specifically RAG systems, the latency of the underlying LLM becomes a crucial bottleneck, since the long contexts of retrieved passages lead to large prompts and therefore increased compute. Prompt compression, which reduces the size of input prompts while aiming […]
R2-Write: Reflection and Revision for Open-Ended Writing with Deep Reasoning
arXiv:2604.03004v1 Announce Type: cross Abstract: While deep reasoning with long chain-of-thought has dramatically improved large language models in verifiable domains like mathematics, its effectiveness for open-ended tasks such as writing remains unexplored. In this paper, we conduct a systematic investigation revealing that existing mainstream reasoning models achieve limited gains on open-ended writing tasks. Our further […]