arXiv:2512.21135v1 Announce Type: cross Abstract: Text-guided medical segmentation enhances segmentation accuracy by utilizing clinical reports as auxiliary information. However, existing methods typically rely on unaligned image and text encoders, which necessitate complex interaction modules for multimodal fusion. While CLIP provides a pre-aligned multimodal feature space, its direct application to medical imaging is limited by three […]
Learning Evolving Latent Strategies for Multi-Agent Language Systems without Model Fine-Tuning
arXiv:2512.20629v1 Announce Type: cross Abstract: This study proposes a multi-agent language framework that enables continual strategy evolution without fine-tuning the language model’s parameters. The core idea is to liberate the latent vectors of abstract concepts from traditional static semantic representations, allowing them to be continuously updated through environmental interaction and reinforcement feedback. We construct a […]
LookPlanGraph: Embodied Instruction Following Method with VLM Graph Augmentation
arXiv:2512.21243v1 Announce Type: cross Abstract: Methods that use Large Language Models (LLMs) as planners for embodied instruction-following tasks have become widespread. To successfully complete tasks, the LLM must be grounded in the environment in which the robot operates. One solution is to use a scene graph that contains all the necessary information. Modern methods […]
Intersectional Fairness in Vision-Language Models for Medical Image Disease Classification
arXiv:2512.15249v2 Announce Type: replace-cross Abstract: Medical artificial intelligence (AI) systems, particularly multimodal vision-language models (VLMs), often exhibit intersectional biases where models are systematically less confident in diagnosing marginalised patient subgroups. Such bias can lead to higher rates of inaccurate and missed diagnoses due to demographically skewed data and divergent distributions of diagnostic certainty. Current fairness […]
One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents
arXiv:2512.20957v1 Announce Type: cross Abstract: Locating the files and functions requiring modification in large open-source software (OSS) repositories is challenging due to their scale and structural complexity. Existing large language model (LLM)-based methods typically treat this as a repository-level retrieval task and rely on multiple auxiliary tools, which overlook code execution logic and complicate model […]
Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs
arXiv:2512.16378v2 Announce Type: replace-cross Abstract: As Large Language Models (LLMs) expand beyond text, integrating speech as a native modality has given rise to SpeechLLMs, which aim to translate spoken language directly, thereby bypassing traditional transcription-based pipelines. Whether this integration improves speech-to-text translation quality over established cascaded architectures, however, remains an open question. We present Hearing […]
ReACT-Drug: Reaction-Template Guided Reinforcement Learning for de novo Drug Design
arXiv:2512.20958v1 Announce Type: cross Abstract: De novo drug design is a crucial component of modern drug development, yet navigating the vast chemical space to find synthetically accessible, high-affinity candidates remains a significant challenge. Reinforcement Learning (RL) enhances this process by enabling multi-objective optimization and exploration of novel chemical space – capabilities that traditional supervised learning […]
Interpretable Plant Leaf Disease Detection Using Attention-Enhanced CNN
arXiv:2512.17864v2 Announce Type: replace-cross Abstract: Plant diseases pose a significant threat to global food security, necessitating accurate and interpretable disease detection methods. This study introduces an interpretable attention-guided Convolutional Neural Network (CNN), CBAM-VGG16, for plant leaf disease detection. By integrating the Convolutional Block Attention Module (CBAM) at each convolutional stage, the model enhances feature extraction and […]
Can Agentic AI Match the Performance of Human Data Scientists?
arXiv:2512.20959v1 Announce Type: cross Abstract: Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) have significantly automated data science workflows, but a fundamental question persists: Can these agentic AI systems truly match the performance of human data scientists who routinely leverage […]
Code2Doc: A Quality-First Curated Dataset for Code Documentation
arXiv:2512.18748v2 Announce Type: replace-cross Abstract: The performance of automatic code documentation generation models depends critically on the quality of the training data used for supervision. However, most existing code documentation datasets are constructed through large-scale scraping of public repositories with limited quality control. As a result, they often contain noisy documentation, extensive duplication, and […]