DETACH: Decomposed Spatio-Temporal Alignment for Exocentric Video and Ambient Sensors with Staged Learning

Learning Evolving Latent Strategies for Multi-Agent Language Systems without Model Fine-Tuning

arXiv:2512.20629v1 Announce Type: cross Abstract: This study proposes a multi-agent language framework that enables continual strategy evolution without fine-tuning the language model’s parameters. The core

LookPlanGraph: Embodied Instruction Following Method with VLM Graph Augmentation

arXiv:2512.21243v1 Announce Type: cross Abstract: Methods that use Large Language Models (LLMs) as planners for embodied instruction following tasks have become widespread. To successfully complete

Intersectional Fairness in Vision-Language Models for Medical Image Disease Classification

arXiv:2512.15249v2 Announce Type: replace-cross Abstract: Medical artificial intelligence (AI) systems, particularly multimodal vision-language models (VLMs), often exhibit intersectional biases where models are systematically less confident

One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents

arXiv:2512.20957v1 Announce Type: cross Abstract: Locating the files and functions requiring modification in large open-source software (OSS) repositories is challenging due to their scale and

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs

arXiv:2512.16378v2 Announce Type: replace-cross Abstract: As Large Language Models (LLMs) expand beyond text, integrating speech as a native modality has given rise to SpeechLLMs, which