LookPlanGraph: Embodied Instruction Following Method with VLM Graph Augmentation

December 25, 2025

arXiv:2512.21243v1 Announce Type: cross
Abstract: Methods that use Large Language Models (LLMs) as planners for embodied instruction following tasks have become widespread. To successfully complete tasks, the LLM must be grounded in the environment in which the robot operates. One solution is to use a scene graph that contains all the necessary information. Modern methods rely on prebuilt scene graphs and assume that all task-relevant information is available at the start of planning. However, these approaches do not account for changes in the environment that may occur between graph construction and task execution. We propose LookPlanGraph, a method that leverages a scene graph composed of static assets and object priors. During plan execution, LookPlanGraph continuously updates the graph with relevant objects, either by verifying existing priors or by discovering new entities. This is achieved by processing the agent's egocentric camera view using a Vision Language Model. We conducted experiments with changed object positions in the VirtualHome and OmniGibson simulated environments, demonstrating that LookPlanGraph outperforms methods based on predefined static scene graphs. To demonstrate the practical applicability of our approach, we also conducted experiments in a real-world setting. Additionally, we introduce the GraSIF (Graph Scenes for Instruction Following) dataset with an automated validation framework, comprising 514 tasks drawn from SayPlan Office, BEHAVIOR-1K, and VirtualHome RobotHow. Project page available at https://lookplangraph.github.io.
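The graph-update idea in the abstract (static assets plus object priors that are verified or replaced as the agent looks around) can be illustrated with a minimal sketch. This is not the authors' implementation: the `SceneGraph` class, its method names, and the rule that an expected but unseen object is dropped are all assumptions made for illustration, and the Vision Language Model is replaced by a plain list of detected object names.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy scene graph (hypothetical structure, not the paper's data model)."""

    # Fixed furniture or rooms that are assumed never to move.
    static_assets: set[str]
    # object name -> (asset it is believed to be at, verified-by-observation flag)
    objects: dict[str, tuple[str, bool]] = field(default_factory=dict)

    def add_prior(self, obj: str, asset: str) -> None:
        """Record an unverified belief that `obj` is located at `asset`."""
        if asset not in self.static_assets:
            raise ValueError(f"unknown asset: {asset}")
        self.objects[obj] = (asset, False)

    def update_from_view(self, asset: str, detected: list[str]) -> None:
        """Merge what was seen while looking at `asset`.

        `detected` stands in for the output of a vision-language model
        applied to the egocentric camera image (an assumption here).
        """
        if asset not in self.static_assets:
            raise ValueError(f"unknown asset: {asset}")
        seen = set(detected)
        # Assumed rule: an object expected here but not seen is a stale belief.
        for obj, (location, _) in list(self.objects.items()):
            if location == asset and obj not in seen:
                del self.objects[obj]
        # Seen objects are either verified priors or newly discovered entities.
        for obj in seen:
            self.objects[obj] = (asset, True)


if __name__ == "__main__":
    graph = SceneGraph(static_assets={"table", "fridge"})
    graph.add_prior("apple", "table")
    graph.add_prior("milk", "fridge")
    graph.update_from_view("table", ["mug"])  # apple is missing, mug is new
    print(graph.objects)
```

In this sketch a planner would re-plan against `graph.objects` after each observation, so beliefs about unvisited assets (here, the milk in the fridge) stay unverified until the agent actually looks there.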

