TRACE: Textual Reasoning for Affordance Coordinate Extraction

November 5, 2025

arXiv:2511.01999v1 Announce Type: cross
Abstract: Vision-Language Models (VLMs) struggle to translate high-level instructions into the precise spatial affordances required for robotic manipulation. While visual Chain-of-Thought (CoT) methods exist, they are often computationally intensive. In this work, we introduce TRACE (Textual Reasoning for Affordance Coordinate Extraction), a novel methodology that integrates a textual Chain of Reasoning (CoR) into the affordance prediction process. We use this methodology to create the TRACE dataset, a large-scale collection created via an autonomous pipeline that pairs instructions with explicit textual rationales. By fine-tuning a VLM on this data, our model learns to externalize its spatial reasoning before acting. Our experiments show that our TRACE-tuned model achieves state-of-the-art performance, reaching 48.1% accuracy on the primary Where2Place (W2P) benchmark (a 9.6% relative improvement) and 55.0% on the more challenging W2P(h) subset. Crucially, an ablation study demonstrates that performance scales directly with the amount of reasoning data used, confirming the CoR’s effectiveness. Furthermore, analysis of the model’s attention maps reveals an interpretable reasoning process where focus shifts dynamically across reasoning steps. This work shows that training VLMs to generate a textual CoR is an effective and robust strategy for enhancing the precision, reliability, and interpretability of VLM-based robot control. Our dataset and code are available at https://github.com/jink-ucla/TRACE
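The abstract reports 48.1% accuracy on W2P as "a 9.6% relative improvement" but does not state the baseline score. As a rough sketch, assuming "relative improvement" means (new - old) / old, the implied baseline can be back-computed as below. The function name `implied_baseline` is hypothetical, and the derived figures are an illustration of the arithmetic, not numbers taken from the paper.

```python
def implied_baseline(new_score: float, relative_gain: float) -> float:
    """Back out the earlier score, assuming relative_gain = (new - old) / old."""
    return new_score / (1.0 + relative_gain)


# Figures from the abstract: 48.1% accuracy, 9.6% relative improvement.
new_score = 48.1
relative_gain = 0.096

baseline = implied_baseline(new_score, relative_gain)
absolute_gain = new_score - baseline

# Derived values only; the paper's own baseline may differ after rounding.
print(f"implied baseline: {baseline:.1f}%")
print(f"absolute gain: {absolute_gain:.1f} percentage points")
```

Under that assumption, the gain amounts to roughly four percentage points in absolute terms, which is worth keeping in mind when comparing against results reported as absolute differences.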


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.