arXiv:2511.20821v2 Announce Type: replace-cross Abstract: Diffusion models have established the state-of-the-art in text-to-image generation, but their performance often relies on a diffusion prior network to translate text embeddings into the visual manifold for easier decoding. These priors are computationally expensive and require extensive training on massive datasets. In this work, we challenge the necessity of […]
SIT-Graph: State Integrated Tool Graph for Multi-Turn Agents
arXiv:2512.07287v1 Announce Type: cross Abstract: Despite impressive advances in agent systems, multi-turn tool-use scenarios remain challenging. It is mainly because intent is clarified progressively and the environment evolves with each tool call. While reusing past experience is natural, current LLM agents either treat entire trajectories or pre-defined subtasks as indivisible units, or solely exploit tool-to-tool […]
Towards Accurate UAV Image Perception: Guiding Vision-Language Models with Stronger Task Prompts
arXiv:2512.07302v1 Announce Type: cross Abstract: Existing image perception methods based on VLMs generally follow a paradigm wherein models extract and analyze image content based on user-provided textual task prompts. However, such methods face limitations when applied to UAV imagery, which presents challenges like target confusion, scale variations, and complex backgrounds. These challenges arise because VLMs’ […]
Subgoal Graph-Augmented Planning for LLM-Guided Open-World Reinforcement Learning
arXiv:2511.20993v2 Announce Type: replace-cross Abstract: Large language models (LLMs) offer strong high-level planning capabilities for reinforcement learning (RL) by decomposing tasks into subgoals. However, their practical utility is limited by poor planning-execution alignment, which reflects a critical gap between abstract plans and actionable, environment-compatible behaviors. This misalignment arises from two interrelated limitations: (1) LLMs often […]
Exact Synthetic Populations for Scalable Societal and Market Modeling
arXiv:2512.07306v1 Announce Type: cross Abstract: We introduce a constraint-programming framework for generating synthetic populations that reproduce target statistics with high precision while enforcing full individual consistency. Unlike data-driven approaches that infer distributions from samples, our method directly encodes aggregated statistics and structural relations, enabling exact control of demographic profiles without requiring any microdata. We validate […]
MIDG: Mixture of Invariant Experts with knowledge injection for Domain Generalization in Multimodal Sentiment Analysis
arXiv:2512.07430v1 Announce Type: cross Abstract: Existing methods in domain generalization for Multimodal Sentiment Analysis (MSA) often overlook inter-modal synergies during invariant features extraction, which prevents the accurate capture of the rich semantic information within multimodal data. Additionally, while knowledge injection techniques have been explored in MSA, they often suffer from fragmented cross-modal knowledge, overlooking specific […]
Effective Attention-Guided Multi-Scale Medical Network for Skin Lesion Segmentation
arXiv:2512.07275v1 Announce Type: cross Abstract: In the field of healthcare, precise skin lesion segmentation is crucial for the early detection and accurate diagnosis of skin diseases. Despite significant advances in deep learning for image processing, existing methods have yet to effectively address the challenges of irregular lesion shapes and low contrast. To address these issues, […]
Toward More Reliable Artificial Intelligence: Reducing Hallucinations in Vision-Language Models
arXiv:2512.07564v1 Announce Type: cross Abstract: Vision-language models (VLMs) frequently generate hallucinated content plausible but incorrect claims about image content. We propose a training-free self-correction framework enabling VLMs to iteratively refine responses through uncertainty-guided visual re-attention. Our method combines multidimensional uncertainty quantification (token entropy, attention dispersion, semantic consistency, claim confidence) with attention-guided cropping of under-explored regions. […]
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
arXiv:2412.15701v5 Announce Type: replace Abstract: While the advancement of large language models has spurred the development of AI agents to automate tasks, numerous use cases inherently require agents to collaborate with humans due to humans’ latent preferences, domain expertise, or the need for control. To facilitate the study of human-agent collaboration, we introduce Collaborative Gym […]
SINRL: Socially Integrated Navigation with Reinforcement Learning using Spiking Neural Networks
arXiv:2512.07266v1 Announce Type: cross Abstract: Integrating autonomous mobile robots into human environments requires human-like decision-making and energy-efficient, event-based computation. Despite progress, neuromorphic methods are rarely applied to Deep Reinforcement Learning (DRL) navigation approaches due to unstable training. We address this gap with a hybrid socially integrated DRL actor-critic approach that combines Spiking Neural Networks (SNNs) […]