arXiv:2509.22161v3 Announce Type: replace-cross Abstract: Vector quantization, which discretizes a continuous vector space into a finite set of representative vectors (a codebook), has been widely adopted in modern machine learning. Despite its effectiveness, vector quantization poses a fundamental challenge: the non-differentiable quantization step blocks gradient backpropagation. Smoothed vector quantization addresses this issue by relaxing the […]
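The abstract describes the core obstacle: the hard nearest-neighbor lookup is an argmin, which has no useful gradient. The paper's smoothed variant is not shown in the truncated text; as background only, here is a minimal NumPy sketch of the standard hard quantization step (the function name `vector_quantize` and the toy codebook are illustrative, not from the paper):

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each continuous vector in z to its nearest codebook entry.

    z:        (N, D) array of continuous vectors (e.g. encoder outputs).
    codebook: (K, D) array of representative vectors.
    Returns the quantized vectors and the chosen codebook indices.
    """
    # Squared Euclidean distance from every input vector to every code: (N, K).
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    # Hard argmin: this discrete choice is what blocks backpropagation.
    idx = d.argmin(axis=1)
    return codebook[idx], idx

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
z = np.array([[0.1, -0.2], [0.9, 1.1]])
q, idx = vector_quantize(z, codebook)  # idx -> [0, 1]
```

Smoothed schemes typically replace the hard argmin with a differentiable relaxation (e.g. a softmax-weighted combination of codes), which is the design space the abstract alludes to.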
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models
arXiv:2505.19616v4 Announce Type: replace-cross Abstract: Multimodal Large Language Models demonstrate strong performance on multimodal benchmarks, yet often exhibit poor robustness when exposed to spurious modality interference, such as irrelevant text in vision understanding, or irrelevant visual content in question answering. At its core, modality interference refers to cases where spurious signals from non-essential modalities distort […]
Bring My Cup! Personalizing Vision-Language-Action Models with Visual Attentive Prompting
arXiv:2512.20014v2 Announce Type: replace-cross Abstract: While Vision-Language-Action (VLA) models generalize well to generic instructions, they struggle with personalized commands such as “bring my cup,” where the robot must act on one specific instance among visually similar objects. We study this setting of manipulating personal objects, in which a VLA must identify and control a user-specific […]
No Verifiable Reward for Prosody: Toward Preference-Guided Prosody Learning in TTS
arXiv:2509.18531v2 Announce Type: replace-cross Abstract: Recent work reports gains in neural text-to-speech (TTS) with Group Relative Policy Optimization (GRPO). However, in the absence of a verifiable reward for prosody, GRPO trained on transcription-oriented signals (CER/NLL) lowers error rates yet collapses prosody into monotone, unnatural speech; adding speaker-similarity further destabilizes training and degrades CER. We address […]
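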
XFACTORS: Disentangled Information Bottleneck via Contrastive Supervision
arXiv:2601.21688v1 Announce Type: cross Abstract: Disentangled representation learning aims to map independent factors of variation to independent representation components. On one hand, purely unsupervised approaches have proven successful on fully disentangled synthetic data, but fail to recover semantic factors from real data without strong inductive biases. On the other hand, supervised approaches are unstable and […]
Reputation as a Solution to Cooperation Collapse in LLM-based MASs
arXiv:2505.05029v3 Announce Type: replace Abstract: Cooperation has long been a fundamental topic in both human society and AI systems. However, recent studies indicate that the collapse of cooperation may emerge in multi-agent systems (MASs) driven by large language models (LLMs). To address this challenge, we explore reputation systems as a remedy. We propose RepuNet, a […]
ECSEL: Explainable Classification via Signomial Equation Learning
arXiv:2601.21789v1 Announce Type: cross Abstract: We introduce ECSEL, an explainable classification method that learns formal expressions in the form of signomial equations, motivated by the observation that many symbolic regression benchmarks admit compact signomial structure. ECSEL directly constructs a structural, closed-form expression that serves as both a classifier and an explanation. On standard symbolic regression […]
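For readers unfamiliar with the term, a signomial is a sum of terms of the form c_i · ∏_j x_j^(a_ij) with real coefficients and real exponents, defined for positive inputs. A minimal sketch of how such a learned expression could act as a classifier, using a made-up two-term signomial (the expression and the thresholding rule here are illustrative assumptions, not ECSEL's actual output):

```python
import numpy as np

def signomial(x, coeffs, exponents):
    """Evaluate f(x) = sum_i c_i * prod_j x_j**a_ij for x > 0.

    x:         (D,) positive feature vector.
    coeffs:    (T,) real coefficients, one per term.
    exponents: (T, D) real exponents a_ij.
    """
    # Each term is the product of features raised to that term's exponents.
    terms = np.prod(x[None, :] ** exponents, axis=1)  # shape (T,)
    return float(coeffs @ terms)

# Hypothetical learned expression: f(x1, x2) = 2*x1*x2 - x1**2.
coeffs = np.array([2.0, -1.0])
exponents = np.array([[1.0, 1.0],   # term 1: x1^1 * x2^1
                      [2.0, 0.0]])  # term 2: x1^2
x = np.array([1.0, 2.0])
label = 1 if signomial(x, coeffs, exponents) > 0 else 0  # f(1,2) = 3 -> label 1
```

Because the decision function is a compact closed-form expression, the classifier doubles as its own explanation, which is the property the abstract emphasizes.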
Actionable Interpretability Must Be Defined in Terms of Symmetries
arXiv:2601.12913v3 Announce Type: replace Abstract: This paper argues that interpretability research in Artificial Intelligence (AI) is fundamentally ill-posed as existing definitions of interpretability fail to describe how interpretability can be formally tested or designed for. We posit that actionable definitions of interpretability must be formulated in terms of *symmetries* that inform model design and lead […]
From Particles to Agents: Hallucination as a Metric for Cognitive Friction in Spatial Simulation
arXiv:2601.21977v1 Announce Type: cross Abstract: Traditional architectural simulations (e.g. Computational Fluid Dynamics, evacuation, structural analysis) model elements as deterministic physics-based “particles” rather than cognitive “agents”. To bridge this, we introduce Agentic Environmental Simulations, where Large Multimodal generative models actively predict the next state of spatial environments based on semantic expectation. Drawing on examples from accessibility-oriented […]
QUARK: Robust Retrieval under Non-Faithful Queries via Query-Anchored Aggregation
arXiv:2601.21049v1 Announce Type: new Abstract: User queries in real-world retrieval are often non-faithful (noisy, incomplete, or distorted), causing retrievers to fail when key semantics are missing. We formalize this as retrieval under recall noise, where the observed query is drawn from a noisy recall process of a latent target item. To address this, we propose […]
Investigating Associational Biases in Inter-Model Communication of Large Generative Models
arXiv:2601.22093v1 Announce Type: cross Abstract: Social bias in generative AI can manifest not only as performance disparities but also as associational bias, whereby models learn and reproduce stereotypical associations between concepts and demographic groups, even in the absence of explicit demographic information (e.g., associating doctors with men). These associations can persist, propagate, and potentially amplify […]
Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report
arXiv:2601.21051v1 Announce Type: new Abstract: We present Foundation-Sec-8B-Reasoning, the first open-source native reasoning model for cybersecurity. Built upon our previously released Foundation-Sec-8B base model (derived from Llama-3.1-8B-Base), the model is trained through a two-stage process combining supervised fine-tuning (SFT) and reinforcement learning from verifiable rewards (RLVR). Our training leverages proprietary reasoning data spanning cybersecurity analysis, […]