Pushing Toward the Simplex Vertices: A Simple Remedy for Code Collapse in Smoothed Vector Quantization

arXiv:2509.22161v3 Announce Type: replace-cross Abstract: Vector quantization, which discretizes a continuous vector space into a finite set of representative vectors (a codebook), has been widely adopted in modern machine learning. Despite its effectiveness, vector quantization poses a fundamental challenge: the non-differentiable quantization step blocks gradient backpropagation. Smoothed vector quantization addresses this issue by relaxing the […]
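The excerpt cuts off before the method, but the smoothed-VQ setup it describes can be sketched: replace the hard nearest-code assignment with a softmax over codebook distances so gradients flow, and shrink the temperature so the weights are pushed toward the simplex vertices (hard one-hot assignments). A minimal numpy sketch, with all names our own (the paper's exact relaxation is not shown here):

```python
# Minimal sketch of smoothed vector quantization; names and the
# softmax-over-distances relaxation are assumptions, not the paper's
# exact formulation.
import numpy as np

def smooth_quantize(z, codebook, tau=1.0):
    """Softmax over negative squared distances to each code; as tau -> 0
    the weight rows approach a simplex vertex (hard assignment)."""
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (B, K)
    d2 = d2 - d2.min(axis=1, keepdims=True)                     # stability
    w = np.exp(-d2 / tau)
    w /= w.sum(axis=1, keepdims=True)      # each row lies on the simplex
    return w @ codebook                    # differentiable surrogate code

rng = np.random.default_rng(0)
z, codebook = rng.normal(size=(4, 8)), rng.normal(size=(16, 8))
print(smooth_quantize(z, codebook, tau=0.1).shape)  # (4, 8)
```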

Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models

arXiv:2505.19616v4 Announce Type: replace-cross Abstract: Multimodal Large Language Models demonstrate strong performance on multimodal benchmarks, yet often exhibit poor robustness when exposed to spurious modality interference, such as irrelevant text in vision understanding, or irrelevant visual content in question answering. At its core, modality interference refers to cases where spurious signals from non-essential modalities distort […]
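As a rough illustration of how such interference can be diagnosed (the probe below is our own construction, not the paper's method; `model` is a hypothetical callable mapping an image and text to an answer), one can measure how often injecting an irrelevant modality flips the model's answer:

```python
# Illustrative probe only; `model` is a hypothetical (image, text) ->
# answer callable. The paper's actual diagnostic is not shown here.
def interference_rate(model, samples, distractor_text):
    """Fraction of (image, question) pairs whose answer changes when an
    irrelevant text passage is prepended to the question."""
    flips = 0
    for image, question in samples:
        clean = model(image=image, text=question)
        noisy = model(image=image, text=distractor_text + " " + question)
        flips += int(clean != noisy)
    return flips / len(samples)
```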

Bring My Cup! Personalizing Vision-Language-Action Models with Visual Attentive Prompting

arXiv:2512.20014v2 Announce Type: replace-cross Abstract: While Vision-Language-Action (VLA) models generalize well to generic instructions, they struggle with personalized commands such as “bring my cup,” where the robot must act on one specific instance among visually similar objects. We study this setting of manipulating personal objects, in which a VLA must identify and control a user-specific […]
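One hedged sketch of how a visual-prompting pipeline of this kind might locate the user's specific instance (purely illustrative; the function and argument names are assumptions, and the paper's actual Visual Attentive Prompting design is not shown in this excerpt):

```python
# Purely illustrative; assumes an upstream detector provides candidate
# boxes and features, plus a feature vector for the user's own object.
import numpy as np

def select_personal_instance(cand_feats, cand_boxes, ref_feat):
    """Pick the detection most similar to the reference (personal)
    object; the chosen box could then be overlaid on the image as a
    visual prompt so the VLA attends to that specific instance."""
    sims = cand_feats @ ref_feat / (
        np.linalg.norm(cand_feats, axis=1) * np.linalg.norm(ref_feat) + 1e-8)
    return cand_boxes[int(np.argmax(sims))]
```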

No Verifiable Reward for Prosody: Toward Preference-Guided Prosody Learning in TTS

arXiv:2509.18531v2 Announce Type: replace-cross Abstract: Recent work reports gains in neural text-to-speech (TTS) with Group Relative Policy Optimization (GRPO). However, in the absence of a verifiable reward for *prosody*, GRPO trained on transcription-oriented signals (CER/NLL) lowers error rates yet collapses prosody into monotone, unnatural speech; adding speaker-similarity further destabilizes training and degrades CER. We address […]
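For context, GRPO's core step scores each sampled output relative to its own sampling group rather than against a learned value baseline. A minimal sketch, with the CER/NLL-based reward abstracted into a plain number:

```python
# Minimal sketch of GRPO's group-relative advantage; the TTS reward
# itself (CER/NLL-style) is abstracted into a scalar here.
import numpy as np

def group_relative_advantages(group_rewards):
    """Normalize each sampled completion's reward against its group,
    as in Group Relative Policy Optimization."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

print(group_relative_advantages([0.9, 0.7, 0.8, 0.2]))
```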

XFACTORS: Disentangled Information Bottleneck via Contrastive Supervision

arXiv:2601.21688v1 Announce Type: cross Abstract: Disentangled representation learning aims to map independent factors of variation to independent representation components. On one hand, purely unsupervised approaches have proven successful on fully disentangled synthetic data, but fail to recover semantic factors from real data without strong inductive biases. On the other hand, supervised approaches are unstable and […]
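A factor-wise contrastive term of the kind the title suggests can be sketched as one InfoNCE loss per labeled factor (our own gloss; XFACTORS' actual objective is not shown in this truncated abstract):

```python
# Our gloss of factor-wise contrastive supervision, not XFACTORS'
# actual objective. Assumes each sample has at least one positive
# (same factor label) elsewhere in the batch.
import numpy as np

def factor_infonce(z, labels, tau=0.1):
    """z: (N, D) latent block assigned to one factor; labels: (N,)
    array of that factor's classes. Pulls same-label components
    together and pushes different-label ones apart."""
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                 # exclude self-pairs
    pos = labels[:, None] == labels[None, :]
    np.fill_diagonal(pos, False)
    log_p = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-log_p[pos].mean())
```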

Reputation as a Solution to Cooperation Collapse in LLM-based MASs

arXiv:2505.05029v3 Announce Type: replace Abstract: Cooperation has long been a fundamental topic in both human society and AI systems. However, recent studies indicate that the collapse of cooperation may emerge in multi-agent systems (MASs) driven by large language models (LLMs). To address this challenge, we explore reputation systems as a remedy. We propose RepuNet, a […]
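As a toy illustration of the general mechanism (the class and update rule below are our invention, not RepuNet's design), a reputation system tracks each agent's cooperation history and biases partner selection toward high-reputation agents:

```python
# Toy mechanism only; RepuNet's actual architecture is not shown in
# this excerpt. The EMA-style update rule is an assumption.
class ReputationLedger:
    def __init__(self, agents, init=0.5, lr=0.1):
        self.score = {a: init for a in agents}
        self.lr = lr

    def record(self, agent, cooperated):
        """Nudge reputation toward 1 on cooperation, toward 0 on defection."""
        target = 1.0 if cooperated else 0.0
        self.score[agent] += self.lr * (target - self.score[agent])

    def best_partner(self, exclude=None):
        """Agents preferentially pair with high-reputation peers."""
        pool = {a: s for a, s in self.score.items() if a != exclude}
        return max(pool, key=pool.get)

ledger = ReputationLedger(["alice", "bob"])
ledger.record("alice", cooperated=True)
print(ledger.best_partner(exclude="bob"))  # "alice"
```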

ECSEL: Explainable Classification via Signomial Equation Learning

arXiv:2601.21789v1 Announce Type: cross Abstract: We introduce ECSEL, an explainable classification method that learns formal expressions in the form of signomial equations, motivated by the observation that many symbolic regression benchmarks admit compact signomial structure. ECSEL directly constructs a structural, closed-form expression that serves as both a classifier and an explanation. On standard symbolic regression […]
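For concreteness, a signomial is a sum of terms c_i * prod_j x_j^(a_ij) with real (possibly non-integer) exponents, defined on the positive orthant; classifying by the sign of such an expression can be sketched as follows (the coefficients and exponents below are illustrative, not learned by ECSEL):

```python
# Coefficients and exponents here are illustrative, not ECSEL's output.
# Signomials assume x > 0 so fractional exponents are well-defined.
import numpy as np

def signomial_classify(x, coeffs, exponents):
    """f(x) = sum_i c_i * prod_j x_j ** a_ij ; sign(f) is the predicted
    class, and the closed-form expression doubles as the explanation."""
    x = np.asarray(x, dtype=float)
    terms = coeffs * np.prod(x[None, :] ** exponents, axis=1)
    return int(np.sign(terms.sum()))

# e.g. f(x) = 2*x0*sqrt(x1) - 3*x1, a two-term signomial decision rule
print(signomial_classify([1.0, 4.0],
                         np.array([2.0, -3.0]),
                         np.array([[1.0, 0.5], [0.0, 1.0]])))  # -1
```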

Actionable Interpretability Must Be Defined in Terms of Symmetries

arXiv:2601.12913v3 Announce Type: replace Abstract: This paper argues that interpretability research in Artificial Intelligence (AI) is fundamentally ill-posed as existing definitions of interpretability fail to describe how interpretability can be formally tested or designed for. We posit that actionable definitions of interpretability must be formulated in terms of *symmetries* that inform model design and lead […]
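One way to read the proposal (our gloss, not the paper's formal definition) is that an actionable interpretation should name a group G of transformations under which the model's behavior is provably invariant or equivariant:

```latex
% Our gloss of a symmetry-based criterion: f is interpretable w.r.t. a
% group G acting on inputs if its behavior transforms predictably.
\[
  f(g \cdot x) = f(x) \ \ \text{(invariance)}
  \qquad\text{or}\qquad
  f(g \cdot x) = \rho(g)\, f(x) \ \ \text{(equivariance)},
  \qquad \forall g \in G,
\]
% where $\rho$ is a representation of $G$ on the output space.
```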

From Particles to Agents: Hallucination as a Metric for Cognitive Friction in Spatial Simulation

arXiv:2601.21977v1 Announce Type: cross Abstract: Traditional architectural simulations (e.g. Computational Fluid Dynamics, evacuation, structural analysis) model elements as deterministic physics-based “particles” rather than cognitive “agents”. To bridge this, we introduce *Agentic Environmental Simulations*, where Large Multimodal generative models actively predict the next state of spatial environments based on semantic expectation. Drawing on examples from accessibility-oriented […]
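Read literally, the proposed metric compares the generative model's predicted next state against a deterministic physics baseline; a minimal sketch of such a divergence score (entirely our construction from the excerpt; the paper's hallucination metric may differ):

```python
# Entirely our construction from the excerpt; the paper's metric may
# differ. Quantifies "cognitive friction" as divergence between an
# agentic prediction and a deterministic physics baseline.
import numpy as np

def friction_score(agentic_next_state, physics_next_state):
    """L2 distance between the generative model's predicted next state
    and the physics simulation's; larger = more hallucinated dynamics."""
    a = np.asarray(agentic_next_state, dtype=float)
    p = np.asarray(physics_next_state, dtype=float)
    return float(np.linalg.norm(a - p))

print(friction_score([1.0, 2.0], [1.0, 2.5]))  # 0.5
```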

Investigating Associational Biases in Inter-Model Communication of Large Generative Models

arXiv:2601.22093v1 Announce Type: cross Abstract: Social bias in generative AI can manifest not only as performance disparities but also as associational bias, whereby models learn and reproduce stereotypical associations between concepts and demographic groups, even in the absence of explicit demographic information (e.g., associating doctors with men). These associations can persist, propagate, and potentially amplify […]
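A common way to quantify such associations, which may or may not match the paper's protocol, is pointwise mutual information between a concept term and a demographic term across generated texts:

```python
# One standard associational probe (our choice, not necessarily the
# paper's): PMI between a concept word and a demographic word.
import math

def pmi(texts, concept, group):
    """PMI > 0 means the concept and group co-occur more often than
    independence would predict, i.e. a stereotyped association."""
    n = len(texts)
    pc = sum(concept in t for t in texts) / n
    pg = sum(group in t for t in texts) / n
    pcg = sum(concept in t and group in t for t in texts) / n
    if min(pc, pg, pcg) == 0:
        return float("-inf")
    return math.log(pcg / (pc * pg))

texts = ["the doctor said he ...", "the nurse said she ...",
         "the doctor said he ..."]
print(pmi(texts, "doctor", "he"))  # ~0.405 > 0
```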

Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report

arXiv:2601.21051v1 Announce Type: new Abstract: We present Foundation-Sec-8B-Reasoning, the first open-source native reasoning model for cybersecurity. Built upon our previously released Foundation-Sec-8B base model (derived from Llama-3.1-8B-Base), the model is trained through a two-stage process combining supervised fine-tuning (SFT) and reinforcement learning from verifiable rewards (RLVR). Our training leverages proprietary reasoning data spanning cybersecurity analysis, […]
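The "verifiable rewards" in RLVR are rewards a program can check mechanically. A minimal sketch of the reward shape (the exact-match rule and the CWE example are our assumptions; the report's actual verifiers are not described in this excerpt):

```python
# Generic RLVR reward shape; the report's verifiers are not described
# in this excerpt, and the CWE example is our assumption.
def verifiable_reward(model_answer: str, reference: str) -> float:
    """1.0 iff the final answer matches a mechanically checkable ground
    truth (here: exact match up to case and surrounding whitespace)."""
    return float(model_answer.strip().lower() == reference.strip().lower())

print(verifiable_reward("CWE-79", "cwe-79"))  # 1.0
```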


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.