arXiv:2504.19373v4 Announce Type: replace-cross Abstract: Recent advances in multi-modal large reasoning models (MLRMs) have shown significant ability to interpret complex visual content. While these models enable impressive reasoning capabilities, they also introduce novel and underexplored privacy risks. In this paper, we identify a new category of privacy leakage in MLRMs: Adversaries can infer sensitive geolocation […]
Mask-GCG: Are All Tokens in Adversarial Suffixes Necessary for Jailbreak Attacks?
arXiv:2509.06350v2 Announce Type: replace-cross Abstract: Jailbreak attacks on Large Language Models (LLMs) have demonstrated various successful methods whereby attackers manipulate models into generating harmful responses that they are designed to avoid. Among these, Greedy Coordinate Gradient (GCG) has emerged as a general and effective approach that optimizes the tokens in a suffix to generate jailbreakable […]
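Since the abstract is cut off, a minimal sketch of the GCG inner loop it refers to may help. Everything below is a stand-in: the embedding table and linear head replace a real LLM and its next-token loss, and the vocabulary size, suffix length, and candidate count are arbitrary. Real GCG also samples a random batch of swaps rather than evaluating every candidate, as noted in the comments.

```python
# Hedged sketch of one Greedy Coordinate Gradient (GCG) iteration on a toy model.
import torch

torch.manual_seed(0)
vocab, dim, suffix_len = 50, 16, 8
embed = torch.nn.Embedding(vocab, dim)        # stand-in embedding table
head = torch.nn.Linear(dim, vocab)            # stand-in "model"
target = torch.randint(vocab, (suffix_len,))  # tokens the attack tries to force

def loss_of(suffix_ids):
    with torch.no_grad():
        logits = head(embed(suffix_ids))
        return torch.nn.functional.cross_entropy(logits, target)

suffix = torch.randint(vocab, (suffix_len,))
for _ in range(20):
    # 1) gradient of the loss w.r.t. a one-hot relaxation of the suffix
    one_hot = torch.nn.functional.one_hot(suffix, vocab).float().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(head(one_hot @ embed.weight), target)
    loss.backward()
    # 2) per position, the k tokens whose substitution the gradient
    #    predicts will lower the loss the most
    topk = (-one_hot.grad).topk(4, dim=1).indices
    # 3) evaluate single-token swaps, keep the best (real GCG samples a
    #    random batch of swaps instead of trying all of them)
    best_loss, best_suffix = loss_of(suffix), suffix
    for pos in range(suffix_len):
        for cand in topk[pos]:
            trial = suffix.clone()
            trial[pos] = cand
            trial_loss = loss_of(trial)
            if trial_loss < best_loss:
                best_loss, best_suffix = trial_loss, trial
    suffix = best_suffix
```

Mask-GCG's question in the title, whether every suffix position is needed, amounts to asking which positions in `suffix` could be masked out of this loop without raising the final loss.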
Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space
arXiv:2510.12603v2 Announce Type: replace-cross Abstract: Multimodal reasoning aims to enhance the capabilities of MLLMs by incorporating intermediate reasoning steps before reaching the final answer. It has evolved from text-only reasoning to the integration of visual information, enabling the thought process to be conveyed through both images and text. Despite its effectiveness, current multimodal reasoning methods […]
First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation
arXiv:2511.04715v2 Announce Type: replace-cross Abstract: Identifying how training samples influence Large Language Model (LLM) decision-making is essential for effectively interpreting model decisions and auditing large-scale datasets. Current training-sample influence estimation methods (also known as influence functions) pursue this goal by tracing information flow through the model via its first-order and higher-order gradient terms. However, […]
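As background for the layer-choice question in the title, here is a hedged sketch of the simplest first-order influence score of the kind the abstract alludes to, a TracIn-style gradient dot product (not necessarily the paper's exact estimator). The two-layer model and the data are placeholders; the "layer choice" enters through which parameters' gradients are compared.

```python
# Sketch: first-order influence as a gradient dot product,
# restricted to a chosen layer's parameters.
import torch

model = torch.nn.Sequential(torch.nn.Linear(10, 10), torch.nn.ReLU(),
                            torch.nn.Linear(10, 2))
loss_fn = torch.nn.CrossEntropyLoss()

def grads(x, y, params):
    loss = loss_fn(model(x), y)
    return torch.autograd.grad(loss, params)

def influence(x_train, y_train, x_test, y_test, layer):
    # restrict to one layer's parameters: this is the "layer choice"
    params = list(layer.parameters())
    g_tr = grads(x_train, y_train, params)
    g_te = grads(x_test, y_test, params)
    return sum((a * b).sum() for a, b in zip(g_tr, g_te))

x_tr, y_tr = torch.randn(1, 10), torch.tensor([0])
x_te, y_te = torch.randn(1, 10), torch.tensor([1])
first, last = model[0], model[2]
print(influence(x_tr, y_tr, x_te, y_te, first),   # "first" layer score
      influence(x_tr, y_tr, x_te, y_te, last))    # "last" layer score
```

The paper's title refers to exactly this degree of freedom: the same estimator can rank training samples very differently depending on whether `first`, `last`, or an aggregation across layers supplies the gradients.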
Geometric Scaling of Bayesian Inference in LLMs
arXiv:2512.23752v3 Announce Type: replace-cross Abstract: Recent work has shown that small transformers trained in controlled “wind-tunnel” settings can implement exact Bayesian inference, and that their training dynamics produce a geometric substrate — low-dimensional value manifolds and progressively orthogonal keys — that encodes posterior structure. We investigate whether this geometric signature persists in production-grade language models. […]
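For reference, this is the exact sequential Bayesian update that such "wind-tunnel" tasks ask a small transformer to implement, written here assuming observations are i.i.d. given a latent hypothesis h (a common setup in this line of work, not necessarily the paper's exact task):

```latex
% Posterior update over hypotheses h after observing x_t, and the
% resulting predictive distribution (observations i.i.d. given h):
\[
  p(h \mid x_{1:t})
  = \frac{p(x_t \mid h)\, p(h \mid x_{1:t-1})}
         {\sum_{h'} p(x_t \mid h')\, p(h' \mid x_{1:t-1})},
  \qquad
  p(x_{t+1} \mid x_{1:t}) = \sum_{h} p(x_{t+1} \mid h)\, p(h \mid x_{1:t}).
\]
```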
Unsupervised Ensemble Learning Through Deep Energy-based Models
arXiv:2601.20556v1 Announce Type: cross Abstract: Unsupervised ensemble learning has emerged to address the challenge of combining multiple learners’ predictions without access to ground-truth labels or additional data. This paradigm is crucial in scenarios where evaluating individual classifier performance or understanding their strengths is difficult due to limited information. We propose a novel deep energy-based method […]
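The abstract is truncated before the method details, so the following is only a generic illustration of energy-based unsupervised ensembling, not the paper's model: a tiny RBM with energy E(v, h) = -v^T W h - b^T v - c h is fit to binary classifier votes, and its single hidden unit plays the role of the unknown true label. The voter accuracies, learning rate, and CD-1 training loop are all illustrative assumptions.

```python
# Sketch: unsupervised ensembling of binary voters with a one-hidden-unit RBM.
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 5                         # samples, base classifiers
truth = rng.integers(0, 2, n)          # unknown labels (used only to score)
acc = rng.uniform(0.6, 0.9, m)         # hypothetical per-voter accuracies
votes = np.where(rng.random((n, m)) < acc, truth[:, None], 1 - truth[:, None])

sigmoid = lambda z: 1 / (1 + np.exp(-z))
W, b, c = rng.normal(0, 0.1, m), np.zeros(m), 0.0   # RBM params, 1 hidden unit

for _ in range(200):                   # contrastive-divergence (CD-1) training
    v = votes
    ph = sigmoid(v @ W + c)            # P(h=1 | v)
    h = (rng.random(n) < ph).astype(float)
    pv = sigmoid(np.outer(h, W) + b)   # P(v=1 | h)
    v2 = (rng.random((n, m)) < pv).astype(float)
    ph2 = sigmoid(v2 @ W + c)
    W += 0.05 * ((v * ph[:, None]).mean(0) - (v2 * ph2[:, None]).mean(0))
    b += 0.05 * (v.mean(0) - v2.mean(0))
    c += 0.05 * (ph.mean() - ph2.mean())

pred = (sigmoid(votes @ W + c) > 0.5).astype(int)
# the hidden unit recovers labels only up to a global flip
agree = max((pred == truth).mean(), (1 - pred == truth).mean())
print(f"ensemble agreement with truth: {agree:.2f}")
```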
HESTIA: A Hessian-Guided Differentiable Quantization-Aware Training Framework for Extremely Low-Bit LLMs
arXiv:2601.20745v1 Announce Type: cross Abstract: As large language models (LLMs) continue to scale, deployment is increasingly bottlenecked by the memory wall, motivating a shift toward extremely low-bit quantization. However, most quantization-aware training (QAT) methods apply hard rounding and the straight-through estimator (STE) from the beginning of training, which prematurely discretizes the optimization landscape and […]
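For context on the baseline HESTIA departs from, this is the standard hard-rounding plus straight-through estimator (STE) fake-quantization pattern the abstract describes: rounding is non-differentiable, so the backward pass treats it as the identity. The 2-bit setting and per-tensor scaling scheme are illustrative choices, not the paper's.

```python
# Sketch: hard rounding with a straight-through estimator (STE).
import torch

class STERound(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)          # hard rounding in the forward pass
    @staticmethod
    def backward(ctx, g):
        return g                       # identity gradient: the STE

def fake_quant(w, bits=2):
    qmax = 2 ** (bits - 1) - 1         # e.g. 2-bit signed levels {-2,...,1}
    scale = w.abs().max() / qmax       # illustrative per-tensor scale
    return STERound.apply(w / scale).clamp(-qmax - 1, qmax) * scale

w = torch.randn(4, 4, requires_grad=True)
loss = fake_quant(w).pow(2).sum()
loss.backward()                        # gradients flow "through" the rounding
print(w.grad is not None)              # True
```

The abstract's complaint is that applying this from step zero makes the loss landscape piecewise constant in `w / scale` immediately; HESTIA's differentiable, Hessian-guided alternative presumably softens that transition, though the details are truncated above.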
Reward Models Inherit Value Biases from Pretraining
arXiv:2601.20838v1 Announce Type: cross Abstract: Reward models (RMs) are central to aligning large language models (LLMs) with human values but have received less attention than pre-trained and post-trained LLMs themselves. Because RMs are initialized from LLMs, they inherit representations that shape their behavior, but the nature and extent of this influence remain understudied. In a […]
Lifted Forward Planning in Relational Factored Markov Decision Processes with Concurrent Actions
arXiv:2505.22147v2 Announce Type: replace Abstract: Decision making is a central problem in AI that can be formalized using a Markov Decision Process. A key problem is that the state space grows exponentially with the number of (indistinguishable) objects, and computing policies requires enumerating that state space. Even more possibilities have to be enumerated if […]
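To make the blow-up concrete: with n indistinguishable objects, each in one of k local states, the grounded state space is exponential, while a lifted (counting) representation only tracks how many objects occupy each local state. This is a standard illustration of lifting; the paper's exact encoding is truncated above.

```latex
% Grounded vs. lifted state-space size for n indistinguishable objects,
% each in one of k local states (illustrative numbers):
\[
  |S_{\mathrm{ground}}| = k^{n},
  \qquad
  |S_{\mathrm{lifted}}| = \binom{n+k-1}{k-1},
  \qquad
  \text{e.g. } k=3,\ n=20:\quad
  3^{20} \approx 3.5\times 10^{9} \ \text{ vs. } \ \binom{22}{2} = 231.
\]
```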
Analysis of approximate linear programming solution to Markov decision problem with log barrier function
arXiv:2509.19800v2 Announce Type: replace Abstract: There are two primary approaches to solving Markov decision problems (MDPs): dynamic programming based on the Bellman equation and linear programming (LP). Dynamic programming methods are the most widely used and form the foundation of both classical and modern reinforcement learning (RL). By contrast, LP-based methods have been less commonly […]
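For orientation, here is the standard primal LP for an MDP, followed by a generic log-barrier approximation of its constraints. This is a sketch of the object the title refers to, under the usual notation (initial-state weights μ, reward r, discount γ, barrier weight η); the paper's exact formulation is truncated above.

```latex
% Standard primal LP for an MDP (top) and a generic log-barrier
% approximation of its constraints (bottom):
\begin{align*}
  \min_{V}\ & \sum_{s} \mu(s)\, V(s)
  \quad \text{s.t.} \quad
  V(s) \ \ge\ r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s')
  \quad \forall\, s, a; \\[4pt]
  \min_{V}\ & \sum_{s} \mu(s)\, V(s)
  \ -\ \frac{1}{\eta} \sum_{s,a}
  \log\!\Big( V(s) - r(s,a) - \gamma \sum_{s'} P(s' \mid s,a)\, V(s') \Big).
\end{align*}
```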
Gabliteration: Adaptive Multi-Directional Neural Weight Modification for Selective Behavioral Alteration in Large Language Models
arXiv:2512.18901v3 Announce Type: replace Abstract: We present Gabliteration, a novel neural weight modification technique that advances beyond traditional abliteration methods by implementing adaptive multi-directional projections with regularized layer selection. Our approach addresses the fundamental limitation of existing methods that compromise model quality while attempting to modify specific behavioral patterns. Through dynamic layer optimization, regularized projection […]
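As background, classic abliteration removes a behavior direction from a weight matrix by orthogonal projection; a hedged sketch of that core operation, extended to several directions, is below. Gabliteration's adaptive layer selection and regularization are not reproduced here, and the directions and matrix sizes are random placeholders.

```python
# Sketch: projecting unwanted directions out of a weight matrix.
import torch

def ablate(W, dirs):
    # orthonormalize the unwanted directions (columns of `dirs`)
    Q, _ = torch.linalg.qr(dirs)          # (d_model, k), orthonormal columns
    P = Q @ Q.T                           # projector onto their span
    return W - P @ W                      # (I - P) W: outputs lose that span

d_model = 8
W = torch.randn(d_model, d_model)          # e.g. an MLP output projection
dirs = torch.randn(d_model, 2)             # two hypothetical behavior directions
W_new = ablate(W, dirs)
print(torch.linalg.norm(dirs.T @ W_new))   # ~0: the span is gone from outputs
```

The abstract's critique of such methods, that they degrade overall model quality, follows from the projection being applied uniformly: every input loses its component along the ablated span, not just the targeted behavior.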
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
arXiv:2601.18631v2 Announce Type: replace Abstract: When humans face problems beyond their immediate capabilities, they rely on tools, a strategy that offers a promising paradigm for improving visual reasoning in multimodal large language models (MLLMs). Effective reasoning therefore hinges on knowing which tools to use, when to invoke them, and how to compose them over multiple steps, even when […]
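Since the abstract is truncated before the mechanism, the following is only a generic skeleton of an iterative tool-orchestration loop of the kind the title suggests: a controller repeatedly picks a tool, applies it, and folds the result back into the reasoning state. The tool names and the hard-coded policy are hypothetical stand-ins for an MLLM-driven choice.

```python
# Sketch: an iterative choose-tool / apply-tool / update-state loop.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {        # hypothetical tools
    "crop":   lambda state: state + " | cropped region",
    "ocr":    lambda state: state + " | extracted text",
    "answer": lambda state: state + " | final answer",
}

def controller(state: str, step: int) -> str:
    # stand-in policy: a real system would ask the MLLM which tool to
    # invoke next, conditioned on the image, question, and history
    return ["crop", "ocr", "answer"][min(step, 2)]

state = "question + image features"
for step in range(3):
    tool = controller(state, step)
    state = TOOLS[tool](state)
    if tool == "answer":
        break
print(state)
```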