arXiv:2604.03058v1 Announce Type: cross Abstract: LLMs can be socially sycophantic, affirming users when they ask questions like “am I in the wrong?” rather than providing a genuine assessment. We hypothesize that this behavior arises from incorrect assumptions about the user, such as underestimating how often users are seeking information rather than reassurance. We present Verbalized Assumptions, a framework […]
Automated Malware Family Classification using Weighted Hierarchical Ensembles of Large Language Models
arXiv:2604.02490v1 Announce Type: cross Abstract: Malware family classification remains a challenging task in automated malware analysis, particularly in real-world settings characterized by obfuscation, packing, and rapidly evolving threats. Existing machine learning and deep learning approaches typically depend on labeled datasets, handcrafted features, supervised training, or dynamic analysis, which limits their scalability and effectiveness in open-world […]
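The abstract is truncated before the method, but the title names a weighted hierarchical ensemble of LLMs. The sketch below shows one plausible reading, assuming per-model weights combine votes at two levels of a family hierarchy (coarse group first, then family within the chosen group); the model interface, weights, and two-level structure are illustrative assumptions, not the paper's design.

```python
# Hypothetical sketch of a weighted hierarchical LLM ensemble for malware
# family classification; interface and hierarchy are assumptions.
from collections import defaultdict

def ensemble_classify(report, models, weights, families_in_group):
    # models: objects exposing predict_group(report) and
    # predict_family(report, candidates) -- a hypothetical interface.
    group_scores = defaultdict(float)
    for model, w in zip(models, weights):
        group_scores[model.predict_group(report)] += w
    group = max(group_scores, key=group_scores.get)  # weighted vote, level 1

    family_scores = defaultdict(float)
    for model, w in zip(models, weights):
        fam = model.predict_family(report, candidates=families_in_group[group])
        family_scores[fam] += w
    return max(family_scores, key=family_scores.get)  # weighted vote, level 2
```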
Mitigating LLM biases toward spurious social contexts using direct preference optimization
arXiv:2604.02585v1 Announce Type: new Abstract: LLMs are increasingly used for high-stakes decision-making, yet their sensitivity to spurious contextual information can introduce harmful biases. This is a critical concern when models are deployed for tasks like evaluating teachers’ instructional quality, where biased assessment can affect teachers’ professional development and career trajectories. We investigate model robustness to […]
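The title invokes direct preference optimization (DPO), whose objective has a standard form (Rafailov et al., 2023); how this paper constructs preference pairs that discourage reliance on spurious social context lies in the truncated portion. A minimal sketch of the generic loss only:

```python
# Standard DPO objective; the preference-pair construction used by the
# paper is not shown in the excerpt and is not reproduced here.
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Batch DPO loss from summed token log-probs of each response under
    the policy and a frozen reference model (all tensors of shape (B,))."""
    chosen_ratio = logp_chosen - ref_logp_chosen        # policy vs. reference
    rejected_ratio = logp_rejected - ref_logp_rejected
    # Push the preferred response's log-ratio above the dispreferred one's.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```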
Social Meaning in Large Language Models: Structure, Magnitude, and Pragmatic Prompting
arXiv:2604.02512v1 Announce Type: cross Abstract: Large language models (LLMs) increasingly exhibit human-like patterns of pragmatic and social reasoning. This paper addresses two related questions: do LLMs approximate human social meaning not only qualitatively but also quantitatively, and can prompting strategies informed by pragmatic theory improve this approximation? To address the first, we introduce two calibration-focused […]
Gradient Boosting within a Single Attention Layer
arXiv:2604.03190v1 Announce Type: cross Abstract: Transformer attention computes a single softmax-weighted average over values, a one-pass estimate that cannot correct its own errors. We introduce gradient-boosted attention, which applies the principle of gradient boosting within a single attention layer: a second attention pass, with its own learned projections, attends to the prediction error of […]
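A minimal sketch of the boosting mechanism the abstract describes: a first attention pass produces an estimate, and a second pass with its own learned projections attends to the residual of the first. Since the abstract cuts off mid-sentence, the quantity the residual is measured against (here, a target when available, else the input itself) is an assumption.

```python
# Sketch of "gradient boosting within a single attention layer":
# pass 2 attends to the prediction error of pass 1. The residual's
# reference signal is an assumption -- the abstract is truncated.
import torch
import torch.nn as nn

class GradientBoostedAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        # Two passes with independent learned projections.
        self.pass1 = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.pass2 = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor, target: torch.Tensor | None = None):
        y1, _ = self.pass1(x, x, x)                  # one-pass softmax estimate
        residual = (target - y1) if target is not None else (x - y1)
        correction, _ = self.pass2(y1, x, residual)  # attend to the error
        return y1 + correction
```

Design-wise, this mirrors gradient boosting's "fit the residual" step: the second pass acts as a weak corrector trained on what the first pass got wrong.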
Feature Attribution Stability Suite: How Stable Are Post-Hoc Attributions?
arXiv:2604.02532v1 Announce Type: cross Abstract: Post-hoc feature attribution methods are widely deployed in safety-critical vision systems, yet their stability under realistic input perturbations remains poorly characterized. Existing metrics evaluate explanations primarily under additive noise, collapse stability to a single scalar, and fail to condition on prediction preservation, conflating explanation fragility with model sensitivity. We introduce […]
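The abstract argues that stability should be conditioned on prediction preservation and reported as more than a single collapsed scalar. Below is a hedged sketch of that evaluation pattern; the perturbation model (Gaussian noise) and similarity measure (Spearman rank correlation) are assumptions, not necessarily the suite's own choices.

```python
# Sketch: measure attribution stability only on perturbations that keep
# the prediction fixed, separating explanation fragility from model
# sensitivity. Noise model and similarity metric are assumptions.
import torch
from scipy.stats import spearmanr

def prediction_preserving_stability(model, attribute, x, n=20, sigma=0.05):
    with torch.no_grad():
        base_pred = model(x).argmax(dim=-1)
    base_attr = attribute(model, x).detach().flatten()
    scores = []
    for _ in range(n):
        x_pert = x + sigma * torch.randn_like(x)
        with torch.no_grad():
            if not torch.equal(model(x_pert).argmax(dim=-1), base_pred):
                continue  # condition on prediction preservation
        attr = attribute(model, x_pert).detach().flatten()
        rho, _ = spearmanr(base_attr.numpy(), attr.numpy())
        scores.append(rho)
    return scores  # a distribution, not one collapsed scalar
```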
Do Audio-Visual Large Language Models Really See and Hear?
arXiv:2604.02605v1 Announce Type: new Abstract: Audio-Visual Large Language Models (AVLLMs) are emerging as unified interfaces to multimodal perception. We present the first mechanistic interpretability study of AVLLMs, analyzing how audio and visual features evolve and fuse through different layers of an AVLLM to produce the final text outputs. We find that although AVLLMs encode rich […]
Understanding the Effects of Safety Unalignment on Large Language Models
arXiv:2604.02574v1 Announce Type: cross Abstract: Safety alignment has become a critical step to ensure LLMs refuse harmful requests while providing helpful and harmless responses. However, despite the ubiquity of safety alignment for deployed frontier models, two separate lines of recent work, jailbreak-tuning (JT) and weight orthogonalization (WO), have shown that safety guardrails may be largely disabled, resulting […]
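Weight orthogonalization here plausibly refers to the known technique of projecting a learned "refusal direction" out of weight matrices that write to the residual stream, so the model can no longer express refusal along that direction. A minimal sketch of that operation, hedged since this paper's own analysis lies in the truncated portion:

```python
# Rank-1 ablation of a refusal direction from a weight matrix, as in the
# prior weight-orthogonalization line of work the abstract references.
import torch

def orthogonalize(W: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Project refusal_dir out of W (d_model x d_in), whose outputs live
    in the residual stream."""
    r = refusal_dir / refusal_dir.norm()
    return W - torch.outer(r, r) @ W  # (I - r r^T) W
```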
Chain-of-Authorization: Embedding authorization into large language models
arXiv:2603.22869v2 Announce Type: replace Abstract: Although Large Language Models (LLMs) have evolved from text generators into the cognitive core of modern AI systems, their inherent lack of authorization awareness exposes these systems to catastrophic risks, ranging from unintentional data leakage to unauthorized command execution. Existing defense mechanisms are fundamentally decoupled from internal reasoning, rendering them […]
LitPivot: Developing Well-Situated Research Ideas Through Dynamic Contextualization and Critique within the Literature Landscape
arXiv:2604.02600v1 Announce Type: cross Abstract: Developing a novel research idea is hard. It must be distinct enough from prior work to claim a contribution while also building on it. This requires iteratively reviewing literature and refining an idea based on what a researcher reads; yet when an idea changes, the literature that matters often changes […]
AutoVerifier: An Agentic Automated Verification Framework Using Large Language Models
arXiv:2604.02617v1 Announce Type: new Abstract: Scientific and Technical Intelligence (S&TI) analysis requires verifying complex technical claims across rapidly growing literature, where existing approaches fail to bridge the verification gap between surface-level accuracy and deeper methodological validity. We present AutoVerifier, an LLM-based agentic framework that automates end-to-end verification of technical claims without requiring domain expertise. AutoVerifier […]
Analytic Drift Resister for Non-Exemplar Continual Graph Learning
arXiv:2604.02633v1 Announce Type: cross Abstract: Non-Exemplar Continual Graph Learning (NECGL) seeks to eliminate the privacy risks intrinsic to rehearsal-based paradigms by retaining only class-level prototype representations, rather than raw graph examples, to mitigate catastrophic forgetting. However, this design choice inevitably causes feature drift. As a nascent alternative, Analytic Continual Learning (ACL) capitalizes on the intrinsic […]
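The abstract is cut off, but the Analytic Continual Learning line it builds on centers on a closed-form ridge-regression classifier head updated recursively per task from running statistics, so no raw examples need to be stored. A minimal sketch of that recursive update, assuming frozen features; how the paper's drift resister modifies it is not in the excerpt.

```python
# Sketch of the analytic (recursive least-squares) classifier update
# underlying ACL; the paper's drift-resisting additions are not shown.
import torch

class AnalyticClassifier:
    def __init__(self, feat_dim: int, n_classes: int, gamma: float = 1.0):
        self.R_inv = torch.eye(feat_dim) / gamma   # inverse regularized autocorrelation
        self.W = torch.zeros(feat_dim, n_classes)  # classifier weights

    def update(self, X: torch.Tensor, Y: torch.Tensor):
        """Fold one task's features X (n x d) and one-hot labels Y (n x c)
        into the running solution via the Woodbury identity -- no old data."""
        K = torch.eye(X.shape[0]) + X @ self.R_inv @ X.T
        self.R_inv = self.R_inv - self.R_inv @ X.T @ torch.linalg.solve(K, X @ self.R_inv)
        self.W = self.W + self.R_inv @ X.T @ (Y - X @ self.W)

    def predict(self, X: torch.Tensor) -> torch.Tensor:
        return (X @ self.W).argmax(dim=-1)
```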