SHARP: Social Harm Analysis via Risk Profiles for Measuring Inequities in Large Language Models

January 30, 2026

arXiv:2601.21235v1 Announce Type: cross
Abstract: Large language models (LLMs) are increasingly deployed in high-stakes domains, where rare but severe failures can result in irreversible harm. However, prevailing evaluation benchmarks often reduce complex social risk to mean-centered scalar scores, thereby obscuring distributional structure, cross-dimensional interactions, and worst-case behavior. This paper introduces Social Harm Analysis via Risk Profiles (SHARP), a framework for multidimensional, distribution-aware evaluation of social harm. SHARP models harm as a multivariate random variable and integrates explicit decomposition into bias, fairness, ethics, and epistemic reliability with a union-of-failures aggregation reparameterized as additive cumulative log-risk. The framework further employs risk-sensitive distributional statistics, with Conditional Value at Risk (CVaR95) as a primary metric, to characterize worst-case model behavior. Application of SHARP to eleven frontier LLMs, evaluated on a fixed corpus of n=901 socially sensitive prompts, reveals that models with similar average risk can exhibit more than twofold differences in tail exposure and volatility. Across models, dimension-wise marginal tail behavior varies systematically across harm dimensions, with bias exhibiting the strongest tail severities, epistemic and fairness risks occupying intermediate regimes, and ethical misalignment consistently lower; together, these patterns reveal heterogeneous, model-dependent failure structures that scalar benchmarks conflate. These findings indicate that responsible evaluation and governance of LLMs require moving beyond scalar averages toward multidimensional, tail-sensitive risk profiling.
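
The abstract names two quantitative ingredients: a union-of-failures aggregation expressed as additive cumulative log-risk, and CVaR95 as the tail metric. The following is a minimal Python sketch of one common reading of those ideas, not the paper's implementation. It assumes each harm dimension yields a per-prompt failure probability in [0, 1), that dimensions are combined as if independent, and that CVaR95 is the empirical mean of the worst 5% of scores. The function names and the toy numbers are hypothetical.

```python
import math


def cumulative_log_risk(dim_risks):
    """Additive log-risk: -sum(log(1 - p_d)) over harm dimensions.

    Assumes every p_d is a failure probability in [0, 1).
    """
    return -sum(math.log1p(-p) for p in dim_risks)


def union_of_failures(dim_risks):
    """Probability that at least one dimension fails, assuming independence.

    Equals 1 - prod(1 - p_d), recovered here from the additive log-risk.
    """
    return 1.0 - math.exp(-cumulative_log_risk(dim_risks))


def cvar(scores, alpha=0.95):
    """Empirical upper-tail CVaR: mean of the worst (1 - alpha) share of scores."""
    ordered = sorted(scores)
    # round() guards against float noise such as 20 * (1 - 0.95) > 1.0
    k = max(1, math.ceil(round(len(ordered) * (1 - alpha), 9)))
    tail = ordered[-k:]
    return sum(tail) / len(tail)


# Toy per-prompt risk for one prompt, keyed by the four dimensions in the abstract.
prompt_risk = {"bias": 0.1, "fairness": 0.2, "ethics": 0.0, "epistemic": 0.0}
combined = union_of_failures(prompt_risk.values())

# Two hypothetical models with the same mean score but different tails.
model_a = [0.2] * 20
model_b = [0.1] * 19 + [2.1]
mean_a = sum(model_a) / len(model_a)
mean_b = sum(model_b) / len(model_b)
tail_a = cvar(model_a)
tail_b = cvar(model_b)
```

In this toy setup the two models have equal means, yet the CVaR95 of `model_b` is driven entirely by its single extreme score, which is the kind of difference a mean-centered scalar score hides.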
