Inside the marketplace powering bespoke AI deepfakes of real women

Civitai—an online marketplace for buying and selling AI-generated content, backed by the venture capital firm Andreessen Horowitz—is letting users buy custom instruction files for generating

How the sometimes-weird world of lifespan extension is gaining influence

For the last couple of years, I’ve been following the progress of a group of individuals who believe death is humanity’s “core problem.” Put simply,

FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning

arXiv:2601.21682v1 Announce Type: cross Abstract: Large language models (LLMs) demonstrate impressive capabilities across diverse tasks but raise concerns about privacy, copyright, and harmful materials. Existing

SAGE: Sequence-level Adaptive Gradient Evolution for Generative Recommendation

arXiv:2601.21452v1 Announce Type: cross Abstract: While works such as OneRec have validated the scaling laws of Large Language Models (LLMs) in recommender systems, they rely

SHARP: Social Harm Analysis via Risk Profiles for Measuring Inequities in Large Language Models

arXiv:2601.21235v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed in high-stakes domains, where rare but severe failures can result in irreversible harm.

StepShield: When, Not Whether to Intervene on Rogue Agents

January 30, 2026

arXiv:2601.22136v1 Announce Type: cross
Abstract: Existing agent safety benchmarks report binary accuracy, conflating early intervention with post-mortem analysis. A detector that flags a violation at step 8 enables intervention; one that reports it at step 48 provides only forensic value. This distinction is critical, yet current benchmarks cannot measure it. We introduce StepShield, the first benchmark to evaluate when violations are detected, not just whether. StepShield contains 9,213 code agent trajectories, including 1,278 meticulously annotated training pairs and a 7,935-trajectory test set with a realistic 8.1% rogue rate. Rogue behaviors are grounded in real-world security incidents across six categories. We propose three novel temporal metrics: Early Intervention Rate (EIR), Intervention Gap, and Tokens Saved. Surprisingly, our evaluation reveals that an LLM-based judge achieves 59% EIR while a static analyzer achieves only 26%, a 2.3x performance gap that is entirely invisible to standard accuracy metrics. We further show that early detection has direct economic benefits: our cascaded HybridGuard detector reduces monitoring costs by 75% and projects to $108M in cumulative savings over five years at enterprise scale. By shifting the focus of evaluation from whether to when, StepShield provides a new foundation for building safer and more economically viable AI agents. The code and data are released under an Apache 2.0 license.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registeration number 16808844