Behavior change beyond intervention: an activity-theoretical perspective on human-centered design of personal health technology

IntroductionModern personal technologies, such as smartphone apps with artificial intelligence (AI) capabilities, have a significant potential for helping people make necessary changes in their behavior

A data-centric perspective on designing AI foundation models for healthcare

Post Content

Correction: AI-Enabled Wearables for Motor Function Assessment and Rehabilitation in Parkinson Disease: Scoping Review

Post Content

Layer-wise MoE Routing Locality under Shared-Prefix Code Generation: Token-Identity Decomposition and Compile-Equivalent Fork Redundancy

arXiv:2604.17182v1 Announce Type: cross Abstract: In LLM-based code generation, multiple code candidates are often generated in parallel from the same prompt — for example, in

HQA-VLAttack: Towards High Quality Adversarial Attack on Vision-Language Pre-Trained Models

arXiv:2604.16499v1 Announce Type: cross Abstract: Black-box adversarial attack on vision-language pre-trained models is a practical and challenging task, as text and image perturbations need to

Reverse Constitutional AI: A Framework for Controllable Toxic Data Generation via Probability-Clamped RLAIF

April 21, 2026

arXiv:2604.17769v1 Announce Type: cross
Abstract: Ensuring the safety of large language models (LLMs) requires robust red teaming, yet the systematic synthesis of high-quality toxic data remains under-explored. We propose Reverse Constitutional AI (R-CAI), a framework for automated and controllable adversarial data generation that moves beyond isolated jailbreak prompts. By inverting a harmless constitution into a constitution of toxicity and iteratively refining model outputs through a critique–revision pipeline, R-CAI enables scalable synthesis of multi-dimensional adversarial data without human annotation. Optimizing solely for toxicity-related rewards, however, can lead to reward hacking and degraded semantic coherence. To address this challenge, we introduce probability clamping within reinforcement learning from AI feedback, which stabilizes adversarial optimization while preserving adversarial intent. Experiments demonstrate that R-CAI generates diverse, high-quality toxic data and that probability clamping substantially improves semantic coherence (15%) without sacrificing adversarial strength. Overall, R-CAI provides a fully automated framework for red teaming data generation and systematic safety evaluation of aligned language models.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844