arXiv:2603.05773v2 Announce Type: replace-cross
Abstract: Safety alignment is often conceptualized as a monolithic process wherein harmfulness detection automatically triggers refusal. However, the persistence of jailbreak attacks suggests a fundamental mechanistic decoupling. We propose the Disentangled Safety Hypothesis (DSH), positing that safety computation operates on two distinct subspaces: a Recognition Axis ($\mathbf{v}_H$, “Knowing”) and an Execution Axis ($\mathbf{v}_R$, “Acting”). Our geometric analysis reveals a universal “Reflex-to-Dissociation” evolution, in which these signals transition from antagonistic entanglement in early layers to structural independence in deep layers. To validate this, we introduce Double-Difference Extraction and Adaptive Causal Steering. Using our curated AmbiguityBench, we demonstrate a causal double dissociation, effectively creating a state of “Knowing without Acting.” Crucially, we leverage this disentanglement to propose the Refusal Erasure Attack (REA), which achieves state-of-the-art attack success rates by surgically lobotomizing the refusal mechanism. Furthermore, we uncover a critical architectural divergence, contrasting the Explicit Semantic Control of Llama3.1 with the Latent Distributed Control of Qwen2.5. The code and dataset are available at https://anonymous.4open.science/r/DSH.
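The abstract does not spell out how Double-Difference Extraction or Adaptive Causal Steering work, so the sketch below illustrates only the underlying geometry, under stated assumptions. It swaps in a common difference-of-means estimator (not necessarily the authors' extraction method) to obtain hypothetical stand-ins for $\mathbf{v}_H$ and $\mathbf{v}_R$ from synthetic activations. It then checks their cosine similarity and projects $\mathbf{v}_R$ out of a hidden state. The helper names (`mean_diff_direction`, `cosine`, `project_out`), the orthogonal "ground-truth" axes, and all data are hypothetical; no real model is involved. Reading a cosine near 0 as "structural independence" is an interpretive assumption.

```python
import numpy as np


def mean_diff_direction(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Unit-norm difference-of-means direction between two activation sets.

    acts_a, acts_b: arrays of shape (n_samples, d_model) from the same layer.
    (Generic estimator; an assumption, not the paper's Double-Difference method.)
    """
    diff = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return diff / np.linalg.norm(diff)


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def project_out(h: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Remove the component of h (a vector, or each row of a matrix) along v."""
    v = v / np.linalg.norm(v)
    coeffs = h @ v  # scalar for a vector, shape (n,) for a matrix
    return h - np.expand_dims(coeffs, -1) * v


# --- Toy demo on synthetic data (hypothetical; no real model involved) ---
rng = np.random.default_rng(0)
d_model = 64

# Hypothetical ground-truth axes, made orthogonal via QR to mimic the
# "structural independence" regime the abstract reports for deep layers.
q, _ = np.linalg.qr(rng.normal(size=(d_model, 2)))
true_h, true_r = q[:, 0], q[:, 1]


def noise(n: int) -> np.ndarray:
    """Small isotropic Gaussian noise standing in for unrelated activation content."""
    return rng.normal(scale=0.1, size=(n, d_model))


harmful, harmless = noise(200) + 2.0 * true_h, noise(200)
refused, complied = noise(200) + 2.0 * true_r, noise(200)

v_h = mean_diff_direction(harmful, harmless)  # stand-in for the Recognition Axis
v_r = mean_diff_direction(refused, complied)  # stand-in for the Execution Axis
print(f"cos(v_H, v_R) = {cosine(v_h, v_r):+.3f}")  # near 0 in this toy setup

# A state that both "knows" (v_H) and "acts" (v_R); ablate only the v_R component.
state = 2.0 * true_h + 2.0 * true_r
edited = project_out(state, v_r)
print(f"along v_H: {state @ v_h:.2f} -> {edited @ v_h:.2f}")
print(f"along v_R: {state @ v_r:.2f} -> {edited @ v_r:.2f}")
```

In this toy setup the component along `v_h` is essentially unchanged while the component along `v_r` drops to zero, a linear-algebra analogue of "Knowing without Acting." If the two directions were instead strongly anti-aligned (closer to the early-layer entanglement the abstract describes), projecting out `v_r` would also remove most of the `v_h` component, which is one way to see why such a dissociation presupposes separated axes.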
Effectiveness of AI-Assisted Patient Health Education Using Voice Cloning and ChatGPT: Prospective Randomized Controlled Trial
Background: Traditional patient education often lacks personalization and engagement, potentially limiting knowledge acquisition and treatment adherence. Advances in artificial intelligence (AI), including voice cloning technology




