Disclosure in the era of generative artificial intelligence

Generative artificial intelligence (AI) has rapidly become embedded in academic writing, assisting with tasks ranging from language editing to drafting text and producing evidence. Despite

Agile Development and Testing of a Gamified Human Milk Feeding Education Mobile App for Participants of the Special Supplemental Nutrition Program for Women, Infants, and Children: Co-Design Approach

Background: Human milk feeding and breastfeeding are the recommended standards for infant feeding. Nevertheless, breastfeeding rates in the United States remain below target levels, with

Keeping Clinical Trials on an Inclusive Track

Post Content

Behavior change beyond intervention: an activity-theoretical perspective on human-centered design of personal health technology

IntroductionModern personal technologies, such as smartphone apps with artificial intelligence (AI) capabilities, have a significant potential for helping people make necessary changes in their behavior

Immortal AI, Mortal Life: Long-Term Perspectives on AI and Human Knowledge

Post Content

Cross-Lingual Jailbreak Detection via Semantic Codebooks

April 29, 2026

arXiv:2604.25716v1 Announce Type: cross
Abstract: Safety mechanisms for large language models (LLMs) remain predominantly English-centric, creating systematic vulnerabilities in multilingual deployment. Prior work shows that translating malicious prompts into other languages can substantially increase jailbreak success rates, exposing a structural cross-lingual security gap. We investigate whether such attacks can be mitigated through language-agnostic semantic similarity without retraining or language-specific adaptation. Our approach compares multilingual query embeddings against a fixed English codebook of jailbreak prompts, operating as a training-free external guardrail for black-box LLMs. We conduct a systematic evaluation across four languages, two translation pipelines, four safety benchmarks, three embedding models, and three target LLMs (Qwen, Llama, GPT-3.5). Our results reveal two distinct regimes of cross-lingual transfer. On curated benchmarks containing canonical jailbreak templates, semantic similarity generalizes reliably across languages, achieving near-perfect separability (AUC up to 0.99) and substantial reductions in absolute attack success rates under strict low-false-positive constraints. However, under distribution shift – on behaviorally diverse and heterogeneous unsafe benchmarks – separability degrades markedly (AUC $approx$ 0.60-0.70), and recall in the security-critical low-FPR regime drops across all embedding models.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844