arXiv:2502.05739v2 Announce Type: replace-cross
Abstract: Large Language Models for Code (LLMs4Code) have achieved strong performance in code generation, but recent studies reveal that they may memorize and leak sensitive information contained in training data, posing serious privacy risks. To address this risk, this work presents the first comprehensive empirical study on applying machine unlearning to mitigate sensitive information leakage in LLMs4Code. We first construct a dedicated benchmark that includes: (i) a synthetic forget set containing diverse forms of personal information, and (ii) a retain set designed to evaluate whether code-generation capability is preserved after unlearning. Using this benchmark, we systematically assess three representative unlearning algorithms (GA, GA+GD, GA+KL) across three widely used open-source LLMs4Code (AIXCoder-7B, CodeLlama-7B, CodeQwen-7B). Experimental results demonstrate that machine unlearning can substantially reduce direct memorization-based leakage: on average, the direct leak rate drops by more than 50% while over 91% of the original code-generation performance is retained. Moreover, by analyzing post-unlearning outputs, we uncover a consistent shift from direct to indirect leakage, revealing an underexplored vulnerability that persists even when the target data has been successfully forgotten. Our findings show that machine unlearning is a feasible and effective solution for enhancing privacy protection in LLMs4Code, while also highlighting the need for future techniques capable of mitigating both direct and indirect leakage simultaneously.
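
The abstract names three unlearning objectives (GA, GA+GD, GA+KL) without spelling them out. Below is a minimal sketch of these objectives under their standard formulations (gradient ascent on the forget set, optionally combined with cross-entropy or a KL penalty on the retain set); the function names, loss weights, and dummy tensors are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of the three unlearning objectives, assuming standard formulations.
import torch
import torch.nn.functional as F


def ga_loss(forget_logits, forget_labels):
    # Gradient ascent (GA): negate the token-level cross-entropy on the
    # forget set so optimization pushes the model away from memorized text.
    ce = F.cross_entropy(forget_logits.view(-1, forget_logits.size(-1)),
                         forget_labels.view(-1))
    return -ce


def ga_gd_loss(forget_logits, forget_labels,
               retain_logits, retain_labels, retain_weight=1.0):
    # GA+GD: gradient ascent on the forget set plus ordinary gradient
    # descent (cross-entropy) on the retain set to preserve code generation.
    retain_ce = F.cross_entropy(retain_logits.view(-1, retain_logits.size(-1)),
                                retain_labels.view(-1))
    return ga_loss(forget_logits, forget_labels) + retain_weight * retain_ce


def ga_kl_loss(forget_logits, forget_labels,
               retain_logits, ref_retain_logits, kl_weight=1.0):
    # GA+KL: gradient ascent on the forget set plus a KL penalty that keeps
    # the unlearned model's retain-set distribution close to that of the
    # frozen original model.
    kl = F.kl_div(F.log_softmax(retain_logits, dim=-1),
                  F.softmax(ref_retain_logits, dim=-1),
                  reduction="batchmean")
    return ga_loss(forget_logits, forget_labels) + kl_weight * kl


if __name__ == "__main__":
    # Dummy tensors (batch 2, sequence length 8, vocabulary size 32).
    logits = torch.randn(2, 8, 32, requires_grad=True)
    labels = torch.randint(0, 32, (2, 8))
    ref_logits = torch.randn(2, 8, 32)
    print("GA   :", ga_loss(logits, labels).item())
    print("GA+GD:", ga_gd_loss(logits, labels, logits, labels).item())
    print("GA+KL:", ga_kl_loss(logits, labels, logits, ref_logits).item())
```

In practice these losses would be minimized over the LLM4Code's parameters with the original (pre-unlearning) model kept frozen as the reference for the KL term; the retain-set terms are what allow the reported preservation of code-generation performance.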


