Infectious disease burden and surveillance challenges in Jordan and Palestine: a systematic review and meta-analysis

Background: Jordan and Palestine face public health challenges from infectious diseases, compounded by long-term conflict, forced relocation, and a lack of resources.

From pilot to policy: why AI health interventions fail to scale in developing countries


One Token Is Enough: Improving Diffusion Language Models with a Sink Token

arXiv:2601.19657v2 Announce Type: replace-cross Abstract: Diffusion Language Models (DLMs) have emerged as a compelling alternative to autoregressive approaches, enabling parallel text generation with competitive performance.

Trustworthy Intelligent Education: A Systematic Perspective on Progress, Challenges, and Future Directions

arXiv:2601.21837v1 Announce Type: cross Abstract: In recent years, trustworthiness has garnered increasing attention and exploration in the field of intelligent education, due to the inherent

RobustExplain: Evaluating Robustness of LLM-Based Explanation Agents for Recommendation

arXiv:2601.19120v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) are increasingly used to generate natural-language explanations in recommender systems, acting as explanation agents that reason

SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data

January 27, 2026

arXiv:2505.20347v2 Announce Type: replace-cross
Abstract: Recent advances have demonstrated the effectiveness of Reinforcement Learning (RL) in improving the reasoning capabilities of Large Language Models (LLMs). However, existing works inevitably rely on high-quality instructions and verifiable rewards for effective training, both of which are often difficult to obtain in specialized domains. In this paper, we propose Self-play Reinforcement Learning (SeRL) to bootstrap LLM training with limited initial data. Specifically, SeRL comprises two complementary modules: self-instruction and self-rewarding. The former module generates additional instructions based on the available data at each training step, employing robust online filtering strategies to ensure instruction quality, diversity, and difficulty. The latter module introduces a simple yet effective majority-voting mechanism to estimate response rewards for additional instructions, eliminating the need for external annotations. Finally, SeRL performs conventional RL based on the generated data, facilitating iterative self-play learning. Extensive experiments on various reasoning benchmarks and across different LLM backbones demonstrate that the proposed SeRL yields results superior to its counterparts and achieves performance on par with those obtained by high-quality data with verifiable rewards. Our code is available at https://github.com/wantbook-book/SeRL.
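The self-rewarding module described above can be illustrated with a minimal sketch. It assumes that a final answer has already been extracted from each of several sampled responses to one generated instruction, and that the reward is binary: 1.0 if a response agrees with the majority answer, 0.0 otherwise. The function names, the tie-breaking rule, and the difficulty filter with its `low`/`high` thresholds are hypothetical illustrations, not the authors' method; the actual implementation is in the linked repository.

```python
from collections import Counter
from typing import List, Optional


def majority_vote_rewards(answers: List[Optional[str]]) -> List[float]:
    """Hypothetical sketch: reward each response by agreement with the majority.

    `answers` holds the final answer extracted from each sampled response,
    or None if no answer could be extracted (extraction is assumed upstream).
    """
    valid = [a for a in answers if a is not None]
    if not valid:
        # No usable answers: every response gets zero reward.
        return [0.0] * len(answers)
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    majority, _ = Counter(valid).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]


def keep_for_training(rewards: List[float], low: float = 0.2, high: float = 0.8) -> bool:
    """Hypothetical difficulty filter (assumed thresholds, not from the paper).

    Drops instructions the model almost always agrees on (too easy) or
    almost never agrees on (too hard or ambiguous).
    """
    if not rewards:
        return False
    ratio = sum(rewards) / len(rewards)
    return low <= ratio <= high


if __name__ == "__main__":
    rewards = majority_vote_rewards(["42", "41", "42", None])
    print(rewards)                     # [1.0, 0.0, 1.0, 0.0]
    print(keep_for_training(rewards))  # True (agreement ratio 0.5)
```

Because the pseudo-label comes from the model's own samples, no external annotation is needed; the trade-off is that a confidently wrong majority is rewarded, which is one reason a filter on agreement ratio is a natural companion in a sketch like this.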


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK. Registration number 16808844.