arXiv:2604.04410v1 Announce Type: cross Abstract: Aligning language models with human preferences is essential for ensuring their safety and reliability. Although most existing approaches assume specific human preference models such as the Bradley-Terry model, this assumption may fail to accurately capture true human preferences, and consequently, these methods lack statistical consistency, i.e., the guarantee that language […]
Boosted Distributional Reinforcement Learning: Analysis and Healthcare Applications
arXiv:2604.04334v1 Announce Type: cross Abstract: Researchers and practitioners are increasingly considering reinforcement learning to optimize decisions in complex domains like robotics and healthcare. To date, these efforts have largely utilized expectation-based learning. However, relying on expectation-focused objectives may be insufficient for making consistent decisions in highly uncertain situations involving multiple heterogeneous groups. While distributional reinforcement […]
Partial health status observability and time horizon uncertainty in mean-field game epidemiological models
arXiv:2604.04305v2 Announce Type: cross Abstract: We introduce Mean-Field Game (MFG) epidemiological models, in which immunity either wanes with time in a fully observable way or disappears instantaneously with no direct observation (making a previously recovered individual fully susceptible again without realizing it). Both interpretations create computational challenges for rational noninfected individuals deciding on their contact […]
Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning
arXiv:2603.29328v3 Announce Type: replace-cross Abstract: Backdoor attacks on federated learning (FL) are most often evaluated with synthetic corner patches or out-of-distribution (OOD) patterns that are unlikely to arise in practice. In this paper, we revisit the backdoor threat to standard FL (a single global model) under a more realistic setting where triggers must be semantically […]
How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models
arXiv:2604.04385v2 Announce Type: cross Abstract: This paper identifies a recurring sparse routing mechanism in alignment-trained language models: a gate attention head reads detected content and triggers downstream amplifier heads that boost the signal toward refusal. Using political censorship and safety refusal as natural experiments, the mechanism is traced across 9 models from 6 labs, all […]
“When to Hand Off, When to Work Together”: Expanding Human-Agent Co-Creative Collaboration through Concurrent Interaction
arXiv:2603.02050v4 Announce Type: replace-cross Abstract: As agents move into shared workspaces and their execution becomes visible, human-agent collaboration faces a fundamental shift from sequential delegation to concurrent co-creation. This raises a new coordination problem: what interaction patterns emerge, and what agent capabilities are required to support them? Study 1 (N=10) revealed that process visibility naturally […]
Lightweight liquid neural networks decipher salivary metabolic fingerprinting for high-risk periodontitis screening in diabetes
npj Digital Medicine, Published online: 07 April 2026; doi:10.1038/s41746-026-02593-7
The portability paradox of foundation models for clinical decision support
npj Digital Medicine, Published online: 07 April 2026; doi:10.1038/s41746-026-02615-4 Yakdan et al. demonstrate that foundation models (FMs) trained to predict cervical spondylotic myelopathy from electronic health record data outperform traditional models on internal datasets but lose their advantage during external validation. This suggests that the feature-dense patterns learned by FMs may reduce their portability across […]
Evaluation of artificial intelligence-generated vignettes depicting patient chatbot use in psychiatric contexts
npj Digital Medicine, Published online: 07 April 2026; doi:10.1038/s41746-026-02605-6
Interpretable machine learning models for stroke risk prediction in patients with newly diagnosed atrial fibrillation
npj Digital Medicine, Published online: 07 April 2026; doi:10.1038/s41746-026-02470-3