Week one of the Musk v. Altman trial: What it was like in the room

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here. Two of

Tailoring AI solutions for health care needs

Tailoring AI solutions for health care needs

The AI market is full of big promises of grand transformation. Health care is a prime target for those promises, beset as it is by

Rethinking Network Topologies for Cost-Effective Mixture-of-Experts LLM Serving

arXiv:2605.00254v1 Announce Type: cross Abstract: Mixture-of-experts (MoE) architectures have turned LLM serving into a cluster-scale workload in which communication consumes a considerable portion of LLM

AlphaInventory: Evolving White-Box Inventory Policies via Large Language Models with Deployment Guarantees

arXiv:2605.00369v1 Announce Type: cross Abstract: We study how large language models can be used to evolve inventory policies in online, non-stationary environments. Our work is

BWLA: Breaking the Barrier of W1AX Post-Training Quantization for LLMs

arXiv:2605.00422v1 Announce Type: cross Abstract: Large language models (LLMs) have driven major progress in NLP, yet their substantial memory and compute demands still hinder practical

DynamicPO: Dynamic Preference Optimization for Recommendation

May 4, 2026

arXiv:2605.00327v1 Announce Type: cross
Abstract: In large language model (LLM)-based recommendation systems, direct preference optimization (DPO) effectively aligns recommendations with user preferences, requiring multi-negative objective functions to leverage abundant implicit-feedback negatives and sharpen preference boundaries. However, our empirical analyses reveal a counterintuitive phenomenon, preference optimization collapse, where increasing the number of negative samples can lead to performance degradation despite a continuously decreasing training loss. We further theoretically demonstrate that this collapse arises from gradient suppression, caused by the dominance of easily discriminable negatives over boundary-critical negatives that truly define user preference boundaries. As a result, boundary-relevant signals are under-optimized, weakening the model’s decision boundary. Motivated by these observations, we propose DynamicPO (Dynamic Preference Optimization), a lightweight and plug-and-play framework comprising two adaptive mechanisms: Dynamic Boundary Negative Selection, which identifies and prioritizes informative negatives near the model’s decision boundary, and Dual-Margin Dynamic beta Adjustment, which calibrates optimization strength per sample according to boundary ambiguity. Extensive experiments on three public datasets show that DynamicPO effectively prevents optimization collapse and improves recommendation accuracy on multi-negative preference optimization methods, with negligible computational overhead. Our code and datasets are available at https://github.com/xingyuHuxingyu/DynamicPO.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844