arXiv:2605.26554v1 Announce Type: cross
Abstract: Contextual dueling bandits form a cornerstone of preference-based decision-making, with critical applications in
recommender systems and large language model alignment. However, standard algorithms rely on the idealized assumption
of immediate feedback, a condition frequently violated in real-world scenarios such as prompt optimization. This
setting introduces a unique theoretical challenge: unlike linear bandits, dueling bandit estimators lack closed-form
solutions, rendering naive adaptations of standard weighting techniques biased. To address this, we formalize the
problem of Contextual Dueling Bandits with Stochastic Delayed Feedback and propose two novel algorithms: Linear
(LDB-DF) and Neural (NDB-DF) Dueling Bandits with Delayed Feedback. Central to our approach is a novel estimator that
integrates an Inverse Probability Weighting (IPW) mechanism directly into the loss function, ensuring unbiased
correction for delayed or missing feedback. We provide comprehensive theoretical analysis, establishing an
O(d*sqrt(T)) regret bound for the linear setting and sub-linear guarantees for the neural setting. Extensive
experiments on both simulated and real-world datasets demonstrate the effectiveness of the proposed algorithms.
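
The abstract does not state the estimator's exact form, so the following is only a minimal sketch of the general idea of placing an IPW weight inside a logistic (Bradley-Terry style) preference loss for the linear case. Everything here is an assumption made for illustration, not the authors' LDB-DF implementation: the names `ipw_logistic_loss`, `ipw_logistic_grad`, and `fit_ipw`, the feature-difference matrix `z`, the indicator `observed`, and the observation probabilities `p_obs`, which are taken as known.

```python
import numpy as np

def ipw_logistic_loss(theta, z, y, observed, p_obs, lam=1.0):
    """IPW-weighted logistic loss (illustrative sketch, not the paper's estimator).

    z        : (n, d) feature differences phi(x, a1) - phi(x, a2)
    y        : (n,) preferences in {0, 1}; use 0 as a placeholder where unobserved
    observed : (n,) 1.0 if feedback has arrived, else 0.0
    p_obs    : (n,) assumed-known probability that feedback is observed
    """
    logits = z @ theta
    # Negative log-likelihood per duel: log(1 + exp(s)) - y * s, computed stably.
    nll = np.logaddexp(0.0, logits) - y * logits
    # E[observed / p_obs] = 1, so the weighted sum is unbiased for the full-feedback loss.
    w = observed / p_obs
    return float(np.sum(w * nll) + 0.5 * lam * (theta @ theta))

def ipw_logistic_grad(theta, z, y, observed, p_obs, lam=1.0):
    """Gradient of ipw_logistic_loss with respect to theta."""
    w = observed / p_obs
    resid = 1.0 / (1.0 + np.exp(-(z @ theta))) - y
    return z.T @ (w * resid) + lam * theta

def fit_ipw(z, y, observed, p_obs, lam=1.0, lr=0.1, steps=500):
    """Plain gradient descent; there is no closed form, so we iterate."""
    theta = np.zeros(z.shape[1])
    n = len(y)
    for _ in range(steps):
        theta -= lr * ipw_logistic_grad(theta, z, y, observed, p_obs, lam) / n
    return theta
```

With `observed` all ones and `p_obs` all ones, the loss reduces to ordinary regularized logistic regression on preference pairs, which is a convenient sanity check for the weighting.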

Semantic Robustness Probing via Inpainting: An Interactive Tool for Safety-Critical Object Detection
arXiv:2605.27155v1 Announce Type: cross
Abstract: Testing object detectors in safety-critical domains requires semantically meaningful probes beyond pixel-level
corruptions. We present SemProbe, a tool for semantic



