RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents

Directing the Narrative: A Finetuning Method for Controlling Coherence and Style in Story Generation

arXiv:2603.17295v1 Announce Type: cross Abstract: Story visualization requires generating sequential imagery that aligns semantically with evolving narratives while maintaining rigorous consistency in character identity and

Towards Unsupervised Adversarial Document Detection in Retrieval Augmented Generation Systems

arXiv:2603.17176v1 Announce Type: cross Abstract: Retrieval augmented generation systems have become an integral part of everyday life. Whether in internet search engines, email systems, or

SA-CycleGAN-2.5D: Self-Attention CycleGAN with Tri-Planar Context for Multi-Site MRI Harmonization

arXiv:2603.17219v1 Announce Type: cross Abstract: Multi-site neuroimaging analysis is fundamentally confounded by scanner-induced covariate shifts, where the marginal distribution of voxel intensities $P(mathbfx)$ varies non-linearly

Dependence Fidelity and Downstream Inference Stability in Generative Models

arXiv:2603.17041v1 Announce Type: cross Abstract: Recent advances in generative AI have led to increasingly realistic synthetic data, yet evaluation criteria remain focused on marginal distribution

Hidden Clones: Exposing and Fixing Family Bias in Vision-Language Model Ensembles

arXiv:2603.17111v1 Announce Type: cross Abstract: Ensembling Vision-Language Models (VLMs) from different providers maximizes benchmark accuracy, yet models from the same architectural family share correlated errors