Sample-Efficient Hypergradient Estimation for Decentralized Bi-Level Reinforcement Learning

March 26, 2026

arXiv:2603.14867v2 Announce Type: replace-cross
Abstract: Many strategic decision-making problems, such as environment design for warehouse robots, can be naturally formulated as bi-level reinforcement learning (RL), where a leader agent optimizes its objective while a follower solves a Markov decision process (MDP) conditioned on the leader’s decisions. In many situations, a fundamental challenge arises when the leader cannot intervene in the follower’s optimization process and can only observe its outcome. We address this decentralized setting by deriving the hypergradient of the leader’s objective, i.e., the gradient of the leader’s objective with respect to its strategy, accounting for changes in the follower’s optimal policy. Prior hypergradient-based methods either require extensive data for repeated state visits or rely on gradient estimators whose complexity can grow substantially with the dimensionality of the leader’s decision space. Instead, we leverage the Boltzmann covariance trick to derive an alternative hypergradient formulation. This enables efficient hypergradient estimation solely from interaction samples, even when the leader’s decision space is high-dimensional. Additionally, to our knowledge, this is the first method that enables hypergradient-based optimization for two-player Markov games in decentralized settings. Experiments highlight the impact of hypergradient updates and demonstrate our method’s effectiveness in both discrete and continuous state tasks.
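To illustrate why a covariance form lets a hypergradient be estimated purely from samples, here is a minimal sketch of the standard covariance identity for a Boltzmann (softmax) distribution, which we assume is the general idea behind the "Boltzmann covariance trick" named above. It is not the paper's estimator: the paper works with full MDPs and the follower's optimal policy, while this toy uses a hypothetical single-state follower whose policy is pi_x(a) proportional to exp(beta * x . phi(a)), where x is the leader's d-dimensional decision, phi(a) an assumed action feature vector, and r_L(a) an assumed leader reward. For this setting the gradient of the leader's objective J(x) = E[r_L(a)] satisfies grad J = beta * Cov(r_L(a), phi(a)), so it can be estimated from sampled actions alone. All function and variable names are illustrative.

```python
import math
import random

# Hypothetical toy setting (not from the paper): a single-state follower picks
# one of K actions with a Boltzmann policy
#   pi_x(a) = exp(beta * x . phi(a)) / Z(x),
# where x is the leader's decision vector and phi(a) is an action feature vector.


def boltzmann_policy(x, phi, beta):
    """Return the follower's softmax (Boltzmann) action probabilities."""
    logits = [beta * sum(xi * fi for xi, fi in zip(x, f)) for f in phi]
    m = max(logits)  # subtract the max logit for numerical stability
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    return [wi / z for wi in w]


def leader_objective(x, phi, r_leader, beta):
    """J(x) = E_{a ~ pi_x}[r_L(a)], computed exactly by enumerating actions."""
    p = boltzmann_policy(x, phi, beta)
    return sum(pa * ra for pa, ra in zip(p, r_leader))


def exact_covariance_gradient(x, phi, r_leader, beta):
    """grad_j J = beta * Cov_{pi_x}(r_L(a), phi_j(a)), computed by enumeration."""
    p = boltzmann_policy(x, phi, beta)
    r_mean = sum(pa * ra for pa, ra in zip(p, r_leader))
    grad = []
    for j in range(len(x)):
        f_mean = sum(pa * f[j] for pa, f in zip(p, phi))
        cov = sum(pa * (ra - r_mean) * (f[j] - f_mean)
                  for pa, ra, f in zip(p, r_leader, phi))
        grad.append(beta * cov)
    return grad


def covariance_hypergradient(x, phi, r_leader, beta, n_samples, rng):
    """Sample-based estimate of grad J: only sampled actions and their
    rewards/features are used, via the unbiased sample covariance."""
    p = boltzmann_policy(x, phi, beta)
    actions = rng.choices(range(len(phi)), weights=p, k=n_samples)
    rs = [r_leader[a] for a in actions]
    r_mean = sum(rs) / n_samples
    grad = []
    for j in range(len(x)):
        fs = [phi[a][j] for a in actions]
        f_mean = sum(fs) / n_samples
        cov = sum((r - r_mean) * (f - f_mean)
                  for r, f in zip(rs, fs)) / (n_samples - 1)
        grad.append(beta * cov)
    return grad


def finite_difference_gradient(x, phi, r_leader, beta, eps=1e-5):
    """Central finite differences of the exact J, used only as a sanity check."""
    grad = []
    for j in range(len(x)):
        xp = list(x)
        xp[j] += eps
        xm = list(x)
        xm[j] -= eps
        grad.append((leader_objective(xp, phi, r_leader, beta)
                     - leader_objective(xm, phi, r_leader, beta)) / (2 * eps))
    return grad


if __name__ == "__main__":
    # Illustrative inputs: 4 actions, 3-dimensional leader decision.
    phi = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5], [0.3, 0.3, 0.0], [-1.0, 0.2, 1.0]]
    r_leader = [1.0, -0.5, 0.2, 2.0]
    x = [0.2, -0.1, 0.4]
    beta = 1.5
    print("sampled:", covariance_hypergradient(x, phi, r_leader, beta,
                                               200_000, random.Random(0)))
    print("exact:  ", exact_covariance_gradient(x, phi, r_leader, beta))
    print("fin.diff:", finite_difference_gradient(x, phi, r_leader, beta))
```

The sampled estimate needs only actions drawn from the follower's policy together with their features and leader rewards, and its cost per sample grows linearly with the dimension of x. This is the kind of property the abstract emphasizes for high-dimensional leader decision spaces, although the paper's MDP setting is considerably more involved than this single-state toy.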


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844