arXiv:2605.11538v1 Announce Type: cross
Abstract: Group Relative Policy Optimization (GRPO) has emerged as a promising approach for improving the reasoning capabilities of large language models. However, it struggles to balance the trade-off between exploration and exploitation during training, often resulting in suboptimal performance. Motivated by the theoretical insight that changes in entropy are governed by the covariance between token probabilities and their corresponding advantages, we propose a hyperparameter-free, covariance-weighted optimization method that dynamically down-weights extreme token-level updates via a Gaussian kernel. This approach automatically reduces the instability caused by the exploration-exploitation trade-off while preserving informative learning signals. Extensive empirical evaluations show that our approach improves downstream performance across reasoning benchmarks compared with GRPO, and effectively stabilizes entropy as training progresses.
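The mechanism described above can be sketched as follows. This is a minimal, hypothetical illustration (the function name, the covariance proxy, and the choice of bandwidth are assumptions, not the paper's actual formulation): per-token products of centered log-probabilities and centered advantages serve as a covariance signal, and a Gaussian kernel maps large-magnitude values to small weights, so extreme token-level updates are damped without any tunable hyperparameter.

```python
import numpy as np

def covariance_gaussian_weights(log_probs, advantages):
    """Hypothetical sketch of covariance-based Gaussian down-weighting.

    log_probs:  per-token log-probabilities, shape (T,)
    advantages: per-token advantage estimates, shape (T,)
    Returns per-token weights in (0, 1]; tokens whose covariance
    contribution is extreme receive the smallest weights.
    """
    # Per-token covariance proxy: product of centered quantities.
    cov = (log_probs - log_probs.mean()) * (advantages - advantages.mean())
    # Data-driven bandwidth (keeps the method hyperparameter-free).
    sigma = cov.std() + 1e-8
    # Gaussian kernel: weight 1 at cov == 0, decaying for extreme values.
    return np.exp(-0.5 * (cov / sigma) ** 2)

# Usage: weight the per-token policy-gradient terms before averaging.
log_probs = np.array([-0.1, -2.5, -0.3, -5.0])
advantages = np.array([0.2, 1.8, -0.1, 2.5])
weights = covariance_gaussian_weights(log_probs, advantages)
```

In a GRPO-style objective, these weights would multiply each token's surrogate-loss term, shrinking the contribution of outlier (probability, advantage) pairs that would otherwise drive entropy collapse or explosion.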
Diabetic Retinopathy Classification using Downscaling Algorithms and Deep Learning
arXiv:2605.11430v1 Announce Type: cross Abstract: Diabetic Retinopathy (DR) is a complication of diabetes that damages the retina; its diagnosis relies on recording and classifying the retinal images of diabetic patients. DR