Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective

arXiv:2605.12969v3 Announce Type: replace-cross Abstract: Group Relative Policy Optimization (GRPO) is one of the most widely adopted RLVR algorithms for post-training large language models on reasoning tasks. We first show that GRPO admits an equivalent discriminative reformulation, in which policy optimization maximizes the expected score gap between verified positive and negative rollouts. This reformulation reveals […]

Position: Beyond Sensitive Attributes, ML Fairness Should Quantify Structural Injustice via Social Determinants

arXiv:2508.08337v3 Announce Type: replace-cross Abstract: Algorithmic fairness research has largely framed unfairness as discrimination along sensitive attributes. However, this approach limits visibility into unfairness as structural injustice instantiated through social determinants, which are contextual variables that shape attributes and outcomes without pertaining to specific individuals. This position paper argues that the field should quantify structural […]

ASKD-Whisper: Adaptive Self-knowledge Distillation for Efficient and Low-Latency Automatic Speech Recognition

arXiv:2601.19919v2 Announce Type: replace-cross Abstract: Knowledge distillation (KD) is one of the most effective paradigms for compressing large-scale foundation models into deployable architectures. In the context of Automatic Speech Recognition (ASR), previous studies have predominantly focused on forcing the student model to strictly mimic the predictive distribution of a massive teacher model. However, this static […]

REBot: From RAG to CatRAG with Semantic Enrichment and Graph Routing

arXiv:2510.01800v3 Announce Type: replace Abstract: Academic regulation advising is essential for helping students interpret and comply with institutional policies, yet building effective systems requires domain-specific regulatory resources. To address this challenge, we propose REBot, an LLM-enhanced advisory chatbot powered by CatRAG, a hybrid retrieval-reasoning framework that integrates retrieval-augmented generation with graph […]

LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition

arXiv:2605.24005v2 Announce Type: replace Abstract: The evolution of Large Language Model (LLM) reasoning is bottlenecked by the scarcity of high-quality process data. While self-alignment via endogenous rewards offers a solution, mining valid supervision faces three challenges: (1) Label Noise via Mimetic Bias, where rewards prioritize statistical likelihood over logical truth, creating a “correctness illusion” that […]

Identifying High-Confidence Social Biases in LLMs for Trustworthy Conversational Tutoring Agents

arXiv:2606.01584v1 Announce Type: cross Abstract: Conversational tutoring agents have been shown to improve learning engagement and student outcomes, and large language models (LLMs) are increasingly used in these systems to provide scalable, personalized feedback. However, LLMs may perpetuate or amplify stereotypical social biases, posing particular risks in educational settings. In this study, we evaluate LLMs […]

Self-Trained Verification for Training- and Test-Time Self-Improvement

arXiv:2605.30290v2 Announce Type: replace-cross Abstract: Self-improvement at scale has been a longstanding goal for reasoning models, and there are two natural places to do it: at test time, through verification-refinement (V-R) loops; and at training time, through self-training methods. Both are gated by the same bottleneck: the verifier. V-R loops stall when verifier scores inflate […]

Prior Availability in Industrial Visual Sim-to-Real: A Review of CAD-Guided and CAD-Unavailable Regimes

arXiv:2605.30581v2 Announce Type: replace-cross Abstract: Industrial visual sim-to-real is often described as transferring from synthetic images to real images, but industrial deployment usually involves a broader mismatch between available evidence and required decisions. A system may be built from CAD renderings, simulated RGB-D observations, normal reference images, synthetic defects, pretrained feature spaces, or language prompts, […]

Estimating Mutual Information between Time Series and Temporal Event Sequences Across Diverse Analysis Tasks

arXiv:2606.01602v1 Announce Type: cross Abstract: Pairwise dependence measures such as correlation and causality are fundamental to temporal data mining, yet there is still no principled and robust way to quantify dependence between heterogeneous data types, especially between continuous time series and discrete temporal event sequences. Existing approaches rely on ad hoc transformations or mutual-information estimators […]

FedMTFI: Feature Importance Based Optimized Multi Teacher Knowledge Distillation in Heterogeneous Federated Learning Environment

arXiv:2606.01607v1 Announce Type: cross Abstract: Federated learning (FL) is a decentralized approach that enables collaborative model training without exposing raw data. Instead of transferring sensitive data, it allows devices to share only model weights, keeping personal data local and secure. However, in real-world settings, the data held by devices is often not evenly distributed […]

CalArena: A Large-Scale Post-Hoc Calibration Benchmark

arXiv:2605.30188v2 Announce Type: replace-cross Abstract: Reliable probability estimates are critical in many machine learning applications, yet modern classifiers are often poorly calibrated. Post-hoc calibration provides a simple and widely used solution, but the large number of proposed methods, combined with small-scale and inconsistent evaluations, makes it difficult to determine which approaches are truly effective in […]

Agent Operating Systems (AOS): Integrating Agentic Control Planes into, and Beyond, Traditional Operating Systems

arXiv:2606.01508v1 Announce Type: cross Abstract: Traditional operating systems were designed around deterministic programs, explicit control flow, and human-initiated workflows. Their core abstractions (processes, threads, system calls, files, and permissions) assume bounded behavior and predictable interaction patterns. Agentic AI systems introduce a different execution model: long-lived, goal-directed entities that reason probabilistically, invoke tools dynamically, and […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK. Registration number: 16808844.