A Metamorphic Testing Approach to Diagnosing Memorization in LLM-Based Program Repair

arXiv:2604.21579v1 Announce Type: cross Abstract: LLM-based automated program repair (APR) techniques have shown promising results in reducing debugging costs. However, prior results can be affected by data leakage: large language models (LLMs) may memorize bug fixes when evaluation benchmarks overlap with their pretraining data, leading to inflated performance estimates. In this paper, we investigate whether […]

Fairness under uncertainty in sequential decisions

arXiv:2604.21711v1 Announce Type: cross Abstract: Fair machine learning (ML) methods help identify and mitigate the risk that algorithms encode or automate social injustices. Algorithmic approaches alone cannot resolve structural inequalities, but they can support socio-technical decision systems by surfacing discriminatory biases, clarifying trade-offs, and enabling governance. Although fairness is well studied in supervised learning, many […]

AEL: Agent Evolving Learning for Open-Ended Environments

arXiv:2604.21725v1 Announce Type: cross Abstract: LLM agents increasingly operate in open-ended environments spanning hundreds of sequential episodes, yet they remain largely stateless: each task is solved from scratch without converting past experience into better future behavior. The central obstacle is not what to remember but how to use what has been remembered, including which retrieval […]

Probably Approximately Consensus: On the Learning Theory of Finding Common Ground

arXiv:2604.21811v1 Announce Type: cross Abstract: A primary goal of online deliberation platforms is to identify ideas that are broadly agreeable to a community of users through their expressed preferences. Yet, consensus elicitation should ideally extend beyond the specific statements provided by users and should incorporate the relative salience of particular topics. We address this issue […]

GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR

arXiv:2601.09361v3 Announce Type: replace-cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is a key paradigm for improving large-scale reasoning models. Unlike supervised fine-tuning (SFT), RLVR exhibits distinct optimization dynamics and is sensitive to the preservation of pre-trained geometric structures. However, existing parameter-efficient methods face key limitations in this regime. Low-rank adaptation methods, such as PiSSA, […]

A Multi-Stage Warm-Start Deep Learning Framework for Unit Commitment

arXiv:2604.21891v1 Announce Type: cross Abstract: Maintaining an instantaneous balance between electricity supply and demand is critical for grid reliability and stability. System operators achieve this by solving the Unit Commitment (UC) problem, a high-dimensional, large-scale Mixed-Integer Linear Programming (MILP) problem that is strictly governed by the grid's physical constraints. As grids integrate variable […]

Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation

arXiv:2604.17656v2 Announce Type: replace-cross Abstract: Video-to-music (V2M) is the fundamental task of creating background music for an input video. Recent V2M models achieve audiovisual alignment by typically relying on visual conditioning alone and provide limited semantic and stylistic controllability to the end user. In this paper, we present Video-Robin, a novel text-conditioned video-to-music generation model […]

Survey on Evaluation of LLM-based Agents

arXiv:2503.16416v2 Announce Type: replace Abstract: LLM-based agents represent a paradigm shift in AI, enabling autonomous systems to plan, reason, and use tools while interacting with dynamic environments. This paper provides the first comprehensive survey of evaluation methods for these increasingly capable agents. We analyze the field of agent evaluation across five perspectives: (1) Core LLM […]

Deep FinResearch Bench: Evaluating AI’s Ability to Conduct Professional Financial Investment Research

arXiv:2604.21006v1 Announce Type: new Abstract: We introduce Deep FinResearch Bench, a practical and comprehensive evaluation framework for deep research (DR) agents in financial investment research. The benchmark assesses three dimensions of report quality: qualitative rigor, quantitative forecasting and valuation accuracy, and claim credibility and verifiability. Particularly, we define corresponding qualitative and quantitative evaluation metrics and […]

OpenEstimate: Evaluating LLMs on Reasoning Under Uncertainty with Real-World Data

arXiv:2510.15096v2 Announce Type: replace Abstract: Real-world settings where language models (LMs) are deployed — in domains spanning healthcare, finance, and other forms of knowledge work — require models to grapple with incomplete information and reason under uncertainty. Yet most LM evaluations focus on problems with well-defined answers and success criteria. This gap exists in part […]

Adaptive Test-Time Compute Allocation with Evolving In-Context Demonstrations

arXiv:2604.21018v1 Announce Type: new Abstract: While scaling test-time compute can substantially improve model performance, existing approaches either rely on static compute allocation or sample from fixed generation distributions. In this work, we introduce a test-time compute allocation framework that jointly adapts where computation is spent and how generation is performed. Our method begins with a […]

Deconstructing Superintelligence: Identity, Self-Modification and Différance

arXiv:2604.19845v2 Announce Type: replace Abstract: Self-modification is often taken as constitutive of artificial superintelligence (SI), yet modification is a relative action requiring a supplement outside the operation. When self-modification extends to this supplement, the classical self-referential structure collapses. We formalise this on an associative operator algebra $\mathcal{A}$ with update $\hat{U}$, discrimination $\hat{D}$, and self-representation $\hat{R}$, […]

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844