ValueAlpha: Agreement-Gated Stress Testing of LLM-Judged Investment Rationales Before Returns Are Observable

April 29, 2026

arXiv:2604.25224v1 Announce Type: new
Abstract: Long-horizon investment decisions create a pre-realization evaluation problem: realized returns are the eventual arbiter of investment quality, but they arrive too late and are too noisy to guide many model-development and governance decisions. LLM judges offer a tempting substitute for pre-deployment evaluation of AI-finance systems, but unvalidated judges may reward verbosity, confidence, or rubric mimicry rather than financial judgment. This paper introduces ValueAlpha, a preregistered agreement-gated stress-test protocol for deciding when LLM-judged investment-rationale claims are publishable, qualified, or invalid.
In a controlled market-state capital-allocation prototype with 1,000 honest decision cycles and 100 preregistered adversarial controls (1,100 trajectories, 5,500 judge calls), ValueAlpha clears the aggregate agreement gate at \(\bar{\kappa}_w = 0.7168\) but prevents several overclaims. Lower-rank systems collapse into a tie-class, one rubric dimension fails the per-dimension gate (constraint_awareness, \(\bar{\kappa}_w = 0.2022\)), single-judge rankings are family-dependent, and terse-correct rationales receive a \(\Delta = -2.81\) rubric-point penalty relative to honest rationales. A targeted anchor-specificity probe further shows that financial constructs such as constraint awareness are operationally load-bearing.
The contribution is therefore not a leaderboard and not a claim to measure true investment skill. ValueAlpha is a pre-calibration metrology layer for AI-finance evaluation: it determines whether a proposed LLM-judge-based investment-rationale claim is stable enough, agreed enough, and uncontaminated enough to be reported at all.
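
As a rough illustration of what an agreement gate of this kind could look like, the sketch below computes a mean pairwise weighted Cohen's kappa across judges and maps the result to one of the three outcomes named in the abstract. It is not the paper's implementation. The quadratic weighting, the 0.6 threshold, the function names (`weighted_kappa`, `mean_pairwise_kappa`, `gate_claim`), and the rule that maps gate results to "publishable", "qualified", or "invalid" are all assumptions made for illustration; the abstract reports only the \(\bar{\kappa}_w\) values themselves.

```python
from itertools import combinations


def weighted_kappa(a, b, categories):
    """Quadratic-weighted Cohen's kappa for two raters scoring the same items.

    `categories` is the ordered list of possible scores (an assumption:
    the abstract does not state the rubric scale or the weighting scheme).
    """
    k = len(categories)
    if k < 2 or len(a) != len(b) or not a:
        raise ValueError("need >= 2 categories and equal-length, non-empty ratings")
    idx = {c: i for i, c in enumerate(categories)}
    n = len(a)

    # Observed joint distribution of (rater A score, rater B score).
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[idx[x]][idx[y]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # Quadratic disagreement weight, 0 on the diagonal, 1 at the extremes.
        return ((i - j) / (k - 1)) ** 2

    d_obs = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if d_exp == 0:
        raise ValueError("kappa is undefined when both raters use one identical score")
    return 1.0 - d_obs / d_exp


def mean_pairwise_kappa(scores_by_judge, categories):
    """Average weighted kappa over all unordered pairs of judges."""
    pairs = list(combinations(sorted(scores_by_judge), 2))
    if not pairs:
        raise ValueError("need at least two judges")
    total = sum(
        weighted_kappa(scores_by_judge[p], scores_by_judge[q], categories)
        for p, q in pairs
    )
    return total / len(pairs)


def gate_claim(aggregate_kappa, per_dimension_kappa, threshold):
    """Hypothetical decision rule, not taken from the paper.

    Fail the aggregate gate  -> "invalid"
    Pass it, but some rubric dimension fails -> "qualified"
    Pass everything          -> "publishable"
    """
    if aggregate_kappa < threshold:
        return "invalid"
    failing = [d for d, kap in per_dimension_kappa.items() if kap < threshold]
    return "qualified" if failing else "publishable"


if __name__ == "__main__":
    # The two kappa values are the ones reported in the abstract;
    # the 0.6 threshold is an illustrative assumption.
    print(gate_claim(0.7168, {"constraint_awareness": 0.2022}, threshold=0.6))
```

Under these assumed settings, the reported aggregate value clears the gate while the constraint_awareness dimension does not, so the sketch returns the intermediate outcome. A different threshold or decision rule would change that label, which is why the thresholds in a protocol like this need to be fixed before the judge calls are made.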
