OpenAI is throwing everything into building a fully automated researcher

OpenAI is refocusing its research efforts and throwing its resources into a new grand challenge. The San Francisco firm has set its sights on building

Mind-altering substances are (still) falling short in clinical trials

This week I want to look at where we are with psychedelics, the mind-altering substances that have somehow made the leap from counterculture to major

Intellectual Stewardship: Re-adapting Human Minds for Creative Knowledge Work in the Age of AI

arXiv:2603.18117v1 Announce Type: cross Abstract: Background: Amid the opportunities and risks introduced by generative AI, learning research needs to envision how human minds and responsibilities

MCP-38: A Comprehensive Threat Taxonomy for Model Context Protocol Systems (v1.0)

arXiv:2603.18063v1 Announce Type: cross Abstract: The Model Context Protocol (MCP) introduces a structurally distinct attack surface that existing threat frameworks, designed for traditional software systems

Agentic LLM Framework for Adaptive Decision Discourse

arXiv:2502.10978v2 Announce Type: replace Abstract: Effective decision-making in complex systems requires synthesizing diverse perspectives to address multifaceted challenges under uncertainty. This study introduces an agentic

Optimize Wider, Not Deeper: Consensus Aggregation for Policy Optimization

March 16, 2026

arXiv:2603.12596v1 Announce Type: cross
Abstract: Proximal policy optimization (PPO) approximates the trust region update using multiple epochs of clipped SGD. Each epoch may drift further from the natural gradient direction, creating path-dependent noise. To understand this drift, we can use Fisher information geometry to decompose policy updates into signal (the natural gradient projection) and waste (the Fisher-orthogonal residual that consumes trust region budget without first-order surrogate improvement). Empirically, signal saturates but waste grows with additional epochs, creating an optimization-depth dilemma. We propose Consensus Aggregation for Policy Optimization (CAPO), which redirects compute from depth to width: $K$ PPO replicates are optimized on the same batch, differing only in minibatch shuffling order, and then aggregated into a consensus. We study aggregation in two spaces: Euclidean parameter space, and the natural parameter space of the policy distribution via the logarithmic opinion pool. In natural parameter space, the consensus provably achieves higher KL-penalized surrogate and tighter trust region compliance than the mean expert; parameter averaging inherits these guarantees approximately. On continuous control tasks, CAPO outperforms PPO and compute-matched deeper baselines under fixed sample budgets by up to 8.6x. CAPO demonstrates that policy optimization can be improved by optimizing wider, rather than deeper, without additional environment interactions.
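The abstract describes the CAPO recipe only at a high level. It trains K replicates on the same batch, each with its own minibatch shuffling order, and then aggregates them either by averaging parameters or through a logarithmic opinion pool in the policy's natural parameter space. The sketch below illustrates that width-over-depth idea under several loudly stated assumptions and is not the authors' implementation. First, a least-squares SGD loop stands in for the clipped PPO surrogate. Second, the policy is assumed to be a diagonal Gaussian. Third, the names `train_replicate`, `euclidean_consensus` and `log_opinion_pool` are hypothetical. For a Gaussian, the log pool p ∝ ∏ p_k^{w_k} reduces to a weighted average of the natural parameters (μ/σ², −1/(2σ²)).

```python
import numpy as np


def train_replicate(theta0, X, y, seed, epochs=4, minibatch=8, lr=0.05):
    """Hypothetical stand-in for one PPO replicate (assumption: least-squares
    SGD replaces the clipped PPO surrogate). Every replicate sees the same
    batch (X, y); only the seed controlling minibatch shuffling differs."""
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)  # replicate-specific shuffling order
        for start in range(0, n, minibatch):
            idx = order[start:start + minibatch]
            grad = X[idx].T @ (X[idx] @ theta - y[idx]) / len(idx)
            theta -= lr * grad
    return theta


def euclidean_consensus(thetas):
    """Aggregate in Euclidean parameter space: average the K parameter vectors."""
    return np.mean(np.asarray(thetas), axis=0)


def log_opinion_pool(means, stds, weights=None):
    """Aggregate K diagonal-Gaussian action distributions as p ∝ prod_k p_k**w_k.

    Equivalent to a weighted average of the Gaussian natural parameters
    eta1 = mu / var and eta2 = -1 / (2 var), converted back to (mean, std)."""
    means, stds = np.asarray(means, float), np.asarray(stds, float)
    k = means.shape[0]
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, float) / np.sum(weights)
    prec = 1.0 / stds**2
    eta1 = np.tensordot(w, means * prec, axes=1)  # weighted sum over replicates
    eta2 = np.tensordot(w, -0.5 * prec, axes=1)
    var = -0.5 / eta2
    return eta1 * var, np.sqrt(var)


# Toy usage: K replicates on one shared batch, differing only in shuffle seed.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=64)
K = 4
thetas = [train_replicate(np.zeros(3), X, y, seed=k) for k in range(K)]
theta_avg = euclidean_consensus(thetas)

# Assumed linear-Gaussian policy: mean(s) = s @ theta, shared fixed std 0.5.
s = X[0]
mu_pool, sd_pool = log_opinion_pool([[s @ th] for th in thetas], [[0.5]] * K)
```

In this linear toy with a shared standard deviation, the log-pool mean equals the action mean of the parameter-averaged policy. With nonlinear networks or differing variances the two aggregations diverge, which is consistent with the abstract's statement that parameter averaging inherits the natural-space guarantees only approximately.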


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.