How Much Heavy Lifting Can an Agent Harness Do?: Measuring the LLM’s Residual Role in a Planning Agent

arXiv:2604.07236v3
Abstract: Agent harnesses, the stateful programs that wrap a language model and decide what it sees at each step, are now known to change end-to-end performance on a fixed model by as much as six times. That observation raises a question asked less often than it should be: once the harness is serious, how much of an agent’s competence does the harness itself already carry, and how much genuinely still needs the LLM? We study this in noisy Collaborative Battleship, a partially observable planning setting with belief update, information-gathering questions, and uncertainty-aware action selection. We externalize a planning harness into four progressively richer layers (posterior belief tracking, declarative planning, symbolic reflection, and an LLM-backed revision gate) and report per-layer contribution under a common runtime. We report win rate as the primary, game-level metric and F1 as a secondary, local-targeting indicator, and pre-specify "heavy lifting" as the single largest positive marginal to the primary metric. Across 54 games, the declarative planning layer does most of the heavy lifting under this criterion, raising win rate from 50.0% (Wilson 95% CI $[37.1, 62.9]$) to 74.1% ($[61.1, 83.9]$) over a belief-only harness (+24.1pp, +0.017 F1). Symbolic reflection is mechanistically real but calibration-sensitive, shifting board-level outcomes by up to $\pm 0.140$ F1 without being net-positive on aggregate. LLM-backed revision activates on only 4.3% of turns at the strictest confidence threshold and yields a small, non-monotonic change (+0.005 F1, -3.7pp win rate). The contribution is methodological: once harness layers are made externally measurable, one can ask not only how far the harness already carries the agent, but also where the LLM’s role is actually residual rather than central.
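
The Wilson 95% intervals quoted in the abstract can be checked with a short calculation. The sketch below is not the authors' code. It assumes the reported win rates correspond to 27 of 54 wins (50.0%) for the belief-only harness and 40 of 54 wins (74.1%) with declarative planning; those raw counts are inferred from the percentages and are not stated in the abstract. The function and constant names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(wins: int, games: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    if games <= 0 or not 0 <= wins <= games:
        raise ValueError("need games > 0 and 0 <= wins <= games")
    # Two-sided critical value of the standard normal, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half_width = z * sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return center - half_width, center + half_width


def as_percent(interval: tuple[float, float]) -> tuple[float, float]:
    """Convert a (low, high) proportion interval to percentages, one decimal place."""
    return tuple(round(100 * bound, 1) for bound in interval)


# Assumed raw counts, inferred from the reported 50.0% and 74.1% over 54 games.
GAMES = 54
BELIEF_ONLY_WINS = 27
PLANNING_WINS = 40

if __name__ == "__main__":
    print("belief-only:", as_percent(wilson_interval(BELIEF_ONLY_WINS, GAMES)))
    print("with planning:", as_percent(wilson_interval(PLANNING_WINS, GAMES)))
    gain_pp = 100 * (PLANNING_WINS - BELIEF_ONLY_WINS) / GAMES
    print(f"marginal gain: +{gain_pp:.1f}pp")
```

Under these assumed counts the intervals come out as [37.1, 62.9] and [61.1, 83.9], matching the abstract, and the marginal gain is +24.1pp. The two intervals overlap slightly, which is a reminder that 54 games is a small sample for comparing per-layer win rates.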

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844