arXiv:2602.18956v2 Announce Type: replace
Abstract: We introduce INDUCTION, a benchmark for finite structure concept synthesis in first order logic. Given small finite relational worlds with extensionally labeled target predicates, models must output a single first order logical formula that explains the target uniformly across worlds, with correctness verified via exact model checking. The benchmark includes three regimes, FullObs, CI (contrastive), and EC (existential completion), nd penalizes formula bloat. We find sharp difficulty gradients, persistent hard structural families, and observe that low bloat formulas generalize far better on held out worlds. Elite recent models show qualitatively different behaviors across tasks and performance metrics, hinting to their different strategies of concept generalization.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844