• Home
  • Uncategorized
  • AInstein: Can LLMs Solve Research Problems From Parametric Memory Alone?

arXiv:2510.05432v2 Announce Type: replace
Abstract: Can large language models solve AI research problems using only their parametric knowledge, without fine-tuning, retrieval, or other external aids? We introduce AInstein, a framework for testing whether LLM agents can generate and refine solutions to research problems through iterative critique loops. A blind study with 20 domain experts on held-out ICLR 2026 problems validates our automated metrics, which we then scale to 1,214 ICLR 2025 papers using an LLM-as-a-judge paradigm. Two metrics capture complementary aspects of performance: Success Rate (does the solution address the problem?) and Rediscovery (does it match the published approach?). LLMs succeed on over 70% of problems, yet strictly rediscover the published solution less than 19% of the time, suggesting genuine problem-solving rather than associative recall. However, this ability has clear limits: models handle familiar methodological territory well but fail when solutions require cross-domain analogical transfer, a pattern we call the parametric knowledge boundary. On the ResearchPlanGen benchmark (2,645 problems), our training-free iterative refinement strategy matches RL finetuning, and a criteria-coverage analysis pins down the ceiling of what test-time refinement alone can achieve. Together, these findings map both the capabilities and the limits of LLMs as autonomous scientific problem-solvers.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844