arXiv:2603.17075v1 Announce Type: cross
Abstract: Motivated by auto-proof generation and Valiant’s VP vs. VNP conjecture, we study the problem of discovering efficient arithmetic circuits to compute polynomials, using addition and multiplication gates. We formulate this problem as a single-player game, where an RL agent attempts to build the circuit within a fixed number of operations. We implement an AlphaZero-style training loop and compare two approaches: Proximal Policy Optimization with Monte Carlo Tree Search (PPO+MCTS) and Soft Actor-Critic (SAC). SAC achieves the highest success rates on two-variable targets, while PPO+MCTS scales to three variables and demonstrates steady improvement on harder instances. These results suggest that polynomial circuit synthesis is a compact, verifiable setting for studying self-improving search policies.
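The game formulation in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' environment. The hypothetical `CircuitGame` class starts from the constant 1 and the input variables. Each move appends one addition or multiplication gate over existing nodes, and a sparse reward of 1 is paid only when the target polynomial appears within the operation budget. The hypothetical `brute_force` helper is a plain depth-first enumeration standing in for the learned policy. It is not PPO+MCTS or SAC, but it shows why the setting is verifiable: checking whether a circuit computes the target is an exact polynomial comparison.

```python
import copy
from itertools import combinations_with_replacement
from typing import Dict, List, Optional, Tuple

# A polynomial maps exponent tuples (one entry per variable) to integer coefficients.
Poly = Dict[Tuple[int, ...], int]
Action = Tuple[str, int, int]  # (gate type, left node index, right node index)


def poly_add(p: Poly, q: Poly) -> Poly:
    out = dict(p)
    for mono, c in q.items():
        out[mono] = out.get(mono, 0) + c
        if out[mono] == 0:
            del out[mono]  # keep the representation canonical (no zero terms)
    return out


def poly_mul(p: Poly, q: Poly) -> Poly:
    out: Poly = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            mono = tuple(a + b for a, b in zip(m1, m2))
            out[mono] = out.get(mono, 0) + c1 * c2
            if out[mono] == 0:
                del out[mono]
    return out


def key(p: Poly) -> frozenset:
    """Hashable canonical form, so polynomial equality is an exact check."""
    return frozenset(p.items())


class CircuitGame:
    """Hypothetical single-player game: each move appends one + or * gate."""

    def __init__(self, n_vars: int, target: Poly, budget: int):
        self.target = key(target)
        self.budget = budget  # fixed number of operations allowed
        # Assumed starting nodes: the constant 1 followed by x_1, ..., x_n.
        one = {(0,) * n_vars: 1}
        xs = [{tuple(int(i == j) for j in range(n_vars)): 1} for i in range(n_vars)]
        self.nodes: List[Poly] = [one] + xs
        self.ops_used = 0

    def legal_actions(self) -> List[Action]:
        idx = range(len(self.nodes))
        return [(op, i, j) for op in ("+", "*")
                for i, j in combinations_with_replacement(idx, 2)]

    def step(self, action: Action) -> Tuple[float, bool]:
        op, i, j = action
        gate = poly_add if op == "+" else poly_mul
        self.nodes.append(gate(self.nodes[i], self.nodes[j]))
        self.ops_used += 1
        solved = key(self.nodes[-1]) == self.target
        done = solved or self.ops_used >= self.budget
        # Assumed sparse reward: 1 only when the target is produced within budget.
        return (1.0 if solved else 0.0), done


def brute_force(game: CircuitGame) -> Optional[List[Action]]:
    """Depth-first enumeration standing in for a learned policy."""
    for action in game.legal_actions():
        child = copy.deepcopy(game)
        reward, done = child.step(action)
        if reward > 0:
            return [action]
        if not done:
            rest = brute_force(child)
            if rest is not None:
                return [action] + rest
    return None


if __name__ == "__main__":
    # Target (x + y)^2 = x^2 + 2xy + y^2 with a budget of two gates.
    target = {(2, 0): 1, (1, 1): 2, (0, 2): 1}
    print(brute_force(CircuitGame(2, target, budget=2)))
    # prints [('+', 1, 2), ('*', 3, 3)]: node 3 = x + y, then node 3 * node 3
```

Exhaustive search grows combinatorially with the budget and the number of variables, which is the motivation for replacing it with a learned policy such as the PPO+MCTS and SAC agents the abstract compares.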
Three immunoregulatory signatures define non-productive HIV infection in CD4+ T memory stem cells
The persistent HIV reservoir constitutes the main obstacle to curing HIV/AIDS disease. Our understanding of how non-productive HIV infections are established in primary human CD4+