arXiv:2601.05386v2 Announce Type: replace
Abstract: Cheating in chess, by using advice from powerful software, has become a major problem, reaching the highest levels. As opposed to the large majority of previous work, which concerned the detection of cheating, here we try to evaluate the possible gain in performance obtained by cheating a limited number of times during a game. We develop threshold-based and Bellman-style intervention policies, and test them in a controlled engine-vs-engine setting using Stockfish. A judicious choice of 1 or 2 cheats yields average scores of 0.71 and 0.82, respectively, compared to 0.51 with no cheats. We also introduce a fast, engine-free simulator that enables hyperparameter optimization without running games, closely matching the engine-based optimum.
The goal of this work is not to assist cheaters, but to measure the effectiveness of cheating — which is crucial as part of the effort to contain and detect it.
Using GPT-4 to annotate the severity of all phenotypic abnormalities within the human phenotype ontology
Introduction
The Human Phenotype Ontology (HPO) provides a unified framework cataloguing over 17,500 phenotypic abnormalities across more than 8,600 rare diseases, defining hierarchical relationships between them.