Kalmer, a specific based-App intervention for the treatment of Non-suicidal self-injury (NSSI): a technical and usability study in a non-clinical population

Introduction: Non-suicidal self-injury (NSSI), defined as the deliberate infliction of harm to oneself without suicidal intent, poses a significant and growing mental health concern worldwide, particularly

A bridge, not a destination: YouTube viewer perspectives on AI mental health support and human therapy

Background: Artificial intelligence (AI) tools are increasingly used for mental health support, yet little is known about how they are understood outside clinical trials and survey-based

Construction and prototype effect evaluation of a multi-agent collaborative system for operating room nursing

Objective: To develop an operating room intelligent collaborative management system, define its intelligent auxiliary role for nursing teams, and evaluate its efficacy in process optimization, efficiency

Personalized vs. population-based speech models for multi-dimensional mental health prediction

Introduction: Mental disorders such as depression, anxiety, and stress are increasingly prevalent, particularly among young adults. Traditional assessment methods rely on self-reports and resource-intensive clinician interviews,

SENTINEL-Chain: a blockchain-integrated privacy-preserving framework for secure healthcare data publishing

Introduction: Electronic health records (EHRs) are central to healthcare analytics, but their granularity increases re-identification risk when shared. Conventional privacy-preserving methods including k-anonymity, l-diversity, and differential

Executable World Models for ARC-AGI-3 in the Era of Coding Agents

June 9, 2026

arXiv:2605.05138v2
Abstract: We evaluate an initial coding-agent system for ARC-AGI-3 in which the agent maintains an executable Python world model, verifies it against previous observations, refactors it toward simpler abstractions as a practical proxy for an MDL-like simplicity bias, and plans through the model before acting. The system is intentionally direct: it uses a scripted controller, predefined world-model interfaces, verifier programs, and a plan executor, but no hand-coded game-specific logic. The agent-facing prompts, workspace, and controller contain no game-specific code, game-specific prompts, hand-coded heuristics, hidden solutions, or other game-specific information; the same agent and prompts are used across games. Because the coding agent has broad system access, we audit unintended information channels, describe earlier vulnerable harnesses, and explain how the current harness closes observed leakage channels while reducing benchmark-specific information exposure. We report results on the 25 public ARC-AGI-3 games. Each playthrough starts from a fresh agent instance and clean workspace, with no access to files or conversation state from earlier playthroughs. With GPT-5.5 high reasoning effort, the agent fully solved 15 games and achieved a mean per-game RHAE of 58.12%. With GPT-5.4 high reasoning effort, it fully solved 8 games and achieved a mean per-game RHAE of 41.29%. Performance on the private validation set, which is not yet available to us, remains to be tested. Overall, the results provide preliminary evidence that verifier-driven executable world models are a promising approach for ARC-AGI-3 agents. Full run artifacts are released with the code at https://github.com/astroseger/arc-3-agents-baseline1.
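The verify-then-plan loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the names `verify`, `plan`, and `toy_model`, the one-dimensional toy world, and the use of breadth-first search as the planner are all assumptions chosen for illustration. It shows only the general idea of checking an executable model against recorded transitions and then searching through that model for an action sequence.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Optional

State = Hashable
Action = str
Model = Callable[[State, Action], State]  # hypothetical world-model interface


def verify(model: Model, history: Iterable[tuple]) -> list:
    """Replay recorded (state, action, observed) transitions.

    Returns a list of (state, action, observed, predicted) tuples for every
    transition the model gets wrong; an empty list means the model is
    consistent with all previous observations.
    """
    mismatches = []
    for state, action, observed in history:
        predicted = model(state, action)
        if predicted != observed:
            mismatches.append((state, action, observed, predicted))
    return mismatches


def plan(model: Model, start: State, is_goal: Callable[[State], bool],
         actions: list, max_depth: int = 20) -> Optional[list]:
    """Breadth-first search through the model (an assumed planner).

    Returns a shortest action sequence reaching a goal state, or None.
    """
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        if len(path) >= max_depth:
            continue
        for action in actions:
            nxt = model(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None


# Toy world (assumption): a token on a track with positions 0..4, clamped at the ends.
def toy_model(state: int, action: str) -> int:
    delta = {"left": -1, "right": 1}[action]
    return min(4, max(0, state + delta))


history = [(0, "right", 1), (1, "right", 2), (0, "left", 0)]
if not verify(toy_model, history):       # act only once the model matches the past
    route = plan(toy_model, 2, lambda s: s == 4, ["left", "right"])
    print(route)
```

In a setup like the one the abstract describes, a non-empty mismatch list would presumably send the agent back to revise its model before planning; that revision step is omitted here.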


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844