Disclosure in the era of generative artificial intelligence

Generative artificial intelligence (AI) has rapidly become embedded in academic writing, assisting with tasks ranging from language editing to drafting text and producing evidence. Despite

Based on dual perspectives of management and ethics: exploring challenges and governance approaches for new media applications in psychiatric specialty hospitals

The further promotion and application of new media technologies present new opportunities for psychiatric specialty hospitals in areas such as health education, doctor-patient communication, service

Stratified and combined analysis of the quality of lumbar spinal stenosis–related videos on major Chinese short video platforms

BackgroundLumbar spinal stenosis (LSS) is a degenerative disorder in which narrowing of the spinal canal compresses neural elements, causing pain, numbness, and limited mobility. With

Differential acceptance of a national digital health platform among community and frontline health workers in Cote d’Ivoire: a cross-sectional study

IntroductionMobile-based digital health solutions are critical technologies that play a significant role in improving the quality of healthcare services. Cote d’Ivoire is digitizing its community-based

A framework for culturally adapting mental mHealth apps

Mobile health (mHealth) apps are increasingly deployed for evidence-based mental health interventions, broadening access to care. While effective, Internet-based Cognitive Behavioural Therapy, delivered via web

ATANT v1.1: Positioning Continuity Evaluation Against Memory, Long-Context, and Agentic-Memory Benchmarks

April 14, 2026

arXiv:2604.10981v1 Announce Type: new
Abstract: ATANT v1.0 (arXiv:2604.06710) defined continuity as a system property with 7 required properties and introduced a 10-checkpoint, LLM-free evaluation methodology validated on a 250-story corpus. Since publication, a recurring reviewer and practitioner question has concerned not the framework itself but its relationship to a wider set of memory evaluations: LOCOMO, LongMemEval, BEAM, MemoryBench, Zep’s evaluation suite, Letta/MemGPT’s evaluations, and RULER. This companion paper, v1.1, does not modify the v1.0 standard. It closes a related-work gap that v1.0 left brief under page limits. We show by structural analysis that none of these benchmarks measures continuity as defined in v1.0: of the 7 required properties, the median existing eval covers 1 property, the mean covers 0.43 when partial credit is scored at 0.5, and no eval covers more than 2. We provide a cell-by-cell property-coverage matrix, identify methodological defects specific to each benchmark (including an empty-gold scoring bug in the LOCOMO reference implementation that renders 23% of its corpus unscorable by construction), and publish our reference implementation’s LOCOMO score (8.8%) alongside the structural reason that number is uninformative about continuity. We publish our 8.8% LOCOMO score alongside our 96% ATANT cumulative-scale score as a calibration pair: the 87-point divergence is evidence that the two benchmarks measure different properties, not that one system is an order of magnitude better than another. The position v1.1 takes is not adversarial: each benchmark measures a real capability. The claim is that none of them can adjudicate continuity, and conflating them with continuity evaluation has led the field to under-invest in the properties v1.0 names.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844