Video-EM: Event-Centric Episodic Memory for Long-Form Video Understanding

March 10, 2026

arXiv:2508.09486v2 Announce Type: replace-cross
Abstract: Video Large Language Models (Video-LLMs) have shown strong video understanding, yet their application to long-form videos remains constrained by limited context windows. A common workaround is to compress long videos into a handful of representative frames via retrieval or summarization. However, most existing pipelines score frames in isolation, implicitly assuming that frame-level saliency is sufficient for downstream reasoning. This often yields redundant selections, fragmented temporal evidence, and weakened narrative grounding for long-form video question answering. We present Video-EM, a training-free, event-centric episodic memory framework that reframes long-form VideoQA as episodic event construction followed by memory refinement. Instead of treating retrieved keyframes as independent visuals, Video-EM employs an LLM as an active memory agent to orchestrate off-the-shelf tools: it first localizes query-relevant moments via multi-grained semantic matching, then groups and segments them into temporally coherent events, and finally encodes each event as a grounded episodic memory with explicit temporal indices and spatio-temporal cues (capturing when, where, what, and involved entities). To further suppress verbosity and noise from imperfect upstream signals, Video-EM integrates a reasoning-driven self-reflection loop that iteratively verifies evidence sufficiency and cross-event consistency, removes redundancy, and adaptively adjusts event granularity. The outcome is a compact yet reliable event timeline: a minimal but sufficient episodic memory set that can be directly consumed by existing Video-LLMs without additional training or architectural changes.
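
The abstract does not say how query-relevant moments are grouped into temporally coherent events. As a rough illustration only, the sketch below merges retrieved keyframe timestamps whenever consecutive hits fall within a fixed time gap. The gap rule, the `Event` class, the `group_into_events` function, and the `max_gap` parameter are all assumptions made for this example and are not taken from Video-EM.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """Hypothetical container for one temporally coherent event."""
    start: float          # first keyframe timestamp, in seconds
    end: float            # last keyframe timestamp, in seconds
    frames: List[float]   # all keyframe timestamps assigned to this event


def group_into_events(timestamps: List[float], max_gap: float = 5.0) -> List[Event]:
    """Merge keyframe timestamps into events.

    Assumption: two consecutive keyframes belong to the same event when they
    are at most `max_gap` seconds apart. This is a stand-in heuristic, not
    the grouping rule used by the paper.
    """
    events: List[Event] = []
    for t in sorted(timestamps):
        if events and t - events[-1].end <= max_gap:
            # Close enough to the previous keyframe: extend the current event.
            events[-1].end = t
            events[-1].frames.append(t)
        else:
            # Gap too large (or first keyframe): start a new event.
            events.append(Event(start=t, end=t, frames=[t]))
    return events


# Made-up retrieval hits, in seconds from the start of the video.
hits = [12.0, 14.5, 13.0, 90.0, 92.0, 300.0]
events = group_into_events(hits, max_gap=5.0)
for event in events:
    print(f"{event.start:.1f}s to {event.end:.1f}s ({len(event.frames)} keyframes)")
```

Each resulting event carries explicit start and end indices, which is the kind of temporal anchor the abstract describes attaching to an episodic memory. A real system would presumably also use semantic similarity, not just time gaps, when deciding event boundaries.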

