arXiv:2509.09151v2 Announce Type: replace-cross
Abstract: Research in video understanding has advanced rapidly, driven by increasingly diverse datasets and more powerful model architectures. While existing surveys typically organize progress by tasks, benchmarks, or model families, they provide limited insight into why particular architectures emerged and succeeded. In this survey, we argue that the evolution of video understanding is fundamentally shaped by dataset structure. We present a dataset-centric perspective that connects dataset structure, inductive biases, and architectural design within a unified framework. We show that different datasets require models to capture specific invariances and capabilities, such as robustness to viewpoint changes, sensitivity to temporal ordering, reasoning over long-range dependencies, relational interactions, and cross-modal alignment. These requirements naturally give rise to inductive biases, i.e., architectural assumptions that favor particular patterns of reasoning and generalization. From this perspective, milestone architectures, including two-stream networks, 3D CNNs, temporal models, transformers, graph-based methods, and multimodal foundation models, can be understood as architectural responses to the challenges posed by evolving datasets. Building on this framework, we systematically analyze how dataset characteristics have shaped architectural innovation across video understanding tasks and discuss the representational biases induced by different data regimes. By unifying datasets, inductive biases, and architectures into a coherent perspective, this survey offers both a retrospective explanation of the field’s evolution and a forward-looking roadmap toward general-purpose video understanding systems. Code and dynamic video visualizations of dataset-induced biases are available at https://time.griffith.edu.au/paper-sites/video-understanding/.
Wavelet analysis of human recombination rates demonstrates divergence on fine scales
Background: Recombination rates can be estimated across the genome, underpinning genetic analyses such as identification of regions under selection. Accurate recombination mapping requires observing a