Spatial Transcriptomics as Images for Large-Scale Pretraining

March 19, 2026

arXiv:2603.13432v2
Abstract: Spatial Transcriptomics (ST) profiles thousands of gene expression values at discrete spots with precise coordinates on tissue sections, preserving spatial context essential for clinical and pathological studies. With rising sequencing throughput and advancing platforms, the expanding data volumes motivate large-scale ST pretraining. However, the fundamental unit for pretraining, i.e., what constitutes a single training sample, remains ill-posed. Existing choices fall into two camps: (1) treating each spot as an independent sample, which discards spatial dependencies and collapses ST into single-cell transcriptomics; and (2) treating an entire slide as a single sample, which produces prohibitively large inputs and drastically fewer training examples, undermining effective pretraining. To address this gap, we propose treating spatial transcriptomics as croppable images. Specifically, we define a multi-channel image representation with fixed spatial size by cropping patches from raw slides, thereby preserving spatial context while substantially increasing the number of training samples. Along the channel dimension, we define gene subset selection rules to control input dimensionality and improve pretraining stability. Extensive experiments show that the proposed image-like dataset construction for ST pretraining consistently improves downstream performance, outperforming conventional pretraining schemes. Ablation studies verify that both spatial patching and channel design are necessary, establishing a unified, practical paradigm for organizing ST data and enabling large-scale pretraining.
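
The abstract gives no implementation details, so the following is only a minimal sketch of what a "croppable image" construction could look like, not the authors' method. It assumes spots already sit on an integer grid, and it uses a highest-variance rule for gene selection purely as a stand-in for the paper's unspecified gene subset selection rules. The function names (`rasterize_spots`, `select_top_variance_genes`, `crop_patches`), the patch size, the stride, and the random toy data are all hypothetical.

```python
import numpy as np


def rasterize_spots(coords, expr, grid_shape):
    """Place per-spot expression vectors onto a dense (H, W, G) grid.

    coords: (N, 2) integer array of (row, col) grid positions, one per spot.
    expr:   (N, G) array of expression values, one row per spot.
    Grid cells that contain no spot stay zero.
    """
    height, width = grid_shape
    image = np.zeros((height, width, expr.shape[1]), dtype=np.float32)
    image[coords[:, 0], coords[:, 1]] = expr
    return image


def select_top_variance_genes(expr, num_genes):
    """Assumed channel rule: keep the num_genes genes with highest variance."""
    variances = expr.var(axis=0)
    top = np.argsort(variances)[::-1][:num_genes]
    return np.sort(top)  # sorted so channel order follows the original gene order


def crop_patches(image, patch_size, stride):
    """Slide a fixed-size window over the grid and stack (P, P, G) patches."""
    height, width, _ = image.shape
    patches = []
    for r in range(0, height - patch_size + 1, stride):
        for c in range(0, width - patch_size + 1, stride):
            patches.append(image[r:r + patch_size, c:c + patch_size])
    return np.stack(patches)


# Toy slide: an 8 x 8 grid of spots with 50 genes each (synthetic counts).
rng = np.random.default_rng(0)
coords = np.array([(r, c) for r in range(8) for c in range(8)])
expr = rng.poisson(2.0, size=(64, 50)).astype(np.float32)

keep = select_top_variance_genes(expr, num_genes=16)
slide = rasterize_spots(coords, expr[:, keep], grid_shape=(8, 8))
patches = crop_patches(slide, patch_size=4, stride=2)
print(patches.shape)  # (9, 4, 4, 16)
```

In this toy setup a single slide yields nine fixed-size training samples of shape 4 x 4 x 16 rather than one large sample, which illustrates the trade-off the abstract describes between per-spot and per-slide sampling.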
