A patient-aware benchmarking of CNN and transformer architectures for breast cancer histopathology classification

Introduction
Breast cancer diagnosis using histopathological imaging remains a critical yet challenging task, requiring automated systems that generalize reliably across patients and imaging conditions. While deep learning models have shown strong performance, many prior studies use image-wise data splits that introduce patient-level data leakage, which produces overly optimistic and potentially misleading evaluations. This study addresses that limitation by establishing a rigorous, leakage-free benchmarking framework for binary breast cancer histopathology classification.

Methods
Nine deep learning architectures were evaluated on the BreaKHis dataset, which comprises 7,909 images from 82 patients. The models include six convolutional neural networks (ResNet50, MobileNetV2, VGG16, DenseNet121, Xception, and EfficientNetB0), one modern convolutional architecture (ConvNeXt), and two transformer-based models (Swin-Small and Swin-Base). A strict 5-fold patient-aware cross-validation protocol ensured that images from the same patient never appeared in both the training and validation sets. All models were trained under identical experimental conditions. Performance was assessed using accuracy, precision, recall, and F1-score, reported as mean ± standard deviation. Statistical significance was evaluated using paired t-tests and Wilcoxon signed-rank tests with Bonferroni correction.

Results
All evaluated architectures performed comparably, with mean accuracies between 0.91 and 0.93. ResNet50 achieved the highest mean accuracy (0.9267 ± 0.0435) and F1-score (0.9472), although differences among models were marginal. Statistical analysis showed that no pairwise difference was statistically significant (p > 0.05 after correction).
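
The pairwise testing step described above can be illustrated with a minimal sketch. It assumes per-fold accuracies are available for each model, computes an exact two-sided Wilcoxon signed-rank p-value for every model pair, and applies a Bonferroni correction across all pairs. The function names (`wilcoxon_exact`, `bonferroni`, `compare_all`), the model labels, and the scores are hypothetical placeholders, not values or code from the study. The abstract does not say whether an exact or approximate Wilcoxon test was used, and the paired t-test is omitted here because it needs a t-distribution that the Python standard library does not provide.

```python
from itertools import combinations, product


def wilcoxon_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank p-value for paired samples."""
    # Round to suppress floating-point noise, then drop zero differences.
    diffs = [round(x - y, 10) for x, y in zip(a, b)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        return 1.0

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)

    # Enumerate every sign assignment (feasible only for small n, e.g. 5 folds).
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return extreme / 2 ** n


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def compare_all(fold_scores):
    """Return Bonferroni-adjusted p-values for every pair of models."""
    pairs = list(combinations(sorted(fold_scores), 2))
    raw = [wilcoxon_exact(fold_scores[a], fold_scores[b]) for a, b in pairs]
    return dict(zip(pairs, bonferroni(raw)))


if __name__ == "__main__":
    # Hypothetical per-fold accuracies for illustration only.
    scores = {
        "model_a": [0.95, 0.88, 0.97, 0.90, 0.93],
        "model_b": [0.94, 0.87, 0.95, 0.91, 0.92],
        "model_c": [0.93, 0.89, 0.96, 0.88, 0.94],
    }
    for pair, p_adj in compare_all(scores).items():
        print(pair, round(p_adj, 4))
```

In practice, a library routine such as SciPy's `scipy.stats.wilcoxon` and `scipy.stats.ttest_rel` would normally replace the hand-written test; the pure-Python version is used here only to keep the sketch dependency-free.
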
Magnification-wise analysis indicated that the 40× and 200× magnifications provided more discriminative features, whereas the highest magnification (400×) reduced performance, which the authors attribute to limited contextual information.

Discussion
The findings show that, under a rigorously controlled and leakage-free evaluation protocol, architectural differences among modern deep learning models do not produce statistically significant performance differences. Instead, evaluation design plays a more critical role in obtaining reliable results. The proposed patient-aware benchmarking framework enhances reproducibility and provides a robust foundation for future research, supporting the development of clinically translatable AI systems for breast cancer diagnosis.
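
The patient-aware split described in the Methods can be sketched as follows. This is a minimal illustration of the idea, not the authors' implementation: patients, rather than images, are shuffled and assigned to folds, so every image from a given patient ends up on the same side of each split. The function name `patient_aware_folds`, the seed, and the toy patient IDs are assumptions made for the example, and the sketch does not balance benign and malignant cases across folds.

```python
import random
from collections import defaultdict


def patient_aware_folds(patient_ids, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs with no patient on both sides.

    patient_ids[i] is the patient that image i belongs to.
    """
    # Collect the image indices that belong to each patient.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    patients = sorted(by_patient)
    if len(patients) < n_folds:
        raise ValueError("need at least as many patients as folds")

    # Shuffle patients (not images) and deal them into folds round-robin.
    random.Random(seed).shuffle(patients)
    folds = [patients[i::n_folds] for i in range(n_folds)]

    for held_out in folds:
        held = set(held_out)
        val_idx = [i for p in held_out for i in by_patient[p]]
        train_idx = [i for p in patients if p not in held for i in by_patient[p]]
        yield sorted(train_idx), sorted(val_idx)


if __name__ == "__main__":
    # Toy data: 10 hypothetical patients with 3 images each.
    ids = [f"P{n:02d}" for n in range(10) for _ in range(3)]
    for fold, (train, val) in enumerate(patient_aware_folds(ids)):
        train_patients = {ids[i] for i in train}
        val_patients = {ids[i] for i in val}
        # The leakage check: no patient may appear in both sets.
        assert not train_patients & val_patients
        print(fold, len(train), len(val))
```

scikit-learn's `GroupKFold` gives the same guarantee when patient IDs are passed as `groups`, and `StratifiedGroupKFold` additionally tries to preserve class proportions in each fold.
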

