Understanding the value of virtual care technologies: development of a framework in the veterans health administration

IntroductionHealthcare systems, including the Veterans Health Administration (VHA), are facing tremendous growth in virtual care technologies that are intended to foster connections between patients, informal

A pilot study of human–AI conversational interaction and its impact on loneliness and wellbeing

IntroductionWith the growing accessibility of advanced artificial intelligence (AI) chatbots, there is a need to understand their impact on users’ psychological wellbeing. This pilot study

Using GPT-4 to annotate the severity of all phenotypic abnormalities within the human phenotype ontology

IntroductionThe Human Phenotype Ontology (HPO) provides a unified framework cataloguing over 17,500 phenotypic abnormalities across more than 8,600 rare diseases, defining hierarchical relationships between them.

Portable automated rapid testing for auditory assessment: repeated at-home testing in older adults

IntroductionHearing challenges are prevalent in older adults and are associated with age-related cognitive decline. However, measuring age-related changes in hearing faces critical barriers related to

Why health information technology safety problems remain invisible

Post Content

Anatomy-Guided Vision-Language Learning with Angular Prototype Separation for Multi-Label Video Capsule Endoscopy Classification Under Class Imbalance

May 25, 2026

arXiv:2603.17879v2 Announce Type: replace-cross
Abstract: This work presents a multi-label temporal event detection framework for video capsule endoscopy (VCE) that addresses the extreme class imbalance inherent in the Galar dataset by combining two principal contributions: an Angular Separation Loss on class prototypes and a Biological State Machine temporal decoder. The backbone remains BiomedCLIP, a biomedical vision-language foundation model. Three consecutive frames are fused through a Local Differencing Attention module that amplifies transient pathological signals by suppressing static temporal redundancy. An Anatomy Context Head then conditions pathological predictions on soft anatomical activations, exploiting the known spatial co-occurrence structure of GI findings. Learnable text-feature prompts and prototype-based logit augmentation are trained alongside an Angular Separation Loss that penalizes off-diagonal cosine similarity between class prototypes, preventing the prototype collapse that afflicts rare classes under extreme imbalance. To counteract the skewed label distribution, the training regime combines asymmetric focal loss, inverse-frequency weighted sampling, temporal Mixup, Exponential Moving Average, and per-class threshold calibration. The Biological State Machine decoder replaces naive gap merging with a physiologically grounded forward-only state transition over anatomy labels, eliminating the fragmentation artefact that produced hundreds of spurious anatomy events per video in the prior approach and reducing per-video anatomy output to 2–3 clinically realistic events. On the held-out RARE-VISION test set comprising three NaviCam examinations (161,025 frames), the updated pipeline achieves an overall temporal mAP@0.5 of 0.3597 and mAP@0.95 of 0.3399, representing a relative improvement of 46% and 44% respectively over the prior submission, with total inference completed in approximately 21 minutes on a single GPU.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844