Identifying needs in adult rehabilitation to support the clinical implementation of robotics and allied technologies: an Italian national survey

Introduction: Robotics and technological interventions are increasingly being explored as solutions to improve rehabilitation outcomes but their implementation in clinical practice remains very limited. Understanding patient

Assessing nurses’ attitudes toward artificial intelligence in Kazakhstan: psychometric validation of a nine-item scale

Background: Artificial intelligence (AI) is increasingly integrated into healthcare, yet the attitudes and knowledge of nurses, who are the key mediators of AI implementation, remain underexplored.

Digital Health Technology Use Among Rehabilitation Professionals in China: Multi-Province Cross-Sectional Survey

Background: The rapid expansion of rehabilitation needs in China has intensified pressure on a workforce that remains unevenly distributed. Digital health technologies (DHTs) offer potential

Goal Setting and Anchoring Effects on Meditation Using a Digital Platform: Large-Scale Digital Field Study

Background: Meditation has grown in popularity in recent years, but many people who try meditation often fail to establish a habit. Goal setting has been

Innovations in Deaf Health Care Communication: Systematic Review of Sign Language Recognition Systems

Background: Deaf individuals often face communication challenges when interacting with those who can hear. Within health care settings, these challenges may pose risks to their

The Geometry of Compromise: Unlocking Generative Capabilities via Controllable Modality Alignment

April 2, 2026

arXiv:2604.00279v1 Announce Type: cross
Abstract: Vision-Language Models (VLMs) such as CLIP learn a shared embedding space for images and text, yet their representations remain geometrically separated, a phenomenon known as the modality gap. This gap limits tasks requiring cross-modal interchangeability, such as captioning and joint clustering. Existing post-processing approaches can partially improve cross-modal compatibility; however, we show through geometric analysis that they primarily reduce the global centroid offset while leaving the underlying distributional mismatch intact. We decompose the modality gap into a Centroid Gap and a Distribution Gap, and demonstrate that the Distribution Gap is the true predictor of cross-modal task quality ($R^2 = 0.986$), whereas the commonly used Raw Gap is misleading ($R^2 = 0.691$). Motivated by this observation, we propose TPC-CMA (Three-Phase Curriculum for Cross-Modal Alignment), a fine-tuning framework that explicitly reduces both components. The proposed CMA jointly mitigates centroid offsets and reshapes the distributional structure, while a three-phase curriculum with gradient-aware scheduling progressively introduces alignment during training to enable stable optimization. Experiments demonstrate that our method significantly improves cross-modal alignment. With $\alpha_{\text{target}}=0.05$, the modality gap is reduced by 66.6% with only a 4.84% accuracy drop. Under stronger alignment ($\alpha_{\text{target}}=0.5$), the gap is reduced by 82.3%, clustering ARI improves from 0.318 to 0.516, and captioning CIDEr increases by 57.1% over the original model. Our code and pre-trained models will be made publicly available upon acceptance.
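
The abstract's observation that post-processing mainly removes the centroid offset while leaving the distributional mismatch intact can be illustrated with a toy sketch. The abstract does not give formal definitions of the Centroid Gap or the Distribution Gap, so the code below makes its own assumptions: the centroid gap is taken to be the Euclidean distance between the mean image embedding and the mean text embedding, and per-dimension variance is used as a crude stand-in for distributional shape. All function names are hypothetical, the data are synthetic, and this is not the TPC-CMA method.

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def centroid_gap(image_emb, text_emb):
    """Assumed definition: Euclidean distance between the modality centroids."""
    return math.dist(centroid(image_emb), centroid(text_emb))


def center(vectors):
    """Mean-centering, a simple post-processing step that removes the centroid."""
    c = centroid(vectors)
    return [[x - m for x, m in zip(v, c)] for v in vectors]


def per_dim_variance(vectors):
    """Population variance along each dimension (crude proxy for shape)."""
    c = centroid(vectors)
    n = len(vectors)
    return [sum((v[d] - c[d]) ** 2 for v in vectors) / n for d in range(len(c))]


# Synthetic "embeddings": the two modalities differ in location AND in spread.
rng = random.Random(0)
dim = 4
image_emb = [[rng.gauss(1.0, 0.1) for _ in range(dim)] for _ in range(200)]
text_emb = [[rng.gauss(-1.0, 0.5) for _ in range(dim)] for _ in range(200)]

gap_before = centroid_gap(image_emb, text_emb)

# Center each modality separately, then measure again.
image_c, text_c = center(image_emb), center(text_emb)
gap_after = centroid_gap(image_c, text_c)

# The spread of each modality is untouched by centering.
var_image = per_dim_variance(image_c)
var_text = per_dim_variance(text_c)

print(f"centroid gap before centering: {gap_before:.3f}")
print(f"centroid gap after centering:  {gap_after:.3e}")
print("image variance per dim:", [round(v, 4) for v in var_image])
print("text variance per dim: ", [round(v, 4) for v in var_text])
```

In this toy setting, centering drives the centroid gap to numerical zero, yet the text cloud stays much wider than the image cloud. That is the kind of residual mismatch a centroid-only correction cannot address, under the assumptions stated above.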


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844