Explainable AI in kidney stone detection and segmentation: a mini review

Kidney stones are one of the most common renal disorders that can produce severe complications if not diagnosed and treated early. Recently, advances in AI

Feasibility testing of a home-based exercise intervention in children with cerebral palsy who are ambulant—a study protocol of the HOME-EX study

Children gain increased health and well-being by participating in physical activity. Children with cerebral palsy who are ambulatory (CP-A) are known to be less physically

Patient and clinician perceptions, expectations, and usability of ankle exoskeletons for daily living: a mixed-methods survey study

Ankle exoskeletons offer promising support for individuals with chronic foot drop, yet user and clinician perspectives on their use in daily living remain underexplored. Related

Why digital health fails silently: a sociotechnical theory of health information technology–related risk

Introduction: Health information technology (HIT) is now integral to healthcare delivery, supporting clinical documentation, prescribing, diagnostics, and care coordination. Although these technologies offer substantial benefits, they

A maturity model framework for federated networks of trusted research environments

Introduction: A Trusted Research Environment (TRE) is a highly secure computer system where sensitive data is stored that researchers can access remotely and make use of

Investigating Cross-Modal Skill Injection: Scenarios, Methods, and Hyperparameters

May 20, 2026

arXiv:2605.19523v1 Announce Type: cross
Abstract: Vision-Language Models (VLMs) have demonstrated remarkable proficiency in general multi-modal understanding; yet they struggle to efficiently acquire continually evolving domain-specific skills. Conventional approaches to enhancing VLM capabilities, such as Supervised Fine-Tuning (SFT), require extensive dataset curation and substantial computational resources. Model merging has emerged as an efficient alternative that enables the transfer of domain-specific expertise from Large Language Models (LLMs) to VLMs without incurring additional training data requirements or significant computational overhead. Unlike conventional merging of homogeneous LLMs, which mainly aggregates existing capabilities, cross-modal skill injection aims to induce emergent cross-modal capabilities by integrating a domain-expert LLM into a VLM. However, existing research lacks a systematic analysis of the applicability and methodology of cross-modal skill injection. In this study, we investigate cross-modal skill injection across three main aspects: scenarios, methods, and hyperparameters. For scenarios, we find that cross-modal skill injection generally performs well in instruction-following and cross-lingual settings, yet struggles with mathematical reasoning. For methods, we find that classic approaches such as TA and DARE consistently achieve superior performance over alternative merging methods. We also provide a systematic and quantitative analysis of the hyperparameter tuning that these classic methods critically depend on.
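The abstract names two classic merging methods and the hyperparameters they depend on, but does not spell them out. The sketch below illustrates the general idea on toy parameter dictionaries, assuming "TA" denotes Task Arithmetic (the abstract does not expand the abbreviation) and that the VLM's language backbone and the domain-expert LLM were both fine-tuned from the same base LLM. The function names, the `scale` and `drop_rate` hyperparameters, and the parameter names are hypothetical and chosen for illustration; this is not the paper's implementation.

```python
import random

def task_vector(expert, base):
    """Task Arithmetic step: the 'skill' is the expert's offset from the shared base."""
    return {name: [e - b for e, b in zip(expert[name], base[name])]
            for name in base}

def dare(delta, drop_rate, rng):
    """DARE-style sparsification: randomly drop delta entries, rescale survivors.

    Each entry is zeroed with probability drop_rate; kept entries are
    multiplied by 1 / (1 - drop_rate) so the expected delta is unchanged.
    """
    keep_scale = 1.0 / (1.0 - drop_rate)
    return {name: [d * keep_scale if rng.random() >= drop_rate else 0.0
                   for d in values]
            for name, values in delta.items()}

def inject(vlm, delta, scale):
    """Add the scaled task vector to the VLM's matching parameters.

    Parameters with no counterpart in delta (for example a vision encoder)
    are copied unchanged.
    """
    merged = {}
    for name, weights in vlm.items():
        if name in delta:
            merged[name] = [w + scale * d for w, d in zip(weights, delta[name])]
        else:
            merged[name] = list(weights)
    return merged

# Toy example with hypothetical parameter names and values.
base = {"layer.w": [1.0, 2.0]}                        # shared base LLM
expert = {"layer.w": [1.5, 2.0]}                      # domain-expert LLM
vlm = {"layer.w": [1.2, 2.1], "vision.w": [0.3]}      # VLM (language + vision parts)

delta = task_vector(expert, base)
sparse_delta = dare(delta, drop_rate=0.5, rng=random.Random(0))
merged_ta = inject(vlm, delta, scale=0.5)             # plain Task Arithmetic
merged_dare = inject(vlm, sparse_delta, scale=0.5)    # DARE followed by addition
```

In this sketch, `scale` and `drop_rate` stand in for the kind of hyperparameters whose tuning the abstract says these methods critically depend on; no training data or gradient steps are involved, which is the efficiency argument the abstract makes for merging over SFT.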

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844