Digital health tools and point solutions—pitfalls in population health program measurement

Digital health tools are generally poorly regulated and often lack strong research evidence, posing challenges for purchasers of point solutions such as employer groups and

Crisis support teams’ technological openness and learning attitudes toward the AI based virtual patient system crisis support VR

BackgroundAgainst the backdrop of escalating global humanitarian crises, innovative didactic simulations are becoming increasingly important. A promising alternative to traditional classroom-based didactics for learning psychological

Ensemble based in transfer learning for cytological classification in pleural fluid

Pleural effusion cytology is critical for diagnosing benign and malignant conditions, yet manual interpretation remains time-consuming and prone to subjectivity. The increasing burden of malignant

From Engel’s Bio-Psycho-Social model to the personalized health determinants model: a comprehensive framework and illustrative operationalization for precision health

Engel’s Bio-Psycho-Social (BPS) model (1977) reframed healthcare by integrating biological, psychological, and social perspectives. Despite its influence, the model has been criticized for insufficient specificity

Advancing women’s health through equity in quantitative sciences: promoting sex- and gender-based modeling in clinical trials and real-world studies

Post Content

Offline Semantic Guidance for Efficient Vision-Language-Action Policy Distillation

May 18, 2026

arXiv:2605.16241v1 Announce Type: cross
Abstract: Billion-parameter Vision-Language-Action (VLA) policies have recently shown impressive performance in robotic manipulation, yet their size and inference cost remain major obstacles for real-time closed-loop control. We introduce textbfVLA-AD, a distillation framework that uses a Vision-Language Model as an offline semantic supervisor to transfer large VLA teachers into lightweight student policies. Instead of relying only on low-level action imitation, VLA-AD augments teacher-provided 7-DoF action targets with high-level semantic guidance, including task phase anchors and multi-frame operating-direction descriptions. These auxiliary signals are used only during training: at test time, the student policy runs independently, with neither the VLA teacher nor the VLM required. We evaluate VLA-AD on three LIBERO benchmark suites. Using OpenVLA-7B as the teacher, our method produces a 158M-parameter student, yielding a $44times$ reduction in model size while matching the teacher with only a $0.27%$ average relative gap. The resulting policy runs at 12.5 Hz on an RTX 4090, achieving a $3.28times$ inference speedup over OpenVLA-7B. We further show that the same semantic distillation pipeline generalizes to a different $pi_0.5$-4B teacher, where the student outperforms the teacher on two suites and remains within $0.53%$ on textttlibero_goal. Additional analysis indicates that phase-level supervision and multi-frame directional cues make the student less sensitive to noisy teacher actions, such as erroneous high-frequency gripper changes. Overall, VLA-AD demonstrates that offline semantic guidance from VLMs can substantially improve the efficiency, robustness, and deployability of VLA policy distillation.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844