Portable automated rapid testing for auditory assessment: repeated at-home testing in older adults

Introduction: Hearing challenges are prevalent in older adults and are associated with age-related cognitive decline. However, measuring age-related changes in hearing faces critical barriers related to

Why health information technology safety problems remain invisible

Why digital health fails silently: a sociotechnical theory of health information technology–related risk

Introduction: Health information technology (HIT) is now integral to healthcare delivery, supporting clinical documentation, prescribing, diagnostics, and care coordination. Although these technologies offer substantial benefits, they

Unburdening healthcare systems through telenursing in chronic respiratory disease management: a systematic review

Background/objectives: Chronic respiratory diseases represent a major cause of morbidity/mortality and healthcare expenditure due to disease exacerbations, emergency department (ED) presentations, hospitalizations, and length of stay

Human-supervised, large language model-based clinical decision support aligned to national newborn protocols in Kenya: a pragmatic, early-stage evaluation

Introduction: Timely, protocol-adherent clinical decisions are crucial for reducing neonatal mortality in low-resource settings. Translating extensive national guidelines into bedside practice remains challenging. Objective: We developed and evaluated

Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation

May 22, 2026

arXiv:2605.07985v2
Abstract: Selecting the optimal LLM inference configuration requires evaluation across hardware, serving engines, attention backends, and model architectures, since no single choice performs best across all workloads. Profile-based simulators are the standard tool, yet they hardcode their operation set to a specific configuration and re-profile every operation from scratch, making exploration prohibitively expensive. This cost stems from a missing structural understanding: every input dimension of each operation is fixed by the model configuration or determined by the incoming request. Many model-configuration values (e.g., head size, layer count) recur across models, so the same operation runs in many configurations; a single sweep over the request-dependent dimensions can serve them all. We present Dooly, which exploits this structure to achieve configuration-agnostic, redundancy-aware profiling. Dooly performs a single inference pass, labels each input dimension with its origin via taint propagation, and selectively profiles only operations absent from its latency database; stateful operations such as attention are isolated by reusing the serving engine’s own initialization code, eliminating manual instrumentation. It builds latency regression models based on the database, which becomes a drop-in backend for existing simulators. Across two GPU platforms, three attention backends, and diverse model architectures, Dooly achieves simulation accuracy within 5% MAPE for TTFT and 8% for TPOT while reducing profiling GPU-hours by 56.4% across 12 models compared to the existing profiling approach. We have open-sourced Dooly at https://github.com/dooly-project.
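The redundancy-aware idea in the abstract (profile an operation only when it is absent from the latency database, then reuse one sweep over the request-dependent dimension for every model that shares the same configuration values) can be illustrated with a minimal sketch. This is not Dooly's code or API: the names `op_key`, `LatencyDB`, `ensure_profiled`, `predict`, and `mape` are hypothetical, the least-squares line stands in for whatever regression model Dooly actually uses, and taint propagation is not implemented here (the split between configuration-fixed and request-dependent dimensions is supplied by hand).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical key: operation name plus its configuration-fixed dimensions.
OpKey = Tuple[str, Tuple[Tuple[str, int], ...]]


def op_key(name: str, config_dims: Dict[str, int]) -> OpKey:
    """Build a hashable key so identical operations from different models collide."""
    return (name, tuple(sorted(config_dims.items())))


@dataclass
class LatencyDB:
    """Toy latency database: one sweep of (request_dim, latency_ms) samples per key."""

    samples: Dict[OpKey, List[Tuple[int, float]]] = field(default_factory=dict)
    profile_runs: int = 0  # number of measurements actually executed

    def ensure_profiled(
        self,
        key: OpKey,
        measure: Callable[[int], float],
        sweep: Sequence[int],
    ) -> bool:
        """Profile only when the key is absent; return True if profiling ran."""
        if key in self.samples:
            return False  # redundant operation: reuse the stored sweep
        self.samples[key] = [(n, measure(n)) for n in sweep]
        self.profile_runs += len(sweep)
        return True

    def predict(self, key: OpKey, n: int) -> float:
        """Predict latency with an ordinary least-squares line (an assumed model)."""
        pts = self.samples[key]
        mean_x = sum(x for x, _ in pts) / len(pts)
        mean_y = sum(y for _, y in pts) / len(pts)
        var_x = sum((x - mean_x) ** 2 for x, _ in pts)
        if var_x == 0:
            return mean_y  # a single distinct x value: fall back to the mean
        slope = sum((x - mean_x) * (y - mean_y) for x, y in pts) / var_x
        return mean_y + slope * (n - mean_x)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

In this sketch, two models that share, say, the same head size and head count produce the same key, so the second call to `ensure_profiled` returns `False` and adds nothing to `profile_runs`. The `mape` helper corresponds to the accuracy metric quoted for TTFT and TPOT; how Dooly aggregates per-operation predictions into those end-to-end figures is not specified in the abstract.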

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844