Crisis support teams’ technological openness and learning attitudes toward the AI based virtual patient system crisis support VR

BackgroundAgainst the backdrop of escalating global humanitarian crises, innovative didactic simulations are becoming increasingly important. A promising alternative to traditional classroom-based didactics for learning psychological

Ensemble based in transfer learning for cytological classification in pleural fluid

Pleural effusion cytology is critical for diagnosing benign and malignant conditions, yet manual interpretation remains time-consuming and prone to subjectivity. The increasing burden of malignant

ChatGPT in healthcare: perceptions, ethical considerations, and practice implications among healthcare professionals in Ecuador and other countries in the Americas: a cross-sectional survey study

BackgroundGenerative artificial intelligence tools, such as ChatGPT, are increasingly discussed in healthcare; however, evidence from Latin American professional settings is limited and must be interpreted

Advancing women’s health through equity in quantitative sciences: promoting sex- and gender-based modeling in clinical trials and real-world studies

Post Content

Trauma-informed conversational agents for mental health: understanding user perspectives and experiences

IntroductionMental health conversational agents (MHCAs) offer scalable, accessible psychological support yet raise concerns about safety and appropriateness for trauma, exposed users. While trauma-informed care (TIC)

AGENTSERVESIM: A Hardware-aware Simulator for Multi-Turn LLM Agent Serving

June 9, 2026

arXiv:2606.09613v1 Announce Type: cross
Abstract: Multi-turn LLM agents interleave model calls with external tool invocations, shifting serving from stateless request processing to stateful program execution. Serving these workloads requires scheduling, KV-cache management, and routing policies that use program-level context, including turn dependencies, tool-induced gaps, and reusable KV state. Evaluating such policies directly on real systems is costly, since each design point may require dedicated accelerator time across arrival rates, model scales, serving-instance counts, and memory hierarchies. Simulation offers a scalable alternative, but existing LLM serving simulators target stateless request-level workloads and therefore omit the core dynamics of agent serving: multi-turn program execution, cross-turn cache locality, and KV-cache residency during tool gaps. We present AGENTSERVESIM, a hardware-aware simulator for multi-turn LLM agent serving. AGENTSERVESIM evaluates serving policies at program granularity through composable modules: a Program Orchestrator preserves program identity and turn order, a Tool Simulator materializes tool-induced gaps, a Session-Aware Router maintains program-to-instance affinity for cache-aware dispatch, and a KV Residency Model tracks policy-defined KV placement across HBM, host DRAM/CXL, and eviction. Across real serving deployments and hardware configurations, AGENTSERVESIM reproduces real-system behavior within 6% error across key performance metrics while running entirely on commodity CPUs. These results show that AGENTSERVESIM enables controlled, repeatable exploration of agent-serving policies without requiring exhaustive deployment on costly accelerators.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844