arXiv:2511.16964v1 Announce Type: cross
Abstract: Maximizing performance on available GPU hardware is an ongoing challenge for modern AI inference systems. Traditional approaches include writing custom GPU kernels and using specialized model compilers to tune high-level code for specific GPU targets. Recent work shows that LLM-based multi-agent systems can effectively perform such tuning, often outperforming existing compilers and eliminating the need for manual kernel development. However, the dynamics of multi-agent systems for this task remain unexplored. In this work, we present a logical framework for comparing multi-agent PyTorch optimization systems. Our evaluation shows that exploit-heavy strategies perform best when paired with error-fixing agents, and that performance correlates with the granularity of optimization steps. The best implementation achieves an average 2.88x speedup on an H100 GPU across diverse tasks in KernelBench, a benchmark suite covering a range of machine learning architectures in PyTorch.
Trust and anxiety as primary drivers of digital health acceptance in multiple sclerosis: toward an extended disease-specific technology acceptance model
BackgroundDigital health applications and AI-supported wearables may benefit people with Multiple Sclerosis (MS), yet fluctuating cognitive and physical symptoms could shape adoption in ways not




