Autocomp: A Powerful and Portable Code Optimizer for Tensor Accelerators

November 7, 2025

arXiv:2505.18574v5
Abstract: Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today’s computing landscape. However, even with significant efforts in building compilers, programming these tensor accelerators remains challenging, leaving much of their potential underutilized. Recently, large language models (LLMs), trained on large amounts of code, have shown significant promise in code generation and optimization tasks, but generating code in low-resource languages, such as specialized tensor accelerator code, still poses a significant challenge. We tackle this challenge with Autocomp, an approach that empowers accelerator programmers to leverage domain knowledge and hardware feedback to optimize code via an automated LLM-driven search. We accomplish this by: 1) formulating each optimization pass as a structured two-phase prompt, divided into planning and code generation phases, 2) inserting domain knowledge during planning via a concise and adaptable optimization menu, and 3) integrating correctness and performance metrics from hardware as feedback at each search iteration. Across three distinct hardware platforms, we demonstrate that Autocomp-optimized code runs 5.6x faster than the vendor-provided library (Gemmini), outperforms expert-level hand-tuned code by 1.9x (AWS Trainium), and achieves 3.8x higher performance than a machine learning-based cost model for GPUs (NVIDIA L40S). Additionally, we demonstrate that optimization schedules generated from Autocomp can be reused across similar tensor operations, improving speedups by up to 24% under a fixed sample budget.
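
The three ingredients listed in the abstract (a two-phase prompt, an optimization menu, and hardware feedback at each iteration) can be pictured with a minimal sketch. This is not Autocomp's implementation: the greedy search strategy, the function names, the menu entries, and the toy latency model are all assumptions made for illustration, and the LLM calls and hardware measurements are replaced by deterministic stand-ins so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical menu entries, for illustration only (not taken from the paper).
OPTIMIZATION_MENU: List[str] = ["tile loops", "hoist loads", "double-buffer"]


@dataclass
class Candidate:
    code: str
    latency: float


def optimize(
    initial_code: str,
    plan_fn: Callable[[str, List[str], float, int], str],
    codegen_fn: Callable[[str, str], str],
    evaluate_fn: Callable[[str], Tuple[bool, float]],
    iterations: int = 3,
    samples: int = 3,
) -> Candidate:
    """Greedy search (an assumed strategy): keep the fastest correct candidate."""
    ok, latency = evaluate_fn(initial_code)
    if not ok:
        raise ValueError("initial code must pass the correctness check")
    best = Candidate(initial_code, latency)
    for _ in range(iterations):
        parent = best  # every sample in this iteration starts from the same code
        for i in range(samples):
            # Phase 1 (planning): pick an optimization using the menu and feedback.
            plan = plan_fn(parent.code, OPTIMIZATION_MENU, parent.latency, i)
            # Phase 2 (code generation): apply the plan to the parent code.
            new_code = codegen_fn(parent.code, plan)
            # Feedback: correctness and performance reported by the target.
            ok, latency = evaluate_fn(new_code)
            if ok and latency < best.latency:
                best = Candidate(new_code, latency)
    return best


# Deterministic stand-ins for the LLM and the hardware (all hypothetical).
def toy_plan(code: str, menu: List[str], latency: float, sample_index: int) -> str:
    return menu[sample_index % len(menu)]


def toy_codegen(code: str, plan: str) -> str:
    return f"{code}\n# applied: {plan}"


def toy_evaluate(code: str) -> Tuple[bool, float]:
    applied = [ln for ln in code.splitlines() if ln.startswith("# applied: ")]
    # Applying the same optimization twice is treated as an incorrect rewrite.
    correct = len(applied) == len(set(applied))
    return correct, 100.0 - 20.0 * len(set(applied))


if __name__ == "__main__":
    result = optimize("kernel", toy_plan, toy_codegen, toy_evaluate)
    print(result.latency)
    print(result.code)
```

In a real setting, `plan_fn` and `codegen_fn` would wrap LLM prompts and `evaluate_fn` would compile and run the candidate on the accelerator or a simulator. Rejecting incorrect candidates before comparing latency is what lets the search explore rewrites without accepting broken code.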
