Creating psychological safety in the AI era

Creating psychological safety in the AI era

Rolling out enterprise-grade AI means climbing two steep cliffs at once. First, understanding and implementing the tech itself. And second, creating the cultural conditions where

Why it’s time to reset our expectations for AI

Can I ask you a question: How do you feel about AI right now? Are you still excited? When you hear that OpenAI or Google

CTIGuardian: A Few-Shot Framework for Mitigating Privacy Leakage in Fine-Tuned LLMs

arXiv:2512.12914v1 Announce Type: cross Abstract: Large Language Models (LLMs) are often fine-tuned to adapt their general-purpose knowledge to specific tasks and domains such as cyber

DiffusionBrowser: Interactive Diffusion Previews via Multi-Branch Decoders

arXiv:2512.13690v1 Announce Type: cross Abstract: Video diffusion models have revolutionized generative video synthesis, but they are imprecise, slow, and can be opaque during generation —

From Moderation to Mediation: Can LLMs Serve as Mediators in Online Flame Wars?

arXiv:2512.03005v2 Announce Type: replace Abstract: The rapid advancement of large language models (LLMs) has opened new possibilities for AI for good applications. As LLMs increasingly

CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving

December 16, 2025

arXiv:2512.11920v1 Announce Type: new
Abstract: Large Language Models (LLMs) have revolutionized natural language processing tasks, but their deployment in datacenter environments faces significant challenges due to the massive memory requirements of key-value (KV) caches. During the autoregressive decoding process, KV caches consume substantial GPU memory, limiting batch sizes and overall system throughput. To address these challenges, we propose textbfCXL-SpecKV, a novel disaggregated KV-cache architecture that leverages Compute Express Link (CXL) interconnects and FPGA accelerators to enable efficient speculative execution and memory disaggregation. Our approach introduces three key innovations: (i) a CXL-based memory disaggregation framework that offloads KV-caches to remote FPGA memory with low latency, (ii) a speculative KV-cache prefetching mechanism that predicts and preloads future tokens’ cache entries, and (iii) an FPGA-accelerated KV-cache compression and decompression engine that reduces memory bandwidth requirements by up to 4$times$. When evaluated on state-of-the-art LLM models, CXL-SpecKV achieves up to 3.2$times$ higher throughput compared to GPU-only baselines, while reducing memory costs by 2.8$times$ and maintaining accuracy. Our system demonstrates that intelligent memory disaggregation combined with speculative execution can effectively address the memory wall challenge in large-scale LLM serving. Our code implementation has been open-sourced at https://github.com/FastLM/CXL-SpecKV.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registeration number 16808844