POLAR: Online Learning for LoRA Adapter Caching and Routing in Edge LLM Serving

April 21, 2026

arXiv:2604.16583v1 Announce Type: cross
Abstract: Edge deployment of large language models (LLMs) increasingly relies on libraries of lightweight LoRA adapters, yet GPU/DRAM can keep only a small resident subset at a time. Serving a request through a non-resident adapter requires paging its weights from storage, incurring measurable latency. This creates a two-timescale online control problem: on a slow timescale, the system selects which adapters remain resident in fast memory, while on a fast timescale it routes each request to an adapter whose context-dependent utility is unknown a priori. The two decisions are tightly coupled: the cache determines the cost of exploration, and the router determines which adapters receive informative feedback. We formulate this joint caching-and-routing problem as a two-timescale contextual bandit and propose POLAR (Paging and Online Learning for Adapter Routing). POLAR pairs a cache-aware LinUCB router with an epoch-based cache controller. We study two variants. A fixed-epoch version provides a robust baseline with worst-case regret guarantees under arbitrary contexts. An epoch-doubling version, POLAR+, adds forced exploration and improved cache optimization to achieve $\widetilde{\mathcal{O}}(d\sqrt{NT}+\sqrt{KT})$ sublinear regret under stochastic regularity and cacheability conditions, where $N$ is the adapter count, $K$ the cache size, $d$ the context dimension, and $T$ the horizon. The routing term matches the standard contextual-bandit rate up to logarithmic factors, showing that the memory hierarchy does not fundamentally slow routing learning. Experiments using 15 real LoRA adapters for Qwen2.5-7B together with measured GPU paging latencies show that adaptive cache control substantially outperforms non-adaptive baselines and exhibits scaling trends consistent with the theory.
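The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of pairing a cache-aware LinUCB router with an epoch-based cache refresh. It is not the authors' implementation. All names (`CacheAwareLinUCB`, `page_cost`, `epoch_len`, `simulate`) are hypothetical, the fixed paging penalty and the "keep the most frequently routed adapters" refresh rule are assumptions made for this sketch, and POLAR+'s forced exploration and epoch doubling are not modeled.

```python
import numpy as np


class CacheAwareLinUCB:
    """Illustrative sketch: per-adapter LinUCB with a paging penalty (assumed design)."""

    def __init__(self, n_adapters, dim, cache_size,
                 alpha=1.0, page_cost=0.2, epoch_len=50):
        self.n, self.d, self.k = n_adapters, dim, cache_size
        self.alpha = alpha            # width of the confidence bonus
        self.page_cost = page_cost    # assumed fixed penalty for a non-resident adapter
        self.epoch_len = epoch_len    # slow timescale: cache is refreshed once per epoch
        # Inverse ridge Gram matrix and reward-weighted context sum per adapter.
        self.A_inv = np.stack([np.eye(dim) for _ in range(n_adapters)])
        self.b = np.zeros((n_adapters, dim))
        self.resident = set(range(cache_size))  # arbitrary initial cache contents
        self.counts = np.zeros(n_adapters)      # routing counts in the current epoch
        self.t = 0

    def route(self, x):
        """Fast timescale: pick the adapter with the best penalized UCB score."""
        theta = np.einsum("aij,aj->ai", self.A_inv, self.b)
        mean = theta @ x
        width = np.sqrt(np.einsum("i,aij,j->a", x, self.A_inv, x))
        penalty = np.array([0.0 if a in self.resident else self.page_cost
                            for a in range(self.n)])
        return int(np.argmax(mean + self.alpha * width - penalty))

    def update(self, a, x, reward):
        """Rank-one (Sherman-Morrison) update for the adapter that served the request."""
        Ax = self.A_inv[a] @ x
        self.A_inv[a] -= np.outer(Ax, Ax) / (1.0 + x @ Ax)
        self.b[a] += reward * x
        self.counts[a] += 1
        self.t += 1
        if self.t % self.epoch_len == 0:
            self._refresh_cache()

    def _refresh_cache(self):
        """Slow timescale: keep the k adapters routed to most often this epoch."""
        self.resident = set(np.argsort(self.counts)[-self.k:].tolist())
        self.counts[:] = 0.0


def simulate(T=2000, seed=0):
    """Toy environment with synthetic linear utilities (not the paper's setup)."""
    rng = np.random.default_rng(seed)
    n, d, k = 6, 4, 2
    true_theta = rng.normal(size=(n, d))
    agent = CacheAwareLinUCB(n, d, k)
    misses = 0
    for _ in range(T):
        x = rng.normal(size=d)
        x /= np.linalg.norm(x)
        a = agent.route(x)
        misses += int(a not in agent.resident)  # request needed a paged-in adapter
        reward = true_theta[a] @ x + 0.1 * rng.normal()
        agent.update(a, x, reward)
    return agent, misses
```

Calling `simulate()` returns the trained router and the number of requests that hit a non-resident adapter. Subtracting the paging penalty inside the score is one simple way to make the router prefer resident adapters unless a paged-in one looks clearly better, which reflects the coupling the abstract describes between cache contents and exploration cost.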
