One LR Doesn’t Fit All: Heavy-Tail Guided Layerwise Learning Rates for LLMs

May 28, 2026

arXiv:2605.22297v3 Announce Type: replace-cross
Abstract: Learning rate configuration is a fundamental aspect of modern deep learning. The prevailing practice of applying a uniform learning rate across all layers overlooks the structural heterogeneity of Transformers, potentially limiting their effectiveness as the backbone of Large Language Models (LLMs). In this paper, we introduce Layerwise Learning Rate (LLR), an adaptive scheme that assigns distinct learning rates to individual Transformer layers. Our method is grounded in Heavy-Tailed Self-Regularization (HT-SR) theory, which characterizes the empirical spectral density (ESD) of weight correlation matrices to quantify heavy-tailedness. Layers with weaker heavy-tailedness are assigned larger learning rates to accelerate training, while layers with stronger heavy-tailedness receive smaller learning rates. By tailoring learning rates in this manner, LLR promotes more balanced training across layers, leading to faster convergence and improved generalization. Extensive experiments across architectures ranging from LLaMA to GPT-nano, optimizers including AdamW and Muon, and model scales from 60M to 3B parameters with up to 100B training tokens demonstrate the effectiveness of LLR. LLR achieves up to 1.5x training speedup and consistently outperforms uniform-learning-rate baselines. In particular, it improves the average zero-shot accuracy of 1B models from 47.09% to 49.02%, and that of 3B models from 48.58% to 50.61%. A key advantage of LLR is its low tuning overhead: it can transfer nearly optimal learning-rate settings directly from the uniform baseline. Code is available at https://github.com/hed-ucas/Layer-wise-Learning-Rate.
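The abstract does not spell out how heavy-tailedness is measured or how it is turned into a per-layer learning rate, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes (1) the eigenvalues of each layer's weight correlation matrix have already been computed, (2) heavy-tailedness is summarized by a power-law exponent estimated with the Hill estimator, where a smaller exponent means a heavier tail, and (3) exponents are mapped linearly onto a learning-rate scale around the uniform baseline. The function names `hill_alpha` and `layerwise_lrs` and the parameters `k`, `min_scale`, and `max_scale` are hypothetical.

```python
import math


def hill_alpha(eigenvalues, k=None):
    """Hill estimate of the power-law exponent of an eigenvalue spectrum.

    Uses the k largest eigenvalues. A smaller return value indicates a
    heavier tail. Assumes all eigenvalues are strictly positive.
    """
    lam = sorted(eigenvalues)
    n = len(lam)
    if k is None:
        k = n // 2  # assumption: use the upper half of the spectrum
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(eigenvalues)")
    threshold = lam[n - k - 1]  # largest eigenvalue outside the tail
    if threshold <= 0:
        raise ValueError("eigenvalues must be strictly positive")
    log_sum = sum(math.log(x / threshold) for x in lam[n - k:])
    if log_sum == 0:
        raise ValueError("tail is degenerate: all tail eigenvalues equal the threshold")
    return 1.0 + k / log_sum


def layerwise_lrs(alphas, base_lr, min_scale=0.5, max_scale=1.5):
    """Map per-layer exponents to learning rates (assumed linear mapping).

    Layers with larger alpha (weaker heavy-tailedness) get larger learning
    rates; layers with smaller alpha (stronger heavy-tailedness) get smaller
    ones, matching the direction described in the abstract.
    """
    lo, hi = min(alphas.values()), max(alphas.values())
    if hi == lo:
        # All layers look alike: fall back to the uniform baseline.
        return {name: base_lr for name in alphas}
    span = max_scale - min_scale
    return {
        name: base_lr * (min_scale + (a - lo) / (hi - lo) * span)
        for name, a in alphas.items()
    }


# Toy spectra (made-up numbers, for illustration only).
spectra = {
    "layer0": [1.0, 2.0, 8.0, 64.0],  # widely spread: heavier tail
    "layer1": [1.0, 1.1, 1.2, 1.3],   # tightly clustered: lighter tail
}
alphas = {name: hill_alpha(vals, k=2) for name, vals in spectra.items()}
lrs = layerwise_lrs(alphas, base_lr=1e-3)
```

In a training loop, the resulting dictionary could be used to build one optimizer parameter group per layer, each with its own learning rate. How often the exponents are re-estimated during training is a further design choice that the abstract does not specify.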

