Short window attention enables long-term memorization

arXiv:2509.24552v3 Announce Type: replace-cross Abstract: Recent works show that hybrid architectures combining local sliding window attention layers and global attention layers outperform either of these architectures taken separately. However, the impact of the window length and the interplay between local layers and global layers remain under-studied. In this work, we first analyze the interaction between […]

Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning

arXiv:2604.21999v3 Announce Type: replace-cross Abstract: We study learned memory tokens as a computational scratchpad for a single-block Universal Transformer with Adaptive Computation Time (ACT) on Sudoku-Extreme, a combinatorial reasoning benchmark. Memory tokens are empirically necessary: no configuration without them reaches non-trivial performance. The optimal count has a sharp lower threshold (T=0 always fails, T=8 reliably […]

From Laboratory to Real-World Applications: Benchmarking Agentic Code Reasoning at the Repository Level

arXiv:2601.03731v3 Announce Type: replace-cross Abstract: As large language models (LLMs) evolve into autonomous agents, evaluating repository-level reasoning, the ability to maintain logical consistency across massive, real-world, interdependent file systems, has become critical. Current benchmarks typically fluctuate between isolated code snippets and black-box evaluations. We present RepoReason, a white-box diagnostic benchmark centered on abductive assertion verification. […]

RefusalGuard: Geometry-Preserving Fine-Tuning for Safety in LLMs

arXiv:2605.01913v1 Announce Type: cross Abstract: Fine-tuning safety-aligned language models for downstream tasks often leads to substantial degradation of refusal behavior, making models vulnerable to adversarial misuse. While prior work has shown that safety-relevant features are encoded in structured representations within the model’s activation space, how these representations change during fine-tuning and why alignment degrades remains […]

A Synthesizable RTL Implementation of Predictive Coding Networks

arXiv:2603.18066v2 Announce Type: replace-cross Abstract: Backpropagation has enabled modern deep learning but is difficult to realize as an online, fully distributed hardware learning system due to global error propagation, phase separation, and heavy reliance on centralized memory. Predictive coding offers an alternative in which inference and learning arise from local prediction-error dynamics between adjacent layers. […]

Short-wave signal versus indirect prey-taxis

arXiv:2604.20469v2 Announce Type: replace-cross Abstract: We address a short-wave asymptotic for one class of quasi-linear second-order PDE systems involving the cross-diffusion described by the so-called Patlak-Keller-Segel law. It is common to employ these equations for modeling the predator-prey community with the prey-taxis that means the interactions of two species of particles or cells or anything […]

MetaErr: Towards Predicting Error Patterns in Deep Neural Networks

arXiv:2604.23289v2 Announce Type: replace-cross Abstract: Due to the unprecedented success of deep learning, it has become an integral component in several multimedia computing applications in today’s world. Unfortunately, deep learning systems are not perfect and can fail, sometimes abruptly, without prior warning or explanation. While reducing the error rate of deep neural networks has been […]

Stochastic Sparse Attention for Memory-Bound Inference

arXiv:2605.01910v1 Announce Type: cross Abstract: Autoregressive decoding becomes bandwidth-limited at long contexts, as generating each token requires reading all $n_k$ key and value vectors from the KV cache. We present Stochastic Additive No-mulT Attention (SANTA), a method that sparsifies value-cache access by sampling $S \ll n_k$ indices from the post-softmax distribution and aggregates only those value […]

The Conversations Beneath the Code: Triadic Data for Long-Horizon Software Engineering Agents

arXiv:2605.02244v1 Announce Type: cross Abstract: Frontier software engineering agents have saturated short-horizon benchmarks while regressing on the work that constitutes senior engineering: long-horizon, multi-engineer, ambiguous-specification deliverables. This paper takes a position on what training data is needed to close the gap. The substrate for the next generation of SWE agents is neither larger GitHub scrapes […]

Behavior-Grounded Lane Representation Learning for Multi-Task Traffic Digital Twins

arXiv:2605.01901v1 Announce Type: cross Abstract: Traffic digital twins are powerful tools for advanced traffic management, and most systems are built on static geometric representations. However, these representations fail to capture the dynamic functional semantics required for behavior-aware reasoning, such as how a lane operates under complex traffic conditions. To address this gap, we introduce GeoLaneRep, […]

FEAT: Fashion Editing and Try-On from Any Design

arXiv:2605.02393v1 Announce Type: cross Abstract: Fashion design aims to express a designer’s creative intent and to depict how garments interact with the human body. Recent methods condition on multimodal inputs to support garment editing and virtual try-on. However, existing methods still (i) confine design to garment-related images, excluding creative design sources such as artwork, abstract […]

Improving LLM Code Reasoning via Semantic Equivalence Self-Play with Formal Verification

arXiv:2604.17010v2 Announce Type: replace-cross Abstract: We introduce a self-play framework for semantic equivalence in Haskell, utilizing formal verification to guide adversarial training between a generator and an evaluator. The framework leverages Liquid Haskell proofs for validating equivalence and execution-based counterexamples for inequivalence, organized via a difficulty-aware curriculum. To facilitate this, we release OpInstruct-HSx, a synthetic […]

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844