Boundary Mass and the Soft-to-Hard Limit in Mixture-of-Experts

arXiv:2605.02124v1 Announce Type: cross Abstract: Softmax-routed mixture-of-experts models approach hard routing as the temperature tends to zero, but this limit is singular near routing ties. This paper studies that singularity at the population level for squared-loss MoE regression. The central object is the boundary mass, namely the probability that the top two router scores are […]

LLM-Assisted Repository-Level Generation with Structured Spec-Driven Engineering

arXiv:2605.02455v1 Announce Type: cross Abstract: State-of-the-art Large Language Models (LLMs) excel in code generation at the function level. However, the output quality significantly declines when scaling to repository-level systems. Current workflows relying only on natural language prompts suffer from inherent ambiguity and a lack of verifiability. To address this, we propose structured spec-driven engineering (SSDE), […]

TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs

arXiv:2510.15545v4 Announce Type: replace-cross Abstract: Accelerating the inference of large language models (LLMs) has been a critical challenge in generative AI. Speculative decoding (SD) substantially improves LLM inference efficiency. However, its utility is limited by a fundamental constraint: the draft and target models must share the same vocabulary, thus limiting the herd of available draft […]

Can LLMs Reason About Attention? Towards Zero-Shot Analysis of Multimodal Classroom Behavior

arXiv:2604.03401v3 Announce Type: replace-cross Abstract: Understanding student engagement usually requires time-consuming manual observation or invasive recording that raises privacy concerns. We present a privacy-preserving pipeline that analyzes classroom videos to extract insights about student attention, without storing any identifiable footage. Our system runs on a single GPU, using OpenPose for skeletal extraction and Gaze-LLE for […]

LITcoder: A General-Purpose Library for Building and Comparing Encoding Models

arXiv:2509.09152v2 Announce Type: replace-cross Abstract: We introduce LITcoder, an open-source library for building and benchmarking neural encoding models. Designed as a flexible backend, LITcoder provides standardized tools for aligning continuous stimuli (e.g., text and speech) with brain data, transforming stimuli into representational features, mapping those features onto brain data, and evaluating the predictive performance of […]

Intersectional Sycophancy: How Perceived User Demographics Shape False Validation in Large Language Models

arXiv:2604.11609v2 Announce Type: replace Abstract: Large language models exhibit sycophantic tendencies, but whether this behavior varies systematically with perceived user demographics is underexplored. Inspired by intersectionality (overlapping identities produce compounded effects), we probe whether frontier models conditionally exhibit sycophancy. Across 768 multi-turn conversations spanning 128 personas (varying race, age, gender, confidence) and three domains (mathematics, […]

DIPLI: Deep Image Prior Lucky Imaging for Blind Astronomical Image Restoration

arXiv:2503.15984v3 Announce Type: replace-cross Abstract: Modern image restoration and super-resolution methods utilize deep learning due to its superior performance compared to traditional algorithms. However, deep learning typically requires large labeled training datasets, which are rarely available in astrophotography. Deep Image Prior (DIP) bypasses this constraint by performing unsupervised optimization on a single image without training […]

InstructMoLE: Instruction-Guided Mixture of Low-rank Experts for Multi-Conditional Image Generation

arXiv:2512.21788v3 Announce Type: replace-cross Abstract: Parameter-Efficient Fine-Tuning of Diffusion Transformers (DiTs) for diverse, multi-conditional tasks often suffers from task interference when using monolithic adapters like LoRA. The Mixture of Low-rank Experts (MoLE) architecture offers a modular solution, but its potential is usually limited by routing policies that operate at a token level. Such local routing […]

LLM-VA: Resolving the Jailbreak-Overrefusal Trade-off via Vector Alignment

arXiv:2601.19487v2 Announce Type: replace-cross Abstract: Safety-aligned LLMs suffer from two failure modes: jailbreak (answering harmful inputs) and over-refusal (declining benign queries). Existing vector steering methods adjust the magnitude of answer vectors, but this creates a fundamental trade-off — reducing jailbreak increases over-refusal and vice versa. We identify the root cause: LLMs encode the decision to […]

Why Self-Supervised Encoders Want to Be Normal

arXiv:2604.27743v2 Announce Type: replace-cross Abstract: Self-supervised learning has achieved remarkable empirical success in learning robust representations without explicit labels, most recently demonstrated within the framework of Joint-Embedding Predictive Architectures (JEPA). However, a fundamental question remains: what analytical principles drive these encoders toward specific distributional states? In this paper, we demonstrate that the preference for normal […]

EdgeLPR: On the Deep Neural Network trade-off between Precision and Performance in LiDAR Place Recognition

arXiv:2605.02275v1 Announce Type: cross Abstract: Place recognition is essential for long-term autonomous navigation, enabling loop closure and consistent mapping. Although deep learning has improved performance, deploying such models on resource-constrained platforms remains challenging. This work explores efficient LiDAR-based place recognition for EdgeAI by leveraging Bird’s Eye View representations to enable lightweight image-based networks. We benchmark […]

Dependency Parsing Across the Resource Spectrum: Evaluating Architectures on High and Low-Resource Languages

arXiv:2605.02608v1 Announce Type: cross Abstract: Transformer-based models achieve state-of-the-art dependency parsing for high-resource languages, yet their advantage over simpler architectures in low-resource settings remains poorly understood. We evaluate four parsers — the Biaffine LSTM, Stack-Pointer Network, AfroXLMR-large, and RemBERT — across ten typologically diverse languages, with a focus on low-resource African languages. We find that […]


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.