arXiv:2605.19141v1 Announce Type: cross Abstract: Large language models are increasingly deployed as automated judges to evaluate the strength of arguments. As this role expands, their legitimacy depends on consistency, transparency, and the ability to separate argumentative structure from rhetorical appeal. However, we show that holistic judging – a common LLM-as-a-Judge practice where a model provides […]
The Extremum Stack is a Minimal Sufficient Statistic for Rate-Independent Functionals: A Kolmogorov Complexity Characterisation
arXiv:2605.18885v1 Announce Type: cross Abstract: We prove that the extremum stack of a discrete sequence is a minimal sufficient statistic for the class of all computable, causal, rate-independent functionals, in the sense of Kolmogorov complexity. Specifically, we establish K(Pi_n) - O(1) <= K_R(u_0:n) <= K(Pi_n) + O(1), where K_R(u_0:n) is the length of the shortest […]
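The abstract does not spell out how the extremum stack is built. Below is a minimal sketch that assumes it works like the wiping-out memory of Preisach-type hysteresis models. Repeated values and monotone runs collapse to a single turning point, which is what makes the result rate-independent. Once the input moves back past an older extremum, the inner loop it enclosed is erased. The function name `extremum_stack` and the handling of the starting value are illustrative assumptions, not the paper's definition.

```python
def extremum_stack(seq):
    """Hypothetical reduction of a sequence to its alternating dominant extrema.

    Assumption: Preisach-style wiping-out rule; not taken from the paper.
    """
    stack = []
    for u in seq:
        if stack and u == stack[-1]:
            continue  # repeated value carries no new information (rate independence)
        if len(stack) >= 2 and (u - stack[-1]) * (stack[-1] - stack[-2]) > 0:
            stack[-1] = u  # same direction as the last segment: extend the monotone run
        else:
            stack.append(u)  # first move or direction reversal: new turning point
        # Wiping out: if the newest segment is at least as long as the one before it,
        # the input has gone back past an older extremum, so that inner loop is forgotten.
        while len(stack) >= 3 and abs(stack[-1] - stack[-2]) >= abs(stack[-2] - stack[-3]):
            if len(stack) == 3:
                del stack[0]  # the starting value has been passed: drop it alone
            else:
                del stack[-3:-1]  # drop the enclosed min/max pair
    return stack
```

Resampling a path more densely without changing its turning points leaves the result unchanged. For example, `[0, 0, 1, 2, 3, 4, 5, 5, 3, 2, 4, 1]` and `[0, 5, 2, 4, 1]` both reduce to `[0, 5, 1]`.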
ESLD (External Surrogate Latent Defense): A Latent-Space Architecture for Faster, Stronger Prompt-Injection Defense
arXiv:2605.18918v1 Announce Type: cross Abstract: Modern AI assistants are agentic. To answer a single user request, the underlying language model pulls in information from many sources, such as web searches, retrieved documents, tool outputs, and user follow-ups, and reasons over them across several steps. Any of these inputs can carry malicious content. This opens the […]
Precision Tracked Transformer via Kalman Filtering, Kriging and Process Noise
arXiv:2605.18832v1 Announce Type: cross Abstract: The Transformer is the foundational building block of modern AI, yet offers no principled handling of uncertainty, which is prevalent in real applications: cold-start tokens with sparse histories in sequential recommendation, heterogeneous signal quality in language models, and attention sinks induced by unconstrained softmax. Every token is treated with uniform […]
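The abstract is cut off before the architecture details. The title refers to a scalar Kalman predict/update step, sketched minimally below under the assumption of a random-walk state. Process noise inflates the variance before each observation, and the update weighs prediction and observation by their precisions (inverse variances). This is the textbook filter, not the paper's Transformer layer. The names `kalman_step`, `obs_var` and `process_var` are hypothetical.

```python
def kalman_step(mean, var, obs, obs_var, process_var):
    """One scalar Kalman predict/update step for a random-walk state (textbook form)."""
    # Predict: the state may have drifted, so its uncertainty grows by the process noise.
    var = var + process_var
    # Update: the gain is the share of total variance owed to the prediction,
    # so the less certain side (lower precision) gets less weight.
    gain = var / (var + obs_var)
    mean = mean + gain * (obs - mean)
    var = (1.0 - gain) * var  # posterior precision = prior precision + observation precision
    return mean, var
```

With `process_var = 0`, the posterior precision is exactly the sum of the prior and observation precisions. A positive `process_var` keeps the variance from collapsing, so a state with little accumulated evidence (large variance) moves further toward a new observation than a well-estimated one.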
First-Passage Prediction of Grokking Delay: A Calibrated Law under AdamW with Causal Validation
arXiv:2605.18845v1 Announce Type: cross Abstract: We give the first quantitative prediction of grokking delay under AdamW. Treating the delay as a first-passage time, we derive a closed-form law T_grok - T_mem = (1 / (2 kappa_LL eta lambda)) log(V_mem / V_star), where V_t = ||theta_t||^2 is the squared parameter norm, V_star is an architecture-dependent threshold, […]
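Read as 1 / (2 kappa_LL eta lambda) times a log ratio, the law fits a simple picture. AdamW's decoupled weight decay scales the weights by (1 - eta lambda) each step, so V_t = ||theta_t||^2 shrinks by roughly exp(-2 eta lambda) per step. Presumably, kappa_LL acts as a correction to that rate. The minimal sketch below evaluates the law and, as a sanity check, counts steps under pure weight decay. That toy setting ignores gradient updates and assumes kappa_LL = 1. Both function names are hypothetical.

```python
import math


def predicted_grokking_delay(kappa_ll, eta, lam, v_mem, v_star):
    """Evaluate T_grok - T_mem = log(V_mem / V_star) / (2 * kappa_LL * eta * lambda)."""
    if min(kappa_ll, eta, lam, v_mem, v_star) <= 0:
        raise ValueError("all inputs must be positive")
    return math.log(v_mem / v_star) / (2.0 * kappa_ll * eta * lam)


def steps_under_pure_decay(v_mem, v_star, eta, lam):
    """Toy check: count weight-decay-only AdamW steps until V drops to V_star."""
    # theta <- (1 - eta * lam) * theta, so V = ||theta||^2 scales by the square.
    factor = (1.0 - eta * lam) ** 2
    v, steps = v_mem, 0
    while v > v_star:
        v *= factor
        steps += 1
    return steps
```

For eta = 1e-3, lambda = 0.1 and V_mem / V_star = 100, the law gives about 2.3e4 steps. The pure-decay count lands within a fraction of a percent of that figure.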
MO-CAPO: Multi-Objective Cost-Aware Prompt Optimization
arXiv:2605.18869v1 Announce Type: cross Abstract: Large language models (LLMs) achieve strong performance across a wide range of tasks but are highly sensitive to prompt design, motivating the need for automatic prompt optimization. Existing methods predominantly focus on performance alone, ignoring competing objectives such as inference cost or latency. At the same time, existing work on […]
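The abstract stops before describing the search procedure. Whatever the optimizer, trading performance against cost usually comes down to keeping the Pareto front. That front holds the candidates that no other candidate beats on one objective without losing on the other. The minimal sketch below assumes one performance score (higher is better) and one cost figure, such as tokens per query (lower is better). The `Candidate` type and the numbers in the usage note are made up for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    prompt: str
    score: float  # task performance, higher is better
    cost: float   # e.g. average tokens per query, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b: no worse on both objectives and strictly better on at least one."""
    no_worse = a.score >= b.score and a.cost <= b.cost
    strictly_better = a.score > b.score or a.cost < b.cost
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep candidates not dominated by any other (O(n^2), fine for small prompt pools)."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]
```

Take four candidates with (score, cost) values A = (0.80, 120), B = (0.78, 60), C = (0.75, 90) and D = (0.82, 300). C is dropped because B scores higher at lower cost, which leaves A, B and D on the front.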
Dynamic Model Merging Made Slim
arXiv:2605.18904v1 Announce Type: cross Abstract: Model merging enables the reuse of fine-tuned models without joint training or access to original data. Dynamic merging further improves flexibility by selectively activating task-relevant parameters and efficiently composing experts across multiple tasks. However, existing dynamic methods either maintain a full shared model with tiny experts or allocate excessive capacity […]
Harnessing Self-Supervised Features for Art Classification
arXiv:2605.18974v1 Announce Type: cross Abstract: Classifying artworks presents a significant challenge due to the complex interplay of fine-grained details and abstract features that condition the style or genre of an artwork. This paper presents a systematic investigation of the effectiveness of supervised and self-supervised backbones as feature extractors for both artwork classification and retrieval, with […]
Counterfactual Likelihood Tests for Indirect Influence in Private Reasoning Channels
arXiv:2605.19092v1 Announce Type: cross Abstract: Reasoning systems increasingly separate intermediate computation into private and public channels, creating evaluation cases that look similar in transcripts: independent co-derivation, direct access to private content, and indirect influence through public communication. This paper presents a counterfactual likelihood test for measuring influence between private reasoning channels. The method replaces an […]
Quantized Machine Learning Models for Medical Imaging in Low-Resource Healthcare Settings
arXiv:2605.19207v1 Announce Type: cross Abstract: Deep learning models have shown strong performance in medical image analysis, but deploying them in low-resource clinical environments remains difficult due to computational, memory, and power constraints. This paper presents a multi-strategy compression framework for brain tumor classification from MRI, encompassing quantization-aware training, knowledge distillation from a DenseNet-101 teacher to […]
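Quantization-aware training simulates, during training, the rounding a model will face at inference. The minimal sketch below shows that rounding under the assumption of symmetric per-tensor int8 quantization. This is one of several common schemes, and the abstract does not say which one the framework uses. Weights map to integers in [-127, 127] with a single float scale. Relative to float32, this cuts storage from 4 bytes to 1 byte per weight, at the cost of a rounding error of at most half a scale step. The function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: each w is approximated by q * scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # largest |w| maps to 127
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


def fake_quantize(weights):
    """Quantize then dequantize: in QAT the forward pass sees these rounded values."""
    q, scale = quantize_int8(weights)
    return dequantize_int8(q, scale)
```

In QAT, the backward pass typically treats the rounding as the identity (the straight-through estimator), so gradients still flow to the full-precision weights.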
INSIGHTS: Demonstration-Based Summaries of Time Series Predictors
arXiv:2605.18849v1 Announce Type: cross Abstract: Explainability methods have progressed rapidly, but global explanations for time-series models remain underdeveloped, with most approaches focusing on local, instance-level attributions. We introduce INSIGHTS, a model-agnostic, user-centric approach for providing global explanations of time series models. Our approach prioritizes simplicity, efficiency, and transparency in its design, ensuring that stakeholders can […]
From Sparsity to Simplicity: Enabling Simpler Sequential Replacements via Sparse Attention Distillation
arXiv:2605.18865v1 Announce Type: cross Abstract: Self-attention serves as the core foundation of large-scale transformer pretraining, but its quadratic token interaction cost makes inference expensive. Replacing attention with simpler sequential modules is appealing, yet naive substitution is often lossy, especially at larger scales. This paper revisits attention replacement through the lens of sparsity. Based on the […]