Sparse, self-organizing ensembles of local kernels detect rare statistical anomalies

November 6, 2025

arXiv:2511.03095v1 Announce Type: cross
Abstract: Modern artificial intelligence has revolutionized our ability to extract rich and versatile data representations across scientific disciplines. Yet the statistical properties of these representations remain poorly controlled, causing misspecified anomaly detection (AD) methods to falter. Weak or rare signals can remain hidden within the apparent regularity of normal data, creating a gap in our ability to detect and interpret anomalies. We examine this gap and identify a set of structural desiderata for detection methods operating under minimal prior information: sparsity, to enforce parsimony; locality, to preserve geometric sensitivity; and competition, to promote efficient allocation of model capacity. These principles define a class of self-organizing local kernels that adaptively partition the representation space around regions of statistical imbalance. As an instantiation of these principles, we introduce SparKer, a sparse ensemble of Gaussian kernels trained within a semi-supervised Neyman–Pearson framework to locally model the likelihood ratio between a sample that may contain anomalies and a nominal, anomaly-free reference. We provide theoretical insights into the mechanisms that drive detection and self-organization in the proposed model, and demonstrate the effectiveness of this approach on realistic high-dimensional problems in scientific discovery, open-world novelty detection, intrusion detection, and generative-model validation. Our applications span both the natural and computer sciences. We show that ensembles containing only a handful of kernels can identify statistically significant anomalous locations within representation spaces of thousands of dimensions, underscoring the interpretability, efficiency, and scalability of the proposed approach.
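The abstract describes SparKer as a small ensemble of Gaussian kernels fitted in a Neyman–Pearson setup so that it models the local log-likelihood ratio between a possibly anomalous sample and an anomaly-free reference. The following is a minimal, hypothetical 1-D sketch of that idea, not the authors' implementation. It assumes the extended-likelihood loss `L = w * sum_R (exp(f) - 1) - sum_D f` (with `w = N_D / N_R`) used in the New Physics Learning Machine (NPLM) line of work, whose minimizer approximates `f(x) ≈ log(n_D(x) / n_R(x))`, and takes `t = -2 * min L` as the test statistic. The function names, kernel count, width, learning rate, and toy data are illustrative assumptions, and the competition mechanism mentioned in the abstract is not modelled.

```python
import math
import random


def kernel(x, mu, width):
    """Unnormalised Gaussian bump centred at mu (the locality assumption)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * width ** 2))


def log_ratio(x, amps, centers, width):
    """Hypothetical model f(x) = sum_k a_k * phi_k(x) of log[n_D(x) / n_R(x)]."""
    return sum(a * kernel(x, m, width) for a, m in zip(amps, centers))


def nplm_loss(ref, data, amps, centers, width):
    """Assumed NPLM-style loss: L = w * sum_R (e^f - 1) - sum_D f, w = N_D / N_R."""
    w = len(data) / len(ref)
    term_ref = w * sum(math.exp(log_ratio(x, amps, centers, width)) - 1.0 for x in ref)
    term_data = sum(log_ratio(x, amps, centers, width) for x in data)
    return term_ref - term_data


def fit(ref, data, centers, width=0.3, lr=1.0, steps=300):
    """Plain gradient descent on amplitudes and centres (a sketch, not SparKer's optimizer)."""
    amps = [0.0] * len(centers)      # f = 0 at the start, so L = 0
    centers = list(centers)
    w = len(data) / len(ref)         # normalise the reference to the data size
    inv_s2 = 1.0 / width ** 2
    n = len(data)
    for _ in range(steps):
        g_amp = [0.0] * len(centers)
        g_mu = [0.0] * len(centers)
        for x in ref:                # gradient of  w * sum_R e^f
            phis = [kernel(x, m, width) for m in centers]
            ef = math.exp(sum(a * p for a, p in zip(amps, phis)))
            for k, (m, p) in enumerate(zip(centers, phis)):
                g_amp[k] += w * ef * p
                g_mu[k] += w * ef * amps[k] * p * (x - m) * inv_s2
        for x in data:               # gradient of  -sum_D f
            for k, m in enumerate(centers):
                p = kernel(x, m, width)
                g_amp[k] -= p
                g_mu[k] -= amps[k] * p * (x - m) * inv_s2
        for k in range(len(centers)):  # per-event scaling keeps the step size stable
            amps[k] -= lr * g_amp[k] / n
            centers[k] -= lr * g_mu[k] / n
    return amps, centers


# Toy data: the reference is N(0, 1); the sample hides a narrow bump at x = 2.5.
rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(800)]
data = [rng.gauss(0.0, 1.0) for _ in range(375)] + [rng.gauss(2.5, 0.1) for _ in range(25)]

amps, centers = fit(ref, data, centers=[-2.0, 0.0, 2.0])
t = -2.0 * nplm_loss(ref, data, amps, centers, 0.3)
best = max(range(len(amps)), key=lambda k: amps[k])
print(f"t = {t:.1f}, strongest kernel at x = {centers[best]:.2f}, amplitude {amps[best]:.2f}")
```

In this sketch the Gaussian width stands in for locality and the small kernel count for sparsity. With a single injected bump, the kernel seeded nearest to it is expected to drift toward x ≈ 2.5 and take the largest positive amplitude, while the other kernels stay close to zero.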

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.