arXiv:2603.13651v1 Announce Type: cross Abstract: Bibliographic reference extraction and parsing are foundational for citation indexing, linking, and downstream scholarly knowledge-graph construction. However, most established evaluations focus on clean, English, end-of-document bibliographies, and therefore underrepresent the Social Sciences and Humanities (SSH), where citations are frequently multilingual, embedded in footnotes, abbreviated, and shaped by heterogeneous historical conventions. […]
A Gauge Theory of Superposition: Toward a Sheaf-Theoretic Atlas of Neural Representations
arXiv:2603.00824v2 Announce Type: replace-cross Abstract: We develop a discrete gauge-theoretic framework for superposition in large language models (LLMs) that replaces the single-global-dictionary premise with a sheaf-theoretic atlas of local semantic charts. Contexts are clustered into a stratified context complex; each chart carries a local feature space and a local information-geometric metric (Fisher/Gauss-Newton) identifying predictively consequential […]
ManiBench: A Benchmark for Testing Visual-Logic Drift and Syntactic Hallucinations in Manim Code Generation
arXiv:2603.13251v1 Announce Type: new Abstract: Traditional benchmarks like HumanEval and MBPP test logic and syntax effectively, but fail when code must produce dynamic, pedagogical visuals. We introduce ManiBench, a specialized benchmark evaluating LLM performance in generating Manim CE code, where temporal fidelity and version-aware API correctness are critical. ManiBench targets two key failure modes: Syntactic […]
Level Up: Defining and Exploiting Transitional Problems for Curriculum Learning
arXiv:2603.13761v1 Announce Type: cross Abstract: Curriculum learning, the ordering of training examples in a sequence to aid machine learning, takes inspiration from human learning but has not gained widespread acceptance. Static strategies for scoring item difficulty rely on indirect proxy scores of varying quality and produce curricula that are not specific to the learner at hand. Dynamic approaches base […]
The Big Send-off: Scalable and Performant Collectives for Deep Learning
arXiv:2504.18658v2 Announce Type: replace-cross Abstract: Collective communication is becoming increasingly important in data center and supercomputer workloads as the number of distributed AI-related jobs grows. However, existing libraries that provide collective support, such as NCCL, RCCL, and Cray-MPICH, exhibit several performance and scalability limitations on modern GPU supercomputers. To address these challenges, we introduce the […]
Prototypical Exemplar Condensation for Memory-efficient Online Continual Learning
arXiv:2603.13804v1 Announce Type: cross Abstract: Rehearsal-based continual learning (CL) mitigates catastrophic forgetting by maintaining a subset of samples from previous tasks for replay. Existing studies primarily focus on optimizing memory storage through coreset selection strategies. While these methods are effective, they typically require storing a substantial number of samples per class (SPC), often exceeding 20, […]
Multi-Axis Trust Modeling for Interpretable Account Hijacking Detection
arXiv:2603.13246v1 Announce Type: new Abstract: This paper proposes a Hadith-inspired multi-axis trust modeling framework, motivated by a structurally analogous problem in classical Hadith scholarship: assessing the trustworthiness of information sources using interpretable, multidimensional criteria rather than a single anomaly score. We translate five trust axes – long-term integrity (adalah), behavioral precision (dabt), contextual continuity (isnad), […]
SHAMISA: SHAped Modeling of Implicit Structural Associations for Self-supervised No-Reference Image Quality Assessment
arXiv:2603.13669v1 Announce Type: cross Abstract: No-Reference Image Quality Assessment (NR-IQA) aims to estimate perceptual quality without access to a reference image of pristine quality. Learning an NR-IQA model faces a fundamental bottleneck: its need for a large number of costly human perceptual labels. We propose SHAMISA, a non-contrastive self-supervised framework that learns from unlabeled distorted […]
Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference
arXiv:2603.13426v1 Announce Type: cross Abstract: Semantic routers in LLM inference gateways select tools in the critical request path, where every millisecond of added latency compounds across millions of requests. We propose Outcome-Aware Tool Selection (OATS), which interpolates tool embeddings toward the centroid of queries where they historically succeed — an offline process that adds no […]
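As an illustration of the interpolation step described in this abstract, the following is a minimal sketch, not the authors' implementation. The abstract is truncated, so the linear mixing rule, the function names, and the `alpha` weight are all assumptions made for this example. It moves a tool's embedding part of the way toward the centroid of the query embeddings on which that tool historically succeeded.

```python
from typing import List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty collection of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def interpolate_toward_success(
    tool_embedding: Sequence[float],
    success_query_embeddings: Sequence[Sequence[float]],
    alpha: float = 0.5,  # hypothetical mixing weight, not taken from the paper
) -> Vector:
    """Return (1 - alpha) * tool_embedding + alpha * centroid(success queries).

    Assumed form of the update: a plain linear interpolation computed offline.
    With no recorded successes the embedding is returned unchanged.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not success_query_embeddings:
        return list(tool_embedding)
    target = centroid(success_query_embeddings)
    return [(1.0 - alpha) * t + alpha * c for t, c in zip(tool_embedding, target)]


if __name__ == "__main__":
    tool = [1.0, 0.0]
    successes = [[0.0, 1.0], [0.0, 3.0]]
    print(interpolate_toward_success(tool, successes, alpha=0.5))
```

Because the adjusted embeddings can be computed ahead of time, a router built this way would still perform only its usual similarity lookup at request time, which is consistent with the abstract's description of the step as an offline process.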
Resolving Interference (RI): Disentangling Models for Improved Model Merging
arXiv:2603.13467v1 Announce Type: cross Abstract: Model merging has shown that multitask models can be created by directly combining the parameters of different models that are each specialized on tasks of interest. However, models trained independently on distinct tasks often exhibit interference that degrades the merged model’s performance. To solve this problem, we formally define the […]
MALicious INTent Dataset and Inoculating LLMs for Enhanced Disinformation Detection
arXiv:2603.14525v1 Announce Type: cross Abstract: The intentional creation and spread of disinformation pose a significant threat to public discourse. However, existing English datasets and research rarely address the intentionality behind the disinformation. This work presents MALINT, the first human-annotated English corpus developed in collaboration with expert fact-checkers to capture disinformation and its malicious intent. We […]
Geometry-Aware Semantic Reasoning for Training Free Video Anomaly Detection
arXiv:2603.13374v1 Announce Type: cross Abstract: Training-free video anomaly detection (VAD) has recently emerged as a scalable alternative to supervised approaches, yet existing methods largely rely on static prompting and geometry-agnostic feature fusion. As a result, anomaly inference is often reduced to shallow similarity matching over Euclidean embeddings, leading to unstable predictions and limited interpretability, especially […]