arXiv:2602.21204v3 Announce Type: replace-cross Abstract: Test-time training (TTT) with KV binding as a sequence modeling layer is commonly interpreted as a form of online meta-learning that memorizes a key-value mapping at test time. However, our analysis reveals multiple phenomena that contradict this memorization-based interpretation. Motivated by these findings, we revisit the formulation of TTT and show […]
Physics-Grounded Multi-Agent Architecture for Traceable, Risk-Aware Human-AI Decision Support in Manufacturing
arXiv:2605.04003v1 Announce Type: cross Abstract: High-precision CNC machining of free-form aerospace components requires bounded compensations informed by inspection, simulation, and process knowledge. Off-the-shelf large language model (LLM) assistants can generate text, but they do not reliably execute risk-constrained multi-step numerical workflows or provide auditable provenance for high-stakes decisions. We present multi-agent knowledge analysis (MAKA), a […]
Enumeration of Autocatalytic Subsystems in Large Chemical Reaction Networks
arXiv:2511.18883v2 Announce Type: replace Abstract: Autocatalysis is an important feature of metabolic networks, contributing crucially to the self-maintenance of organisms. Autocatalytic subsystems of chemical reaction networks (CRNs) are characterized in terms of algebraic conditions on submatrices of the stoichiometric matrix. Here, we derive sufficient conditions for subgraphs supporting irreducible autocatalytic systems in the bipartite König […]
ELAS: Efficient Pre-Training of Low-Rank Large Language Models via 2:4 Activation Sparsity
arXiv:2605.03667v1 Announce Type: cross Abstract: Large Language Models (LLMs) have achieved remarkable capabilities, but their immense computational demands during training remain a critical bottleneck for widespread adoption. Low-rank training has received attention in recent years due to its ability to significantly reduce training memory usage. Meanwhile, applying 2:4 structured sparsity to weights and activations to […]
Spatiotemporal Convolutions on EEG signal — A Representation Learning Perspective on Efficient and Explainable EEG Classification with Convolutional Neural Nets
arXiv:2605.03874v1 Announce Type: cross Abstract: Classification of EEG signals using shallow Convolutional Neural Networks (CNNs) is a prevalent and successful approach across a variety of fields. Most of these models use independent one-dimensional (1D) convolutional layers along the spatial and temporal dimensions, which are concatenated without a non-linear activation layer between. In this paper, we […]
S2O: Early Stopping for Sparse Attention via Online Permutation
arXiv:2602.22575v2 Announce Type: replace-cross Abstract: Attention scales quadratically with sequence length, fundamentally limiting long-context inference. Existing block-granularity sparsification can reduce latency, but coarse blocks impose an intrinsic sparsity ceiling, making further improvements difficult even with carefully engineered designs. We present S2O, which performs early stopping for sparse attention via online permutation. Inspired by virtual-to-physical address […]
Revisiting Graph-Tokenizing Large Language Models: A Systematic Evaluation of Graph Token Understanding
arXiv:2605.03514v1 Announce Type: cross Abstract: The remarkable success of large language models (LLMs) has motivated researchers to adapt them as universal predictors for various graph tasks. As a widely recognized paradigm, Graph-Tokenizing LLMs (GTokenLLMs) compress complex graph data into graph tokens and treat them as prefix tokens for querying LLMs, leading many to believe that […]
AI4EOSC: a Federated Cloud Platform for Artificial Intelligence in Scientific Research
arXiv:2512.16455v3 Announce Type: replace-cross Abstract: The rapid growth of Artificial Intelligence and Machine Learning in scientific research has highlighted a gap between industry-standard MLOps tools and platforms, and the unique requirements of modern and Open Science, particularly regarding the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. This paper presents AI4EOSC, a federated, open-source platform designed […]
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
arXiv:2507.07982v2 Announce Type: replace-cross Abstract: Videos inherently represent 2D projections of a dynamic 3D world. However, our analysis suggests that video diffusion models trained solely on raw video data often fail to capture meaningful geometric-aware structure in their learned representations. To bridge the gap between video diffusion models and the underlying 3D nature of the […]
Prediction horizon shapes representations in predictive learning
arXiv:2511.09290v2 Announce Type: replace-cross Abstract: Predictive learning has emerged as a central paradigm for training models across diverse data domains and is increasingly viewed as a foundation for modern artificial intelligence. A common intuition for this success is that accurate prediction requires models to capture the underlying dynamics of the environment, leading to the emergence […]
Personalized Worked Example Generation from Student Code Submissions Using Pattern-based Knowledge Components
arXiv:2604.24758v2 Announce Type: replace-cross Abstract: Adaptive programming practice often relies on fixed libraries of worked examples and practice problems, which require substantial authoring effort and may not correspond well to the logical errors and partial solutions students produce while writing code. As a result, students may receive learning content that does not directly address the […]
Disentangling Shared and Task-Specific Representations from Multi-Modal Clinical Data
arXiv:2605.03570v1 Announce Type: cross Abstract: Real-world clinical data is inherently multimodal, providing complementary evidence that mirrors the practical necessity of jointly assessing multiple related outcomes. Although multi-task learning can improve efficiency by sharing information across outcomes, existing approaches often fail to balance shared representation learning with outcome-specific modeling. Hard parameter sharing can trigger negative transfer […]