arXiv:2605.16331v1 Announce Type: new Abstract: Protein language models are increasingly used to guide experimental and clinical decisions, yet it is often unclear whether a confident prediction reflects recognition of biological evidence or retrieval of a statistical default. We examine this distinction for a near-universal biological rule, that proteins begin with methionine, by tracing the computational […]
Lost or Hidden? A Concept-Level Forgetting in Supervised Continual Learning
arXiv:2605.16374v1 Announce Type: cross Abstract: Continual learning studies how models can adapt to new tasks while retaining previously acquired knowledge. Although a broad spectrum of methods has been proposed to mitigate catastrophic forgetting, the field remains predominantly performance-driven, with limited insight into what forgetting actually corresponds to within the vision model’s representation space. Prior work […]
A Machine Learning Framework for EEG-Based Prediction of Treatment Efficacy in Chronic Neck Pain
arXiv:2605.16326v1 Announce Type: new Abstract: Chronic neck pain is a leading cause of disability worldwide, and current treatment selection remains largely trial and error. We present a machine learning framework that uses electroencephalography to predict treatment efficacy in patients with chronic neck pain, with the goal of supporting individualized therapy and reducing the burden on […]
Evaluating Cognitive Age Alignment in Interactive AI Agents
arXiv:2605.17894v1 Announce Type: new Abstract: While agentic AI and its core multimodal large language models (MLLMs) have demonstrated remarkable promise in language and visual reasoning across domains ranging from daily life to advanced scientific research, a profound gap remains between artificial and human intelligence. Despite the integration of powerful tools and advanced MLLMs, state-of-the-art AI […]
A Neurosymbolic Approach with Epistemic Deep Learning for Hierarchical Image Classification
arXiv:2605.16383v1 Announce Type: cross Abstract: Deep neural networks achieve high accuracy on image classification tasks. Yet they often produce overconfident predictions that fail to express epistemic uncertainty and frequently violate logical or structural constraints present in the data. These limitations are particularly pronounced in hierarchical classification, where predictions across fine and coarse levels must […]
Experiment-as-Code Labs: A Declarative Stack for AI-Driven Scientific Discovery
arXiv:2605.04375v2 Announce Type: replace-cross Abstract: To unleash the full potential of AI for Science, we must untether the agents from a purely digital environment. The agent’s ability to control and explore in real-world labs is essential because the physical lab remains foundational to scientific discovery. While some tasks can be performed on a computer (e.g., […]
Mutual Enhancement Between Global Tokens and Patch Tokens: From Theory to Practice
arXiv:2605.16384v1 Announce Type: cross Abstract: Accurate and effective discrete image tokenization is crucial for long image sequence processing. However, current methods rigidly compress all content at a fixed rate, ignoring the variable information density of images and leading to either redundancy or information loss. Inspired by information entropy, we propose TaTok, a Theoretically grounded adaptive […]
Deterministic Decomposition of Stochastic Generative Dynamics
arXiv:2605.08794v2 Announce Type: replace-cross Abstract: Modern generative models can be understood as probability transport from a simple base distribution to a target data distribution. Deterministic transport models offer tractable velocity-field parameterizations, whereas stochastic generative models capture richer density evolution through drift and diffusion. Yet when stochastic dynamics are described through deterministic velocity fields, the effects […]
Generating Pretraining Tokens from Organic Data for Data-Bound Scaling
arXiv:2605.17849v1 Announce Type: cross Abstract: LLM pretraining is shifting from a compute-bound to a data-bound regime, where available human (organic) text falls far short of scaling demands. However, reaching the data-bound regime does not mean the model has fully utilized its organic corpus. In this paper, we introduce SynPro, a synthetic data generation framework that […]
MULTITEXTEDIT: Benchmarking Cross-Lingual Degradation in Text-in-Image Editing
arXiv:2605.08163v2 Announce Type: replace-cross Abstract: Text-in-image editing has become a key capability for visual content creation, yet existing benchmarks remain overwhelmingly English-centric and often conflate visual plausibility with semantic correctness. We introduce MULTITEXTEDIT, a controlled benchmark of 3,600 instances spanning 12 typologically diverse languages, 5 visual domains, and 7 editing operations. Language variants of each […]
Modelling Customer Trajectories with Reinforcement Learning for Practical Retail Insights
arXiv:2605.18449v1 Announce Type: cross Abstract: Understanding customer movement within retail spaces is essential for optimizing store layouts. Real-world trajectory data can provide highly accurate insights, but collecting it is costly and often infeasible for many retailers. Heuristics such as Travelling Salesman Problem (TSP) and Probabilistic Nearest Neighbours (PNN) are commonly used as inexpensive approximations, but […]
Scalable Uncertainty Reasoning in Knowledge Graphs
arXiv:2605.16568v1 Announce Type: new Abstract: Knowledge Graphs are pivotal for semantic data integration. The real-world data they model is often inherently uncertain. Within knowledge graphs, uncertainty manifests at three distinct levels: imprecise attribute values, probabilistic triple existence, and incomplete schema knowledge. However, current Semantic Web standards lack native support for reasoning over such uncertainty, and […]