arXiv:2605.00435v1 Announce Type: cross Abstract: Mode collapse is a persistent challenge in generative modeling and appears in autoregressive text generation as behaviors ranging from explicit looping to gradual loss of diversity and premature trajectory convergence. We take a dynamical-systems view and reinterpret mode collapse as reduced state-space accessibility caused by *geometric collapse*: during generation, the […]
Adaptation of AI-accelerated CFD Simulations to the IPU platform
arXiv:2605.00462v1 Announce Type: cross Abstract: Intelligence Processing Units (IPU) have proven useful for many AI applications. In this paper, we evaluate them within the emerging field of AI for simulation, where traditional numerical simulations are supported by artificial intelligence approaches. We focus specifically on a program for training machine learning models supporting a computational fluid […]
Scalable Context-Aware Graph Attention for Unsupervised Anomaly Detection in Large-Scale Mobile Networks
arXiv:2605.00482v1 Announce Type: cross Abstract: Mobile network operators must monitor thousands of heterogeneous network elements across the radio access network and the packet core, each exposing high-dimensional KPI time series. The scale and cost of incident labelling make supervised approaches impractical, motivating unsupervised anomaly detection robust to context shifts and nonstationarity. We propose C-MTAD-GAT (Context-aware […]
Augmented Lagrangian Multiplier Network for State-wise Safety in Reinforcement Learning
arXiv:2605.00667v1 Announce Type: cross Abstract: Safety is a primary challenge in real-world reinforcement learning (RL). Formulating safety requirements as state-wise constraints has become a prominent paradigm. Handling state-wise constraints with the Lagrangian method requires a distinct multiplier for every state, necessitating neural networks to approximate them as a multiplier network. However, applying standard dual gradient […]
Controllable Logical Hypothesis Generation for Abductive Reasoning in Knowledge Graphs
arXiv:2505.20948v3 Announce Type: replace Abstract: Abductive reasoning in knowledge graphs aims to generate plausible logical hypotheses from observed entities, with broad applications in areas such as clinical diagnosis and scientific discovery. However, due to a lack of controllability, a single observation may yield numerous plausible but redundant or irrelevant hypotheses on large-scale knowledge graphs. To […]
Theory Under Construction: Orchestrating Language Models for Research Software Where the Specification Evolves
arXiv:2604.27209v2 Announce Type: replace-cross Abstract: Large language models can now generate substantial code and draft research text, but research-software projects require more than either artifact alone. The mathematical thesis, executable system, benchmark surface, and public claims must mature together, yet often drift apart. We identify two LM-specific failure modes: hallucination accumulation, in which claims exceed […]
Are You the A-hole? A Fair, Multi-Perspective Ethical Reasoning Framework
arXiv:2605.00270v1 Announce Type: cross Abstract: Standard methods for aggregating natural language judgments, such as majority voting, often fail to produce logically consistent results when applied to high-conflict domains, treating differing opinions as noise. We propose a neuro-symbolic aggregation framework that formalizes conflict resolution through Weighted Maximum Satisfiability (MaxSAT). Our pipeline utilizes a language model to […]
Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows
arXiv:2604.28139v2 Announce Type: replace-cross Abstract: LLM agents are expected to complete end-to-end units of work across software tools, business services, and local workspaces. Yet many agent benchmarks freeze a curated task set at release time and grade mainly the final response, making it difficult to evaluate agents against evolving workflow demand or verify whether a […]
Neuronal electricality founded in murburn-thermodynamic principles: 2. Comparisons, evidenced explanations, and predictions
arXiv:2605.00014v1 Announce Type: new Abstract: The analyses presented herein demonstrate that neuronal electrical activity can be consistently interpreted as a manifestation of murburn redox-mediated electronic dynamics rather than as a process fundamentally driven by transmembrane ionic flux. By integrating comparison with established models, quantitative predictions, and diverse experimental observations, the murburn framework emerges as a […]
NorBERTo: A ModernBERT Model Trained for Portuguese with 331 Billion Tokens Corpus
arXiv:2605.00086v1 Announce Type: cross Abstract: High-quality corpora are essential for advancing Natural Language Processing (NLP) in Portuguese. Building on previous encoder-only models such as BERTimbau and Albertina PT-BR, we introduce NorBERTo, a modern encoder based on the ModernBERT architecture, featuring long-context support and efficient attention mechanisms. NorBERTo is trained on Aurora-PT, a newly curated Brazilian […]
Removing Sandbagging in LLMs by Training with Weak Supervision
arXiv:2604.22082v2 Announce Type: replace-cross Abstract: As AI systems begin to automate complex tasks, supervision increasingly relies on weaker models or limited human oversight that cannot fully verify output quality. A model more capable than its supervisors could exploit this gap through sandbagging, producing work that appears acceptable but falls short of its true abilities. Can […]
Learning Rate Transfer in Normalized Transformers
arXiv:2604.27077v2 Announce Type: replace-cross Abstract: The Normalized Transformer, or nGPT (arXiv:2410.01131), achieves impressive training speedups and does not require weight decay or learning rate warmup. However, despite having hyperparameters that explicitly scale with model size, we observe that nGPT does not exhibit learning rate transfer across model dimension and token horizon. To rectify this, we […]