arXiv:2505.21627v3 Announce Type: replace-cross Abstract: State-of-the-art large language models require specialized hardware and substantial energy to operate. As a consequence, cloud-based services that provide access to large language models have become very popular. In these services, the price users pay for an output provided by a model depends on the number of tokens the model […]
Graph Neural Networks are Heuristics
arXiv:2601.13465v2 Announce Type: replace Abstract: We demonstrate that a single training trajectory can transform a graph neural network into an unsupervised heuristic for combinatorial optimization. Focusing on the Travelling Salesman Problem, we show that encoding global structural constraints as an inductive bias enables a non-autoregressive model to generate solutions via direct forward passes, without search, […]
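The "solutions via direct forward passes, without search" idea can be illustrated with a small sketch. Assuming the trained GNN emits an edge-score heatmap (the `heatmap` array and the greedy decoder below are illustrative assumptions, not the paper's exact procedure), a tour can be read off in a single pass:

```python
import numpy as np

def decode_tour(heatmap: np.ndarray) -> list[int]:
    """Greedily decode a TSP tour from an edge-score heatmap.

    heatmap[i, j] is the model's score for edge (i, j); the tour is
    built by repeatedly following the highest-scoring unvisited city.
    """
    n = heatmap.shape[0]
    tour, visited = [0], {0}
    while len(tour) < n:
        scores = heatmap[tour[-1]].copy()
        scores[list(visited)] = -np.inf  # mask cities already in the tour
        nxt = int(np.argmax(scores))
        tour.append(nxt)
        visited.add(nxt)
    return tour
```

The point of the non-autoregressive setup is that the expensive model runs once; only this cheap O(n^2) decoding is sequential.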
Exploring Flow-Lenia Universes with a Curiosity-driven AI Scientist: Discovering Diverse Ecosystem Dynamics
arXiv:2505.15998v3 Announce Type: replace Abstract: We present a method for the automated discovery of system-level dynamics in Flow-Lenia, a continuous cellular automaton (CA) with mass conservation and parameter localization, using a curiosity-driven AI scientist. This method aims to uncover processes leading to self-organization of evolutionary and ecosystemic dynamics in CAs. We build on previous work which uses […]
MoE-ACT: Improving Surgical Imitation Learning Policies through Supervised Mixture-of-Experts
arXiv:2601.21971v1 Announce Type: cross Abstract: Imitation learning has achieved remarkable success in robotic manipulation, yet its application to surgical robotics remains challenging due to data scarcity, constrained workspaces, and the need for an exceptional level of safety and predictability. We present a supervised Mixture-of-Experts (MoE) architecture designed for phase-structured surgical manipulation tasks, which can be […]
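A minimal sketch of the supervised-gating idea for phase-structured tasks, under the assumption that a separately trained phase classifier supplies `gate_logits` (the names `moe_forward`, `experts`, and `gate_logits` are illustrative, not the paper's API):

```python
import numpy as np

def moe_forward(x, experts, gate_logits):
    """Mixture-of-Experts forward pass with supervised (phase) gating.

    gate_logits would come from a phase classifier trained on phase
    labels; the output is the gate-weighted mix of expert outputs.
    """
    w = np.exp(gate_logits - gate_logits.max())  # stable softmax
    w /= w.sum()
    return sum(wi * f(x) for wi, f in zip(w, experts))
```

Supervising the gate with phase labels keeps routing predictable, which matters for the safety constraints the abstract emphasizes.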
Latent Adversarial Regularization for Offline Preference Optimization
arXiv:2601.22083v1 Announce Type: cross Abstract: Learning from human feedback typically relies on preference optimization that constrains policy updates through token-level regularization. However, preference optimization for language models is particularly challenging because token-space similarity does not imply semantic or behavioral similarity. To address this challenge, we leverage latent-space regularization for language model preference optimization. We introduce […]
SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
arXiv:2511.06411v2 Announce Type: replace Abstract: The soft-thinking paradigm for Large Language Model (LLM) reasoning can outperform the conventional discrete-token Chain-of-Thought (CoT) reasoning in some scenarios, underscoring its research and application value. However, while the discrete-token CoT reasoning pattern can be reinforced through policy optimization algorithms such as group relative policy optimization (GRPO), extending the soft-thinking […]
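The "Gumbel-reparameterized" ingredient presumably builds on the standard Gumbel-softmax trick, which makes sampling a soft token differentiable. A generic sketch of that trick (not the paper's full policy-optimization algorithm):

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Sample a soft one-hot vector via the Gumbel-softmax trick.

    Adds Gumbel(0, 1) noise to the logits, then applies a temperature-
    scaled softmax; as tau -> 0 the sample approaches a hard one-hot
    token, while staying continuous so gradients can flow through it.
    """
    rng = rng or np.random.default_rng()
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    z = (logits + gumbel) / tau
    z = z - z.max()  # numerical stabilization
    e = np.exp(z)
    return e / e.sum()
```

The resulting probability vector can weight token embeddings into a single "soft" reasoning token, which is what distinguishes soft thinking from discrete-token CoT.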
SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling
arXiv:2410.12481v3 Announce Type: replace-cross Abstract: The past years have seen Large Language Models (LLMs) thrive not only as generative models but also as agents solving textual sequential decision-making tasks. When facing complex environments where their zero-shot abilities are insufficient, recent work showed online Reinforcement Learning (RL) could be used for the LLM agent to discover […]
SDSC: A Structure-Aware Metric for Semantic Signal Representation Learning
arXiv:2507.14516v3 Announce Type: replace-cross Abstract: We propose the Signal Dice Similarity Coefficient (SDSC), a structure-aware metric function for time series self-supervised representation learning. Most Self-Supervised Learning (SSL) methods for signals adopt distance-based objectives such as mean squared error (MSE), which are sensitive to amplitude, invariant to waveform polarity, and unbounded in scale. These properties […]
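One plausible Dice-style construction with the properties the abstract contrasts against MSE (bounded in [0, 1], polarity-aware) treats same-sign samples as the "intersection", weighted by the smaller magnitude. This is an illustrative sketch, not necessarily the paper's exact SDSC definition:

```python
import numpy as np

def signal_dice(x: np.ndarray, y: np.ndarray, eps: float = 1e-8) -> float:
    """Dice-style structural overlap between two signals (sketch).

    Samples where x and y share a sign contribute the smaller of the
    two magnitudes; polarity-flipped samples contribute nothing, so
    the score is bounded in [0, 1] and sensitive to waveform polarity.
    """
    same_sign = np.sign(x) == np.sign(y)
    inter = np.where(same_sign, np.minimum(np.abs(x), np.abs(y)), 0.0)
    return float(2.0 * inter.sum() / (np.abs(x).sum() + np.abs(y).sum() + eps))
```

Identical signals score ~1 and polarity-inverted signals score 0, unlike MSE, whose value also scales with amplitude.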
NOSA: Native and Offloadable Sparse Attention
arXiv:2510.13602v2 Announce Type: replace-cross Abstract: Decoding throughput improvements from larger inference batches are limited by GPU memory, which is largely consumed by the key-value (KV) cache. Prior training-free KV cache offloading alleviates this by keeping redundant context on the CPU and fetching only a sparse subset for attention, but it often degrades long-generation quality due […]
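The "fetching only a sparse subset for attention" step can be sketched as top-k key selection. In an offloading setup the `keys`/`values` arrays would live in CPU memory and only the selected rows would be transferred; the function below is an illustrative simplification, not NOSA's actual selection rule:

```python
import numpy as np

def sparse_attend(q, keys, values, k=4):
    """Attend over only the top-k keys by similarity (offloading sketch).

    Scores every cached key against the query, fetches the k best
    entries, and computes softmax attention on that subset alone.
    """
    scores = keys @ q                      # similarity of each cached key
    top = np.argsort(scores)[-k:]          # indices of the top-k entries
    s = scores[top] / np.sqrt(q.shape[0])  # scaled scores on the subset
    w = np.exp(s - s.max())                # stable softmax over the subset
    w /= w.sum()
    return w @ values[top]
```

With k equal to the cache length this reduces to dense attention; the quality risk the abstract mentions comes from k being much smaller over long generations.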
SKETCH: Semantic Key-Point Conditioning for Long-Horizon Vessel Trajectory Prediction
arXiv:2601.18537v2 Announce Type: replace-cross Abstract: Accurate long-horizon vessel trajectory prediction remains challenging due to compounded uncertainty from complex navigation behaviors and environmental factors. Existing methods often struggle to maintain global directional consistency, leading to drifting or implausible trajectories when extrapolated over long time horizons. To address this issue, we propose a semantic-key-point-conditioned trajectory modeling framework, […]
When “Better” Prompts Hurt: Evaluation-Driven Iteration for LLM Applications
arXiv:2601.22025v1 Announce Type: cross Abstract: Evaluating Large Language Model (LLM) applications differs from traditional software testing because outputs are stochastic, high-dimensional, and sensitive to prompt and model changes. We present an evaluation-driven workflow – Define, Test, Diagnose, Fix – that turns these challenges into a repeatable engineering loop. We introduce the Minimum Viable Evaluation Suite […]
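The Test step of such a loop can be made concrete with a tiny harness: because outputs are stochastic, each case pairs an input with a *predicate* on the output rather than an exact expected string. The `run_eval_suite` helper below is an illustrative assumption, not the paper's tooling:

```python
def run_eval_suite(app, cases):
    """Run a minimal evaluation suite over (input, check) pairs.

    app is the LLM application under test (any callable); each check
    is a predicate on the output, so stochastic outputs are judged by
    property rather than by exact string match.
    """
    failures = []
    for inp, check in cases:
        out = app(inp)
        if not check(out):
            failures.append((inp, out))
    return {"pass_rate": 1 - len(failures) / len(cases),
            "failures": failures}
```

The failures list feeds the Diagnose step, and re-running the suite after a prompt change quantifies whether a "better" prompt actually helped.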
PRISM: Distribution-free Adaptive Computation of Matrix Functions for Accelerating Neural Network Training
arXiv:2601.22137v1 Announce Type: cross Abstract: Matrix functions such as square root, inverse roots, and orthogonalization play a central role in preconditioned gradient methods for neural network training. This has motivated the development of iterative algorithms that avoid explicit eigendecompositions and rely primarily on matrix multiplications, making them well suited for modern GPU accelerators. We present […]
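A classic example of computing such a matrix function with matrix multiplications alone is the Newton-Schulz iteration for the orthogonal polar factor (used here as a generic illustration of the iteration family the abstract describes, not as PRISM's algorithm):

```python
import numpy as np

def newton_schulz_orth(G: np.ndarray, steps: int = 10) -> np.ndarray:
    """Approximate the orthogonal polar factor of G using only matmuls.

    Iterates X <- 1.5 X - 0.5 X X^T X, which converges to U V^T from
    the SVD G = U S V^T provided the singular values of the starting
    point lie in (0, sqrt(3)); dividing by the Frobenius norm puts
    them in (0, 1], guaranteeing that precondition.
    """
    X = G / np.linalg.norm(G)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X
```

Because the loop body is pure matrix multiplication, it maps well onto GPUs and avoids the explicit eigendecompositions the abstract mentions.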