arXiv:2604.17415v2 Announce Type: replace-cross Abstract: Reward-based fine-tuning steers a pretrained diffusion or flow-based generative model toward higher-reward samples while remaining close to the pretrained model. Although existing methods are derived from different perspectives, we show that many can be written under a common framework, which we call reward score matching (RSM). Under this view, alignment […]
RSAT: Structured Attribution Makes Small Language Models Faithful Table Reasoners
arXiv:2605.00199v2 Announce Type: replace-cross Abstract: When a language model answers a table question, users have no way to verify which cells informed which reasoning steps. We introduce RSAT, a method that trains small language models (SLMs, 1-8B) to produce step-by-step reasoning with cell-level citations grounded in table evidence. Phase 1 (SFT) teaches a structured JSON […]
Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors
arXiv:2605.06490v1 Announce Type: new Abstract: AI systems have become increasingly capable of dangerous behaviours in many domains. This raises the question: Do models sometimes choose to violate human instructions in order to perform behaviour that is more useful for certain goals? We introduce a benchmark for measuring model propensity for instrumental convergence (IC) behaviour in […]
From Token Lists to Graph Motifs: Weisfeiler-Lehman Analysis of Sparse Autoencoder Features
arXiv:2605.06494v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) have become central to mechanistic interpretability, decomposing transformer activations into monosemantic features. Yet existing analyses characterise features almost exclusively through top-activating token lists or decoder weight vectors, leaving the higher-order co-occurrence structure shared across features largely unexamined. We introduce a graph-structured representation in which each SAE feature […]
ReasonSTL: Bridging Natural Language and Signal Temporal Logic via Tool-Augmented Process-Rewarded Learning
arXiv:2605.06483v1 Announce Type: new Abstract: Signal Temporal Logic (STL) is an expressive formal language for specifying spatio-temporal requirements over real-valued, real-time signals. It has been widely used for the verification and synthesis of autonomous systems and cyber-physical systems. In practice, however, users often express their requirements in natural language rather than in structured STL formulas, […]
asRoBallet: Closing the Sim2Real Gap via Friction-Aware Reinforcement Learning for Underactuated Spherical Dynamics
arXiv:2604.24916v2 Announce Type: replace-cross Abstract: We introduce asRoBallet, to the best of our knowledge, the first end-to-end reinforcement learning (RL) locomotion policy deployed on a humanoid ballbot hardware platform. Historically, ballbots have served as a canonical benchmark for underactuated and nonholonomic control, which are characterized by a reality gap in complex friction models for wheel-ball-floor […]
Patch-Effect Graph Kernels for LLM Interpretability
arXiv:2605.06480v1 Announce Type: new Abstract: Mechanistic interpretability aims to reverse-engineer transformer computations by identifying causal circuits through activation patching. However, scaling these interventions across diverse prompts and task families produces high-dimensional, unstructured datasets that are difficult to compare systematically. We propose a framework that reframes mechanistic analysis as a graph machine-learning problem by representing activation-patching […]
Process Matters more than Output for Distinguishing Humans from Machines
arXiv:2605.06524v1 Announce Type: new Abstract: Reliable human-machine discrimination is becoming increasingly important as large language models and autonomous agents are deployed in online settings. Existing approaches evaluate whether a system can produce behavior or responses indistinguishable from those of a human, following the emphasis on outputs as a criterion for intelligence proposed by Alan Turing. […]
Locality-aware Private Class Identification for Domain Adaptation with Extreme Label Shift
arXiv:2605.05567v1 Announce Type: new Abstract: Domain adaptation aims to transfer knowledge from a labeled source domain to an unlabeled target domain with different distributions. In real-world scenarios, the label spaces of the two domains often have an inclusion relationship, where some classes exist only in one domain but not the other. These non-overlapping classes are […]
AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics
arXiv:2605.03652v2 Announce Type: replace-cross Abstract: Video generation models internalize physical realism as their prior. Anime deliberately violates physics (smears, impact frames, chibi shifts), and its thousands of coexisting artistic conventions yield no single “physics of anime” a model can absorb. Physics-biased models therefore flatten the artistry that defines the medium or collapse under its stylistic […]
A Note on TurboQuant and the Earlier DRIVE/EDEN Line of Work
arXiv:2604.18555v1 Announce Type: cross Abstract: This note clarifies the relationship between the recent TurboQuant work and the earlier DRIVE (NeurIPS 2021) and EDEN (ICML 2022) schemes. DRIVE is a 1-bit quantizer that EDEN extended to any $b>0$ bits per coordinate; we refer to them collectively as EDEN. First, TurboQuant$_{\text{mse}}$ is a special case of EDEN […]
SpatialEpiBench: Benchmarking Spatial Information and Epidemic Priors in Forecasting
arXiv:2605.06530v1 Announce Type: new Abstract: Accurate epidemic forecasting is crucial for public health response, resource allocation, and outbreak intervention, but remains difficult with sparse, noisy, and highly non-stationary data. Because epidemics unfold across interacting regions, spatiotemporal methods are natural candidates for improving forecasts. Despite growing interest in spatial information, no standardized benchmark exists, and current […]