Reward Score Matching: Unifying Reward-based Fine-tuning for Flow and Diffusion Models

arXiv:2604.17415v2 Announce Type: replace-cross Abstract: Reward-based fine-tuning steers a pretrained diffusion or flow-based generative model toward higher-reward samples while remaining close to the pretrained model. Although existing methods are derived from different perspectives, we show that many can be written under a common framework, which we call reward score matching (RSM). Under this view, alignment […]

RSAT: Structured Attribution Makes Small Language Models Faithful Table Reasoners

arXiv:2605.00199v2 Announce Type: replace-cross Abstract: When a language model answers a table question, users have no way to verify which cells informed which reasoning steps. We introduce RSAT, a method that trains small language models (SLMs, 1-8B) to produce step-by-step reasoning with cell-level citations grounded in table evidence. Phase 1 (SFT) teaches a structured JSON […]

Instrumental Choices: Measuring the Propensity of LLM Agents to Pursue Instrumental Behaviors

arXiv:2605.06490v1 Announce Type: new Abstract: AI systems have become increasingly capable of dangerous behaviours in many domains. This raises the question: Do models sometimes choose to violate human instructions in order to perform behaviour that is more useful for certain goals? We introduce a benchmark for measuring model propensity for instrumental convergence (IC) behaviour in […]

From Token Lists to Graph Motifs: Weisfeiler-Lehman Analysis of Sparse Autoencoder Features

arXiv:2605.06494v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) have become central to mechanistic interpretability, decomposing transformer activations into monosemantic features. Yet existing analyses characterise features almost exclusively through top-activating token lists or decoder weight vectors, leaving the higher-order co-occurrence structure shared across features largely unexamined. We introduce a graph-structured representation in which each SAE feature […]

ReasonSTL: Bridging Natural Language and Signal Temporal Logic via Tool-Augmented Process-Rewarded Learning

arXiv:2605.06483v1 Announce Type: new Abstract: Signal Temporal Logic (STL) is an expressive formal language for specifying spatio-temporal requirements over real-valued, real-time signals. It has been widely used for the verification and synthesis of autonomous systems and cyber-physical systems. In practice, however, users often express their requirements in natural language rather than in structured STL formulas, […]

asRoBallet: Closing the Sim2Real Gap via Friction-Aware Reinforcement Learning for Underactuated Spherical Dynamics

arXiv:2604.24916v2 Announce Type: replace-cross Abstract: We introduce asRoBallet, to the best of our knowledge, the first end-to-end reinforcement learning (RL) locomotion policy deployed on a humanoid ballbot hardware platform. Historically, ballbots have served as a canonical benchmark for underactuated and nonholonomic control, which are characterized by a reality gap in complex friction models for wheel-ball-floor […]

Patch-Effect Graph Kernels for LLM Interpretability

arXiv:2605.06480v1 Announce Type: new Abstract: Mechanistic interpretability aims to reverse-engineer transformer computations by identifying causal circuits through activation patching. However, scaling these interventions across diverse prompts and task families produces high-dimensional, unstructured datasets that are difficult to compare systematically. We propose a framework that reframes mechanistic analysis as a graph machine-learning problem by representing activation-patching […]

Process Matters more than Output for Distinguishing Humans from Machines

arXiv:2605.06524v1 Announce Type: new Abstract: Reliable human-machine discrimination is becoming increasingly important as large language models and autonomous agents are deployed in online settings. Existing approaches evaluate whether a system can produce behavior or responses indistinguishable from those of a human, following the emphasis on outputs as a criterion for intelligence proposed by Alan Turing. […]

AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics

arXiv:2605.03652v2 Announce Type: replace-cross Abstract: Video generation models internalize physical realism as their prior. Anime deliberately violates physics: smears, impact frames, chibi shifts; and its thousands of coexisting artistic conventions yield no single “physics of anime” a model can absorb. Physics-biased models therefore flatten the artistry that defines the medium or collapse under its stylistic […]

A Note on TurboQuant and the Earlier DRIVE/EDEN Line of Work

arXiv:2604.18555v1 Announce Type: cross Abstract: This note clarifies the relationship between the recent TurboQuant work and the earlier DRIVE (NeurIPS 2021) and EDEN (ICML 2022) schemes. DRIVE is a 1-bit quantizer that EDEN extended to any $b>0$ bits per coordinate; we refer to them collectively as EDEN. First, TurboQuant$_textmse$ is a special case of EDEN […]

SpatialEpiBench: Benchmarking Spatial Information and Epidemic Priors in Forecasting

arXiv:2605.06530v1 Announce Type: new Abstract: Accurate epidemic forecasting is crucial for public health response, resource allocation, and outbreak intervention, but remains difficult with sparse, noisy, and highly non-stationary data. Because epidemics unfold across interacting regions, spatiotemporal methods are natural candidates for improving forecasts. Despite growing interest in spatial information, no standardized benchmark exists, and current […]

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844