arXiv:2512.13402v2 Announce Type: replace-cross Abstract: Intraoperative navigation in spine surgery demands millimeter-level accuracy. Currently, this is achieved through radiation-intensive intraoperative imaging and bone-anchored markers that are invasive and disrupt surgical workflow. Markerless RGB-D registration methods offer a promising alternative. However, existing approaches rely on weak segmentation labels to isolate relevant anatomical structures, potentially propagating errors […]
Anatomy of Agentic Memory: Taxonomy and Empirical Analysis of Evaluation and System Limitations
arXiv:2602.19320v2 Announce Type: replace-cross Abstract: Agentic memory systems enable large language model (LLM) agents to maintain state across long interactions, supporting long-horizon reasoning and personalization beyond fixed context windows. Despite rapid architectural development, the empirical foundations of these systems remain fragile: existing benchmarks are often underscaled, evaluation metrics are misaligned with semantic utility, performance varies […]
Why Aggregate Accuracy is Inadequate for Evaluating Fairness in Law Enforcement Facial Recognition Systems
arXiv:2603.28675v2 Announce Type: replace-cross Abstract: Facial recognition systems are increasingly deployed in law enforcement and security contexts, where algorithmic decisions can carry significant societal consequences. Despite high reported accuracy, growing evidence demonstrates that such systems often exhibit uneven performance across demographic groups, leading to disproportionate error rates and potential harm. This paper argues that aggregate […]
TIP: Token Importance in On-Policy Distillation
arXiv:2604.14084v3 Announce Type: replace-cross Abstract: On-policy knowledge distillation (OPD) trains a student on its own rollouts under token-level supervision from a teacher. Not all token positions matter equally, but existing views of token importance are incomplete. We ask a direct question: which tokens carry the most useful learning signal in OPD? Our answer is that […]
A Theory of Time-Sensitive Language Generation: Sparse Hallucination Beats Mode Collapse
arXiv:2605.11302v2 Announce Type: replace-cross Abstract: We study language generation in the limit under a global preference ordering on strings, as introduced by Kleinberg and Wei. As is done in previous work, we aim for breadth, but impose an additional requirement of timeliness: higher-ranked strings should be generated earlier. A string is then only credited if […]
Exact Linear Attention
arXiv:2605.18848v2 Announce Type: replace-cross Abstract: This paper introduces Exact Linear Attention (ELA), a mechanism that achieves linear computational complexity for Transformer attention by exploiting the exact decomposition property of kernel functions, thereby eliminating approximation error. We identify and address two key limitations of prior linear attention — gradient explosion and token attention dilution — by […]
SURGE: An Event-Centric Social Media Sentiment Time Series Benchmark with Interaction Structure
arXiv:2605.21198v1 Announce Type: cross Abstract: Public events on social media generate large volumes of discussion whose collective dynamics carry direct value for opinion forecasting and crisis response. Capturing how these dynamics evolve across an event’s lifecycle requires organizing fragmented posts into event-level time series. Existing datasets cover only a small number of events within a […]
Learning Structural Latent Points for Efficient Visual Representations in Robotic Manipulation
arXiv:2605.21258v1 Announce Type: cross Abstract: Current 3D-aware pretraining methods for embodied perception and manipulation are largely built on differentiable rendering frameworks, producing either fully implicit neural fields or fully explicit geometric primitives. Implicit representations, while expressive, lack explicit structural cues, whereas explicit ones preserve geometry but suffer from resolution limits and weak generalization. To address […]
Deformba: Vision State Space Model with Adaptive State Fusion
arXiv:2605.21308v1 Announce Type: cross Abstract: State Space Models (SSMs) have emerged as a powerful and efficient alternative to Transformers, demonstrating linear-time complexity and exceptional sequence modeling capabilities. However, their application to vision tasks remains challenging. First, existing vision SSMs largely depend on manually designed fixed scanning methods to flatten image patches into sequences, which imposes […]
SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents
arXiv:2605.21384v1 Announce Type: cross Abstract: As long-horizon coding agents produce more code than any developer can review, oversight collapses onto a single surface: the automated test suite. Reward hacking naturally arises in this setup, as the agent optimizes for passing tests while deviating from the users true goal. We study this reward hacking phenomenon by […]
torchtune: PyTorch native post-training library
arXiv:2605.21442v1 Announce Type: cross Abstract: Modern LLMs typically require multistage training pipelines to achieve strong downstream performance, with post-training serving as the main interface for adapting open-weight models. We introduce torchtune, a PyTorch-native library designed to streamline the post-training lifecycle of LLMs, enabling efficient fine-tuning, experimentation, and deployment-oriented workflows. Unlike many existing fine-tuning frameworks, which […]
Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling
arXiv:2605.21470v1 Announce Type: cross Abstract: Computer-use agents (CUA) automate tasks specified with natural language such as “order the cheapest item from Taco Bell” by generating sequences of calls to tools such as click, type, and scroll on a browser. Current implementations follow a sequential fetch-screenshot-execute loop where each iteration requires an LLM call, resulting in […]