arXiv:2602.18455v2 Announce Type: replace-cross Abstract: Search engines increasingly display LLM-generated answers above organic links, shifting search from link lists to answer-first summaries. Publishers contend these summaries substitute for source pages and cannibalize traffic, while platforms argue they are complementary, directing users through included links. We estimate the causal impact of Google’s AI Overview […]
Temporally Decoupled Diffusion Planning for Autonomous Driving
arXiv:2603.25462v1 Announce Type: cross Abstract: Motion planning in dynamic urban environments requires balancing immediate safety with long-term goals. While diffusion models effectively capture multi-modal decision-making, existing approaches treat trajectories as monolithic entities, overlooking heterogeneous temporal dependencies where near-term plans are constrained by instantaneous dynamics and far-term plans by navigational goals. To address this, we propose […]
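For intuition only, since the abstract is truncated before the method is named: one way to temporally decouple a diffusion planner is to denoise near-term and far-term trajectory segments under different noise scales, keeping early waypoints tightly constrained by current dynamics while leaving later waypoints diffuse. The split point, noise levels, and function names below are illustrative assumptions, not the paper's design.

```python
import numpy as np

def noise_schedule(horizon: int, split: int, sigma_near=0.1, sigma_far=1.0):
    # Per-timestep noise scale: small for the near-term segment,
    # large for the far-term segment (hypothetical values).
    sigma = np.full(horizon, sigma_far)
    sigma[:split] = sigma_near
    return sigma

horizon, split = 32, 8
traj = np.zeros((horizon, 2))                 # (T, xy) trajectory plan
sigma = noise_schedule(horizon, split)
noisy = traj + sigma[:, None] * np.random.randn(horizon, 2)
print("near-term std:", noisy[:split].std().round(2),
      "far-term std:", noisy[split:].std().round(2))
```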
Measuring What Matters — or What’s Convenient?: Robustness of LLM-Based Scoring Systems to Construct-Irrelevant Factors
arXiv:2603.25674v1 Announce Type: cross Abstract: Automated systems have been widely adopted across the educational testing industry for open-response assessment and essay scoring. These systems commonly achieve performance levels comparable to or better than those of trained human raters, but have frequently been demonstrated to be vulnerable to the influence of construct-irrelevant factors (i.e., features of responses that […]
BMFM-RNA: whole-cell expression decoding improves transcriptomic foundation models
arXiv:2506.14861v2 Announce Type: replace Abstract: Transcriptomic foundation models pretrained with masked language modeling can achieve low pretraining loss yet produce poor cell representations for downstream tasks. We introduce whole-cell expression decoding (WCED), where models reconstruct the entire gene vocabulary from a single CLS token embedding, even with limited inputs, creating a maximally informative bottleneck. WCED […]
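A minimal sketch of what a whole-cell expression decoding head could look like, assuming a transformer encoder that emits a CLS embedding. The class name WCEDHead, layer sizes, and MSE objective are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class WCEDHead(nn.Module):
    """Hypothetical whole-cell expression decoding head: reconstructs
    expression values for the full gene vocabulary from a single CLS
    embedding, forming a maximally informative bottleneck."""
    def __init__(self, d_model: int, n_genes: int):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, n_genes),  # one output per gene in the vocabulary
        )

    def forward(self, cls_embedding: torch.Tensor) -> torch.Tensor:
        # cls_embedding: (batch, d_model) -> predicted expression (batch, n_genes)
        return self.decoder(cls_embedding)

# Example: reconstruct 20k genes from stand-in CLS embeddings.
head = WCEDHead(d_model=512, n_genes=20000)
cls = torch.randn(8, 512)
pred = head(cls)                                   # (8, 20000)
loss = nn.MSELoss()(pred, torch.randn(8, 20000))   # regression against true profiles
```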
Characterizing Linear Alignment Across Language Models
arXiv:2603.18908v3 Announce Type: replace Abstract: Language models increasingly appear to learn similar representations, despite differences in training objectives, architectures, and data modalities. This emerging compatibility between independently trained models introduces new opportunities for cross-model alignment to downstream objectives. Moreover, this capability unlocks new potential application domains, such as settings where security, privacy, or competitive constraints […]
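As a hedged illustration of linear alignment between two models' representation spaces: the sketch below fits a linear map by ordinary least squares on paired embeddings and scores it with explained variance. The synthetic data and the R^2 metric are assumptions for demonstration, not the paper's protocol.

```python
import numpy as np

# Hypothetical setup: paired representations of the same inputs from two
# independently trained models, shapes (n_samples, d_a) and (n_samples, d_b).
rng = np.random.default_rng(0)
X_a = rng.normal(size=(1000, 256))            # model A representations
W_true = rng.normal(size=(256, 512))
X_b = X_a @ W_true + 0.01 * rng.normal(size=(1000, 512))  # model B (synthetic here)

# Fit a linear alignment map W minimizing ||X_a W - X_b||_F via least squares.
W, *_ = np.linalg.lstsq(X_a, X_b, rcond=None)

# Alignment quality: fraction of variance in X_b explained by the linear map.
residual = X_b - X_a @ W
r2 = 1.0 - (residual ** 2).sum() / ((X_b - X_b.mean(0)) ** 2).sum()
print(f"R^2 of linear alignment: {r2:.3f}")
```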
Adaptive Online Mirror Descent for Tchebycheff Scalarization in Multi-Objective Learning
arXiv:2410.21764v3 Announce Type: replace-cross Abstract: Multi-objective learning (MOL) aims to learn under multiple potentially conflicting objectives and strike a proper balance. While recent preference-guided MOL methods often rely on additional optimization objectives or constraints, we consider the classic Tchebycheff scalarization (TCH) that naturally allows for locating solutions with user-specified trade-offs. Due to its minimax formulation, […]
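For concreteness: the Tchebycheff objective min_x max_i w_i (f_i(x) - z_i) can be rewritten as a minimax problem over the probability simplex, which invites online mirror descent (multiplicative weights) on the simplex variable. The sketch below, with toy objectives and hand-picked step sizes, is an illustrative instance of that idea, not the paper's adaptive algorithm.

```python
import numpy as np

def losses(x):
    # Two toy conflicting objectives with different minimizers.
    return np.array([(x - 1.0) ** 2, (x + 1.0) ** 2])

def grads(x):
    return np.array([2.0 * (x - 1.0), 2.0 * (x + 1.0)])

w = np.array([0.5, 0.5])     # user-specified trade-off preferences
z = np.zeros(2)              # ideal (utopia) point
lam = np.array([0.5, 0.5])   # simplex variable of the max player
x, eta_x, eta_lam = 0.5, 0.05, 0.1

for _ in range(500):
    f = w * (losses(x) - z)
    # Mirror ascent on lam with entropy regularizer: multiplicative weights.
    lam *= np.exp(eta_lam * f)
    lam /= lam.sum()
    # Gradient step on x against the lam-weighted objective.
    x -= eta_x * np.dot(lam, w * grads(x))

print(f"x = {x:.3f}, weights = {lam.round(3)}")
```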
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
arXiv:2509.24296v2 Announce Type: replace-cross Abstract: The rapid advancement of Diffusion Large Language Models (dLLMs) introduces unprecedented vulnerabilities that are fundamentally distinct from Autoregressive LLMs, stemming from their iterative and parallel generation mechanisms. In this paper, we conduct an in-depth analysis of dLLM vulnerabilities to jailbreak attacks across two distinct dimensions: intra-step and inter-step dynamics. Experimental […]
MedShift: Implicit Conditional Transport for X-Ray Domain Adaptation
arXiv:2508.21435v2 Announce Type: replace-cross Abstract: Synthetic medical data offers a scalable solution for training robust models, but significant domain gaps limit its generalizability to real-world clinical settings. This paper addresses the challenge of cross-domain translation between synthetic and real X-ray images of the head, focusing on bridging discrepancies in attenuation behavior, noise characteristics, and soft […]
Context Matters: Peer-Aware Student Behavioral Engagement Measurement via VLM Action Parsing and LLM Sequence Classification
arXiv:2601.06394v2 Announce Type: replace-cross Abstract: Understanding student behavior in the classroom is essential to improve both pedagogical quality and student engagement. Existing methods for predicting student engagement typically require substantial annotated data to model the diversity of student behaviors, yet privacy concerns often restrict researchers to their own proprietary datasets. Moreover, the classroom context, represented […]
Theory of Dynamic Adaptive Coordination
arXiv:2603.11560v2 Announce Type: replace-cross Abstract: This paper develops a dynamical theory of adaptive coordination governed by persistent environmental memory. Moving beyond framework-specific equilibrium optimization or agent-centric learning, I model agents, incentives, and the environment as a recursively closed feedback architecture: a persistent environment stores accumulated coordination signals, a distributed incentive field transmits them locally, and […]
ARC-AGI-3: A New Challenge for Frontier Agentic Intelligence
arXiv:2603.24621v1 Announce Type: new Abstract: We introduce ARC-AGI-3, an interactive benchmark for studying agentic intelligence through novel, abstract, turn-based environments in which agents must explore, infer goals, build internal models of environment dynamics, and plan effective action sequences without explicit instructions. Like its predecessors ARC-AGI-1 and 2, ARC-AGI-3 focuses entirely on evaluating fluid adaptive efficiency […]
How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models
arXiv:2603.25325v1 Announce Type: cross Abstract: Weight pruning is a standard technique for compressing large language models, yet its effect on learned internal representations remains poorly understood. We present the first systematic study of how unstructured pruning reshapes the feature geometry of language models, using Sparse Autoencoders (SAEs) as interpretability probes. Across three model families (Gemma […]
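A minimal sketch of using a sparse autoencoder as an interpretability probe: train an overcomplete ReLU dictionary to reconstruct a layer's activations under an L1 sparsity penalty, then compare which dictionary features stay active between a dense model and its pruned counterpart. Dimensions, the activity threshold, and the synthetic activations are assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE probe: overcomplete dictionary with a ReLU latent code."""
    def __init__(self, d_act: int, d_dict: int):
        super().__init__()
        self.enc = nn.Linear(d_act, d_dict)
        self.dec = nn.Linear(d_dict, d_act)

    def forward(self, acts):
        z = torch.relu(self.enc(acts))  # sparse feature activations
        return self.dec(z), z

def sae_loss(model, acts, l1=1e-3):
    # Reconstruction error plus L1 sparsity penalty (training objective).
    recon, z = model(acts)
    return ((recon - acts) ** 2).mean() + l1 * z.abs().mean()

# Hypothetical usage: encode the same layer's activations from a dense
# model and a weight-pruned counterpart, then count shared active features.
sae = SparseAutoencoder(d_act=768, d_dict=4096)
acts_dense = torch.randn(256, 768)                 # stand-ins for real activations
acts_pruned = acts_dense + 0.1 * torch.randn(256, 768)
with torch.no_grad():
    _, z_dense = sae(acts_dense)
    _, z_pruned = sae(acts_pruned)
alive = lambda z: z.mean(0) > 1e-6
print("features active in both:", (alive(z_dense) & alive(z_pruned)).sum().item())
```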