arXiv:2605.29475v2 Announce Type: replace-cross Abstract: Large language models (LLMs) show remarkable potential in scientific hypothesis discovery. However, existing approaches face two critical limitations: they treat divergent exploratory search and convergent fine-grained refinement as isolated tasks, and they operate autonomously with little to no human guidance. We present MOOSE-Copilot, the first unified framework to bridge this […]
APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music
arXiv:2605.03395v2 Announce Type: replace-cross Abstract: Music popularity prediction has attracted growing research interest, with relevance to artists, platforms, and recommendation systems. However, the explosive rise of AI-generated music platforms has created an entirely new and largely unexplored landscape, where a surge of songs is produced and consumed daily without the traditional markers of artist reputation […]
A retrieval conditioned rebinding circuit for dynamic entity tracking in large language models
arXiv:2606.08644v1 Announce Type: cross Abstract: To interpret context correctly and retrieve relevant information, large language models must bind entities to their attributes and update these bindings as state changes. We analyze how LLMs implement this binding process in dynamic state tracking. Using causal interventions, we identify a retrieval conditioned rebinding mechanism, a compact attention […]
An Enhanced Geometric-Spectral Feature Learning Framework for Airborne Multispectral Point Cloud Classification
arXiv:2606.09123v1 Announce Type: cross Abstract: A multispectral point cloud (MPC) is composed of 3D spatial-spectral information, which holds tremendous potential for accurate land-cover classification. However, the representation power of classification models is limited by the inherently high-dimensional and heterogeneous spatial-spectral information, unbalanced sample distribution, and inter-class spectral similarity of airborne MPCs. We build two MPC datasets and […]
Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?
arXiv:2605.28860v2 Announce Type: replace-cross Abstract: Fine-tuning large language models (LLMs) frequently induces catastrophic forgetting of prior capabilities. Recent work has shown that reinforcement learning (RL) retains prior capabilities more effectively than supervised fine-tuning (SFT), attributing this to policy-gradient updates remaining closer to the base policy \cite{shenfeld2025rl}. We extend this behavioral account to the mechanistic level […]
Cheap Reward Hacking Detection
arXiv:2606.08893v1 Announce Type: cross Abstract: A small transformer encoder is trained to map Terminal-Wrench trajectories onto a unit sphere where embedding distance approximates the $L_1$ distance between reward and metadata signals. A linear probe on top of that embedding detects reward hacking on the cleaned test split with AUC $0.9467$ and TPR@5%FPR $0.8296$, matching the […]
DyCP: Dynamic Context Pruning for Long-Form Dialogue with LLMs
arXiv:2601.07994v5 Announce Type: replace-cross Abstract: Large Language Models (LLMs) increasingly operate over long-form dialogues with frequent topic shifts. While recent LLMs support extended context windows, efficient management of dialogue history in practice is needed due to inference cost and latency constraints. We present DyCP, a lightweight context management method implemented outside the LLM that dynamically […]
Simple Self-Conditioning Adaptation for Masked Diffusion Models
arXiv:2604.26985v2 Announce Type: replace-cross Abstract: Masked diffusion models (MDMs) generate discrete sequences by iterative denoising under an absorbing masking process. In standard masked diffusion, if a token remains masked after a reverse update, the model discards its clean-state prediction for that position. Thus, still-masked positions must be repeatedly inferred from the mask token alone. This […]
Knowing How to Edit: Reliable Evaluation Signals for Diagnosing and Optimizing Prompts at Query Level
arXiv:2511.19829v3 Announce Type: replace Abstract: Prompt optimization has become a central mechanism for eliciting strong performance from LLMs, and recent work has made substantial progress by proposing diverse prompt evaluation metrics and optimization strategies. Despite these advances, prompt evaluation and prompt optimization are often developed in isolation, limiting the extent to which evaluation can effectively […]
A Study of Parallel Continuous Local Search
arXiv:2606.06656v2 Announce Type: replace Abstract: We study parallel Continuous Local Search (CLS) as a solution approach for Boolean satisfiability problems with symmetric pseudo-Boolean (PB) constraints. Here, the $n$-variable PB-satisfiability problem is relaxed to a continuous optimisation problem with a differentiable objective function on an $n$-dimensional hypercube. For satisfiable instances, the global minimisers of this optimisation […]
Context-Aware Deep Learning for Defect Classification in Atomic-Resolution STEM
arXiv:2606.09419v1 Announce Type: cross Abstract: Artificial intelligence is rapidly advancing materials characterization, yet most applications in electron microscopy rely solely on image contrast, overlooking the chemical and experimental context that shapes image formation. This limitation makes defect classification inherently ambiguous, as similar contrasts can arise from different materials or imaging conditions. Here we develop a […]
Difference-Aware Retrieval Policies for Imitation Learning
arXiv:2606.09758v1 Announce Type: cross Abstract: Parametric imitation learning via behavior cloning can suffer from poor generalization to out-of-distribution states due to compounding errors during deployment. We show that reusing the training data during inference via a semi-parametric retrieval-based imitation learning approach can alleviate this challenge. We present Difference-Aware Retrieval Policies for Imitation Learning (DARP), a […]