arXiv:2604.24594v3 Announce Type: replace-cross Abstract: As large language models (LLMs) evolve into agentic problem solvers, they increasingly rely on external, reusable skills to handle tasks beyond their native parametric capabilities. In existing agent systems, the dominant strategy for incorporating skills is to explicitly enumerate available skills within the context window. However, this strategy fails to […]
GOTabPFN: From Feature Ordering to Compact Tokenization for Tabular Foundation Models on High-Dimensional Data
arXiv:2606.05441v2 Announce Type: replace-cross Abstract: We investigate how to make small tabular foundation models effective for High-Dimensional, Low-Sample Size (HDLSS) tabular prediction without retraining large backbones. We introduce Graph-guided Ordering with Local Refinement (GO-LR), show its equivalence to weighted Minimum Linear Arrangement, and interpret the practical solver as a TSP-path-style surrogate. We propose GOTabPFN, which builds […]
Understanding Quantization-Aware Training: Gradients at Quantized Weights Bias to the Low-Loss Basin
arXiv:2606.09012v1 Announce Type: cross Abstract: Post-training quantization (PTQ) converts a trained full-precision model into low-bit weights without task-level retraining, while quantization-aware training (QAT) incorporates quantization into the training loop. Although PTQ is efficient and often accurate at moderate bitwidths, it can fail sharply at aggressive bitwidths; QAT is more expensive but can often recover the […]
Self-Paced Curriculum Reinforcement Learning for Autonomous Superbike Racing in Simulation
arXiv:2606.09236v1 Announce Type: cross Abstract: Autonomous racing has seen remarkable progress through deep Reinforcement Learning (RL), primarily for four-wheeled vehicles. However, motorbikes introduce substantially greater complexity due to the need to manage balance and lean angle, in addition to more reactive steering and throttle control, and a lower weight. In this work, we present a […]
Closure-Validated Circuit Discovery in Attention Heads: Co-activation Proposes, Ablation Disposes
arXiv:2606.09607v1 Announce Type: cross Abstract: Interpretability increasingly treats groups of components, not individual units, as the basic object, and proposes to find them by clustering co-activation statistics. We ask whether such a cheap signal actually identifies an attention-head circuit. Adapting a sparse-autoencoder clustering recipe to attention heads — but validating by causal ablation rather than […]
HA-VLN 2.0: An Open Benchmark and Leaderboard for Human-Aware Navigation in Discrete and Continuous Environments with Dynamic Multi-Human Interactions
arXiv:2503.14229v4 Announce Type: replace Abstract: Vision-and-Language Navigation (VLN) has been studied mainly in either discrete or continuous spaces, with little attention to dynamic, crowded environments. We present HA-VLN 2.0, a unified benchmark introducing explicit social-awareness constraints. Our contributions are: (i) a standardized task and metrics capturing both goal accuracy and personal-space adherence; (ii) HAPS 2.0 […]
NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning
arXiv:2602.21172v3 Announce Type: replace Abstract: Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, current VLAs face two expensive requirements: (1) massive dataset collection, and (2) dense reasoning annotations. In this work, we address both challenges with NoRD (No Reasoning for Driving). Compared to existing VLAs, NoRD achieves […]
Neural Scalable Symbolic Search Framework for Complex Logical Queries with Multiple Free Variables
arXiv:2605.25985v2 Announce Type: replace Abstract: Complex Query Answering (CQA) is a fundamental knowledge representation and reasoning task over incomplete knowledge graphs (KGs). Answering existential first-order queries with $k$ free variables (i.e., $\text{EFO}_k$ queries) is a crucial yet challenging problem, as it requires ranking answer tuples in $\mathcal{E}^k$, where $\mathcal{E}$ denotes the entity set of a […]
Audio-FLAN: An Instruction-Following Dataset for Unified Audio Understanding and Generation of Speech, Music, and Sound
arXiv:2502.16584v2 Announce Type: replace-cross Abstract: Recent advancements in audio tokenization have significantly enhanced the integration of audio capabilities into large language models (LLMs). However, audio understanding and generation are often treated as distinct tasks, hindering the development of truly unified audio-language models. While instruction tuning has demonstrated remarkable success in improving generalization and zero-shot learning […]
Correcting Mean Bias in Text Embeddings: A Refined Renormalization with Training-Free Improvements on MMTEB
arXiv:2511.11041v2 Announce Type: replace-cross Abstract: We find that current sentence-embedding models produce outputs with a consistent bias: every embedding $e$ decomposes as $\tilde{e} + \mu$, where the mean $\mu$ is near-identical across all sentences. We study two training-free corrections, subtracting $\mu$ directly (R1) or projecting each embedding off the mean direction (R2), […]
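The two corrections named in this abstract can be illustrated with a minimal sketch. It assumes the mean is estimated from the batch at hand and uses hypothetical helper names (`mean_vector`, `subtract_mean`, `project_off_mean`) and toy vectors; it is not the authors' implementation, and it leaves out any renormalization step, which the truncated abstract does not specify.

```python
import math

def mean_vector(embeddings):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]

def subtract_mean(e, mu):
    """R1 as described in the abstract: subtract the mean vector directly."""
    return [x - m for x, m in zip(e, mu)]

def project_off_mean(e, mu):
    """R2 as described in the abstract: remove the component of e along mu."""
    norm = math.sqrt(sum(m * m for m in mu))
    if norm == 0.0:
        return list(e)  # no mean direction to remove
    u = [m / norm for m in mu]                  # unit vector along the mean
    coeff = sum(x * ui for x, ui in zip(e, u))  # scalar projection of e on u
    return [x - coeff * ui for x, ui in zip(e, u)]

# Toy data (assumed, for illustration only).
embeddings = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0], [2.0, 2.0, 3.0]]
mu = mean_vector(embeddings)
r1 = [subtract_mean(e, mu) for e in embeddings]
r2 = [project_off_mean(e, mu) for e in embeddings]
```

After R1 the corrected vectors average to zero; after R2 each corrected vector is orthogonal to the mean direction but keeps its components in every other direction.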
MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering
arXiv:2601.22859v3 Announce Type: replace-cross Abstract: The evolution of Large Language Model (LLM) agents for software engineering (SWE) is constrained by the scarcity of verifiable datasets, a bottleneck stemming from the complexity of constructing executable environments across diverse languages. To address this, we introduce MEnvAgent, a Multi-language framework for automated Environment construction that facilitates scalable generation […]
Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model
arXiv:2603.25184v2 Announce Type: replace-cross Abstract: Reinforcement learning (RL) has become essential for post-training large language models (LLMs) in reasoning tasks. While scaling rollouts can stabilize training and enhance performance, the computational overhead is a critical issue. In algorithms like GRPO, multiple rollouts per prompt incur prohibitive costs, as a large portion of prompts provide negligible […]