arXiv:2601.06188v2 Announce Type: replace Abstract: The size and capabilities of Earth-observing satellite constellations are rapidly increasing. Leveraging distributed onboard control, we can enable novel time-sensitive measurements and responses. However, deploying autonomy to large multiagent satellite systems necessitates algorithms with efficient computation and communication. We tackle this challenge and propose new online algorithms for large-scale dynamic […]
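As a rough illustration of the dynamic task-allocation setting, here is a minimal greedy online assignment in Python; the abstract does not give the paper's algorithm, so the score matrix, per-satellite capacities, and best-first ordering are purely illustrative assumptions.

```python
# Hypothetical sketch of online task allocation for a satellite
# constellation; the paper's actual algorithm is not shown in the
# abstract, so this is just a greedy, communication-free baseline.
import numpy as np

rng = np.random.default_rng(0)
n_sats, n_tasks = 8, 20

# score[i, j]: value of satellite i taking task j (e.g., based on
# slew cost and observation window); assumed given here.
score = rng.random((n_sats, n_tasks))
capacity = np.full(n_sats, 3)  # max tasks per satellite (assumption)

assignment = {}
for j in np.argsort(-score.max(axis=0)):          # tasks, best-first
    for i in np.argsort(-score[:, j]):            # satellites, best-first
        if capacity[i] > 0:
            assignment[j] = i
            capacity[i] -= 1
            break

print(f"assigned {len(assignment)}/{n_tasks} tasks")
```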
Stroboscopic motion reversals in delay-coupled neural fields
arXiv:2601.19125v1 Announce Type: new Abstract: Visual illusions provide a window into the mechanisms underlying visual processing, and dynamical neural circuit models offer a natural framework for proposing and testing theories of their emergence. We propose and analyze a delay-coupled neural field model that explains stroboscopic percepts arising from the subsampling of a moving, often rotating, […]
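A delay-coupled neural field of this kind can be simulated directly. The sketch below integrates a 1D ring field with delayed recurrent coupling; the Mexican-hat kernel, delay, and firing-rate function are illustrative assumptions, not the paper's exact model.

```python
# Minimal numerical sketch of a delay-coupled neural field on a ring,
# in the spirit of the model class the abstract describes; kernel,
# delay, and parameters here are illustrative assumptions.
import numpy as np

N, dt, tau, delay = 128, 0.1, 1.0, 2.0
d = int(delay / dt)                      # delay in integration steps
x = np.linspace(0, 2 * np.pi, N, endpoint=False)

# Mexican-hat coupling kernel over angular distance (assumption).
dist = np.angle(np.exp(1j * (x[:, None] - x[None, :])))
W = (np.exp(-dist**2) - 0.5 * np.exp(-(dist / 2)**2)) * (2 * np.pi / N)

f = lambda u: 1.0 / (1.0 + np.exp(-u))  # sigmoidal firing rate

hist = [0.01 * np.random.default_rng(1).standard_normal(N)] * (d + 1)
for step in range(2000):
    u, u_del = hist[-1], hist[-(d + 1)]
    du = (-u + W @ f(u_del)) / tau       # input arrives with delay
    hist.append(u + dt * du)
    hist = hist[-(d + 1):]               # keep only the needed history

print("final activity peak at angle:", x[np.argmax(hist[-1])])
```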
Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning
arXiv:2410.11234v4 Announce Type: replace-cross Abstract: Offline reinforcement learning (RL) is a powerful approach for data-driven decision-making and control. Compared to model-free methods, offline model-based reinforcement learning (MBRL) explicitly learns world models from a static dataset and uses them as surrogate simulators, improving data efficiency and enabling the learned policy to potentially generalize beyond the […]
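The "world model as surrogate simulator" idea can be sketched with an ensemble dynamics model whose disagreement penalizes rewards during rollouts, a common offline-MBRL device; this is an assumed illustration, not the paper's Bayes-adaptive MCTS machinery.

```python
# Hedged sketch of rollouts in a learned world model, with an
# ensemble-disagreement penalty as a stand-in for epistemic caution.
import numpy as np

rng = np.random.default_rng(0)

class EnsembleModel:
    """K random linear dynamics models standing in for learned ones."""
    def __init__(self, k=5, dim=4):
        self.A = rng.standard_normal((k, dim, dim)) * 0.1 + np.eye(dim)

    def step(self, s, a):
        preds = np.stack([A @ (s + a) for A in self.A])
        mean = preds.mean(axis=0)
        disagreement = preds.std(axis=0).sum()    # epistemic proxy
        reward = -np.linalg.norm(mean)            # toy task reward
        return mean, reward - 0.5 * disagreement  # penalized reward

model = EnsembleModel()
s, ret = rng.standard_normal(4), 0.0
for t in range(10):                               # short model rollout
    a = -0.1 * s                                  # placeholder policy
    s, r = model.step(s, a)
    ret += r
print("penalized model-rollout return:", round(ret, 3))
```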
Decoupled Split Learning via Auxiliary Loss
arXiv:2601.19261v1 Announce Type: cross Abstract: Split learning is a distributed training paradigm where a neural network is partitioned between clients and a server, allowing data to remain on the client while only intermediate activations are shared. Traditional split learning relies on end-to-end backpropagation across the client-server split point. This incurs a large communication overhead […]
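The decoupling idea reads naturally in code: the client learns from a local auxiliary head while the server trains on detached activations, so no gradients cross the split. A minimal PyTorch sketch with an assumed architecture and loss, not the paper's exact design:

```python
# Split learning without end-to-end backprop: the client trains its
# layers against a local auxiliary head; the server trains on
# detached activations, so no gradient crosses the split point.
import torch
import torch.nn as nn

client = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
aux_head = nn.Linear(64, 10)          # local auxiliary classifier
server = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

opt_client = torch.optim.SGD([*client.parameters(), *aux_head.parameters()], lr=0.1)
opt_server = torch.optim.SGD(server.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))

# Client step: update only via the local auxiliary loss.
z = client(x)
opt_client.zero_grad()
loss_fn(aux_head(z), y).backward()
opt_client.step()

# Server step: detached activations stand in for the shared
# intermediates; no gradient is sent back to the client.
opt_server.zero_grad()
loss_fn(server(z.detach()), y).backward()
opt_server.step()
```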
Length-Adaptive Interest Network for Balancing Long and Short Sequence Modeling in CTR Prediction
arXiv:2601.19142v1 Announce Type: new Abstract: User behavior sequences in modern recommendation systems exhibit significant length heterogeneity, ranging from sparse short-term interactions to rich long-term histories. While longer sequences provide more context, we observe that increasing the maximum input sequence length in existing CTR models paradoxically degrades performance for short-sequence users due to attention polarization and […]
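One simple way to act on this tension between long and short histories is a length-driven gate over two interest summaries. The sketch below is an assumed mechanism for illustration, not the paper's Length-Adaptive Interest Network.

```python
# Hedged sketch: blend long- and short-horizon user interest by a
# gate conditioned on sequence length (illustrative assumption).
import torch
import torch.nn as nn

class LengthGatedInterest(nn.Module):
    def __init__(self, dim=32, max_len=200):
        super().__init__()
        self.max_len = max_len
        self.gate = nn.Linear(1, 1)  # gate driven by normalized length

    def forward(self, seq, length):
        # seq: (B, T, D) padded behavior sequence; length: (B,)
        short = seq[:, -10:, :].mean(dim=1)                       # recent interest
        long = seq.sum(dim=1) / length.clamp(min=1).unsqueeze(1)  # full history
        g = torch.sigmoid(self.gate(length.float().unsqueeze(1) / self.max_len))
        return g * long + (1 - g) * short                         # length-adaptive mix

m = LengthGatedInterest()
seq = torch.randn(4, 50, 32)
print(m(seq, torch.tensor([5, 20, 50, 50])).shape)  # torch.Size([4, 32])
```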
Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning
arXiv:2601.19280v1 Announce Type: cross Abstract: Recent progress in Large Language Model (LLM) reasoning is increasingly driven by the refinement of post-training loss functions and alignment strategies. However, standard Reinforcement Learning (RL) paradigms like Group Relative Policy Optimization (GRPO) remain constrained by static uniformity: uniform prompt sampling and a fixed number of rollouts per prompt. For […]
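The two "static uniformity" pieces the abstract points at can be sketched together: GRPO's group-relative advantage, plus a hypothetical DRO-style prompt sampler that upweights prompts with poor recent reward. The sampler is an assumption; the abstract does not specify the paper's scheme.

```python
# GRPO group-relative advantages plus a DRO-flavored prompt sampler
# (the reweighting rule here is illustrative, not the paper's).
import numpy as np

rng = np.random.default_rng(0)

def grpo_advantages(rewards):
    """Group-relative advantages for one prompt's rollouts."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Per-prompt mean rewards tracked over training (toy values).
mean_reward = np.array([0.9, 0.4, 0.1, 0.7])

# DRO-flavored sampling: harder prompts (low reward) get more mass,
# instead of GRPO's uniform prompt sampling.
temp = 1.0
w = np.exp(-mean_reward / temp)
prompt = rng.choice(len(w), p=w / w.sum())

rollout_rewards = rng.binomial(1, mean_reward[prompt], size=8)
print("sampled prompt:", prompt, "advantages:", grpo_advantages(rollout_rewards))
```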
Entropy-Gated Branching for Efficient Test-Time Reasoning
arXiv:2503.21961v4 Announce Type: replace-cross Abstract: Test-time compute methods can significantly improve the reasoning capabilities and problem-solving accuracy of large language models (LLMs). However, these approaches require substantially more computational resources, with most compute wasted on exploring low-diversity branches where the model already exhibits high confidence. We observe that a small subset of uncertain reasoning steps […]
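The gating idea is easy to illustrate over a toy next-token distribution: spend extra samples only where the model is uncertain. The threshold, branch factor, and distributions below are assumptions, not the paper's settings.

```python
# Entropy-gated branching: branch into k continuations only when the
# next-token entropy exceeds a threshold; otherwise decode greedily.
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def step(probs, threshold=1.0, k=4):
    """Return one greedy continuation, or k samples if entropy is high."""
    if entropy(probs) < threshold:
        return [int(np.argmax(probs))]        # confident: single branch
    return list(rng.choice(len(probs), size=k, p=probs))  # uncertain: branch

confident = np.array([0.97, 0.01, 0.01, 0.01])
uncertain = np.array([0.3, 0.3, 0.2, 0.2])
print(step(confident))   # low entropy: e.g. [0]
print(step(uncertain))   # high entropy: 4 sampled branches
```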
When Benchmarks Leak: Inference-Time Decontamination for LLMs
arXiv:2601.19334v1 Announce Type: cross Abstract: Benchmark-based evaluation is the de facto standard for comparing large language models (LLMs). However, its reliability is increasingly threatened by test set contamination, where test samples or their close variants leak into training data and artificially inflate reported performance. To address this issue, prior work has explored two main lines […]
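To make the contamination problem concrete, here is the detection half only: an n-gram overlap score between a test sample and training documents, with hypothetical data. The paper's inference-time decontamination procedure itself is not spelled out in the abstract.

```python
# Illustrative n-gram overlap check of the kind used to flag leaked
# benchmark samples (detection only; hypothetical data).
def ngrams(text, n=8):
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contamination_score(test_sample, train_docs, n=8):
    """Fraction of the sample's n-grams that appear in training data."""
    test = ngrams(test_sample, n)
    if not test:
        return 0.0
    train = set().union(*(ngrams(d, n) for d in train_docs))
    return len(test & train) / len(test)

train = ["the quick brown fox jumps over the lazy dog near the river bank today"]
leaked = "the quick brown fox jumps over the lazy dog near the river bank"
print(round(contamination_score(leaked, train), 2))  # high overlap => suspect
```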
TS-Debate: Multimodal Collaborative Debate for Zero-Shot Time Series Reasoning
arXiv:2601.19151v1 Announce Type: new Abstract: Recent progress at the intersection of large language models (LLMs) and time series (TS) analysis has revealed both promise and fragility. While LLMs can reason over temporal structure given carefully engineered context, they often struggle with numeric fidelity, modality interference, and principled cross-modal integration. We present TS-Debate, a modality-specialized, collaborative […]
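The collaborative-debate structure can be gestured at with stub agents in place of LLM calls; the roles, round count, and aggregation rule below are assumptions, since the abstract only names the framework.

```python
# Schematic modality-specialized debate loop with stub agents.
def numeric_agent(series, transcript):
    # Stub numeric specialist: reasons only over the raw values.
    trend = "up" if series[-1] > series[0] else "down"
    return f"numeric: overall trend looks {trend}"

def text_agent(context, transcript):
    # Stub text specialist: reasons over the textual context; it
    # receives the transcript (unused here) as a real agent would.
    return f"text: context '{context}' supports the numeric claim"

def judge(transcript):
    # Toy aggregation: final round wins; a real judge would be an LLM.
    return transcript[-2:]

series = [1.0, 1.2, 1.5, 1.4, 1.9]
context = "demand rising this quarter"
transcript = []
for _ in range(2):  # fixed number of debate rounds (an assumption)
    transcript.append(numeric_agent(series, transcript))
    transcript.append(text_agent(context, transcript))
print(judge(transcript))
```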
Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection
arXiv:2601.19375v1 Announce Type: cross Abstract: Despite significant progress in alignment, large language models (LLMs) remain vulnerable to adversarial attacks that elicit harmful behaviors. Activation steering techniques offer a promising inference-time intervention approach, but existing methods suffer from critical limitations: activation addition requires careful coefficient tuning and is sensitive to layer-specific norm variations, while directional ablation […]
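A norm-preserving steering edit is straightforward to sketch: ablate a concept direction from an activation, then rescale to the original norm. The layer-selection criterion is only gestured at in the abstract, so the "discriminativeness" score below is a stand-in assumption.

```python
# Norm-preserving directional ablation, applied only at layers
# selected by a (hypothetical) discriminativeness score.
import numpy as np

rng = np.random.default_rng(0)

def ablate_norm_preserving(h, v):
    """Remove the component of h along direction v, keep ||h||."""
    v = v / np.linalg.norm(v)
    h_perp = h - (h @ v) * v
    return h_perp * (np.linalg.norm(h) / (np.linalg.norm(h_perp) + 1e-8))

# Toy per-layer scores for how well the direction separates harmful
# vs. harmless activations; steer only the most discriminative layers.
layer_scores = np.array([0.1, 0.8, 0.7, 0.2])
selected = np.argsort(-layer_scores)[:2]

h, v = rng.standard_normal(16), rng.standard_normal(16)
h_new = ablate_norm_preserving(h, v)
print("norm before/after:", np.linalg.norm(h).round(3), np.linalg.norm(h_new).round(3))
print("steered layers:", selected)
```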
Why is Your Language Model a Poor Implicit Reward Model?
arXiv:2507.07981v3 Announce Type: replace-cross Abstract: Reward models are key to language model post-training and inference pipelines. Conveniently, recent work showed that every language model defines an implicit reward model (IM-RM), without requiring any architectural changes. However, such IM-RMs tend to generalize worse, especially out-of-distribution, compared to explicit reward models (EX-RMs) that apply a dedicated linear […]
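The implicit-vs-explicit distinction the abstract draws can be written down in the DPO style: the implicit reward is a scaled log-probability ratio between the policy and a reference model, while an explicit reward model applies a dedicated head to hidden states. Toy numbers below; no claim about the paper's experiments.

```python
# IM-RM vs. EX-RM: the implicit reward depends on token log-probs,
# the explicit reward on a learned head over hidden representations.
import numpy as np

beta = 0.1

def implicit_reward(logp_policy, logp_ref):
    """IM-RM: r(x, y) = beta * (log pi(y|x) - log pi_ref(y|x))."""
    return beta * (logp_policy - logp_ref)

def explicit_reward(hidden, w):
    """EX-RM: a dedicated linear head on the final hidden state."""
    return float(hidden @ w)

rng = np.random.default_rng(0)
hidden, w = rng.standard_normal(8), rng.standard_normal(8)

print("IM-RM:", round(implicit_reward(-12.3, -14.1), 3))  # from token log-probs
print("EX-RM:", round(explicit_reward(hidden, w), 3))     # from hidden state
```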
Residual Tokens Enhance Masked Autoencoders for Speech Modeling
arXiv:2601.19399v1 Announce Type: cross Abstract: Recent speech modeling relies on explicit attributes such as pitch, content, and speaker identity, but these alone cannot capture the full richness of natural speech. We introduce RT-MAE, a novel masked autoencoder framework that augments supervised, attribute-based modeling with unsupervised, trainable residual tokens, designed to encode the information not […]
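The core idea, as the abstract states it, is learnable residual tokens carried alongside the supervised attribute features so the encoder can store what pitch, content, and speaker embeddings miss. A hedged PyTorch sketch, with sizes and encoder as illustrative assumptions:

```python
# Learnable residual tokens concatenated with attribute features
# before a transformer encoder (sizes and encoder are assumptions).
import torch
import torch.nn as nn

class ResidualTokenEncoder(nn.Module):
    def __init__(self, dim=64, n_residual=4):
        super().__init__()
        self.residual = nn.Parameter(torch.randn(1, n_residual, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, attr_tokens):
        # attr_tokens: (B, T, D) frames of supervised attribute features
        b = attr_tokens.shape[0]
        res = self.residual.expand(b, -1, -1)   # trainable, unsupervised
        return self.encoder(torch.cat([res, attr_tokens], dim=1))

model = ResidualTokenEncoder()
out = model(torch.randn(2, 50, 64))
print(out.shape)  # torch.Size([2, 54, 64]), residual tokens prepended
```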