Escaping the Verifier: Learning to Reason via Demonstrations

Scalable Multi-Objective and Meta Reinforcement Learning via Gradient Estimation

arXiv:2511.12779v2 Announce Type: replace-cross Abstract: We study the problem of efficiently estimating policies that simultaneously optimize multiple objectives in reinforcement learning (RL). Given $n$ objectives

Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training

arXiv:2512.08894v1 Announce Type: cross Abstract: While scaling laws for Large Language Models (LLMs) traditionally focus on proxy metrics like pretraining loss, predicting downstream task performance

TS-PEFT: Unveiling Token-Level Redundancy in Parameter-Efficient Fine-Tuning

arXiv:2511.16147v2 Announce Type: replace-cross Abstract: Current Parameter-Efficient Fine-Tuning (PEFT) methods typically operate under an implicit assumption: once a target module is selected, every token passing

Decoupling Template Bias in CLIP: Harnessing Empty Prompts for Enhanced Few-Shot Learning

arXiv:2512.08606v1 Announce Type: cross Abstract: The Contrastive Language-Image Pre-Training (CLIP) model excels in few-shot learning by aligning visual and textual representations. Our study shows that

On the Temporality for Sketch Representation Learning

arXiv:2512.04007v2 Announce Type: replace-cross Abstract: Sketches are simple human hand-drawn abstractions of complex scenes and real-world objects. Although the field of sketch representation learning has