arXiv:2603.25146v1 Announce Type: cross Abstract: Context: The rapid adoption of AI-assisted code generation tools, such as large language models (LLMs), is transforming software development practices. While these tools promise significant productivity gains, concerns regarding the quality, reliability, and security of AI-generated code are increasingly reported in both academia and industry. Objective: This study aims to […]
FD$^2$: A Dedicated Framework for Fine-Grained Dataset Distillation
arXiv:2603.25144v1 Announce Type: cross Abstract: Dataset distillation (DD) compresses a large training set into a small synthetic set, reducing storage and training cost, and has shown strong results on general benchmarks. Decoupled DD further improves efficiency by splitting the pipeline into pretraining, sample distillation, and soft-label generation. However, existing decoupled methods largely rely on coarse […]
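The decoupled pipeline this abstract describes (pretraining, sample distillation, soft-label generation) can be sketched minimally. Everything below is an illustrative stand-in, not FD$^2$'s method: the teacher is a linear-softmax classifier, and "sample distillation" is approximated by picking the real samples nearest each class mean.

```python
import numpy as np

def pretrain_teacher(X, y, n_classes, lr=0.1, epochs=200, seed=0):
    """Stage 1: pretrain a linear-softmax teacher on the full training set."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0, 0.01, (X.shape[1], n_classes))
    for _ in range(epochs):
        logits = X @ W
        p = np.exp(logits - logits.max(1, keepdims=True))
        p /= p.sum(1, keepdims=True)
        p[np.arange(len(y)), y] -= 1.0          # softmax cross-entropy gradient
        W -= lr * X.T @ p / len(y)
    return W

def distill_samples(X, y, per_class):
    """Stage 2: keep the samples closest to each class mean
    (a crude stand-in for sample distillation)."""
    keep = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        d = np.linalg.norm(X[idx] - X[idx].mean(0), axis=1)
        keep.extend(idx[np.argsort(d)[:per_class]])
    return np.array(keep)

def soft_labels(W, X_syn, T=2.0):
    """Stage 3: generate temperature-softened teacher labels for the small set."""
    logits = X_syn @ W / T
    p = np.exp(logits - logits.max(1, keepdims=True))
    return p / p.sum(1, keepdims=True)
```

The point of the decoupling is visible in the structure: each stage consumes only the previous stage's output, so the expensive teacher pretraining is paid once and the distilled set plus soft labels can be regenerated cheaply.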
TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts
arXiv:2601.08881v2 Announce Type: replace-cross Abstract: Unified image generation and editing models suffer from severe task interference in dense diffusion transformer architectures, where a shared parameter space must compromise between conflicting objectives (e.g., local editing vs. subject-driven generation). While the sparse Mixture-of-Experts (MoE) paradigm is a promising solution, its gating networks remain task-agnostic, operating based on […]
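The task-aware gating idea can be sketched as a routing function whose logits depend on a task embedding as well as the token features, so that (say) editing and subject-driven generation can route to different experts. This is a minimal illustration of the concept, not TAG-MoE's gate; the additive combination and top-k softmax are assumptions.

```python
import numpy as np

def task_aware_gate(x, task_emb, Wg, Wt, top_k=2):
    """Top-k MoE gate whose logits mix token features (x @ Wg) with a
    task-conditioned bias (task_emb @ Wt), so routing is task-dependent."""
    logits = x @ Wg + task_emb @ Wt              # (n_tokens, n_experts)
    top = np.argsort(logits, axis=1)[:, -top_k:] # indices of the k best experts
    gates = np.zeros_like(logits)
    rows = np.arange(len(x))[:, None]
    z = logits[rows, top]
    z = np.exp(z - z.max(1, keepdims=True))      # softmax over the selected k
    gates[rows, top] = z / z.sum(1, keepdims=True)
    return gates
```

A task-agnostic gate is the special case `Wt = 0`: the routing then sees only the token, which is exactly the limitation the abstract points at.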
A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis
arXiv:2402.12760v2 Announce Type: replace-cross Abstract: Well-designed prompts have demonstrated the potential to guide text-to-image models in generating amazing images. Although existing prompt engineering methods can provide high-level guidance, it is challenging for novice users to achieve the desired results by manually entering prompts due to a discrepancy between novice-user-input prompts and the model-preferred prompts. To […]
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference
arXiv:2603.25730v1 Announce Type: cross Abstract: Autoregressive video diffusion models have demonstrated remarkable progress, yet they remain bottlenecked by intractable linear KV-cache growth, temporal repetition, and compounding errors during long-video generation. To address these challenges, we present PackForcing, a unified framework that efficiently manages the generation history through a novel three-partition KV-cache strategy. Specifically, we categorize […]
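The excerpt cuts off before naming PackForcing's three cache partitions, so the sketch below uses a common illustrative split (permanent attention-sink entries, a sliding window of recent entries, and one merged summary of evicted entries) purely to show how a three-partition policy turns linear KV-cache growth into a constant bound. It is not the paper's scheme.

```python
from collections import deque

class BoundedKVCache:
    """Illustrative three-partition KV cache: a few 'sink' entries kept
    forever, a sliding window of recent entries, and a single summary slot
    that absorbs evicted entries (here by running average)."""
    def __init__(self, n_sink=4, window=8):
        self.n_sink = n_sink
        self.sink = []
        self.recent = deque(maxlen=window)
        self.summary = None  # stand-in for a compressed history representation

    def append(self, kv):
        if len(self.sink) < self.n_sink:       # first few entries pin to sink
            self.sink.append(kv)
            return
        if len(self.recent) == self.recent.maxlen:
            evicted = self.recent[0]           # deque will drop this on append
            self.summary = evicted if self.summary is None else tuple(
                0.5 * (a + b) for a, b in zip(self.summary, evicted))
        self.recent.append(kv)

    def size(self):
        return len(self.sink) + len(self.recent) + (self.summary is not None)
```

Whatever the actual partitions are, the structural property is the same: `size()` is bounded regardless of sequence length, which is what makes long-video sampling tractable.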
Planned Diffusion
arXiv:2510.18087v2 Announce Type: replace Abstract: Most large language models are autoregressive: they generate tokens one at a time. Discrete diffusion language models can generate multiple tokens in parallel, but sampling from them requires a denoising order: a strategy for deciding which tokens to decode at each step. Determining a good denoising order is difficult, and […]
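A denoising order of the kind discussed here can be illustrated with a max-confidence schedule, one common heuristic for discrete diffusion sampling: at each step, commit the k masked positions where the model is most confident. This is an assumed baseline for illustration; the excerpt does not show Planned Diffusion's own ordering strategy.

```python
import numpy as np

def denoise_step(probs, decoded, k=2):
    """One step of a confidence-ordered denoising schedule.
    probs: (seq_len, vocab) model distribution per position.
    decoded: boolean mask of already-committed positions.
    Commits the k still-masked positions with highest max-probability."""
    conf = probs.max(axis=1)
    conf[decoded] = -1.0                        # decoded positions ineligible
    pick = np.argsort(conf)[::-1][:k]
    pick = pick[conf[pick] > -1.0]              # drop any ineligible leftovers
    tokens = probs[pick].argmax(axis=1)
    decoded = decoded.copy()
    decoded[pick] = True
    return pick, tokens, decoded
```

Running this step until `decoded.all()` yields the full sequence in `ceil(seq_len / k)` parallel steps, which is the speedup over one-token-at-a-time autoregression; the quality of the result hinges on how good the ordering heuristic is, which is the difficulty the abstract names.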
GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs
arXiv:2603.25385v1 Announce Type: cross Abstract: Quantization techniques such as BitsAndBytes, AWQ, and GPTQ are widely used as standard methods for deploying large language models, but they often degrade accuracy at low-bit representations, e.g., 4 bits. Low-rank correction methods (e.g., LQER, QERA, ASER) have been proposed to mitigate this issue; however, they restore all layers […]
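The low-rank correction idea the abstract builds on can be sketched directly: quantize a weight matrix, then approximate the residual error with a truncated SVD and add the rank-r factors back at inference. The per-tensor symmetric quantizer below is a simplified stand-in for AWQ/GPTQ-style quantizers, and the correction mirrors LQER-style methods rather than GlowQ's group-shared variant.

```python
import numpy as np

def quantize_sym(W, bits=4):
    """Uniform symmetric per-tensor quantization (simplified stand-in
    for production quantizers such as AWQ or GPTQ)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / qmax
    return np.round(W / scale).clip(-qmax, qmax) * scale

def low_rank_correction(W, Wq, rank=4):
    """Approximate the quantization residual W - Wq with a rank-r
    factorization; at inference the layer computes x @ (Wq + U @ V)."""
    U, s, Vt = np.linalg.svd(W - Wq, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank]
```

By Eckart-Young, `Wq + U @ V` is strictly closer to `W` in Frobenius norm than `Wq` alone whenever the residual has more than `rank` nonzero singular values; the cost is storing the two small factors per corrected layer, which is why methods differ on *which* layers to restore.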
Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
arXiv:2603.25562v1 Announce Type: cross Abstract: On-policy distillation (OPD) is appealing for large language model (LLM) post-training because it evaluates teacher feedback on student-generated rollouts rather than fixed teacher traces. In long-horizon settings, however, the common sampled-token variant is fragile: it reduces distribution matching to a one-token signal and becomes increasingly unreliable as rollouts drift away […]
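The contrast the abstract draws can be made concrete: full-distribution feedback evaluates the teacher/student divergence over the whole vocabulary at each position, while the sampled-token variant reduces it to the log-ratio at the single token the student emitted. The two estimators below are a minimal sketch of that distinction (standard definitions, not the paper's specific objective).

```python
import numpy as np

def kl_full(p_teacher, p_student):
    """Exact per-position KL(teacher || student): dense feedback over
    the whole vocabulary at each position."""
    return np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)

def kl_sampled(p_teacher, p_student, tokens):
    """Sampled-token signal: only the log-ratio at the token the student
    actually emitted. Unbiased for the (negated) reverse KL under student
    sampling, but a one-token estimate whose variance grows as rollouts
    drift off-distribution."""
    idx = np.arange(len(tokens))
    return np.log(p_teacher[idx, tokens]) - np.log(p_student[idx, tokens])
```

In a long rollout, the dense signal stays informative at every position, whereas the one-token estimate must average over many samples to recover the same information, which is one way to read the fragility the abstract reports.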
Foundry: Distilling 3D Foundation Models for the Edge
arXiv:2511.20721v2 Announce Type: replace-cross Abstract: Foundation models pre-trained with self-supervised learning (SSL) on large-scale datasets have become powerful general-purpose feature extractors. However, their immense size and computational cost make them prohibitive for deployment on edge devices such as robots and AR/VR headsets. Existing compression techniques like standard knowledge distillation create efficient ‘specialist’ models but sacrifice […]
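For reference, the "standard knowledge distillation" baseline the abstract contrasts against is the Hinton-style loss: a blend of hard-label cross-entropy and a temperature-softened teacher/student KL term. A minimal sketch (generic formulation, not Foundry's objective):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hinton-style distillation loss: alpha * hard-label cross-entropy
    plus (1 - alpha) * KL(teacher || student) at temperature T, with the
    conventional T^2 scaling on the soft term."""
    ps, pt = softmax(student_logits, T), softmax(teacher_logits, T)
    soft = np.mean(np.sum(pt * (np.log(pt) - np.log(ps)), axis=-1)) * T * T
    p1 = softmax(student_logits)
    hard = -np.mean(np.log(p1[np.arange(len(labels)), labels]))
    return alpha * hard + (1 - alpha) * soft
```

Matching soft targets for one task is exactly what produces the efficient "specialist" the abstract mentions: the student inherits the teacher's behavior on the distillation distribution but not its general-purpose features.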
TRACE: A Multi-Agent System for Autonomous Physical Reasoning for Seismology
arXiv:2603.21152v3 Announce Type: replace-cross Abstract: Inferring physical mechanisms that govern earthquake sequences from geophysical observations remains a challenging task, particularly across tectonically distinct environments where similar seismic patterns can reflect different underlying processes. Current seismological processing and interpretation rely heavily on experts’ choice of parameters and the synthesis of various seismological products, limiting reproducibility and […]
CodeNER: Code Prompting for Named Entity Recognition
arXiv:2507.20423v4 Announce Type: replace-cross Abstract: Recent studies have explored various approaches for treating candidate named entity spans as both source and target sequences in named entity recognition (NER) by leveraging large language models (LLMs). Although previous approaches have successfully generated candidate named entity spans with suitable labels, they rely solely on input context information when […]
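One way to render NER as a code prompt is to present the sentence as a variable and ask the model to complete a typed structure. The template below is a guessed illustration of the general code-prompting idea; CodeNER's actual prompt format is not shown in this excerpt.

```python
def code_ner_prompt(sentence, entity_types):
    """Render an NER query as a code-completion task: the model fills a
    Python dict mapping entity types to lists of spans. Template is a
    hypothetical example, not the paper's prompt."""
    lines = [
        "# Extract named entities from the sentence below.",
        f"sentence = {sentence!r}",
        "entities = {",
    ]
    for t in entity_types:
        lines.append(f"    {t!r}: [],  # fill with exact substrings of `sentence`")
    lines.append("}")
    return "\n".join(lines)
```

The appeal of code-shaped prompts is that the output schema (keys, list structure, exact-substring constraint) is stated in a syntax LLMs follow reliably, instead of being described in free-form instructions.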
Constant-Time Motion Planning with Manipulation Behaviors
arXiv:2512.00939v2 Announce Type: replace-cross Abstract: Recent progress in contact-rich robotic manipulation has been striking, yet most deployed systems remain confined to simple, scripted routines. One of the key barriers is the lack of motion planning algorithms that can provide verifiable guarantees for safety, efficiency and reliability. To address this, a family of algorithms called Constant-Time […]