When to Lock Attention: Training-Free KV Control in Video Diffusion

arXiv:2603.09657v1 Announce Type: cross Abstract: Maintaining background consistency while enhancing foreground quality remains a core challenge in video editing. Injecting full-image information often leads to background artifacts, whereas rigid background locking severely constrains the model’s capacity for foreground generation. To address this issue, we propose KV-Lock, a training-free framework tailored for DiT-based video diffusion models. […]

Rating Quality of Diverse Time Series Data by Meta-learning from LLM Judgment

arXiv:2506.01290v2 Announce Type: replace-cross Abstract: High-quality time series (TS) data are essential for ensuring TS model performance, rendering research on rating TS data quality indispensable. Existing methods have shown promising rating accuracy within individual domains, primarily by extending data quality rating techniques such as influence functions and Shapley values to account for temporal characteristics. However, […]

Debiasing International Attitudes: LLM Agents for Simulating US-China Perception Changes

arXiv:2508.08837v3 Announce Type: replace-cross Abstract: Large Language Models (LLMs) offer transformative opportunities to address the longstanding challenge of modeling opinion evolution in computational social science. This study investigates how media influences cross-border attitudes – a key driver of global polarization – by developing an LLM-agent framework to disentangle sources of bias and assess LLMs’ capacity […]

VoiceBridge: General Speech Restoration with One-step Latent Bridge Models

arXiv:2509.25275v5 Announce Type: replace-cross Abstract: Bridge models have been investigated in speech enhancement but are mostly single-task, with constrained general speech restoration (GSR) capability. In this work, we propose VoiceBridge, a one-step latent bridge model (LBM) for GSR, capable of efficiently reconstructing 48 kHz fullband speech from diverse distortions. To inherit the advantages of data-domain […]

Duality in mass-action networks

arXiv:2603.08767v1 Announce Type: new Abstract: Mass-action networks are special cases of chemical reaction networks. For these systems, we argue that conserved quantities are dual to internal cycles. We introduce maximal invariant polyhedral supports, and we conjecture that there is a duality relation between preclusters and maximal invariant polyhedral supports. Given the close relation between maximal […]

From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents

arXiv:2601.22607v3 Announce Type: replace Abstract: Interactive tool-using agents must solve real-world tasks via multi-turn interaction with both humans and external environments, requiring dialogue state tracking and multi-step tool execution while following complex instructions. Post-training such agents is challenging because synthesis of high-quality multi-turn tool-use data is difficult to scale, and reinforcement learning (RL) could face noisy […]

Latent-DARM: Bridging Discrete Diffusion And Autoregressive Models For Reasoning

arXiv:2603.09184v1 Announce Type: cross Abstract: Most multi-agent systems rely exclusively on autoregressive language models (ARMs) that are based on sequential generation. Although effective for fluent text, ARMs limit global reasoning and plan revision. On the other hand, Discrete Diffusion Language Models (DDLMs) enable non-sequential, globally revisable generation and have shown strong planning capabilities, but their […]

MIL-PF: Multiple Instance Learning on Precomputed Features for Mammography Classification

arXiv:2603.09374v1 Announce Type: cross Abstract: Modern foundation models provide highly expressive visual representations, yet adapting them to high-resolution medical imaging remains challenging due to limited annotations and weak supervision. Mammography, in particular, is characterized by large images, variable multi-view studies and predominantly breast-level labels, making end-to-end fine-tuning computationally expensive and often impractical. We propose Multiple […]

ZeroWBC: Learning Natural Visuomotor Humanoid Control Directly from Human Egocentric Video

arXiv:2603.09170v1 Announce Type: cross Abstract: Achieving versatile and naturalistic whole-body control for humanoid robot scene-interaction remains a significant challenge. While some recent works have demonstrated autonomous humanoid interactive control, they are constrained to rigid locomotion patterns and expensive teleoperation data collection, lacking the versatility to execute more human-like natural behaviors such as sitting or kicking. […]

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844