Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

arXiv:2603.02175v3 Announce Type: replace-cross Abstract: Instruction-based video editing has witnessed rapid progress, yet current methods often struggle with precise visual control, as natural language is inherently limited in describing complex visual nuances. Although reference-guided editing offers a robust solution, its potential is currently bottlenecked by the scarcity of high-quality paired training data. To bridge this […]

A Reference Architecture of Reinforcement Learning Frameworks

arXiv:2603.06413v1 Announce Type: cross Abstract: The surge in reinforcement learning (RL) applications gave rise to diverse supporting technology, such as RL frameworks. However, the architectural patterns of these frameworks are inconsistent across implementations and there exists no reference architecture (RA) to form a common basis of comparison, evaluation, and integration. To address this gap, we […]

BEVLM: Distilling Semantic Knowledge from LLMs into Bird’s-Eye View Representations

arXiv:2603.06576v1 Announce Type: cross Abstract: The integration of Large Language Models (LLMs) into autonomous driving has attracted growing interest for their strong reasoning and semantic understanding abilities, which are essential for handling complex decision-making and long-tail scenarios. However, existing methods typically feed LLMs with tokens from multi-view and multi-frame images independently, leading to redundant computation […]

ContextBench: Modifying Contexts for Targeted Latent Activation

arXiv:2506.15735v2 Announce Type: replace Abstract: Identifying inputs that trigger specific behaviours or latent features in language models could have a wide range of safety use cases. We investigate a class of methods capable of generating targeted, linguistically fluent inputs that activate specific latent features or elicit model behaviours. We formalise this approach as context modification […]

Uncertainty Quantification in LLM Agents: Foundations, Emerging Challenges, and Opportunities

arXiv:2602.05073v2 Announce Type: replace Abstract: Uncertainty quantification (UQ) for large language models (LLMs) is a key building block for safety guardrails of daily LLM applications. Yet, even as LLM agents are increasingly deployed in highly complex tasks, most UQ research still centers on single-turn question-answering. We argue that UQ research must shift to realistic settings […]

MOOSEnger — a Domain-Specific AI Agent for the MOOSE Ecosystem

arXiv:2603.04756v2 Announce Type: replace Abstract: MOOSEnger is a tool-enabled AI agent tailored to the Multiphysics Object-Oriented Simulation Environment (MOOSE). MOOSE cases are specified in HIT “.i” input files; the large object catalog and strict syntax make initial setup and debugging slow. MOOSEnger offers a conversational workflow that turns natural-language intent into runnable inputs by combining […]
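For context on the input format this abstract refers to: MOOSE cases are written in HIT syntax, and a minimal steady-state diffusion input — essentially the standard introductory example from the MOOSE documentation; block and object names follow MOOSE conventions — looks like this:

```
[Mesh]
  type = GeneratedMesh    # built-in rectangular mesh generator
  dim = 2
  nx = 10
  ny = 10
[]

[Variables]
  [u]                     # the unknown field to solve for
  []
[]

[Kernels]
  [diff]
    type = Diffusion      # contributes -div(grad u) to the residual
    variable = u
  []
[]

[BCs]
  [left]
    type = DirichletBC
    variable = u
    boundary = left
    value = 0
  []
  [right]
    type = DirichletBC
    variable = u
    boundary = right
    value = 1
  []
[]

[Executioner]
  type = Steady
  solve_type = NEWTON
[]

[Outputs]
  exodus = true
[]
```

The large catalog of object types (`Diffusion`, `DirichletBC`, and thousands more) and the strict block syntax are exactly what makes initial setup slow by hand, and what a conversational agent like MOOSEnger aims to generate and debug from natural-language intent.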

FALCON: Future-Aware Learning with Contextual Object-Centric Pretraining for UAV Action Recognition

arXiv:2409.18300v2 Announce Type: replace-cross Abstract: We introduce FALCON, a unified self-supervised video pretraining approach for UAV action recognition from raw RGB aerial footage, requiring no additional preprocessing at inference. UAV videos exhibit severe spatial imbalance: large, cluttered backgrounds dominate the field of view, causing reconstruction-based pretraining to waste capacity on uninformative regions and under-learn action-relevant […]

FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment

arXiv:2504.08603v4 Announce Type: replace-cross Abstract: Geometrically accurate and semantically expressive map representations have proven invaluable for robot deployment and task planning in unknown environments. Nevertheless, real-time, open-vocabulary semantic understanding of large-scale unknown environments still presents open challenges, mainly due to computational requirements. In this paper we present FindAnything, an open-world mapping framework that incorporates vision-language […]

Maximizing Asynchronicity in Event-based Neural Networks

arXiv:2505.11165v2 Announce Type: replace-cross Abstract: Event cameras deliver visual data with high temporal resolution, low latency, and minimal redundancy, yet their asynchronous, sparse sequential nature challenges standard tensor-based machine learning (ML). While the recent asynchronous-to-synchronous (A2S) paradigm aims to bridge this gap by asynchronously encoding events into learned features for ML pipelines, existing A2S approaches […]

TrinityDNA: A Bio-Inspired Foundational Model for Efficient Long-Sequence DNA Modeling

arXiv:2507.19229v2 Announce Type: replace-cross Abstract: The modeling of genomic sequences presents unique challenges due to their length and structural complexity. Traditional sequence models struggle to capture long-range dependencies and biological features inherent in DNA. In this work, we propose TrinityDNA, a novel DNA foundational model designed to address these challenges. The model integrates biologically informed […]

Non-Monotone Traveling Waves of the Weak Competition Lotka-Volterra System

arXiv:2510.04501v2 Announce Type: replace-cross Abstract: We investigate traveling wave solutions in the two-species reaction-diffusion Lotka-Volterra competition system under weak competition. For the strict weak competition regime $(b<1,\ a>0)$, we construct refined upper and lower solutions combined with the Schauder fixed point theorem to establish the existence of traveling waves for all wave speeds $s \geq s^* := \max\{2,\, 2\sqrt{ad}\}$, and […]
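For readers unfamiliar with the setting: a standard nondimensionalized form of the two-species competition-diffusion system studied in such work is sketched below. The exact scaling and parameter names in the paper may differ; here $a$ and $b$ are interspecific competition coefficients, and $d$ and $r$ are the relative diffusion and growth rates of the second species.

```latex
\begin{aligned}
u_t &= u_{xx} + u\,(1 - u - a\,v),\\
v_t &= d\,v_{xx} + r\,v\,(1 - v - b\,u),
\end{aligned}
```

In the weak competition regime the interspecific coefficients are small enough that a stable coexistence equilibrium exists, and traveling waves connect the single-species states to this coexistence state for all speeds at or above a minimal speed $s^*$.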

The Persistence of Cultural Memory: Investigating Multimodal Iconicity in Diffusion Models

arXiv:2511.11435v2 Announce Type: replace-cross Abstract: The ambiguity between generalization and memorization in text-to-image (TTI) diffusion models becomes pronounced when prompts invoke culturally shared visual references, a phenomenon we term multimodal iconicity. These are instances in which images and texts reflect established cultural associations, such as when a title recalls a familiar artwork or film scene. Such […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales (registration number 16808844) at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK.