Direct 3D-Aware Object Insertion via Decomposed Visual Proxies

arXiv:2606.06601v1 Announce Type: cross Abstract: Object insertion aims to seamlessly composite a reference object into a specified region of a background image. Recent diffusion-based methods achieve high visual quality but formulate insertion as a simple 2D inpainting task, providing no explicit control over the object’s 3D pose and limiting their practical applicability. We propose DIRECT […]

Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference

arXiv:2605.26099v3 Announce Type: replace-cross Abstract: Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs […]

Beyond Waypoints: A Trajectory-Centric Waypointing Paradigm for Vision-Language Navigation

arXiv:2606.07244v1 Announce Type: cross Abstract: Vision-Language Navigation in Continuous Environments (VLN-CE) requires agents to follow natural-language instructions while navigating in real-world-like environments. Most VLN-CE approaches adopt a three-stage framework: a waypoint predictor proposes navigable waypoints, and a navigator selects the best waypoint, with a low-level controller executing the movement to it. However, this decoupled paradigm […]

Autoregression-Free Neural Operators for Time-Dependent PDEs

arXiv:2605.25413v3 Announce Type: replace-cross Abstract: Neural operators learn mappings from function-dependent inputs to solutions, providing an effective framework for solving partial differential equations (PDEs). For time-dependent PDEs, existing methods typically perform long-horizon prediction through autoregressive rollout directly in high-dimensional physical field spaces, where each predicted state is recursively fed back as the input for the […]

When Large Language Models Fail in Healthcare: Evaluating Sensitivity to Prompt Variations

arXiv:2606.07237v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used in healthcare for tasks such as clinical question answering, diagnosis support, and report summarization. Despite their promise, these models remain highly sensitive to subtle prompt perturbations, both lexical and syntactic, posing serious risks in safety-critical clinical applications. In this study, we conduct a […]

ActQuant: Sub-4-bit Action-Guided Quantization for Vision-Language-Action Models

arXiv:2605.24011v2 Announce Type: replace-cross Abstract: Vision-Language-Action (VLA) models exhibit remarkable action generation for embodied intelligence, but their heavy compute makes deployment on edge platforms impractical. Aggressive, sub-4-bit weight quantization is the natural solution, yet existing post-training quantization (PTQ) methods suffer severe performance degradation in this regime. To address this, we introduce ActQuant, an action-guided mixed-precision […]

DualGate-Net: A Prior-Gated Dual-Encoder Framework for Histopathology Cell Detection

arXiv:2606.07222v1 Announce Type: cross Abstract: Cell detection in histopathology images strongly depends on surrounding tissue context, where visually similar cells may belong to different classes under different microenvironments. Recent tissue-aware methods incorporate contextual priors, but often rely on static fusion strategies that may propagate noisy information. In this work, we propose DualGate-Net, a prior-aware dual-encoder […]

CHoE: Cross-Domain Heterogeneous Graph Prompt Learning via Structure-Conditioned Experts

arXiv:2605.15888v2 Announce Type: replace-cross Abstract: Heterogeneous Graph Prompt Learning (HGPL) has emerged as a promising paradigm for bridging the gap between the objectives of pre-training foundation models and their downstream applications in heterogeneous graph settings. However, existing HGPL methods are primarily designed for in-domain scenarios, whereas real-world deployments often span multiple domains, and the data used […]

An Abstract Architecture for Explainable Autonomy in Hazardous Environments

arXiv:2606.07211v1 Announce Type: cross Abstract: Autonomous robotic systems are being proposed for use in hazardous environments, often to reduce the risks to human workers. In the immediate future, it is likely that human workers will continue to use and direct these autonomous robots, much like other computerised tools but with more sophisticated decision-making. Therefore, one […]

Superintelligent Retrieval Agent: The Next Frontier of Agentic Retrieval

arXiv:2605.06647v2 Announce Type: replace-cross Abstract: Retrieval-augmented agents are increasingly the interface to large knowledge bases, yet most treat retrieval as a black box: they issue exploratory queries, inspect snippets, and reformulate until evidence emerges. This resembles how a newcomer searches an unfamiliar database rather than how an expert navigates it with strong priors about terminology […]

RETROSPECT: RETROsynthesis via Sequential Prediction, and Chemically Transformed-ranking

arXiv:2606.07181v1 Announce Type: cross Abstract: Single-step retrosynthesis needs both accurate first-ranked suggestions and candidate lists that are rich enough for downstream selection. We study this as a proposal-selection decomposition. Our system, RETROSPECT, combines a single Transformer proposal model, which we call the ChemAlign Transformer, with a LambdaMART reranker over structural, reaction-template, upstream-score, and optional DFT-derived […]

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844