ResBM: Residual Bottleneck Models for Low-Bandwidth Pipeline Parallelism

arXiv:2604.11947v1 Announce Type: cross Abstract: Unlocking large-scale low-bandwidth decentralized training has the potential to utilize otherwise untapped compute resources. In centralized settings, large-scale multi-node training is primarily enabled by data and pipeline parallelism, two techniques that require ultra-high-bandwidth communication. While efficient methods now exist for decentralized data parallelism, pipeline parallelism remains the primary challenge. Recent […]

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

arXiv:2604.12374v1 Announce Type: cross Abstract: We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120 billion (active 12 billion) parameter hybrid Mamba-Attention Mixture-of-Experts model. Nemotron 3 Super is the first model in the Nemotron 3 family to 1) be pre-trained in NVFP4, 2) leverage LatentMoE, a new Mixture-of-Experts architecture that optimizes for […]

KG-Reasoner: A Reinforced Model for End-to-End Multi-Hop Knowledge Graph Reasoning

arXiv:2604.12487v1 Announce Type: cross Abstract: Large Language Models (LLMs) exhibit strong abilities in natural language understanding and generation, yet they struggle with knowledge-intensive reasoning. Structured Knowledge Graphs (KGs) provide an effective form of external knowledge representation and have been widely used to enhance performance in classical Knowledge Base Question Answering (KBQA) tasks. However, performing precise […]

Continuous Knowledge Metabolism: Generating Scientific Hypotheses from Evolving Literature

arXiv:2604.12243v1 Announce Type: cross Abstract: Scientific hypothesis generation requires tracking how knowledge evolves, not just what is currently known. We introduce Continuous Knowledge Metabolism (CKM), a framework that processes scientific literature through sliding time windows and incrementally updates a structured knowledge base as new findings arrive. We present CKM-Lite, an efficient variant that achieves strong […]

Self-Monitoring Benefits from Structural Integration: Lessons from Metacognition in Continuous-Time Multi-Timescale Agents

arXiv:2604.11914v1 Announce Type: new Abstract: Self-monitoring capabilities — metacognition, self-prediction, and subjective duration — are often proposed as useful additions to reinforcement learning agents. But do they actually help? We investigate this question in a continuous-time multi-timescale agent operating in predator-prey survival environments of varying complexity, including a 2D partially observable variant. We first show […]

IAD-Unify: A Region-Grounded Unified Model for Industrial Anomaly Segmentation, Understanding, and Generation

arXiv:2604.12440v1 Announce Type: cross Abstract: Real-world industrial inspection requires not only localizing defects, but also explaining them in natural language and generating controlled defect edits. However, existing approaches fail to jointly support all three capabilities within a unified framework and evaluation protocol. We propose IAD-Unify, a dual-encoder unified framework in which a frozen DINOv2-based region […]

Evaluating the Limitations of Protein Sequence Representations for Parkinson’s Disease Classification

arXiv:2604.11852v1 Announce Type: new Abstract: The identification of reliable molecular biomarkers for Parkinson’s disease remains challenging due to its multifactorial nature. Although protein sequences constitute a fundamental and widely available source of biological information, their standalone discriminative capacity for complex disease classification remains unclear. In this work, we present a controlled and leakage-free evaluation of […]

The Non-Optimality of Scientific Knowledge: Path Dependence, Lock-In, and The Local Minimum Trap

arXiv:2604.11828v1 Announce Type: new Abstract: Science is widely regarded as humanity’s most reliable method for uncovering truths about the natural world. Yet the emphtrajectory of scientific discovery is rarely examined as an optimization problem in its own right. This paper argues that the body of scientific knowledge, at any given historical moment, represents a emphlocal […]

How Transformers Learn to Plan via Multi-Token Prediction

arXiv:2604.11912v1 Announce Type: cross Abstract: While next-token prediction (NTP) has been the standard objective for training language models, it often struggles to capture global structure in reasoning tasks. Multi-token prediction (MTP) has recently emerged as a promising alternative, yet its underlying mechanisms remain poorly understood. In this paper, we study how MTP facilitates reasoning, with […]

Context Kubernetes: Declarative Orchestration of Enterprise Knowledge for Agentic AI Systems

arXiv:2604.11623v2 Announce Type: replace Abstract: We introduce Context Kubernetes, an architecture for orchestrating enterprise knowledge in agentic AI systems, with a prototype implementation and eight experiments. The core observation is that delivering the right knowledge, to the right agent, with the right permissions, at the right freshness — across an entire organization — is structurally […]

The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results

arXiv:2604.11998v1 Announce Type: cross Abstract: Cross-domain few-shot object detection (CD-FSOD) remains a challenging problem for existing object detectors and few-shot learning approaches, particularly when generalizing across distinct domains. As part of NTIRE 2026, we hosted the second CD-FSOD Challenge to systematically evaluate and promote progress in detecting objects in unseen target domains under limited annotation […]

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844