HybridEP: Scaling Expert Parallelism to Cross-Datacenter Scenario via Hybrid Expert/Data Transmission

arXiv:2510.19470v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) has become a popular architecture for scaling large models. However, the rapidly growing scale outpaces model training on a single DC, driving a shift toward a more flexible, cross-DC training paradigm. Under this, Expert Parallelism (EP) of MoE faces significant scalability issues due to the limited cross-DC bandwidth. […]

Optimizing the Unknown: Black Box Bayesian Optimization with Energy-Based Model and Reinforcement Learning

arXiv:2510.19530v1 Announce Type: cross Abstract: Existing Bayesian Optimization (BO) methods typically balance exploration and exploitation to optimize costly objective functions. However, these methods often suffer from a significant one-step bias, which may lead to convergence towards local optima and poor performance in complex or high-dimensional tasks. Recently, Black-Box Optimization (BBO) has achieved success across various […]

Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning

arXiv:2510.19338v1 Announce Type: cross Abstract: In this technical report, we present the Ring-linear model series, specifically including Ring-mini-linear-2.0 and Ring-flash-linear-2.0. Ring-mini-linear-2.0 comprises 16B parameters and 957M activations, while Ring-flash-linear-2.0 contains 104B parameters and 6.1B activations. Both models adopt a hybrid architecture that effectively integrates linear attention and softmax attention, significantly reducing I/O and computational overhead […]

ColorAgent: Building A Robust, Personalized, and Interactive OS Agent

arXiv:2510.19386v1 Announce Type: cross Abstract: With the advancements in hardware, software, and large language model technologies, the interaction between humans and operating systems has evolved from the command-line interface to the rapidly emerging AI agent interactions. Building an operating system (OS) agent capable of executing user instructions and faithfully following user desires is becoming a […]

An Active Diffusion Neural Network for Graphs

arXiv:2510.19202v1 Announce Type: cross Abstract: The analogy to heat diffusion has enhanced our understanding of information flow in graphs and inspired the development of Graph Neural Networks (GNNs). However, most diffusion-based GNNs emulate passive heat diffusion, which still suffers from over-smoothing and limits their ability to capture global graph information. Inspired by the heat death […]

Knowledge and Common Knowledge of Strategies

arXiv:2510.19298v1 Announce Type: cross Abstract: Most existing work on strategic reasoning simply adopts either an informed or an uninformed semantics. We propose a model where knowledge of strategies can be specified on a fine-grained level. In particular, it is possible to distinguish first-order, higher-order, and common knowledge of strategies. We illustrate the effect of higher-order […]

WebGraphEval: Multi-Turn Trajectory Evaluation for Web Agents using Graph Representation

arXiv:2510.19205v1 Announce Type: new Abstract: Current evaluation of web agents largely reduces to binary success metrics or conformity to a single reference trajectory, ignoring the structural diversity present in benchmark datasets. We present WebGraphEval, a framework that abstracts trajectories from multiple agents into a unified, weighted action graph. This representation is directly compatible with benchmarks […]

InvarGC: Invariant Granger Causality for Heterogeneous Interventional Time Series under Latent Confounding

arXiv:2510.19138v1 Announce Type: cross Abstract: Granger causality is widely used for causal structure discovery in complex systems from multivariate time series data. Traditional Granger causality tests based on linear models often fail to detect even mild non-linear causal relationships. Therefore, numerous recent studies have investigated non-linear Granger causality methods, achieving improved performance. However, these methods […]

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registeration number 16808844