The Weight Gram Matrix Captures Sequential Feature Linearization in Deep Networks

arXiv:2605.06258v1 Announce Type: cross Abstract: Understanding how deep neural networks learn representations remains a central challenge in machine learning theory. In this work, we propose a feature-centric framework for analyzing neural network training by relating weight updates to feature evolution. We introduce a simple identity, the Feature Learning Equation, which identifies the weight Gram matrix […]

Human-AI Co-Evolution and Epistemic Collapse: A Dynamical Systems Perspective

arXiv:2605.06347v1 Announce Type: cross Abstract: Large language models (LLMs) are reshaping how knowledge is produced, with increasing reliance on AI systems for generation, summarization, and reasoning. While prior work has studied cognitive offloading in humans and model collapse in recursive training, these effects are typically considered in isolation. We propose a unified perspective: humans and […]

Cross-Modal Navigation with Multi-Agent Reinforcement Learning

arXiv:2605.06595v1 Announce Type: cross Abstract: Robust embodied navigation relies on complementary sensory cues. However, high-quality and well-aligned multi-modal data is often difficult to obtain in practice. Training a monolithic model is also challenging as rich multi-modal inputs induce complex representations and substantially enlarge the policy space. Cross-modal collaboration among lightweight modality-specialized agents offers a scalable […]

ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation

arXiv:2605.06667v1 Announce Type: cross Abstract: For artistic applications, video generation requires fine-grained control over both performance and cinematography, i.e., the actor’s motion and the camera trajectory. We present ActCam, a zero-shot method for video generation that jointly transfers character motion from a driving video into a new scene and enables per-frame control of intrinsic and […]

SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition

arXiv:2511.21471v4 Announce Type: replace Abstract: Spatial cognition is fundamental to real-world multimodal intelligence, allowing models to effectively interact with the physical environment. While multimodal large language models (MLLMs) have made significant strides, existing benchmarks often oversimplify spatial cognition, reducing it to a single-dimensional metric, which fails to capture the hierarchical structure and interdependence of spatial […]

Position: agentic AI orchestration should be Bayes-consistent

arXiv:2605.00742v2 Announce Type: replace Abstract: LLMs excel at predictive tasks and complex reasoning tasks, but many high-value deployments rely on decisions under uncertainty, for example, which tool to call, which expert to consult, or how many resources to invest. While the usefulness and feasibility of Bayesian approaches remain unclear for LLM inference, this position paper […]

CatNet: Controlling the False Discovery Rate in LSTM with SHAP Feature Importance and Gaussian Mirrors

arXiv:2411.16666v4 Announce Type: replace-cross Abstract: We introduce CatNet, an algorithm that effectively controls False Discovery Rate (FDR) and selects significant features in LSTM. CatNet employs the derivative of SHAP values to quantify the feature importance, and constructs a vector-formed mirror statistic for FDR control with the Gaussian Mirror algorithm. To avoid instability due to nonlinear […]

Multi-Objective Instruction-Aware Representation Learning in Procedural Content Generation RL

arXiv:2508.09193v3 Announce Type: replace-cross Abstract: Recent advancements in generative modeling emphasize the importance of natural language as a highly expressive and accessible modality for controlling content generation. However, existing instructed reinforcement learning for procedural content generation (IPCGRL) methods often struggle to leverage the expressive richness of textual input, especially under complex, multi-objective instructions, leading to […]

SoccerMaster: A Vision Foundation Model for Soccer Understanding

arXiv:2512.11016v2 Announce Type: replace-cross Abstract: Soccer understanding has recently garnered growing research interest due to its domain-specific complexity and unique challenges. Unlike prior works that typically rely on isolated, task-specific expert models, this work aims to propose a unified model to handle diverse soccer visual understanding tasks, ranging from fine-grained perception (e.g., athlete detection and […]

Leviathan: Decoupling Input and Output Representations in Language Models

arXiv:2601.22040v2 Announce Type: replace-cross Abstract: Modern language models use a single matrix for input embedding and output projection. This couples two distinct objectives: token representation and discrimination over a vocabulary. This work introduces Leviathan, a Transformer architecture that replaces the input embedding matrix with learned embedding vectorization (LEV), a compact continuous mapping from token indices […]

Super-Level-Set Regression: Conditional Quantiles via Volume Minimization

arXiv:2605.06210v1 Announce Type: cross Abstract: Constructing minimum-volume prediction regions that satisfy conditional coverage is a fundamental challenge in multivariate regression. Standard approaches rely on explicitly estimating the full conditional density and subsequently thresholding it. This two-step plug-in process is notoriously difficult, sensitive to estimation errors, and computationally expensive. One would like to instead optimize the […]

Learning Discrete Autoregressive Priors with Wasserstein Gradient Flow

arXiv:2605.06148v1 Announce Type: cross Abstract: Discrete image tokenizers are commonly trained in two stages: first for reconstruction, and then with a prior model fitted to the frozen token sequences. This decoupling leaves the tokenizer unaware of the model that will later generate its tokens. As a result, the learned tokens may preserve image information well […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844