Modality Collapse as Mismatched Decoding: Information-Theoretic Limits of Multimodal LLMs

arXiv:2602.23136v2 Announce Type: replace-cross Abstract: Numerous studies have shown that multimodal LLMs process speech and images well yet fail in non-intuitive ways, rendering trivial tasks such as object counting unreliable. We investigate this behavior from an information-theoretic perspective by framing multimodal LLM inference as a mismatched decoder problem: a decoder trained primarily on text can […]
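The truncated abstract does not give the paper's exact formulation, but in classical information theory a mismatched decoder — one whose decoding metric q differs from the true channel law W — is analyzed via the generalized mutual information (GMI), a lower bound on the rates achievable under the wrong metric:

```latex
I_{\mathrm{GMI}}(P_X, W, q) \;=\; \sup_{s > 0}\;
\mathbb{E}_{P_X W}\!\left[ \log \frac{q(Y \mid X)^{s}}
{\sum_{x'} P_X(x')\, q(Y \mid x')^{s}} \right]
```

When q = W the GMI recovers the ordinary mutual information; any mismatch can only shrink it, which is the intuition behind framing a text-trained decoder reading non-text inputs as a rate loss.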

AI End-to-End Radiation Treatment Planning Under One Second

arXiv:2603.06338v1 Announce Type: cross Abstract: Artificial intelligence-based radiation therapy (RT) planning has the potential to reduce planning time and inter-planner variability, improving efficiency and consistency in clinical workflows. Most existing automated approaches rely on multiple dose evaluations and corrections, resulting in plan generation times of several minutes. We introduce AIRT (Artificial Intelligence-based Radiotherapy), an end-to-end […]

Theory of Code Space: Do Code Agents Understand Software Architecture?

arXiv:2603.00601v3 Announce Type: replace-cross Abstract: AI code agents excel at isolated tasks yet struggle with multi-file software engineering requiring architectural understanding. We introduce Theory of Code Space (ToCS), a benchmark that evaluates whether agents can construct, maintain, and update coherent architectural beliefs during codebase exploration. Agents explore procedurally generated codebases under partial observability — opening […]

FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment

arXiv:2602.17095v2 Announce Type: replace-cross Abstract: Parameter-efficient fine-tuning techniques such as low-rank adaptation (LoRA) enable large language models (LLMs) to adapt to downstream tasks efficiently. Federated learning (FL) further facilitates this process by enabling collaborative fine-tuning across distributed clients without sharing private data. However, the use of two separate low-rank matrices in LoRA for federated fine-tuning […]
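The "two separate low-rank matrices" the abstract refers to are LoRA's standard down- and up-projection pair. A minimal numpy sketch of the adapted forward pass (variable names are illustrative, not FLoRG's):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 64, 4                  # layer dims and LoRA rank (r << d)

W = rng.normal(size=(d, k))          # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection; zero-init
                                     # makes the adapter start as a no-op
alpha = 8.0                          # LoRA scaling hyperparameter

def lora_forward(x):
    """Adapted forward pass: only A and B are updated in fine-tuning."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(2, k))
# With B = 0 the adapted output equals the frozen model's output.
assert np.allclose(lora_forward(x), x @ W.T)
```

In federated fine-tuning each client holds its own A and B, and naive server-side averaging of the two factors separately is what motivates alternatives such as the Gram-matrix and Procrustes-alignment formulation in the title.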

IntelliAsk: Learning to Ask High-Quality Research Questions via RLVR

arXiv:2602.15849v2 Announce Type: replace-cross Abstract: Peer review relies on substantive, evidence-based questions, yet current LLMs generate surface-level queries that perform worse than human reviewer questions in expert evaluation. To address this gap, we curate a high-quality dataset of reviewer questions from OpenReview and conduct a human preference study where expert annotators evaluate question-paper pairs across […]

MoEless: Efficient MoE LLM Serving via Serverless Computing

arXiv:2603.06350v1 Announce Type: cross Abstract: Large Language Models (LLMs) have become a cornerstone of AI, driving progress across diverse domains such as content creation, search and recommendation systems, and AI-assisted workflows. To alleviate extreme training costs while advancing model scale, Mixture-of-Experts (MoE) has become a popular backbone for modern LLMs, which are commonly served in […]
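The relevance of MoE to serverless serving follows from its sparse activation: only the top-k gated experts run per token, so idle experts need not be resident. A generic top-k gating sketch (not MoEless's system, just the common MoE routing pattern):

```python
import numpy as np

def moe_layer(x, gate_w, experts, k=2):
    """Route each token to its top-k experts and mix their outputs
    by the renormalized gate probabilities (standard MoE routing)."""
    logits = x @ gate_w                           # (tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    top = np.argsort(probs, axis=-1)[:, -k:]      # k largest gates per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = probs[t, top[t]]
        sel = sel / sel.sum()                     # renormalize over chosen experts
        for w, e in zip(sel, top[t]):
            out[t] += w * experts[e](x[t])        # only k experts are evaluated
    return out

rng = np.random.default_rng(1)
d, n_exp = 8, 4
experts = [(lambda W: (lambda v: v @ W))(rng.normal(size=(d, d)))
           for _ in range(n_exp)]
x = rng.normal(size=(3, d))
gate_w = rng.normal(size=(d, n_exp))
y = moe_layer(x, gate_w, experts, k=2)
assert y.shape == x.shape
```

Because each request touches only k of n experts, per-expert serverless functions can be cold until routed to — the property a serving system like MoEless presumably exploits.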

Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

arXiv:2603.02175v3 Announce Type: replace-cross Abstract: Instruction-based video editing has witnessed rapid progress, yet current methods often struggle with precise visual control, as natural language is inherently limited in describing complex visual nuances. Although reference-guided editing offers a robust solution, its potential is currently bottlenecked by the scarcity of high-quality paired training data. To bridge this […]

Dynamic Chunking Diffusion Transformer

arXiv:2603.06351v1 Announce Type: cross Abstract: Diffusion Transformers process images as fixed-length sequences of tokens produced by a static patchify operation. While effective, this design spends uniform compute on low- and high-information regions alike, ignoring that images contain regions of varying detail and that the denoising process progresses from coarse structure at early timesteps to fine […]
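The fixed-length sequence comes from the static patchify step itself: every image of a given resolution yields the same token count regardless of content, which is the uniformity dynamic chunking aims to relax. A minimal sketch:

```python
import numpy as np

def patchify(img, p):
    """Static patchify: split an (H, W, C) image into non-overlapping
    p x p patches and flatten each patch into one token vector."""
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0, "image must tile evenly"
    tokens = (img.reshape(H // p, p, W // p, p, C)
                 .transpose(0, 2, 1, 3, 4)
                 .reshape(-1, p * p * C))
    return tokens  # sequence length is fixed at (H/p) * (W/p)

img = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
tokens = patchify(img, p=8)
assert tokens.shape == (16, 192)   # (32/8)^2 tokens of 8*8*3 features
```

A flat sky patch and a densely textured patch each cost one token here; a dynamic scheme would instead vary patch granularity with local detail and with the denoising timestep.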

Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles

arXiv:2603.02406v2 Announce Type: replace-cross Abstract: Generative models have recently advanced de novo protein design by learning the statistical regularities of natural structures. However, current approaches face three key limitations: (1) Existing methods cannot jointly learn protein geometry and design tasks, where pretraining can be a solution; (2) Current pretraining methods mostly rely on local, non-rigid […]

Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations

arXiv:2603.03332v2 Announce Type: replace-cross Abstract: Chain-of-Thought (CoT) prompting has emerged as a foundational technique for eliciting reasoning from Large Language Models (LLMs), yet the robustness of this approach to corruptions in intermediate reasoning steps remains poorly understood. This paper presents a comprehensive empirical evaluation of LLM robustness to a structured taxonomy of 5 CoT perturbation […]
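The abstract mentions a structured taxonomy of 5 CoT perturbation types without listing them. A sketch of what a step-level perturbation harness might look like — the operator names below are hypothetical, not the paper's taxonomy:

```python
import random

# Hypothetical perturbation operators over a list of reasoning steps.
def delete_step(steps, rng):
    i = rng.randrange(len(steps))
    return steps[:i] + steps[i + 1:]

def shuffle_steps(steps, rng):
    out = steps[:]
    rng.shuffle(out)
    return out

def inject_irrelevant(steps, rng, noise="The sky is blue."):
    i = rng.randrange(len(steps) + 1)
    return steps[:i] + [noise] + steps[i:]

def perturb(steps, kind, seed=0):
    """Apply one named perturbation deterministically (seeded RNG)."""
    rng = random.Random(seed)
    ops = {"delete": delete_step,
           "shuffle": shuffle_steps,
           "inject": inject_irrelevant}
    return ops[kind](steps, rng)

cot = ["Step 1: 12 apples total.",
       "Step 2: give away 5.",
       "Step 3: 7 remain."]
assert len(perturb(cot, "delete")) == 2
assert len(perturb(cot, "inject")) == 4
```

Robustness is then measured by feeding the perturbed chain back to the model and checking whether the final answer survives the corruption.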

CLAIRE: Compressed Latent Autoencoder for Industrial Representation and Evaluation — A Deep Learning Framework for Smart Manufacturing

arXiv:2603.06361v1 Announce Type: cross Abstract: Accurate fault detection in high-dimensional industrial environments remains a major challenge due to the inherent complexity, noise, and redundancy in sensor data. This paper introduces CLAIRE, a hybrid end-to-end learning framework that integrates unsupervised deep representation learning with supervised classification for intelligent quality control in smart manufacturing systems. It […]
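CLAIRE's architecture is not detailed in the truncated abstract; as a minimal stand-in, the compress-then-classify pattern it describes can be illustrated with a linear autoencoder (whose optimal solution is PCA, obtained here via SVD) followed by a simple classifier on the compressed latents:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic "sensor" data: two fault classes in 50-D, noisy and redundant.
n, d, latent = 200, 50, 3
X0 = rng.normal(0.0, 1.0, size=(n, d))
X1 = rng.normal(2.0, 1.0, size=(n, d))
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

# Unsupervised stage: a linear autoencoder's optimum is PCA, so the
# encoder/decoder come from an SVD of the centered data.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
encode = lambda A: (A - mu) @ Vt[:latent].T      # 50-D -> 3-D latent
decode = lambda Z: Z @ Vt[:latent] + mu          # 3-D latent -> 50-D

# Supervised stage: nearest-centroid classifier in latent space.
Z = encode(X)
centroids = np.stack([Z[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((Z[:, None, :] - centroids) ** 2).sum(-1), axis=1)
accuracy = (pred == y).mean()
assert accuracy > 0.95   # these synthetic classes are well separated
```

A deep nonlinear autoencoder replaces the SVD in practice; the pipeline shape — compress away noise and redundancy, then classify in the latent space — is the same.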

Large-Language-Model-Guided State Estimation for Partially Observable Task and Motion Planning

arXiv:2603.03704v2 Announce Type: replace-cross Abstract: Robot planning in partially observable environments, where not all objects are known or visible, is a challenging problem, as it requires reasoning under uncertainty through partially observable Markov decision processes. During the execution of a computed plan, a robot may unexpectedly observe task-irrelevant objects, which are typically ignored by naive […]
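State estimation under partial observability rests on the standard POMDP belief update: a Bayes filter over a distribution b(s) across hidden states, b'(s') ∝ O(o|s') Σ_s T(s'|s,a) b(s). The abstract does not say how the LLM guides this; the sketch below shows only the classical filter on a toy two-state example:

```python
import numpy as np

def belief_update(b, T, O, obs):
    """Discrete Bayes filter: b'(s') ∝ O[obs, s'] * sum_s T[s, s'] * b[s]."""
    predicted = b @ T             # predict step: push belief through transitions
    unnorm = O[obs] * predicted   # correct step: weight by observation likelihood
    return unnorm / unnorm.sum()

# Toy 2-state world: state 0 = "object present", state 1 = "object absent".
b = np.array([0.5, 0.5])                  # uniform prior belief
T = np.array([[0.9, 0.1],                 # T[s, s'] = P(s' | s, action)
              [0.1, 0.9]])
O = np.array([[0.8, 0.2],                 # O[o, s'] = P(o | s')
              [0.2, 0.8]])
b = belief_update(b, T, O, obs=0)         # the robot observed o = 0
assert b[0] > b[1]                        # belief shifts toward "present"
assert np.isclose(b.sum(), 1.0)
```

An unexpected, task-irrelevant observation enters this update like any other evidence, which is why naive planners that ignore such observations leave information on the table.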

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK; registration number 16808844.