SAGE-32B: Agentic Reasoning via Iterative Distillation

arXiv:2601.04237v2 Announce Type: replace Abstract: We demonstrate SAGE-32B, a 32 billion parameter language model that focuses on agentic reasoning and long range planning tasks. Unlike chat models that aim for general conversation fluency, SAGE-32B is designed to operate in an agentic loop, emphasizing task decomposition, tool usage, and error recovery. The model is initialized from […]

Mind the (DH) Gap! A Contrast in Risky Choices Between Reasoning and Conversational LLMs

arXiv:2602.15173v2 Announce Type: replace Abstract: The use of large language models either as decision support systems, or in agentic workflows, is rapidly transforming the digital ecosystem. However, the understanding of LLM decision-making under uncertainty remains limited. We study LLM risky choices along two dimensions: (1) prospect representation (based on an explicit representation or outcome history) […]

EHRAG: Bridging Semantic Gaps in Lightweight GraphRAG via Hybrid Hypergraph Construction and Retrieval

arXiv:2604.17458v2 Announce Type: replace Abstract: Graph-based Retrieval-Augmented Generation (GraphRAG) enhances LLMs by structuring corpus into graphs to facilitate multi-hop reasoning. While recent lightweight approaches reduce indexing costs by leveraging Named Entity Recognition (NER), they rely strictly on structural co-occurrence, failing to capture latent semantic connections between disjoint entities. To address this, we propose EHRAG, a […]

ViDoRe V3: A Comprehensive Evaluation of Retrieval Augmented Generation in Complex Real-World Scenarios

arXiv:2601.08620v2 Announce Type: replace Abstract: Retrieval-Augmented Generation (RAG) pipelines must address challenges beyond simple single-document retrieval, such as interpreting visual elements (tables, charts, images), synthesizing information across documents, and providing accurate source grounding. Existing benchmarks fail to capture this complexity, often focusing on textual data, single-document comprehension, or evaluating retrieval and generation in isolation. We […]

GeoLaux: A Benchmark for Evaluating MLLMs’ Geometry Performance on Long-Step Problems Requiring Auxiliary Lines

arXiv:2508.06226v2 Announce Type: replace Abstract: Geometry problem solving (GPS) poses significant challenges for Multimodal Large Language Models (MLLMs) in diagram comprehension, knowledge application, long-step reasoning, and auxiliary line construction. However, current benchmarks lack fine-grained evaluation for long-step problems necessitating auxiliary construction. To address these limitations, we present GeoLaux, a fine-grained annotated dataset comprising 2186 calculation […]

Multi-Cycle Spatio-Temporal Adaptation in Human-Robot Teaming

arXiv:2604.19670v1 Announce Type: cross Abstract: Effective human-robot teaming is crucial for the practical deployment of robots in human workspaces. However, optimizing joint human-robot plans remains a challenge due to the difficulty of modeling individualized human capabilities and preferences. While prior research has leveraged the multi-cycle structure of domains like manufacturing to learn an individual’s tendencies […]

STaD: Scaffolded Task Design for Identifying Compositional Skill Gaps in LLMs

arXiv:2604.18177v2 Announce Type: replace-cross Abstract: Benchmarks are often used as a standard to understand LLM capabilities in different domains. However, aggregate benchmark scores provide limited insight into compositional skill gaps of LLMs and how to improve them. To make these weaknesses visible, we propose Scaffolded Task Design (STaD) framework. STaD generates controlled variations of benchmark […]

Bridging the High-Frequency Data Gap: A Millisecond-Resolution Network Dataset for Advancing Time Series Foundation Models

arXiv:2603.16497v2 Announce Type: replace-cross Abstract: Time series foundation models (TSFMs) require diverse, real-world datasets to adapt across varying domains and temporal frequencies. However, current large-scale datasets predominantly focus on low-frequency time series with sampling intervals, i.e., time resolution, in the range of seconds to years, hindering their ability to capture the nuances of high-frequency time […]

ODMA: On-Demand Memory Allocation Strategy for LLM Serving on LPDDR-Class Accelerators

arXiv:2512.09427v5 Announce Type: replace-cross Abstract: Existing memory management techniques severely hinder efficient Large Language Model serving on accelerators constrained by poor random-access bandwidth.While static pre-allocation preserves memory contiguity,it incurs significant overhead due to worst-case provisioning.Conversely,fine-grained paging mitigates this overhead but relies on HBM’s high random-access tolerance, making it unsuitable for LPDDR systems where non-sequential access […]

GAIR: Location-Aware Self-Supervised Contrastive Pre-Training with Geo-Aligned Implicit Representations

arXiv:2503.16683v2 Announce Type: replace-cross Abstract: Vision Transformer (ViT) has been widely used in computer vision tasks with excellent results by providing representations for a whole image or image patches. However, ViT lacks detailed localized image representations at arbitrary positions when applied to geospatial tasks that involve multiple geospatial data modalities, such as overhead remote sensing […]

BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design

arXiv:2508.21184v3 Announce Type: replace-cross Abstract: We propose a general-purpose approach for improving the ability of large language models (LLMs) to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental design (BED). This enables LLMs to act as effective multi-turn conversational agents and interactively interface with […]

AI-Based Detection of Temporal Changes in MR-Linac Images Acquired During Routine Prostate Radiotherapy

arXiv:2602.04983v2 Announce Type: replace-cross Abstract: Purpose: To investigate whether an AI-based method can detect subtle inter-fraction changes in MR-Linac images acquired during radiotherapy and explore the broader potential of MRLinac imaging. Methods: This retrospective study included longitudinal 0.35T MR-Linac images from 761 patients. To identify temporal changes, we employed a deep learning model using temporal […]

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844