EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce

arXiv:2512.08868v2 Announce Type: replace Abstract: Foundation agents have rapidly advanced in their ability to reason and interact with real environments, making the evaluation of their core capabilities increasingly important. While many benchmarks have been developed to assess agent performance, most concentrate on academic settings or artificially designed scenarios while overlooking the challenges that arise in […]

Optimizing the non-Clifford-count in unitary synthesis using Reinforcement Learning

arXiv:2509.21709v2 Announce Type: replace-cross Abstract: In this paper we study the potential of using reinforcement learning (RL) in order to synthesize quantum circuits, while optimizing the T-count and CS-count, of unitaries that are exactly implementable by the Clifford+T and Clifford+CS gate sets, respectively. We have designed our RL framework to work with channel representation of […]

ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning

arXiv:2512.10946v1 Announce Type: cross Abstract: Human-level contact-rich manipulation relies on the distinct roles of two key modalities: vision provides spatially rich but temporally slow global context, while force sensing captures rapid, high-frequency local contact dynamics. Integrating these signals is challenging due to their fundamental frequency and informational disparities. In this work, we propose ImplicitRDP, a […]

Achieving Trustworthy Real-Time Decision Support Systems with Low-Latency Interpretable AI Models

arXiv:2506.20018v2 Announce Type: replace Abstract: This paper investigates real-time decision support systems that leverage low-latency AI models, bringing together recent progress in holistic AI-driven decision tools, integration with Edge-IoT technologies, and approaches for effective human-AI teamwork. It looks into how large language models can assist decision-making, especially when resources are limited. The research also examines […]

$\mathrm{D}^\mathrm{3}$-Predictor: Noise-Free Deterministic Diffusion for Dense Prediction

arXiv:2512.07062v2 Announce Type: replace-cross Abstract: Although diffusion models with strong visual priors have emerged as powerful dense prediction backbones, they overlook a core limitation: the stochastic noise at the core of diffusion sampling is inherently misaligned with dense prediction that requires a deterministic mapping from image to geometry. In this paper, we show that this […]

The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality

arXiv:2512.10791v1 Announce Type: cross Abstract: We introduce The FACTS Leaderboard, an online leaderboard suite and associated set of benchmarks that comprehensively evaluates the ability of language models to generate factually accurate text across diverse scenarios. The suite provides a holistic measure of factuality by aggregating the performance of models on four distinct sub-leaderboards: (1) FACTS […]

Examining the Metrics for Document-Level Claim Extraction in Czech and Slovak

arXiv:2511.14566v2 Announce Type: replace-cross Abstract: Document-level claim extraction remains an open challenge in the field of fact-checking, and subsequently, methods for evaluating extracted claims have received limited attention. In this work, we explore approaches to aligning two sets of claims pertaining to the same source document and computing their similarity through an alignment score. We […]

Can LLMs Detect Their Confabulations? Estimating Reliability in Uncertainty-Aware Language Models

arXiv:2508.08139v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) are prone to generating fluent but incorrect content, known as confabulation, which poses increasing risks in multi-turn or agentic applications where outputs may be reused as context. In this work, we investigate how in-context information influences model behavior and whether LLMs can identify their unreliable responses. […]

Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation

arXiv:2511.17844v2 Announce Type: replace-cross Abstract: Fine-tuning large-scale text-to-video diffusion models to add new generative controls, such as those over physical camera parameters (e.g., shutter speed or aperture), typically requires vast, high-fidelity datasets that are difficult to acquire. In this work, we propose a data-efficient fine-tuning strategy that learns these controls from sparse, low-quality synthetic data. […]

An M-Health Algorithmic Approach to Identify and Assess Physiotherapy Exercises in Real Time

arXiv:2512.10437v1 Announce Type: cross Abstract: This work presents an efficient algorithmic framework for real-time identification, classification, and evaluation of human physiotherapy exercises using mobile devices. The proposed method interprets a kinetic movement as a sequence of static poses, which are estimated from camera input using a pose-estimation neural network. Extracted body keypoints are transformed into […]


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.