Structure-Adaptive Conformal Inference for Large-Scale Out-of-Distribution Testing

arXiv:2605.26429v1 Announce Type: cross Abstract: This paper addresses structured out-of-distribution (OOD) testing in high-stakes machine learning applications. Traditional conformal methods rely on joint exchangeability, making it difficult to incorporate auxiliary information such as spatiotemporal or grouping structures. To overcome this limitation, we propose the structure-adaptive conformal q-value (SCQ), a significance index that integrates individual test […]

Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization

arXiv:2605.26457v1 Announce Type: cross Abstract: AI coding agents are increasingly used to write real-world software, but ensuring that their outputs are correct remains a fundamental challenge. Formal verification offers a promising path: an agent generates code together with a machine-checked proof, guaranteeing that the code satisfies a formal specification. However, there is no guarantee that […]

Unveiling the Fragility of Vision-Language Models: Multi-Modal Adversarial Synergy via Texture-Constrained Perturbations and Cross-Modal Optimization

arXiv:2605.26501v1 Announce Type: cross Abstract: Large Vision-Language Models (LVLMs) have transformed multi-modal understanding, excelling in tasks like image captioning and visual question answering by integrating visual and textual inputs. However, their robustness against adversarial attacks, particularly those exploiting both modalities, remains underexplored, posing risks to critical applications like autonomous driving and content moderation. Existing attacks […]

Recursive Flow Matching

arXiv:2605.26535v1 Announce Type: cross Abstract: Generative models have emerged as a powerful paradigm for solving physics systems and modeling complex spatiotemporal dynamics. However, achieving high physical accuracy without incurring high computational cost remains a fundamental challenge, as existing approaches face a critical speed-fidelity trade-off. In this work, we introduce Recursive Flow Matching (RecFM), a generative […]

On the Error-Correcting Effects of Stochasticity in Discrete Diffusion

arXiv:2605.26582v1 Announce Type: cross Abstract: Discrete diffusion models achieve strong performance in text and image generation, but their inference remains slow and must inherently balance sampling efficiency and sample quality. In this work, we present a systematic study of how the degree of stochasticity in Markov transitions governs the sampling tradeoff. We show that highly […]

More Expressive Feedforward Layers: Part I. Token-Adaptive Mixing of Activations

arXiv:2605.26647v1 Announce Type: cross Abstract: Feedforward network (FFN) layers account for a large fraction of parameters and nonlinear expressivity in Transformer-based large language models (LLMs). Despite the evolution from ReLU and GELU to gated variants such as SwiGLU, most FFN designs still use a single fixed activation function, applying the same nonlinear transformation to all […]

MatFormBench: A Benchmarking Evaluation Framework for Target-Driven Materials Formulation

arXiv:2605.26741v1 Announce Type: cross Abstract: Inverse design of materials has significantly advanced target-driven formulation optimization, yet existing materials machine learning benchmarks remain limited to forward property prediction, failing to systematically evaluate inverse optimization and generation algorithms, a critical gap that hinders the progress of target-driven materials design. To address this limitation, we propose MatFormBench, a […]

Implementation of Big Data Analytics for Diabetes Management: Needs Assessment in the Rwanda Healthcare System

arXiv:2605.26786v1 Announce Type: cross Abstract: Diabetes is a chronic metabolic disease that can lead to serious health problems if not diagnosed and managed early. Big Data Analytics (BDA) and machine learning offer practical tools for analyzing large health datasets and supporting early detection and better treatment decisions. However, their use in routine clinical practice is […]

Persistent AI Agents in Academic Research: A Single-Investigator Implementation Case Study

arXiv:2605.26870v1 Announce Type: cross Abstract: Background: Large language models are typically evaluated as models, benchmarks, or short conversational episodes. Less is known about what happens when an agent is embedded persistently in a real academic research environment with durable memory, local files, external tools, scheduled routines, delegated roles, and explicit safety protocols. Methods: A structured […]

EEG-FM-Audit: A Systematic Evaluation and Analysis Pipeline for EEG Foundation Models

arXiv:2605.26910v1 Announce Type: cross Abstract: Large EEG Foundation Models (FMs) have shown great potential for decoding EEG signals across diverse cognitive tasks. However, existing EEG-FM studies exhibit three critical limitations: opaque supervised baseline tuning, unverified contributions of complex learning paradigms, and a lack of transparency in model decision-making. To address these, we propose EEG-FM-Audit, a […]

Probabilistic Recurrent Intention Switching Model

arXiv:2605.26998v1 Announce Type: cross Abstract: Inverse reinforcement learning (IRL) recovers reward functions from observed behavior, yet traditional methods assume a single stationary reward that cannot capture goal switching within an episode. Recent multi-intention IRL methods address this by segmenting trajectories, but model intention transitions as either a memoryless Markov chain or via manual state augmentation […]

Anchor: Mitigating Artifact Drift in Agent Benchmark Generation

arXiv:2605.26321v1 Announce Type: new Abstract: AI agents are beginning to complete valuable, long-horizon business operations tasks, but training and evaluation environments for enterprise work still struggle to balance realism, verifiability, and scale. Environment and task creation frequently suffers from a failure mode we call artifact drift: when instructions, environments, oracles, and verifiers are created by […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844