Reflection of Episodes: Learning to Play Game from Expert and Self Experiences

arXiv:2502.13388v3 Announce Type: replace Abstract: StarCraft II is a complex and dynamic real-time strategy (RTS) game environment that is well suited to artificial intelligence and reinforcement learning research. To address the problem of Large Language Model (LLM) learning in complex environments through self-reflection, we propose a Reflection of Episodes (ROE) framework based on expert experience and self-experience. […]

Discovering Failure Modes in Vision-Language Models using RL

arXiv:2604.04733v1 Announce Type: cross Abstract: Vision-language Models (VLMs), despite achieving strong performance on multimodal benchmarks, often misinterpret straightforward visual concepts that humans identify effortlessly, such as counting, spatial reasoning, and viewpoint understanding. Previous studies manually identified these weaknesses and found that they often stem from deficits in specific skills. However, such manual efforts are costly, […]

Discrete Prototypical Memories for Federated Time Series Foundation Models

arXiv:2604.04475v1 Announce Type: cross Abstract: Leveraging Large Language Models (LLMs) as federated learning (FL)-based time series foundation models offers a promising way to transfer the generalization capabilities of LLMs to time series data while preserving access to private data. However, the semantic misalignment between time-series data and the text-centric latent space of existing LLMs often […]

Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training

arXiv:2603.28921v2 Announce Type: replace-cross Abstract: Standard neural network training uses constant momentum (typically 0.9), a convention dating to 1964 with limited theoretical justification for its optimality. We derive a time-varying momentum schedule from the critically damped harmonic oscillator: mu(t) = 1 - 2*sqrt(alpha(t)), where alpha(t) is the current learning rate. This beta-schedule requires zero free […]
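The formula quoted in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the clamp at zero, and the cosine learning-rate schedule are assumptions added here, and only the expression mu = 1 - 2*sqrt(alpha) comes from the abstract.

```python
import math


def critical_damping_momentum(alpha: float) -> float:
    """Return mu = 1 - 2*sqrt(alpha), the expression quoted in the abstract."""
    if alpha < 0:
        raise ValueError("learning rate must be non-negative")
    mu = 1.0 - 2.0 * math.sqrt(alpha)
    # Assumption (not from the abstract): clamp at 0 so that learning rates
    # above 0.25 do not yield a negative momentum coefficient.
    return max(mu, 0.0)


def cosine_lr(step: int, total_steps: int, peak_lr: float) -> float:
    """Hypothetical cosine-decay learning-rate schedule, for illustration only."""
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * step / total_steps))


if __name__ == "__main__":
    # Print the momentum implied by the learning rate at a few training steps.
    for step in (0, 50, 100):
        alpha = cosine_lr(step, total_steps=100, peak_lr=0.01)
        print(step, alpha, critical_damping_momentum(alpha))
```

Under this formula, the conventional momentum of 0.9 corresponds to a learning rate of 0.0025, since 1 - 2*sqrt(0.0025) = 0.9, and the momentum rises toward 1 as the learning rate decays toward 0.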

GROUNDEDKG-RAG: Grounded Knowledge Graph Index for Long-document Question Answering

arXiv:2604.04359v1 Announce Type: cross Abstract: Retrieval-augmented generation (RAG) systems have been widely adopted in contemporary large language models (LLMs) due to their ability to improve generation quality while reducing the required input context length. In this work, we focus on RAG systems for long-document question answering. Current approaches suffer from a heavy reliance on LLM […]

AI Agents Under EU Law

arXiv:2604.04604v1 Announce Type: cross Abstract: AI agents – i.e. AI systems that autonomously plan, invoke external tools, and execute multi-step action chains with reduced human involvement – are being deployed at scale across enterprise functions ranging from customer service and recruitment to clinical decision support and critical infrastructure management. The EU AI Act (Regulation 2024/1689) […]

Circuit Mechanisms for Spatial Relation Generation in Diffusion Transformers

arXiv:2601.06338v2 Announce Type: replace Abstract: Diffusion Transformers (DiTs) have greatly advanced text-to-image generation, but models still struggle to generate the correct spatial relations between objects as specified in the text prompt. In this study, we adopt a mechanistic interpretability approach to investigate how a DiT can generate correct spatial relations between objects. We train, from […]

Talk to Right Specialists: Iterative Routing in Multi-agent Systems for Question Answering

arXiv:2501.07813v2 Announce Type: replace-cross Abstract: Retrieval-augmented generation (RAG) agents are increasingly deployed to answer questions over local knowledge bases that cannot be centralized due to knowledge-sovereignty constraints. This results in two recurring failures in production: users do not know which agent to consult, and complex questions require evidence distributed across multiple agents. To overcome these […]

Measuring Competency, Not Performance: Item-Aware Evaluation Across Medical Benchmarks

arXiv:2509.24186v2 Announce Type: replace-cross Abstract: Accuracy-based evaluation of Large Language Models (LLMs) measures benchmark-specific performance rather than underlying medical competency: it treats all questions as equally informative, conflates model ability with item characteristics, and thereby produces rankings that vary with benchmark choice. To address this, we introduce MedIRT, a psychometric evaluation framework grounded in Item […]

Fairness in Healthcare Processes: A Quantitative Analysis of Decision Making in Triage

arXiv:2601.11065v3 Announce Type: replace-cross Abstract: Fairness in automated decision-making has become a critical concern, particularly in high-pressure healthcare scenarios such as emergency triage, where fast and equitable decisions are essential. Process mining research is increasingly investigating fairness, and a growing area focuses on fairness-aware algorithms. So far, less is known about how these concepts perform on […]

CellFluxRL: Biologically-Constrained Virtual Cell Modeling via Reinforcement Learning

arXiv:2603.21743v3 Announce Type: replace-cross Abstract: Building virtual cells with generative models to simulate cellular behavior in silico is emerging as a promising paradigm for accelerating drug discovery. However, prior image-based generative approaches can produce implausible cell images that violate basic physical and biological constraints. To address this, we propose to post-train virtual cell models with […]

Poisoned Identifiers Survive LLM Deobfuscation: A Case Study on Claude Opus 4.6

arXiv:2604.04289v1 Announce Type: cross Abstract: When an LLM deobfuscates JavaScript, can poisoned identifier names in the string table survive into the model’s reconstructed code, even when the model demonstrably understands the correct semantics? Using Claude Opus 4.6 across 192 inference runs on two code archetypes (force-directed graph simulation, A* pathfinding; 50 conditions, N=3-6), we found […]


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844