Stochastic ordering tools for continuous-time Markov chains and applications to reaction network models

arXiv:2604.00756v1 Announce Type: cross Abstract: Stochastic reaction networks are mathematical models with a wide range of applications in biochemistry, ecology, and epidemiology, and are often complex to analyze. Except for some special cases, it is generally difficult to predict how the abundances of all considered species evolve over time. A possible approach to address this […]

CarbonEdge: Carbon-Aware Deep Learning Inference Framework for Sustainable Edge Computing

arXiv:2603.27420v2 Announce Type: replace-cross Abstract: Deep learning applications at the network edge lead to a significant growth in AI-related carbon emissions, presenting a critical sustainability challenge. The existing edge computing frameworks optimize for latency and throughput, but they largely ignore the environmental impact of inference workloads. This paper introduces CarbonEdge, a carbon-aware deep learning inference […]

VibeGuard: A Security Gate Framework for AI-Generated Code

arXiv:2604.01052v1 Announce Type: cross Abstract: “Vibe coding,” in which developers delegate code generation to AI assistants and accept the output with little manual review, has gained rapid adoption in production settings. On March 31, 2026, Anthropic’s Claude Code CLI shipped a 59.8 MB source map file in its npm package, exposing roughly 512,000 lines of […]

Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents

arXiv:2509.25302v2 Announce Type: replace Abstract: The prevalent deployment of Large Language Model agents such as OpenClaw unlocks potential in real-world applications, while amplifying safety concerns. Among these concerns, the self-replication risk of LLM agents driven by objective misalignment (just like Agent Smith in the movie The Matrix) has transitioned from a theoretical warning to a […]

Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification

arXiv:2603.26648v2 Announce Type: replace-cross Abstract: Recent advances in large language models have improved the capabilities of coding agents, yet systematic evaluation of complex, end-to-end website development remains limited. To address this gap, we introduce Vision2Web, a hierarchical benchmark for visual website development, spanning from static UI-to-code generation, interactive multi-page frontend reproduction, to long-horizon full-stack website […]

CHIMERA-Bench: A Benchmark Dataset for Epitope-Specific Antibody Design

arXiv:2603.13431v2 Announce Type: replace-cross Abstract: Computational antibody design has seen rapid methodological progress, with dozens of deep generative methods proposed in the past three years, yet the field lacks a standardized benchmark for fair comparison and model development. These methods are evaluated on different SAbDab snapshots, non-overlapping test sets, and incompatible metrics, and the literature […]

Closing the Confidence-Faithfulness Gap in Large Language Models

arXiv:2603.25052v2 Announce Type: replace-cross Abstract: Large language models (LLMs) tend to verbalize confidence scores that are largely detached from their actual accuracy, yet the geometric relationship governing this behavior remain poorly understood. In this work, we present a mechanistic interpretability analysis of verbalized confidence, using linear probes and contrastive activation addition (CAA) steering to show […]

A CEFR-Inspired Classification Framework with Fuzzy C-Means To Automate Assessment of Programming Skills in Scratch

arXiv:2604.00730v1 Announce Type: cross Abstract: Context: Schools, training platforms, and technology firms increasingly need to assess programming proficiency at scale with transparent, reproducible methods that support personalized learning pathways. Objective: This study introduces a pedagogical framework for Scratch project assessment, aligned with the Common European Framework of Reference (CEFR), providing universal competency levels for students […]

Bridging the Simulation-to-Experiment Gap with Generative Models using Adversarial Distribution Alignment

arXiv:2604.01169v1 Announce Type: cross Abstract: A fundamental challenge in science and engineering is the simulation-to-experiment gap. While we often possess prior knowledge of physical laws, these physical laws can be too difficult to solve exactly for complex systems. Such systems are commonly modeled using simulators, which impose computational approximations. Meanwhile, experimental measurements more faithfully represent […]

GRASP: Gradient Realignment via Active Shared Perception for Multi-Agent Collaborative Optimization

arXiv:2604.00717v1 Announce Type: cross Abstract: Non-stationarity arises from concurrent policy updates and leads to persistent environmental fluctuations. Existing approaches like Centralized Training with Decentralized Execution (CTDE) and sequential update schemes mitigate this issue. However, since the perception of the policies of other agents remains dependent on sampling environmental interaction data, the agent essentially operates in […]

Mitigating Content Effects on Reasoning in Language Models through Fine-Grained Activation Steering

arXiv:2505.12189v3 Announce Type: replace Abstract: Large language models (LLMs) exhibit reasoning biases, often conflating content plausibility with formal logical validity. This can lead to wrong inferences in critical domains, where plausible arguments are incorrectly deemed logically valid or vice versa. This paper investigates how content biases on reasoning can be mitigated through activation steering, an […]

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844