Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks

arXiv:2603.04459v3 Announce Type: replace-cross Abstract: The rapid expansion of research in LLM safety presents challenges in tracking advancements, making benchmarks important evaluation infrastructures for identifying key trends and facilitating systematic comparisons. Yet no systematic assessment exists of their code quality and runnability, nor of what factors are associated with the community’s adoption of certain benchmarks […]

Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation

arXiv:2605.11651v3 Announce Type: replace-cross Abstract: Recent think-answer approaches in VLMs, such as Qwen3-VL-Thinking, boost reasoning performance by leveraging intermediate thinking steps before the final answer, but their computational cost becomes substantial, especially for larger VLMs. To distill such capabilities into compact think-answer VLMs, a primary objective is to improve the student’s ability to utilize visual […]

Traj-CoA: Patient Trajectory Modeling via Chain-of-Agents for Lung Cancer Risk Prediction

arXiv:2510.10454v2 Announce Type: replace Abstract: Large language models (LLMs) offer a generalizable approach for modeling patient trajectories, but suffer from the long and noisy nature of electronic health records (EHR) data in temporal reasoning. To address these challenges, we introduce Traj-CoA, a multi-agent system involving chain-of-agents for patient trajectory modeling. Traj-CoA employs a chain of […]

Rethinking Agentic Reinforcement Learning In Large Language Models

arXiv:2604.27859v3 Announce Type: replace Abstract: Reinforcement Learning (RL) has traditionally focused on training specialized agents to optimize predefined reward functions within narrowly defined environments. However, the advent of powerful Large Language Models (LLMs) and increasingly complex, open-ended tasks has catalyzed a paradigm shift towards agentic paradigms within RL. This emerging framework extends beyond traditional RL […]

Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

arXiv:2605.14892v2 Announce Type: replace Abstract: LLM-based autonomous agents have demonstrated strong capabilities in reasoning, planning, and tool use, yet remain limited when tasks require sustained coordination across roles, tools, and environments. Multi-agent systems address this through structured collaboration among specialized agents, but tighter coordination also amplifies a less explored risk: errors can propagate across agents […]

FAR: Function-preserving Attention Replacement for IMC-friendly Inference

arXiv:2505.21535v4 Announce Type: replace-cross Abstract: While transformers dominate modern vision and language models, their attention mechanism remains poorly suited for in-memory computing (IMC) devices due to intensive activation-to-activation multiplications and non-local memory access, leading to substantial latency and bandwidth overhead on ReRAM-based accelerators. To address this mismatch, we propose FAR, a Function-preserving Attention Replacement framework […]

ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking

arXiv:2510.13842v2 Announce Type: replace-cross Abstract: Knowledge poisoning poses a critical threat to Retrieval-Augmented Generation (RAG) systems by injecting adversarial content into knowledge bases, tricking Large Language Models (LLMs) into producing attacker-controlled outputs grounded in manipulated context. Prior work highlights LLMs’ susceptibility to misleading or malicious retrieved content. However, real-world fact-checking scenarios are more challenging, as […]

EMFusion: An Uncertainty-Aware Conditional Diffusion Framework for Frequency-Selective EMF Forecasting in Wireless Networks

arXiv:2512.15067v3 Announce Type: replace-cross Abstract: The rapid growth in wireless infrastructure has increased the need to accurately estimate and forecast electromagnetic field (EMF) levels to ensure ongoing compliance, assess potential health impacts, and support efficient network planning. While existing studies rely on univariate forecasting of wideband aggregate EMF data, frequency-selective multivariate forecasting is needed to […]

Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty

arXiv:2507.16806v2 Announce Type: replace-cross Abstract: When language models (LMs) are trained via reinforcement learning (RL) to generate natural language “reasoning chains”, their performance improves on a variety of difficult question answering tasks. Today, almost all successful applications of RL for reasoning use binary reward functions that evaluate the correctness of LM outputs. Because such reward […]

Sufficient Explanations in Databases and their Connections to Database Repairs

arXiv:2511.15623v2 Announce Type: replace-cross Abstract: We investigate the notion of sufficient explanation, and a sufficiency-degree as attribution score for database tuples in relation to query answering. We also investigate and exploit connections with database repairs as used for dealing with inconsistent databases; and with causality-based necessary explanations, obtaining new computational results. We show how to […]

Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training

arXiv:2602.00747v2 Announce Type: replace-cross Abstract: Determining an effective data mixture is a key factor in Large Language Model (LLM) pre-training, where models must balance general competence with proficiency on hard tasks such as math and code. However, identifying an optimal mixture remains an open challenge, as existing approaches either rely on unreliable tiny-scale proxy experiments […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844