arXiv:2603.04459v3 Announce Type: replace-cross Abstract: The rapid expansion of research in LLM safety presents challenges in tracking advancements, making benchmarks important evaluation infrastructures for identifying key trends and facilitating systematic comparisons. Yet no systematic assessment exists of their code quality and runnability, nor of what factors are associated with the community’s adoption of certain benchmarks […]
Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation
arXiv:2605.11651v3 Announce Type: replace-cross Abstract: Recent think-answer approaches in VLMs, such as Qwen3-VL-Thinking, boost reasoning performance by leveraging intermediate thinking steps before the final answer, but their computational cost becomes substantial, especially for larger VLMs. To distill such capabilities into compact think-answer VLMs, a primary objective is to improve the student’s ability to utilize visual […]
A Unified Generative-AI Framework for Smart Energy Infrastructure: Intelligent Gas Distribution, Utility Billing, Carbon Analytics, and Quantum-Inspired Optimisation
arXiv:2605.16232v1 Announce Type: cross Abstract: The accelerating convergence of smart metering, generative artificial intelligence, and quantum-inspired combinatorial optimisation is reshaping how energy utilities manage physical infrastructure, customer engagement, and environmental accountability
Traj-CoA: Patient Trajectory Modeling via Chain-of-Agents for Lung Cancer Risk Prediction
arXiv:2510.10454v2 Announce Type: replace Abstract: Large language models (LLMs) offer a generalizable approach for modeling patient trajectories, but suffer from the long and noisy nature of electronic health records (EHR) data in temporal reasoning. To address these challenges, we introduce Traj-CoA, a multi-agent system involving chain-of-agents for patient trajectory modeling. Traj-CoA employs a chain of […]
Rethinking Agentic Reinforcement Learning In Large Language Models
arXiv:2604.27859v3 Announce Type: replace Abstract: Reinforcement Learning (RL) has traditionally focused on training specialized agents to optimize predefined reward functions within narrowly defined environments. However, the advent of powerful Large Language Models (LLMs) and increasingly complex, open-ended tasks has catalyzed a paradigm shift towards agentic paradigms within RL. This emerging framework extends beyond traditional RL […]
Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems
arXiv:2605.14892v2 Announce Type: replace Abstract: LLM-based autonomous agents have demonstrated strong capabilities in reasoning, planning, and tool use, yet remain limited when tasks require sustained coordination across roles, tools, and environments. Multi-agent systems address this through structured collaboration among specialized agents, but tighter coordination also amplifies a less explored risk: errors can propagate across agents […]
FAR: Function-preserving Attention Replacement for IMC-friendly Inference
arXiv:2505.21535v4 Announce Type: replace-cross Abstract: While transformers dominate modern vision and language models, their attention mechanism remains poorly suited for in-memory computing (IMC) devices due to intensive activation-to-activation multiplications and non-local memory access, leading to substantial latency and bandwidth overhead on ReRAM-based accelerators. To address this mismatch, we propose FAR, a Function-preserving Attention Replacement framework […]
ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking
arXiv:2510.13842v2 Announce Type: replace-cross Abstract: Knowledge poisoning poses a critical threat to Retrieval-Augmented Generation (RAG) systems by injecting adversarial content into knowledge bases, tricking Large Language Models (LLMs) into producing attacker-controlled outputs grounded in manipulated context. Prior work highlights LLMs’ susceptibility to misleading or malicious retrieved content. However, real-world fact-checking scenarios are more challenging, as […]
EMFusion: An Uncertainty-Aware Conditional Diffusion Framework for Frequency-Selective EMF Forecasting in Wireless Networks
arXiv:2512.15067v3 Announce Type: replace-cross Abstract: The rapid growth in wireless infrastructure has increased the need to accurately estimate and forecast electromagnetic field (EMF) levels to ensure ongoing compliance, assess potential health impacts, and support efficient network planning. While existing studies rely on univariate forecasting of wideband aggregate EMF data, frequency-selective multivariate forecasting is needed to […]
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
arXiv:2507.16806v2 Announce Type: replace-cross Abstract: When language models (LMs) are trained via reinforcement learning (RL) to generate natural language “reasoning chains”, their performance improves on a variety of difficult question answering tasks. Today, almost all successful applications of RL for reasoning use binary reward functions that evaluate the correctness of LM outputs. Because such reward […]
Sufficient Explanations in Databases and their Connections to Database Repairs
arXiv:2511.15623v2 Announce Type: replace-cross Abstract: We investigate the notion of sufficient explanation, and a sufficiency-degree as attribution score for database tuples in relation to query answering. We also investigate and exploit connections with database repairs as used for dealing with inconsistent databases; and with causality-based necessary explanations, obtaining new computational results. We show how to […]
Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training
arXiv:2602.00747v2 Announce Type: replace-cross Abstract: Determining an effective data mixture is a key factor in Large Language Model (LLM) pre-training, where models must balance general competence with proficiency on hard tasks such as math and code. However, identifying an optimal mixture remains an open challenge, as existing approaches either rely on unreliable tiny-scale proxy experiments […]