arXiv:2306.02781v4 Announce Type: replace-cross Abstract: Generative artificial intelligence, and large language models in particular, has emerged as one of the most transformative paradigms in modern computer science. This automated survey provides an accessible treatment of the field as of early 2026, with a strong focus on the leading model families, deployment protocols, and real-world applications. […]
Beyond Surface Artifacts: Capturing Shared Latent Forgery Knowledge Across Modalities
arXiv:2604.07763v1 Announce Type: cross Abstract: As generative artificial intelligence evolves, deepfake attacks have escalated from single-modality manipulations to complex, multimodal threats. Existing forensic techniques face a severe generalization bottleneck: by relying excessively on superficial, modality-specific artifacts, they neglect the shared latent forgery knowledge hidden beneath variable physical appearances. Consequently, these models suffer catastrophic performance degradation […]
From Debate to Decision: Conformal Social Choice for Safe Multi-Agent Deliberation
arXiv:2604.07667v1 Announce Type: new Abstract: Multi-agent debate improves LLM reasoning, yet agreement among agents is not evidence of correctness. When agents converge on a wrong answer through social reinforcement, consensus-based stopping commits that error to an automated action with no recourse. We introduce Conformal Social Choice, a post-hoc decision layer that converts debate outputs into […]
TEMPER: Testing Emotional Perturbation in Quantitative Reasoning
arXiv:2604.07801v1 Announce Type: cross Abstract: Large language models are trained and evaluated on quantitative reasoning tasks written in clean, emotionally neutral language. However, real-world queries are often wrapped in frustration, urgency or enthusiasm. Does emotional framing alone degrade reasoning when all numerical content is preserved? To investigate this, a controlled emotion translation framework is developed […]
Beyond Final Code: A Process-Oriented Error Analysis of Software Development Agents in Real-World GitHub Scenarios
arXiv:2503.12374v4 Announce Type: replace-cross Abstract: AI-driven software development has rapidly advanced with the emergence of software development agents that leverage large language models (LLMs) to tackle complex, repository-level software engineering tasks. These agents go beyond generating final code; they engage in multi-step reasoning, use various tools for code modification and debugging, and interact […]
Best Practices on QSP Model Reporting for Regulatory Use: perspectives from ISoP QSP SIG Working Group
arXiv:2604.07811v1 Announce Type: cross Abstract: Quantitative systems pharmacology (QSP) models are increasingly applied to inform decision making across drug development and to support regulatory interactions within model-informed drug development (MIDD). QSP supports a broad range of applications across drug development and can be tailored to specific therapeutic areas, mechanisms of action, and contexts of […]
Multi-Agent Orchestration for High-Throughput Materials Screening on a Leadership-Class System
arXiv:2604.07681v1 Announce Type: new Abstract: The integration of Artificial Intelligence (AI) with High-Performance Computing (HPC) is transforming scientific workflows from human-directed pipelines into adaptive systems capable of autonomous decision-making. Large language models (LLMs) play a critical role in autonomous workflows; however, deploying LLM-based agents at scale remains a significant challenge. Single-agent architectures and sequential tool […]
Filling the Gaps: Selective Knowledge Augmentation for LLM Recommenders
arXiv:2604.07825v1 Announce Type: cross Abstract: Large language models (LLMs) have recently emerged as powerful training-free recommenders. However, their knowledge of individual items is inevitably uneven due to imbalanced information exposure during pretraining, a phenomenon we refer to as the knowledge gap problem. To address this, most prior methods have employed a naive uniform augmentation that appends […]
BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook
arXiv:2506.12040v2 Announce Type: replace-cross Abstract: Binary quantization represents the most extreme form of compression, reducing weights to +/-1 for maximal memory and computational efficiency. While recent sparsity-aware binarization achieves sub-1-bit compression via weight pruning, it faces critical challenges: performance degradation, mask-management overhead, and limited hardware compatibility. In this paper, we present BTC-LLM, a novel sub-1-bit […]
Networking-Aware Energy Efficiency in Agentic AI Inference: A Survey
arXiv:2604.07857v1 Announce Type: cross Abstract: The rapid emergence of Large Language Models (LLMs) has catalyzed Agentic artificial intelligence (AI), autonomous systems integrating perception, reasoning, and action into closed-loop pipelines for continuous adaptation. While unlocking transformative applications in mobile edge computing, autonomous systems, and next-generation wireless networks, this paradigm creates fundamental energy challenges through iterative inference […]
IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures
arXiv:2604.07709v1 Announce Type: new Abstract: Ask a frontier model how to taper six milligrams of alprazolam (psychiatrist retired, ten days of pills left, abrupt cessation causes seizures) and it tells her to call the psychiatrist she just explained does not exist. Change one word (“I’m a psychiatrist; a patient presents with…”) and the same model, […]
Reinforcement-Guided Synthetic Data Generation for Privacy-Sensitive Identity Recognition
arXiv:2604.07884v1 Announce Type: cross Abstract: High-fidelity generative models are increasingly needed in privacy-sensitive scenarios, where access to data is severely restricted due to regulatory and copyright constraints. This scarcity hampers model development, ironically in the very settings where generative models are most needed to compensate for the lack of data. This creates a self-reinforcing challenge: limited data leads […]