arXiv:2405.13729v3 Announce Type: replace-cross Abstract: In this paper, we study an under-explored but important factor of diffusion generative models, i.e., the combinatorial complexity. Data samples are generally high-dimensional, and for various structured generation tasks, additional attributes are combined to associate with data samples. We show that the space spanned by the combination of dimensions and […]
Is Human-Like Text Liked by Humans? Multilingual Human Detection and Preference Against AI
arXiv:2502.11614v3 Announce Type: replace-cross Abstract: Prior studies have shown that distinguishing text generated by Large Language Models (LLMs) from human-written text is highly challenging for humans, and often no better than random guessing. To verify the generalizability of this finding across languages and domains, we perform an extensive case study to identify the upper bound […]
A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?
arXiv:2505.10924v4 Announce Type: replace-cross Abstract: Recently, AI-driven interactions with computing devices have advanced from basic prototype tools to sophisticated, LLM-based systems that emulate human-like operations in graphical user interfaces. We are now witnessing the emergence of Computer-Using Agents (CUAs), capable of autonomously performing tasks such as navigating desktop applications, web pages, and mobile apps. However, […]
Lunguage: A Benchmark for Structured and Sequential Chest X-ray Interpretation
arXiv:2505.21190v2 Announce Type: replace-cross Abstract: Radiology reports convey detailed clinical observations and capture diagnostic reasoning that evolves over time. However, existing evaluation methods are limited to single-report settings and rely on coarse metrics that fail to capture fine-grained clinical semantics and temporal dependencies. We introduce LUNGUAGE, a benchmark dataset for structured radiology report generation that […]
Treatment, evidence, imitation, and chat
arXiv:2506.23040v5 Announce Type: replace-cross Abstract: Large language models are thought to have the potential to aid in medical decision making. This work investigates the degree to which this might be the case. We start with the treatment problem, the patient’s core medical decision-making task, which is solved in collaboration with a clinician. We discuss different […]
Vertex Features for Neural Global Illumination
arXiv:2508.07852v2 Announce Type: replace-cross Abstract: Recent research on learnable neural representations has been widely adopted in the field of 3D scene reconstruction and neural rendering applications. However, traditional feature grid representations often suffer from substantial memory footprint, posing a significant bottleneck for modern parallel computing hardware. In this paper, we present neural vertex features, a […]
Vibe Check: Understanding the Effects of LLM-Based Conversational Agents’ Personality and Alignment on User Perceptions in Goal-Oriented Tasks
arXiv:2509.09870v2 Announce Type: replace-cross Abstract: Large language models (LLMs) enable conversational agents (CAs) to express distinctive personalities, raising new questions about how such designs shape user perceptions. This study investigates how personality expression levels and user-agent personality alignment influence perceptions in goal-oriented tasks. In a between-subjects experiment (N=150), participants completed travel planning with CAs exhibiting […]
Emergent Coordination in Multi-Agent Language Models
arXiv:2510.05174v4 Announce Type: replace-cross Abstract: When are multi-agent LLM systems merely a collection of individual agents versus an integrated collective with higher-order structure? We introduce an information-theoretic framework to test — in a purely data-driven way — whether multi-agent systems show signs of higher-order structure. This information decomposition lets us measure whether dynamical emergence is […]
EvoDev: An Iterative Feature-Driven Framework for End-to-End Software Development with LLM-based Agents
arXiv:2511.02399v2 Announce Type: replace-cross Abstract: Recent advances in large language model agents offer the promise of automating end-to-end software development from natural language requirements. However, existing approaches largely adopt linear, waterfall-style pipelines, which oversimplify the iterative nature of real-world development and struggle with complex, large-scale projects. To address these limitations, we propose EvoDev, an iterative […]
Consist-Retinex: One-Step Noise-Emphasized Consistency Training Accelerates High-Quality Retinex Enhancement
arXiv:2512.08982v2 Announce Type: replace-cross Abstract: Retinex-based low-light image enhancement benefits from separating reflectance and illumination, yet recent generative approaches often rely on iterative sampling and are difficult to deploy under strict latency budgets. Consistency models offer a natural route to one-step restoration, but direct adaptation to Retinex-factorized enhancement is unstable: one-step inference is evaluated at […]
Safety Is Not Universal: The Selective Safety Trap in LLM Alignment
arXiv:2601.04389v2 Announce Type: replace-cross Abstract: Current safety evaluations of large language models (LLMs) create a dangerous illusion of universal protection by aggregating harms under generic categories such as “Identity Hate”, obscuring vulnerabilities toward specific populations. In this work, we expose the Selective Safety Trap: a systemic failure mode where models robustly defend specific populations while […]
HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing
arXiv:2601.21459v4 Announce Type: replace-cross Abstract: LLM role-playing, i.e., using LLMs to simulate specific personas, has emerged as a key capability in various applications, such as companionship, content creation and digital games. While current models effectively capture character tones and knowledge, simulating the inner thoughts behind their behaviors remains a challenge. Towards cognitive simulation in LLM […]