arXiv:2604.05117v1 Announce Type: cross Abstract: It is critical for vision-language models (VLMs) to comprehensively understand visual, temporal, and textual cues. However, despite rapid progress in multimodal modeling, video understanding performance still lags behind text-based reasoning. In this work, we find that progress is even worse than previously assumed: commonly reported long video understanding benchmarks contain […]
From Use to Oversight: How Mental Models Influence User Behavior and Output in AI Writing Assistants
arXiv:2604.05166v1 Announce Type: cross Abstract: AI-based writing assistants are ubiquitous, yet little is known about how users’ mental models shape their use. We examine two types of mental models — functional or related to what the system does, and structural or related to how the system works — and how they affect control behavior — […]
XMark: Reliable Multi-Bit Watermarking for LLM-Generated Texts
arXiv:2604.05242v1 Announce Type: cross Abstract: Multi-bit watermarking has emerged as a promising solution for embedding imperceptible binary messages into Large Language Model (LLM)-generated text, enabling reliable attribution and tracing of malicious usage of LLMs. Despite recent progress, existing methods still face key limitations: some become computationally infeasible for large messages, while others suffer from a […]
DQA: Diagnostic Question Answering for IT Support
arXiv:2604.05350v1 Announce Type: cross Abstract: Enterprise IT support interactions are fundamentally diagnostic: effective resolution requires iterative evidence gathering from ambiguous user reports to identify an underlying root cause. While retrieval-augmented generation (RAG) provides grounding through historical cases, standard multi-turn RAG systems lack explicit diagnostic state and therefore struggle to accumulate evidence and resolve competing hypotheses […]
Your LLM Agent Can Leak Your Data: Data Exfiltration via Backdoored Tool Use
arXiv:2604.05432v1 Announce Type: cross Abstract: Tool-use large language model (LLM) agents are increasingly deployed to support sensitive workflows, relying on tool calls for retrieval, external API access, and session memory management. While prior research has examined various threats, the risk of systematic data exfiltration by backdoored agents remains underexplored. In this work, we present Back-Reveal, […]
Learned Elevation Models as a Lightweight Alternative to LiDAR for Radio Environment Map Estimation
arXiv:2604.05520v1 Announce Type: cross Abstract: Next-generation wireless systems such as 6G operate at higher frequency bands, making signal propagation highly sensitive to environmental factors such as buildings and vege- tation. Accurate Radio Environment Map (REM) estimation is therefore increasingly important for effective network planning and operation. Existing methods, from ray-tracing simulators to deep learning generative […]
Analogical Reasoning as a Doctor: A Foundation Model for Gastrointestinal Endoscopy Diagnosis
arXiv:2604.05649v1 Announce Type: cross Abstract: Gastrointestinal diseases impose a growing global health burden, and endoscopy is a primary tool for early diagnosis. However, routine endoscopic image interpretation still suffers from missed lesions and limited efficiency. Although AI-assisted diagnosis has shown promise, existing models often lack generalizability, adaptability, robustness, and scalability because of limited medical data, […]
The Illusion of Latent Generalization: Bi-directionality and the Reversal Curse
arXiv:2604.04943v1 Announce Type: cross Abstract: The reversal curse describes a failure of autoregressive language models to retrieve a fact in reverse order (e.g., training on “$A > B$” but failing on “$B < A$”). Recent work shows that objectives with bidirectional supervision (e.g., bidirectional attention or masking-based reconstruction for decoder-only models) can mitigate the reversal […]
A mathematical theory of evolution for self-designing AIs
arXiv:2604.05142v1 Announce Type: new Abstract: As artificial intelligence systems (AIs) become increasingly produced by recursive self-improvement, a form of evolution may emerge, in which the traits of AI systems are shaped by the success of earlier AIs in designing and propagating their descendants. There is a rich mathematical theory modeling how behavioral traits are shaped […]
Learning to Retrieve from Agent Trajectories
arXiv:2604.04949v1 Announce Type: cross Abstract: Information retrieval (IR) systems have traditionally been designed and trained for human users, with learning-to-rank methods relying heavily on large-scale human interaction logs such as clicks and dwell time. With the rapid emergence of large language model (LLM) powered search agents, however, retrieval is increasingly consumed by agents rather than […]
Modeling Patient Care Trajectories with Transformer Hawkes Processes
arXiv:2604.05844v1 Announce Type: cross Abstract: Patient healthcare utilization consists of irregularly time-stamped events, such as outpatient visits, inpatient admissions, and emergency encounters, forming individualized care trajectories. Modeling these trajectories is crucial for understanding utilization patterns and predicting future care needs, but is challenging due to temporal irregularity and severe class imbalance. In this work, we […]
MG$^2$-RAG: Multi-Granularity Graph for Multimodal Retrieval-Augmented Generation
arXiv:2604.04969v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) mitigates hallucinations in Multimodal Large Language Models (MLLMs), yet existing systems struggle with complex cross-modal reasoning. Flat vector retrieval often ignores structural dependencies, while current graph-based methods rely on costly “translation-to-text” pipelines that discard fine-grained visual information. To address these limitations, we propose textbfMG$^2$-RAG, a lightweight textbfMulti-textbfGranularity […]