arXiv:2409.03597v4 Announce Type: replace-cross Abstract: This paper presents the Multimodal Laryngoscopic Video Analyzing System (MLVAS), a novel system that leverages both audio and video data to automatically extract key video segments and metrics from raw laryngeal videostroboscopic videos for assisted clinical assessment. The system integrates video-based glottis detection with an audio keyword spotting method to […]
Deliberative Dynamics and Value Alignment in LLM Debates
arXiv:2510.10002v2 Announce Type: replace Abstract: As large language models (LLMs) are increasingly deployed in sensitive everyday contexts — offering personal advice, mental health support, and moral guidance — understanding their behavior in navigating complex moral reasoning is essential. Most evaluations study this sociotechnical alignment through single-turn prompts, but it is unclear if these findings extend […]
INDUCTION: Finite-Structure Concept Synthesis in First-Order Logic
arXiv:2602.18956v2 Announce Type: replace Abstract: We introduce INDUCTION, a benchmark for finite-structure concept synthesis in first-order logic. Given small finite relational worlds with extensionally labeled target predicates, models must output a single first-order logical formula that explains the target uniformly across worlds, with correctness verified via exact model checking. The benchmark includes […]
EndoSERV: A Vision-based Endoluminal Robot Navigation System
arXiv:2603.08324v1 Announce Type: cross Abstract: Robot-assisted endoluminal procedures are increasingly used for early cancer intervention. However, the intricate, narrow and tortuous pathways within the luminal anatomy pose substantial difficulties for robot navigation. Vision-based navigation offers a promising solution, but existing localization approaches are error-prone due to tissue deformation, in vivo artifacts and a lack of […]
Yo’City: Personalized and Boundless 3D Realistic City Scene Generation via Self-Critic Expansion
arXiv:2511.18734v3 Announce Type: replace-cross Abstract: Realistic 3D city generation is fundamental to a wide range of applications, including virtual reality and digital twins. However, most existing methods rely on training a single diffusion model, which limits their ability to generate personalized and boundless city-scale scenes. In this paper, we present Yo’City, a novel agentic framework […]
Mathematicians in the age of AI
arXiv:2603.03684v2 Announce Type: replace-cross Abstract: Recent developments show that AI can prove research-level theorems in mathematics, both formally and informally. This essay urges mathematicians to stay up-to-date with the technology, to consider the ways it will disrupt mathematical practice, and to respond appropriately to the challenges and opportunities we now face.
Ready2Unlearn: A Learning-Time Approach for Preparing Models with Future Unlearning Readiness
arXiv:2505.10845v2 Announce Type: replace-cross Abstract: Machine unlearning is the process of removing the imprint left by specific data samples during the training of a machine learning model. AI developers, including those building personalized technologies, employ machine unlearning for various purposes, such as protecting privacy, ensuring security, and addressing ethical concerns. This paper introduces Ready2Unlearn, a […]
Generative Evolutionary Meta-Solver (GEMS): Scalable Surrogate-Free Multi-Agent Reinforcement Learning
arXiv:2509.23462v2 Announce Type: replace-cross Abstract: Scalable multi-agent reinforcement learning (MARL) remains a central challenge for AI. Existing population-based methods, such as Policy-Space Response Oracles (PSRO), require storing explicit policy populations and constructing full payoff matrices, incurring quadratic computation and linear memory costs. We present Generative Evolutionary Meta-Solver (GEMS), a surrogate-free framework that replaces explicit populations with […]
Diffusion-Guided Pretraining for Brain Graph Foundation Models
arXiv:2602.09437v3 Announce Type: replace-cross Abstract: With the growing interest in foundation models for brain signals, graph-based pretraining has emerged as a promising paradigm for learning transferable representations from connectome data. However, existing contrastive and masked autoencoder methods typically rely on naive random dropping or masking for augmentation, which is ill-suited for brain graphs and hypergraphs […]
Disentangling Reasoning in Large Audio-Language Models for Ambiguous Emotion Prediction
arXiv:2603.08230v1 Announce Type: cross Abstract: Speech emotion recognition plays an important role in various applications. However, most existing approaches predict a single emotion label, oversimplifying the inherently ambiguous nature of human emotional expression. Recent large audio-language models show promise in generating richer outputs, but their reasoning ability for ambiguous emotional understanding remains limited. In this […]
Stronger Enforcement of Instruction Hierarchy via Augmented Intermediate Representations
arXiv:2505.18907v2 Announce Type: replace Abstract: Prompt injection attacks are a critical security vulnerability in large language models (LLMs), allowing attackers to hijack model behavior by injecting malicious instructions within the input context. Recent defense mechanisms have leveraged an Instruction Hierarchy (IH) Signal, often implemented through special delimiter tokens or additive embeddings to denote the privilege […]
CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents
arXiv:2601.09923v2 Announce Type: replace Abstract: AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior to steal credentials or cause financial loss. The only known robust defense is architectural isolation that strictly separates trusted task planning from untrusted environment observations. However, applying this design to Computer Use Agents (CUAs) — systems […]