arXiv:2605.23892v1 Announce Type: cross Abstract: Visual geometry transformers have become powerful architectures for multi-view 3D reconstruction, enabling joint prediction of multiple 3D attributes in a feed-forward manner. However, their computational cost grows quadratically with the input sequence length due to the global attention layers inside these models. This limits both their scalability and efficiency. In […]
Anatomy-Guided Vision-Language Learning with Angular Prototype Separation for Multi-Label Video Capsule Endoscopy Classification Under Class Imbalance
arXiv:2603.17879v2 Announce Type: replace-cross Abstract: This work presents a multi-label temporal event detection framework for video capsule endoscopy (VCE) that addresses the extreme class imbalance inherent in the Galar dataset by combining two principal contributions: an Angular Separation Loss on class prototypes and a Biological State Machine temporal decoder. The backbone remains BiomedCLIP, a biomedical […]
Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German
arXiv:2605.19069v3 Announce Type: replace-cross Abstract: Code-switching — the natural alternation between two languages within a single utterance — remains one of the most challenging and under-studied conditions for automatic speech recognition (ASR). We present a benchmark evaluating five commercial ASR providers across four language pairs: Egyptian Arabic–English, Saudi Arabic (Najdi/Hijazi)–English, Persian (Farsi)–English, and German–English, comprising […]
Multimodal Distribution Matching for Vision-Language Dataset Distillation
arXiv:2605.23482v1 Announce Type: cross Abstract: Dataset distillation compresses large training sets into compact synthetic datasets while preserving downstream performance. As modern systems increasingly operate on paired vision-language inputs, multimodal distillation must preserve representation quality and cross-modal alignment under tight compute and memory budgets, yet prior methods often require heavy computation and overlook their correlations. To […]
Empowering 9-1-1 Calltaking Training with Generative AI: Experiences and Lessons Learned
arXiv:2602.13241v2 Announce Type: replace-cross Abstract: Emergency call-takers form the first operational link in public safety response, handling over 240 million calls annually while facing a sustained training crisis: staffing shortages exceed 25% in many centers, and preparing a single new hire can require up to 720 hours of one-on-one instruction that removes experienced personnel from […]
ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions
arXiv:2605.20087v2 Announce Type: replace-cross Abstract: Conversational AI has now reached billions of users, yet existing datasets capture only what people say, not what they think. We introduce ThoughtTrace, the first large-scale dataset that pairs real-world multi-turn human–AI conversations with users’ self-reported thoughts: their reasons for sending prompts and reactions to assistant responses. ThoughtTrace comprises 1,058 […]
Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor
arXiv:2605.20402v2 Announce Type: replace-cross Abstract: MXFP4 arithmetic can dramatically accelerate reinforcement learning (RL) post-training of large language models (LLMs), yet the quantization error introduces severe accuracy degradation. Existing work treats the quantization error as a monolithic noise term, missing the distinct mechanisms by which quantization error damages training. We prove an exact three-way decomposition […]
VACE: Learning Geometrically Structured Representations for Time Series Anomaly Detection
arXiv:2605.23504v1 Announce Type: cross Abstract: Anomaly detection in multivariate time series is a critical task across a wide range of real-world applications, where abnormal behaviour is rare, labels are unavailable, and the cost of a miss is high. The central challenge is learning a characterisation of normality precise enough to flag deviations. Representation self-supervised learning, […]
GenAI-Driven Threat Detection with Microsoft Security Copilot
arXiv:2605.20896v2 Announce Type: replace-cross Abstract: Defending against today’s increasingly sophisticated cyberattacks requires security analysts to continuously translate evolving attacker tradecraft into detection logic. This places defenders in a reactive posture, requiring constantly updated expertise across an increasingly fragmented security landscape. We introduce the Dynamic Threat Detection Agent (DTDA), an always-on adaptive agent that continuously investigates […]
GeoMAE: Masking Representation Learning for Spatio-Temporal Graph Forecasting with Missing Values
arXiv:2508.14083v3 Announce Type: replace-cross Abstract: The ubiquity of missing data in urban intelligence systems, attributable to adverse environmental conditions and equipment failures, poses a significant challenge to the efficacy of downstream applications, notably in the realms of traffic forecasting and energy consumption prediction. Therefore, it is imperative to develop a robust spatio-temporal learning methodology capable […]
Distilling Linearized Behavior into Non-Linear Fine-Tuning for Effective Task Arithmetic
arXiv:2605.18993v2 Announce Type: replace-cross Abstract: Task vector composition has emerged as a promising paradigm for editing pre-trained models, enabling model merging through addition and unlearning through subtraction. Fine-tuning in the tangent space of a pre-trained model (linear fine-tuning) has proven effective, as it produces task vectors that are naturally disentangled and resistant to interference. However, […]
On the Koopman-Based Generalization Bounds for Multi-Task Deep Learning
arXiv:2512.19199v2 Announce Type: replace-cross Abstract: The paper establishes generalization bounds for multitask deep neural networks using operator-theoretic techniques. The authors propose a tighter bound than those derived from conventional norm-based methods by leveraging small condition numbers in the weight matrices and introducing a tailored Sobolev space as an expanded hypothesis space. This enhanced bound […]