arXiv:2604.14726v2 Announce Type: replace-cross Abstract: Online anomaly detection (OAD) plays a pivotal role in real-time analytics and decision-making for evolving data streams. However, existing methods often rely on costly retraining and rigid decision boundaries, limiting their ability to adapt both effectively and efficiently to concept drift in dynamic environments. To address these challenges, we propose […]
Conjuring Semantic Similarity
arXiv:2410.16431v4 Announce Type: replace Abstract: The semantic similarity between sample expressions measures the distance between their latent ‘meaning’. These meanings are themselves typically represented by textual expressions. We propose a novel approach whereby the semantic similarity among textual expressions is based not on other expressions they can be rephrased as, but rather based on the […]
Current validation practice undermines surgical AI development
arXiv:2511.03769v2 Announce Type: replace Abstract: Surgical data science (SDS) is rapidly advancing, yet clinical adoption of artificial intelligence (AI) in surgery remains limited, with inadequate validation emerging as an important contributing factor. In fact, existing validation practices often neglect the temporal and hierarchical structure of intraoperative videos, producing misleading, unstable, or clinically irrelevant results. In […]
Ontology-Constrained Neural Reasoning in Enterprise Agentic Systems: A Neurosymbolic Architecture for Domain-Grounded AI Agents
arXiv:2604.00555v2 Announce Type: replace Abstract: Enterprise adoption of Large Language Models (LLMs) is constrained by hallucination, domain drift, and the inability to enforce regulatory compliance at the reasoning level. We present a neurosymbolic architecture implemented within the Foundation AgenticOS (FAOS) platform that addresses these limitations through ontology-constrained neural reasoning. Our approach introduces a three-layer ontological […]
DASB – Discrete Audio and Speech Benchmark
arXiv:2406.14294v4 Announce Type: replace-cross Abstract: Discrete audio tokens have recently gained considerable attention for their potential to bridge audio and language processing, enabling multimodal language models that can both generate and understand audio. However, preserving key information such as phonetic content, speaker identity, and paralinguistic cues remains a major challenge. Identifying the optimal tokenizer and […]
LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization
arXiv:2506.09373v3 Announce Type: replace-cross Abstract: The advent of autonomous agents is transforming interactions with Graphical User Interfaces (GUIs) by employing natural language as a powerful intermediary. Despite the predominance of Supervised Fine-Tuning (SFT) methods in current GUI agents for achieving spatial localization, these methods face substantial challenges due to their limited capacity to accurately perceive […]
ReefNet: A Large-Scale Dataset and Benchmark for Fine-Grained Coral Reef Recognition
arXiv:2510.16822v3 Announce Type: replace-cross Abstract: Coral reefs are rapidly declining under anthropogenic pressures (e.g., climate change), creating an urgent need for scalable and automated monitoring. Progress in data-driven coral analysis, however, is constrained by the scarcity of large-scale datasets with fine-grained labels that are taxonomically consistent across sites and studies. To address this gap, we […]
Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models
arXiv:2601.14152v2 Announce Type: replace-cross Abstract: Large language models exhibit surprising sensitivity to the structure of the prompt, but the mechanisms underlying this sensitivity remain poorly understood. In this work, we conduct an in-depth investigation on a striking case: in multiple-choice question answering, placing context before the questions and options (CQO) outperforms the reverse order (QOC) […]
PhysMem: Scaling Test-time Physical Memory for Robot Manipulation
arXiv:2602.20323v5 Announce Type: replace-cross Abstract: Reliable object manipulation requires understanding physical properties that vary across objects and environments. Vision-language model (VLM) planners can reason about friction and stability in general terms; however, they often cannot predict how a specific ball will roll on a particular surface or which stone will provide a stable foundation without […]
MESA: A Training-Free Multi-Exemplar Deep Framework for Restoring Ancient Inscription Textures
arXiv:2604.17390v2 Announce Type: replace-cross Abstract: Ancient inscriptions frequently suffer missing or corrupted regions from fragmentation, erosion, or other damage, hindering reading, and analysis. We review prior image restoration methods and their applicability to inscription image recovery, then introduce MESA (Multi-Exemplar, Style-Aware) -an image-level restoration method that uses well-preserved exemplar inscriptions (from the same epigraphic monument, […]
CoCo-SAM3: Harnessing Concept Conflict in Open-Vocabulary Semantic Segmentation
arXiv:2604.19648v1 Announce Type: cross Abstract: SAM3 advances open-vocabulary semantic segmentation by introducing a prompt-driven mask generation paradigm. However, in multi-class open-vocabulary scenarios, masks generated independently from different category prompts lack a unified and inter-class comparable evidence scale, often resulting in overlapping coverage and unstable competition. Moreover, synonymous expressions of the same concept tend to activate […]
VLA Foundry: A Unified Framework for Training Vision-Language-Action Models
arXiv:2604.19728v1 Announce Type: cross Abstract: We present VLA Foundry, an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. Most open-source VLA efforts specialize on the action training stage, often stitching together incompatible pretraining pipelines. VLA Foundry instead provides a shared training stack with end-to-end control, from language pretraining to action-expert […]