arXiv:2605.20088v1 Announce Type: cross Abstract: Discovering shapelets — i.e., discriminative temporal patterns within time series — has been widely studied to address the inherent complexity of time-series classification (TSC) and to make model decision-making processes more transparent. However, existing methods primarily focus on population-level shapelets optimized across the entire dataset, which leads to two fundamental […]
ViroGym: Realistic Large-Scale Benchmarks for Evaluating Viral Proteins
arXiv:2603.06740v2 Announce Type: replace Abstract: Protein language models (pLMs) have shown strong potential for zero-shot prediction of missense variant effects, yet systematic benchmarking on viral proteins remains limited, a critical gap given the need for proactive tools that can anticipate emerging mutations ahead of experimental validation. Here we introduce ViroGym, a comprehensive benchmark evaluating pLMs […]
Investigating Cross-Modal Skill Injection: Scenarios, Methods, and Hyperparameters
arXiv:2605.19523v1 Announce Type: cross Abstract: Vision-Language Models (VLMs) have demonstrated remarkable proficiency in general multi-modal understanding; yet they struggle to efficiently acquire continually evolving domain-specific skills. Conventional approaches to enhancing VLM capabilities, such as Supervised Fine-Tuning (SFT), require extensive dataset curation and substantial computational resources. Model merging has emerged as an efficient alternative that enables […]
Mega-ASR: Towards In-the-wild² Speech Recognition via Scaling up Real-world Acoustic Simulation
arXiv:2605.19833v1 Announce Type: cross Abstract: Despite rapid advances in automatic speech recognition (ASR) and large audio-language models, robust recognition in real-world environments remains limited by an “acoustic robustness bottleneck”: models often lose acoustic grounding and produce omissions or hallucinations under severe, compositional distortions. We propose Mega-ASR, a unified ASR-in-the-wild framework that combines scalable compound-data construction […]
PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts
arXiv:2605.05974v2 Announce Type: replace-cross Abstract: LLM agents rely on prompts to implement task-specific capabilities based on foundation LLMs, making agent prompts valuable intellectual property. However, in untrusted deployments, adversaries can copy and reuse these prompts with other proprietary LLMs, causing economic losses. To protect these prompts, we identify four key challenges: proactivity, runtime protection, usability, […]
Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation
arXiv:2605.18474v2 Announce Type: replace-cross Abstract: The widespread deployment and redistribution of large language models (LLMs) have made model provenance tracking a critical challenge. While existing LLM fingerprinting methods, particularly active approaches that embed identity signals via fine-tuning, achieve high accuracy and robustness, they suffer from significant scalability bottlenecks. These methods typically treat fingerprint injection as […]
Mathematical Reasoning in Large Language Models: Benchmarks, Architectures, Evaluation, and Open Challenges
arXiv:2605.19723v1 Announce Type: cross Abstract: Mathematical reasoning is essential for problem-solving in education, science, and industry, serving as a crucial benchmark for evaluating artificial intelligence systems. As Large Language Models (LLMs) improve their reasoning capabilities, understanding how well they perform mathematical reasoning has become increasingly important. This survey synthesizes recent advancements in mathematical reasoning with […]
Detecting Fluent Optimization-Based Adversarial Prompts via Sequential Entropy Changes
arXiv:2605.19966v1 Announce Type: cross Abstract: Optimization-based adversarial suffixes can jailbreak aligned large language models (LLMs) while remaining fluent, weakening static and windowed perplexity-based detectors. We cast adversarial suffix detection as an online change-point detection problem over the token-level next-token entropy stream. Using the LLM system prompt to estimate a robust baseline, we standardize user-token entropies […]
Learning Efficient Guardrails for Compliance
arXiv:2510.03485v2 Announce Type: replace Abstract: Autonomous web agents are increasingly deployed for long-horizon tasks, yet their ability to adhere to real-world policies remains critically underexplored compared to standard safety objectives. To address this gap, we introduce PolicyGuardBench, a benchmark of 60k policy-trajectory pairs designed to evaluate compliance through both full-trajectory and novel prefix-based violation detection […]
Skim: Speculative Execution for Fast and Efficient Web Agents
arXiv:2605.16565v2 Announce Type: replace Abstract: Skim is a speculative execution framework for web agents that exploits the predictable structure of purpose-built websites. Today’s web-agent expense is not intrinsic to the tasks but a property of how agents are composed: frontier-model inference, browser rendering, and ReAct-style planning are applied to every step of every task regardless […]
Needles in the Landscape: Semi-Supervised Pseudolabeling for Archaeological Site Discovery under Label Scarcity
arXiv:2510.16814v3 Announce Type: replace-cross Abstract: Archaeological predictive modelling estimates where undiscovered sites are likely to occur by combining known locations with environmental and geospatial variables, presenting a positive-unlabeled (PU) learning challenge where confirmed sites are rare and most locations are unlabeled rather than truly negative. To overcome this, we propose asymmetric dual pseudolabeling (DPL), an […]
Synthetic Data Generation for Brain-Computer Interfaces: Overview, Benchmarking, and Future Directions
arXiv:2603.12296v2 Announce Type: replace-cross Abstract: Deep learning has achieved transformative performance across diverse domains, largely driven by large-scale and high-quality training data. In contrast, the development of brain-computer interfaces (BCIs) is fundamentally constrained by limited, heterogeneous, and privacy-sensitive neural recordings. Generating synthetic yet physiologically plausible brain signals has therefore emerged as a promising strategy to […]