arXiv:2604.06435v1 Announce Type: cross Abstract: Visual Anomaly Detection (VAD) is a critical task for many applications including industrial inspection and healthcare. While VAD has been extensively studied, two key challenges remain largely unaddressed in conjunction: edge deployment, where computational resources are severely constrained, and continual learning, where models must adapt to evolving data distributions without […]
Improving Robustness In Sparse Autoencoders via Masked Regularization
arXiv:2604.06495v1 Announce Type: cross Abstract: Sparse autoencoders (SAEs) are widely used in mechanistic interpretability to project LLM activations onto sparse latent spaces. However, sparsity alone is an imperfect proxy for interpretability, and current training objectives often result in brittle latent representations. SAEs are known to be prone to feature absorption, where general features are subsumed […]
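The projection the abstract describes can be sketched in miniature. Below is a minimal sparse-autoencoder forward pass with a standard L1-penalized reconstruction objective; the weights and dimensions are illustrative stand-ins, and the paper's masked regularization is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_latent = 8, 32          # overcomplete: latent dim > activation dim
W_enc = rng.normal(0, 0.1, (d_model, d_latent))
W_dec = rng.normal(0, 0.1, (d_latent, d_model))
b_enc = np.zeros(d_latent)

def sae_forward(x, l1_coeff=1e-3):
    """Encode an activation into a sparse latent code and reconstruct it."""
    z = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU yields a non-negative, sparse code
    x_hat = z @ W_dec                        # linear decoder reconstructs the input
    loss = np.mean((x - x_hat) ** 2) + l1_coeff * np.abs(z).sum()
    return z, x_hat, loss

x = rng.normal(size=d_model)       # stand-in for an LLM activation vector
z, x_hat, loss = sae_forward(x)
```

The L1 term is the "sparsity alone" proxy the abstract criticizes: it pushes latent coordinates to zero but does not by itself prevent brittleness such as feature absorption.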
AI-Driven Research for Databases
arXiv:2604.06566v1 Announce Type: cross Abstract: As the complexity of modern workloads and hardware increasingly outpaces human research and engineering capacity, existing methods for database performance optimization struggle to keep pace. To address this gap, a new class of techniques, termed AI-Driven Research for Systems (ADRS), uses large language models to automate solution discovery. This approach […]
SHAPE: Stage-aware Hierarchical Advantage via Potential Estimation for LLM Reasoning
arXiv:2604.06636v1 Announce Type: cross Abstract: Process supervision has emerged as a promising approach for enhancing LLM reasoning, yet existing methods fail to distinguish meaningful progress from mere verbosity, leading to limited reasoning capabilities and unresolved token inefficiency. To address this, we propose Stage-aware Hierarchical Advantage via Potential Estimation (SHAPE), a framework that formalizes reasoning as […]
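To make "advantage via potential estimation" concrete, here is a generic potential-based shaping sketch over labeled reasoning stages: each step is rewarded by the discounted gain in a stage potential, so verbosity that does not advance the stage earns nothing. The stage labels, potentials in `PHI`, and `GAMMA` are hypothetical, not the paper's estimates.

```python
GAMMA = 0.99
PHI = {"restate": 0.1, "plan": 0.3, "derive": 0.7, "answer": 1.0}  # hypothetical stage potentials

def shaped_rewards(stages, terminal_reward):
    """Per-step shaped reward: gamma * Phi(next) - Phi(current), plus a terminal bonus."""
    rewards = []
    for i, s in enumerate(stages):
        nxt = PHI[stages[i + 1]] if i + 1 < len(stages) else 0.0
        r = GAMMA * nxt - PHI[s]
        if i == len(stages) - 1:
            r += terminal_reward
        rewards.append(r)
    return rewards

trace = ["restate", "plan", "derive", "answer"]
rs = shaped_rewards(trace, terminal_reward=1.0)
```

Because the shaping terms nearly telescope, a longer trace that loops between low-potential stages accumulates negative shaped reward, which is one way to penalize token inefficiency.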
Fine-grained Approaches for Confidence Calibration of LLMs in Automated Code Revision
arXiv:2604.06723v1 Announce Type: cross Abstract: In today’s AI-assisted software engineering landscape, developers increasingly depend on LLMs that are highly capable, yet inherently imperfect. The tendency of these models to produce incorrect outputs can reduce developer productivity. To this end, a canonical mitigation method is to provide calibrated confidence scores that faithfully reflect their likelihood of […]
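A standard baseline for the calibration problem described here is post-hoc temperature scaling of the model's logits. The sketch below is illustrative only: the logits and the temperature value are made up, not tied to any model or revision task from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional calibration temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.5]                  # illustrative scores over candidate revisions
raw_conf = max(softmax(logits))           # raw probability is typically overconfident
calibrated = max(softmax(logits, 2.5))    # temperature > 1 softens the confidence
```

In practice the temperature is fit on a held-out set so that reported confidence tracks empirical correctness; finer-grained approaches condition the calibration on properties of the input.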
BDI-Kit Demo: A Toolkit for Programmable and Conversational Data Harmonization
arXiv:2604.06405v1 Announce Type: new Abstract: Data harmonization remains a major bottleneck for integrative analysis due to heterogeneity in schemas, value representations, and domain-specific conventions. BDI-Kit provides an extensible toolkit for schema and value matching. It exposes two complementary interfaces tailored to different user needs: a Python API enabling developers to construct harmonization pipelines programmatically, and […]
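The kind of value matching such a toolkit automates can be illustrated with a toy string-similarity matcher over two schemas' value vocabularies. This is NOT the BDI-Kit API, just a generic sketch of the underlying step.

```python
from difflib import SequenceMatcher

def match_values(source_values, target_values, threshold=0.6):
    """Map each source value to its best-matching target value, or None."""
    mapping = {}
    for s in source_values:
        best, best_score = None, threshold
        for t in target_values:
            score = SequenceMatcher(None, s.lower(), t.lower()).ratio()
            if score > best_score:
                best, best_score = t, score
        mapping[s] = best
    return mapping

m = match_values(["Female", "M", "unknwn"], ["female", "male", "unknown"])
```

Ambiguous values (here `"M"`, which no target exceeds the threshold for) are left unmapped; this is exactly where a conversational interface can ask the user to resolve the match.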
AEROS: A Single-Agent Operating Architecture with Embodied Capability Modules
arXiv:2604.07039v1 Announce Type: cross Abstract: Robotic systems lack a principled abstraction for organizing intelligence, capabilities, and execution in a unified manner. Existing approaches either couple skills within monolithic architectures or decompose functionality into loosely coordinated modules or multiple agents, often without a coherent model of identity and control authority. We argue that a robot should […]
Attribution-Driven Explainable Intrusion Detection with Encoder-Based Large Language Models
arXiv:2604.06266v1 Announce Type: cross Abstract: Software-Defined Networking (SDN) improves network flexibility but also increases the need for reliable and interpretable intrusion detection. Large Language Models (LLMs) have recently been explored for cybersecurity tasks due to their strong representation learning capabilities; however, their lack of transparency limits their practical adoption in security-critical environments. Understanding how LLMs […]
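The attribution idea can be shown in miniature with input-times-gradient on a toy linear scorer over flow features. Real encoder-based LLM attributions apply the same principle to token embeddings; the weights and features here are random, hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=5)              # weights over 5 hypothetical flow features

def score(x):
    return float(w @ x)             # logit for the "intrusion" class

def input_x_gradient(x):
    """For a bias-free linear model the gradient is w, so attribution is w * x."""
    return w * x

x = rng.normal(size=5)
attr = input_x_gradient(x)
```

A useful sanity check (completeness for the linear case) is that the attributions sum exactly to the score, so each feature's share of the intrusion verdict is interpretable.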
The Mechanistic Invariance Test: Genomic Language Models Fail to Learn Positional Regulatory Logic
arXiv:2604.06549v1 Announce Type: new Abstract: Genomic language models (gLMs) have transformed computational biology, achieving state-of-the-art performance across genomic tasks. Yet a fundamental question threatens the foundation of this success: do these models learn the mechanistic principles governing gene regulation, or do they merely exploit statistical shortcuts? We introduce the Mechanistic Invariance Test (MIT), a rigorous […]
DosimeTron: Automating Personalized Monte Carlo Radiation Dosimetry in PET/CT with Agentic AI
arXiv:2604.06280v1 Announce Type: cross Abstract: Purpose: To develop and evaluate DosimeTron, an agentic AI system for automated patient-specific Monte Carlo (MC) internal radiation dosimetry in PET/CT examinations. Materials and Methods: In this retrospective study, DosimeTron was evaluated on a publicly available PSMA-PET/CT dataset comprising 597 studies from 378 male patients acquired on three scanner models (18-F, n […]

The ATOM Report: Measuring the Open Language Model Ecosystem
arXiv:2604.07190v1 Announce Type: cross Abstract: We present a comprehensive adoption snapshot of the leading open language models and who is building them, focusing on the ~1.5K mainline open models from families such as Alibaba’s Qwen, DeepSeek, and Meta’s Llama, which form the foundation of an ecosystem crucial to researchers, entrepreneurs, and policy advisors. We document a […]
On Emotion-Sensitive Decision Making of Small Language Model Agents
arXiv:2604.06562v1 Announce Type: new Abstract: Small language models (SLMs) are increasingly used as interactive decision-making agents, yet most decision-oriented evaluations ignore emotion as a causal factor influencing behavior. We study emotion-sensitive decision making by combining representation-level emotion induction with a structured game-theoretic evaluation. Emotional states are induced using activation steering derived from crowd-validated, real-world emotion-eliciting […]
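Representation-level induction via activation steering can be sketched as adding a scaled "emotion direction" to a hidden state. The direction and hidden state below are random stand-ins, not the paper's crowd-validated steering vectors.

```python
import numpy as np

rng = np.random.default_rng(42)
d = 16
hidden = rng.normal(size=d)                          # a hypothetical SLM hidden activation
emotion_direction = rng.normal(size=d)
emotion_direction /= np.linalg.norm(emotion_direction)  # unit-norm steering vector

def steer(h, direction, alpha=4.0):
    """Shift the activation along the emotion direction with strength alpha."""
    return h + alpha * direction

steered = steer(hidden, emotion_direction)
```

The steering strength `alpha` controls how strongly the induced emotional state biases downstream decisions, which is what a game-theoretic evaluation can then measure behaviorally.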