arXiv:2511.16683v1 Announce Type: cross Abstract: Large Language Models (LLMs) are the engines driving today’s AI agents. The better these models understand human languages, the more natural and user-friendly the interaction with AI becomes, from everyday devices like computers and smartwatches to any tool that can act intelligently. Yet, the ability of industrial-scale LLMs to comprehend […]
DDTime: Dataset Distillation with Spectral Alignment and Information Bottleneck for Time-Series Forecasting
arXiv:2511.16715v1 Announce Type: cross Abstract: Time-series forecasting is fundamental across many domains, yet training accurate models often requires large-scale datasets and substantial computational resources. Dataset distillation offers a promising alternative by synthesizing compact datasets that preserve the learning behavior of full data. However, extending dataset distillation to time-series forecasting is non-trivial due to two fundamental […]
SafeR-CLIP: Mitigating NSFW Content in Vision-Language Models While Preserving Pre-Trained Knowledge
arXiv:2511.16743v1 Announce Type: cross Abstract: Improving the safety of vision-language models like CLIP via fine-tuning often comes at a steep price, causing significant drops in their generalization performance. We find this trade-off stems from rigid alignment strategies that force unsafe concepts toward single, predefined safe targets, disrupting the model’s learned semantic structure. To address this, […]
Large language models for automated PRISMA 2020 adherence checking
arXiv:2511.16707v1 Announce Type: cross Abstract: Evaluating adherence to the PRISMA 2020 guideline remains a burden in the peer review process. To address the lack of shareable benchmarks, we constructed a copyright-aware benchmark of 108 Creative Commons-licensed systematic reviews and evaluated ten large language models (LLMs) across five input formats. In a development cohort, supplying structured PRISMA […]
AutoBackdoor: Automating Backdoor Attacks via LLM Agents
arXiv:2511.16709v1 Announce Type: cross Abstract: Backdoor attacks pose a serious threat to the secure deployment of large language models (LLMs), enabling adversaries to implant hidden behaviors triggered by specific inputs. However, existing methods often rely on manually crafted triggers and static data pipelines, which are rigid, labor-intensive, and inadequate for systematically evaluating modern defense robustness. […]
Concept-Based Interpretability for Toxicity Detection
arXiv:2511.16689v1 Announce Type: cross Abstract: The rise of social networks has not only facilitated communication but also allowed the spread of harmful content. Although significant advances have been made in detecting toxic language in textual data, the exploration of concept-based explanations in toxicity detection remains limited. In this study, we leverage various subtype attributes present […]
RAG-Driven Data Quality Governance for Enterprise ERP Systems
arXiv:2511.16700v1 Announce Type: cross Abstract: Enterprise ERP systems managing hundreds of thousands of employee records face critical data quality challenges when human resources departments perform decentralized manual entry across multiple languages. We present an end-to-end pipeline combining automated data cleaning with LLM-driven SQL query generation, deployed on a production system managing 240,000 employee records over […]
Password Strength Analysis Through Social Network Data Exposure: A Combined Approach Relying on Data Reconstruction and Generative Models
arXiv:2511.16716v1 Announce Type: cross Abstract: Although passwords remain the primary defense against unauthorized access, users often choose passwords that are easy to remember. This behavior significantly increases security risks, a problem compounded by traditional password strength evaluation methods that are often inadequate. In this discussion paper, we present SODA ADVANCE, a data […]
Patient-level Information Extraction by Consistent Integration of Textual and Tabular Evidence with Bayesian Networks
arXiv:2511.17056v1 Announce Type: new Abstract: Electronic health records (EHRs) form an invaluable resource for training clinical decision support systems. To leverage the potential of such systems in high-risk applications, we need large, structured tabular datasets on which we can build transparent feature-based models. While part of the EHR already contains structured information (e.g. diagnosis codes, […]
SAM 3: Segment Anything with Concepts
arXiv:2511.16719v1 Announce Type: cross Abstract: We present Segment Anything Model (SAM) 3, a unified model that detects, segments, and tracks objects in images and videos based on concept prompts, which we define as either short noun phrases (e.g., “yellow school bus”), image exemplars, or a combination of both. Promptable Concept Segmentation (PCS) takes such prompts […]
Stochastic neutral fractions and the effective population size
arXiv:2502.05062v2 Announce Type: replace-cross Abstract: The dynamics of a general structured population is modelled using a general stochastic differential equation (SDE) with an infinite decomposability property. This property allows the population to be divided into an arbitrary number of allelic components, also known as stochastic neutral fractions. When demographic noise is small, a fast-slow principle […]
The Belief-Desire-Intention Ontology for modelling mental reality and agency
arXiv:2511.17162v1 Announce Type: new Abstract: The Belief-Desire-Intention (BDI) model is a cornerstone for representing rational agency in artificial intelligence and cognitive sciences. Yet, its integration into structured, semantically interoperable knowledge representations remains limited. This paper presents a formal BDI Ontology, conceived as a modular Ontology Design Pattern (ODP) that captures the cognitive architecture of agents […]