arXiv:2604.20306v1 Announce Type: cross Abstract: Medical Visual Question Answering (MedVQA) aims to generate clinically reliable answers conditioned on complex medical images and questions. However, existing methods often overfit to superficial cross-modal correlations, neglecting the intrinsic biases embedded in multimodal medical data. Consequently, models become vulnerable to cross-modal confounding effects, severely hindering their ability to provide […]
Early-Stage Product Line Validation Using LLMs: A Study on Semi-Formal Blueprint Analysis
arXiv:2604.20523v1 Announce Type: cross Abstract: We study whether Large Language Models (LLMs) can perform feature model analysis operations (AOs) directly on semi-formal textual blueprints, i.e., concise constrained-language descriptions of feature hierarchies and constraints, enabling early validation in Software Product Line scoping. Using 12 state-of-the-art LLMs and 16 standard AOs, we compare their outputs against the […]
COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
arXiv:2604.20720v1 Announce Type: cross Abstract: Large language models (LLMs) often exhibit performance disparities across languages, with naive multilingual fine-tuning frequently degrading performance due to negative cross-lingual interference. To address this, we introduce COMPASS (COntinual Multilingual PEFT with Adaptive Semantic Sampling), a novel data-centric framework for adapting LLMs to target languages. COMPASS leverages parameter-efficient fine-tuning (PEFT) […]
MetaboNet: The Largest Publicly Available Consolidated Dataset for Type 1 Diabetes Management
arXiv:2601.11505v2 Announce Type: replace-cross Abstract: Progress in Type 1 Diabetes (T1D) algorithm development is limited by the fragmentation and lack of standardization across existing T1D management datasets. Current datasets differ substantially in structure and are time-consuming to access and process, which impedes data integration and reduces the comparability and generalizability of algorithmic developments. This work […]
Alignment midtraining for animals
arXiv:2604.13076v2 Announce Type: replace-cross Abstract: We investigate the robustness of value alignment via finetuning with synthetic documents, using animal compassion as a value that is both important in its own right and orthogonal to existing alignment efforts. To evaluate compassionate reasoning, we develop and publicly release the Animal Harm Benchmark (AHB), a 26-question evaluation spanning […]
IVY-FAKE: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection
arXiv:2506.00979v5 Announce Type: replace-cross Abstract: The rapid development of Artificial Intelligence Generated Content (AIGC) techniques has enabled the creation of high-quality synthetic content, but it also raises significant security concerns. Current detection methods face two major limitations: (1) the lack of multidimensional explainable datasets for generated images and videos. Existing open-source datasets (e.g., WildFake, GenVideo) […]
LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent
arXiv:2604.17931v2 Announce Type: replace Abstract: Reinforcement Learning (RL) has emerged as a powerful training paradigm for LLM-based agents. However, scaling agentic RL for deep research remains constrained by two coupled challenges: hand-crafted synthetic data fails to elicit genuine real-world search capabilities, and real-world search dependency during RL training introduces instability and prohibitive cost, which limits […]
A hybrid discrete-continuum modelling approach for the interactions of the immune system with oncolytic viral infections
arXiv:2404.06459v3 Announce Type: replace Abstract: Oncolytic virotherapy, utilizing genetically modified viruses to combat cancer and trigger anti-cancer immune responses, has garnered significant attention in recent years. In our previous work (arXiv:2305.12386), we developed a stochastic agent-based model elucidating the spatial dynamics of infected and uninfected cells within solid tumours. Building upon this foundation, we present […]
Computing the Reachability Value of Posterior-Deterministic POMDPs
arXiv:2602.07473v2 Announce Type: replace Abstract: Partially observable Markov decision processes (POMDPs) are a fundamental model for sequential decision-making under uncertainty. However, many verification and synthesis problems for POMDPs are undecidable or intractable. Most prominently, the seminal result of Madani et al. (2003) states that there is no algorithm that, given a POMDP and a set […]
White-Basilisk: A Hybrid Model for Code Vulnerability Detection
arXiv:2507.08540v5 Announce Type: replace-cross Abstract: The proliferation of software vulnerabilities presents a significant challenge to cybersecurity, necessitating more effective detection methodologies. We introduce White-Basilisk, a novel approach to vulnerability detection that demonstrates superior performance while challenging prevailing assumptions in AI model scaling. Utilizing an innovative architecture that integrates Mamba layers, linear self-attention, and a Mixture […]
MirrorBench: Evaluating Self-centric Intelligence in MLLMs by Introducing a Mirror
arXiv:2604.14785v2 Announce Type: replace Abstract: Recent progress in Multimodal Large Language Models (MLLMs) has demonstrated remarkable advances in perception and reasoning, suggesting their potential for embodied intelligence. While recent studies have evaluated embodied MLLMs in interactive settings, current benchmarks mainly target capabilities to perceive, understand, and interact with external objects, lacking a systematic evaluation of […]
Locate-Then-Examine: Grounded Region Reasoning Improves Detection of AI-Generated Images
arXiv:2510.04225v2 Announce Type: replace-cross Abstract: The rapid growth of AI-generated imagery has blurred the boundary between real and synthetic content, raising practical concerns for digital integrity. Vision-language models (VLMs) can provide natural language explanations, but standard one-pass classifiers often miss subtle artifacts in high-quality synthetic images and offer limited grounding in the pixels. We propose […]