arXiv:2603.14380v1 Announce Type: cross Abstract: Spiking neural networks (SNNs) offer inherent energy efficiency due to their event-driven computation model, making them promising for edge AI deployment. However, their practical adoption is limited by the computational overhead of deep architectures and the absence of input-adaptive control. This work presents SPARQ, a unified framework that integrates spiking […]
Compute Allocation for Reasoning-Intensive Retrieval Agents
arXiv:2603.14635v1 Announce Type: cross Abstract: As agents operate over long horizons, their memory stores grow continuously, making retrieval critical to accessing relevant information. Many agent queries require reasoning-intensive retrieval, where the connection between query and relevant documents is implicit and requires inference to bridge. LLM-augmented pipelines address this through query expansion and candidate re-ranking, but […]
Nuanced Emotion Recognition Based on a Segment-based MLLM Framework Leveraging Qwen3-Omni for AH Detection
arXiv:2603.13406v1 Announce Type: cross Abstract: Emotion recognition in videos is a pivotal task in affective computing, where identifying subtle psychological states such as Ambivalence and Hesitancy holds significant value for behavioral intervention and digital health. Ambivalence and Hesitancy states often manifest through cross-modal inconsistencies such as discrepancies between facial expressions, vocal tones, and textual semantics, […]
CtrlAttack: A Unified Attack on World-Model Control in Diffusion Models
arXiv:2603.13435v1 Announce Type: cross Abstract: Diffusion-based image-to-video (I2V) models increasingly exhibit world-model-like properties by implicitly capturing temporal dynamics. However, existing studies have mainly focused on visual quality and controllability, and the robustness of the state transition learned by the model remains understudied. To fill this gap, we are the first to analyze the vulnerability of […]
Opportunistic Cardiac Health Assessment: Estimating Phenotypes from Localizer MRI through Multi-Modal Representations
arXiv:2603.13590v1 Announce Type: cross Abstract: Cardiovascular diseases are the leading cause of death worldwide. Cardiac phenotypes (CPs), such as ejection fraction, are the gold standard for assessing cardiac health, but they are derived from cine cardiac magnetic resonance imaging (CMR), which is costly and requires high spatio-temporal resolution. Every magnetic resonance (MR) examination begins with rapid and […]
Evaluating Semantic Fragility in Text-to-Audio Generation Systems Under Controlled Prompt Perturbations
arXiv:2603.13824v1 Announce Type: cross Abstract: Recent advances in text-to-audio generation enable models to translate natural-language descriptions into diverse musical output. However, the robustness of these systems under semantically equivalent prompt variations remains largely unexplored. Small linguistic changes may lead to substantial variation in generated audio, raising concerns about reliability in practical use. In this study, […]
Discriminative Flow Matching Via Local Generative Predictors
arXiv:2603.13928v1 Announce Type: cross Abstract: Traditional discriminative computer vision relies predominantly on static projections, mapping input features to outputs in a single computational step. Although efficient, this paradigm lacks the iterative refinement and robustness inherent in biological vision and modern generative modelling. In this paper, we propose Discriminative Flow Matching, a framework that reformulates classification […]
Concisely Explaining the Doubt: Minimum-Size Abductive Explanations for Linear Models with a Reject Option
arXiv:2603.14096v1 Announce Type: cross Abstract: Trustworthiness in artificial intelligence depends not only on what a model decides, but also on how it handles and explains cases in which a reliable decision cannot be made. In critical domains such as healthcare and finance, a reject option allows the model to abstain when evidence is insufficient, making […]
UniFusion: A Unified Image Fusion Framework with Robust Representation and Source-Aware Preservation
arXiv:2603.14214v1 Announce Type: cross Abstract: Image fusion aims to integrate complementary information from multiple source images to produce a more informative and visually consistent representation, benefiting both human perception and downstream vision tasks. Despite recent progress, most existing fusion methods are designed for specific tasks (i.e., multi-modal, multi-exposure, or multi-focus fusion) and struggle to effectively […]
How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images
arXiv:2603.14323v1 Announce Type: cross Abstract: Generalist multimodal large language models (MLLMs) have achieved impressive performance across a wide range of vision-language tasks. However, their performance on medical tasks, particularly in zero-shot settings where generalization is critical, remains suboptimal. A key research gap is the limited understanding of why medical MLLMs underperform in medical image interpretation. […]
Agent-Based User-Adaptive Filtering for Categorized Harassing Communication
arXiv:2603.13288v1 Announce Type: new Abstract: We propose an agent-based framework for personalized filtering of categorized harassing communication in online social networks. Unlike global moderation systems that apply uniform filtering rules, our approach models user-specific tolerance levels and preferences through adaptive filtering agents. These agents learn from user feedback and dynamically adjust filtering thresholds across multiple […]
Distilling Reasoning Without Knowledge: A Framework for Reliable LLMs
arXiv:2603.14458v1 Announce Type: cross Abstract: Fact-seeking question answering with large language models (LLMs) remains unreliable when answers depend on up-to-date or conflicting information. Although retrieval-augmented and tool-using LLMs reduce hallucinations, they often rely on implicit planning, leading to inefficient tool usage. We propose a modular framework that explicitly separates planning from factual retrieval and answer […]