Multi-View Majority Vote Learning Algorithms: Direct Minimization of PAC-Bayesian Bounds

arXiv:2411.06276v3 Announce Type: replace-cross Abstract: The PAC-Bayesian framework has significantly advanced the understanding of statistical learning, particularly for majority voting methods. Despite its successes, its application to multi-view learning — a setting with multiple complementary data representations — remains underexplored. In this work, we extend PAC-Bayesian theory to multi-view learning, introducing novel generalization bounds based […]

Understanding the Modality Gap: An Empirical Study on the Speech-Text Alignment Mechanism of Large Speech Language Models

arXiv:2510.12116v1 Announce Type: cross Abstract: End-to-end Large Speech Language Models (LSLMs) have demonstrated impressive conversational generation abilities, yet consistently fall short of traditional pipeline systems on semantic understanding benchmarks. In this work, we reveal through systematic experimentation that although LSLMs lose some text input performance after speech-text alignment training, the performance gap between speech and […]

An Efficient Algorithm for Exploring RNA Branching Conformations under the Nearest-Neighbor Thermodynamic Model

arXiv:2510.12059v1 Announce Type: new Abstract: Background: In the Nearest-Neighbor Thermodynamic Model, a standard approach for RNA secondary structure prediction, the energy of the multiloops is modeled using a linear entropic penalty governed by three branching parameters. Although these parameters are typically fixed, recent work has shown that reparametrizing the multiloop score and considering alternative branching […]

From Knowledge to Treatment: Large Language Model Assisted Biomedical Concept Representation for Drug Repurposing

arXiv:2510.12181v1 Announce Type: cross Abstract: Drug repurposing plays a critical role in accelerating treatment discovery, especially for complex and rare diseases. Biomedical knowledge graphs (KGs), which encode rich clinical associations, have been widely adopted to support this task. However, existing methods largely overlook common-sense biomedical concept knowledge in real-world labs, such as mechanistic priors indicating […]

AsynFusion: Towards Asynchronous Latent Consistency Models for Decoupled Whole-Body Audio-Driven Avatars

arXiv:2505.15058v2 Announce Type: replace-cross Abstract: Whole-body audio-driven avatar pose and expression generation is a critical task for creating lifelike digital humans and enhancing the capabilities of interactive virtual agents, with wide-ranging applications in virtual reality, digital entertainment, and remote communication. Existing approaches often generate audio-driven facial expressions and gestures independently, which introduces a significant limitation: […]

HALF: Harm-Aware LLM Fairness Evaluation Aligned with Deployment

arXiv:2510.12217v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed across high-impact domains, from clinical decision support and legal analysis to hiring and education, making fairness and bias evaluation before deployment critical. However, existing evaluations lack grounding in real-world scenarios and do not account for differences in harm severity, e.g., a biased decision […]

Empowering LLM Agents with Geospatial Awareness: Toward Grounded Reasoning for Wildfire Response

arXiv:2510.12061v1 Announce Type: new Abstract: Effective disaster response is essential for safeguarding lives and property. Existing statistical approaches often lack semantic context, generalize poorly across events, and offer limited interpretability. While large language models (LLMs) provide few-shot generalization, they remain text-bound and blind to geography. To bridge this gap, we introduce a Geospatial Awareness Layer […]

Diffusion Models for Reinforcement Learning: Foundations, Taxonomy, and Development

arXiv:2510.12253v1 Announce Type: cross Abstract: Diffusion Models (DMs), as a leading class of generative models, offer key advantages for reinforcement learning (RL), including multi-modal expressiveness, stable training, and trajectory-level planning. This survey delivers a comprehensive and up-to-date synthesis of diffusion-based RL. We first provide an overview of RL, highlighting its challenges, and then introduce the […]

Dual Perspectives on Non-Contrastive Self-Supervised Learning

arXiv:2507.01028v2 Announce Type: replace-cross Abstract: The stop gradient and exponential moving average iterative procedures are commonly used in non-contrastive approaches to self-supervised learning to avoid representation collapse, with excellent performance in downstream applications in practice. This presentation investigates these procedures from the dual viewpoints of optimization and dynamical systems. We show that, in […]

TFGA-Net: Temporal-Frequency Graph Attention Network for Brain-Controlled Speaker Extraction

arXiv:2510.12275v1 Announce Type: cross Abstract: The rapid development of auditory attention decoding (AAD) based on electroencephalography (EEG) signals offers the possibility of EEG-driven target speaker extraction. However, how to effectively utilize the target-speaker common information between EEG and speech remains an unresolved problem. In this paper, we propose a model for brain-controlled speaker extraction, which utilizes […]

ThinkPilot: Steering Reasoning Models via Automated Think-prefixes Optimization

arXiv:2510.12063v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) are powerful, but they still suffer from inefficient and off-target reasoning. Currently, training-free methods are limited to either rigid heuristics or descriptive, non-actionable analyses. In this paper, we introduce ThinkPilot, a training-free framework that automatically optimizes LRMs' reasoning. It uses an evolutionary process to generate think-prefixes, […]

Causal Inspired Multi Modal Recommendation

arXiv:2510.12325v1 Announce Type: cross Abstract: Multimodal recommender systems enhance personalized recommendations in e-commerce and online advertising by integrating visual, textual, and user-item interaction data. However, existing methods often overlook two critical biases: (i) modal confounding, where latent factors (e.g., brand style or product category) simultaneously drive multiple modalities and influence user preference, leading to spurious […]

