Protecting Bystander Privacy via Selective Hearing in Audio LLMs

December 16, 2025

arXiv:2512.06380v2 Announce Type: replace-cross
Abstract: Audio large language models (LLMs) are increasingly deployed in the real world, where they inevitably capture speech from unintended nearby bystanders, raising privacy risks that existing benchmarks and defences do not consider. We introduce SH-Bench, the first benchmark designed to evaluate selective hearing: a model’s ability to attend to an intended main speaker while refusing to process or reveal information about incidental bystander speech. SH-Bench contains 3,968 multi-speaker audio mixtures, including both real-world and synthetic scenarios, paired with 77k multiple-choice questions that probe models under general and selective operating modes. In addition, we propose Selective Efficacy (SE), a novel metric capturing both multi-speaker comprehension and bystander-privacy protection. Our evaluation of state-of-the-art open-source and proprietary LLMs reveals substantial bystander privacy leakage, with strong audio understanding failing to translate into selective protection of bystander privacy. To mitigate this gap, we also present Bystander Privacy Fine-Tuning (BPFT), a novel training pipeline that teaches models to refuse bystander-related queries without degrading main-speaker comprehension. We show that BPFT yields substantial gains, achieving an absolute 47% higher bystander accuracy under selective mode and an absolute 16% higher SE compared to Gemini 2.5 Pro, which is the best audio LLM without BPFT. Together, SH-Bench and BPFT provide the first systematic framework for measuring and improving bystander privacy in audio LLMs.

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.