arXiv:2406.14294v4 Announce Type: replace-cross Abstract: Discrete audio tokens have recently gained considerable attention for their potential to bridge audio and language processing, enabling multimodal language models that can both generate and understand audio. However, preserving key information such as phonetic content, speaker identity, and paralinguistic cues remains a major challenge. Identifying the optimal tokenizer and […]
Protecting Bystander Privacy via Selective Hearing in Audio LLMs
arXiv:2512.06380v3 Announce Type: replace-cross Abstract: Audio large language models (LLMs) are increasingly deployed in the real world, where they inevitably capture speech from unintended nearby bystanders, raising privacy risks that existing benchmarks and defences do not consider. We introduce SH-Bench, the first benchmark designed to evaluate selective hearing: a model’s ability to attend to an […]
GAIR: Location-Aware Self-Supervised Contrastive Pre-Training with Geo-Aligned Implicit Representations
arXiv:2503.16683v2 Announce Type: replace-cross Abstract: Vision Transformer (ViT) has been widely used in computer vision tasks with excellent results by providing representations for a whole image or image patches. However, ViT lacks detailed localized image representations at arbitrary positions when applied to geospatial tasks that involve multiple geospatial data modalities, such as overhead remote sensing […]
Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control
arXiv:2604.19018v1 Announce Type: cross Abstract: Inference-time LLM alignment methods, particularly activation steering, offer an alternative to fine-tuning by directly modifying activations during generation. Existing methods, however, often rely on non-anticipative interventions that ignore how perturbations propagate through transformer layers and lack online error feedback, resulting in suboptimal, open-loop control. To address this, we show empirically […]
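The "open-loop" intervention this abstract contrasts against can be illustrated with a minimal generic sketch: a fixed steering vector added to a hidden state, with no feedback on how the perturbation propagates. This is an illustration of the general activation-steering idea, not the paper's controller; all names and the toy dimensions are hypothetical.

```python
import numpy as np

def steer(hidden, direction, alpha=4.0):
    """Open-loop steering: shift a hidden-state vector a fixed
    distance alpha along a normalized steering direction."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

rng = np.random.default_rng(0)
h = rng.normal(size=8)          # toy hidden state
d = rng.normal(size=8)          # toy steering direction
h_steered = steer(h, d)

# The state moves exactly alpha units along the direction,
# regardless of what later layers do with it (no error feedback).
shift = float((h_steered - h) @ (d / np.linalg.norm(d)))
```

A closed-loop, model-based controller of the kind the abstract proposes would instead choose the intervention per step using a (locally linear) model of how the perturbation propagates.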
Choose Your Own Adventure: Non-Linear AI-Assisted Programming with EvoGraph
arXiv:2604.18883v1 Announce Type: cross Abstract: Current AI-assisted programming tools are predominantly linear and chat-based, which deviates from the iterative and branching nature of programming itself. Our preliminary study with developers using AI assistants suggested that they often struggle to explore alternatives, manage prompting sequences, and trace changes. Informed by these insights, we created EvoGraph, an […]
Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest
arXiv:2604.18955v1 Announce Type: cross Abstract: In this study, we present the first comprehensive evaluation of modern LLMs – including GPT-4, GPT-4o, GPT-3.5-Turbo, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, and BERT – across three core social media analytics tasks on a Twitter (X) dataset: (I) Social Media Authorship Verification, (II) Social Media Post Generation, and (III) […]
Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps
arXiv:2604.19533v1 Announce Type: cross Abstract: We introduce the Cyber Defense Benchmark, a benchmark for measuring how well large language model (LLM) agents perform the core SOC analyst task of threat hunting: given a database of raw Windows event logs with no guided questions or hints, identify the exact timestamps of malicious events. The benchmark wraps […]
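The task as described, recovering the exact timestamps of malicious events from raw logs, admits a natural precision/recall scoring over timestamps. The sketch below is a generic illustration of that kind of metric, assuming a match tolerance in seconds; it is not the benchmark's actual scorer, and all names are hypothetical.

```python
def score_detections(predicted, ground_truth, tolerance=1.0):
    """Precision/recall over predicted event timestamps (seconds).
    Each ground-truth event may be matched by at most one prediction
    within the given tolerance."""
    matched = set()
    tp = 0
    for p in predicted:
        for i, g in enumerate(ground_truth):
            if i not in matched and abs(p - g) <= tolerance:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Two of three predictions land within 0.5 s of a true event.
p, r = score_detections([10.2, 55.0, 99.9], [10.0, 100.0], tolerance=0.5)
```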
Semantic Needles in Document Haystacks: Sensitivity Testing of LLM-as-a-Judge Similarity Scoring
arXiv:2604.18835v1 Announce Type: cross Abstract: We propose a scalable, multifactorial experimental framework that systematically probes LLM sensitivity to subtle semantic changes in pairwise document comparison. We analogize this as a needle-in-a-haystack problem: a single semantically altered sentence (the needle) is embedded within surrounding context (the hay), and we vary the perturbation type (negation, conjunction swap, […]
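The needle-in-a-haystack construction the abstract describes, embedding one semantically perturbed sentence inside otherwise identical context, can be sketched generically. The perturbation rules below (a crude negation and a conjunction swap) are illustrative stand-ins for the paper's perturbation types; all helper names are hypothetical.

```python
def negate(sentence):
    """Crude illustrative negation: flip the first 'is' to 'is not'."""
    return sentence.replace(" is ", " is not ", 1)

def conjunction_swap(sentence):
    """Illustrative conjunction swap: first 'and' becomes 'or'."""
    return sentence.replace(" and ", " or ", 1)

def embed_needle(hay, needle, position):
    """Insert the perturbed sentence (needle) into the context (hay)."""
    doc = hay[:position] + [needle] + hay[position:]
    return " ".join(doc)

hay = ["Background sentence one.", "Background sentence two."]
needle = "The treatment is effective and safe."
doc = embed_needle(hay, negate(needle), 1)
```

A judge model would then score `doc` against the unperturbed document; sensitivity is how reliably the score drops when only the needle changes.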
Inductive Subgraphs as Shortcuts: Causal Disentanglement for Heterophilic Graph Learning
arXiv:2604.19186v1 Announce Type: cross Abstract: Heterophily is a prevalent property of real-world graphs and is well known to impair the performance of homophilic Graph Neural Networks (GNNs). Prior work has attempted to adapt GNNs to heterophilic graphs through non-local neighbor extension or architecture refinement. However, the fundamental reasons behind misclassifications remain poorly understood. In this […]
Evaluation-driven Scaling for Scientific Discovery
arXiv:2604.19341v1 Announce Type: cross Abstract: Language models are increasingly used in scientific discovery to generate hypotheses, propose candidate solutions, implement systems, and iteratively refine them. At the core of these trial-and-error loops lies evaluation: the process of obtaining feedback on candidate solutions via verifiers, simulators, or task-specific scoring functions. While prior work has highlighted the […]
MM-JudgeBias: A Benchmark for Evaluating Compositional Biases in MLLM-as-a-Judge
arXiv:2604.18164v2 Announce Type: replace-cross Abstract: Multimodal Large Language Models (MLLMs) have been increasingly used as automatic evaluators, a paradigm known as MLLM-as-a-Judge. However, their reliability and vulnerability to biases remain underexplored. We find that many MLLM judges fail to reliably integrate key visual or textual cues, yielding unreliable evaluations when evidence is missing or mismatched, and […]