arXiv:2601.19922v1 Announce Type: cross Abstract: Supportive conversation depends on skills that go beyond language fluency, including reading emotions, adjusting tone, and navigating moments of resistance, frustration, or distress. Despite rapid progress in language models, we still lack a clear way to understand how their abilities in these interpersonal domains compare to those of humans. We […]
OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling
arXiv:2601.19924v1 Announce Type: cross Abstract: Large Language Models (LLMs) have demonstrated impressive progress in optimization modeling, fostering a rapid expansion of new methodologies and evaluation benchmarks. However, the boundaries of their capabilities in automated formulation and problem solving remain poorly understood, particularly when extending to complex, real-world tasks. To bridge this gap, we propose OPT-ENGINE, […]
The Grammar of Transformers: A Systematic Review of Interpretability Research on Syntactic Knowledge in Language Models
arXiv:2601.19926v1 Announce Type: cross Abstract: We present a systematic review of 337 articles evaluating the syntactic abilities of Transformer-based language models, reporting on 1,015 model results from a range of syntactic phenomena and interpretability methods. Our analysis shows that the state of the art presents a healthy variety of methods and data, but an over-focus […]
SDU's DAISY: A Benchmark for Danish Culture
arXiv:2601.19930v1 Announce Type: cross Abstract: We introduce a new benchmark for Danish culture via cultural heritage, Daisy, based on the curated topics from the Danish Culture Canon 2006. For each artifact in the culture canon, we query the corresponding Wikipedia page and have a language model generate random questions. This yields a sampling strategy within […]
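As a rough illustration of the generation loop the abstract sketches, the snippet below fetches a canon artifact's Wikipedia summary and asks a language model for questions grounded in it. The Wikipedia endpoint usage is real, but the model name, prompt wording, and artifact title are illustrative assumptions, not the paper's actual setup, and the paper's sampling strategy is truncated here.

```python
# Minimal sketch of a DAISY-style question-generation loop.
# Assumptions: prompt wording, model choice, and the example artifact.
import requests
from openai import OpenAI

client = OpenAI()

def wikipedia_summary(title: str, lang: str = "da") -> str:
    """Fetch the lead summary of a Wikipedia page via the REST API."""
    url = f"https://{lang}.wikipedia.org/api/rest_v1/page/summary/{title}"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json().get("extract", "")

def generate_questions(artifact_title: str, n: int = 5) -> list[str]:
    """Ask a language model for n questions grounded in the page text."""
    context = wikipedia_summary(artifact_title)
    prompt = (
        f"Based only on the following text about '{artifact_title}', "
        f"write {n} quiz questions with short answers.\n\n{context}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.splitlines()

# Example: one artifact plausibly in the Danish Culture Canon.
print(generate_questions("Den_lille_Havfrue"))
```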
Quantifying non-deterministic drift in large language models
arXiv:2601.19934v1 Announce Type: cross Abstract: Large language models (LLMs) are widely used for tasks ranging from summarisation to decision support. In practice, identical prompts do not always produce identical outputs, even when temperature and other decoding parameters are fixed. In this work, we conduct repeated-run experiments to empirically quantify baseline behavioural drift, defined as output […]
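The measurement is simple to reproduce in spirit: issue an identical prompt repeatedly with fixed decoding parameters and score how often the outputs diverge. The sketch below assumes an OpenAI-style chat API and uses pairwise exact-match disagreement as a stand-in metric, since the paper's own drift definition is truncated in the abstract.

```python
# A minimal repeated-run drift measurement, assuming an OpenAI-style API.
# Pairwise exact-match disagreement is an illustrative stand-in metric.
from itertools import combinations
from openai import OpenAI

client = OpenAI()

def sample_outputs(prompt: str, n_runs: int = 20) -> list[str]:
    """Issue the identical prompt n_runs times with fixed decoding parameters."""
    outputs = []
    for _ in range(n_runs):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            temperature=0.0,
            seed=0,  # even a fixed seed does not guarantee determinism
            messages=[{"role": "user", "content": prompt}],
        )
        outputs.append(resp.choices[0].message.content)
    return outputs

def drift_rate(outputs: list[str]) -> float:
    """Fraction of output pairs that differ; 0.0 means fully deterministic."""
    pairs = list(combinations(outputs, 2))
    return sum(a != b for a, b in pairs) / len(pairs)

outputs = sample_outputs("Summarise the causes of the 2008 financial crisis.")
print(f"pairwise disagreement: {drift_rate(outputs):.2%}")
```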
Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data
arXiv:2601.19936v1 Announce Type: cross Abstract: The opacity of massive pretraining corpora in Large Language Models (LLMs) raises significant privacy and copyright concerns, making pretraining data detection a critical challenge. Existing state-of-the-art methods typically rely on token likelihoods, yet they often overlook the divergence from the model’s top-1 prediction and local correlation between adjacent tokens. In […]
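One plausible reading of the method is a per-token gap between the model's top-1 log-probability and the observed token's log-probability, aggregated over the most divergent K% of tokens (by analogy with Min-K% Prob). The sketch below implements that reading with HuggingFace transformers; the aggregation rule and the choice of GPT-2 as a stand-in target model are assumptions, since the abstract is truncated before the definition.

```python
# A top-1-gap score in the spirit of Gap-K%, under assumed aggregation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in target model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def gap_k_score(text: str, k: float = 0.2) -> float:
    """Mean gap between top-1 and observed log-probs over the largest-gap K%."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits           # (1, seq_len, vocab)
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    top1_lp = logprobs.max(dim=-1).values
    gaps = top1_lp - token_lp                # 0 where top-1 matches the text
    n = max(1, int(k * gaps.numel()))
    return gaps.topk(n).values.mean().item() # large score: likely unseen text

print(gap_k_score("The quick brown fox jumps over the lazy dog."))
```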
Continuous-Flow Data-Rate-Aware CNN Inference on FPGA
arXiv:2601.19940v1 Announce Type: cross Abstract: Among hardware accelerators for deep-learning inference, dataflow implementations offer low latency and high throughput. In these architectures, each neuron is mapped to a dedicated hardware unit, making them well-suited for field-programmable gate array (FPGA) implementation. Previous unrolled implementations mostly focus on fully connected networks because of their simplicity, […]
Audio Deepfake Detection in the Age of Advanced Text-to-Speech Models
arXiv:2601.20510v1 Announce Type: cross Abstract: Recent advances in Text-to-Speech (TTS) systems have substantially increased the realism of synthetic speech, raising new challenges for audio deepfake detection. This work presents a comparative evaluation of three state-of-the-art TTS models (Dia2, Maya1, and MeloTTS), representing streaming, LLM-based, and non-autoregressive architectures, respectively. A corpus of 12,000 synthetic audio samples was generated using […]
Self Voice Conversion as an Attack against Neural Audio Watermarking
arXiv:2601.20432v1 Announce Type: cross Abstract: Audio watermarking embeds auxiliary information into speech while maintaining speaker identity, linguistic content, and perceptual quality. Although recent advances in neural and digital signal processing-based watermarking methods have improved imperceptibility and embedding capacity, robustness is still primarily assessed against conventional distortions such as compression, additive noise, and resampling. However, the […]
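The robustness protocol implied here has a standard shape: embed a payload, apply the attack, run detection, and score payload recovery. The sketch below shows only that harness shape; the embedder, detector, and voice-conversion step are explicitly stubbed placeholders, not any real system's API.

```python
# Harness shape for watermark robustness evaluation. All three model
# functions are hypothetical no-op stubs standing in for real systems.
import numpy as np

rng = np.random.default_rng(0)

def embed_watermark(audio: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Hypothetical stub: a real system hides `bits` imperceptibly."""
    return audio

def voice_convert(audio: np.ndarray) -> np.ndarray:
    """Hypothetical stub for self voice conversion (same speaker in/out)."""
    return audio + 0.01 * rng.standard_normal(audio.shape)

def detect_watermark(audio: np.ndarray, n_bits: int) -> np.ndarray:
    """Hypothetical stub decoder; a real detector recovers the payload."""
    return rng.integers(0, 2, size=n_bits)

def bit_accuracy(sent: np.ndarray, recovered: np.ndarray) -> float:
    """Fraction of payload bits recovered after the attack."""
    return float(np.mean(sent == recovered))

payload = rng.integers(0, 2, size=32)        # 32-bit watermark payload
audio = rng.standard_normal(16000)           # 1 s of placeholder audio
attacked = voice_convert(embed_watermark(audio, payload))
print(bit_accuracy(payload, detect_watermark(attacked, len(payload))))
```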
Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures
arXiv:2510.14616v2 Announce Type: replace-cross Abstract: Current preference learning methods achieve high accuracy on standard benchmarks but exhibit significant performance degradation when objective quality signals are removed. We introduce WritingPreferenceBench, a dataset of 1,800 human-annotated preference pairs (1,200 English, 600 Chinese) across 8 creative writing genres, where responses are matched for objective correctness, factual accuracy, and […]
Open-Vocabulary Functional 3D Human-Scene Interaction Generation
arXiv:2601.20835v1 Announce Type: cross Abstract: Generating 3D humans that functionally interact with 3D scenes remains an open problem with applications in embodied AI, robotics, and interactive content creation. The key challenge involves reasoning about both the semantics of functional elements in 3D scenes and the 3D human poses required to achieve functionality-aware interaction. Unfortunately, existing […]
Reinforcement Learning via Self-Distillation
arXiv:2601.20802v1 Announce Type: cross Abstract: Large language models are increasingly post-trained with reinforcement learning in verifiable domains such as code and math. Yet, current methods for reinforcement learning with verifiable rewards (RLVR) learn only from a scalar outcome reward per attempt, creating a severe credit-assignment bottleneck. Many verifiable environments actually provide rich textual feedback, such […]
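The bottleneck the abstract names is easy to make concrete: a verifiable code environment runs an attempt against tests and collapses the outcome to a single pass/fail scalar, even though the traceback it produced is far richer. The harness below is an illustrative sketch under that reading, not the paper's method; the test format is an assumption.

```python
# Scalar-outcome verifiable reward for a code attempt: run the attempt
# plus its tests in a subprocess, return 0/1, and note the textual
# feedback that scalar-only RLVR discards. Illustrative harness only.
import subprocess
import tempfile

def verifiable_reward(candidate_code: str, test_code: str) -> tuple[float, str]:
    """Run candidate + tests; return (scalar reward, raw textual feedback)."""
    program = candidate_code + "\n\n" + test_code
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    proc = subprocess.run(
        ["python", path], capture_output=True, text=True, timeout=30
    )
    reward = 1.0 if proc.returncode == 0 else 0.0
    feedback = proc.stderr  # rich signal a scalar-only method throws away
    return reward, feedback

attempt = "def add(a, b):\n    return a - b  # buggy attempt"
tests = "assert add(2, 3) == 5"
reward, feedback = verifiable_reward(attempt, tests)
print(reward)    # 0.0
print(feedback)  # AssertionError traceback: the discarded textual feedback
```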