Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks

arXiv:2603.04459v2 Announce Type: replace-cross Abstract: The rapid growth of research in LLM safety makes it hard to track all advances. Benchmarks are therefore crucial for capturing key trends and enabling systematic comparisons. Yet, it remains unclear why certain benchmarks gain prominence, and no systematic assessment has been conducted on their academic influence or code quality. […]

A practical identifiability criterion leveraging weak-form parameter estimation

arXiv:2506.17373v5 Announce Type: replace-cross Abstract: In this work, we define a practical identifiability criterion, (e, q)-identifiability, based on a parameter e, reflecting the noise in observed variables, and a parameter q, reflecting the mean-square error of the parameter estimator. This criterion is better able to encompass changes in the quality of the parameter estimate due […]

SemBench: A Universal Semantic Framework for LLM Evaluation

arXiv:2603.11687v1 Announce Type: cross Abstract: Recent progress in Natural Language Processing (NLP) has been driven by the emergence of Large Language Models (LLMs), which exhibit remarkable generative and reasoning capabilities. However, despite their success, evaluating the true semantic understanding of these models remains a persistent challenge. Traditional benchmarks such as Word-in-Context (WiC) effectively probe this […]

Quality Assurance of LLM-generated Code: Addressing Non-Functional Quality Characteristics

arXiv:2511.10271v2 Announce Type: replace-cross Abstract: In recent years, large language models have been widely integrated into software engineering workflows, supporting tasks like code generation. While prior evaluations focus on functional correctness, there is still a limited understanding of the non-functional quality characteristics of generated code. Guided by the ISO/IEC 25010 quality model, this study adopts […]

Understanding Parents’ Desires in Moderating Children’s Interactions with GenAI Chatbots through LLM-Generated Probes

arXiv:2603.03727v2 Announce Type: replace-cross Abstract: This paper studies how parents want to moderate children’s interactions with Generative AI chatbots, with the goal of informing the design of future GenAI parental control tools. We first used an LLM to generate synthetic child-GenAI chatbot interaction scenarios and worked with four parents to validate their realism. From this […]

Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning

arXiv:2602.20197v2 Announce Type: replace-cross Abstract: Reinforcement Learning with verifiable rewards (RLVR) has emerged as a primary learning paradigm for enhancing the reasoning capabilities of multi-modal large language models (MLLMs). However, during RL training, the enormous state space of MLLM and sparse rewards often leads to entropy collapse, policy degradation, or over-exploitation of suboptimal behaviors. This […]

Causal Prosody Mediation for Text-to-Speech:Counterfactual Training of Duration, Pitch, and Energy in FastSpeech2

arXiv:2603.11683v1 Announce Type: cross Abstract: We propose a novel causal prosody mediation framework for expressive text-to-speech (TTS) synthesis. Our approach augments the FastSpeech2 architecture with explicit emotion conditioning and introduces counterfactual training objectives to disentangle emotional prosody from linguistic content. By formulating a structural causal model of how text (content), emotion, and speaker jointly influence […]

ProtoDCS: Towards Robust and Efficient Open-Set Test-Time Adaptation for Vision-Language Models

arXiv:2602.23653v2 Announce Type: replace-cross Abstract: Large-scale Vision-Language Models (VLMs) exhibit strong zero-shot recognition, yet their real-world deployment is challenged by distribution shifts. While Test-Time Adaptation (TTA) can mitigate this, existing VLM-based TTA methods operate under a closed-set assumption, failing in open-set scenarios where test streams contain both covariate-shifted in-distribution (csID) and out-of-distribution (csOOD) data. This […]

The Landscape of Generative AI in Information Systems: A Synthesis of Secondary Reviews and Research Agendas

arXiv:2603.11842v1 Announce Type: cross Abstract: As organizations grapple with the rapid adoption of Generative AI (GenAI), this study synthesizes the state of knowledge through a systematic literature review of secondary studies and research agendas. Analyzing 28 papers published since 2023, we find that while GenAI offers transformative potential for productivity and innovation, its adoption is […]

Entropy-Preserving Reinforcement Learning

arXiv:2603.11682v1 Announce Type: cross Abstract: Policy gradient algorithms have driven many recent advancements in language model reasoning. An appealing property is their ability to learn from exploration on their own trajectories, a process crucial for fostering diverse and creative solutions. As we show in this paper, many policy gradient algorithms naturally reduce the entropy — […]

Effective Resistance Rewiring: A Simple Topological Correction for Over-Squashing

arXiv:2603.11944v1 Announce Type: cross Abstract: Graph Neural Networks struggle to capture long-range dependencies due to over-squashing, where information from exponentially growing neighborhoods must pass through a small number of structural bottlenecks. While recent rewiring methods attempt to alleviate this limitation, many rely on local criteria such as curvature, which can overlook global connectivity constraints that […]

Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction

arXiv:2602.23312v2 Announce Type: replace-cross Abstract: Leader-follower interaction is an important paradigm in human-robot interaction (HRI). Yet, assigning roles in real time remains challenging for resource-constrained mobile and assistive robots. While large language models (LLMs) have shown promise for natural communication, their size and latency limit on-device deployment. Small language models (SLMs) offer a potential alternative, […]

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844