arXiv:2603.04459v2 Announce Type: replace-cross Abstract: The rapid growth of research in LLM safety makes it hard to track all advances. Benchmarks are therefore crucial for capturing key trends and enabling systematic comparisons. Yet, it remains unclear why certain benchmarks gain prominence, and no systematic assessment has been conducted on their academic influence or code quality. […]
A practical identifiability criterion leveraging weak-form parameter estimation
arXiv:2506.17373v5 Announce Type: replace-cross Abstract: In this work, we define a practical identifiability criterion, (e, q)-identifiability, based on a parameter e, reflecting the noise in observed variables, and a parameter q, reflecting the mean-square error of the parameter estimator. This criterion is better able to encompass changes in the quality of the parameter estimate due […]
SemBench: A Universal Semantic Framework for LLM Evaluation
arXiv:2603.11687v1 Announce Type: cross Abstract: Recent progress in Natural Language Processing (NLP) has been driven by the emergence of Large Language Models (LLMs), which exhibit remarkable generative and reasoning capabilities. However, despite their success, evaluating the true semantic understanding of these models remains a persistent challenge. Traditional benchmarks such as Word-in-Context (WiC) effectively probe this […]
Quality Assurance of LLM-generated Code: Addressing Non-Functional Quality Characteristics
arXiv:2511.10271v2 Announce Type: replace-cross Abstract: In recent years, large language models have been widely integrated into software engineering workflows, supporting tasks like code generation. While prior evaluations focus on functional correctness, there is still a limited understanding of the non-functional quality characteristics of generated code. Guided by the ISO/IEC 25010 quality model, this study adopts […]
Understanding Parents’ Desires in Moderating Children’s Interactions with GenAI Chatbots through LLM-Generated Probes
arXiv:2603.03727v2 Announce Type: replace-cross Abstract: This paper studies how parents want to moderate children’s interactions with Generative AI chatbots, with the goal of informing the design of future GenAI parental control tools. We first used an LLM to generate synthetic child-GenAI chatbot interaction scenarios and worked with four parents to validate their realism. From this […]
Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
arXiv:2602.20197v2 Announce Type: replace-cross Abstract: Reinforcement Learning with verifiable rewards (RLVR) has emerged as a primary learning paradigm for enhancing the reasoning capabilities of multi-modal large language models (MLLMs). However, during RL training, the enormous state space of MLLM and sparse rewards often leads to entropy collapse, policy degradation, or over-exploitation of suboptimal behaviors. This […]
Causal Prosody Mediation for Text-to-Speech:Counterfactual Training of Duration, Pitch, and Energy in FastSpeech2
arXiv:2603.11683v1 Announce Type: cross Abstract: We propose a novel causal prosody mediation framework for expressive text-to-speech (TTS) synthesis. Our approach augments the FastSpeech2 architecture with explicit emotion conditioning and introduces counterfactual training objectives to disentangle emotional prosody from linguistic content. By formulating a structural causal model of how text (content), emotion, and speaker jointly influence […]
ProtoDCS: Towards Robust and Efficient Open-Set Test-Time Adaptation for Vision-Language Models
arXiv:2602.23653v2 Announce Type: replace-cross Abstract: Large-scale Vision-Language Models (VLMs) exhibit strong zero-shot recognition, yet their real-world deployment is challenged by distribution shifts. While Test-Time Adaptation (TTA) can mitigate this, existing VLM-based TTA methods operate under a closed-set assumption, failing in open-set scenarios where test streams contain both covariate-shifted in-distribution (csID) and out-of-distribution (csOOD) data. This […]
The Landscape of Generative AI in Information Systems: A Synthesis of Secondary Reviews and Research Agendas
arXiv:2603.11842v1 Announce Type: cross Abstract: As organizations grapple with the rapid adoption of Generative AI (GenAI), this study synthesizes the state of knowledge through a systematic literature review of secondary studies and research agendas. Analyzing 28 papers published since 2023, we find that while GenAI offers transformative potential for productivity and innovation, its adoption is […]
Entropy-Preserving Reinforcement Learning
arXiv:2603.11682v1 Announce Type: cross Abstract: Policy gradient algorithms have driven many recent advancements in language model reasoning. An appealing property is their ability to learn from exploration on their own trajectories, a process crucial for fostering diverse and creative solutions. As we show in this paper, many policy gradient algorithms naturally reduce the entropy — […]
Effective Resistance Rewiring: A Simple Topological Correction for Over-Squashing
arXiv:2603.11944v1 Announce Type: cross Abstract: Graph Neural Networks struggle to capture long-range dependencies due to over-squashing, where information from exponentially growing neighborhoods must pass through a small number of structural bottlenecks. While recent rewiring methods attempt to alleviate this limitation, many rely on local criteria such as curvature, which can overlook global connectivity constraints that […]
Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction
arXiv:2602.23312v2 Announce Type: replace-cross Abstract: Leader-follower interaction is an important paradigm in human-robot interaction (HRI). Yet, assigning roles in real time remains challenging for resource-constrained mobile and assistive robots. While large language models (LLMs) have shown promise for natural communication, their size and latency limit on-device deployment. Small language models (SLMs) offer a potential alternative, […]