JuICE: A Benchmark for Evaluating LLM-Judge in Identifying Cultural Errors

Semantic Robustness Probing via Inpainting: An Interactive Tool for Safety-Critical Object Detection

arXiv:2605.27155v1 Announce Type: cross Abstract: Testing object detectors in safety-critical domains requires semantically meaningful probes beyond pixel-level corruptions. We present SemProbe, a tool for semantic ...

Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models

arXiv:2605.26895v1 Announce Type: cross Abstract: Normalization layers in modern large language models (LLMs) consist of a deterministic normalization operation and a learnable scale vector. While ...

Evaluating the Relevance of Uncertainty Estimators for LLM Hallucination

arXiv:2605.27016v1 Announce Type: cross Abstract: Large language models (LLMs) are prone to hallucinations, i.e., statements unsupported by the input or training data, hindering reliable deployment.

The Labyrinth and the Thread: Rethinking Regularizations in Sequential Knowledge Editing for Large Language Models

arXiv:2605.26670v1 Announce Type: cross Abstract: Sequential editing of structured knowledge in large language models allows targeted factual updates without retraining, yet existing methods often rely ...

Generative artificial intelligence and the marginalization of minoritized knowledges in higher education: the case of disability

arXiv:2605.26769v1 Announce Type: cross Abstract: Generative artificial intelligence redefines higher education by restructuring the processes through which scientific knowledge is produced and validated. These systems ...