arXiv:2603.23626v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed as optimization modules in agentic systems, yet the fundamental limits of such LLM-mediated improvement remain poorly understood. Here we propose a theory of LLM information susceptibility, centred on the hypothesis that when computational resources are sufficiently large, the intervention of a fixed LLM […]
Probing Ethical Framework Representations in Large Language Models: Structure, Entanglement, and Methodological Challenges
arXiv:2603.23659v1 Announce Type: cross Abstract: When large language models make ethical judgments, do their internal representations distinguish between normative frameworks, or collapse ethics into a single acceptability dimension? We probe hidden representations across five ethical frameworks (deontology, utilitarianism, virtue, justice, commonsense) in six LLMs spanning 4B–72B parameters. Our analysis reveals differentiated ethical subspaces with asymmetric […]
PLACID: Privacy-preserving Large language models for Acronym Clinical Inference and Disambiguation
arXiv:2603.23678v1 Announce Type: cross Abstract: Large Language Models (LLMs) offer transformative solutions across many domains, but healthcare integration is hindered by strict data privacy constraints. Clinical narratives are dense with ambiguous acronyms; misinterpreting these abbreviations can precipitate severe outcomes such as life-threatening medication errors. While cloud-dependent LLMs excel at Acronym Disambiguation, transmitting Protected Health Information to […]
An In-Depth Study of Filter-Agnostic Vector Search on a PostgreSQL Database System: [Experiments and Analysis]
arXiv:2603.23710v1 Announce Type: cross Abstract: Filtered Vector Search (FVS) is critical for supporting semantic search and GenAI applications in modern database systems. However, existing research most often evaluates algorithms in specialized libraries, making optimistic assumptions that do not align with enterprise-grade database systems. Our work challenges this premise by demonstrating that in a production-grade database […]
Human-in-the-Loop Pareto Optimization: Trade-off Characterization for Assist-as-Needed Training and Performance Evaluation
arXiv:2603.23777v1 Announce Type: cross Abstract: During human motor skill training and physical rehabilitation, there is an inherent trade-off between task difficulty and user performance. Characterizing this trade-off is crucial for evaluating user performance, designing assist-as-needed (AAN) protocols, and assessing the efficacy of training protocols. In this study, we propose a novel human-in-the-loop (HiL) Pareto optimization […]
Deep Neural Regression Collapse
arXiv:2603.23805v1 Announce Type: cross Abstract: Neural Collapse is a phenomenon that helps identify sparse and low-rank structures in deep classifiers. Recent work has extended the definition of neural collapse to regression problems, albeit only measuring the phenomenon at the last layer. In this paper, we establish that Neural Regression Collapse (NRC) also occurs below […]
PoliticsBench: Benchmarking Political Values in Large Language Models with Multi-Turn Roleplay
arXiv:2603.23841v1 Announce Type: cross Abstract: While Large Language Models (LLMs) are increasingly used as primary sources of information, their potential for political bias may impact their objectivity. Existing benchmarks of LLM social bias primarily evaluate gender and racial stereotypes. When political bias is included, it is typically measured at a coarse level, neglecting the specific […]
Can VLMs Reason Robustly? A Neuro-Symbolic Investigation
arXiv:2603.23867v1 Announce Type: cross Abstract: Vision-Language Models (VLMs) have been applied to a wide range of reasoning tasks, yet it remains unclear whether they can reason robustly under distribution shifts. In this paper, we study covariate shifts in which the perceptual input distribution changes while the underlying prediction rules do not. To investigate this question, […]
SM-Net: Learning a Continuous Spectral Manifold from Multiple Stellar Libraries
arXiv:2603.23899v1 Announce Type: cross Abstract: We present SM-Net, a machine-learning model that learns a continuous spectral manifold from multiple high-resolution stellar libraries. SM-Net generates stellar spectra directly from the fundamental stellar parameters: effective temperature (Teff), surface gravity (log g), and metallicity (log Z). It is trained on a combined grid derived from the PHOENIX-Husser, C3K-Conroy, […]
DecepGPT: Schema-Driven Deception Detection with Multicultural Datasets and Robust Multimodal Learning
arXiv:2603.23916v1 Announce Type: cross Abstract: Multimodal deception detection aims to identify deceptive behavior by analyzing audiovisual cues for forensics and security. In these high-stakes settings, investigators need verifiable evidence connecting audiovisual cues to final decisions, along with reliable generalization across domains and cultural contexts. However, existing benchmarks provide only binary labels without intermediate reasoning cues. […]
HalluJudge: A Reference-Free Hallucination Detection for Context Misalignment in Code Review Automation
arXiv:2601.19072v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have shown strong capabilities in code review automation, such as review comment generation, yet they suffer from hallucinations, where the generated review comments are ungrounded in the actual code; this poses a significant challenge to the adoption of LLMs in code review workflows. To address […]
Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human-AI Decision Support
arXiv:2512.07801v5 Announce Type: replace-cross Abstract: LLM-based agents are increasingly deployed for expert decision support, yet human-AI teams in high-stakes settings do not yet reliably outperform the best individual. We argue this complementarity gap reflects a fundamental mismatch: current agents are trained as answer engines, not as partners in the collaborative sensemaking through which experts actually […]