The Arrival of AGI? When Expert Personas Exceed Expert Benchmarks

arXiv:2603.20225v1 Announce Type: cross Abstract: Do expert personas improve language model performance? The Wharton Generative AI Lab reports that they do not, broadcasting to millions via social media the recommendation that practitioners abandon a technique recommended by Anthropic, Google, and OpenAI. We demonstrate that this null finding was structurally predictable. Five core mechanisms precluded detection […]
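
For readers unfamiliar with the technique under dispute: "expert persona" prompting simply prepends a role-playing system message to the task prompt. Below is a minimal sketch of the two conditions being compared; the question, persona wording, and evaluation protocol are illustrative assumptions, not the paper's materials.

```python
# Minimal sketch of the two prompting conditions being compared: a plain task
# prompt versus the same prompt preceded by an "expert persona" system message.
# The question and persona wording are illustrative, not the paper's materials.

question = "A patient presents with sudden painless vision loss in one eye. Most likely diagnosis?"

plain_messages = [
    {"role": "user", "content": question},
]

persona_messages = [
    {"role": "system",
     "content": "You are a board-certified ophthalmologist with 20 years of clinical experience."},
    {"role": "user", "content": question},
]

# The empirical question is simply whether benchmark accuracy differs when the
# same model is queried with persona_messages instead of plain_messages.
for name, messages in (("plain", plain_messages), ("persona", persona_messages)):
    print(name, messages)
```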

Stability of AI Governance Systems: A Coupled Dynamics Model of Public Trust and Social Disruptions

arXiv:2603.20248v1 Announce Type: cross Abstract: As artificial intelligence (AI) is increasingly deployed in high-stakes public decision-making (from resource allocation to welfare distribution), public trust in these systems has become a critical determinant of their legitimacy and sustainability. Yet existing AI governance research remains largely qualitative, lacking formal mathematical frameworks to characterize the precise conditions under […]
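
The abstract promises a formal coupled-dynamics treatment, but the equations are truncated above. As a purely illustrative sketch of what such a model can look like, the toy system below couples a trust variable T(t) with a disruption variable D(t); the functional form, parameters, and stability question are assumptions, not the paper's model.

```python
# Hypothetical two-variable coupled dynamics for public trust T(t) and social
# disruption level D(t). This is NOT the paper's model (its equations are
# truncated above); it only illustrates what a coupled trust/disruption system
# looks like and how its long-run stability can be probed numerically.

def step(T, D, dt=0.01, a=0.5, b=0.8, c=0.3, d=0.6):
    # dT/dt = a*(1 - T) - b*D*T   : trust recovers toward 1, eroded by disruption
    # dD/dt = c*D*(1 - D) - d*T*D : disruption grows logistically, damped by trust
    return T + dt * (a * (1 - T) - b * D * T), D + dt * (c * D * (1 - D) - d * T * D)

T, D = 0.7, 0.2
for _ in range(5000):            # forward-Euler integration over 50 time units
    T, D = step(T, D)
print(f"long-run state: trust={T:.3f}, disruption={D:.3f}")
```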

RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models

arXiv:2603.21341v1 Announce Type: new Abstract: Improving embodied reasoning in multimodal large language models (MLLMs) is essential for building vision-language-action models (VLAs) on top of them to readily translate multimodal understanding into low-level actions. Accordingly, recent work has explored enhancing embodied reasoning in MLLMs through vision-question-answering-style supervision. However, these approaches have been reported to result in […]

Silent Commitment Failure in Instruction-Tuned Language Models: Evidence of Governability Divergence Across Architectures

arXiv:2603.21415v1 Announce Type: new Abstract: As large language models are deployed as autonomous agents with tool execution privileges, a critical assumption underpins their security architecture: that model errors are detectable at runtime. We present empirical evidence that this assumption fails for two of three instruction-following models evaluable for conflict detection. We introduce governability — the […]

Safety as Computation: Certified Answer Reuse via Capability Closure in Task-Oriented Dialogue

arXiv:2603.21448v1 Announce Type: new Abstract: We introduce a new paradigm for task-oriented dialogue systems: safety certification as a computational primitive for answer reuse. Current systems treat each turn independently, recomputing answers via retrieval or generation even when they are already derivable from prior state. We show that in capability-based systems, the safety certification step computes […]
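
The certification mechanism itself is elided in the truncated abstract. The sketch below only illustrates the general reuse pattern it gestures at: a cached answer is returned when a previously issued capability still covers the new request, and is recomputed (and re-certified) otherwise. All names here (Capability, certify, respond) are hypothetical.

```python
# Hypothetical sketch of answer reuse gated by a safety certificate.
# Names (Capability, certify, respond) are illustrative; the paper's actual
# certification computation is elided in the abstract above.
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    domain: str          # e.g. "billing"
    user_clearance: int  # minimum clearance the certificate was issued for

answer_cache: dict[str, tuple[str, Capability]] = {}

def certify(query: str, user_clearance: int) -> Capability:
    # Stand-in for the (expensive) safety-certification step.
    return Capability(domain="billing", user_clearance=user_clearance)

def respond(query: str, user_clearance: int) -> str:
    cached = answer_cache.get(query)
    if cached is not None:
        answer, cert = cached
        # Reuse only if the stored certificate still covers this request.
        if user_clearance >= cert.user_clearance:
            return answer
    cert = certify(query, user_clearance)
    answer = f"<generated answer for: {query}>"   # retrieval/generation stand-in
    answer_cache[query] = (answer, cert)
    return answer

print(respond("What is my current balance?", user_clearance=2))   # computed and certified
print(respond("What is my current balance?", user_clearance=3))   # reused without recomputation
```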

Brain Learning Principles Utilizing Non-Ideal Factors in Neural Circuits

arXiv:2603.21542v1 Announce Type: new Abstract: The human brain achieves its remarkable computational prowess not despite its inherent non-ideal factors (noise, heterogeneity, structural irregularities, decentralized plasticity, systematic errors, and chaotic dynamics) but precisely because of them. This paper systematically demonstrates that these traits, long dismissed as imperfections in classical neuroscience and eliminated in digital engineering, are […]

Mind over Space: Can Multimodal Large Language Models Mentally Navigate?

arXiv:2603.21577v1 Announce Type: new Abstract: Despite the widespread adoption of MLLMs in embodied agents, their capabilities remain largely confined to reactive planning from immediate observations, consistently failing in spatial reasoning across extensive spatiotemporal scales. Cognitive science reveals that Biological Intelligence (BI) thrives on “mental navigation”: the strategic construction of spatial representations from experience and the […]

Silicon Bureaucracy and AI Test-Oriented Education: Contamination Sensitivity and Score Confidence in LLM Benchmarks

arXiv:2603.21636v1 Announce Type: new Abstract: Public benchmarks increasingly govern how large language models (LLMs) are ranked, selected, and deployed. We frame this benchmark-centered regime as Silicon Bureaucracy and AI Test-Oriented Education, and argue that it rests on a fragile assumption: that benchmark scores directly reflect genuine generalization. In practice, however, such scores may conflate exam-oriented […]

Deterministic Hallucination Detection in Medical VQA via Confidence-Evidence Bayesian Gain

arXiv:2603.21693v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) have shown strong potential for medical Visual Question Answering (VQA), yet they remain prone to hallucinations, defined as generating responses that contradict the input image, posing serious risks in clinical settings. Current hallucination detection methods, such as Semantic Entropy (SE) and Vision-Amplified Semantic Entropy (VASE), […]
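
For orientation, Semantic Entropy, one of the baselines named above, scores uncertainty by sampling several answers, grouping them into semantically equivalent clusters, and taking the entropy over cluster frequencies. The sketch below approximates the clustering step with normalized string matching (published implementations typically use bidirectional entailment), so it is illustrative rather than faithful.

```python
# Minimal sketch of Semantic Entropy (SE) for one question: sample answers,
# cluster semantically equivalent ones, take entropy over cluster frequencies.
# Clustering is approximated here by normalized string matching; published SE
# implementations typically cluster via bidirectional entailment instead.
import math
from collections import Counter

def normalize(answer: str) -> str:
    return " ".join(answer.lower().strip(" .").split())

def semantic_entropy(sampled_answers: list[str]) -> float:
    clusters = Counter(normalize(a) for a in sampled_answers)
    n = sum(clusters.values())
    return -sum((c / n) * math.log(c / n) for c in clusters.values())

# Hypothetical samples from a medical VQA model for a single image/question pair.
samples = ["Pneumothorax.", "pneumothorax", "Pleural effusion.", "Pneumothorax"]
print(f"SE = {semantic_entropy(samples):.3f}")   # higher entropy -> less consistent answers
```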

CurvZO: Adaptive Curvature-Guided Sparse Zeroth-Order Optimization for Efficient LLM Fine-Tuning

arXiv:2603.21725v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) with backpropagation achieves high performance but incurs substantial memory overhead, limiting scalability on resource-constrained hardware. Zeroth-order (ZO) optimization provides a memory-efficient alternative by relying solely on forward passes, yet it typically suffers from slow or unstable convergence due to high-variance gradient estimates. Sparse ZO updates […]
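
As background, a standard two-point zeroth-order estimator perturbs the parameters along a random direction and uses only two forward passes to approximate the directional gradient; sparse variants perturb only a subset of coordinates. The toy sketch below uses a quadratic stand-in for the model loss and a uniformly random sparse mask; CurvZO's curvature-guided mask selection is not reproduced here.

```python
# Toy sketch of sparse zeroth-order (ZO) optimization with a two-point
# (SPSA-style) gradient estimate: only forward passes of the loss are used.
# CurvZO's curvature-guided choice of the sparse mask is not reproduced here;
# the mask below is drawn uniformly at random.
import numpy as np

rng = np.random.default_rng(0)
dim, sparsity, mu, lr = 200, 0.1, 1e-3, 0.02
target = rng.normal(size=dim)
loss = lambda w: 0.5 * np.sum((w - target) ** 2)   # stand-in for an LLM forward pass

w = np.zeros(dim)
print(f"initial loss: {loss(w):.2f}")
for step in range(2000):
    mask = rng.random(dim) < sparsity              # sparse set of coordinates to perturb
    z = rng.normal(size=dim) * mask                # sparse perturbation direction
    # two forward passes -> scalar directional estimate -> sparse pseudo-gradient
    g_hat = (loss(w + mu * z) - loss(w - mu * z)) / (2 * mu) * z
    w -= lr * g_hat
print(f"final loss:   {loss(w):.2f}")
```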

Agentic Personas for Adaptive Scientific Explanations with Knowledge Graphs

arXiv:2603.21846v1 Announce Type: new Abstract: AI explanation methods often assume a static user model, producing non-adaptive explanations regardless of expert goals, reasoning strategies, or decision contexts. Knowledge graph-based explanations, despite their capacity for grounded, path-based reasoning, inherit this limitation. In complex domains such as scientific discovery, this assumption fails to capture the diversity of cognitive […]

Future-Interactions-Aware Trajectory Prediction via Braid Theory

arXiv:2603.22035v1 Announce Type: new Abstract: To safely operate, an autonomous vehicle must know the future behavior of a potentially large number of interacting agents around it, a task often posed as multi-agent trajectory prediction. Many previous attempts to model social interactions and solve the joint prediction task either add extensive computational requirements or rely on […]
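
A common way braid theory enters multi-agent prediction (not necessarily the construction used in this paper) is to project agent trajectories onto one axis and record each pairwise crossing as a braid-group generator, giving a compact symbolic description of how agents weave around one another. The sketch below implements that encoding for a toy two-agent swap; the crossing test and sign convention are illustrative assumptions.

```python
# Hypothetical sketch: encode pairwise agent interactions as braid-group
# generators derived from crossings of their x-coordinates over time. This is
# one common braid-based construction, not necessarily the paper's.
import numpy as np

def braid_word(x, y):
    """x, y: arrays of shape (T, N) with per-timestep x/y positions of N agents."""
    word = []
    T, N = x.shape
    order = np.argsort(x[0])                       # strand order along the x-axis
    for t in range(1, T):
        new_order = np.argsort(x[t])
        for i in range(N - 1):
            # adjacent strands i and i+1 swapped -> one braid generator
            if order[i] == new_order[i + 1] and order[i + 1] == new_order[i]:
                a, b = order[i], order[i + 1]
                # sign of the crossing: which agent passes in front (larger y)
                sign = "+" if y[t, a] > y[t, b] else "-"
                word.append(f"s{i + 1}{sign}")
        order = new_order
    return word

# Two agents swapping positions once -> a single generator.
t = np.linspace(0, 1, 20)
x = np.stack([t, 1 - t], axis=1)                   # agent 0 moves right, agent 1 moves left
y = np.stack([np.zeros_like(t), np.full_like(t, 0.5)], axis=1)
print(braid_word(x, y))                            # -> ['s1-']
```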

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK; registration number 16808844.