Neuro-Symbolic Generation and Validation of Memory-Aware Formal Function Specifications

arXiv:2603.13414v1 Announce Type: cross Abstract: Formal verification of memory-manipulating programs critically depends on precise function specifications that capture memory states written by experts. This requirement has become a major bottleneck as large language models (LLMs) increasingly generate low-level systems code whose correctness cannot be assumed. To enable scalable formal verification, we focus exclusively on function […]

Evaluation format, not model capability, drives triage failure in the assessment of consumer health AI

arXiv:2603.11413v2 Announce Type: replace-cross Abstract: Ramaswamy et al. reported in Nature Medicine that ChatGPT Health under-triages 51.6% of emergencies, concluding that consumer-facing AI triage poses safety risks. However, their evaluation used an exam-style protocol (forced A/B/C/D output, knowledge suppression, and suppression of clarifying questions) that differs fundamentally from how consumers use health chatbots. […]

NormCode Canvas: Making LLM Agentic Workflows Development Sustainable via Case-Based Reasoning

arXiv:2603.13443v1 Announce Type: cross Abstract: We present NormCode Canvas (v1.1.3), a deployed system realizing Case-Based Reasoning at two levels for multi-step LLM workflows. The foundation is NormCode, a semi-formal planning language whose compiler-verified scope rule ensures every execution checkpoint is a genuinely self-contained case — eliminating the implicit shared state that makes retrieval unreliable and […]

LLM-Augmented Release Intelligence: Automated Change Summarization and Impact Analysis in Cloud-Native CI/CD Pipelines

arXiv:2603.14619v1 Announce Type: cross Abstract: Cloud-native software delivery platforms orchestrate releases through complex, multi-stage pipelines composed of dozens of independently versioned tasks. When code is promoted between environments — development to staging, staging to production — engineering teams need timely, accurate communication about what changed and what downstream components are affected. Manual preparation of such […]

The Equivalence Theorem: First-Class Relationships for Structurally Complete Database Systems

arXiv:2603.13603v1 Announce Type: cross Abstract: We prove The Equivalence Theorem: structurally complete knowledge representation requires exactly four mutually entailing capabilities — n-ary relationships with attributes, temporal validity, uncertainty quantification, and causal relationships between relationships — collectively equivalent to treating relationships as first-class objects. Any system implementing one capability necessarily requires all four; any system missing […]

Evaluating Adjective-Noun Compositionality in LLMs: Functional vs Representational Perspectives

arXiv:2603.09994v2 Announce Type: replace-cross Abstract: Compositionality is considered central to language abilities. As performant language systems, how do large language models (LLMs) do on compositional tasks? We evaluate adjective-noun compositionality in LLMs using two complementary setups: prompt-based functional assessment and a representational analysis of internal model states. Our results reveal a striking divergence between task […]

Implicit Maximum Likelihood Estimation for Real-time Generative Model Predictive Control

arXiv:2603.13733v1 Announce Type: cross Abstract: Diffusion-based models have recently shown strong performance in trajectory planning, as they are capable of capturing diverse, multimodal distributions of complex behaviors. A key limitation of these models is their slow inference speed, which results from the iterative denoising process. This makes them less suitable for real-time applications such as […]

Delightful Policy Gradient

arXiv:2603.14608v1 Announce Type: cross Abstract: Standard policy gradients weight each sampled action by advantage alone, regardless of how likely that action was under the current policy. This creates two pathologies: within a single decision context (e.g. one image or prompt), a rare negative-advantage action can disproportionately distort the update direction; across many such contexts in […]

Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving

arXiv:2603.13842v1 Announce Type: cross Abstract: End-to-end autonomous driving is typically built upon imitation learning (IL), yet its performance is constrained by the quality of human demonstrations. To overcome this limitation, recent methods incorporate reinforcement learning (RL) through sequential fine-tuning. However, such a paradigm remains suboptimal: sequential RL fine-tuning can introduce policy drift and often leads […]

Variational Routing: A Scalable Bayesian Framework for Calibrated Mixture-of-Experts Transformers

arXiv:2603.09453v2 Announce Type: replace-cross Abstract: Foundation models are increasingly being deployed in contexts where understanding the uncertainty of their outputs is critical to ensuring responsible deployment. While Bayesian methods offer a principled approach to uncertainty quantification, their computational overhead renders their use impractical for training or inference at foundation model scale. State-of-the-art models achieve parameter […]

Countershading coloration in blue shark skin emerges from hierarchically organized and spatially tuned photonic architectures inside skin denticles

arXiv:2603.13937v1 Announce Type: cross Abstract: The blue shark (Prionace glauca) exhibits a striking dorsoventral color gradient, transitioning from vibrant blue dorsally to silver and white ventrally, a pattern widely interpreted as pelagic countershading. Despite its ecological significance, the physical basis of this coloration remains unresolved. Here we show that this color system does not arise […]

$PA^3$: Policy-Aware Agent Alignment through Chain-of-Thought

arXiv:2603.14602v1 Announce Type: cross Abstract: Conversational assistants powered by large language models (LLMs) excel at tool-use tasks but struggle with adhering to complex, business-specific rules. While models can reason over business rules provided in context, including all policies for every query introduces high latency and wastes compute. Furthermore, these lengthy prompts lead to long contexts, […]
