arXiv:2604.25359v1 Announce Type: cross Abstract: Large Language Models are increasingly being deployed to extract structured data from unstructured and semi-structured sources: parsing invoices and medical records, and converting PDF documents into database entries. Yet existing benchmarks for structured output generation either focus on schema compliance alone or evaluate value correctness within a single source domain. We […]
Sensory-Aware Sequential Recommendation via Review-Distilled Representations
arXiv:2603.02709v3 Announce Type: replace-cross Abstract: We propose a novel framework for sensory-aware sequential recommendation that enriches item representations with linguistically extracted sensory attributes from product reviews. Our approach, ASER (Attribute-based Sensory-Enhanced Representation), introduces an offline extraction-and-distillation pipeline in which a large language model is first fine-tuned as a teacher to extract structured sensory attribute-value pairs, […]
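A minimal sketch of the extraction step such a pipeline rests on. The paper fine-tunes an LLM as a teacher to emit structured sensory attribute-value pairs; the keyword lexicon below is only a hypothetical stand-in that illustrates the intended output shape, not the ASER method itself.

```python
# Toy stand-in for the teacher-LLM extraction stage: map review text to
# structured sensory attribute-value pairs. Lexicon entries are invented.
SENSORY_LEXICON = {
    "taste": ["sweet", "bitter", "salty", "sour"],
    "texture": ["crunchy", "creamy", "chewy"],
    "smell": ["fragrant", "musty"],
}

def extract_sensory_attributes(review: str) -> dict[str, list[str]]:
    """Return attribute -> sensory values mentioned in the review."""
    tokens = set(review.lower().split())
    found = {}
    for attribute, values in SENSORY_LEXICON.items():
        hits = [v for v in values if v in tokens]
        if hits:
            found[attribute] = hits
    return found

pairs = extract_sensory_attributes("Surprisingly sweet and very crunchy chips")
# e.g. {"taste": ["sweet"], "texture": ["crunchy"]}
```

In the actual pipeline these pairs would then be distilled into item representations for the downstream sequential recommender.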
Safe-Support Q-Learning: Learning without Unsafe Exploration
arXiv:2604.25379v1 Announce Type: cross Abstract: Ensuring safety during reinforcement learning (RL) training is critical in real-world applications where unsafe exploration can lead to devastating outcomes. While most safe RL methods mitigate risk through constraints or penalization, they still allow exploration of unsafe states during training. In this work, we adopt a stricter safety requirement that […]
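The stricter requirement, as far as the abstract describes it, is that unsafe states are never visited even while exploring. A minimal tabular sketch of that idea: restrict both exploration and exploitation to a precomputed safe support of (state, action) pairs. The safe set here is hand-specified for illustration; how the paper obtains it is not stated in the abstract.

```python
import random

# Exploration is confined to a known safe support, so unsafe actions are
# never sampled during training. SAFE_ACTIONS is illustrative only.
SAFE_ACTIONS = {  # state -> actions assumed safe in that state
    0: [0, 1],
    1: [1],
    2: [0],
}

def safe_epsilon_greedy(Q, state, epsilon=0.1):
    """Epsilon-greedy action selection restricted to the safe support."""
    allowed = SAFE_ACTIONS[state]
    if random.random() < epsilon:
        return random.choice(allowed)                     # explore safely
    return max(allowed, key=lambda a: Q[(state, a)])      # exploit safely

Q = {(s, a): 0.0 for s in SAFE_ACTIONS for a in SAFE_ACTIONS[s]}
Q[(0, 1)] = 1.0
action = safe_epsilon_greedy(Q, state=0, epsilon=0.0)  # greedy -> action 1
```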
From Insight to Action: A Novel Framework for Interpretability-Guided Data Selection in Large Language Models
arXiv:2604.25167v1 Announce Type: new Abstract: While mechanistic interpretability tools like Sparse Autoencoders (SAEs) can uncover meaningful features within Large Language Models (LLMs), a critical gap remains in transforming these insights into practical actions for model optimization. We bridge this gap with the hypothesis that data selection guided by a model’s internal task features is a […]
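One plausible reading of "data selection guided by a model's internal task features" can be sketched in a few lines: score each training example by how strongly it activates a task-relevant SAE feature, then keep the top-ranked examples. The feature index and activations below are synthetic; the paper's actual scoring rule is not given in the abstract.

```python
import numpy as np

# Rank examples by activation of a hypothetical task-relevant SAE feature
# and keep the top k. Activations are random stand-ins for real SAE outputs.
rng = np.random.default_rng(0)
activations = rng.random((6, 4))   # 6 examples x 4 SAE features
task_feature = 2                   # assumed index of the task feature

scores = activations[:, task_feature]        # per-example relevance score
top_k = 3
selected = np.argsort(scores)[::-1][:top_k]  # indices of top-3 examples
```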
Do LLMs Capture Embodied Cognition and Cultural Variation? Cross-Linguistic Evidence from Demonstratives
arXiv:2604.25423v1 Announce Type: cross Abstract: Do large language models (LLMs) truly acquire embodied cognition and cultural conventions from text? We introduce demonstratives, fundamental spatial expressions like “this/that” in English and “zhè/nà” in Chinese, as a novel probe for grounded knowledge. Using 6,400 responses from 320 native speakers, we establish a human baseline: English speakers reliably […]
Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space
arXiv:2604.05030v2 Announce Type: replace-cross Abstract: Experiments probing natural language processing by both humans and LLMs suggest that the meaning of a semantic expression is indeterminate prior to the act of interpretation rather than being specifiable simply as the sum of its parts (i.e. compositionality). This observer-dependent act dynamically actualizes meaning under genuine contextuality more consistent […]
From World-Gen to Quest-Line: A Dependency-Driven Prompt Pipeline for Coherent RPG Generation
arXiv:2604.25482v1 Announce Type: cross Abstract: Large Language Models (LLMs) have shown strong potential for narrative generation, but their use in complex, multi-layered role-playing game (RPG) worlds is still limited by issues of coherence, controllability, and structural consistency. This paper explores a dependency-aware, multi-stage prompt pipeline for procedural RPG content generation that models narrative dependencies through […]
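The core of a dependency-aware pipeline like this can be illustrated with a topological ordering: narrative elements form a DAG (quests depend on world facts and earlier quests), and generation prompts are issued in an order where each stage only references content that already exists. The example graph is invented; the paper's actual dependency schema is not shown in the abstract.

```python
from graphlib import TopologicalSorter

# Hypothetical narrative dependency graph: node -> set of prerequisites.
deps = {
    "world_lore": set(),
    "faction_A": {"world_lore"},
    "quest_1": {"faction_A"},
    "quest_2": {"quest_1", "world_lore"},
}

# Prompt stages in topological order, so every element is generated only
# after everything it depends on has been generated.
order = list(TopologicalSorter(deps).static_order())
```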
DATAREEL: Automated Data-Driven Video Story Generation with Animations
arXiv:2604.25220v1 Announce Type: new Abstract: Data videos are a powerful medium for data-based visual storytelling, combining animated, chart-centric visualizations with synchronized narration. Widely used in journalism, education, and public communication, they help audiences understand complex data through clear and engaging visual explanations. Despite their growing impact, generating data-driven video stories remains challenging, as it […]
You Don’t Need Public Tests to Generate Correct Code
arXiv:2604.21598v2 Announce Type: replace-cross Abstract: Multi-agent systems are frequently employed for autonomous code generation, demonstrating strong utility in complex algorithmic problem-solving. Recent studies tackle the difficulty of producing functionally correct programs by leveraging simulation-guided planning and debugging, wherein language models step through execution traces to validate logic. Nevertheless, these methods rely heavily on human-authored public […]
From CRUD to Autonomous Agents: Formal Validation and Zero-Trust Security for Semantic Gateways in AI-Native Enterprise Systems
arXiv:2604.25555v1 Announce Type: cross Abstract: Enterprise software engineering is shifting away from deterministic CRUD/REST architectures toward AI-native systems where large language models act as cognitive orchestrators. This transition introduces a critical security tension: probabilistic LLMs weaken classical mechanisms for validation, access control, and formal testing. This paper proposes the design, formal validation, and empirical evaluation […]
ValueAlpha: Agreement-Gated Stress Testing of LLM-Judged Investment Rationales Before Returns Are Observable
arXiv:2604.25224v1 Announce Type: new Abstract: Long-horizon investment decisions create a pre-realization evaluation problem: realized returns are the eventual arbiter of investment quality, but they arrive too late and are too noisy to guide many model-development and governance decisions. LLM judges offer a tempting substitute for pre-deployment evaluation of AI-finance systems, but unvalidated judges may reward […]
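A minimal sketch of what "agreement-gated" could mean in practice: accept an LLM-judge verdict on an investment rationale only when independent judges agree above a threshold, and abstain otherwise. The judges, labels, and threshold are illustrative assumptions, not the paper's protocol.

```python
# Gate a panel of judge verdicts: return the majority label only if
# agreement clears the threshold; None signals abstention.
def gated_verdict(votes: list[str], min_agreement: float = 0.8):
    top = max(set(votes), key=votes.count)
    agreement = votes.count(top) / len(votes)
    return top if agreement >= min_agreement else None

unanimous = gated_verdict(["sound"] * 4 + ["flawed"])          # 0.8 agreement
split = gated_verdict(["sound", "flawed", "sound", "flawed"])  # 0.5 agreement
```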
Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models
arXiv:2604.25591v1 Announce Type: cross Abstract: Recent audio-aware large language models (ALLMs) have demonstrated strong capabilities across diverse audio understanding and reasoning tasks, but they still frequently produce hallucinated or overly confident outputs. While uncertainty estimation has been extensively studied in text-only LLMs, it remains largely unexplored for ALLMs, where audio-conditioned generation introduces additional challenges such […]
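One uncertainty signal widely studied for text-only LLMs, and a natural candidate to adapt to audio-conditioned generation, is the entropy of repeated samples: ask the model the same (audio, question) input several times and measure how spread out the answers are. The samples below are synthetic; this is a generic baseline, not the paper's estimator.

```python
import math
from collections import Counter

def answer_entropy(samples: list[str]) -> float:
    """Shannon entropy (bits) of the empirical answer distribution."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

confident = answer_entropy(["a dog barking"] * 5)                 # 0.0 bits
uncertain = answer_entropy(["dog", "rain", "dog", "wind", "car"])
```

High entropy flags inputs where the model's answers disagree with themselves, a common proxy for hallucination risk.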