The Mirror Design Pattern: Strict Data Geometry over Model Scale for Prompt Injection Detection

arXiv:2603.11875v2 Announce Type: replace-cross Abstract: Prompt injection defenses are often framed as semantic understanding problems and delegated to increasingly large neural detectors. For the first screening layer, however, the requirements are different: the detector runs on every request and therefore must be fast, deterministic, non-promptable, and auditable. We introduce Mirror, a data-curation design pattern that […]

Rhetorical Questions in LLM Representations: A Linear Probing Study

arXiv:2604.14128v1 Announce Type: cross Abstract: Rhetorical questions are asked not to seek information but to persuade or signal stance. How large language models internally represent them remains unclear. We analyze rhetorical questions in LLM representations using linear probes on two social-media datasets with different discourse contexts, and find that rhetorical signals emerge early and are […]
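The probing setup described in the abstract can be sketched in a few lines: fit a linear classifier on per-question hidden states and read off held-out accuracy. The data below is synthetic (a random "rhetorical direction" added to Gaussian noise stands in for real LLM activations); the dataset, layer choice, and dimensions are illustrative assumptions, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical setup: X holds per-question hidden states from one
# transformer layer (n_samples x hidden_dim); y marks rhetorical (1)
# vs information-seeking (0). Synthetic data stands in for real
# activations here.
rng = np.random.default_rng(0)
hidden_dim = 64
direction = rng.normal(size=hidden_dim)   # assumed "rhetorical" direction
y = rng.integers(0, 2, size=500)
X = rng.normal(size=(500, hidden_dim)) + np.outer(y - 0.5, direction)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # the linear probe
print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")
```

High probe accuracy at a given layer is the usual evidence that the property is linearly decodable there; running the same probe layer-by-layer is how "emerge early" claims like the abstract's are typically made.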

AgentForge: Execution-Grounded Multi-Agent LLM Framework for Autonomous Software Engineering

arXiv:2604.13120v1 Announce Type: cross Abstract: Large language models generate plausible code but cannot verify correctness. Existing multi-agent systems simulate execution or leave verification optional. We introduce execution-grounded verification as a first-class principle: every code change must survive sandboxed execution before propagation. We instantiate this principle in AGENTFORGE, a multi-agent framework where Planner, Coder, Tester, Debugger, […]

UNBOX: Unveiling Black-box visual models with Natural-language

arXiv:2603.08639v2 Announce Type: replace-cross Abstract: Ensuring trustworthiness in open-world visual recognition requires models that are interpretable, fair, and robust to distribution shifts. Yet modern vision systems are increasingly deployed as proprietary black-box APIs, exposing only output probabilities and hiding architecture, parameters, gradients, and training data. This opacity prevents meaningful auditing, bias detection, and failure analysis. […]

LiveClawBench: Benchmarking LLM Agents on Complex, Real-World Assistant Tasks

arXiv:2604.13072v1 Announce Type: cross Abstract: LLM-based agents are increasingly expected to handle real-world assistant tasks, yet existing benchmarks typically evaluate them under isolated sources of difficulty, such as a single environment or fully specified instructions. This leaves a substantial gap between current evaluation settings and the compositional challenges that arise in practical deployment. To address […]

Alignment as Institutional Design: From Behavioral Correction to Transaction Structure in Intelligent Systems

arXiv:2604.13079v1 Announce Type: cross Abstract: Current AI alignment paradigms rely on behavioral correction: external supervisors (e.g., RLHF) observe outputs, judge against preferences, and adjust parameters. This paper argues that behavioral correction is structurally analogous to an economy without property rights, where order requires perpetual policing and does not scale. Drawing on institutional economics (Coase, Alchian, […]

WorkRB: A Community-Driven Evaluation Framework for AI in the Work Domain

arXiv:2604.13055v1 Announce Type: cross Abstract: Today’s evolving labor markets rely increasingly on recommender systems for hiring, talent management, and workforce analytics, with natural language processing (NLP) capabilities at the core. Yet, research in this area remains highly fragmented. Studies employ divergent ontologies (ESCO, O*NET, national taxonomies), heterogeneous task formulations, and diverse model families, making cross-study […]

Correct Chains, Wrong Answers: Dissociating Reasoning from Output in LLM Logic

arXiv:2604.13065v1 Announce Type: cross Abstract: LLMs can execute every step of chain-of-thought reasoning correctly and still produce wrong final answers. We introduce the Novel Operator Test, a benchmark that separates operator logic from operator name, enabling rigorous distinction between genuine reasoning and pattern retrieval. By evaluating Boolean operators under unfamiliar names across depths 1-10 on […]
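The core idea of separating operator logic from operator name can be sketched as follows: bind familiar Boolean operators to unfamiliar tokens and generate nested expressions with known ground truth, so a model must follow the stated definitions rather than retrieve behavior from the name. The token names and expression format here are invented for illustration; the paper's benchmark construction may differ.

```python
import operator
import random

# Illustrative bindings of familiar Boolean operators to novel names.
NOVEL_OPS = {"zorp": operator.and_, "flim": operator.or_, "quex": operator.xor}

def make_expr(depth, rng):
    """Build a nested prefix expression of the given depth and return it
    together with its ground-truth Boolean value."""
    if depth == 0:
        v = rng.choice([True, False])
        return str(v), v
    name = rng.choice(list(NOVEL_OPS))
    left_s, left_v = make_expr(depth - 1, rng)
    right_s, right_v = make_expr(depth - 1, rng)
    return f"{name}({left_s}, {right_s})", NOVEL_OPS[name](left_v, right_v)

rng = random.Random(0)
expr, truth = make_expr(3, rng)
print(expr, "=>", truth)  # ground truth an LLM's final answer is scored against
```

Because the generator knows the true value independently of the model, it can score a model's chain-of-thought steps and its final answer separately, which is exactly the dissociation the title refers to.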

OVT-MLCS: An Online Visual Tool for MLCS Mining from Long or Big Sequences

arXiv:2604.13037v1 Announce Type: cross Abstract: Mining multiple longest common subsequences (MLCS) from a set of three or more sequences over a finite alphabet Σ (a classical NP-hard problem) is an important task in a wide variety of application fields. Unfortunately, there is still no exact MLCS algorithm/tool that can handle long (length ≥ 1,000) […]

From Natural Language to PromQL: A Catalog-Driven Framework with Dynamic Temporal Resolution for Cloud-Native Observability

arXiv:2604.13048v1 Announce Type: cross Abstract: Modern cloud-native platforms expose thousands of time series metrics through systems like Prometheus, yet formulating correct queries in domain-specific languages such as PromQL remains a significant barrier for platform engineers and site reliability teams. We present a catalog-driven framework that translates natural language questions into executable PromQL queries, bridging the […]
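A minimal sketch of the catalog-driven step: look a natural-language question up against a metric catalog and emit a PromQL query with an explicit range window. The catalog entries, matching rule, and function name are invented for illustration; the paper's framework is far richer, including dynamic temporal resolution rather than a fixed window.

```python
# Toy metric catalog (names are assumptions, not from the paper).
CATALOG = {
    "cpu usage": "node_cpu_seconds_total",
    "http errors": "http_requests_total",
}

def to_promql(question: str, window: str = "5m") -> str:
    """Pick the first catalog metric mentioned in the question and wrap
    it in a rate() over the requested window."""
    for phrase, metric in CATALOG.items():
        if phrase in question.lower():
            return f"rate({metric}[{window}])"
    raise ValueError("no catalog entry matches the question")

print(to_promql("What is the CPU usage over the last 5 minutes?"))
# -> rate(node_cpu_seconds_total[5m])
```

Grounding generation in a catalog of known metric names is what keeps the emitted PromQL executable against the live Prometheus instance, rather than hallucinating metric identifiers.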

THEIA: Learning Complete Kleene Three-Valued Logic in a Pure-Neural Modular Architecture

arXiv:2604.11284v2 Announce Type: replace-cross Abstract: We present THEIA, a modular neural architecture that learns complete Kleene three-valued logic (K3) end-to-end without any external symbolic solver, and investigate what architectural prior enables compositional generalization under uncertainty. THEIA processes four mathematical domains (arithmetic, order, set membership, propositional logic) through dedicated engines that converge in a final logic […]
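For reference, the target semantics THEIA is trained to learn, complete Kleene three-valued logic (K3), is small enough to write down directly. The encoding below (True, False, and None for the unknown value U) is a common convention, not the paper's; it shows the behavior a learned model must reproduce, e.g. that falsity dominates conjunction and truth dominates disjunction even under uncertainty.

```python
# Kleene strong three-valued logic: values are T, F, U,
# encoded here as True, False, None.
def k3_not(a):
    return None if a is None else not a

def k3_and(a, b):
    if a is False or b is False:   # falsity dominates conjunction
        return False
    if a is None or b is None:
        return None
    return True

def k3_or(a, b):
    # Defined via De Morgan duality with k3_and.
    return k3_not(k3_and(k3_not(a), k3_not(b)))

print(k3_and(True, None))   # -> None  (unknown)
print(k3_or(True, None))    # -> True  (truth dominates disjunction)
```
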

The Code Whisperer: LLM and Graph-Based AI for Smell and Vulnerability Resolution

arXiv:2604.13114v1 Announce Type: cross Abstract: Code smells and software vulnerabilities both increase maintenance cost, yet they are often handled by separate tools that miss structural context and produce noisy warnings. This paper presents The Code Whisperer, a hybrid framework that combines graph-based program analysis with large language models to detect, explain, and repair maintainability and […]


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY. UK registration number 16808844.