What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct

arXiv:2605.21778v1 Announce Type: new Abstract: AI sycophancy has become a prominent concern in large language model (LLM) research. Yet the term lacks a consistent definition and has been applied to behaviors ranging from agreeing with a user’s false claim to excessively praising the user to withholding corrective feedback. When researchers, companies, and policymakers use the […]

LLM Readiness Harness: Evaluation, Observability, and CI Gates for LLM/RAG Applications

arXiv:2603.27355v2 Announce Type: replace Abstract: We present a readiness harness for LLM and RAG applications that turns evaluation into a deployment decision workflow. The system combines automated benchmarks, OpenTelemetry observability, and CI quality gates under a minimal API contract, then aggregates workflow success, policy compliance, groundedness, retrieval hit rate, cost, and p95 latency into scenario-weighted […]

Drivers of Transient Dynamics and Persistence in Dengue: Insights from Sensitivity and Stochastic Modeling

arXiv:2605.21787v1 Announce Type: new Abstract: We investigate how key epidemiological parameters shape both seasonal epidemics and the persistence of dengue transmission. Our findings confirm known mechanistic drivers of epidemic variability and introduce a ranking of parameter importance in our dengue model, which in turn informs the prioritization of public health policies. We propose a stochastic […]

Trace2Skill: Verifier-Guided Skill Evolution for Long-Context EDA Agents

arXiv:2605.21810v1 Announce Type: new Abstract: Complex Verilog Design Problems (CVDP) challenge hardware LLM agents because solving them requires localizing verifier-relevant RTL, testbenches, include paths, and build dependencies inside large repository snapshots, making precise edits, and recovering from sparse hidden-verifier failures. We present Trace2Skill, a test-time scaling framework that improves a hardware agent without RTL-specialized model […]

DecepChain: Inducing Deceptive Reasoning in Large Language Models

arXiv:2510.00319v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have been demonstrating strong reasoning capability with their chain-of-thoughts (CoT), which are routinely used by humans to judge answer quality. This reliance creates a powerful yet fragile basis for trust. In this work, we study an underexplored phenomenon: whether LLMs could generate incorrect yet coherent CoTs […]

Implicit Safety Alignment from Crowd Preferences

arXiv:2605.21822v1 Announce Type: new Abstract: Reinforcement Learning from Human Feedback (RLHF) can reveal implicit objectives such as safety considerations that go beyond task completion. In this work, we focus on the common safety criteria embedded in crowd preference datasets, where different users may express distinct preferences or objectives, yet follow similar safety principles. Our aim […]

AutoBaxBuilder: Bootstrapping Code Security Benchmarking

arXiv:2512.21132v2 Announce Type: replace-cross Abstract: As large language models (LLMs) see wide adoption in software engineering, the reliable assessment of the correctness and security of LLM-generated code is crucial. Notably, prior work showed that LLMs are prone to generating code with security vulnerabilities, highlighting that security is often overlooked. These insights were enabled by specialized […]

Toward AI VIS Co-Scientists: A General and End-to-End Agent Harness for Solving Complex Data Visualization Tasks

arXiv:2605.21825v1 Announce Type: new Abstract: The ability to inspect, interpret, and communicate complex data is crucial for virtually any scientific endeavor, but often requires significant expertise outside the core domain ranging from data management and analysis to visualization design and implementation. We present an end-to-end agentic harness that, based on only the data and a […]

VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction

arXiv:2602.13294v3 Announce Type: replace-cross Abstract: Evaluating whether Multimodal Large Language Models (MLLMs) genuinely reason about physical dynamics remains challenging. Most existing benchmarks rely on recognition-style protocols such as Visual Question Answering (VQA) and Violation of Expectation (VoE), which can often be answered without committing to an explicit, testable physical hypothesis. We propose VisPhyWorld, an execution-based […]

FLUID: From Ephemeral IDs to Multimodal Semantic Codes for Industrial-Scale Livestreaming Recommendation

arXiv:2605.21832v1 Announce Type: new Abstract: Modern recommender systems rely heavily on ID-based collaborative filtering: each item is represented by a unique ID embedding that accumulates collaborative signals from user interactions. Livestreaming recommendation, however, faces a unique challenge in this paradigm: a live room typically broadcasts for only tens of minutes, so its item ID remains […]

Black-Box Optimization From Small Offline Datasets via Meta Learning with Synthetic Tasks

arXiv:2604.12325v3 Announce Type: replace-cross Abstract: We consider the problem of offline black-box optimization, where the goal is to discover optimal designs (e.g., molecules or materials) from past experimental data. A key challenge in this setting is data scarcity: in many scientific applications, only small or poor-quality datasets are available, which severely limits the effectiveness of […]

PhylaFlow: Hybrid Flow Matching in Billera-Holmes-Vogtmann Tree Space for Phylogenetic Inference

arXiv:2605.21859v1 Announce Type: new Abstract: Phylogenetic trees are hybrid objects: branch lengths vary continuously, while topologies change discretely through edge contractions and expansions. Billera-Holmes-Vogtmann (BHV) tree space provides a canonical geometry for this structure, representing each resolved topology as a Euclidean orthant and topological changes as motion across shared lower-dimensional boundaries. We introduce PhylaFlow, a […]

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd.   dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844