AgentEval: Generative Agents as Reliable Proxies for Human Evaluation of AI-Generated Content

arXiv:2512.08273v1 Announce Type: new Abstract: Modern businesses are increasingly challenged by the time and expense required to generate and assess high-quality content. Human writers face time constraints, and extrinsic evaluations can be costly. While Large Language Models (LLMs) offer potential in content creation, concerns about the quality of AI-generated content persist. Traditional evaluation methods, like […]

The Theory of Strategic Evolution: Games with Endogenous Players and Strategic Replicators

arXiv:2512.07901v1 Announce Type: cross Abstract: This paper develops the Theory of Strategic Evolution, a general model for systems in which the population of players, strategies, and institutional rules evolve together. The theory extends replicator dynamics to settings with endogenous players, multi-level selection, innovation, constitutional change, and meta-governance. The central mathematical object is a […]

Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders

arXiv:2512.08892v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) improves the factuality of large language models (LLMs) by grounding outputs in retrieved evidence, but faithfulness failures, where generations contradict or extend beyond the provided sources, remain a critical challenge. Existing hallucination detection methods for RAG often rely either on large-scale detector training, which requires substantial annotated […]

DeepCode: Open Agentic Coding

arXiv:2512.07921v1 Announce Type: cross Abstract: Recent advances in large language models (LLMs) have given rise to powerful coding agents, making it possible for code assistants to evolve into code engineers. However, existing methods still face significant challenges in achieving high-fidelity document-to-codebase synthesis (such as scientific papers to code), primarily due to a fundamental conflict between information overload […]

Towards a Science of Scaling Agent Systems

arXiv:2512.08296v1 Announce Type: new Abstract: Agents, language model (LM)-based systems that are capable of reasoning, planning, and acting, are becoming the dominant paradigm for real-world AI applications. Despite this widespread adoption, the principles that determine their performance remain underexplored, leaving practitioners to rely on heuristics rather than principled design choices. We address this gap by […]

An Empirical Framework for Evaluating Semantic Preservation Using Hugging Face

arXiv:2512.07983v1 Announce Type: cross Abstract: As machine learning (ML) becomes an integral part of high-autonomy systems, it is critical to ensure the trustworthiness of learning-enabled software systems (LESS). Yet, the nondeterministic and run-time-defined semantics of ML complicate traditional software refactoring. We define semantic preservation in LESS as the property that optimizations of intelligent components do […]

A Gray Literature Study on Fairness Requirements in AI-enabled Software Engineering

arXiv:2512.07990v1 Announce Type: cross Abstract: Today, with the growing obsession with applying Artificial Intelligence (AI), particularly Machine Learning (ML), to software across various contexts, much of the focus has been on the effectiveness of AI models, often measured through common metrics such as F1-score, while fairness receives relatively little attention. This paper presents a […]

Joint Activity Design Heuristics for Enhancing Human-Machine Collaboration

arXiv:2512.08036v1 Announce Type: cross Abstract: Joint activity describes when more than one agent (human or machine) contributes to the completion of a task or activity. Designing for joint activity focuses on explicitly supporting the interdependencies between agents necessary for effective coordination among agents engaged in the joint activity. This builds and expands upon designing for […]

Training LLMs for Honesty via Confessions

arXiv:2512.08093v1 Announce Type: cross Abstract: Large language models (LLMs) can be dishonest when reporting on their actions and beliefs — for example, they may overstate their confidence in factual claims or cover up evidence of covert actions. Such dishonesty may arise due to the effects of reinforcement learning (RL), where challenges with reward shaping can […]

Predicting California Bearing Ratio with Ensemble and Neural Network Models: A Case Study from Türkiye

arXiv:2512.08340v1 Announce Type: new Abstract: The California Bearing Ratio (CBR) is a key geotechnical indicator used to assess the load-bearing capacity of subgrade soils, especially in transportation infrastructure and foundation design. Traditional CBR determination relies on laboratory penetration tests. Despite their accuracy, these tests are often time-consuming, costly, and can be impractical, particularly for large-scale […]

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK, registration number 16808844.