Dynamic Stability of LLM-Generated Code

arXiv:2511.07463v1 Announce Type: cross Abstract: Current evaluations of LLMs for code generation emphasize functional correctness, overlooking the fact that functionally correct solutions can differ significantly in algorithmic complexity. For instance, an $O(n^2)$ versus $O(n \log n)$ sorting algorithm may yield similar output but incur vastly different performance costs in production. This discrepancy reveals a critical […]

Laplacian Score Sharpening for Mitigating Hallucination in Diffusion Models

arXiv:2511.07496v1 Announce Type: cross Abstract: Diffusion models, though successful, are known to suffer from hallucinations that create incoherent or unrealistic samples. Recent works have attributed this to the phenomenon of mode interpolation and score smoothening, but they lack a method to prevent their generation during sampling. In this paper, we propose a post-hoc adjustment to […]

Making LLMs Reliable When It Matters Most: A Five-Layer Architecture for High-Stakes Decisions

arXiv:2511.07669v1 Announce Type: new Abstract: Current large language models (LLMs) excel in verifiable domains where outputs can be checked before action but prove less reliable for high-stakes strategic decisions with uncertain outcomes. This gap, driven by mutually reinforcing cognitive biases in both humans and artificial intelligence (AI) systems, threatens the defensibility of valuations and sustainability […]

Leveraging the Power of AI and Social Interactions to Restore Trust in Public Polls

arXiv:2511.07593v1 Announce Type: cross Abstract: The emergence of crowdsourced data has significantly reshaped social science, enabling extensive exploration of collective human actions, viewpoints, and societal dynamics. However, ensuring safe, fair, and reliable participation remains a persistent challenge. Traditional polling methods have seen a notable decline in engagement over recent decades, raising concerns about the credibility […]

SCoTT: Strategic Chain-of-Thought Tasking for Wireless-Aware Robot Navigation in Digital Twins

arXiv:2411.18212v3 Announce Type: replace-cross Abstract: Path planning under wireless performance constraints is a complex challenge in robot navigation. However, naively incorporating such constraints into classical planning algorithms often incurs prohibitive search costs. In this paper, we propose SCoTT, a wireless-aware path planning framework that leverages vision-language models (VLMs) to co-optimize average path gains and trajectory […]

AIA Forecaster: Technical Report

arXiv:2511.07678v1 Announce Type: new Abstract: This technical report describes the AIA Forecaster, a Large Language Model (LLM)-based system for judgmental forecasting using unstructured data. The AIA Forecaster approach combines three core elements: agentic search over high-quality news sources, a supervisor agent that reconciles disparate forecasts for the same event, and a set of statistical calibration […]

ViPRA: Video Prediction for Robot Actions

arXiv:2511.07732v1 Announce Type: cross Abstract: Can we turn a video prediction model into a robot policy? Videos, including those of humans or teleoperated robots, capture rich physical interactions. However, most of them lack labeled actions, which limits their use in robot learning. We present Video Prediction for Robot Actions (ViPRA), a simple pretraining-finetuning framework that […]

Beyond Algorethics: Addressing the Ethical and Anthropological Challenges of AI Recommender Systems

arXiv:2507.16430v2 Announce Type: replace-cross Abstract: This paper examines the ethical and anthropological challenges posed by AI-driven recommender systems (RSs), which increasingly shape digital environments and social interactions. By curating personalized content, RSs do not merely reflect user preferences but actively construct experiences across social media, entertainment platforms, and e-commerce. Their influence raises concerns over privacy, […]

Physical Consistency of Aurora’s Encoder: A Quantitative Study

arXiv:2511.07787v1 Announce Type: cross Abstract: The high accuracy of large-scale weather forecasting models like Aurora is often accompanied by a lack of transparency, as their internal representations remain largely opaque. This “black box” nature hinders their adoption in high-stakes operational settings. In this work, we probe the physical consistency of Aurora’s encoder by investigating whether […]

ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents

arXiv:2511.07685v1 Announce Type: new Abstract: Deep Research (DR) is an emerging agent application that leverages large language models (LLMs) to address open-ended queries. It requires the integration of several capabilities, including multi-step reasoning, cross-document synthesis, and the generation of evidence-backed, long-form answers. Evaluating DR remains challenging because responses are lengthy and diverse, admit many valid […]

LLM-Powered Fully Automated Chaos Engineering: Towards Enabling Anyone to Build Resilient Software Systems at Low Cost

arXiv:2511.07865v1 Announce Type: cross Abstract: Chaos Engineering (CE) is an engineering technique aimed at improving the resilience of distributed systems. It involves intentionally injecting faults into a system to test its resilience, uncover weaknesses, and address them before they cause failures in production. Recent CE tools automate the execution of predefined CE experiments. However, planning […]

DRAGON: Guard LLM Unlearning in Context via Negative Detection and Reasoning

arXiv:2511.05784v2 Announce Type: replace-cross Abstract: Unlearning in Large Language Models (LLMs) is crucial for protecting private data and removing harmful knowledge. Most existing approaches rely on fine-tuning to balance unlearning efficiency with general language capabilities. However, these methods typically require training or access to retain data, which is often unavailable in real world scenarios. Although […]


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK. Registration number: 16808844.