Disclosure in the era of generative artificial intelligence

Generative artificial intelligence (AI) has rapidly become embedded in academic writing, assisting with tasks ranging from language editing to drafting text and producing evidence. Despite

The Influencing Factors of Medical Postgraduates’ Usage Intention Toward Artificial Intelligence–Generated Content Tools in Academic Research: Qualitative Analysis

Background: The integration of artificial intelligence–generated content (AIGC) tools into academic research offers transformative potential for enhancing productivity and innovation. However, within the highly regulated

Debiasing Reward Models via Causally Motivated Inference-Time Intervention

arXiv:2604.27495v1 Announce Type: cross Abstract: Reward models (RMs) play a central role in aligning large language models (LLMs) with human preferences. However, RMs are often

Why Self-Supervised Encoders Want to Be Normal

arXiv:2604.27743v1 Announce Type: cross Abstract: We develop a geometric and information-theoretic framework for encoder-decoder learning built on the Information Bottleneck (IB) principle. Recasting IB as

PiCSAR: Probabilistic Confidence Selection And Ranking for Reasoning Chains

arXiv:2508.21787v2 Announce Type: replace-cross Abstract: Best-of-n sampling improves the accuracy of large language models (LLMs) and large reasoning models (LRMs) by generating multiple candidate solutions

Frugal Knowledge Graph Construction with Local LLMs: A Zero-Shot Pipeline, Self-Consistency and Wisdom of Artificial Crowds

April 14, 2026

arXiv:2604.11104v1 Announce Type: new
Abstract: This paper presents an empirical study of a multi-model zero-shot pipeline for knowledge graph construction and exploitation, executed entirely through local inference on consumer-grade hardware. We propose a reproducible evaluation framework integrating two external benchmarks (DocRED, HotpotQA), WebQuestionsSP-style synthetic data, and the RAGAS evaluation framework in an automated pipeline. On 500 document-level relations, our system achieves an F1 of 0.70 $pm$ 0.041 in zero-shot, compared to 0.80 for supervised DREEAM. Text-to-query achieves an accuracy of 0.80 $pm$ 0.06 on 200 samples. Multi-hop reasoning achieves an Exact Match (EM) of 0.46$pm$0.04 on 500 HotpotQA questions, with a RAGAS faithfulness of 0.96 $pm$ 0.04 on 50 samples. Beyond the pipeline, we study diversity mechanisms for difficult multi-hop reasoning. On 181 questions unsolvable at zero temperature, self-consistency (k=5, T =0.7) recovers up to 23% EM with a single Mixture-of-Experts (MoE) model, but the cross-model oracle (3 architectures x 5 samples) reaches 46.4%. We highlight an agreement paradox: strong consensus among samples signals collective hallucination rather than a reliable answer, echoing the work of Moussa”id et al. on the wisdom of crowds. Extending to the full pipeline (500 questions), self-consistency (k=3) raises EM from 0.46 to 0.48 $pm$ 0.04. A confidence-routing cascade mechanism (Phi-4 $rightarrow$ GPT-OSS, k=5) achieves an EM of 0.55 $pm$ 0.04, the best result obtained, with 45.4% of questions rerouted. Finally, we show that V3 prompt engineering applied to other models does not reproduce the gains observed with Gemma-4, confirming the specific prompt/model interaction. The entire system runs in $sim$5 h on a single RTX 3090, without any training, for an estimated carbon footprint of 0.09 kg CO2 eq.

Subscribe for Updates

Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844