Disclosure in the era of generative artificial intelligence

Generative artificial intelligence (AI) has rapidly become embedded in academic writing, assisting with tasks ranging from language editing to drafting text and producing evidence. Despite

A framework for culturally adapting mental mHealth apps

Mobile health (mHealth) apps are increasingly deployed for evidence-based mental health interventions, broadening access to care. While effective, Internet-based Cognitive Behavioural Therapy, delivered via web

Stratified and combined analysis of the quality of lumbar spinal stenosis–related videos on major Chinese short video platforms

Background: Lumbar spinal stenosis (LSS) is a degenerative disorder in which narrowing of the spinal canal compresses neural elements, causing pain, numbness, and limited mobility. With

Differential acceptance of a national digital health platform among community and frontline health workers in Cote d’Ivoire: a cross-sectional study

Introduction: Mobile-based digital health solutions are critical technologies that play a significant role in improving the quality of healthcare services. Cote d’Ivoire is digitizing its community-based

Based on dual perspectives of management and ethics: exploring challenges and governance approaches for new media applications in psychiatric specialty hospitals

The further promotion and application of new media technologies present new opportunities for psychiatric specialty hospitals in areas such as health education, doctor-patient communication, service

Below-Chance Blindness: Prompted Underperformance in Small LLMs Produces Positional Bias Rather than Answer Avoidance

April 29, 2026

arXiv:2604.25249v1 Announce Type: cross
Abstract: Detecting sandbagging (the deliberate underperformance on capability evaluations) is an open problem in AI safety. We tested whether symptom validity testing (SVT) logic from clinical malingering detection could identify sandbagging through below-chance performance (BCB) on forced-choice items. In a pre-registered pilot at the 7-9 billion parameter instruction-tuned scale (3 models, 4 MMLU-Pro domains, 4 conditions, 500 items per cell, 24,000 total trials), the plausibility gate failed. Zero of 12 model-domain cells showed significant below-chance performance under sandbagging instruction. Exploratory analyses revealed three qualitatively distinct failure modes. Qwen-2.5-7B and Phi-3.5-mini largely ignored the sandbagging instruction, with 62-88% response identity with the honest baseline. Llama-3-8B complied substantially but implemented underperformance as a positional heuristic, collapsing its response distribution onto middle-alphabet options (E at 31.8%, F at 26.1%) regardless of where the correct answer fell. This produced accuracy boosts of up to 33 percentage points when the correct answer coincidentally occupied the model’s preferred position. An explicit anti-task instruction (“pick the least likely answer”) drove two of three models below chance, with accuracy as low as 0.024. The capability for answer-aware avoidance therefore exists but is not activated by “deliberately underperform.” BCB did not fail as a logical marker of answer-aware avoidance. It was not observed in this regime because the model showing the largest behavioural shift exhibited behaviour consistent with a position-dominant response policy rather than content-aware answer avoidance. We propose that positional-distribution shift may be a more effective behavioural signature than below-chance accuracy for detecting prompted underperformance at this model scale.
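The abstract contrasts two quantitative checks: whether accuracy on forced-choice items falls significantly below chance (the SVT-style marker), and whether the distribution of chosen option letters shifts between conditions. A minimal Python sketch of both follows, under stated assumptions: every item has ten options labelled A-J, so the chance rate is 0.1; the below-chance check is a one-sided exact binomial test; and total variation distance stands in for "positional-distribution shift", which the abstract does not tie to a specific statistic. All function names are hypothetical, and this is not the authors' analysis code.

```python
from collections import Counter
from math import comb

# Assumption: every item has ten options labelled A-J, so guessing at random
# is correct with probability 1/10. Items with fewer options need their own rate.
LETTERS = "ABCDEFGHIJ"
CHANCE = 1 / len(LETTERS)


def below_chance_p_value(correct: int, trials: int, chance: float = CHANCE) -> float:
    """One-sided exact binomial p-value, P(X <= correct) with X ~ Binomial(trials, chance).

    A small value means accuracy is significantly below random guessing, the
    SVT-style marker of answer-aware avoidance. comb() results are converted to
    float, so this direct sum suits trial counts up to roughly a thousand.
    """
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct + 1)
    )


def letter_distribution(responses: list[str]) -> dict[str, float]:
    """Fraction of responses that land on each option letter."""
    counts = Counter(responses)
    return {letter: counts[letter] / len(responses) for letter in LETTERS}


def positional_shift(baseline: list[str], condition: list[str]) -> float:
    """Total variation distance (0 to 1) between two response-letter distributions."""
    p = letter_distribution(baseline)
    q = letter_distribution(condition)
    return 0.5 * sum(abs(p[letter] - q[letter]) for letter in LETTERS)


def response_identity(baseline: list[str], condition: list[str]) -> float:
    """Share of items where the condition's answer equals the baseline answer."""
    return sum(a == b for a, b in zip(baseline, condition)) / len(baseline)


if __name__ == "__main__":
    # Toy lists for illustration only; these are not data from the study.
    honest = list(LETTERS) * 50  # 500 answers spread evenly over A-J
    collapsed = ["E"] * 250 + ["F"] * 250  # 500 answers piled onto E and F
    print(below_chance_p_value(12, 500))  # accuracy 0.024 on a 500-item cell
    print(positional_shift(honest, collapsed))
    print(response_identity(honest, collapsed))
```

Run as a script, the toy comparison gives a positional shift of about 0.8 and a response identity of 0.1, while treating the lowest reported accuracy (0.024) as 12 of 500 items gives a p-value far below conventional thresholds. Neither positional statistic consults the correct answers, which is why a position-dominant policy can register a large shift without producing below-chance accuracy.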


Copyright 2025 dijee Intelligence Ltd. dijee Intelligence Ltd. is a private limited company registered in England and Wales at Media House, Sopers Road, Cuffley, Hertfordshire, EN6 4RY, UK registration number 16808844