arXiv:2604.01594v1 Announce Type: new
Abstract: How do LLMs decide what to teach next: by reasoning about a learner’s knowledge, or by using simpler rules of thumb? We test this in a controlled task previously used to study human teaching strategies. On each trial, a teacher LLM sees a hypothetical learner’s trajectory through a reward-annotated directed graph and must reveal a single edge so that the learner would choose a better path upon replanning. We run a range of LLMs as simulated teachers and fit their trial-by-trial choices with the same cognitive models used for humans: a Bayes-Optimal teacher that infers which transitions the learner is missing (inverse planning), weaker Bayesian variants, heuristic baselines (e.g., reward-based), and non-mentalizing utility models. In a baseline experiment matched to the stimuli presented to human subjects, most LLMs perform well, show little change in strategy over trials, and achieve graph-by-graph performance similar to that of humans. Model comparison via the Bayesian Information Criterion (BIC) shows that Bayes-Optimal teaching best explains most models’ choices. When given a scaffolding intervention, models follow auxiliary inference- or reward-focused prompts, but these scaffolds do not reliably improve later teaching on heuristic-incongruent test graphs and can sometimes reduce performance. Overall, cognitive model fits provide insight into LLM tutoring policies and show that prompt compliance does not guarantee better teaching decisions.
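For concreteness, here is a minimal sketch of the BIC-based model comparison the abstract describes: BIC = k·ln(n) − 2·ln(L̂), computed per candidate cognitive model from its maximized log-likelihood L̂ over n trial-by-trial choices and its number of free parameters k, with lower BIC preferred. All model names, log-likelihoods, parameter counts, and trial counts below are illustrative assumptions, not values from the paper.

```python
import math

def bic(log_likelihood: float, n_params: int, n_trials: int) -> float:
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L_hat). Lower is better."""
    return n_params * math.log(n_trials) - 2.0 * log_likelihood

# Hypothetical fits for one simulated teacher's choices across 40 trials:
# model name -> (maximized log-likelihood, number of free parameters).
n_trials = 40
fits = {
    "bayes_optimal":    (-22.1, 2),
    "weak_bayesian":    (-25.7, 2),
    "reward_heuristic": (-31.4, 1),
    "non_mentalizing":  (-34.0, 1),
}

scores = {name: bic(ll, k, n_trials) for name, (ll, k) in fits.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} BIC = {score:6.1f}")
print("Best-fitting model:", min(scores, key=scores.get))
```

Because the k·ln(n) term penalizes extra parameters, a richer model wins only if its improvement in log-likelihood outweighs the penalty; the Bayes-Optimal teacher coming out best in this toy run mirrors the paper's reported result in form only, not in values.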
When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don’t
arXiv:2604.06422v1 Announce Type: cross
Abstract: Understanding when Vision-Language Models (VLMs) will behave unexpectedly, whether models can reliably predict their own behavior, and if models adhere


