OpenAlex · Updated hourly · Last updated: 09.04.2026, 10:32

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Functional Theory of Mind Evaluation in Large Language Models: A Behavioral and Causal Stability Framework

2026 · 0 citations · International Scientific Journal of Engineering and Management
Open full text at the publisher

Citations: 0 · Authors: 5 · Year: 2026

Abstract

Theory of Mind (ToM) — the cognitive capacity to attribute beliefs, intentions, desires, and emotions to oneself and others — is considered a cornerstone of human social intelligence. As Large Language Models (LLMs) such as GPT-4o, LLaMA-3.1-70B, and Qwen2.5-72B are increasingly deployed in social and interactive roles, the question of whether they genuinely possess ToM capabilities has become both scientifically significant and practically urgent. However, the existing landscape of ToM evaluation is fragmented, primarily relying on behavioral benchmarks that test only whether a model produces the correct output, without investigating the underlying computational mechanism or the stability of that reasoning. This paper proposes a Functional Theory of Mind Evaluation Framework that addresses this gap through three integrated layers of analysis: (1) behavioral accuracy evaluation using structured benchmarks (BigToM and ToMValley), (2) causal internal representation analysis using perspective projection and counterfactual interventions grounded in Simulation Theory, and (3) reasoning stability measurement using transformation-based divergence testing. Experimental analysis across five leading LLMs demonstrates significant variation in behavioral accuracy (35–67%), with transformation and belief-tracking questions proving hardest. Counterfactual intervention experiments reveal that later Transformer layers (65–80) encode perspective-taking representations with measurable causal effects on model outputs, providing partial support for Simulation Theory as an explanatory mechanism. Stability testing reveals that all models exhibit significant brittleness under adversarial scenario modifications, with answer consistency dropping 18–34% under minimal transformations. We propose a unified Functional ToM Score that integrates these three dimensions into a single interpretable metric, and discuss implications for AI safety, evaluation methodology, and future benchmark design.
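The abstract does not specify how the three dimensions are combined into the unified Functional ToM Score. A minimal sketch, assuming a simple weighted average over three components normalized to [0, 1] — the weights, component names, and normalization are illustrative assumptions, not taken from the paper:

```python
# Hypothetical sketch of a unified Functional ToM Score.
# Weights and normalization are assumptions for illustration only;
# the paper's actual aggregation may differ.

def functional_tom_score(behavioral_acc: float,
                         causal_effect: float,
                         stability: float,
                         weights: tuple = (0.4, 0.3, 0.3)) -> float:
    """Combine three ToM evaluation dimensions, each in [0, 1],
    into a single interpretable score in [0, 1]."""
    components = (behavioral_acc, causal_effect, stability)
    for c in components:
        if not 0.0 <= c <= 1.0:
            raise ValueError("all components must lie in [0, 1]")
    return sum(w * c for w, c in zip(weights, components))

# Example: a model with 67% benchmark accuracy, a moderate
# (hypothetical) causal-effect magnitude from layer interventions,
# and 0.82 answer consistency under scenario transformations.
score = functional_tom_score(0.67, 0.5, 0.82)
```

A weighted average keeps each dimension's contribution interpretable; stricter aggregations (e.g. a minimum over components) would instead penalize models that score well behaviorally but are brittle under transformation.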
Keywords: Theory of Mind, Large Language Models, Simulation Theory, false-belief evaluation, causal representation analysis, reasoning stability, Functional ToM Score, social reasoning, mechanistic interpretability, BigToM, ToMValley.

Topics

Explainable Artificial Intelligence (XAI) · Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education