OpenAlex · Updated hourly · Last updated: 12 April 2026, 12:49

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Detection of LLM Deceptive Behaviour Triggered by the Poisonous Context Injection: The Problem Demonstration

2025 · 0 citations
Open full text at the publisher

Citations: 0
Authors: 2
Year: 2025

Abstract

This paper presents a focused demonstration of deceptive behaviour in Large Language Models (LLMs) arising under poisonous context injection. The case study is built around a Japanese haiku, selected for its inherent ambiguity, which serves as a probe for LLM alignment with the human real-world model. When presented with a poisonous context, ChatGPT generated a translation, an interpretation, and literary criticism that were not only incorrect but also internally inconsistent. This experiment highlights a fundamental risk: LLMs can produce outputs that are both linguistically convincing and semantically deceptive. The novelty of this work lies in framing LLM deception as a measurable phenomenon and in articulating the feasibility of automated detection through cross-verification with independent models. Its contribution is to establish the problem space by demonstrating how subtle poisoning can systematically induce deceptive generations. By formalising the problem and identifying a methodological direction, this study positions itself as an initial step in an ongoing research programme on trustworthy and self-aware AI. Proof-of-concept experiments showed that a committee of five major LLMs estimates the trustworthiness of interpretations of the poisoned haiku at 0.57±0.33, while interpretations of the non-poisoned haiku are estimated at 0.86±0.15.
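
The committee-based cross-verification the abstract describes reduces to a simple aggregation: each independent model assigns the interpretation a trustworthiness score in [0, 1], and the committee reports the mean and standard deviation of those scores. The following minimal Python sketch shows only that aggregation step; the five example scores are hypothetical illustrations, not the paper's actual models or data, and in practice each score would come from prompting an independent LLM.

from statistics import mean, stdev

def committee_trustworthiness(scores: list[float]) -> tuple[float, float]:
    """Aggregate per-model trustworthiness scores (each in [0, 1])
    into a (mean, standard deviation) summary for the committee."""
    return mean(scores), stdev(scores)

# Illustrative scores for a five-model committee (hypothetical values,
# not the paper's data); each entry stands in for one independent LLM's
# rating of a single haiku interpretation.
poisoned_scores = [0.20, 0.45, 0.70, 0.90, 0.60]
clean_scores = [0.80, 0.95, 0.90, 0.85, 0.80]

for label, scores in (("poisoned", poisoned_scores), ("non-poisoned", clean_scores)):
    m, sd = committee_trustworthiness(scores)
    print(f"{label}: trustworthiness {m:.2f} ± {sd:.2f}")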


Topics

Artificial Intelligence in Healthcare and Education · Topic Modeling · Explainable Artificial Intelligence (XAI)