This is an overview page with metadata for this scientific article. The full text is available from the publisher.
Evaluation of Prompt Design and Internal Reasoning in Chatbot-Based Medical History Taking (Preprint)
Citations: 0
Authors: 6
Year: 2026
Abstract
<sec> <title>BACKGROUND</title> A persistent discrepancy exists between patient-reported information and physician documentation. While conversational agents have been developed to collect medical histories prior to consultation, existing evaluations have largely focused on diagnostic accuracy or user satisfaction rather than the completeness and clinical usefulness of the information collected. There remains a need to assess the extent of clinically relevant information captured through chatbot-based interviews and to understand how model configurations and instructional strategies influence this coverage. </sec>
<sec> <title>OBJECTIVE</title> This study aimed to evaluate the extent to which a chatbot can obtain clinically useful patient history information and to examine how prompt detail and internal reasoning influence information coverage during chatbot-based medical interviews. </sec>
<sec> <title>METHODS</title> We developed a medical history-taking chatbot using the Qwen3-14B-Instruct model and evaluated four configurations in a 2×2 factorial design: Detailed/Thinking (DT), Detailed/Non-thinking (DN), Minimal/Thinking (MT), and Minimal/Non-thinking (MN). These configurations were compared against a rule-based system baseline (choice-based mode) using 66 standardized primary care clinical cases, with simulated patients interacting with the chatbot according to predefined case scripts. Information coverage (%) was assessed using a checklist inspired by Objective Structured Clinical Examination (OSCE) frameworks. Three physicians independently evaluated transcript coverage, with inter-rater agreement assessed using full agreement rates and Fleiss' κ. Coverage percentages were compared across configurations using repeated-measures analysis of variance with post hoc testing. </sec>
<sec> <title>RESULTS</title> Inter-rater agreement was substantial (Fleiss' κ = 0.75). Across all 66 simulated cases, information coverage differed significantly among configurations (p < .001), with the Detailed/Thinking (DT) mode achieving the highest mean coverage (72.3%), compared with moderate coverage in configurations using either thinking or detailed prompts alone (approximately 60%) and lower coverage in the Minimal/Non-thinking and rule-based configurations (approximately 51%-54%). Differences were most pronounced for the past medical and family history domains. Symptom-level analyses revealed substantial variability, with higher coverage for symptoms associated with well-defined diagnostic frameworks and lower coverage for multi-system presentations. </sec>
<sec> <title>CONCLUSIONS</title> The combination of clinically detailed prompt instructions and internal reasoning significantly enhances the clinical usefulness of AI-driven history-taking by ensuring more comprehensive data collection. This approach allows for a more systematic and robust foundation for automated clinical documentation, facilitating better integration into healthcare workflows. </sec>
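The METHODS section reports inter-rater agreement as Fleiss' κ, which generalizes Cohen's κ to three or more raters over a subjects × categories count matrix. The sketch below shows the standard computation; the rating matrices are illustrative examples, not the study's data, and the study may have used a library routine rather than this hand-rolled version.

```python
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects x categories count matrix.

    ratings[i][j] = number of raters who assigned subject i to
    category j; every row must sum to the same rater count n.
    """
    N = len(ratings)            # number of subjects (checklist items)
    n = sum(ratings[0])         # raters per subject (e.g., 3 physicians)
    k = len(ratings[0])         # number of categories

    # Mean observed per-subject agreement P_bar
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings
    ) / N

    # Chance agreement P_e from marginal category proportions
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)

    return (P_bar - P_e) / (1 - P_e)


# Illustrative only: 3 raters, binary "covered / not covered" judgments
matrix = [[3, 0], [2, 1], [0, 3], [3, 0], [1, 2]]
print(round(fleiss_kappa(matrix), 2))  # ≈ 0.44
```

Values above roughly 0.6 are conventionally read as "substantial" agreement, which is the band the reported κ = 0.75 falls into.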
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,626 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,532 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,046 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,843 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations