OpenAlex · Updated hourly · Last updated: 27 May 2026, 10:13

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Supporting postgraduate exam preparation with large language models: implications for traditional Chinese medicine education

2026 · 0 citations · Frontiers in Medicine · Open Access

Authors: 8

Abstract

Introduction: In China, the medical education system features multiple co-existing levels, with higher education often leading to better job prospects. For career advancement, especially entry into competitive urban hospitals, the postgraduate examination often plays a more decisive role than the licensing examination. The application of Large Language Models (LLMs) in Traditional Chinese Medicine (TCM) has expanded rapidly. TCM theories possess distinct scientific features, requiring LLMs to demonstrate advanced information processing and comprehension abilities in a Chinese context. While LLMs have performed strongly on licensing examinations in many countries, their performance on selective TCM examinations remains underexplored. This study aimed to evaluate and compare the performance of Ernie Bot, ChatGLM, SparkDesk, and GPT-4 on the 2023 Chinese Postgraduate Examination for TCM (CPE-TCM), and to explore their potential in supporting TCM education and academic development.

Methods: We assessed the performance of four LLMs using the 2023 CPE-TCM as a test set. Exam scores were calculated to evaluate subject-specific performance. Additionally, responses were qualitatively analyzed based on logical reasoning and the use of internal and external information.

Results: Ernie Bot and ChatGLM achieved accuracy rates of 50.30% and 46.67%, respectively, both above the passing score. Statistically significant differences in subject-specific performance were observed, with the highest scores in the medical humanistic spirit module. ChatGLM and GPT-4 provided logical explanations for all responses, while Ernie Bot and SparkDesk showed logical reasoning in 98.2% and 43.6% of responses, respectively. ChatGLM and GPT-4 incorporated internal information in all explanations, whereas SparkDesk rarely did. Over 60% of responses from Ernie Bot, ChatGLM, and GPT-4 included external information, and its presence did not differ significantly between correct and incorrect answers. In SparkDesk, the presence of internal or external information was significantly associated with answer correctness (P < 0.001).

Discussion: Ernie Bot and ChatGLM surpassed the passing threshold for postgraduate selection, reflecting solid TCM expertise. LLMs demonstrated strong capabilities in logical reasoning and integration of background knowledge, highlighting their promising role in enhancing TCM education.
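The two kinds of quantities reported in the Results, an accuracy rate and an association between the presence of information and answer correctness, can be illustrated with a minimal sketch. The abstract does not name the statistical test used, so the Pearson chi-square test on a 2x2 table below is an assumption, as are the function names `accuracy_percent` and `chi_square_2x2`. The counts in the usage lines are invented for illustration and are not data from the study.

```python
import math
from typing import Tuple


def accuracy_percent(num_correct: int, num_questions: int) -> float:
    """Share of correctly answered questions, in percent."""
    if num_questions <= 0:
        raise ValueError("num_questions must be positive")
    return 100.0 * num_correct / num_questions


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table

        [[a, b],    e.g. information present: correct, incorrect
         [c, d]]    e.g. information absent:  correct, incorrect

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts, NOT taken from the paper.
statistic, p_value = chi_square_2x2(30, 10, 10, 30)
print(f"accuracy: {accuracy_percent(40, 80):.2f}%")
print(f"chi-square: {statistic:.2f}, significant at 0.001: {p_value < 0.001}")
```

With small expected cell counts, an exact test such as Fisher's would usually be preferred over the chi-square approximation.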

Similar works