
This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluation of ChatGPT's and Gemini's Ability to Find Studies About Neurosurgical Training and Trainees' Anatomy Understanding

2026 · 0 citations · 6 authors · ANZ Journal of Surgery

Abstract

BACKGROUND: Research on the potential of artificial intelligence to explore the neurosurgical literature is lacking. We aimed to investigate the ability of ChatGPT and Gemini to identify and summarize studies about neurosurgical training and trainees' understanding of anatomy.

METHODS: We asked ChatGPT 4.0 Turbo and Gemini 2.5 flash (in July 2025), and ChatGPT 5.3 and Gemini 3 flash (in March 2026), to list and summarize five papers for each of three prompts: (1) papers about the use of virtual reality in neurosurgical training, (2) papers about the role of virtual reality in neurosurgery trainees' understanding of anatomy, and (3) papers that compared virtual reality with other neurosurgical training methods. We evaluated how many studies were successfully identified and accurately summarized.

RESULTS: Percentages are given as successful identification/accurate summarization for prompts (1), (2), and (3), respectively. ChatGPT 4.0 Turbo: 100%/60%, 40%/40%, and 60%/40%. ChatGPT 5.3: 100%/0%, 20%/0%, and 40%/0%. Gemini 2.5 flash: 80%/60%, 40%/20%, and 0%/0%. Gemini 3 flash: 60%/0%, 20%/0%, and 0%/0%. Responses tended to report outcomes in favor of virtual reality. Gemini 2.5 flash and Gemini 3 flash hallucinated 5 and 6 of 15 papers, respectively, and Gemini 2.5 flash misattributed first authorship in 8 of 15 papers.

CONCLUSIONS: Apart from ChatGPT's excellent ability simply to identify studies about the use of virtual reality in neurosurgical training, neither ChatGPT nor Gemini provided reliable responses. Gemini performed worse and frequently hallucinated, and both platforms showed a bias in favor of virtual reality. Newer versions generally performed worse than older ones. Ongoing development may improve the role of these platforms in neurosurgical research.
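Because each prompt requested five papers, every reported percentage corresponds to a whole number of papers (20% = one paper). The sketch below assumes that reading, namely that both identification and summarization percentages are out of the five requested papers rather than out of the papers identified, and converts the reported figures back into counts summed over the 15 requested papers per model. The names `RESULTS`, `to_count`, and `totals` are illustrative only; the totals it produces are derived arithmetic, not figures reported by the authors.

```python
# Reported (identified %, accurately summarized %) per prompt, in the order
# (1) VR in training, (2) VR and anatomy understanding, (3) VR vs other methods.
RESULTS = {
    "ChatGPT 4.0 Turbo": [(100, 60), (40, 40), (60, 40)],
    "ChatGPT 5.3": [(100, 0), (20, 0), (40, 0)],
    "Gemini 2.5 flash": [(80, 60), (40, 20), (0, 0)],
    "Gemini 3 flash": [(60, 0), (20, 0), (0, 0)],
}

# Assumption: every percentage is relative to the five papers requested per prompt.
PAPERS_PER_PROMPT = 5


def to_count(percent: int) -> int:
    """Convert a percentage of the requested papers into a whole paper count."""
    count, remainder = divmod(percent * PAPERS_PER_PROMPT, 100)
    if remainder:
        raise ValueError(f"{percent}% is not a whole number of {PAPERS_PER_PROMPT} papers")
    return count


def totals(rows: list[tuple[int, int]]) -> tuple[int, int, int]:
    """Return (identified, summarized, requested) summed over all prompts."""
    identified = sum(to_count(ident) for ident, _ in rows)
    summarized = sum(to_count(summ) for _, summ in rows)
    return identified, summarized, PAPERS_PER_PROMPT * len(rows)


if __name__ == "__main__":
    for model, rows in RESULTS.items():
        ident, summ, requested = totals(rows)
        print(f"{model}: identified {ident}/{requested}, summarized {summ}/{requested}")
```

Run as a script, it prints one line per model, for example `ChatGPT 4.0 Turbo: identified 10/15, summarized 7/15`. Integer `divmod` is used instead of floating-point division so that a percentage that does not map onto a whole paper raises an error rather than being silently rounded.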
