This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluation of ChatGPT's and Gemini's Ability to Find Studies About Neurosurgical Training and Trainees' Anatomy Understanding
Citations: 0
Authors: 6
Year: 2026
Abstract
BACKGROUND: Little research has examined artificial intelligence's potential to explore the neurosurgical literature. We aimed to investigate ChatGPT's and Gemini's ability to identify and outline studies about neurosurgical training and trainees' anatomy understanding. METHODS: We asked ChatGPT 4.0 Turbo and Gemini 2.5 flash (in July 2025), and ChatGPT 5.3 and Gemini 3 flash (in March 2026), to list and summarize five papers on each of three topics: (1) the use of virtual reality in neurosurgical training, (2) the role of virtual reality in neurosurgery trainees' anatomy understanding, and (3) comparisons of virtual reality with other neurosurgical training methods. We evaluated how many studies were successfully identified and accurately outlined. RESULTS: For ChatGPT 4.0 Turbo, the percentages of papers successfully identified/accurately summarized for the three topics were 100%/60%, 40%/40%, and 60%/40%, respectively. For ChatGPT 5.3, the respective percentages were 100%/0%, 20%/0%, and 40%/0%. For Gemini 2.5 flash, they were 80%/60%, 40%/20%, and 0%/0%. For Gemini 3 flash, they were 60%/0%, 20%/0%, and 0%/0%. There was a tendency towards reporting outcomes in favor of virtual reality. Gemini 2.5 flash and Gemini 3 flash hallucinated 5/15 and 6/15 papers, respectively. Gemini 2.5 flash misattributed first authorship in 8/15 papers. CONCLUSIONS: Apart from ChatGPT's excellent ability to simply identify studies about the use of virtual reality in neurosurgical training, ChatGPT and Gemini did not provide reliable responses. Gemini performed worse and frequently hallucinated, while both platforms exhibited bias in favor of virtual reality. Newer versions generally performed worse than older ones. Ongoing development may improve these platforms' role in neurosurgical research.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,773 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,682 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,242 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,898 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations