This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
From ChatGPT to UroGPT: A guideline-trained artificial intelligence model for male infertility
Citations: 0
Authors: 7
Year: 2026
Abstract
Background: ChatGPT is not yet sufficiently reliable for answering clinical questions relevant to direct patient care. We hypothesized that a GPT model trained exclusively on expert guidelines would provide more accurate, guideline-concordant responses.

Materials and methods: With permission from the European Association of Urology, we developed UroGPT, a custom GPT model trained solely on the European Association of Urology guidelines. We posed 25 clinical questions derived from the Male Infertility Guidelines and expert opinions to both the standard ChatGPT (GPT-4o) and UroGPT. Responses were anonymized and graded by 2 blinded reviewers as “complete and accurate,” “incomplete but accurate,” or “incorrect or misleading.” Guideline concordance was compared using the chi-square test.

Results: UroGPT demonstrated significantly greater concordance with guideline-based responses than ChatGPT (p < 0.001). UroGPT provided 94% (47/50) complete and accurate responses, whereas ChatGPT provided only 38% (19/50). ChatGPT also produced a significantly higher rate of incorrect or misleading responses (52% vs. 4%). Inter-reviewer agreement was higher for UroGPT (88% vs. 48%), suggesting that its answers were clearer and more consistent with the guidelines. ChatGPT frequently overgeneralized, recommended unsupported interventions, or offered non-guideline-based lifestyle advice. However, both models failed to answer correctly 2 high-stakes questions regarding orchiectomy in patients with undescended testes.

Conclusions: UroGPT markedly outperformed ChatGPT in guideline concordance. Training artificial intelligence models on expert-authored content represents a meaningful step toward developing clinically useful large language models. However, UroGPT is not yet appropriate for direct patient care and should currently be used only for research and academic purposes.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,635 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,543 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,051 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,844 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations