This is an overview page with metadata for this scientific work. The full article is available from the publisher.
WCN26-4700 Robustness Gap of Large Language Models in Nephrology
Citations: 0
Authors: 6
Year: 2026
Abstract
Large language models (LLMs) achieve high accuracy on medical benchmarks, raising interest in their clinical application. However, whether this performance reflects genuine reasoning or pattern recognition remains unclear. To evaluate reasoning robustness, we replaced the correct answer in nephrology multiple-choice questions with “None of the other answers” (NOTA) and assessed changes in accuracy. We hypothesized that causal and pathophysiological reasoning would preserve accuracy, whereas reliance on memorized patterns would cause a marked decline.
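The NOTA substitution described above can be sketched as a small transformation over a multiple-choice item. The item structure (`stem`, `options`, `correct_index`) and the function name are hypothetical illustrations, not the authors' actual evaluation code:

```python
def apply_nota(question: dict) -> dict:
    """Return a copy of a multiple-choice item in which the correct
    option is replaced by 'None of the other answers' (NOTA).
    The input item is left unchanged."""
    q = dict(question)                      # shallow copy of the item
    options = list(q["options"])            # copy options before editing
    options[q["correct_index"]] = "None of the other answers"
    q["options"] = options
    return q

# Hypothetical example item (placeholder content, not from the study)
item = {
    "stem": "Which finding is most consistent with the diagnosis?",
    "options": ["Option A", "Option B", "Option C", "Option D"],
    "correct_index": 2,
}
modified = apply_nota(item)
# After substitution, NOTA is now the correct choice; a model relying on
# memorized answer patterns may still select the original option text.
```

Accuracy is then compared between the original and NOTA-substituted question sets; a marked decline suggests pattern matching rather than pathophysiological reasoning.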
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,460 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,341 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,791 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,536 citations