This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

From Guidelines to Real‐Time Conversation: Expert‐Validated Retrieval‐Augmented and Fine‐Tuned GPT‐4 for Hepatitis C Management

2025 · 3 citations · Liver International · Open Access


Abstract

BACKGROUND AND AIMS: Advances in artificial intelligence, particularly large language models (LLMs), hold promise for transforming the management of chronic diseases such as Hepatitis C Virus (HCV) infection. This study evaluates the impact of retrieval-augmented generation (RAG) and supervised fine-tuning (SFT) on two tasks: open-ended question answering (scored for accuracy and clarity) and LLM-recommended treatment regimens for clinical scenarios.

METHODS: We employed OpenAI's GPT-4 Turbo in four configurations (baseline, RAG-Top1, RAG-Top10 and SFT), using the 2020 EASL HCV guidelines as external knowledge (RAG) or as fine-tuning data (SFT). For the question set, the guidelines were segmented at the paragraph level and encoded into 3072-dimensional embeddings. Four experts scored fifteen questions covering general, patient and physician perspectives on a 10-point accuracy scale and on binary accuracy and clarity. Separately, we created 25 simulated clinical scenarios, for which a consensus of four hepatologists defined the gold-standard direct-acting antiviral (DAA) regimens. Model performance on these cases was measured with two metrics: 'partial accuracy' (at least one correct DAA and no errors) and 'complete accuracy' (all correct DAAs and no errors).

RESULTS: On open-ended questions, RAG-Top10 outperformed the baseline in accuracy (91.7% vs. 36.6%; p < 0.001) and clarity (91.7% vs. 46.6%; p < 0.001). RAG-Top1 achieved 81.7% accuracy and 86.6% clarity (both p < 0.001), while SFT reached 71.7% accuracy and 88.3% clarity (p < 0.001). RAG-Top10 also performed best in treatment selection, prescribing the correct DAA regimen according to expert consensus in 76% of cases (vs. 24% for the baseline model; p < 0.001).

CONCLUSIONS: Both RAG-Top10 and SFT markedly enhance LLM performance in guideline-driven HCV management, improving not only response accuracy and clarity but also DAA selection in clinical scenarios. RAG-Top10's broader context retrieval confers the greatest gains, while SFT underscores the value of domain-specific alignment. Rigorous, expert-informed evaluation frameworks are essential for the safe integration of LLMs into clinical practice.
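The abstract does not give implementation details for the RAG-Top1 and RAG-Top10 configurations. As a rough illustration only, the sketch below shows one common way top-k retrieval over paragraph embeddings can work: each guideline paragraph is assumed to already have an embedding vector, paragraphs are ranked by cosine similarity to the query embedding, and the k best are pasted into the prompt. The function names (`cosine_similarity`, `retrieve_top_k`, `build_prompt`), the choice of cosine similarity, and the prompt template are hypothetical assumptions, not the authors' method; real embeddings would have 3072 dimensions rather than the toy 3-dimensional vectors used here.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query_vec: Sequence[float],
    paragraph_vecs: List[Sequence[float]],
    paragraphs: List[str],
    k: int,
) -> List[Tuple[float, str]]:
    """Hypothetical helper: return the k paragraphs most similar to the query.

    k=1 mirrors a "Top1" setting, k=10 a "Top10" setting (assumption).
    """
    scored = [
        (cosine_similarity(query_vec, vec), text)
        for vec, text in zip(paragraph_vecs, paragraphs)
    ]
    # Highest similarity first; Python's stable sort keeps ties in input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, retrieved: List[Tuple[float, str]]) -> str:
    """Hypothetical prompt template that places retrieved excerpts before the question."""
    context = "\n\n".join(text for _, text in retrieved)
    return (
        "Answer using the guideline excerpts below.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    # Toy 3-dimensional vectors stand in for real paragraph embeddings.
    paragraphs = ["para A", "para B", "para C"]
    vecs = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]
    top = retrieve_top_k([1.0, 0.0, 0.0], vecs, paragraphs, k=2)
    print(build_prompt("Which regimen is recommended?", top))
```

In this sketch, moving from k=1 to k=10 changes only how many excerpts reach the prompt, which is one plausible reading of the "broader context retrieval" the conclusions credit for RAG-Top10's gains.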
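The two scenario metrics can be made concrete with a small scoring function. This is a sketch under an explicit assumption: "errors" is read here as the model recommending any regimen outside the expert gold-standard set, and "all correct DAAs" as the model listing exactly the gold-standard set. The abstract does not spell out these rules, so the helpers `score_case` and `accuracy_rates` are hypothetical, and the regimen strings are used only as illustrative labels.

```python
from typing import Iterable, List, Tuple


def score_case(recommended: Iterable[str], gold: Iterable[str]) -> Tuple[bool, bool]:
    """Return (partial, complete) for one scenario under the assumed rules.

    partial:  at least one gold regimen recommended and nothing outside the gold set
    complete: the recommendation matches the (non-empty) gold set exactly
    """
    rec = {r.strip().lower() for r in recommended}
    ref = {g.strip().lower() for g in gold}
    has_error = bool(rec - ref)  # assumption: any non-gold regimen counts as an error
    partial = bool(rec & ref) and not has_error
    complete = bool(ref) and rec == ref
    return partial, complete


def accuracy_rates(cases: List[Tuple[Iterable[str], Iterable[str]]]) -> Tuple[float, float]:
    """Fraction of (recommended, gold) cases meeting partial and complete accuracy."""
    if not cases:
        raise ValueError("need at least one case")
    scores = [score_case(rec, gold) for rec, gold in cases]
    partial_rate = sum(p for p, _ in scores) / len(scores)
    complete_rate = sum(c for _, c in scores) / len(scores)
    return partial_rate, complete_rate


if __name__ == "__main__":
    gold = ["sofosbuvir/velpatasvir", "glecaprevir/pibrentasvir"]
    cases = [
        (["sofosbuvir/velpatasvir"], gold),  # partial only
        (gold, gold),                        # partial and complete
        (["elbasvir/grazoprevir"], gold),    # error: outside the gold set
    ]
    print(accuracy_rates(cases))
```

Because a complete match with a non-empty gold set is also error-free and overlaps the gold set, every completely accurate case is also partially accurate under these rules, so the complete-accuracy rate can never exceed the partial-accuracy rate.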


Topics

Artificial Intelligence in Healthcare and Education; Topic Modeling; Machine Learning in Healthcare
