This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Development of AI Chatbots for Cancer Information: Reducing Hallucinations and Trade-Offs in Responses with Reliable Data (Preprint)
Citations: 0
Authors: 3
Year: 2024
Abstract
BACKGROUND: Generative artificial intelligence (AI) is increasingly used to find information. Providing accurate information is essential to support cancer patients and their families; however, information returned by generative AI is sometimes wrong, a failure known as hallucination.

OBJECTIVE: We aimed to examine cancer information returned by generative AI chatbots using retrieval-augmented generation (RAG) with cancer-specific information sources and with general internet search.

METHODS: We compiled 62 cancer-related questions in Japanese. Using GPT-4 and GPT-3.5(-turbo-16K), we developed RAG-equipped generative AI chatbots with different reference information sources, a Cancer Information Service (CIS) chatbot and a Google chatbot, and compared the characteristics of their responses with those of a conventional chatbot without RAG. The CIS chatbot used CIS as its reference information source; the Google chatbot used Google search results.

RESULTS: For questions on information issued by CIS, the hallucination rates of the CIS chatbot were 0% with GPT-4 and 6% with GPT-3.5, versus 6% and 10% for the Google chatbot. For questions on information not issued by CIS, the Google chatbot hallucinated in 19% of cases with GPT-4 and 35% with GPT-3.5; the conventional chatbot hallucinated in approximately 40% of its responses. Compared with CIS, reference data from Google searches was more likely to produce hallucinations (odds ratio 9.4, 95% CI 1.2-17.5, P < .01); for the conventional chatbot, the odds ratio was 16.1 (95% CI 3.7-50.0, P < .001). The conventional chatbot responded to all questions, whereas the response rates of the RAG-equipped chatbots were lower (36% to 81%). For questions on information not covered by CIS, the CIS chatbot did not respond, while the Google chatbot responded in 52% of cases with GPT-4 and 71% with GPT-3.5.

CONCLUSIONS: Using RAG with reliable information sources significantly reduced the hallucination rate of generative AI chatbots and increased their ability to admit a lack of information, making them more suitable for general use, where users must be given accurate information.
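The METHODS section describes RAG chatbots that answer only from a designated reference source (CIS or Google search results) and otherwise decline. Below is a minimal sketch of that pattern, assuming an OpenAI-style chat API; the toy corpus, the word-overlap retriever, and the prompt wording are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal RAG sketch (illustrative only): a chatbot that answers from a
# designated reference source and declines when the source is silent.
# The corpus, retriever, and prompt wording below are assumptions for
# demonstration, not the system evaluated in the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical stand-in for the CIS reference corpus.
CIS_DOCUMENTS = [
    "Gastric cancer screening with endoscopy is recommended for adults "
    "aged 50 years and older.",
    "Common side effects of chemotherapy include nausea, fatigue, and "
    "an increased risk of infection.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def answer(question: str, model: str = "gpt-4") -> str:
    """Answer strictly from retrieved reference text, or admit ignorance."""
    context = "\n".join(retrieve(question, CIS_DOCUMENTS))
    messages = [
        # The refusal instruction is what lets a RAG chatbot say "I don't
        # know" instead of hallucinating when the source lacks the answer.
        {"role": "system",
         "content": ("Answer only from the reference text below. If it does "
                     "not contain the answer, say you do not have that "
                     "information.\n\nReference:\n" + context)},
        {"role": "user", "content": question},
    ]
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content

if __name__ == "__main__":
    print(answer("Who should be screened for gastric cancer?"))
```

Swapping CIS_DOCUMENTS for web search snippets would give the Google-chatbot variant; the abstract's results suggest that the reliability of whatever fills that reference context largely determines the hallucination rate.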
Similar Works
"Why Should I Trust You?"
2016 · 14,789 citations
Coding Algorithms for Defining Comorbidities in ICD-9-CM and ICD-10 Administrative Data
2005 · 10,555 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,989 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,598 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,124 citations