Investigating the diagnostic accuracy of GPT-4's novel image analytics feature in dermatology

2024 · 8 citations · Journal of the European Academy of Dermatology and Venereology · Open Access

Abstract

In late September 2023, the developers of the artificial intelligence tool ChatGPT announced the launch of their most complex model yet: a version of GPT capable of analysing and creating visual content.1 Whereas its predecessors were confined to text-based processing, updated models such as GPT-4 can use multimodal processing to integrate and interpret textual and visual information.2 This significant advancement, currently offered exclusively to paid ChatGPT subscribers, holds particular potential for dermatology, a field that relies on visual observation and cues for diagnosis. Despite disclaimers against its use for medical advice, the novel capabilities of GPT-4 may appeal to individuals with pressing health concerns or those facing barriers to healthcare. This is consistent with the well-documented rise in patients' use of the internet and social media to address unmet health needs.3, 4 However, patient safety may be compromised if healthcare decisions are based on output from unregulated sources or biased datasets.5 Previous studies of GPT's diagnostic ability were limited to text-based cases;6 the advent of GPT-driven image analysis introduces a novel and time-sensitive application of the tool. In anticipation of real-world patient use, this pilot study explores the clinical validity of GPT-4 in generating diagnoses for images of various dermatological conditions. A total of 100 clinical images were collected from DermNet™, a free online dermatology database.7 The images depicted a range of dermatological conditions, with representation across Fitzpatrick skin types (Table 1). Data on patient geographic location, sex, gender, ethnicity and race were not available.
Images were saved in JPG format with watermarks preserved in accordance with licensing requirements.8 Each image was submitted to GPT-4 with a standardized prompt requesting an image description, a Fitzpatrick skin type classification, a differential diagnosis and the most likely diagnosis. The prompt also stated that outputs would not be used as medical advice (Figure 1). GPT-4 correctly diagnosed 23 of the 100 clinical images (23%) and included the correct diagnosis in its differential for 52 of 100 images (52%). GPT-4 and a blinded human rater assigned the same Fitzpatrick classification to 46 of 100 images (46%). Diagnostic accuracy results were further categorized as malignant, pre-malignant or non-malignant (Table 1). GPT-4 displayed a sensitivity of 22.2% for malignant images and 0% for pre-malignant images. While GPT-4's responses suggest a foundational ability to communicate dermatological concepts, disparities in diagnostic accuracy indicate that its clinical applications are extremely limited. The relatively low sensitivity for malignant (22.2%) and pre-malignant (0%) images was particularly concerning, as malignant lesions necessitate early and accurate diagnosis. Additionally, GPT-4 often included at least one malignant differential for benign conditions, potentially causing undue patient distress. The study's image extraction methodology had limitations: images lacked clinical context, were not standardized in lighting, quality or orientation, and proportionally fewer images depicted skin of colour. These factors prevented reliable Fitzpatrick skin type classification by both GPT-4 and human raters. Importantly, images submitted to chatbots in real-world use will also vary in quality. GPT-4 also required repeated prompt adjustments to keep its responses brief, underscoring the importance of AI training.
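
The headline figures above are simple proportions. As a minimal sketch, assuming each GPT-4 answer is recorded as a reference label, a single most likely diagnosis and a differential list (a hypothetical record layout; the study's own scoring procedure is not described here), the accuracy measures can be computed as follows. Only the aggregate counts in `REPORTED` come from the text. The malignant hit count of 4 is inferred from 22.2% of 18 malignant images, "sensitivity" is assumed to mean correct diagnoses among images of that category, and 52 of 100 images corresponds to 52%. The names `Response`, `top1_correct`, `in_differential`, `rate` and `summarize` are illustrative only.

```python
"""Sketch of the proportions reported in this study.

Hypothetical: the Response record layout and all helper names.
From the text: the aggregate counts in REPORTED (the malignant hit
count of 4 is inferred from 22.2% of 18 malignant images).
"""
from dataclasses import dataclass


@dataclass
class Response:
    """One hypothetical GPT-4 answer for a single clinical image."""
    truth: str               # reference diagnosis supplied with the image
    most_likely: str         # GPT-4's single most likely diagnosis
    differential: list[str]  # GPT-4's differential diagnosis list


def top1_correct(r: Response) -> bool:
    """True if the most likely diagnosis matches the reference label."""
    return r.most_likely.strip().lower() == r.truth.strip().lower()


def in_differential(r: Response) -> bool:
    """True if the reference label appears anywhere in the differential."""
    target = r.truth.strip().lower()
    return any(d.strip().lower() == target for d in r.differential)


def rate(hits: int, total: int) -> float:
    """Percentage of hits among total, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * hits / total, 1)


def summarize(responses: list[Response]) -> dict[str, float]:
    """Top-1 accuracy and differential inclusion for a set of answers."""
    n = len(responses)
    return {
        "top1_pct": rate(sum(top1_correct(r) for r in responses), n),
        "differential_pct": rate(sum(in_differential(r) for r in responses), n),
    }


# (hits, total) pairs taken from the abstract.
REPORTED = {
    "correct diagnosis": (23, 100),
    "correct diagnosis in differential": (52, 100),
    "Fitzpatrick agreement with human rater": (46, 100),
    "malignant sensitivity": (4, 18),
    "pre-malignant sensitivity": (0, 3),
}

if __name__ == "__main__":
    for name, (hits, total) in REPORTED.items():
        print(f"{name}: {hits}/{total} = {rate(hits, total)}%")
```

Exact string matching is a deliberate simplification: it understates agreement when GPT-4 names a synonym or subtype, so in practice a clinician's review of each free-text answer would likely replace `top1_correct` and `in_differential`.
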
Future research may investigate GPT-4's ability to classify Fitzpatrick skin types using standardized images, clinical background and diverse datasets. This pilot study revealed the current limitations of GPT-4, as of December 2023, in image-based dermatological diagnostics. While ChatGPT and other AI platforms offer promising opportunities for clinical collaboration and accessibility, patients risk making critical health decisions based on misdiagnoses obtained through chatbots alone. We aim to raise awareness of the experimental nature of GPT-4 and to encourage caution among patients and clinicians as AI platforms evolve.

Table 1. Conditions depicted in the 100 clinical images, grouped by category (number of images in parentheses).
Malignant (18): Squamous cell carcinoma (6), Basal cell carcinoma (6), Melanoma in situ (6)
Pre-malignant (3): Actinic keratosis (3)
Non-malignant (79): Acrochordon (5), Atopic dermatitis (6), Blue nevus (1), Bullous pemphigoid (6), Contact dermatitis (6), Dysplastic nevus (1), Herpes simplex virus (6), IgA vasculitis (3), Impetigo (6), Keloid (6), Neurofibromatosis (6), Psoriasis (6), Seborrheic dermatitis (6), Spitz nevus (3), Systemic lupus erythematosus (6), Vitiligo (6)

None declared.

The authors declare no conflicts of interest.

Consent for the publication of recognizable patient photographs or other identifiable material was obtained by DermNet™, with terms and conditions detailed in its Image License at the time of article submission to the journal.

The clinical images that support the findings of this study are openly available on DermNet™ at www.dermnetnz.org. Derived output supporting the findings of this study is available as supplementary data from the corresponding author, EM, on request.

Topics: Cutaneous Melanoma Detection and Management; Artificial Intelligence in Healthcare and Education; AI in cancer detection
