This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
SMILES challenge 2025: Multitask learning with contrastive and natural language generation for enhanced medical image classification
Citations: 0
Authors: 2
Year: 2026
Abstract
This article proposes a novel multitask learning framework that integrates contrastive learning and natural language generation (NLG) to improve medical image classification and report generation. The goal is to raise both the accuracy and the interpretability of disease classification in medical diagnostics. The architecture consists of a Vision Transformer (ViT) visual encoder, a transformer-based text encoder, and a multimodal decoder. The visual encoder processes medical images, while the text encoder handles disease-related text prompts. These components are trained jointly with an image-text contrastive loss and a language generation loss. Evaluations on the MIMIC-CXR and CheXpert datasets show that the model with NLG (Plain + NLG) outperforms the baseline contrastive learning model (Plain) in disease classification. For example, on MIMIC-CXR, accuracy for Atelectasis increased from 17.44% (Plain) to 41.5% (Plain + NLG), and for Cardiomegaly from 19.25% to 47.4%. On CheXpert, accuracy for Atelectasis increased from 12.5% to 58.5%, and for Pleural Effusion from 61.10% to 64.0%. F1 scores also improved, particularly for complex diseases such as Cardiomegaly and Consolidation. The proposed multitask framework thus combines contrastive learning with NLG in a way that improves both disease classification and medical report generation. The approach has potential clinical applications because it enhances the interpretability and accuracy of AI in medical decision-making.
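The joint objective described in the abstract (an image-text contrastive loss plus a language generation loss) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact loss form, so the symmetric InfoNCE formulation, the `temperature` value, the loss weights `w_con` and `w_gen`, and all function names here are hypothetical choices.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over a list of floats."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE (assumed form): the i-th image and the i-th text
    are the positive pair; all other pairings in the batch are negatives."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # Cosine similarity matrix scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    # Image-to-text direction: softmax over each row.
    i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: softmax over each column.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax(cols[k])[k] for k in range(n)) / n
    return 0.5 * (i2t + t2i)

def generation_loss(token_logits, target_ids):
    """Mean token-level cross-entropy for the generated report."""
    nll = [-log_softmax(logits)[t] for logits, t in zip(token_logits, target_ids)]
    return sum(nll) / len(nll)

def multitask_loss(image_embs, text_embs, token_logits, target_ids,
                   w_con=1.0, w_gen=1.0):
    """Weighted sum of both objectives (the weights are assumptions)."""
    return (w_con * contrastive_loss(image_embs, text_embs)
            + w_gen * generation_loss(token_logits, target_ids))
```

In a real training setup the embeddings and token logits would come from the visual encoder, text encoder, and multimodal decoder, and the combined loss would be backpropagated through all three; the sketch only shows how the two terms could be combined.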
Related works
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
2020 · 22,632 citations
La certeza de lo impredecible: Cultura Educación y Sociedad en tiempos de COVID19
2020 · 19,284 citations
A Multi-Modal Distributed Real-Time IoT System for Urban Traffic Control (Invited Paper)
2024 · 14,276 citations
UNet++: A Nested U-Net Architecture for Medical Image Segmentation
2018 · 8,640 citations
Review of deep learning: concepts, CNN architectures, challenges, applications, future directions
2021 · 7,262 citations