Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Data adequacy bias impact in a data-blinded semi-supervised GAN for privacy-aware COVID-19 chest X-ray classification
1
Zitationen
2
Autoren
2022
Jahr
Abstract
Supervised machine learning models are, by definition, data-sighted, requiring to view all or most parts of the training dataset which are labeled. This paradigm presents two bottlenecks which are intertwined: risk of exposing sensitive data samples to the third-party site with machine learning engineers, and time-consuming, laborious, bias-prone nature of data annotations by the personnel at the data source site. In this paper we studied learning impact of data adequacy as bias source in a data-blinded semi-supervised learning model for covid chest X-ray classification. Data-blindedness was put in action on a semi-supervised generative adversarial network to generate synthetic data based only on a few labeled data samples and concurrently learn to classify targets. We designed and developed a data-blind COVID-19 patient classifier that classifies whether an individual is suffering from COVID-19 or other type of illness with the ultimate goal of producing a system to assist in labeling large datasets. However, the availability of the labels in the training data had an impact in the model performance, and when a new disease spreads, as it was COVID9-19 in 2019, access to labeled data may be limited. Here, we studied how bias in the labeled sample distribution per class impacted in classification performance for three models: a Convolution Neural Network based classifier (CNN), a semi-supervised GAN using the source data (SGAN), and finally our proposed data-blinded semi-supervised GAN (BSGAN). Data-blind prevents machine learning engineers from directly accessing the source data during training, thereby ensuring data confidentiality. This was achieved by using synthetic data samples, generated by a separate generative model which were then used to train the proposed model. Our model achieved comparable performance, with the trade-off between a privacy-aware model and a traditionally-learnt model of 0.05 AUC-score, and it maintained stable, following the same learning performance as the data distribution was changed.
Ähnliche Arbeiten
La certeza de lo impredecible: Cultura Educación y Sociedad en tiempos de COVID19
2020 · 19.308 Zit.
A Multi-Modal Distributed Real-Time IoT System for Urban Traffic Control (Invited Paper)
2024 · 14.307 Zit.
UNet++: A Nested U-Net Architecture for Medical Image Segmentation
2018 · 8.836 Zit.
Review of deep learning: concepts, CNN architectures, challenges, applications, future directions
2021 · 7.435 Zit.
scikit-image: image processing in Python
2014 · 6.859 Zit.