This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Mitigating Algorithmic Bias in Cancer Site Classification Models
Citations: 1
Authors: 6
Year: 2026
Abstract
PURPOSE: Integrating artificial intelligence into cancer diagnostics has improved tumor classification beyond what rule-based systems achieve. Despite these advances, such models may still encode demographic biases. We conducted a large-scale, applied bias-probing study of a deep learning-based cancer site classifier to quantify how much race information is encoded in its document embeddings. We then evaluated how performance changes when race-correlated embedding dimensions are removed, as a post-training sensitivity analysis.

METHODS: The cancer site classifier was trained on 3.5 million electronic cancer pathology reports from six of the National Cancer Institute's SEER registries. We trained a hierarchical self-attention network (HiSAN) to generate 400-dimensional document embeddings. These embeddings were used to train two downstream gradient-boosted decision tree classifiers: one to classify cancer site and another to predict racial category. We identified overlapping features by intersecting the top 50 feature-importance rankings of the site and race models, and computed the cumulative feature importance of these overlapping features in each model. As a post hoc sensitivity analysis, we progressively pruned the overlapping dimensions, retrained the site model, and compared overall macro-F1 and accuracy, race-stratified macro-F1, and group fairness metrics based on demographic parity and equalized odds before and after pruning.

RESULTS: The analysis revealed minimal feature overlap between the cancer site and race prediction models, and the cumulative importance scores indicated a negligible influence of racial information on clinical predictions. Post-training pruning of the overlapping features did not compromise the site model's diagnostic accuracy, reducing accuracy by only 0.07%.

CONCLUSION: Our findings demonstrate that HiSAN-generated embeddings from SEER data can be used effectively for cancer site classification without significant demographic bias influencing the outcomes.
Post-training pruning therefore functions as a practical audit and sensitivity check.
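The feature-overlap step in METHODS can be illustrated with a minimal sketch. It assumes each gradient-boosted model exposes one non-negative importance score per embedding dimension (for example, a scikit-learn-style `feature_importances_` array converted to a list), and it assumes "cumulative importance" means the share of a model's total importance carried by the overlapping dimensions. The abstract specifies neither the importance type nor the normalization, and the function names below are hypothetical.

```python
from typing import List, Sequence, Set


def top_k_dims(importances: Sequence[float], k: int = 50) -> List[int]:
    """Indices of the k embedding dimensions with the highest importance."""
    ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return ranked[:k]


def overlapping_dims(site_imp: Sequence[float], race_imp: Sequence[float], k: int = 50) -> Set[int]:
    """Dimensions that appear in the top-k ranking of BOTH the site and race models."""
    return set(top_k_dims(site_imp, k)) & set(top_k_dims(race_imp, k))


def cumulative_importance(importances: Sequence[float], dims: Set[int]) -> float:
    """Share of a model's total importance carried by `dims` (assumed normalization)."""
    total = sum(importances)
    return sum(importances[i] for i in dims) / total if total else 0.0


if __name__ == "__main__":
    # Toy 6-dimensional vectors; the real ones would have 400 entries and k=50.
    site = [0.30, 0.25, 0.20, 0.10, 0.10, 0.05]
    race = [0.05, 0.15, 0.40, 0.30, 0.08, 0.02]
    shared = overlapping_dims(site, race, k=3)
    print(sorted(shared))  # [1, 2]
    print(round(cumulative_importance(site, shared), 2))  # 0.45
    print(round(cumulative_importance(race, shared), 2))  # 0.55
```

A small overlap set, or a low cumulative importance in the site model, is what the RESULTS paragraph describes as minimal overlap and negligible influence.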
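Race-stratified macro-F1 can be sketched as follows: the evaluation set is split by race group, and the unweighted mean of per-class (per-site) F1 is computed within each group. The sketch follows the usual macro-F1 definition, taking labels from the union of true and predicted labels as scikit-learn's `f1_score(..., average="macro")` does. The helper names are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    if not labels:
        return 0.0
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # The denominator is > 0 because c occurs in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def race_stratified_macro_f1(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], race: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Macro-F1 computed separately on the cases belonging to each race group."""
    idx_by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for i, g in enumerate(race):
        idx_by_group[g].append(i)
    return {
        g: macro_f1([y_true[i] for i in idx], [y_pred[i] for i in idx])
        for g, idx in idx_by_group.items()
    }


if __name__ == "__main__":
    y_true = ["lung", "lung", "breast", "breast"]
    y_pred = ["lung", "breast", "breast", "breast"]
    race = ["A", "A", "B", "B"]  # toy group labels
    per_group = race_stratified_macro_f1(y_true, y_pred, race)
    print({g: round(v, 2) for g, v in per_group.items()})  # {'A': 0.33, 'B': 1.0}
```

Comparing these per-group values before and after pruning shows whether removing dimensions hurts some groups more than others, which the overall macro-F1 alone would hide.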
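The abstract does not give its exact multiclass definitions of demographic parity and equalized odds, so the sketch below assumes one common one-vs-rest, max-gap formulation. Under this formulation, the demographic parity difference is the largest gap across race groups in the rate at which any site is predicted. The equalized odds difference is the largest gap across groups in any site's true-positive or false-positive rate. Rates that are undefined for a group (no true cases, or no other cases) are skipped. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def _indices_by_group(groups: Sequence[Hashable]) -> Dict[Hashable, List[int]]:
    by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    return by_group


def demographic_parity_diff(y_pred: Sequence[Hashable], groups: Sequence[Hashable]) -> float:
    """Max over sites of (highest - lowest) group-wise rate of predicting that site."""
    by_group = _indices_by_group(groups)
    worst = 0.0
    for c in set(y_pred):
        rates = [sum(y_pred[i] == c for i in idx) / len(idx) for idx in by_group.values()]
        worst = max(worst, max(rates) - min(rates))
    return worst


def equalized_odds_diff(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], groups: Sequence[Hashable]
) -> float:
    """Max over sites of the largest group gap in one-vs-rest TPR or FPR."""
    by_group = _indices_by_group(groups)
    worst = 0.0
    for c in set(y_true) | set(y_pred):
        tprs, fprs = [], []
        for idx in by_group.values():
            pos = [i for i in idx if y_true[i] == c]
            neg = [i for i in idx if y_true[i] != c]
            if pos:  # TPR is undefined when the group has no true cases of site c
                tprs.append(sum(y_pred[i] == c for i in pos) / len(pos))
            if neg:  # FPR is undefined when every case in the group is site c
                fprs.append(sum(y_pred[i] == c for i in neg) / len(neg))
        for rates in (tprs, fprs):
            if len(rates) > 1:
                worst = max(worst, max(rates) - min(rates))
    return worst
```

A value of 0 means every group is treated identically under that criterion. Other formulations, such as averaging instead of taking the maximum over sites, would give different numbers from the same predictions.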
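The post hoc pruning loop can be sketched as follows. The sketch assumes the overlapping dimensions are removed in an order supplied by the caller (for example, by descending race-model importance). It also assumes a hypothetical `train_and_eval` callback that retrains the site model on the kept embedding columns and returns its metrics. The removal order, the step size, and the callback are all assumptions; the abstract states only that the overlapping dimensions were pruned progressively and the site model was retrained.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical callback: retrains the site model on the given embedding
# columns and returns its evaluation metrics (accuracy, macro-F1, ...).
TrainEval = Callable[[List[int]], Dict[str, float]]


def progressive_pruning(
    n_dims: int,
    overlap_order: Sequence[int],
    train_and_eval: TrainEval,
    step: int = 1,
) -> List[Dict[str, float]]:
    """Drop overlapping dims `step` at a time, retraining after each removal.

    The first entry (n_removed == 0) is the unpruned baseline; the last entry
    always has every overlapping dimension removed.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    total = len(overlap_order)
    counts = sorted(set(range(0, total, step)) | {total})
    results = []
    for n_removed in counts:
        removed = set(overlap_order[:n_removed])
        keep = [d for d in range(n_dims) if d not in removed]
        metrics = dict(train_and_eval(keep))
        metrics["n_removed"] = float(n_removed)
        results.append(metrics)
    return results


def metric_change(results: List[Dict[str, float]], key: str = "accuracy") -> float:
    """Change in `key` between the baseline and the fully pruned model."""
    return results[-1][key] - results[0][key]
```

In practice, `train_and_eval` would fit the gradient-boosted site model on `X[:, keep]` for a 400-column embedding matrix `X`, then return accuracy, macro-F1, per-group macro-F1, and fairness gaps. `metric_change` then reports the kind of before/after difference summarized in RESULTS.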
Similar works
A survey on deep learning in medical image analysis
2017 · 14,110 citations
pROC: an open-source package for R and S+ to analyze and compare ROC curves
2011 · 13,861 citations
Dermatologist-level classification of skin cancer with deep neural networks
2017 · 13,583 citations
A survey on Image Data Augmentation for Deep Learning
2019 · 12,220 citations
QuPath: Open source software for digital pathology image analysis
2017 · 8,487 citations