OpenAlex · Updated hourly · Last updated: 26.05.2026, 04:53

This is an overview page with metadata for this scholarly article. The full article is available from the publisher.

Mitigating Algorithmic Bias in Cancer Site Classification Models

2026 · 1 citation · JCO Clinical Cancer Informatics · Open Access

Citations: 1 · Authors: 6 · Year: 2026

Abstract

PURPOSE: Integrating artificial intelligence into cancer diagnostics has improved tumor classification beyond rule-based systems. Despite these advances, such models may still encode demographic biases. We conducted a large-scale, applied bias-probing study of a deep learning-based cancer site classifier to quantify the racial information encoded in its document embeddings. We then evaluated how performance changes when race-correlated embedding dimensions are removed in a post-training sensitivity analysis.

METHODS: The cancer site classifier was trained on 3.5 million electronic cancer pathology reports from six of the National Cancer Institute's SEER registries. We trained a hierarchical self-attention network (HiSAN) to generate 400-dimensional document embeddings. These embeddings were used to train two downstream gradient-boosted decision tree classifiers: one to classify cancer sites and another to predict racial categories. We identified overlapping features by intersecting the top 50 dimensions in the feature-importance rankings of the site and race models, and computed the cumulative feature importance of these shared dimensions in each model. As a post hoc sensitivity analysis, we progressively pruned these overlapping dimensions, retrained the site model, and compared overall macro-F1 and accuracy, race-stratified macro-F1, and group fairness metrics based on demographic parity and equalized odds before and after pruning.

RESULTS: The analysis revealed minimal feature overlap between the cancer site and race prediction models, and the cumulative importance scores indicated a negligible influence of racial information on clinical predictions. Post-training pruning of the overlapping features did not compromise the site model's diagnostic accuracy, which decreased by only 0.07%.

CONCLUSION: Our findings demonstrate that HiSAN-generated embeddings from SEER data can be used effectively for cancer site classification without significant demographic bias influencing the outcomes. Post-training pruning therefore functions as a practical audit and sensitivity check.
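The audit described in METHODS can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes that per-dimension feature-importance vectors have already been extracted from the two trained gradient-boosted models (for example, from a scikit-learn-style `feature_importances_` attribute). It also assumes that demographic parity and equalized odds are evaluated one-vs-rest for a single cancer-site label and reported as the largest gap across racial groups. All function names, the tie-breaking rule, and these metric definitions are hypothetical choices. Retraining the site model on the pruned embeddings is left out.

```python
from typing import Hashable, Sequence


def top_k_dims(importances: Sequence[float], k: int = 50) -> set[int]:
    """Indices of the k highest importance scores (ties broken by lower index)."""
    order = sorted(range(len(importances)), key=lambda i: (-importances[i], i))
    return set(order[:k])


def overlap_audit(site_imp: Sequence[float], race_imp: Sequence[float], k: int = 50) -> dict:
    """Intersect the top-k dimensions of both models and report the share of
    each model's total importance that falls on the shared dimensions."""
    shared = top_k_dims(site_imp, k) & top_k_dims(race_imp, k)

    def share(imp: Sequence[float]) -> float:
        total = sum(imp)
        return sum(imp[i] for i in shared) / total if total else 0.0

    return {"shared": sorted(shared),
            "site_cumulative": share(site_imp),
            "race_cumulative": share(race_imp)}


def prune_dims(X: Sequence[Sequence[float]], dims: set[int]) -> list[list[float]]:
    """Drop the given embedding dimensions (columns) before retraining."""
    return [[v for j, v in enumerate(row) if j not in dims] for row in X]


def _gap(rates: list[float]) -> float:
    """Largest minus smallest rate; 0.0 if no group contributed a rate."""
    return max(rates) - min(rates) if rates else 0.0


def demographic_parity_gap(y_pred: Sequence[Hashable], groups: Sequence[Hashable],
                           positive: Hashable) -> float:
    """Max difference across groups in P(prediction == positive | group)."""
    rates = []
    for g in set(groups):
        preds = [p for p, gg in zip(y_pred, groups) if gg == g]
        rates.append(sum(p == positive for p in preds) / len(preds))
    return _gap(rates)


def equalized_odds_gap(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                       groups: Sequence[Hashable], positive: Hashable) -> float:
    """Max across-group gap in TPR or FPR for `positive` vs. all other sites."""
    tprs, fprs = [], []
    for g in set(groups):
        idx = [i for i, gg in enumerate(groups) if gg == g]
        pos = [i for i in idx if y_true[i] == positive]
        neg = [i for i in idx if y_true[i] != positive]
        if pos:
            tprs.append(sum(y_pred[i] == positive for i in pos) / len(pos))
        if neg:
            fprs.append(sum(y_pred[i] == positive for i in neg) / len(neg))
    return max(_gap(tprs), _gap(fprs))


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over labels seen in y_true or y_pred."""
    scores = []
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def stratified_macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                        groups: Sequence[Hashable]) -> dict:
    """Macro-F1 computed separately within each group (e.g., race category)."""
    out = {}
    for g in set(groups):
        idx = [i for i, gg in enumerate(groups) if gg == g]
        out[g] = macro_f1([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return out
```

In a progressive-pruning loop, one would call `prune_dims` on both the training and test embeddings with the current set of shared dimensions, retrain the site classifier, and recompute these metrics. Comparing the results with the unpruned baseline gives the before/after view described in the abstract.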


Topics

AI in cancer detection · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare