Automated Extraction of Imaging and Pathology Data From Diverse Prostate Cancer Electronic Records

2025 · 4 citations · JCO Clinical Cancer Informatics · Open Access
Citations: 4 · Authors: 12 · Year: 2025

Abstract

PURPOSE: To develop and validate an algorithm to extract clinically relevant data elements for prostate cancer (PCa) from prostate biopsy reports and magnetic resonance imaging (MRI) reports.

PATIENTS AND METHODS: MRI reports and biopsy pathology reports were extracted from a cohort of 1,360,866 patients with PCa in the VA Cancer Registry System or the VA Corporate Data Warehouse, of whom 155,570 had the relevant reports for inclusion. We hand-annotated a sample of these reports and used it to develop a rule-based natural language processing (NLP) algorithm. From biopsy pathology reports, the algorithm extracts Gleason score, positive cores, and total cores taken during biopsy. From MRI reports, it extracts Prostate Imaging Reporting and Data System (PI-RADS) score, prostate-specific antigen (PSA) density, prostate volume, and prostate dimensions. Our algorithm was validated on a set of 250 biopsy reports and 250 MRI reports representing 378 patients at 78 VA centers with procedures between 2004 and 2024.

RESULTS: Our algorithm achieved high F1 scores across all data elements: Gleason (96.9), PI-RADS (93.7), PSA density (99.5), prostate volume (95.7), prostate dimensions (93.2), and whether the percentage of positive cores was greater than or less than 34% (88.4). Error analysis showed that items missed by our algorithm were often explained by unusual, vague, or especially complex wording within the notes.

CONCLUSION: We developed an NLP algorithm and validated that it successfully captures salient information about data elements of interest in PCa research. Reliable extraction of these key data elements will have numerous uses for downstream research in this field.
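The abstract does not give the rules themselves, so the following is only a minimal sketch of what rule-based extraction of this kind can look like, together with the F1 computation used to score it. The regular expressions, the function names (`extract_gleason`, `extract_pirads`, `extract_cores`, `f1_score`), and the sample report phrasings are all hypothetical and are not taken from the paper.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for illustration only; the paper's actual rules
# are not described in the abstract.
GLEASON_RE = re.compile(
    r"gleason(?:\s+(?:score|grade))?\s*[:=]?\s*([1-5])\s*\+\s*([1-5])",
    re.IGNORECASE,
)
PIRADS_RE = re.compile(
    r"pi-?rads(?:\s+v\d(?:\.\d)?)?(?:\s+(?:score|category))?"
    r"\s*[:=]?\s*([1-5])(?!\.?\d)",
    re.IGNORECASE,
)
CORES_RE = re.compile(
    r"(\d+)\s*(?:out\s+of|of|/)\s*(\d+)\s+cores",
    re.IGNORECASE,
)


def extract_gleason(text: str) -> Optional[Tuple[int, int, int]]:
    """Return (primary, secondary, total) for the first 'a+b' Gleason mention."""
    match = GLEASON_RE.search(text)
    if match is None:
        return None
    primary, secondary = int(match.group(1)), int(match.group(2))
    return primary, secondary, primary + secondary


def extract_pirads(text: str) -> Optional[int]:
    """Return the first PI-RADS score (1 to 5) mentioned, if any."""
    match = PIRADS_RE.search(text)
    return int(match.group(1)) if match else None


def extract_cores(text: str) -> Optional[Tuple[int, int]]:
    """Return (positive_cores, total_cores) from phrases like '3 of 12 cores'."""
    match = CORES_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    biopsy = "Left apex: adenocarcinoma, Gleason score 3+4=7, 3 of 12 cores."
    mri = "Lesion 1: PI-RADS v2.1 score: 4"
    print(extract_gleason(biopsy), extract_cores(biopsy), extract_pirads(mri))
```

Patterns this simple would miss exactly the cases the error analysis mentions, such as unusual or vague wording, which is why a real system of this kind typically needs many more rules and careful validation against annotated reports.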

Similar works