OpenAlex · Updated hourly · Last updated: 23.05.2026, 21:52

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

OncoPT: long-context transformer models for in hospital tumor phenotype extraction from pathology reports

2026 · 0 citations · npj Digital Medicine · Open Access

Citations: 0 · Authors: 9 · Year: 2026

Abstract

Despite recent advances in medical informatics, extracting tumor information from pathology reports remains a challenge in modern cancer registry and surveillance workflows. These reports are often unstructured, dense with complex medical content, and long, all of which complicates automated extraction of phenotypic information. Although recent language models such as BERT, GatorTron, and GPT-4 have demonstrated efficacy in medical applications, they are either constrained by sequence length limits or depend on cloud-based computing that conflicts with requirements for handling protected health information. We introduce OncoPT, two oncology pathology-optimized transformer models based on the Longformer and BigBird architectures and trained on real-world pathology reports. OncoPT efficiently processes reports of up to 4,096 tokens, making it suitable for onsite deployment in hospitals with limited resources. We apply OncoPT to a common malignancy (exemplified by breast cancer) and a rare malignancy (exemplified by gastric cancer) across five key tumor phenotypes: Subsite, Histology, Grade, Stage, and Laterality. The results show that OncoPT achieves state-of-the-art weighted F1 on a private pathology dataset and surpasses commercial chatbots (ChatGPT 4o and o1) on the public CORAL dataset (up to 30% improvement). These findings highlight the robustness of OncoPT models, with the added benefit of preserving the privacy of patient health information.
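
The abstract reports weighted F1 but does not say how it is computed. A minimal sketch, assuming the common support-weighted definition (per-class F1 averaged with weights proportional to the number of true instances of each class), might look like the following. The function name and the laterality labels are hypothetical and invented purely for illustration; they are not taken from the paper or its datasets.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (assumed definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one label")

    support = Counter(y_true)    # true instances per class
    predicted = Counter(y_pred)  # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)

    score = 0.0
    for label, n_true in support.items():
        tp = correct[label]
        # A class that is never predicted gets precision 0 by convention.
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n_true / total) * f1
    return score


# Hypothetical laterality labels, made up for this example only.
y_true = ["left", "left", "right", "right", "bilateral"]
y_pred = ["left", "right", "right", "right", "left"]
print(round(weighted_f1(y_true, y_pred), 3))
```

Under this definition, frequent classes dominate the score, so a rare class that is always missed (here "bilateral") lowers the result only in proportion to its support.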

Similar works