This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Towards More Generalizable and Accurate Sentence Classification in Medical Abstracts with Less Data
Citations: 1
Authors: 3
Year: 2022
Abstract
Purpose: With the unprecedented growth of biomedical publications, structured abstracts in bibliographic databases (e.g., PubMed) are important for facilitating the information retrieval and knowledge synthesis that researchers need. Methods: We propose a few-shot, prompt learning-based approach that classifies sentences in medical abstracts of randomized clinical trials (RCT) and observational studies (OS) into the subsections Introduction, Background, Methods, Results, and Conclusion, using an existing RCT corpus (PubMed 200k/20k RCT) and a newly built OS corpus (PubMed 20k OS). Five manually designed templates, in combination with four BERT model variants, were tested and compared with a previous Hierarchical Sequential Labeling Network (HSLN) architecture and a traditional BERT-based sentence classification method. Results: On the PubMed 200k and 20k RCT datasets, we achieved overall F1 scores of 0.9508 and 0.9401, respectively. Under few-shot settings, only 20% of the training data was sufficient to achieve an F1 score comparable to that of the HSLN model (0.9266 for our method versus 0.9263 for HSLN). When trained on the RCT dataset, our method achieved an F1 score of 0.9065 on the OS dataset; when trained on the OS dataset, it achieved an F1 score of 0.9203 on the RCT dataset. Conclusion: The prompt learning-based method outperformed the existing method even when fewer training samples were used. Moreover, it generalized better across the two types of medical publications than the existing approach.
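To illustrate the general idea of prompt-based sentence classification, the sketch below wraps a sentence in a cloze template and maps the word predicted at the masked position to a section label. It is a minimal illustration under stated assumptions, not the authors' implementation: the template text, the label words, and the names `TEMPLATE`, `VERBALIZER`, `build_prompt`, and `classify` are all hypothetical, since the abstract does not list the five templates that were tested. The `mask_word_probs` argument stands in for the probabilities that a masked language model such as a BERT variant would assign at the mask position.

```python
from typing import Dict, List

# Hypothetical cloze template; the paper's actual templates are not given in the abstract.
TEMPLATE = "{sentence} This sentence describes the {mask} of the study."

# Hypothetical verbalizer: label words expected at the mask position for each section.
# The section names themselves come from the abstract.
VERBALIZER: Dict[str, List[str]] = {
    "Introduction": ["introduction"],
    "Background": ["background"],
    "Methods": ["methods", "method"],
    "Results": ["results", "result"],
    "Conclusion": ["conclusion"],
}


def build_prompt(sentence: str, mask_token: str = "[MASK]") -> str:
    """Wrap a sentence in the cloze template with a mask token to be filled in."""
    return TEMPLATE.format(sentence=sentence.strip(), mask=mask_token)


def classify(mask_word_probs: Dict[str, float]) -> str:
    """Return the label whose label words have the highest mean probability.

    mask_word_probs maps candidate words to the probability assigned at the
    mask position (assumed to come from a masked language model).
    """
    scores = {
        label: sum(mask_word_probs.get(word, 0.0) for word in words) / len(words)
        for label, words in VERBALIZER.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(build_prompt("We enrolled 120 patients from three hospitals."))
    # Made-up probabilities, used only to exercise the function.
    print(classify({"methods": 0.6, "results": 0.2}))
```

In a full pipeline, the prompt would be passed to a masked language model and the resulting distribution over label words would replace the made-up probabilities used here. Because the task is cast in the same fill-in-the-blank form the model was pretrained on, approaches of this kind are often reported to need fewer labeled examples than training a separate classification head.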