This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Assessing the Difficulty of Inference Types in Natural Language Inference for Clinical Trials
Citations: 0 · Authors: 3 · Year: 2026
Abstract
Large Language Models (LLMs) achieve competitive results on Natural Language Inference (NLI) when applied to clinical trials; however, it is not yet clear which types of inference LLMs handle well or poorly. We address this by proposing new supplementary annotations to the existing NLI4CT dataset on the types of inferences observed in clinical trials. Our dataset supplements NLI4CT with 1,851 new annotations, produced with carefully crafted guidelines covering 17 inference types. To investigate how inference types affect LLM performance, we prompt Flan-T5, Llama, Mistral, and Qwen models and evaluate them on our newly annotated dataset. We find that logical inferences have a negative impact on the overall performance of Mixtral, Qwen-7B, and Qwen-14B, while numerical inferences have a negative impact on Flan-T5-XL and Mixtral. Further analysis shows that understanding the structure of the clinical trial report (CTR) by itself remains challenging for MMed-Llama-3. Other factors, such as the number of inference types involved or the type of CTR section used in the premise, also affect model performance. Our code and dataset are publicly available on GitHub.
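The abstract does not state how per-type performance was computed. As a rough illustration only, the following sketch shows one simple way to break down accuracy by inference type and by the number of inference types per example. It assumes a hypothetical record layout: the field names (`gold`, `prediction`, `inference_types`), the tag strings `"numerical"` and `"logical"`, and the toy records are all invented for illustration and are not taken from the released dataset. The paper's own analysis may use a different method.

```python
from collections import defaultdict


def accuracy_by_type(examples):
    """Return {inference_type: accuracy} over examples tagged with that type.

    Hypothetical schema: each example is a dict with "gold", "prediction",
    and "inference_types" (a list of tags). An example tagged with several
    inference types counts toward each of them.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        hit = ex["prediction"] == ex["gold"]
        for t in ex["inference_types"]:
            total[t] += 1
            correct[t] += hit  # bool adds as 0 or 1
    return {t: correct[t] / total[t] for t in total}


def accuracy_by_num_types(examples):
    """Return {number_of_inference_types: accuracy}, grouping examples
    by how many inference types each one involves."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        k = len(ex["inference_types"])
        total[k] += 1
        correct[k] += ex["prediction"] == ex["gold"]
    return {k: correct[k] / total[k] for k in sorted(total)}


# Invented toy records (hypothetical tags and predictions, not real results).
toy = [
    {"gold": "Entailment", "prediction": "Entailment", "inference_types": ["numerical"]},
    {"gold": "Contradiction", "prediction": "Entailment", "inference_types": ["numerical", "logical"]},
    {"gold": "Contradiction", "prediction": "Contradiction", "inference_types": ["logical"]},
    {"gold": "Entailment", "prediction": "Contradiction", "inference_types": ["logical"]},
]

if __name__ == "__main__":
    print({t: round(a, 2) for t, a in accuracy_by_type(toy).items()})
    # → {'numerical': 0.5, 'logical': 0.33}
    print({k: round(a, 2) for k, a in accuracy_by_num_types(toy).items()})
    # → {1: 0.67, 2: 0.0}
```

Counting a multi-type example toward every one of its types keeps the per-type view simple, but it means the per-type accuracies are not independent; grouping by the number of types gives a complementary view of how compound inferences affect accuracy.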
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,697 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,602 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,127 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,872 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations