This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Assessing the Difficulty of Inference Types in Natural Language Inference for Clinical Trials

2026 · 0 citations · HAL (Le Centre pour la Communication Scientifique Directe) · Open Access

Abstract

Large Language Models (LLMs) achieve competitive results on Natural Language Inference when applied to clinical trials; however, it is not yet clear on which types of inference LLMs perform well or poorly. We address this by proposing new supplementary annotations to the existing NLI4CT dataset on the types of inferences observed in clinical trials. Our dataset supplements NLI4CT with a total of 1,851 new annotations using our carefully crafted guidelines for 17 types of inferences. To investigate how the inference types impact the performance of LLMs, we prompt Flan-T5, Llama, Mistral, and Qwen and investigate their performance using our newly annotated dataset. We found that logical inferences have a negative impact on the overall performance of Mixtral, Qwen-7B, and Qwen-14B, while numerical inferences have a negative impact on Flan-T5-XL and Mixtral. Further analysis shows that understanding the structure of a Clinical Trial Report (CTR) by itself remains challenging for MMed-Llama-3. Other parameters, such as the number of inference types involved or the type of section used in the premise, also impact models' performance. Our code and dataset are publicly available on GitHub.

Topics

Artificial Intelligence in Healthcare and Education · Topic Modeling · Machine Learning in Healthcare
