Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Benchmarking LLMs for Pairwise Causal Discovery in Biomedical and Multi-Domain Contexts
0
Zitationen
4
Autoren
2026
Jahr
Abstract
The safe deployment of large language models (LLMs) in high-stakes fields like biomedicine, requires them to be able to reason about cause and effect. We investigate this ability by testing 13 open-source LLMs on a fundamental task: pairwise causal discovery (PCD) from text. Our benchmark, using 12 diverse datasets, evaluates two core skills: 1) \textbf{Causal Detection} (identifying if a text contains a causal link) and 2) \textbf{Causal Extraction} (pulling out the exact cause and effect phrases). We tested various prompting methods, from simple instructions (zero-shot) to more complex strategies like Chain-of-Thought (CoT) and Few-shot In-Context Learning (FICL). The results show major deficiencies in current models. The best model for detection, DeepSeek-R1-Distill-Llama-70B, only achieved a mean score of 49.57\% ($C_{detect}$), while the best for extraction, Qwen2.5-Coder-32B-Instruct, reached just 47.12\% ($C_{extract}$). Models performed best on simple, explicit, single-sentence relations. However, performance plummeted for more difficult (and realistic) cases, such as implicit relationships, links spanning multiple sentences, and texts containing multiple causal pairs. We provide a unified evaluation framework, built on a dataset validated with high inter-annotator agreement ($κ\ge 0.758$), and make all our data, code, and prompts publicly available to spur further research. \href{https://github.com/sydneyanuyah/CausalDiscovery}{Code available here: https://github.com/sydneyanuyah/CausalDiscovery}
Ähnliche Arbeiten
Research electronic data capture (REDCap)—A metadata-driven methodology and workflow process for providing translational research informatics support
2008 · 50.585 Zit.
Gene Ontology: tool for the unification of biology
2000 · 44.220 Zit.
STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets
2018 · 18.959 Zit.
A translation approach to portable ontology specifications
1993 · 12.486 Zit.
Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research
2005 · 12.011 Zit.