This is an overview page with metadata for this scientific work. The full article is available from the publisher.
CARMINA: optimizing low-parameter language models for high-quality cardiovascular research assistance
Citations: 0
Authors: 3
Year: 2026
Abstract
Introduction: Large language models (LLMs) and their use in chatbots have demonstrated impressive capabilities in biomedical contexts [1]; however, hallucinations, privacy concerns, and substantial computational requirements limit their widespread implementation in resource-constrained environments. Current approaches either sacrifice performance for efficiency, require prohibitive computational resources, or impose per-word usage fees.

Purpose: We developed and validated CARMINA (Cardiovascular And Research-driven Molecular Insight with Novel Assistant), a specialized biomedical assistant powered by smaller, resource-efficient, open-source language models. We hypothesized that carefully optimized Retrieval-Augmented Generation (RAG) systems using models with fewer parameters (≤7B) could achieve performance comparable to larger models while maintaining or even improving factual accuracy and scientific rigor in cardiovascular research applications.

Methods: We constructed a comprehensive biomedical RAG system using four language models: llama3.1:7b, gemma2:2b, qwen2:7b, and phi3:3.8b [2–5]. The models were coupled with a MongoDB vector database containing 650,000 indexed PubMed cardiology-related abstracts and the GTE-large embedding model [6]. We optimized the system through prompt engineering to reduce hallucinations and provide source citations. For benchmarking, we developed a questionnaire of ~250 questions extracted from scientific abstracts using llama3.1. The questions were tailored to assess the groundedness, relevance, and context-independence [7,8] of the answers provided by CARMINA. Model responses were systematically evaluated by an independent language model (llama3.1:7b) for accuracy, completeness, reference quality, and clarity, varying the number of retrieved context documents (1–5 papers).

Results: Our benchmarking demonstrated that qwen2:7b is the most consistent model across all evaluation metrics [Figure 1].
All models acknowledged their lack of information by answering "I don't know" whenever needed and provided relevant references for their responses. The optimized RAG architecture significantly reduced hallucination rates compared to standard implementations. Furthermore, using larger open-source models did not substantially improve performance.

Conclusion: CARMINA shows that small language models, when equipped with specialized RAG workflows and optimization techniques, can provide research assistance that is more reliable than that of non-specialized larger models. This approach offers a solution for resource-limited environments while maintaining scientific accuracy and guaranteeing privacy. In future work, we plan to address the limitations of automated benchmarking methodologies and the inherent risks associated with using LLMs as evaluators [9,10].
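The retrieval-and-prompting flow the Methods section describes (embed the query, rank stored abstracts by vector similarity, build a grounded prompt that cites sources and permits "I don't know") can be sketched minimally as follows. This is an illustrative stand-in, not the authors' implementation: the embedding model, the MongoDB vector store, and the LLM call are replaced by a toy in-memory index, and all names (`retrieve`, `build_prompt`, the example PMIDs) are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, k=3):
    """Return the top-k (pmid, text) pairs ranked by similarity to the query."""
    scored = sorted(index, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [(d["pmid"], d["text"]) for d in scored[:k]]

def build_prompt(question, contexts):
    """Grounded prompt: answer only from context, cite PMIDs, allow 'I don't know'."""
    ctx = "\n".join(f"[PMID {pmid}] {text}" for pmid, text in contexts)
    return (
        "Answer using ONLY the context below. Cite the PMID of every source "
        "you use. If the context is insufficient, answer \"I don't know\".\n\n"
        f"Context:\n{ctx}\n\nQuestion: {question}\nAnswer:"
    )

# Toy two-document index standing in for the 650,000-abstract vector database.
index = [
    {"pmid": "111", "vec": [1.0, 0.0], "text": "Statins reduce LDL cholesterol."},
    {"pmid": "222", "vec": [0.0, 1.0], "text": "Beta-blockers lower heart rate."},
]
top = retrieve([0.9, 0.1], index, k=1)
prompt = build_prompt("How do statins act?", top)
```

Varying `k` between 1 and 5 corresponds to the 1–5 retrieved papers swept in the benchmark.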
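The evaluation step, in which an independent model (llama3.1:7b) scores each response for accuracy, completeness, reference quality, and clarity, follows the common LLM-as-judge pattern. A hedged sketch under assumed interfaces: the judge call is any `prompt -> str` callable (stubbed here), and the rubric wording and JSON score format are illustrative, not taken from the paper.

```python
import json

# Rubric prompt asking the judge model for per-criterion 1-5 scores as JSON.
RUBRIC = (
    "Rate the ANSWER to the QUESTION on a 1-5 scale for each criterion: "
    "accuracy, completeness, reference_quality, clarity. "
    "Reply with a JSON object only.\n\nQUESTION: {q}\nANSWER: {a}"
)

def judge(question, answer, call_model):
    """Score one answer; `call_model` is any callable prompt -> str."""
    raw = call_model(RUBRIC.format(q=question, a=answer))
    scores = json.loads(raw)
    criteria = {"accuracy", "completeness", "reference_quality", "clarity"}
    assert criteria <= scores.keys(), "judge reply missing criteria"
    scores["mean"] = sum(scores[c] for c in criteria) / len(criteria)
    return scores

# Stubbed judge for demonstration; in practice this would call the judge LLM.
fake_judge = lambda prompt: (
    '{"accuracy": 5, "completeness": 4, "reference_quality": 5, "clarity": 4}'
)
result = judge("What do statins do?", "They lower LDL [PMID 111].", fake_judge)
```

Parsing the judge's reply as strict JSON and asserting the criteria are present is one simple guard against the evaluator-reliability risks the Conclusion flags [9,10].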
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,436 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,311 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,753 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,523 citations