This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Dual Fine-Tuning and Dual Alignment Training for Medical Language Models
Citations: 0 · Authors: 2 · Year: 2025
Abstract
The vast volume of medical information in the real world demands intelligent assistant systems for effective integration and summarization. To address this, we designed a four-stage training pipeline comprising dual fine-tuning (continued pre-training and supervised fine-tuning) and dual alignment training (Direct Preference Optimization and Proximal Policy Optimization) to build a medical language model based on the open-source LLaMA-3.1-8B. The entire training process adopts the DeepSpeed framework, combined with ZeRO-1 optimization, Quantized Low-Rank Adaptation (QLoRA), and Flash-Attention 2, alleviating the substantial computational cost of large language model (LLM) training. Results show that after the four-stage training process, the medical language model achieved scores of 50.49 on CMMLU and 35.00 on CMB, both outperforming the baseline LLaMA-3.1-8B-Instruct, indicating significant improvements in both general Chinese capabilities and Chinese medical domain capabilities. Furthermore, after dual alignment training, the model's reward score increased by 30%, while toxicity decreased by 25%, demonstrating considerable enhancements in safety and harmlessness.
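The abstract names Direct Preference Optimization (DPO) as the first of the two alignment stages. As background for readers unfamiliar with it, the sketch below implements the standard per-pair DPO objective, -log σ(β·margin), where the margin is the difference in implicit rewards (log-probability gaps between the trained policy and a frozen reference policy) for a preferred vs. a rejected response. This is the generic textbook formulation, not code from the paper; the function name and β default are illustrative.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair (illustrative sketch).

    logp_*      : log-prob of each response under the policy being trained
    ref_logp_*  : log-prob of each response under the frozen reference policy
    beta        : temperature scaling the implicit reward margin
    """
    # Implicit reward of each response is its log-prob gap to the reference;
    # the margin is (reward of chosen) - (reward of rejected).
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # Loss is the negative log-sigmoid of the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# If the policy matches the reference exactly, the margin is 0 and the
# loss is -log(0.5) = ln 2; widening the margin drives the loss toward 0.
print(dpo_loss(-2.0, -5.0, -3.0, -4.0))  # positive margin, loss below ln 2
```

Minimizing this loss pushes the policy to assign relatively more probability to preferred responses than the reference does, which is the mechanism behind the reward-score gain the abstract reports.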