This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Dual Fine-Tuning and Dual Alignment Training for Medical Language Models
Citations: 0 · Authors: 2 · Year: 2025
Abstract
The vast volume of medical information in the real world demands intelligent assistant systems for effective integration and summarization. To address this, we designed a four-stage training pipeline comprising dual fine-tuning (continued pre-training and supervised fine-tuning) and dual alignment training (Direct Preference Optimization and Proximal Policy Optimization) to build a medical language model based on the open-source LLaMA-3.1-8B. The entire training process adopts the DeepSpeed framework, combined with ZeRO-1 optimization, Quantized Low-Rank Adaptation (QLoRA), and Flash-Attention 2, alleviating the substantial computational cost of large language model (LLM) training. Results show that after the four-stage training process, the medical language model achieved scores of 50.49 on CMMLU and 35.00 on CMB, both outperforming the baseline LLaMA-3.1-8B-Instruct, indicating significant improvements in both general Chinese capabilities and Chinese medical domain capabilities. Furthermore, after dual alignment training, the model's reward score increased by 30%, while toxicity decreased by 25%, demonstrating considerable enhancements in safety and harmlessness.
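The abstract names Direct Preference Optimization (DPO) as the first of the two alignment stages. As background for readers unfamiliar with it, the sketch below implements the standard per-pair DPO objective, -log σ(β·margin), where the margin is the difference in implicit rewards (log-probability gaps between the trained policy and a frozen reference policy) for a preferred vs. a rejected response. This is the generic textbook formulation, not code from the paper; the function name and β default are illustrative.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair (illustrative sketch).

    logp_*      : log-prob of each response under the policy being trained
    ref_logp_*  : log-prob of each response under the frozen reference policy
    beta        : temperature scaling the implicit reward margin
    """
    # Implicit reward of each response is its log-prob gap to the reference;
    # the margin is (reward of chosen) - (reward of rejected).
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # Loss is the negative log-sigmoid of the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# If the policy matches the reference exactly, the margin is 0 and the
# loss is -log(0.5) = ln 2; widening the margin drives the loss toward 0.
print(dpo_loss(-2.0, -5.0, -3.0, -4.0))  # positive margin, loss below ln 2
```

Minimizing this loss pushes the policy to assign relatively more probability to preferred responses than the reference does, which is the mechanism behind the reward-score gain the abstract reports.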