This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
DeepSeek-R1 for automated scoring in radiology residency examinations: an agreement and test–retest reliability study
Citations: 0
Authors: 5
Year: 2025
Abstract
OBJECTIVE: This study evaluates the feasibility of employing DeepSeek-R1 for automated scoring in examinations for radiology residents, comparing its performance with that of radiologists.

METHODS: A cross-sectional study was undertaken to assess 504 diagnostic radiology reports produced by eighteen third-year radiology residents. The evaluations were independently conducted by Radiologist A, Radiologist B, and DeepSeek-R1 (as of June 15, 2025), utilizing standardized scoring rubrics and predefined evaluation criteria. One month after the initial evaluation, a re-assessment was performed by DeepSeek-R1 and Radiologist A. The inter-rater reliability among Radiologist A, Radiologist B, and DeepSeek-R1, in addition to the test-retest reliability, was analyzed using intraclass correlation coefficients (ICC).

RESULTS: The ICC values between DeepSeek-R1 and Radiologist A, DeepSeek-R1 and Radiologist B, and Radiologist A and Radiologist B were found to be 0.879, 0.820, and 0.862, respectively. The test-retest ICC for DeepSeek-R1 was determined to be 0.922, whereas for Radiologist A, it was 0.952. The ICC between DeepSeek-R1 (re-test) and Radiologist A (re-test) was 0.885.

CONCLUSION: The performance of DeepSeek-R1 was comparable to that of radiologists in the evaluation of radiology residents' reports. The integration of DeepSeek-R1 into medical education could effectively assist in assessment tasks, potentially alleviating faculty workload while preserving the quality of evaluations.
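The abstract reports intraclass correlation coefficients but does not state which ICC form was used. As a rough illustration only, the sketch below computes one common variant, ICC(2,1) in Shrout and Fleiss's notation (two-way random effects, absolute agreement, single rater), from a reports-by-raters score matrix. The function name `icc_2_1` and the example scores are hypothetical and are not taken from the study.

```python
import numpy as np


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is an (n_subjects, k_raters) array with no missing values.
    Which ICC form the study used is not stated; this form is an assumption.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()

    # Sums of squares from a two-way ANOVA without replication.
    ss_subjects = k * np.sum((x.mean(axis=1) - grand_mean) ** 2)
    ss_raters = n * np.sum((x.mean(axis=0) - grand_mean) ** 2)
    ss_total = np.sum((x - grand_mean) ** 2)
    ss_error = ss_total - ss_subjects - ss_raters

    # Mean squares.
    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Hypothetical scores: five reports (rows) rated by two raters (columns),
# e.g. a radiologist and a model. These numbers are made up for illustration.
example_scores = np.array([
    [85, 88],
    [70, 72],
    [92, 90],
    [60, 65],
    [78, 80],
])
print(f"ICC(2,1) = {icc_2_1(example_scores):.3f}")
```

The same function could be applied to a test and re-test pair from a single rater by treating the two scoring sessions as the two columns, although a dedicated statistics package would be preferable for confidence intervals.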