OpenAlex · Updated hourly · Last updated: 12 Apr 2026, 18:19

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Will large language model AI (ChatGPT) be a benefit or a risk to quality for submission of medical physics manuscripts?

2025 · 1 citation · Medical Physics
Open full text at the publisher

Citations: 1
Authors: 3
Year: 2025

Abstract

Large language models (LLMs), such as ChatGPT, are quickly becoming a part of everyday life. I have personally used one to refine reference letters, create developer documentation for Python code, and even generate letters of medical necessity. It is clearly a powerful tool, but should it be used in publishing? It can edit a manuscript for grammar and syntax. It can summarize a manuscript to create the abstract. It can generate an entire manuscript from simple prompts. It can even generate the research ideas themselves. But should it?

Dr. Low obtained his Ph.D. in Physics from Indiana University, Bloomington and, after a postdoctoral fellowship at M. D. Anderson Cancer Center, Houston, TX, moved to Washington University Mallinckrodt Institute of Radiology, St. Louis, MO, where he eventually became Professor in Radiation Oncology. In 2010, he moved to his current position at UCLA, where he is a Professor in Radiation Oncology and Vice Chair of Medical Physics Research and Innovation. Dr. Low is certified by the American Board of Medical Physics in Radiation Oncology Physics and the American Board of Radiology in Therapeutic Medical Physics. He has been very active in both the AAPM and ASTRO, having served as Science Council Chair of both organizations. Dr. Low's major research interests include image-guided radiation therapy, magnetic-resonance-guided radiation therapy, quality assurance, and modeling respiratory motion. He is a Fellow of both AAPM and ASTRO, has published over 310 papers in refereed journals, and is the author of the most highly cited manuscript in the history of the journal Medical Physics.

Prior to joining Varian as Senior Director for Clinical Technology Adoption, Mr. Halvorsen was for a decade the Director of Physics in Radiation Oncology for the Lahey division of Beth Israel Lahey Health in suburban Boston. He received his MS in Radiological Medical Physics from the University of Kentucky in 1990 and was certified by the American Board of Radiology in 1995. Before Beth Israel Lahey Health, he was Vice President of Medical Physics for Alliance Oncology, a division of Alliance Healthcare Services, where he oversaw the physics program's growth from 8 to 24 clinics across the country and implemented consistent practice standards and regular peer reviews. In the mid-to-late 1990s, he brought IMRT services to the community practice setting through a collaboration with the University of North Carolina at Chapel Hill. He has been an active volunteer in professional societies, chairing the AAPM Professional Council and serving on the Board of Directors; during his tenure on the Professional Council, he initiated the Medical Physics Practice Guideline program. He has authored numerous peer-reviewed manuscripts, most recently as chair of the Medical Physics Practice Guideline for Peer Review and of the Guideline for SRS and SBRT, and as a member of the ASTRO-ASCO-AUA Evidence-Based Guideline for Hypofractionated Prostate treatment. He is a volunteer surveyor for the American College of Radiology and served many years on its accreditation program oversight committee. He chaired the AAPM Working Group on Implementation of TG-100, was Deputy Editor-in-Chief of the open-access journal JACMP, and is an honorary fellow of the ACR and AAPM.

Of all the manuscripts I have contributed to, the subject of this one is by far the most rapidly changing.
The development of commercial and widely available LLMs has the potential to significantly impact all aspects of modern life, including the process of writing scientific manuscripts.1 However, the hundreds of millions of dollars invested in LLMs to date, and the potential billions to be earned, put substantial pressure on companies to get us excited about LLMs through hype.2 The most commonly touted benefits fall into three categories: drafting and structuring content; improving clarity and readability; and generating a literature list. Additional benefits have been touted, including by ChatGPT itself: synthesizing studies for insertion into the background section, generating abstracts and summaries, assisting with exploring the implications of the research, and checking whether arguments in a paper are logically sound. Issues such as hallucinations and inaccurate interpretation and analysis limit the utility of these potential benefits.9 Text generated by ChatGPT also often lacks specificity and can be significantly redundant.

In summary, ChatGPT can be very useful in the many steps of manuscript preparation, especially in creating draft outlines and bullet lists of important concepts to consider and include in the manuscript. ChatGPT can also be useful in editing text for clarity and readability and in generating an initial literature list. ChatGPT should be regarded as an assistant that has an excellent grasp of sentence and paragraph structure and grammar, but that may have a limited grasp of the science behind the manuscript. The author should treat the information provided as an aid to manuscript preparation. This may be most useful for junior faculty and for faculty who are not native English speakers, but a thorough review by a senior investigator or a native English speaker, respectively, should be conducted, as it would be without ChatGPT assistance.
With the introduction of easy-to-use generative artificial intelligence (AI) tools based on LLMs, we have seen intense interest in their power and potential applications. Most of us are understandably awed by this new technology, and many have used “chatbots” to summarize information or to draft a document. It is tempting to consider how such tools may improve productivity or ease the process of scientific discovery and publishing, but I posit that the heedless use of such tools may undermine scientific integrity.

“Principle III: Members must act with integrity in all aspects of their work.”

“Principle X: Members are professionally responsible and accountable for their practice, attitudes, and actions, including inactions and omissions.”

These ethical principles can and should guide our professional practice, including how we employ technology. Most of us use tools to automate the processing and analysis of data, and we commonly use the embedded grammatical corrections in word-processing software. Generative AI represents a new level of assistive power; the trick is “where to draw the line” and how to properly acknowledge its use.

Research is all about novelty: discovering new solutions or new theories to better understand how things work. While a tool based on generative AI may be an excellent assistant, helping with the writing process and searching for prior work on a given subject, we must be diligent as scientists to ensure that discovery is based on our own critical thinking and relevant experience. Consider some of the limitations of the current generative AI tools: “The predictive algorithms of AI are trained to discover patterns based on their training data. This means they look for common themes or ideas which will bias the program against new ideas. […] The software does not limit its training to the newest concepts in the field, so it can pull outdated […] information into its response […].”10

AI may introduce bias.
Consider that AI algorithms are trained on datasets mostly created by humans. Data from countries with the most widely used languages may be disproportionately represented. Data from the wealthiest nations may be more likely to be available in digital form. And the teams behind each algorithm (who decide how to train it) likely have undisclosed biases. Thus, a heavy reliance on AI tools for the ‘discovery’ phase of a research project risks introducing significant bias.10

AI may plagiarize the work of others. Consider that the role of AI is to use a dataset of information to answer any question posed to it. Depending on AI to answer a question without independently assessing the literature leaves researchers open to claiming the ideas of others as their own, or to using the ideas of others without appropriate attribution.10, 11

The use of chatbots to assist with peer review of manuscript submissions is increasing in prevalence, and it is fraught with risks. Hosseini and Horbach12 identified potential efficiency benefits to the editorial process but concluded that “the fundamental opacity of LLMs’ training data […] and development process raise concerns about biases […]”.

So, what is the appropriate use of generative AI in science? In my opinion, generative AI should be thought of as an assistant. The researchers remain fully responsible for the integrity and originality of their work and for conducting thorough due diligence to confirm all source material and to assess its relevance to their work.

I very much agree with my colleague's statement; he correctly addresses some of the risks of using LLMs to assist in the preparation of scientific manuscripts, including the risks of bias, plagiarism, and error. I myself once asked ChatGPT how much time had passed between two dates exactly 40 years apart (I was double-checking my mental “math”), and it returned a number of days corresponding to just over 30 years.
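Date arithmetic of this kind is deterministic and is better delegated to conventional code than to a language model. A minimal Python sketch (the specific dates here are hypothetical; the anecdote does not name the dates that were actually used):

```python
from datetime import date

# Two dates exactly 40 years apart (hypothetical example dates).
start = date(1984, 4, 12)
end = date(2024, 4, 12)

elapsed_days = (end - start).days
print(elapsed_days)           # 14610 days: 40 * 365 plus 10 leap days
print(elapsed_days / 365.25)  # 40.0 years, not the ~30 the chatbot reported
```

The standard library counts the intervening leap days exactly; no pattern-matching over training data is involved.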
It was simply dead wrong on what should have been a trivial calculation.

Many of the espoused concerns about the use of LLMs involve situations where the researcher is using them to generate ideas or research directions (potentially inviting both bias and plagiarism). It is my opinion that this concern is borne by individuals who themselves do not understand how deep domain knowledge has to be to conduct relevant research. To put it more simply, if you must ask an LLM for a research concept, you are not likely in a position to (a) understand the relevance and timeliness of the suggested research paths, or (b) conduct the research itself. Research, specifically cutting-edge and relevant research, requires domain knowledge that current LLMs lack. That said, asking an LLM to provide structural or grammatical assistance is highly useful, even to those of us who have been drafting scientific manuscripts and grants for decades. We all have our stylistic weaknesses and blind spots, and while many of us may have colleagues willing to take the time and effort to clean up our writing, LLMs can go a long way toward improving a manuscript's clarity and flow with little investment of time or effort.

Dr. Low stated: “One of the biggest challenges of writing is at the beginning, staring at a blank screen and deciding on where to start. […] One use of ChatGPT is to draft bullet points of relevant background and the justification for the work and then to draft those into text, from which the author can edit and expand.” This is precisely what troubles me: relying on generative AI to “decide where to start” and to draft the “justification for the work”. Perhaps I am old-fashioned, but I believe scientific discovery should be based on relevant experience coupled with critical thinking. The current generation of generative AI tools demonstrably fails in this regard. In Dr.
Low's own words, “ChatGPT can synthesize a list of manuscripts that cover a user-defined topic, but that list should be considered neither accurate nor complete.” In fact, faculty at the MIT Sloan School of Management found that “Generative AI models function like advanced autocomplete tools: They are designed to predict the next word or sequence based on observed patterns. Their goal is to generate plausible content, not to verify its truth. That means any accuracy in their outputs is often coincidental. As a result, they might produce content that sounds reasonable but is inaccurate.”13

Alkaissi and McFarlane14 explored the topic of ChatGPT hallucinations in medical research. As they stated, “We asked ChatGPT to explain these findings further and provide references to fact-check the [hypothesis]. Hence, it provided five references dating to the early 2000s. None of the provided paper titles existed, and all provided PubMed IDs (PMIDs) were of different unrelated papers. We then requested ChatGPT to provide more recent references from the last 10 years. The list provided was the same as the first list but with different years and similarly with PMID numbers that belong to different papers.” Similarly, Emsley and Bhattacharyya reported that “Another study investigating the authenticity and accuracy of references in medical articles generated by ChatGPT found that of 115 references that were generated, 47% were fabricated, 46% were authentic but inaccurate, and only 7% were authentic and accurate.”4, 15

Dr. Low and I do agree on one key premise: “The utility of ChatGPT should be considered like an assistant that has an excellent grasp of sentence and paragraph structure and grammar, but that may have a limited grasp of the science behind the manuscript.”

The authors have nothing to report. The authors declare no conflicts of interest.

Topics

Artificial Intelligence in Healthcare and Education · Radiomics and Machine Learning in Medical Imaging · Clinical Reasoning and Diagnostic Skills