
Equity in Scientific Publishing: Can Artificial Intelligence Transform the Peer Review Process?

2023 · 6 citations · 4 authors · Mayo Clinic Proceedings Digital Health · Open Access

Abstract

Chat Generative Pre-Trained Transformer (ChatGPT), a large language model developed by OpenAI, is gaining global recognition for its ability to read and perform writing tasks with human-like precision. Recently, this artificial intelligence (AI) tool has made important inroads into health care, helping streamline administrative tasks such as writing referrals and prior authorizations, classifying skin conditions, and devising patient-specific care plans. Although ChatGPT has also been used as a research aid, its potential applications in evaluating manuscripts for publication remain largely underexplored. Despite inherent quality control concerns, using generative AI tools for peer review could rectify inequities in the research process and help create more inclusive scholarly discourse. Peer review, a cornerstone of academic research, is a labor-intensive endeavor. In 2020, reviewers spent 100 million hours, or roughly 15,000 years, on these reviews, amounting to 1.5 billion dollars of time for US-based reviewers alone (Aczel et al., Res Integr Peer Rev 2021;6:14). With no financial compensation, scholars often decline review requests, leading to a shortage of peer reviewers and inflating the time to publication. The predominant nonblinded peer review model also favors Western authors with established reputations, introducing bias against those from low-income and middle-income countries (LMICs) (Fox et al., Funct Ecol 2023;37:1144-1157). These scholars' work may be further dismissed because of surface-level differences, given that English may not be the primary language for some LMIC authors.
The prominence of Western authors in premier journals often overshadows voices from less-resourced nations, a phenomenon commonly referred to as academic ventriloquism (Silverio et al., British Psychological Society Qualitative Methods in Psychology conference, July 2022). This was particularly evident during the COVID-19 pandemic, when Western perspectives dominated the global discourse despite disparate experiences among LMICs (Benjamens et al., Scientometrics 2021;126:859-862). Limited contributions from resource-poor settings bearing a disproportionate disease burden highlight the need for broader academic discourse. Artificial intelligence models like ChatGPT could help mitigate some of these biases in academic publishing. For instance, AI can detect language errors by suggesting revisions to grammar, readability, and formatting discrepancies, pausing the submission so authors can make the suggested edits. These services could be coupled with an AI-assisted author blinding process to ensure scientists properly leave their names off manuscripts and obscure identifying references. Allowing reviewers to focus on the quality of the research question rather than superficial issues or author identity could reduce review time and thus, potentially, gender disparities in academic career advancement: women accept review invitations more frequently than men (Schmaling and Blume, Learn Publ 2017;30:221-225), diverting time toward this uncompensated public service. Generative AI may thus help address structural inequities on various levels.

Although these applications are mainly uncontroversial, the use of AI among academic referees has proven more contentious. In July 2023, citing data confidentiality, originality, and accuracy concerns, the National Institutes of Health (NIH) banned AI in peer review of NIH research proposals and grant applications (Science, "Science funding agencies say no to using AI for peer review"). To enforce this policy, NIH peer reviewers must now sign a security and confidentiality agreement confirming that the content under evaluation has not been shared with any other parties, including AI platforms (NIH Notice NOT-OD-23-149). This decision is juxtaposed against the relatively liberal stance of prominent journals, including the Proceedings of the National Academy of Sciences and those from publishers Elsevier and Springer Nature, which mandate the declaration of AI contributions during manuscript submission, although AI is barred from direct authorship (Elsevier publishing ethics policy on AI-assisted writing; Nature Portfolio editorial policy on AI). These more nuanced approaches are essential because generative AI is not without its risks. For instance, peer reviews without humans in the loop may create even more noise in academia, incorrectly assessing the validity or importance of research and promoting flawed or inconsequential studies. A more insidious concern is AI bias, where models generate feedback reflecting prejudices learned from their training data and development, or hallucinate incorrect facts altogether (Doshi et al., Am J Bioeth 2023;23:6-8). These issues can be challenging to detect because of the black box nature of AI decision-making: the inputs and outputs are visible, but the intermediate reasoning is opaque, potentially obscuring prejudice and incorrect associations in manuscript evaluations. Even so, outright bans prompted by such limitations are unrealistic and counterproductive. Generative AI is already prevalent; bans may not diminish its use but instead push it into unregulated shadows, void of any safeguards. Indeed, US education systems reacted similarly to ChatGPT's introduction: after initially banning its use, school districts in Los Angeles, Seattle, and New York later integrated ChatGPT into their curricula, recognizing its inevitable role in future collaboration and society (POLITICO, August 23, 2023). Given the promise and pitfalls of AI, how can publishers implement these tools equitably? Clear rules for joint review by humans and AI can encourage mutual supervision.
In the initial internal evaluation phase, before the manuscript is sent to external reviewers, AI can help fix basic errors, allowing humans to focus on the actual content of the papers (Salvagno et al., Crit Care 2023;27:75). These human editors, in turn, can check AI-generated feedback for inaccuracies or bias, addressing the tendency of these technologies to fabricate information or overlook critical nuance. Indeed, AI still struggles with statistics, applied mathematics, and deep understanding of meaning in conversation, and it lags behind humans in problem-solving, decision-making, critical thinking, and originality (Bogost, The Atlantic, December 2022). Acknowledging these limitations, journals can collaborate with research organizations like the Algorithmic Fairness and Opacity Group to train in-house AI on diverse datasets of a journal's prior peer reviews. These AI tools can then be integrated into the internal review process, supporting human editors by identifying basic errors and ensuring the quality of content before it reaches external peer reviewers. To improve validity and account for progress in the field, in-house AI tools should undergo retesting, retraining, and recertification at standard intervals. Invited reviewers downstream of the initial screening process could also be permitted to use AI, provided they supervise its evaluation, correct its errors, and disclose its use.
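The internal screening pass described above could be sketched as follows. This is a minimal illustration, not the authors' actual proposal: the function names and regex heuristics are assumptions for demonstration, and a production system would use a trained language model rather than pattern matching for both the blinding and the surface-issue checks.

```python
import re

def redact_authors(text: str, author_names: list) -> str:
    """Replace each listed author name with a neutral placeholder so
    external reviewers cannot identify the submitting team (blinding)."""
    for i, name in enumerate(author_names, start=1):
        text = re.sub(re.escape(name), f"[AUTHOR-{i}]", text, flags=re.IGNORECASE)
    return text

def flag_surface_issues(text: str) -> list:
    """Return coarse, non-judgmental flags for a human editor to confirm.
    The human stays in the loop: these flags suggest, never decide."""
    flags = []
    # duplicated word, e.g. "the the" (case-insensitive backreference)
    if re.search(r"\b(\w+) \1\b", text, flags=re.IGNORECASE):
        flags.append("possible duplicated word")
    # crude readability check: any sentence over 60 words
    if any(len(s.split()) > 60 for s in re.split(r"[.!?]", text)):
        flags.append("very long sentence; check readability")
    return flags
```

A human editor would review every placeholder and flag before the manuscript moves to external reviewers, mirroring the mutual-supervision model the text proposes.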
To streamline reviewer tasks, disclosures can be succinct, employing a concise checklist that appears when a reviewer indicates that AI was used. This checklist could allow the reviewer to confirm the absence of full AI automation in the peer review process, gauge the extent of human-AI collaboration, and prompt reflection on potential biases within the reviewer's feedback. To facilitate these disclosures, journals can include short courses or training videos in reviewer invitation emails, describing how to spot biases and inaccuracies. Journals could also display a roster of approved AI algorithms on their websites for invited reviewers to use, accompanied by guidelines based on each algorithm's reported capabilities during internal testing. Just as publishers prohibit AI from full authorship, they should also ban fully autonomous peer reviews, because AI cannot be held liable for its output; supervised use is acceptable if declared. Allowing supervised AI use, rather than banning it outright, leverages the technology's potential while maintaining human oversight in the review process. These policies can also encourage transparency among reviewers, authors, editors, readers, and society. We used an open peer review from a Nature Communications paper to test how this AI-assisted evaluation could be implemented in the real world, comparing the reviewer's comments on the preprint manuscript with comments from GPT-4 (Brannock et al., Nat Commun 2023;14:2914). First, the AI prefaced its response with an acknowledgment of its limitations, which may help mitigate overreliance on its judgment, although such notices may prove meaningless amid disclaimer fatigue (Figure 1).
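The proposed disclosure checklist could be represented as a simple record with a policy check. This is a hypothetical sketch: the field names and the two policy rules (no full automation, human verification required) are drawn from the text, but the structure itself is an illustrative assumption.

```python
from dataclasses import dataclass, field

@dataclass
class AIUseDisclosure:
    """One reviewer's AI-use disclosure, mirroring the proposed checklist."""
    tool_name: str               # e.g. an algorithm from the journal's approved roster
    fully_automated: bool        # must be False: autonomous review is prohibited
    human_verified_output: bool  # reviewer read and corrected the AI feedback
    bias_reflection: str = ""    # free-text reflection on potential biases
    issues: list = field(default_factory=list)

    def validate(self) -> list:
        """Return a list of policy violations; an empty list means the
        disclosure satisfies the journal's AI-use rules."""
        self.issues = []
        if self.fully_automated:
            self.issues.append("fully autonomous AI review is not permitted")
        if not self.human_verified_output:
            self.issues.append("AI output must be checked by the human reviewer")
        return self.issues
```

A submission system could run `validate()` when the reviewer submits the form and block any review whose disclosure reports unsupervised AI use.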
ChatGPT's response to the preprint then showed promise in flagging general issues for reviewers to critique with greater detail and specificity (Figure 2). Finally, AI seems well poised to confirm whether an author adequately addressed a reviewer's original comments, with its robust explanation helping reduce algorithmic opacity and time spent on rereview (Figure 3).

Figure 2. GPT-4 does not provide very detailed responses to research manuscripts but can flag areas where the reviewer could scrutinize more carefully (Top: ChatGPT user input; Middle: ChatGPT output; Bottom: human reviewer comment).

Figure 3. GPT-4 can determine whether the rebuttal letter has adequately addressed reviewer critiques, reducing time for rereview (Top: ChatGPT user input; Middle: ChatGPT output; Bottom: rereview from human reviewer).

Establishing guidelines that maximize collaboration between human experts and AI tools, while confining AI's role to automated desk reviews, identification of nontechnical manuscript shortcomings, and supervised feedback on complex ideas, can enhance reviews' efficiency, quality, and equity. Unregulated AI is already challenging the integrity of the publication process; if integrated thoughtfully, however, AI can serve as an exoskeleton in peer review, amplifying human capabilities without supplanting critical judgment.

The authors declare that they have no competing interests.
