This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
ChatGPT Without a Safety Net
Citations: 3
Authors: 1
Year: 2023
Abstract
Generative artificial intelligence (AI) applications continue to have huge potential for their ability to generate just about any type of content from large data sets. Educators continue to grapple with their risks and benefits. Understanding what generative AI applications are and how to approach them in the educational environment has been challenging given their continuous evolution and novelty. Several options for detecting the likelihood of AI within a student writing sample appear to offer a safety net for ensuring academic integrity. Unfortunately, however, reports of issues with these detectors have introduced a new set of ethical and practical considerations beyond the still murky challenges of generative AI itself. Generative AI appears to be at, or perhaps just beyond, the peak of the Gartner Hype Cycle (Raza, 2020), a theory that identifies a predictable curve of high expectations, followed by a crash of disillusionment and an eventual state of normalization. No matter where this technology is in terms of perceptions and utility, it cannot be ignored in terms of its positive and negative values across academia, health care delivery, and beyond. The simplicity of use and seemingly endless options for efficiencies and streamlining of content generation make generative AI an enticing option for administrative cost-saving initiatives and, in some cases, a shortcut for time-strapped students.

GENERATIVE AI DETECTORS

The widespread availability of Chat Generative Pretrained Transformer (ChatGPT) was quickly followed by applications intended to distinguish human-written from AI-generated text. One of those applications was even released by OpenAI, the creators of ChatGPT. Others quickly followed, including ZeroGPT, Winston AI, and a built-in AI detector within the popular Turnitin family of plagiarism software. Turnitin applications, used internationally in more than 16,000 institutions, were even embedded directly into course management systems. Reports from the first three months after the detection tool was turned on indicated that 3.3 percent of 65 million student papers were composed of 80 percent or more AI-generated text (Kuykendall, 2023). Like any hyped phenomenon, there has been no shortage of entrepreneurs selling AI services for academic use and misuse, including options for rephrasing AI-generated text so that it can beat detection software (Lu et al., 2023). As an example, the web-based detector Content at Scale claimed a 98 percent detection success rate. The site offers free detection services and, for a fee, will also regenerate text so that it appears to have been written by a human. Moral ambiguity aside, this example validates the growing use and commercialization of generative AI for both positive and negative use cases. The true extent of use is unfortunately obfuscated by these kinds of services, as well as by widely published prompting strategies that can hide the appearance of text having been generated by AI. AI detection tools appeared to offer an opportunity for faculty to hold students accountable and even to serve as a potential deterrent to academic misuse. However, analysis of these detection tools and their implications reveals holes in that safety net beyond the known means of tricking detection applications. Additional holes include reports of bias and conflicting rates of detection success, a lack of clear guidance on acceptable and responsible use thresholds, and the potential negative impact on the educator/student dyad.
ACCURACY AND BIAS

Among the holes in the generative AI safety net are disparities between the detection success rates reported by vendors and those found by independent investigators. Detection vendors and application owners often claim accuracy rates of 80 percent to more than 90 percent. Several independent studies, often of the very same applications, find rates of 20 percent to 50 percent or even lower (Elkhatat et al., 2023; Hines, 2023; Weber-Wulff et al., 2023). The authors often concluded that detector sensitivity and specificity may limit their effectiveness in identifying AI-generated text. Given the high-stakes academic and legal implications of cheating allegations, even the Federal Trade Commission (Atleson, 2023) weighed in on detection bias, vendor overstatements of accuracy, and deceptive marketing. Researchers at Stanford University (Liang et al., 2023) evaluated seven generative AI text detectors for potential bias. They found that text written by nonnative English writers was more likely to be misidentified as having been generated by AI. The authors also found that with simple self-editing prompts, essays could be improved as a kind of writing tutorial but also intentionally edited to circumvent detection. The Liang et al. (2023) study is one of many that validate the two sides of the generative AI coin in terms of value and risk. That balance currently rests on the subjective perspectives of educators, a student's own level of information literacy and moral integrity, and often lagging academic policies.

LACK OF CLEAR GUIDANCE

An international UNESCO survey from June 2023 (UNESCO, 2023) reported that fewer than 10 percent of the K-12 institutions and universities surveyed had formal guidance in place related to generative AI. The authoritative output of generative AI has coasted directly into most institutions' plagiarism policy blind spots, causing philosophical and intellectual whiplash. The line is blurred as to when using generative AI for brainstorming, summarization, or revision crosses over into plagiarism and academic misconduct. To add to the complexity, detectors do not report findings in a consistent way (Elkhatat et al., 2023), with some using measures like the likelihood of AI generation and others the percentage of content that was AI generated. Students and educators are left confused and lacking direction as to when and how much generative AI is allowed, a compounding stressor for those still nursing their COVID-19 academic hangover. Plagiarism policies and the consequences of academic misconduct are widely published at every higher learning institution but often cannot keep up with emerging forms of academic misconduct. Educators and academic administrators are left to struggle with what thresholds and types of use may be allowed, further slowing progress toward guidance for students. Amid this lack and lag from academic institutions, some detection application owners have offered their own guidance on how to respond to detection results. GPTZero was one of the earliest AI detection tools created following the public release of ChatGPT v3.5. Its website displays a warning below detection results stating, "The nature of AI content is changing constantly. As such, these results should not be used to punish students." The creators of ChatGPT, OpenAI, decided to shut down their own detector in July 2023 because of low rates of accuracy (OpenAI, 2023).
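To see why these contested accuracy figures matter so much in misconduct cases, consider a back-of-the-envelope calculation. The sketch below is purely illustrative: the sensitivity, specificity, and prevalence values are assumptions chosen for the example, not figures from the studies cited above. It shows that when most papers are human written, even a detector with seemingly strong specificity produces large numbers of false flags.

```python
# Hypothetical illustration of why a headline "accuracy" claim can mislead.
# All rates below are assumptions for the sketch, not figures from any cited study.
papers = 65_000_000      # papers screened (order of magnitude from Kuykendall, 2023)
ai_rate = 0.033          # share of papers actually AI-generated (assumed)
sensitivity = 0.90       # detector flags 90% of truly AI-generated papers (assumed)
specificity = 0.95       # detector correctly clears 95% of human-written papers (assumed)

true_positives = papers * ai_rate * sensitivity
false_positives = papers * (1 - ai_rate) * (1 - specificity)

# Positive predictive value: of all flagged papers, how many are truly AI-generated?
ppv = true_positives / (true_positives + false_positives)

print(f"Total flagged papers: {true_positives + false_positives:,.0f}")
print(f"Falsely flagged human-written papers: {false_positives:,.0f}")
print(f"Chance that a given flag is correct (PPV): {ppv:.0%}")
```

Under these assumed rates, more than three million human-written papers would be falsely flagged, and fewer than four in ten flags would point to genuinely AI-generated text. When a flag becomes an accusation, the positive predictive value, not the advertised accuracy, is the number that matters.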
HARM TO THE STUDENT/EDUCATOR DYAD AND CREATING A CULTURE OF TRUST

The issues of AI-generated text detection and the lack of clear guidance about the use of generative AI have set students and educators on a potentially adversarial and antagonistic path. The chase to detect and punish cheating and plagiarism can lead to false accusations that often have difficult official and unofficial consequences. Archibald and Clark (2023) editorialized about three main approaches health science faculty could take with regard to generative AI: avoidance, prohibition, or integration. Use of and reliance on detectors arguably takes a clearly prohibitive stance in which identification of AI authoring creates a cloud of suspicion. A stance of guilty until proven innocent has a negative impact on student morale and erodes trust. Students may all be cast as potential cheaters rather than as accountable and trusted future colleagues. Students who choose to act ethically and avoid the temptations of paper mills and inappropriate use of generative AI may be just as alienated as students who actually did commit academic misconduct.

Reimagining a safety net for generative AI and learning requires action on the part of industry, students, educators, and academic administrators. The creators and critics of generative AI are already talking about self-regulation, given the early and pervasive discussion of ethical use and growing misuse. The veracity of claims by detection sites parallels the long history of new and poorly understood innovations, which often operate early in the hype cycle with limited oversight and accountability. Code-based "watermarking" of output is one example of a proposed strategy intended to make detection easier and to provide evidence of digital content provenance (a minimal sketch of the idea follows this section). It may unfortunately take forced accountability in the form of regulation to address predatory and unethical practices around generative AI and AI detection. Faculty and students will need to approach generative AI like any emerging technology and not vilify the tool outright. Generative AI can be used to teach powerful lessons about information literacy, along the same lines of not believing everything seen on television or found on the Internet. Information literacy is a crucial skill for nurses in an era when it is impossible to know everything and when information-seeking skills are far more valuable, and perhaps safer, than rote memorization. Generative AI applications often carry direct warnings for users that outputs may be incorrect if not entirely fabricated. Treating generative AI output as fact based solely on its deceptively confident tone can be just as dangerous as relying on social media or low-authority sources for patient care information. Unfortunately, generative AI may be exacerbating the incivility wars in academia and in nursing education (Darbyshire & Thompson, 2021). The shaky evidence of how well digital detection tools discern human from AI writing may result in the kinds of false accusations and clouds of suspicion that are shown to exacerbate student/educator antagonism. Even if AI detectors improve, using their results as the sole evidence in cases of academic misconduct may not be advisable. Eventually, there may be detectors or other kinds of applications that can assist, but as with clinical technology, educators may want to lean toward augmenting and supplementing their own judgment and experience with students and student writing.
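For context on the watermarking proposal mentioned above, the sketch below illustrates one published family of ideas: statistical "green list" token watermarking, in which a generator is biased toward a pseudorandom subset of the vocabulary at each step, and a detector later checks whether the text lands on that subset more often than chance. Everything here, including the function names and the hash-based partition, is a simplified assumption for illustration; it does not describe any specific vendor's implementation.

```python
import hashlib

def green_list(prev_token: str, vocab: list[str], fraction: float = 0.5) -> set[str]:
    """Deterministically partition the vocabulary using a hash of the
    previous token. A watermarking generator would softly favor these
    'green' tokens when sampling the next word."""
    greens = set()
    for tok in vocab:
        digest = hashlib.sha256((prev_token + "|" + tok).encode()).digest()
        if digest[0] < int(256 * fraction):
            greens.add(tok)
    return greens

def green_fraction(tokens: list[str], vocab: list[str]) -> float:
    """A detector re-derives each position's green list and measures how
    often the text chose a green token. Unwatermarked text hovers near
    the chance rate (~0.5 here); watermarked text scores well above it."""
    if len(tokens) < 2:
        return 0.0
    hits = sum(tokens[i] in green_list(tokens[i - 1], vocab)
               for i in range(1, len(tokens)))
    return hits / (len(tokens) - 1)
```

The appeal of schemes like this is that detection becomes a statistical test with a quantifiable false positive rate rather than a proprietary black-box score, though paraphrasing services of the kind described earlier can still degrade the signal.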
No detector will ever be a replacement for clear student expectations, progressive and equally clear academic integrity policies, and reliance on writing as a process rather than solely as a product. Prudent and judicious approaches are especially important in the high-stakes and high-anxiety milieu of nursing education programs. Returning to what may be perceived as the basics of the learning social contract, discussion of the writing process itself may be valuable in this early era of generative AI. For example, having open dialogue about, and clear rationales for, the "why" of assignments and learning can clarify the learning social contract and the inherent value of the course. Clear expectations about the use of generative AI can help to establish boundaries of use and a context of civility and mutual respect. In terms of the basics of writing, breaking assignments into smaller chunks, "show your work" approaches, in-class writing, and required integration of concepts directly from class are some strategies that may reduce both the incentive to use generative AI inappropriately and its value as a shortcut in the writing process (Coley et al., 2023). It may be time for educators to let go of non-value-added learning work that could be handed off to generative AI tools and even incorporated into the writing process. The hype of generative AI may also be a useful means of drawing attention to the need for greater writing support services and for building more promotion of writing skills into curricula. Ultimately, students are accountable for demonstrating their learning with the highest degree of academic integrity, just as faculty are accountable for setting students up to be successful in that endeavor.

CONCLUSION

In August 2023, Vanderbilt University announced its decision to disable the AI detection option within Turnitin for many of the reasons noted above (Vanderbilt University, 2023). Vanderbilt's action, and others like it, are perhaps a sign that generative AI is headed toward the hype cycle phase of disappointment and disillusionment. There is compelling evidence that educators cannot rely solely on detection applications to determine whether generative AI was used to write papers. The safety net that detectors initially appeared to provide now looks like nothing more than a false sense of security. The specter of actual or perceived questioning of an assignment's origins risks upsetting the fragile trust inherent in the student/educator dyad. Ideally, industry self-regulation, progressive academic policies, and a return to the basics of information literacy and writing as a process can turn generative AI into less of an existential threat and more of a tool for teaching and learning.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,493 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,377 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,835 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,555 citations