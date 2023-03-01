



The author is a science commentator

A large language model like ChatGPT is a purveyor of plausibility. Chatbots, many of them based on so-called generative AI, are trained to answer user questions by gathering relevant information from the internet and assembling coherent answers, churning out convincing student essays, authoritative-sounding legal documents and believable news articles in bulk.

However, because publicly available data contains misinformation and disinformation, some machine-generated text may be inaccurate or untrue. There is therefore an urgent need for tools that can tell whether a text was written by a human or a machine. The scientific community is also struggling to adapt to this new era, with lively debates over whether chatbots should be allowed to write scientific papers or generate new hypotheses.

The importance of distinguishing between artificial and human intelligence grows every day. This month, UBS analysts revealed that ChatGPT was the fastest-growing web app in history, reaching 100 million monthly active users in January. The International Baccalaureate said on Monday that it will allow students to use ChatGPT when writing essays.

To be fair, the technology's creators are upfront about its limitations. OpenAI chief executive Sam Altman warned in December that ChatGPT was good enough at some things to create a misleading impression of greatness. The company is developing a cryptographic watermark for its output: a secret, machine-readable sequence of punctuation, spelling and word order. It is also refining a classifier, trained on examples of both, to distinguish synthetic from human-generated text.
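To make the idea of a classifier trained on examples of both kinds of text a little more concrete, here is a minimal sketch in Python. It uses a generic, textbook technique (multinomial naive Bayes over words), which is an assumption chosen for illustration and not a description of OpenAI's classifier. The training sentences in `TOY_EXAMPLES` are invented, and the names `train` and `classify` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Invented toy examples: (text, label). A real classifier would need far
# more labelled data than this.
TOY_EXAMPLES: List[Tuple[str, str]] = [
    ("as an ai language model i cannot", "machine"),
    ("in conclusion it is important to note", "machine"),
    ("it is important to note that", "machine"),
    ("honestly no idea lol", "human"),
    ("my cat knocked the mug over again", "human"),
    ("cannot believe the train was late again", "human"),
]

LABELS = ("human", "machine")


def train(examples: List[Tuple[str, str]]) -> Tuple[Dict[str, Counter], Counter]:
    """Count words per label and training documents per label."""
    word_counts: Dict[str, Counter] = {label: Counter() for label in LABELS}
    doc_counts: Counter = Counter()
    for text, label in examples:
        word_counts[label].update(text.lower().split())
        doc_counts[label] += 1
    return word_counts, doc_counts


def classify(text: str, word_counts: Dict[str, Counter], doc_counts: Counter) -> str:
    """Return the label with the highest naive Bayes log-score."""
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    best_label, best_score = LABELS[0], -math.inf
    for label in LABELS:
        counts = word_counts[label]
        total_words = sum(counts.values())
        # Log prior: share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            # Add-one (Laplace) smoothing stops unseen words zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    counts, docs = train(TOY_EXAMPLES)
    print(classify("it is important to note", counts, docs))  # prints "machine"
    print(classify("lol my cat", counts, docs))  # prints "human"
```

The sketch only works as well as its labelled examples, which is why, as the next paragraph notes, such classifiers demand a large amount of training data.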

Stanford University graduate student Eric Mitchell realized that such a classifier would require a large amount of training data. Together with colleagues, he came up with DetectGPT, a "zero-shot" approach to spotting the difference, meaning it requires no prior training. Instead, the method turns the chatbot on itself, getting it to sniff out its own output.

It works like this: DetectGPT asks the chatbot how much it "likes" a sample text, where liking is shorthand for how similar the sample is to the chatbot's own creations. DetectGPT then goes a step further by perturbing the text, slightly altering its wording. The assumption is that a chatbot's liking varies more for altered human-written text than for altered machine-generated text. In early tests, the researchers claim, the method correctly distinguished between human and machine authorship 95% of the time.
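The perturb-and-compare idea can be sketched in Python under heavy simplifying assumptions. The "chatbot" here is a toy unigram model with made-up word probabilities (`TOY_PROBS`), and the perturbation step simply swaps one word for a random filler, standing in for the separate mask-filling model (such as T5) that the DetectGPT researchers used. The hypothetical `perturbation_discrepancy` function measures how much more the model likes the original text than its perturbed copies; a large drop is read as a sign of machine authorship. All names and numbers are illustrative.

```python
import math
import random
import statistics
from typing import Callable, List

# Toy stand-in for the chatbot: a unigram model with made-up probabilities.
# A real detector would query the language model's own log-probabilities.
TOY_PROBS = {"the": 0.2, "cat": 0.1, "sat": 0.1, "on": 0.1, "mat": 0.1}
UNKNOWN_PROB = 0.001  # probability given to any word outside TOY_PROBS

# Hypothetical replacement words, all outside the toy vocabulary.
FILLERS: List[str] = ["dog", "ran", "under", "rug", "blue"]


def toy_log_prob(text: str) -> float:
    """Sum of word log-probabilities under the toy model (its 'liking')."""
    return sum(math.log(TOY_PROBS.get(w, UNKNOWN_PROB)) for w in text.split())


def perturb(text: str, rng: random.Random) -> str:
    """Slightly alter the text by swapping one random word for a filler."""
    words = text.split()
    i = rng.randrange(len(words))
    words[i] = rng.choice(FILLERS)
    return " ".join(words)


def perturbation_discrepancy(
    text: str,
    log_prob: Callable[[str], float],
    n_perturbations: int = 20,
    seed: int = 0,
) -> float:
    """Original log-probability minus the mean over perturbed copies.

    Larger values mean the model liked the original much more than its
    variants, which is the pattern expected of machine-generated text.
    """
    rng = random.Random(seed)
    original = log_prob(text)
    perturbed = [log_prob(perturb(text, rng)) for _ in range(n_perturbations)]
    return original - statistics.fmean(perturbed)


if __name__ == "__main__":
    machine_like = "the cat sat on the mat"  # built from the model's favourite words
    human_like = "my tabby lounged on a rug"  # mostly words the model rarely uses
    print(perturbation_discrepancy(machine_like, toy_log_prob))
    print(perturbation_discrepancy(human_like, toy_log_prob))
```

In this toy run the sentence built from the model's favourite words loses far more probability when perturbed than the human-style sentence does. A real detector would compare that discrepancy against a threshold chosen from data.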

There are caveats. The results have not yet been peer-reviewed. The method, while better than random guessing, did not work equally well across all generative AI models, and DetectGPT can be fooled if a human tweaks the synthetic text.

What does this mean for science? Scientific publishing is the lifeblood of research, injecting ideas, hypotheses, arguments and evidence into the global scientific canon. Some researchers were quick to enlist ChatGPT as a research assistant, and several papers have controversially listed the AI as a co-author.

Meta also launched a science-focused text generator called Galactica, which was withdrawn after three days. Among the howlers it produced was a fictional history of spacefaring bears.

Professor Michael Black of the Max Planck Institute for Intelligent Systems in Tübingen tweeted at the time that he was baffled by Galactica’s answers to several queries about his own research area. In all cases, he wrote, Galactica sounded right and authoritative even though it was wrong and biased: “I think it’s dangerous.”

The danger lies in plausible text slipping into genuine scientific submissions, peppering the literature with fake citations and forever distorting the canon. The journal Science now bans generated text outright. Nature permits its use if declared, but does not allow the AI to be credited as a co-author.

Then again, most people don’t consult high-end journals to guide their scientific thinking. Chatbots can spew out, on demand, a stream of heavily referenced pseudoscience about why global warming is a hoax. Such misleading material, once posted online, could be swallowed up by future generative AI, creating new iterations of falsehoods that further pollute public discourse.

The merchants of doubt must be rubbing their hands.

