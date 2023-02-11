



AI-generated text from tools like ChatGPT is beginning to affect our daily lives. Teachers are testing it in classroom lessons. Marketers are champing at the bit to replace their interns. Memers are going wild. Me? I’d be lying if I said I wasn’t a bit worried about robots coming for my writing gig. (Fortunately, ChatGPT can’t yet hop on Zoom calls to conduct interviews.)

With generative AI tools now publicly accessible, you are likely to encounter more synthetic content while surfing the web. Some of it may be harmless, like an auto-generated BuzzFeed quiz about which fried dessert matches your political beliefs (Are you a Democrat or a Republican?). Other instances, like sophisticated propaganda campaigns by foreign governments, may be more malicious.

Academic researchers are looking at ways to detect if a sequence of words was generated by a program like ChatGPT. What’s the definitive indicator that what you’re reading right now was spun up with AI assistance?

Lack of surprises.

Entropy, Evaluated

Algorithms that can mimic natural writing patterns have been around for a few more years than you might think. In 2019, Harvard University and the MIT-IBM Watson AI Lab released a tool that scans text and highlights words based on their level of randomness.

Why is this useful? An AI text generator is basically a mystical pattern machine: it excels at mimicry and is not good at throwing curveballs. Sure, your tone and cadence might feel predictable when you type an email to your boss or send a group text to friends, but there’s an underlying capricious quality to our human style of communication.
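To make the idea of flagging words by how predictable they are more concrete, here is a minimal sketch. It is not the Harvard and MIT-IBM tool: it assumes a toy word-pair counter trained on a tiny reference text, and the function names `train_bigrams` and `label_words` are hypothetical. A real detector would ask a large language model how likely each word is.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count which word follows which in a whitespace-tokenized reference text."""
    follows = defaultdict(Counter)
    words = corpus.lower().split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def label_words(text, follows, top_k=1):
    """Label every word after the first as 'expected' or 'surprising'.

    A word counts as 'expected' when it is among the top_k most frequent
    continuations of the previous word in the reference counts.
    """
    words = text.lower().split()
    labels = []
    for prev, word in zip(words, words[1:]):
        counts = follows.get(prev, Counter())
        likely = [w for w, _ in counts.most_common(top_k)]
        labels.append((word, "expected" if word in likely else "surprising"))
    return labels

# Toy reference text (an assumption for illustration only).
reference = "the cat sat on the mat and the cat sat on the sofa"
model = train_bigrams(reference)

for word, label in label_words("the cat sat on the roof", model):
    print(f"{word:>6}  {label}")
```

Text in which nearly every word is labeled as expected is the kind of surprise-free writing such a tool would highlight.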

Princeton University student Edward Tian went viral earlier this year with a similar tool for educators called GPTZero. It measures the likelihood that a piece of content was generated by ChatGPT based on its perplexity (randomness) and burstiness (variance). OpenAI, the company behind ChatGPT, has dropped another tool that scans text over 1,000 characters in length and makes a judgment call. The company is up-front about the tool’s limitations, including false positives and limited effectiveness in languages other than English. Just as English-language data is often the top priority for the people behind AI text generators, most tools for AI text detection currently work best for English speakers.
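As a rough illustration of the two measures, the sketch below computes perplexity from per-word probabilities and treats burstiness as the spread of perplexity across sentences. Both the probabilities and that definition of burstiness are assumptions made for this example. GPTZero’s actual formulas are not described here, and a real detector would take its probabilities from a language model.

```python
import math
import statistics

def perplexity(token_probs):
    """Exponential of the average negative log-probability of the tokens."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

def burstiness(sentences):
    """Population standard deviation of per-sentence perplexity.

    This definition is an assumption made for illustration.
    """
    return statistics.pstdev([perplexity(s) for s in sentences])

# Made-up probabilities a model might assign to each word, one list per sentence.
steady = [[0.5, 0.5], [0.5, 0.5]]        # every sentence equally predictable
uneven = [[0.5, 0.5], [0.125, 0.125]]    # one predictable sentence, one surprising

print("steady burstiness:", burstiness(steady))
print("uneven burstiness:", burstiness(uneven))
```

Under these assumptions, low perplexity combined with low burstiness is the pattern that would point toward machine-written text, while human writing tends to mix predictable and surprising sentences.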

Can you sense if at least part of a news article was created by AI? Tian says that AI-generated texts like these can never do the job of a journalist like me. It’s a kind-hearted sentiment. CNET, a technology-focused website, has published multiple articles written by algorithms and dragged across the finish line by humans. At the moment, ChatGPT lacks a certain chutzpah and occasionally hallucinates, which can be a problem for reliable reporting. Everyone knows that qualified journalists save the psychedelics for after hours.

Entropy, Imitated

While these detection tools are useful for now, Tom Goldstein, a professor of computer science at the University of Maryland, sees a future where they become less effective as natural language processing becomes more sophisticated. These kinds of detectors rely on the fact that there are systematic differences between human and machine text, says Goldstein. But the goal of these companies is to make machine text as close as possible to human text. Does this mean that all hope of synthetic media detection is lost? Absolutely not.

Goldstein worked on a recent paper investigating possible watermarking techniques that could be built into the large language models that power AI text generators. It’s not foolproof, but it’s a fascinating idea. Remember, ChatGPT tries to predict the next most likely word in a sentence, comparing multiple alternatives in the process. A watermark may be able to designate certain word patterns as off-limits to the AI text generator. So if the text is scanned and the watermarking rules are broken too many times, it’s a good indication that a human being wrote that masterpiece.
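The watermark idea can be sketched in a few lines. This is a toy under stated assumptions, not the method from Goldstein’s paper: a hash of each word pair decides which words are off-limits after a given word, a stand-in generator avoids them, and a scanner counts how often a text breaks the rules. The names `is_blocked`, `generate_watermarked`, and `violation_rate` are hypothetical, and all words are assumed to be lowercase.

```python
import hashlib

def is_blocked(prev_word, word, blocked_fraction=0.5):
    """Pseudo-randomly decide whether `word` is off-limits after `prev_word`.

    A stable hash is used so the generator and the scanner agree on the rules.
    """
    digest = hashlib.sha256(f"{prev_word} {word}".encode("utf-8")).digest()
    return digest[0] / 256 < blocked_fraction

def generate_watermarked(start, vocabulary, length):
    """Toy generator: only ever appends a word that is allowed after the last one."""
    words = [start]
    for _ in range(length):
        allowed = [w for w in vocabulary if not is_blocked(words[-1], w)]
        if not allowed:
            break  # no permitted continuation, so stop early
        # Deterministic pick; a real model would choose by probability.
        words.append(allowed[len(words) % len(allowed)])
    return " ".join(words)

def violation_rate(text):
    """Fraction of adjacent word pairs that break the watermark rules."""
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    return sum(is_blocked(p, w) for p, w in pairs) / len(pairs)

vocabulary = ["the", "cat", "dog", "sat", "ran", "on", "mat", "quickly"]
machine_text = generate_watermarked("the", vocabulary, 20)
human_text = "the dog ran quickly on the mat and the cat sat"

print("machine:", violation_rate(machine_text))
print("human:  ", violation_rate(human_text))
```

The toy generator never breaks a rule, so its violation rate is zero by construction. A writer who does not know the rules would be expected to break them in roughly `blocked_fraction` of word pairs over a long enough text, which is the signal a scanner would look for.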

