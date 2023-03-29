



There is no doubt that GPT-4, the latest version of the artificial intelligence engine created by OpenAI, is innovative and cool. It can compose verses in Basho’s style, spell out the chord progression and time signature for a simple tune, and offer a seven-step recipe for a peanut butter and jelly sandwich. When asked to write a musical about a narcissistic politician who holds the fate of the world in his hands, it delivers a two-act narrative about a protagonist named Alex Starling, who navigates a maze of power, manipulation, and the consequences of his decisions as he sings “Narcissus in the Mirror,” “The Price of Power,” and about a dozen other invented songs.

These songs seem to have been created from scratch; certainly, no human came up with them. Still, Alex’s story, which explores themes of self-discovery, redemption, and the responsibility of leadership, is a familiar one. This is because everything GPT offers is a reflection of us, mediated by algorithms that have been fed vast amounts of material. And both the algorithms and the material were created by real, sentient humans.

The acronym GPT stands for “generative pre-trained transformer.” The key word in that phrase is “pre-trained.” Using all kinds of digitized content scraped from the internet, GPT applies deep learning techniques to find patterns, including words that are likely to appear together, while also acquiring facts, absorbing grammar, and learning rudimentary logic. According to GPT-4 itself, it has been trained on large datasets of text and is able to generate human-like responses based on the input it receives. But it does not understand what those responses mean, it does not learn from experience, and its knowledge base stops at September 2021. (According to GPT-4, abortion remains a constitutional right.)

One of the most salient features of GPT-4 is the confidence with which it answers queries. This is both a feature and a bug. As the GPT-4 developers point out in the technical report accompanying the release, “It can sometimes make simple reasoning errors which do not seem to comport with competence across so many domains, or be overly gullible in accepting obviously false statements from a user … [and] can be confidently wrong in its predictions.” When I asked GPT-4 to summarize my novel “Summer Hours at the Robbers Library,” it told me the book was about a man named Kit who had just been released from prison. In fact, it is about a woman named Kit who is a librarian and has never been imprisoned. When the Montreal newspaper La Presse asked the GPT bot for tourist recommendations, to see if it could replace guidebooks and travel blogs, the AI invented places, gave wrong directions, and constantly apologized for providing bad information. When the UCLA neuroscientist Dean Buonomano asked GPT-4 “What is the third word of this sentence?”, the answer was “third.” These examples may seem trivial, but the cognitive scientist Gary Marcus remarked on Twitter that such failures persist even “[with] billions of training examples.”

GPT-3, the predecessor of GPT-4, was trained on 45 terabytes of text data. According to its successor, that is the equivalent of about 90 million novels. The data included Wikipedia entries, magazine articles, newspaper punditry, instruction manuals, Reddit discussions, social media posts, books, and any other text its developers could appropriate, typically without notifying or compensating the creators. It is unknown how many additional terabytes of data were used to train GPT-4, or where they came from, because OpenAI, despite its name, states only in its technical report that GPT-4 was pre-trained “using both publicly available data (such as internet data) and data licensed from third-party providers.” The report adds that “given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.”

This secrecy matters because, as impressive as GPT-4 and other AI models that process natural language are, they can also be dangerous. As OpenAI CEO Sam Altman recently told ABC News, he is particularly concerned that these models will be used for large-scale disinformation. He also said that they could be used for offensive cyberattacks. He added that there will be others who do not set some of the safety limits that OpenAI sets, and that society has limited time to figure out how to react to that, how to regulate it, and how to handle it. (I was able to get GPT-4 to explain how to use fertilizer to create an explosive device by asking it how Timothy McVeigh blew up the Alfred P. Murrah Federal Building in Oklahoma City in 1995, though the bot added that it was offering the information to provide historical background rather than practical advice.)

The opacity of GPT-4, and by extension of other AI systems that are trained on huge datasets and known as large language models, exacerbates these dangers. It is not hard to imagine an AI model that has absorbed vast amounts of ideological falsehoods injecting them into the zeitgeist with impunity. And even large language models like GPT, trained on billions of words, are not immune from reinforcing social inequalities. As researchers pointed out when GPT-3 was released, much of its training data was drawn from internet forums, where the voices of women, people of color, and the elderly are underrepresented, which leads to implicit bias in its output.

Nor can the size of an AI’s training dataset prevent it from spewing hateful content. Galactica, Meta’s AI chatbot, was supposed to be able to summarize academic papers, solve math problems, generate wiki articles, write scientific code, annotate molecules and proteins, and more. However, two days after the demo launched, the company was forced to take it down, because researchers had been able to use Galactica to create wiki entries promoting anti-Semitism and glorifying suicide, as well as fake scientific articles, including one championing the benefits of eating crushed glass. Similarly, GPT-3 was prone to offering racist and sexist comments when prompted.

To avoid this problem, according to Time, OpenAI contracted with an outsourcing company that hired contractors in Kenya to label vulgar, offensive, and potentially illegal material, which would then be included in the training data so that the company could build a tool to detect toxic information before it reached the user. Some of the material, Time reported, graphically depicted situations of child sexual abuse, bestiality, murder, suicide, torture, self-harm, and incest. The contractors said they were expected to read and label 150 to 250 passages of text in a nine-hour shift. They were paid less than $2 an hour and were offered group therapy to deal with the emotional distress of their work. The outsourcing company disputed these figures, but the work was so disturbing that it terminated the contract eight months early. In a statement to Time, an OpenAI spokesperson said that the company had not issued productivity targets and that the outsourcing company was responsible for managing employees’ pay and mental health provisions. The spokesperson added that OpenAI takes the mental health of its employees and its contractors very seriously.

