Date: 2023-07-24



But for now, the technology relies on another kind of human labor. In recent years, low-wage workers in East Africa have engaged in often traumatic efforts to prevent chatbot technology from spewing offensive or grotesque remarks.

ChatGPT is built on a so-called large language model, a powerful piece of software trained on swaths of text collected from across the internet to learn the patterns of human language. The huge amount of training data is what gives it its capabilities, allowing it to act like a powerful autocomplete engine. The training also creates a hazard. Given the right prompts, a large language model can generate large amounts of harmful content inspired by the darkest corners of the internet.

ChatGPT’s parent company, AI research firm OpenAI, has been working on these issues for years. Even before creating ChatGPT, the company employed workers in Kenya to review and categorize thousands of graphic text passages taken from the internet and generated by the AI itself. Many of the texts contained references to violence, harassment, self-harm, rape, child sexual abuse and bestiality, according to documents reviewed by The Wall Street Journal.

The company used the categorized passages to build an AI safety filter that it would eventually deploy to keep ChatGPT from exposing its tens of millions of users to similar content.

“My experience in the last four months has been the worst I have ever had working for a company,” Alex Kyle, one of the Kenyan workers, said in an interview.

Teach ChatGPT

For more than two years, OpenAI has organized a sprawling global pipeline of specialized workers to make its cutting-edge AI technologies possible, documents show. Much of this work was benign, such as teaching ChatGPT to be an engaging conversationalist and a witty lyricist. AI researchers and engineers say such human input will continue to be essential as OpenAI and other companies refine their technology.

In February, Alexander Wang, chief executive of Scale AI, an outsourcing company that helps OpenAI review and classify content, tweeted that companies could soon spend hundreds of millions of dollars a year providing human feedback to AI systems. Some estimate that companies are already investing millions to tens of millions of dollars a year. OpenAI said it has hired over 1,000 workers for this purpose.

Mark Sears, founder and CEO of CloudFactory, a company that supplies workers to clean and label datasets for AI, said reviewing harmful content goes hand in hand with the less objectionable work needed to make systems like ChatGPT usable.

Social media platforms, including Facebook and Instagram parent company Meta Platforms, have long paid contractors to remove user posts that violate their policies. AI experts say the work done for OpenAI is even more important for the product, as it prevents its software from pumping out unacceptable content.

Sears said CloudFactory concluded there was no way to do the work without harming its workers and decided not to accept such projects.

“That’s what you should do,” Sears said. “It’s really unbelievably ugly.”

OpenAI General Counsel Jason Kwon said in an interview that such work is truly valuable and important for making the company’s systems safe for everyone who uses them. It allows the systems to actually exist in the world and to benefit users, he said.

A spokeswoman for San Francisco-based outsourcing company Sama, which hired the Kenyan workers, said its work with OpenAI began in November 2021. She said the company terminated the contract in March 2022, after Sama’s management became aware of concerns about the nature of the project, and has since withdrawn from content moderation entirely.

“Sama has consistently and proactively advocated for and supported legislative efforts to protect workers and set clear guidelines for businesses to follow,” the spokesperson said. “We will support workers in every possible way.”

Many layers of human input are required to turn a large language model into a useful and safe chatbot. One layer teaches the model how to respond to user questions. Asked to describe the moon landing in a few sentences to a 6-year-old, a model with no human input would spit back a related sentence, such as “Please explain the theory of gravity to a 6-year-old,” rather than a relevant response, according to an OpenAI blog post. With human input, it learns to answer: “People went to the moon, took pictures of what they saw, and sent them back to Earth for all of us to see.”

Another layer of human input asks workers to rate several responses from the chatbot to the same question, identifying which is the least problematic or the most factually accurate. For example, when the chatbot is asked how to make a homemade bomb, OpenAI instructs workers to upvote the answer that declines to respond, according to OpenAI research. The chatbot learns to internalize this behavior through multiple rounds of feedback.

OpenAI also hires outside experts to try to provoke its models into generating harmful content, a practice called “red teaming” that helps the company find other gaps in the system.

Heavy work

The tasks that the Kenya-based workers performed to create the final safety check on ChatGPT’s output were yet a fourth layer of human input. The work was often mentally taxing. Some of the Kenyan workers said they have battled mental illness and that their relationships and families have suffered. Some have found it difficult to hold a job.

On July 11, some of the workers on the OpenAI project submitted a petition to the Kenyan parliament calling for new legislation to protect AI workers and content moderators. They also called for amending Kenya’s existing laws to recognize exposure to harmful content as an occupational hazard.

Murthy Mutemi, a lawyer and managing partner of N’djili and Sumbi Advocates, which represents the workers, said that despite the workers’ important contributions, OpenAI and Sama exploited their poverty as well as gaps in Kenya’s legal framework. A Sama spokeswoman said workers on the project were paid between $1.46 and $3.74 an hour on average.

An OpenAI spokesperson said the company spent six months vetting outsourcing partners and chose Sama because of its reputation for treating employees well and for providing mental health counseling. OpenAI was unaware that each worker reviewing the texts received only a portion of the $12.50 hourly service fee specified in the contract, the spokesperson said.

A spokeswoman for Sama said workers on the OpenAI project took on the work voluntarily and were paid according to an internationally recognized methodology for determining a living wage. The contract stated that the fee was also intended to cover people not directly involved in the work, such as project managers and psychological counselors.

Time magazine previously reported on aspects of OpenAI and Sama’s efforts in Kenya.

Kenya has become a hub for many tech companies seeking content moderation and AI workers because of its high levels of education and English literacy and the low wages that come with severe poverty.

Some Kenya-based workers are suing Meta’s Facebook after nearly 200 workers said they were traumatized by jobs that required them to review videos and images of rapes, beheadings and suicides. These workers, like those on the OpenAI project, are backed by the U.K.-based nonprofit Foxglove, which uses legal means to combat data privacy and labor abuses by big tech companies.

A Kenyan court ruled in June that Meta was legally responsible for the treatment of its contract workers, paving the way for changes to the ground rules that technology companies, including AI firms, must follow when outsourcing projects to workers in the future. Workers also voted to form a union for content moderators and data annotators in Kenya.

Meta declined to comment.

Harmful content

Kyle and three other workers on the OpenAI project, all of whom filed the parliamentary petition, spoke to The Wall Street Journal about their experiences and said they hoped the attention would improve working conditions for future AI workers.

OpenAI signed a one-year contract with Sama, with work starting in November 2021. At the time, in the midst of the pandemic, many workers saw the chance to work as a miracle, said Richard Masenzi, a team leader on Sama’s OpenAI project and a co-signer of the petition.

OpenAI researchers reviewed passages of text and sent them in batches to Sama for workers to label one by one. The text came from a variety of sources, according to an OpenAI research paper: public datasets of harmful content compiled and shared by academics, posts gleaned from social media and internet forums such as Reddit, and content produced by prompting AI models to generate harmful output.

The generated output was needed to supply enough examples of the kind of graphic violence that the AI system should avoid, according to the paper. In one case, OpenAI researchers asked a model to write an online forum post about a teenage girl whose friend had self-harmed, the paper said.

According to the documents, OpenAI asked workers to sort text-based sexual content into four categories of severity. The most severe, C4, covered descriptions of child sexual abuse. The C3 category included sexual content that would be illegal if it happened in real life, such as incest, bestiality, rape, sex trafficking and sexual slavery.

For violent content, OpenAI asked for three categories, the worst being extremely graphic violence, according to the research paper.

Initially, the texts were just two sentences long. Over time, they grew to five or six paragraphs. A few weeks in, Masenzi and another team leader, Bill Murinya, began to notice the strain on their teams. Workers took sick leave and family leave more and more often, they said.

Kyle, who worked on the violent-content team, said he read hundreds of posts a day, some describing heinous acts such as people stabbing themselves with forks or killing themselves in unspeakable ways.

He started having nightmares. Once affable and gregarious, he gradually became socially isolated. Even now he does not trust strangers. When he looks at a fork, he sees a weapon.

Quality analyst Moffat O’Keney said his job included reading detailed passages about parents raping their children and children having sex with animals. He was on a team that reviewed sexual content and was contracted to process 15,000 posts a month, according to the documents. He said the six months he spent on the project tore his family apart and left him with trauma, anxiety and depression.

In March 2022, management told staff that the project would end ahead of schedule. A spokesperson for Sama said the change stemmed from a dispute with OpenAI over a part of the work that involved processing images. Sama canceled all of its contracts with OpenAI and did not receive the full $230,000 it had expected for the four projects, she said.

The individuals who handled the OpenAI contracts were fired for not vetting the work through proper channels, and new vetting policies and guardrails were put in place, a Sama spokesperson said.

A few months after the project ended, O’Keney came home one night with fish for dinner for his pregnant wife and stepdaughter. He said he discovered they were gone and found a message from his wife. “She said, ‘You’ve changed. You are not the man I married. I don’t understand you anymore,’” he said.

His ex-wife declined a request for comment.

“I am very proud to have been part of the project to make ChatGPT safe,” said O’Keney. “But now I’m constantly asking myself: Was my input worth what I got in return?”

