Google sued for ‘secretly stealing’ data to train Bard

A California law firm has filed a class action lawsuit against Google, alleging it “covertly stole” vast amounts of data from the web to train its AI technology.

Clarkson Law Firm is suing the tech giant for negligence, invasion of privacy, theft, piracy and profiting from illegally obtained personal data. “Google takes all of our personal and professional information, our creative and copyrighted work, our photos, and even our emails, effectively taking our entire digital footprint and using it to build commercial artificial intelligence (“AI”) products like Bard,” said the complaint, filed July 11 in the Northern District of California.

The lawsuit comes after Google quietly updated its privacy policy last week to state that any public information can be used to train its AI products like Bard. Google is basically saying that anything published on the web is fair game, but the law firm believes that scraping data without compensation or consent for the express purpose of training AI models is a massive invasion of privacy. The complaint alleges that Google, a multi-billion dollar company with more than a billion users worldwide, puts its users in an “unbearable” position: “Either you use the internet and surrender all your personal and copyrighted information to Google’s insatiable AI model, or you avoid it.”

In a statement to Reuters, Google general counsel Halimah DeLaine Prado said the allegations were “unfounded,” adding that Google uses data from public sources, such as public datasets, to train the AI models behind services like Google Translate, and that it does so responsibly and in line with its AI principles.

Clarkson recently filed a similar class action lawsuit against OpenAI, the company that developed ChatGPT, for “theft and diversion of personal data” through similar data scraping operations. Large language models need vast amounts of training data to make AI chatbots conversational and intelligent. Both Bard and ChatGPT rely on large language models to work, raising concerns about personal data use and piracy.

The latest lawsuit alleges that Google used data sets from nonprofits such as Common Crawl, which makes data free for research and educational purposes, and from sites like Medium and Kickstarter. According to the complaint, Google also uses its own data from Gmail and Google Search to feed the model. Other data collected includes copyrighted material such as e-books from digital libraries and even copyrighted material from pirated websites, which the company uses without compensation to artists and authors.

Central to Clarkson’s lawsuit is the question of what counts as public domain. The complaint states that “‘public availability’ in no way means free to use for any purpose.” True, some data is available for purchase, but that depends on how it is used and on the user’s consent. And although users agree to a privacy policy when they publish content on the web, they have the right to know if that content is used elsewhere. In other words, Clarkson said, “Google needs to understand once and for all that it does not own the Internet.”




