



Google has confirmed that it uses scraped web data to train its artificial intelligence (AI) systems, according to an updated privacy policy. The search giant said its AI services, including Bard and Cloud AI, could be trained using public data pulled from the web. Google spokeswoman Christa Muldoon emphasized that the company builds privacy principles and safeguards into the development of its AI technology, in line with its AI Principles.

Effective July 1, 2023, the revised privacy policy clarifies that Google will use information to improve its services and to develop new products and technologies that benefit its users and the public. It states that publicly available information can be used to train Google’s AI models and to build products and features such as Google Translate, Bard, and Cloud AI capabilities.

However, the policy does not explain how Google will keep copyrighted material out of the data pools used for training. Many publicly accessible websites have adopted policies that prohibit data collection and web scraping for training large language models and other AI tools. Google’s approach also carries weight in light of regulations such as the General Data Protection Regulation (GDPR), which protect individuals from having their data used without their explicit consent. The origin of the training data used in generative AI systems such as OpenAI’s GPT is a controversial issue. Developers are becoming increasingly secretive about the sources of their training data, raising concerns that it includes copyrighted works and social media posts. The legal status of the fair use doctrine in this context remains unclear, leading to litigation and calls for tighter regulation of how AI companies collect and use training data.

Moreover, processing such huge amounts of training data presents its own challenges. The workers responsible for sorting this data often endure long hours and harsh working conditions. Poor data processing within AI systems has led to potentially dangerous failures and raised concerns about the impact on the people affected.

Google's dominance in the digital advertising market prompted Gannett, the largest U.S. newspaper publisher, to sue Google and its parent company Alphabet, alleging that advances in AI technology have helped them maintain a monopoly. Google's AI search beta has also been criticized as a "plagiarism engine" that hurts website traffic. Social platforms such as Twitter and Reddit, which hold extensive public information, have taken steps to limit data scraping by third parties. However, these changes caused a backlash within both communities because they degraded the core user experience.

Q2: Why does Google use scraped web data to train Bard? A2: Google uses scraped web data to train Bard because it provides a large and diverse dataset. This allows Bard to learn about a wider range of topics and generate more creative and informative text.

