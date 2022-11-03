



Google has announced an ambitious new project to develop a single AI language model that supports the 1,000 most spoken languages in the world. As a first step towards this goal, the company is unveiling an AI model trained on more than 400 languages, which it describes as the largest language coverage seen in a speech model today.

Language and AI have arguably always been at the heart of Google’s offerings, but recent advances in machine learning, especially the development of powerful and versatile large language models (LLMs), have put new emphasis on these areas.

Google is now beginning to integrate these language models into products such as Google Search, while fending off criticism about the systems’ capabilities. Language models have many flaws, including a tendency to regurgitate harmful social biases such as racism and xenophobia, and an inability to parse language with human sensitivity. Google itself notoriously fired its own researchers after they published a paper outlining these issues.

However, these models can perform many tasks, from language generation (such as OpenAI’s GPT-3) to translation (see Meta’s No Language Left Behind work). Google’s 1,000 Languages Initiative doesn’t focus on any one capability, but on creating a single system with broad knowledge across the world’s languages.

Zoubin Ghahramani, vice president of research at Google AI, told The Verge that the company believes creating a model of this size will make it easier to bring a wide variety of AI features to languages that are poorly represented in online spaces and AI training datasets (also known as low-resource languages).

“By having a single model that is exposed to and trained on many different languages, we get much better performance on our low-resource languages,” Ghahramani said. “The way we get to 1,000 languages is not by building 1,000 different models. Languages are like organisms: they’ve evolved from one another and they have certain similarities. And we can find some pretty spectacular advances in what we call zero-shot learning when we incorporate data from a new language into our 1,000-language model and get the ability to translate [what it’s learned] from a high-resource language to a low-resource language.”

Past research has shown the effectiveness of this approach, and the scale of Google’s planned model could offer substantial gains over past work. Such large-scale projects have become typical of tech companies looking to dominate AI research, and they draw on these firms’ unique advantages in access to vast amounts of computing power and training data. A comparable project is Facebook parent company Meta’s ongoing attempt to build a universal speech translator.

However, access to data becomes an issue when training across so many languages. Google says that, to support work on the 1,000-language model, it plans to fund the collection of data for low-resource languages, including audio recordings and written texts.

The company says it has no direct plans for where this model’s functionality will be applied, only that it expects it to have a variety of uses across Google’s products, from Google Translate to YouTube captions and more.

“One of the really interesting things about large language models and language research in general is that they can do lots and lots of different tasks,” says Ghahramani. “The same language model can turn commands for a robot into code; it can solve math problems; it can do translation. The really interesting thing about language models is that they’re becoming repositories of a lot of knowledge, and by probing them in different ways you can get to different bits of useful functionality.”

Google announced the 1,000-language model at a showcase for new AI products. The company also shared new research on text-to-video models, a prototype AI writing assistant named Wordcraft, and an update to its AI Test Kitchen app, which gives users limited access to in-development AI models such as its text-to-image model Imagen.

