2022-07-06



Meta, the social media conglomerate, has created a single AI model that can translate across 200 different languages, including many not supported by current commercial tools. The company is open-sourcing the project in the hope that others will build on its work.

The AI model is part of an ambitious Meta R&D project to create a so-called universal speech translator, which the company sees as important for growth across its many platforms, from Facebook and Instagram to developing domains such as VR and AR. Machine translation not only helps Meta better understand its users (and so improve the advertising systems that generate 97% of its revenue), but could also form the basis of a killer app for future projects such as its augmented reality glasses.

The model’s translations are definitely not perfect

Machine translation experts told The Verge that Meta’s latest research is ambitious and thorough, but said that the quality of some of the model’s translations could fall far below that of better-supported languages such as Italian and German.

“The main contribution here is data,” Professor Alexander Fraser, an expert in computational linguistics at LMU Munich in Germany, told The Verge. “What is important is the 100 new languages [that can be translated by Meta’s model].”

Meta’s results stem, somewhat paradoxically, from both the scope and the focus of its research. Most machine translation models handle only a handful of languages, but Meta’s model is all-encompassing: it is a single system that can translate in more than 40,000 different directions across 200 different languages. Meta is also interested in including low-resource languages in the model, meaning languages with fewer than one million publicly available translated sentence pairs. These include many African and Indian languages that are not usually supported by commercial machine translation tools.
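The "over 40,000 directions" figure comes from counting ordered language pairs. The following is a minimal sketch of that arithmetic, assuming every language can be translated into every other one. The function name is hypothetical, and the value 202 is used only to show how the count changes with a slightly larger set; it is not a claim about the model's exact language count.

```python
def translation_directions(num_languages: int) -> int:
    """Count ordered (source, target) pairs where source != target."""
    return num_languages * (num_languages - 1)


# Exactly 200 languages give 200 * 199 ordered pairs.
print(translation_directions(200))  # 39800

# A slightly larger (hypothetical) set crosses the 40,000 mark.
print(translation_directions(202))  # 40602
```

With exactly 200 languages the count is 39,800, so the "over 40,000" figure reads either as rounded or as implying slightly more than 200 languages.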

What does it take to create translation technology that works for everyone?

Angela Fan, a Meta AI research scientist who worked on the project, told The Verge that the team was inspired by the lack of attention paid to such low-resource languages in the field. “That’s why we started this project, because translation doesn’t work in the languages we speak,” Fan said. “We have this inclusion motivation: what is needed to create translation technology that works for everyone?”

According to Fan, the model, which is described in a research paper, has already been tested in support of a project that helps Wikipedia editors translate articles into other languages. The techniques developed in creating the model will also soon be integrated into Meta’s translation tools.

How do you judge a translation?

Translation is a difficult task at the best of times, and machine translation is notoriously unreliable. Applied at scale on Meta’s platforms, even a small number of errors can have disastrous consequences, as when Facebook mistranslated a post by a Palestinian man from “good morning” to “hurt them,” leading to his arrest by Israeli police.

To evaluate the quality of the new model’s output, Meta created a test dataset consisting of 3,001 sentence pairs for each language covered by the model. Each pair was translated from English into the target language by someone who is both a professional translator and a native speaker.

Researchers ran these sentences through the model and compared the machine translations with the human references using a benchmark common in machine translation known as BLEU (short for BiLingual Evaluation Understudy).

Meta’s model delivers improved benchmark scores, but these can’t give the whole picture

BLEU allows researchers to assign numerical scores that measure the overlap between pairs of sentences. According to Meta, the model produces a 44% improvement in BLEU scores across supported languages (compared with previous state-of-the-art work). However, as is often the case in AI research, judging progress based on benchmarks requires context.

BLEU scores allow researchers to compare the relative progress of different machine translation models, but they do not offer an absolute measure of the software’s ability to produce human-quality translations.
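To make the idea of an overlap score concrete, here is a simplified sentence-level BLEU sketch. It is an illustration under stated assumptions, not the evaluation code used in the research: it uses whitespace tokenization, a single reference, no smoothing, and a hypothetical function name `simple_bleu`. Published results are normally computed at corpus level with established tooling.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed, single-reference BLEU for one sentence pair."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped overlap: an n-gram is credited at most as often
        # as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, one empty order zeroes the score
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
print(simple_bleu("the cat sat on the mat", reference))
print(simple_bleu("a cat was sitting on the mat", reference, max_n=2))
```

The second candidate is a reasonable paraphrase, yet it scores well below the exact match because the metric only counts shared word sequences. This is why the score is useful for comparing systems on the same test set but not as an absolute measure of quality.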

Note that Meta’s dataset consists of 3,001 sentences, each translated by only a single individual. This provides a baseline for judging translation quality, but the total expressive power of an entire language cannot be captured by such a small sliver of actual language. The problem is not limited to Meta. It affects all machine translation work and is especially acute when assessing low-resource languages, but it shows the scope of the challenges facing the field.

Christian Federmann, a principal research manager who works on machine translation at Microsoft, said the project as a whole was commendable in its desire to extend the scope of machine translation software to less-covered languages, but noted that BLEU scores alone can provide only a limited measure of output quality.

“Translation is a creative and generative process, and many different translations can all be equally good (or bad),” Federmann told The Verge. “It is not possible to give a general level of ‘good’ BLEU score, because it depends on the test set used, its reference quality, and also on the inherent characteristics of the language pair under investigation.”
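The point about many equally valid translations can be illustrated with a small sketch. It computes clipped unigram precision, one ingredient of BLEU-style metrics, first against a single reference and then against two. The sentences and the function name are hypothetical examples, not data from the research.

```python
from collections import Counter
from typing import Sequence


def clipped_unigram_precision(candidate: str, references: Sequence[str]) -> float:
    """Share of candidate words that are matched by at least one reference."""
    cand_counts = Counter(candidate.lower().split())
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0

    # For each word, allow as many matches as the most generous reference has.
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(ref.lower().split()).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)

    matched = sum(min(count, max_ref_counts[word])
                  for word, count in cand_counts.items())
    return matched / total


candidate = "the film was great"
single = clipped_unigram_precision(candidate, ["the movie was excellent"])
multi = clipped_unigram_precision(
    candidate, ["the movie was excellent", "the film was great"]
)
print(single, multi)  # 0.5 1.0
```

The candidate is the same in both calls. Only the set of human references changed, which is why a score depends so heavily on the test set and its reference quality.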

Fan said the BLEU scores had also been complemented by human evaluation, and that this feedback was very positive and also produced some surprising reactions.

“One of the most interesting phenomena is that people who speak low-resource languages often have a lower bar for translation quality because they don’t have any other tool,” Fan said. “They are so generous, so we actually have to go back and say, ‘Hey, no, you need to be very precise, and if you see an error, call it out.’”

The power imbalances of corporate AI

Working on AI translation is often presented as an unambiguous good, but creating this software raises particular difficulties for speakers of low-resource languages. For some communities, Big Tech’s attention is simply unwelcome: they don’t want the tools needed to preserve their language in anyone’s hands but their own. For others, the issues are less existential and have more to do with questions of quality and influence.

Some communities don’t want Big Tech to control the language

Meta’s engineers explored some of these questions by interviewing 44 speakers of low-resource languages. These interviewees raised a number of positive and negative implications of opening up their languages to machine translation.

For example, one advantage is that such tools give speakers access to more media and information: they can be used to translate rich resources such as English-language Wikipedia and educational textbooks. At the same time, if speakers of low-resource languages consume more media produced by speakers of better-supported languages, the incentive to create such material in their own language could diminish.

Balancing these issues is difficult, and the problems encountered even within this recent project show why. Meta’s researchers note, for example, that of the 44 low-resource language speakers they interviewed to explore these questions, the majority were immigrants living in the United States and Europe, and about a third of them identified as tech workers. That means their perspectives are probably different from those of their home communities and are biased from the start.

Professor Fraser of LMU Munich said that, despite this, the research was certainly conducted in a way that involved native speakers, and that such efforts were commendable.

“Overall, I’m happy that Meta is doing this. More of this from companies like Google, Meta, and Microsoft, all of which do a lot of work on low-resource machine translation, is great for the world,” Fraser said. “And, of course, some of the thinking behind why and how to do this comes from academia, as does the training of most of the listed researchers.”

Fan said Meta had sought to anticipate many of these social challenges by broadening the expertise it consulted on the project. “When AI is being developed, it is often very engineering-driven, like, ‘Where are my computer science PhDs? Let’s get together and build it just because we can.’ But in reality, for this, we worked with linguists, sociologists, and ethicists,” she said. “And I think this interdisciplinary approach focuses on the human problem. For example, who wants this technology to be built? How do they want it to be built? How are they going to use it?”

Equally important is the decision to open-source as many elements of the project as possible, from the model to the evaluation dataset and training code. This helps redress the power imbalance inherent in a company working on such an initiative. Meta also offers grants to researchers who want to contribute to such translation projects but cannot fund their own work.

“This is really, really important, because one company cannot comprehensively solve the problem of machine translation,” Fan said. “It takes everyone, globally, so we are really interested in supporting these kinds of community efforts.”

