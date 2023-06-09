



Google Translate makes it easy to translate any sentence into over 100 languages, but regular users of Google Translate know there is room for improvement.

In theory, large language models (LLMs) like ChatGPT should usher in the next era of language translation. They consume vast amounts of text-based training data, plus real-time feedback from millions of users around the world, and quickly learn to “speak” a wide range of languages in coherent, human-like sentences.

However, we’ve heard the “ChatGPT replaces everything” refrain before, and in practice the chatbot is often inaccurate, which is a worst-case scenario for translation. “There are currently no empirical results to support the claim that LLMs are effective for translation,” said Nazneen Rajani, head of research at Hugging Face, which developed the AI-based Hugging Chat.

So I decided to test ChatGPT. Could it replace Google Translate as your go-to translation service for travel, work, cross-border romance, and other language needs? And how does it compare with its fellow chatbots, Microsoft Bing and Google Bard?

Methodology and Languages Tested


Blind tests were conducted by bilingual speakers of seven languages. They all grew up speaking a language other than English and now live in the US or work for US companies.

Given a paragraph in English, they ranked versions translated into their language by Google Translate, ChatGPT, and Microsoft Bing. Once they completed the exercise, we revealed which service had generated each translation.

Languages tested: Polish, French, Korean, Spanish, Arabic, Tagalog, Amharic

Translation services: Google Translate, Google Bard, ChatGPT, Microsoft Bing

This is by no means a comprehensive study. “Keep in mind that a small blind test is not enough. We need rigorous testing,” said Federico Pascual, an AI industry veteran. Still, the results are surprisingly consistent and provide an interesting glimpse into how AI models work.
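For readers who want to run a similar blind test, the ranking exercise described above can be tallied with a simple mean rank per service. The sketch below is only an illustration: the service names and rankings are hypothetical placeholders, not the results of this test, and the functions `mean_ranks` and `leaderboard` are names invented for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rankings, invented for illustration only (not this article's data).
# Each entry is one participant's ordering for one paragraph, best first.
rankings = [
    ["Service A", "Service B", "Service C"],
    ["Service A", "Service C", "Service B"],
    ["Service B", "Service A", "Service C"],
]


def mean_ranks(rankings):
    """Return {service: mean position}, where position 1 is the best."""
    positions = defaultdict(list)
    for ordering in rankings:
        for position, service in enumerate(ordering, start=1):
            positions[service].append(position)
    return {service: mean(ranks) for service, ranks in positions.items()}


def leaderboard(rankings):
    """Sort services from best (lowest mean rank) to worst."""
    scores = mean_ranks(rankings)
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for service, score in leaderboard(rankings):
        print(f"{service}: mean rank {score:.2f}")
```

A service that only appears in some orderings is averaged over the orderings where it appears. With a handful of participants, a mean rank is a rough summary at best and says nothing about statistical significance.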

Create paragraphs for translation


I chose the languages and AI models, then wrote two paragraphs in English designed to expose the limits of each service’s translation capabilities. The first contained two tricky colloquialisms: “blow the stress away,” meaning to relax after a stressful day, and “Cheers!”, meaning “thank you!” It also included two units that would need to be converted in a real-world scenario: US dollars ($) and miles (instead of kilometers).

Paragraph 1 – “Hi! Do you speak English? I would like some help with directions. My sister doesn’t eat meat and I am looking for a vegetarian restaurant. Where would you recommend? We would also like to stay within a few miles of here, and I don’t want to spend more than $50. If it has cocktails, that’s a bonus. After a long day of travel, we need to blow the stress away. You are welcome to join us. Cheers!”

The second paragraph was simpler, with no idioms or units of measure, but more slang (“hooligans” and “pop champagne”). We sent this one only to the later participants, to broaden our data collection as we refined our approach.

Paragraph 2 – “How do I buy boat party tickets? Do I have to pay in advance?”

Results: AI Chatbots Beat Google Translate

Of the 12 examples sent out, participants mostly preferred the AI chatbots (ChatGPT, Google Bard, or Microsoft Bing) to Google Translate. ChatGPT topped them all.

The table below shows the ranking by participants for each service. Those who received both example paragraphs are marked (1) and (2). Others received only the first.

“In my opinion, [ChatGPT] is the closest thing to normal conversation,” said Ana Romero, who ranked the Spanish translations. “The level of formality between the two key questions is consistent (informal), and it uses the correct translation for ‘blow the stress away.’”

Romero also appreciated that ChatGPT’s translation gives you the option to end certain words with a masculine or feminine ending instead of choosing for you. For example, it says “eres bienvenido/a a unirte a nosotros” (“You are welcome to join us”), where the ending depends on the gender of the person being invited.

Google Bard rarely worked and even said, “I can’t translate languages.” Instead, it recommended using Google Translate, presumably in an effort to keep Google from cannibalizing its own products. We tested it anyway, and the three times it did work (Korean, French, and Spanish), participants rated it higher than Google Translate.

All the chatbots fell short of our high expectations for the currency and distance measurements in the first paragraph. Given their conversational nature and their ability to ask follow-up questions, we hoped they would ask which currency to convert to and whether miles or kilometers were preferred.

Instead, they handled the units much as Google Translate did, making only small adjustments, such as sometimes adding “USD” after $50 or converting miles to kilometers. This was inconsistent across languages and services, and generally incomplete.

It’s all about mastering the nuances

In America they are called “cookies”, but in England they are called “biscuits”. (Credit: olligha / Getty Images)

A consistent pitfall of Google Translate was literal interpretation. “This was the most ‘word-for-word’ translation of all three,” says Emile Saad, who ranked the Arabic translations. “This caused some of the context to be lost. For example, ‘pop’ [as in champagne] was translated as ‘to make fireworks.’”

In French, Google Translate used the English word “hooligan” verbatim, but the chatbot knew to use the culturally appropriate slang term, “voyous.”

After all, chatbots are designed to be nuanced and contextual. In languages where the model has a large amount of source data and many users interacting in that language, it can better identify cultural phrases and select the best matches in the target language.

“The secret sauce for chatbots like ChatGPT is RLHF, Reinforcement Learning from Human Feedback,” says Rajani of Hugging Face. “[They] collect human preferences about model responses for aspects such as truthfulness, harmlessness, and usefulness. Human preferences help choose what is more culturally appropriate, especially for non-native speakers.”

A Google spokesperson told PCMag that Bard and Google Translate rely on “different underlying technologies, so it’s not surprising that they produce different output.” Bard is a large language model designed to perform a wide variety of tasks, while Google Translate is optimized specifically for the translation task.

“Size matters. These models are the biggest and best models out there,” says Pascual. “They are at the forefront of the AI arms race. So it’s not surprising that they are better than Google Translate at translating text, because Google Translate probably uses older technology [and] smaller models that are probably optimized to run as quickly and cheaply as possible.”

However, none of the four options could replace a fluent speaker one-to-one. All the chatbots still made awkward and imprecise word choices at times, though in fewer instances. For example, Microsoft Bing translated “welcome to join us” [at the restaurant] into Polish as “Zapraszamy Cię do nas,” which is actually an invitation to “come to my house,” says Barbara Pavone, senior manager of content delivery at PCMag.

Use Google Translate if you speak these two languages

A traditional Ethiopian bowl (Credit: Evgenii Zotov/Getty Images)

In our tests, Google Translate came out on top for two languages: Tagalog (Philippines) and Amharic (Ethiopia). According to WorldData.info, the estimated global speaking population is 33 million for Tagalog and 25 million for Amharic. (Spanish has 450 million speakers and Korean has 80 million.)

“[AI models] don’t generalize well to languages that have fewer resources or don’t have a good collection of human preferences,” says Rajani. In the case of Amharic and Tagalog, we suspect the chatbots lacked enough data to give a more nuanced response to the context of the paragraph. Contrary to what we saw in other languages, their translations seemed more literal than Google Translate’s.

Colin Salao, who ranked the Tagalog translations, noted that ChatGPT was “hyperformal” and used words reserved for public announcements. He considered Bing to be the “most literal translation” and ranked it lower than ChatGPT and Google Translate.

Microsoft Bing struggled even more with Amharic. It left part of each paragraph in English. This was the only time the service failed to translate, even counting other character-based languages such as Korean and Arabic.

Paragraph 1 – Hello! Did they ask you how to speak? Do you want to know if this carport is the right size? Spend only $50 and have a cocktail? 2-3 meters. After a long day of travel, we need to blow away our fatigue. Please join us. cheers!

Paragraph 2 – How to use the utility? How can I use it by following the route of the dock? With the expanse of the upper deck, young hooligans drink more champagne in a day? It’s dangerous and not my fun.

AI takes web translation to the next level

For summer travel or other language needs, ChatGPT may be a better choice than Google Translate. Plus, a new iOS app makes it even more accessible. However, as we have seen with Amharic and Tagalog, chatbots are not yet a full replacement for the old standby.

With more training data for each language, though, AI models could fully surpass the capabilities of Google Translate. “We are excited about the potential of LLMs and how we can incorporate them into our products,” Google told PCMag.

Google is also testing a new search results page called Search Generative Experience (SGE). It will go live on Google.com on an undisclosed date and will provide paragraph-based, ChatGPT-style answers to your queries. However, Google would not comment on whether Bard or SGE could replace Google Translate in the future.

Before that happens, Google needs to have a more definitive way to measure a chatbot’s translation ability and prove that a chatbot is better than Google Translate. More broadly, to keep the web of the future accessible and as “global” as possible, all chatbots should be able to converse in a wide range of languages, such as Amharic.

“All these [AI] systems are black boxes, with no specific information shared about how they were built, what data was used for training, etc.,” Pascual says.

To learn more about ChatGPT and the technology behind LLMs, read our explainer.

