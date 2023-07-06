



Dead languages are notoriously difficult to decipher. It took 23 years to decipher the Egyptian hieroglyphs inscribed on the Rosetta Stone, nearly two centuries to make sense of the Maya script, and more than 3,000 years before Linear B, the earliest known form of written Greek, was revealed. Techno-optimists point to puzzles like these when they talk about the transformative potential of AI, but challenges remain even for languages that have already been deciphered. Consider Akkadian, one of the oldest written languages in the world, which was written in cuneiform. Nearly a million Akkadian texts remain untranslated because so few people can read the extinct language, but AI tools can now translate them in seconds.

In May, a multidisciplinary group of computer science and history researchers published a paper describing an AI model that instantly translates ancient cuneiform. Led by software engineers at Google and Assyriologists from Ariel University, the team used neural machine translation, the same technology that powers Google Translate, to train a model on existing cuneiform translations.

A lighthouse for tired translation travelers

When translating a dead language, especially one with no descendant languages, piecing together meaning without a rich cultural background can feel like traveling without a North Star. Akkadian is just such a language. The language of the Akkadian Empire, centered in what is now Iraq from the 24th to the 22nd century BC, Akkadian existed as both a spoken and a written language. Its cuneiform script used signs built from sharply intersecting wedge shapes. The Akkadians usually wrote by pressing the wedge-shaped tip of a reed into clay tablets (cuneiform literally means "wedge-shaped" in Latin). Because clay is so durable, hundreds of thousands of these tablets have survived for millennia and are now on display in the halls of universities and museums.

Translation is often misunderstood as a one-to-one decoding of foreign words and phrases. But a statement in one language often has no exact or easy equivalent in another, because of cultural nuances and differences in linguistic structure. High-quality translation requires deep knowledge of the structure of both languages, the cultures around them, and the history that underpins those cultures. Preserving a text's original tone, rhythm, and even humor is delicate work, and it becomes far harder when the culture behind the language is largely unknown.

The number of surviving cuneiform texts is staggering compared with the handful of linguists who can translate Akkadian. As a result, a treasury of knowledge about important early civilizations, sometimes considered the first empires in history, remains largely untapped. Today, the tablets already unearthed, together with the rate at which archaeologists uncover new ones, outpace what linguists can translate. That could change if AI is integrated into the process of interpreting cuneiform.

Hundreds of thousands of cuneiform tablets document the political, social, economic, and scientific history of ancient Mesopotamia, the researchers wrote. However, most of these documents remain untranslated and inaccessible because of their sheer number and the limited number of experts who can read them.

The AI can perform two types of translation: one that translates cuneiform directly into English, and one that translates transliterated (phonetically rewritten) cuneiform into English. The model scored 36.52 and 37.47, respectively, on BLEU4 (Bilingual Evaluation Understudy, computed over word sequences of up to four words), a standard automatic measure of translation quality. These scores exceed the team's goals, and both are high enough to be considered high-quality translations. BLEU4 scores are expressed on a scale of 0 to 100 (or 0 to 1), with 70 being about the highest realistically achievable even by a highly skilled human translator.
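To make the BLEU4 numbers above less abstract, here is a minimal sketch of how a sentence-level BLEU4 score can be computed against a single reference translation: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It is an illustration only, not the researchers' evaluation code. Published scores like 36.52 are normally computed at the corpus level, often with smoothing and standardized tokenization, so this simplified function (the name `bleu4` and the example sentences are hypothetical) will not reproduce them.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of every n-gram (as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU4 on a 0-100 scale (one reference, no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if overlap == 0:
            return 0.0  # one zero precision makes the geometric mean zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / max(len(cand), 1))
    # Geometric mean of the four precisions, scaled to 0-100.
    return 100 * bp * math.exp(sum(log_precisions) / 4)


print(bleu4("the king sent grain to the temple", "the king sent grain to the city"))
```

An exact match scores 100, and a candidate with no words in common with the reference scores 0. The example above differs from its reference in only the final word, yet scores about 80.9, because every n-gram that includes the wrong word stops matching.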

For decades, computer-generated translation was brittle and unreliable, said Tom McCoy, a computational linguist at Princeton University. Translators built on hand-written grammar rules kept missing idioms and the richness of non-literal language that slips through the cracks of formal grammar. Recently, however, AI programs such as the cuneiform translator have become able to handle these murkier areas of language, ushering in an exciting new era for AI-powered computational linguistics.

The big new thing in AI these days is statistical processing, McCoy says. It is still a form of mathematics, but it is not the hard-and-fast rules people used to work with. Statistics let us overcome the difficulties of earlier methods, and today that work continues in machine learning and deep learning. These machines can learn all the idiosyncrasies, idioms, and exceptions to the rules that earlier generations of AI missed.
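The contrast McCoy describes can be shown with a deliberately tiny toy. This is not how the cuneiform model works (it is a neural network); it sketches the older statistical idea of estimating translation probabilities by counting examples, which is enough to show how data can teach a system an idiom that a fixed word-for-word dictionary misses. The French idiom, the dictionary, the phrase pairs, and the function names (`word_by_word`, `estimate_phrase_table`, `translate`) are all hypothetical, and the phrase pairs are assumed to be aligned already, whereas real systems have to learn those alignments.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (source phrase, target phrase) pairs, assumed pre-aligned.
ALIGNED_PAIRS = [
    ("il", "it"),
    ("il", "it"),
    ("pleut", "rains"),
    ("pleut des cordes", "is pouring"),
    ("pleut des cordes", "is pouring"),
    ("pleut des cordes", "rains ropes"),
]

# Rule-style baseline: a fixed word-for-word dictionary.
WORD_DICT = {"il": "it", "pleut": "rains", "des": "some", "cordes": "ropes"}


def word_by_word(sentence):
    """Translate each word independently; unknown words pass through unchanged."""
    return " ".join(WORD_DICT.get(w, w) for w in sentence.split())


def estimate_phrase_table(pairs):
    """Relative-frequency estimate of P(target | source) from counted pairs."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src][tgt] += 1
    return {
        src: {tgt: n / sum(c.values()) for tgt, n in c.items()}
        for src, c in counts.items()
    }


def translate(sentence, table, max_len=3):
    """Greedy left-to-right decoding that prefers the longest known phrase."""
    words, out, i = sentence.split(), [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            src = " ".join(words[i:i + n])
            if src in table:
                options = table[src]
                out.append(max(options, key=options.get))  # most probable target
                i += n
                break
        else:
            out.append(words[i])  # no known phrase starts here: copy the word
            i += 1
    return " ".join(out)


table = estimate_phrase_table(ALIGNED_PAIRS)
print(word_by_word("il pleut des cordes"))
print(translate("il pleut des cordes", table))
```

Word-for-word lookup renders "il pleut des cordes" as "it rains some ropes", while the counted phrase table prefers "it is pouring", because that rendering appears in two of the three aligned examples for the idiom. Preferring the longest matching phrase is what lets the multiword idiom win over its individual words.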

You can’t really trust the output

The AI's cuneiform translations still contained errors, including the hallucinations common to AI models. In one example, it produced the translation "Why should we sue before the Livi Ali man? They are in the inner city of the inner city."

Despite the occasional error, the tool saved a great deal of time and human effort in the initial processing of texts.

Current AI is remarkable, but unreliable, McCoy said of using AI for translation. While AI can do really amazing things, you can't really trust the output it produces. That means the best use case for AI is a task that is very labor-intensive and difficult for humans to do, but where, once the AI has produced some output, it is easy for humans to validate it.

The model was most accurate when translating formulaic texts, such as short sentences and administrative records. The researchers were also struck by its ability to reproduce genre-specific nuances in translation. In the future, the model will be trained on ever-larger samples of translations to further improve its accuracy, the researchers wrote.

For now, the model can assist researchers by producing preliminary translations that humans then check for accuracy and refine for nuance.

A promising future scenario is for [the model] to show the user a list of the sources its translation was based on, which would be particularly useful for academic purposes, the researchers wrote.
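The researchers describe showing sources only as a future scenario. One simple way to approximate it, assumed here and not taken from the paper, is retrieval: rank the translated examples in the training collection by how much their transliteration overlaps with the input, then list the closest matches. The sketch below scores overlap with the Jaccard similarity of word and word-pair sets. The tablet IDs and the placeholder tokens (`s1`, `s2`, ...) are hypothetical stand-ins, not real Akkadian, and the function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    source_id: str        # hypothetical catalogue ID of an already-translated tablet
    transliteration: str  # space-separated tokens (placeholders, not real Akkadian)


def ngram_set(text, n=2):
    """Unigrams plus n-grams of a space-separated string, as a set of tuples."""
    toks = text.split()
    grams = {(t,) for t in toks}
    grams |= {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    return grams


def supporting_sources(query, corpus, k=2):
    """Return up to k (score, source_id) pairs ranked by Jaccard overlap with the query."""
    q = ngram_set(query)
    scored = []
    for ex in corpus:
        e = ngram_set(ex.transliteration)
        union = q | e
        score = len(q & e) / len(union) if union else 0.0
        if score > 0:  # skip examples that share nothing with the query
            scored.append((score, ex.source_id))
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties broken by ID
    return scored[:k]


CORPUS = [
    Example("tablet-A", "s1 s2 s3 s4"),
    Example("tablet-B", "s1 s2 s9"),
    Example("tablet-C", "s7 s8"),
]

print(supporting_sources("s1 s2 s3", CORPUS))
```

For the query "s1 s2 s3", this ranks tablet-A (score 5/7) ahead of tablet-B (3/7) and omits tablet-C, which shares nothing with the input. A list like this shows which known translations resemble the input. It does not prove which examples a neural model actually relied on, so it is a heuristic rather than true attribution.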

