AI steps in to save 5,000-year-old cuneiform writing
The world’s oldest form of writing may have just been pulled back from the brink of extinction.
Cuneiform has long remained a mystery to all but a few hundred experts worldwide, with its intricate wedge-shaped symbols etched on clay tablets presenting a formidable challenge to anyone daring to decipher them.
However, a breakthrough has recently emerged from a collaborative effort between Israeli archaeologists and computer scientists. Their findings, published in the prestigious scientific journal PNAS Nexus, reveal the development of an AI-powered translation program capable of decoding ancient Akkadian cuneiform, enabling instantaneous translation of tens of thousands of digitized tablets into English.
The Power of AI Unleashed
The project, which began as a master's thesis at Tel Aviv University, has garnered attention for its innovative application of neural machine translation.
This approach, similar to the one behind Google Translate, converts words into numerical representations and employs a neural network to generate accurate and natural translations in the target language, breathing life into Akkadian, a language that has not been spoken or written for about two millennia.
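As a rough illustration of what "converting words into numerical representations" means, here is a minimal sketch of a vocabulary lookup, the first step before any neural network sees the text. The function names and the transliteration-style tokens are hypothetical and are not taken from the study, whose actual tokenization is not described in this article.

```python
def build_vocab(sentences):
    """Assign each distinct token an integer ID; 0 is reserved for unknown tokens."""
    vocab = {"<unk>": 0}
    for sentence in sentences:
        for token in sentence.split():
            # setdefault only adds the token if it has not been seen before
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(sentence, vocab):
    """Turn a whitespace-separated sentence into a list of integer IDs."""
    return [vocab.get(token, vocab["<unk>"]) for token in sentence.split()]


# Illustrative tokens only (an assumption), not data from the research team
corpus = ["a-na šarri bēlīya", "a-na šarri"]
vocab = build_vocab(corpus)
ids = encode("a-na šarri ekallu", vocab)  # "ekallu" is unseen, so it maps to 0
```

A translation model would then map sequences of such IDs to vectors and learn to produce English tokens from them; that part is far beyond a few lines and is omitted here.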
The significance of this advancement lies in the vast number of untranslated clay tablets worldwide, with over half a million housed in libraries, museums, and universities. Gai Gutherz, a computer scientist and part of the research team, emphasized the program's ability to unlock the past without requiring expertise in Akkadian.
"What’s so amazing about it is that I don’t need to understand Akkadian at all to translate [a tablet] and get what’s behind the cuneiform. I can just use the algorithm to understand and discover what the past has to say," he told The Times of Israel.
Unveiling the Future of Ancient Language Research
The team's research highlights the program's success in achieving accurate translations from Akkadian cuneiform to English.
On Best Bilingual Evaluation Understudy 4 (BLEU4), a scoring system that rates machine-generated translations from 0 to 100 by comparing them with human reference translations, the program scored 36.52 for cuneiform to English and 37.47 for transliterated cuneiform to English.
Although these scores indicate a fairly good early-stage translation model, Gutherz acknowledged that commercial translation tools like Google Translate typically score around 60.
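To give a feel for how a BLEU-style score is computed, here is a minimal sketch of sentence-level BLEU-4 with a single reference and no smoothing. It is a simplification for illustration, not the evaluation code used in the study, which would typically score a whole test set at once; the function names and example sentences are made up.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU-4 on a 0-100 scale (single reference, no smoothing)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped overlap: a candidate n-gram counts at most as often as it occurs in the reference
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any empty n-gram match zeroes the score
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty discourages translations shorter than the reference
    brevity_penalty = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100.0 * brevity_penalty * math.exp(log_precision_sum / 4)


score = bleu4("the king sent a letter to the city",
              "the king sent a message to the city")
```

In this toy case a single wrong word out of eight brings the score down to about 50, because the error breaks several of the longer word sequences that BLEU-4 also checks. That is one reason scores in the 30s can still correspond to usable translations.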
Translating Akkadian to English is complicated mainly by the syntactical differences between the two languages, making the AI's achievements all the more remarkable.
While translations of formulaic texts, such as royal decrees, had a higher rate of accuracy, the system struggled with more literary and poetic pieces. The machine sometimes even produced “hallucinations”: results completely unrelated to the text provided.
“Some translations were very good, some were near the point where you could start from it, and some were total hallucinations,” Gutherz remarked.
“The amount of data you train on is correlative to how well you can perform, and the more data you have, the better your models will be,” he added, underlining the challenges the team faced in sourcing data to train the AI model. The largest online databases of Akkadian tablets have just tens of thousands of entries.
A New Era in Ancient Linguistics
An early version of this translation program is available for public use on The Babylon Engine. However, not all experts eye the system favorably.
“I’m an old school philologist who’s sitting at a table, looking at the tablets and reading them as humans used to do for thousands of years,” said Nathan Wasserman, professor of Assyriology at the Institute of Archaeology at the Hebrew University of Jerusalem.
“Of course, it will work. But for deeper and less formulaic texts, this is still very far from being useful,” he added. “When you have a text, even when you have the words correct, it doesn’t mean you understand what’s there. For that, you still need the human mind.”
The AI-powered translation program represents a significant leap forward in the study of ancient languages. It offers a glimpse into a future where all languages and their cultural tapestries continue to live on, shedding more light on our rich past.
Study Abstract
Cuneiform is one of the earliest writing systems in recorded human history (ca. 3,400 BCE–75 CE). Hundreds of thousands of such texts were found over the last two centuries, most of which are written in Sumerian and Akkadian. We show the high potential in assisting scholars and interested laypeople alike, by using natural language processing (NLP) methods such as convolutional neural networks (CNN), to automatically translate Akkadian from cuneiform Unicode glyphs directly to English (C2E) and from transliteration to English (T2E). We show that high-quality translations can be obtained when translating directly from cuneiform to English, as we get 36.52 and 37.47 Best Bilingual Evaluation Understudy 4 (BLEU4) scores for C2E and T2E, respectively. For C2E, our model is better than the translation memory baseline in 9.43, and for T2E, the difference is even higher and stands at 13.96. The model achieves best results in short- and medium-length sentences (c. 118 or less characters). As the number of digitized texts grows, the model can be improved by further training as part of a human-in-the-loop system which corrects the results.
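The abstract gives the model's scores and its margin over the translation memory baseline, but not the baseline scores themselves. Assuming the margins are plain differences in BLEU4 points (an interpretation, not something the abstract states explicitly), the implied baseline scores can be worked out as in this short sketch; the variable names are illustrative.

```python
# Figures quoted in the abstract
model_bleu4 = {"C2E": 36.52, "T2E": 37.47}
margin_over_baseline = {"C2E": 9.43, "T2E": 13.96}

# Assumption: margin = model score minus baseline score, in BLEU4 points
implied_baseline = {
    task: round(model_bleu4[task] - margin_over_baseline[task], 2)
    for task in model_bleu4
}
# implied_baseline is {"C2E": 27.09, "T2E": 23.51}
```

Under that reading, the baseline does worse on transliterated input than on raw cuneiform, while the neural model does slightly better on it.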