A group of researchers from MIT CSAIL has developed a new AI system that can automatically decipher and translate a lost language without knowledge of its relation to other languages.
The algorithm might help the team uncover entire bodies of hidden knowledge that are currently indecipherable.
While testing their system, the team from MIT were able to corroborate a recent academic study that said the language of Iberian is not related to the Basque language.
Deciphering the indecipherable
Led by MIT Professor Regina Barzilay the team of researchers wants their algorithm to, ultimately, have the capacity to decipher lost languages that have been indecipherable until now, using only a few thousand words.
As MIT explains in a press release about the new AI system, most of the languages that have existed throughout the history of humanity are no longer spoken, and at least half of the languages spoken today are predicted to vanish in the next 100 years.
The new system, detailed in a new MIT paper, could help to recover lost languages and help us retrieve valuable knowledge and wisdom born of the cultures represented by those languages.
An AI rooted in historical linguistics
The algorithm is based on key principles from historical linguistics, such as the fact that languages generally only evolve in certain predictable ways. For example, languages generally replace certain sounds when evolving from their predecessors, though they rarely create or remove an entire sound.
This predictability allows the algorithm to model to segment words from an ancient language and map them to a related language. In doing this, the system can also identify language families.
For example, when trained on Basque, the regional language used in Spain's Basque Country, the algorithm said that the language is not related to Basque, supporting recent scholarship on the topic suggesting that the two languages are not closely related despite the Basque Country being located on the Iberian peninsula.