Education Programming Science

Automated Language Deciphering By Computer AI

eldavojohn writes "Ugaritic has been deciphered by an unaided computer program that relied only on four basic assumptions present in many languages. The paper (PDF) may aid researchers in deciphering eight still-undeciphered languages (Ugaritic had already been deciphered, which let the researchers verify that their system worked) as well as increase the number of languages automated translation sites offer. The researchers claim 'orders of magnitude' speedups in deciphering languages with their new system."
This discussion has been archived. No new comments can be posted.

  • by cappp ( 1822388 ) on Wednesday June 30, 2010 @11:53PM (#32753068)
    Just so we can keep the “didn’t read TFA” comments to a minimum: The four assumptions as laid out in the article are:

    - The language being deciphered is closely related to some other language: In the case of Ugaritic, the researchers chose Hebrew.

    - There’s a systematic way to map the alphabet of one language on to the alphabet of the other, and that correlated symbols will occur with similar frequencies in the two languages.

    - The system makes a similar assumption at the level of the word: The languages should have at least some cognates, or words with shared roots, like main and mano in French and Spanish, or homme and hombre.

    - The system assumes a similar mapping for parts of words. A word like “overloading,” for instance, has both a prefix — “over” — and a suffix — “ing.” The system would anticipate that other words in the language will feature the prefix “over” or the suffix “ing” or both, and that a cognate of “overloading” in another language — say, “surchargeant” in French — would have a similar three-part structure.
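
    The frequency assumption in the list above can be illustrated with a toy sketch. This is not the researchers' method, which is a statistical model that also accounts for cognates and word structure; it only shows the naive idea of pairing symbols by frequency rank. The function names and the sample strings are made up for illustration.

```python
from collections import Counter

def rank_by_frequency(text):
    """Return the distinct non-space symbols of text, most frequent first.

    Ties are broken by symbol order so the result is deterministic.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [symbol for symbol, _ in ordered]

def guess_mapping(unknown_text, known_text):
    """Pair symbols of the two texts by frequency rank (hypothetical helper)."""
    return dict(zip(rank_by_frequency(unknown_text), rank_by_frequency(known_text)))

def decode(text, mapping):
    """Rewrite text using the guessed mapping; unmapped symbols pass through."""
    return "".join(mapping.get(ch, ch) for ch in text)

# Toy data: 'unknown' is 'known' under a simple substitution cipher.
known = "abba abc aab"
unknown = "xyyx xyz xxy"

mapping = guess_mapping(unknown, known)
decoded = decode(unknown, mapping)
```

    On real text, frequency rank alone is far too noisy, which is presumably why the system also leans on the cognate and word-part assumptions.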

    The article also reports the system's success rate:

    Ugaritic has already been deciphered: Otherwise, the researchers would have had no way to gauge their system’s performance. The Ugaritic alphabet has 30 letters, and the system correctly mapped 29 of them to their Hebrew counterparts. Roughly one-third of the words in Ugaritic have Hebrew cognates, and of those, the system correctly identified 60 percent. “Of those that are incorrect, often they’re incorrect only by a single letter, so they’re often very good guesses,” Snyder says.
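
    The "incorrect only by a single letter" remark suggests one way to score guesses against a known answer: count exact matches and near misses at edit distance 1. The sketch below assumes Levenshtein distance as the measure, which the article does not specify, and the word pairs are invented strings, not real Ugaritic or Hebrew data.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def score_guesses(guesses, gold):
    """Return (exact, near) counts, where 'near' means off by one edit."""
    exact = 0
    near = 0
    for word, answer in gold.items():
        guess = guesses.get(word)
        if guess is None:
            continue
        distance = edit_distance(guess, answer)
        if distance == 0:
            exact += 1
        elif distance == 1:
            near += 1
    return exact, near

# Invented example: keys stand for words in the unknown language,
# values for their proposed or true cognates.
gold = {"w1": "melek", "w2": "bayit", "w3": "shalom"}
guesses = {"w1": "melek", "w2": "bayat", "w3": "salem"}

exact, near = score_guesses(guesses, gold)
```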

    A critic quoted in the article noted that

    The researchers’ approach, he says, presupposes that the language to be deciphered has an alphabet that can be mapped onto the alphabet of a known language — “which is almost certainly not the case with any of the important remaining undeciphered scripts.” It also assumes, he argues, that it’s clear where one character or word ends and another begins, which is not the case with many deciphered and undeciphered scripts. The decipherment of Ugaritic took years and relied on some happy coincidences — such as the discovery of an axe that had the word “axe” written on it in Ugaritic.

  • Re:Sweet (Score:3, Informative)

    by doishmere ( 1587181 ) on Wednesday June 30, 2010 @11:53PM (#32753074)
    Their method relies heavily on the unknown language being related to a known language to some degree. At the heart of their technique is Bayesian statistics applied to lexical and frequency analysis; for this approach to work, there must be some basis for comparison.
  • by KritonK ( 949258 ) on Thursday July 01, 2010 @01:59AM (#32753686)

    Actually, the program might be able to help: From what I understand, the Linear A alphabet is related to the Linear B alphabet, which has been deciphered, even though the languages may be different. We know a bit about context (what we have are mostly inventories), and we even know the meaning of one word: the one next to the total of the amounts in the inventory probably means "total". Furthermore, that word, ku-ro, is similar to a form of a Greek word for "total" ("houlon"), so it is very likely that the language is at least Indo-European in origin. One could try using various Indo-European languages as candidates for the related language, until the program comes up with something meaningful.

    Now, if only we had a larger sample of the language of the disk of Phaestos...

  • by djupedal ( 584558 ) on Thursday July 01, 2010 @02:02AM (#32753698)
    IBM, as one example, has been on this hard since 2002 ( http://news.cnet.com/2100-1008-998264.html [cnet.com] ) when the prize was first announced... Stop going all lady gaga over stuff that is so old it can't even be recycled properly.
