
Automated Language Deciphering By Computer AI

eldavojohn writes "Ugaritic has been deciphered by an unaided computer program that relied on only four basic assumptions common to many languages. The paper (PDF) may help researchers decipher eight languages that remain undeciphered (Ugaritic itself had already been deciphered, which let the researchers confirm that their system worked), and may also increase the number of languages that automated translation sites offer. The researchers claim 'orders of magnitude' speedups in deciphering languages with their new system."

  • by DurendalMac ( 736637 ) on Thursday July 01, 2010 @12:01AM (#32753104)
    Darn. So the Voynich Manuscript is probably not a prime candidate.
  • by DowdyGoat ( 1830958 ) on Thursday July 01, 2010 @12:03AM (#32753124)

    This is very cool for us undeciphered language fans.

    In the article, author Andrew Robinson correctly points out that this computer program won't work for languages that have no closely related known language, such as Linear A from Crete, which is definitely not Greek, as Linear B turned out to be. There is a lot of speculation that Linear A is a native Minoan (Cretan) script, largely unrelated to any other known script.

    However, alongside Linear A on Crete there was a Cretan pictographic script, which may or may not be related to Egyptian hieroglyphics. The Minoans had known trading ties to Egypt, which had written language long before they did. If this computer program could find a relationship between the Minoan pictographic script and Egyptian hieroglyphs, that might give insights into how the Linear A script (a syllabary) was set up.

    The only difficulty is that there may not be enough surviving pictographic text to work with; I'd imagine you'd need a fair number of examples to let the computer really compare and contrast.

  • Re:Sweet (Score:3, Interesting)

    by jd ( 1658 ) on Thursday July 01, 2010 @02:25AM (#32753748)

    Well, Old Norse is technically based on Old Germanic rather than the other way round, and Old English had input not only from Old Germanic but also from Old Norse, along with an uncertain amount of Anglic (amazingly little is known about the Angles) and possibly some Jutish. English also draws on Norman French, plus modern French (which itself is derived from Norman French). Norman French survives in the modern world on Guernsey, Jersey and maybe some other Channel Islands, but it became extinct on Alderney.

    To bring this Back On Topic: if English were lost, it would be almost impossible to use this program to recover it. English has input from too many sources, resulting in far too many loan-words of incompatible structure and too much incompatible grammar. However, one very interesting test of the program would be to map each of the reconstructed phonemes of Proto-Indo-European (PIE) to a character, then compare this derived PIE script with each Indo-European language in turn. If the reconstruction is correct, the number of correct guesses when translating PIE words into each known IE language ought to be above what would be expected by chance alone, AND the translations should remain compatible with the derivations the linguists who reconstructed PIE used in the first place. By comparing across the translations for all languages, the program may discover other word-parts that had not been noticed before.
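
    One way to make "above what would be expected by chance alone" concrete is a permutation test: shuffle the guessed translations so they no longer line up with the reference words, and see how often the shuffled list scores as well as the real one. This is a minimal sketch, assuming the program's output can be reduced to a list of guesses compared one-to-one against a list of accepted cognates; the function names and the toy word lists are hypothetical, and this is a generic statistical check, not the paper's method.

```python
import random


def count_matches(predicted, reference):
    """Count positions where the guessed translation equals the reference word."""
    return sum(1 for p, r in zip(predicted, reference) if p == r)


def permutation_p_value(predicted, reference, trials=10_000, seed=0):
    """Return (observed match count, estimated p-value) under random re-pairing."""
    rng = random.Random(seed)
    observed = count_matches(predicted, reference)
    shuffled = list(predicted)
    at_least_as_good = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real word-to-word correspondence
        if count_matches(shuffled, reference) >= observed:
            at_least_as_good += 1
    # Add-one correction so the estimate is never reported as exactly zero.
    return observed, (at_least_as_good + 1) / (trials + 1)


# Hypothetical toy lists standing in for program guesses vs. accepted cognates.
reference = ["father", "mother", "brother", "night", "three", "new", "two", "name"]
predicted = ["father", "mother", "brother", "nacht", "three", "neu", "two", "name"]
observed, p = permutation_p_value(predicted, reference)
print(observed, p)
```

    A small p-value says the real pairing beats random pairings; it says nothing about whether the underlying reconstruction is right, which is why the comment also asks for consistency with the original derivations.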

    It may be possible to determine whether a language is truly an isolate by analyzing it against a candidate language multiple times, using slightly different data sets, and seeing whether the results remain about the same. If this test works, then languages of uncertain or unknown ancestry (such as Basque and Etruscan*) can be tested against all 7,200 known languages to see whether any of them produce a moderately stable match. No match would mean no connection with any other existing linguistic family tree.
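
    The "slightly different data sets" idea resembles bootstrap resampling: redraw the unknown corpus with replacement many times, re-score it against a candidate language each time, and look at the spread. The sketch below assumes a crude, alphabet-independent stand-in score (comparing sorted character-frequency profiles, so no glyph mapping is needed); the real program would presumably use its own decipherment score instead, and all names and toy corpora here are hypothetical.

```python
import random
import statistics
from collections import Counter


def rank_frequency(words):
    """Relative character frequencies sorted high-to-low (ignores which glyph is which)."""
    counts = Counter(ch for word in words for ch in word)
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted((c / total for c in counts.values()), reverse=True)


def similarity(p, q):
    """1 minus the total-variation distance between two rank-frequency profiles."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return 1.0 - 0.5 * sum(abs(x - y) for x, y in zip(p, q))


def match_stability(unknown_words, known_words, rounds=200, seed=0):
    """Bootstrap the unknown corpus and re-score it against the known one each round.

    Returns (mean score, standard deviation); a small spread relative to the mean
    is one way to read "a moderately stable match".
    """
    rng = random.Random(seed)
    known_profile = rank_frequency(known_words)
    scores = []
    for _ in range(rounds):
        # Resample words with replacement to simulate a slightly different data set.
        sample = rng.choices(unknown_words, k=len(unknown_words))
        scores.append(similarity(rank_frequency(sample), known_profile))
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical toy corpora, already transliterated into some character inventory.
unknown = ["kar", "tur", "kan", "ris", "tak"]
known = ["car", "tar", "can", "rat", "sat"]
mean, spread = match_stability(unknown, known)
print(f"mean={mean:.3f} spread={spread:.3f}")
```

    Screening against thousands of languages would simply repeat this per candidate; a fixed seed keeps each run reproducible, so differences between candidates are not just resampling noise.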

    *Etruscan is a bugbear. There is one book that is completely intact and undamaged. It's made of gold leaf. The academic who currently owns it has not published so much as a single line of the text, merely two of the illustrations. All other Etruscan texts are fragmentary (so you have very little context to work with and not many words that are definitely complete) or too short to be useful. We don't know what Etruscan is related to, but if the above hypothesis is correct, we could find out and then translate the book. The damaged texts, such as a linen book used to wrap a mummy, are far too fragmentary; you'd never be sure whether such a translation was correct. A complete book, on the other hand, would offer no possibility for mistake. It would work or it wouldn't.

  • by vlueboy ( 1799360 ) on Thursday July 01, 2010 @02:29AM (#32753760)

    The decipherment of Ugaritic took years and relied on some happy coincidences — such as the discovery of an axe that had the word “axe” written on it in Ugaritic.

    Maybe I should go around and write "computer" in English on all my computers, as a service to future language researchers.

    Researchers examining English as an extinct language 3000 years from now would fail at this same task. English has no nouns, only brand names: today's computers carry big "Dell" logos, not the word "Computer."

    Also, how would researchers realize that [Apple Mac Glyph] isn't an integral part of our "ancient moon runes" if seen from their era? :)

  • by Anonymous Coward on Thursday July 01, 2010 @03:25AM (#32753952)

    The Iberian language was spoken in Spain before the Roman Empire, and it has some similarities with Basque. There are few Iberian texts, but I wonder whether this tool could decode the language.
