Automated Language Deciphering By Computer AI
eldavojohn writes "Ugaritic has been deciphered by an unaided computer program that relied only on four basic assumptions present in many languages. The paper (PDF) may aid researchers in deciphering eight undecipherable languages (Ugaritic has already been deciphered and proved their system worked) as well as increase the number of languages automated translation sites offer. The researchers claim 'orders of magnitude' speedups in deciphering languages with their new system."
Sweet (Score:2)
Re:Sweet (Score:4, Funny)
Re:Sweet (Score:5, Funny)
Good news, it's a suppository.
Re: (Score:2)
You forgot the third option... but we need a TARDIS for that.
Re: (Score:3, Interesting)
Well, Old Norse is technically based on Old Germanic rather than the other way round, and Old English not only had Old Germanic input but Old Norse input as well. Along with an uncertain amount of Anglic (amazingly little is known about the Angles), possibly some Jute. English uses Norman French, plus modern French (which itself is derived from Norman French). Norman French survives in the modern world in Guernsey, Jersey and maybe some other Channel Islands but became extinct on Alderney.
To bring this Back
Re: (Score:2)
Do you have a citation for that book? It's not mentioned in Wikipedia.
Thanks,
-l
Re: (Score:2)
Links follow:
Finding useful information on this book is... hard. You're right, Wikipedia doesn't even mention it. Anywhere.
Re: (Score:2)
Cool! I wasn't aware of this. So you get thanks (in lieu of mod points, which is what you'd have gotten from me yesterday when I had some.)
Too bad Claudius didn't leave a dictionary.
Re: (Score:2)
Much appreciated. The thanks is fine (and more informative :). It's a damn shame there are so many written languages that have been lost through deliberate acts of destruction. (My understanding is that Etruscan books were destroyed first by the Roman Empire and then again by the Holy Roman Empire, which is why so little is known.) Phrygian is another language that we know only fragmentarily, for absolutely no good reason.
Re: (Score:2)
Re: (Score:1)
Universal translator, here we come!
pbbbbbbttt! I prefer to stick to translator microbes, thank you very much!
Re: (Score:3, Informative)
Re: (Score:1, Insightful)
I think this is more a tool for human decipherers than a magic tool for deciphering languages. It is a great tool when you have already obtained the key points of the language: with it you can skip the most tedious part, which is going word by word through the language, and reduce the time needed to decipher it. It is also possible with this tool to decipher the language and get it wrong, but that doesn't mean that everything deciphered is wrong as the most possible with
Re:Sweet (Score:5, Funny)
Universal translator, here we come!
Cool! Can I bring it into my next marketing meeting?
Re:Sweet (Score:5, Funny)
Re: (Score:2)
Only if the gross gains in closing juncture exceed the long-term sustainability goals of the viability imperative for all mass interoperability.
Only if we can update the UI for version 2 and sell it a second time to the same saps.
We at Mega Industries believe this will move us forward to our cloud-based monetization of the human-media dynamic which is strategically important in an ever-evolving mobile continuum.
When everyone has it, we can turn it into a subscription-based cash cow.
We have directed our customer experience champions to ensure consumers realize this when they call in with emphatic expressions of dissatisfaction.
Tell the whining losers that premium support is only available with the Platinum Care package, and transfer them to "Gord-on" in the Mumbai sales office.
Re: (Score:2)
Pffft, please, your plan is to have emphatically expressed dissatisfied consumers realize that your gross gains within the closing juncture exceed your long-term sustainability goals for all viability imperatives, which will allow the move to cloud-based monetization of the human-media dynamic? It is but a futile attempt, you may as well give up right now, no matter how much time your customer experience champions waste on a single call.
Here, at GOD Industries, we know better than to rely on such clearly m
Re: (Score:2)
I...I think it was screaming...
Re: (Score:2)
This.
Seriously.
Re: (Score:2)
Re: (Score:2)
He said universal translator, as in it only works on languages of this universe. Marketing speak is from the anti-matter dominant universe, as evidenced by the fact that the more it is spoken, the less is actually communicated.
Re: (Score:1)
Answers to all TFA questions (Score:5, Informative)
. The article also notes the success rates where it states that
Critics noted that
Re:Answers to all TFA questions (Score:5, Insightful)
The decipherment of Ugaritic took years and relied on some happy coincidences — such as the discovery of an axe that had the word “axe” written on it in Ugaritic.
Maybe I should go around and write "computer" in English on all my computers, as a service to future language researchers.
Pfft, why? (Score:5, Funny)
Label at least one computer "ham sandwich" to confuse future language researchers.
Alternatively, label each computer with a character's name from (insert show of your choice here).
Re: (Score:2)
In all of the computer labs I've been to, the name of the computer is visibly displayed in front somewhere. The names of all the computers in the lab usually revolve around a common theme, e.g. periodic table of elements, Simpsons characters, HHGTTG characters, etc.
You better hope English never becomes extinct, because an important period in human history would be forever lost.
Re: (Score:3, Insightful)
Re: (Score:2)
The word you are looking for is "systematically", not "logically". And unless you're talking about a whole building's worth of computers, it's simply not worth it to indicate a location in the name, "Huey" is a lot easier to remember than "B2R22S15".
Re: (Score:2)
Re: (Score:2)
This guy was talking about a computer lab. I get the impression that Huey, Duey, Louie, Barney, Smarmey, Charley, Blarney, Indigo Montarney etc will get particularly bothersome to keep tabs on as a convention. Why not B2R22Cad4
You know, if you are the kind of person that prefers numbers over pronounceable names, computers also have IP addresses. Just use those, and leave the hostnames for the rest of us...
Re: (Score:1)
Re: (Score:2)
Boy, I could sure save some money replacing equipment which needed to be moved by changing the hostname and printing a new sticker.
Re: (Score:2)
Funny you should mention that - I worked in a place once where that appeared to be true.
By trial and error I discovered what the IDs of some of the printers were and relabeled them. Next day, somebody had stuck new labels over the top with the old, wrong, IDs on them.
Re: (Score:2)
Re: (Score:2)
Encroaching on territory, I guess.
It was full of people who wouldn't jump in a stream if their feet were on fire unless someone specifically told them to. And no, it wasn't military/defense related at all.
Re: (Score:2)
Boy, I could sure save some money replacing equipment which needed to be moved by changing the hostname and printing a new sticker.
I think you're missing the point there. Yes, it's pretty easy to change a computer name, but then you also have to update all people and/or software that connect to the server as well, and that's far from trivial. Sending out a mass email to the entire company saying "we've moved the following five servers from room A to room B, so please remember to change the corresponding digits in their names whenever you use them" doesn't go over very well.
I've worked at a place where the server names were completel
Re: (Score:1)
Re: (Score:1)
Label at least one computer "ham sandwich" to confuse future language researchers.
You might be interested in this approximation of what it would cause: http://www.mcsweeneys.net/2010/6/10packman.html [mcsweeneys.net]
Re: (Score:1)
You laugh, but names from a TV show used to be the server naming convention at one place I worked. Makes for interesting conversations:
"Uhura is down again and Kirk is acting up, Spock is still blocking incoming attacks."
"Alright, I'll bring up Scotty and RedShirt to take some of the overload. Promote Sulu to be in charge until we figure out what's wrong with Kirk."
Re: (Score:1)
Maybe I should go around and write "computer" in English on all my computers, as a service to future language researchers.
Just don't, like many would do, put your label on the monitor.
Re: (Score:2)
Re: (Score:3, Interesting)
The decipherment of Ugaritic took years and relied on some happy coincidences — such as the discovery of an axe that had the word “axe” written on it in Ugaritic.
Maybe I should go around and write "computer" in English on all my computers, as a service to future language researchers.
Extinct language researchers examining English would fail at this same task 3000 years from now. English has no nouns --it has brand names: today's "computers" have big "Dell" logos but not "Computer."
Also, how would researchers realize that [Apple Mac Glyph] isn't an integral part of our "ancient moon runes" if seen from their era? :)
Re: (Score:2)
Going further OT: In Harry Harrison's Stainless Steel Rat books people from the distant future wondered why their ancestors had named their planet "dirt".
Re:Answers to all TFA questions (Score:5, Funny)
Also, how would researchers realize that [Apple Mac Glyph] isn't an integral part of our "ancient moon runes" if seen from their era? :)
They'd probably see it as having some sort of religious significance. And they'd be correct.
Re: (Score:3, Interesting)
Re: (Score:2)
Re: (Score:3, Funny)
For those who don't know what it was like, clicky [youtube.com]
Re: (Score:3, Insightful)
Neither is my great great grandmother's cookbook. Which really is a shame, as I strongly suspect the recipes make something more edible than what's served at the local coffee shop.
Re: (Score:1)
Re: (Score:2)
Can you elaborate on that?
Re: (Score:2)
Re: (Score:2)
In the case of Rongorongo, if it is a written language, then it is probably a written form of Rapa Nui, the language of Easter Island. In any case, since Rapa Nui is a Polynesian language we'd be able to compare it to other Polynesian languages. However, this has already been done with no success.
Part of the problem with Rongorongo and with other undeciphered scripts is that we don't know what counts as a distinct character, the character vs. glyph problem. It is not clear from the article if this system
Re: (Score:1)
Re: (Score:1)
The decipherment of Ugaritic took years and relied on some happy coincidences -- such as the discovery of an axe that had the word "axe" written on it in Ugaritic
Sorry but I had to lol at this. What was actually written on the axe was "Bill" - because it was his axe. And now the deciphered writings are all containing phrases like "That duck has an axe!", "The members voted and passed the new Axe" and "Monday - Remember to pay the Axes".
But I guess it does make sense to write "Axe" on an "Axe", just to be sure. Oh bugger - where'd I put my Axe, all I can find is my Bill.
Re: (Score:1)
Yeah if you work on a building site you engrave your name on your tools. But fire axes in my building are labelled, as are toilets and emergency exits, even though the labels are pretty obvious.
The Yarra River in Melbourne has that name because the local Aboriginal people pointed to the river and said that word, but it turned out later they were commenting on the rate of flow.
Re: (Score:2)
It's also why an inordinate number of mountains are called "your finger, you fool" and "who is this fool who doesn't know what a mountain is?"
Re: (Score:2)
Thanks for the summary. They are really limiting preconditions, worth pointing out. Still, it's a decent achievement (that's coming from someone with a PhD in NLP). The combinatorial problem is quite huge and you need some indication that your translation makes sense. This study shows that the use of cognates and/or word structure may help. I would think it's possible to get rid of the alphabet restriction.
But these assumptions have other possible uses too. I'd think that they could be used to find relation
Linear A Implications (Score:5, Interesting)
This is very cool for us undeciphered language fans.
In the article, the language author Andrew Robinson correctly points out that this computer program won't work for languages that don't have a known language that is close to them, say like for Linear A found on Crete, which is definitely not Greek like Linear B turned out to be. There is a lot of speculation that Linear A is a native Minoan (Cretan) script, largely unrelated to any other known script.
However, parallel with Linear A on Crete was a Cretan pictographic script, which may, or may not be related to Egyptian hieroglyphics. The Minoans had known trading ties to Egypt, which had written language long before them. If a relationship could be found (via this computer program) between the Minoan pictographic script and Egyptian hieroglyphs, then that might give insights into how the Linear A script was set up (which is a syllabary script).
The only difficulty is that there may not be enough of the pictographic script to work--I'd imagine you'd need a fair number of examples to really allow the computer to compare and contrast.
Re: (Score:3, Informative)
Actually, the program might be able to help: From what I understand, the Linear A alphabet is related to the Linear B alphabet, which has been deciphered, even though the languages may be different. We know a bit about context (what we have are mostly inventories), and we even know the meaning of one word: the one next to the total of the amounts in the inventory probably means "total". Furthermore, that word, ku-ro, is similar to a form of a Greek word for "total" ("houlon"), so it is very likely that the
Re: (Score:2)
Nah, it's not gonna be much help with Linear A. Although without a solid decipherment it's hard to be sure, a majority of the characters in Linear B also appear in Linear A. There are also names that appear in both scripts. This is of course no guarantee that all the symbols had the same values in both scripts, but it's a reasonable starting point.
Furthermore, Linear A is a syllabary, not an alphabet, and they used logograms extensively. Ugaritic, being an alphabet, is much simpler. They haven't demonstrated
Re: (Score:2)
Well, a more obvious implication is that if you fed in some percentage of Linear A texts and Cretan pictographic texts, you'd get virtually the same results as feeding in a different set of texts (ie: symbols should always equate to the same opposite number) if they are truly related.
This would at least let you identify if the texts are indeed of the same language, even if you can't read it, which is further along than we are now.
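A rough sketch of that consistency check, in Python. It assumes you already have two symbol-to-symbol mappings produced from two different subsets of the texts; how those mappings are produced is left out, and the function name and sample symbols are made up for illustration. It only measures how often the two mappings agree on the symbols they share.

```python
def mapping_agreement(mapping_a, mapping_b):
    """Fraction of shared source symbols that both mappings send to the same target."""
    shared = set(mapping_a) & set(mapping_b)
    if not shared:
        # No overlap means there is nothing to compare.
        return 0.0
    same = sum(1 for symbol in shared if mapping_a[symbol] == mapping_b[symbol])
    return same / len(shared)


# Hypothetical mappings learned from two different subsets of the texts.
run_one = {"x": "a", "y": "b", "z": "c"}
run_two = {"x": "a", "y": "c", "w": "d"}
score = mapping_agreement(run_one, run_two)
```

A score near 1.0 across many random subsets would be weak evidence that the two scripts are related; with corpora as small as these, a low score would not prove much either way.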
Re: (Score:2)
What's the difference between linear A and perl? ... drrrtish!
One day we might be able to read linear A
Re: (Score:2)
On the other hand it could be useful since a program could do such a test against every known language quickly as long as you rented enough CPU time.
I imagine such a task would take a long time to do by hand.
Re: (Score:2)
It's a good thought, and definitely worth a try once they've worked the algorithm more. (This is very preliminary stuff.)
But Linear A is going to be hard. There are a lot of fragments, but they're still fragments. The longest texts are only a few sentences long, and most are much shorter than that.
Nonetheless, it's a very promising start. When you combine what the algorithm can put out with the rest of what researchers know (semantic information that the algorithm doesn't have and probably won't any tim
Next step: (Score:2, Insightful)
If only we could find a language that is similar enough...
Re: (Score:2)
That's amazing. I will have to set aside some time to go through it. My guess is that the document is an attempt to create a written script for an Asian language which is only spoken. Cantonese comes to mind because speakers of that language currently borrow Mandarin and Chinese writing when they want to write stuff down.
Re: (Score:1)
It seems plausible, all the statistical and historical evidence backs it up, but it's quite strange that even with this critical hint nobody has solved the mystery yet.
Re: (Score:2)
Cantonese is a dialect of Chinese (as is Mandarin). In fact it is more akin to Middle Chinese than modern Mandarin is. It is commonly accepted that Tang dynasty poetry sounds better in Cantonese due to the more similar tonal structure. Basically, it is believed that Cantonese has gone through fewer changes over the (1500) years from Middle Chinese than Mandarin.
It is similar to how it is now believed that Elizabethan English sounds more like American English than British English / Received Pronunciation. When c
Re: (Score:2)
That's interesting, I have not come across this before.
I last worked in computational linguistics over five years ago, but when I left there was a good supply of techniques for automatically extracting meaning from an unknown text.
My own research was able to build up both a dendrogram and word vectors from any sufficiently large corpus, and a quick Google search turned up http://www.springerlink.com/content/fp17278783422256/ [springerlink.com] which shows that the field is continuing to develop. I would expect that by no
Re: (Score:2)
The problem is that one of their four assumptions is that the script for the undeciphered language maps characters 1-to-1 onto an existing language's script in a way such that letter frequencies are similar, which is something people have already looked for and which appears not to be the case with the Voynich manuscript.
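To make that assumption concrete, here is a naive sketch in Python. This is not the researchers' model, just the simplest possible baseline built on the same idea: rank the symbols of each script by how often they occur and pair them off rank by rank. The function names and the toy strings are invented for illustration.

```python
from collections import Counter


def rank_by_frequency(text):
    """Return the non-whitespace symbols of text, most frequent first."""
    counts = Counter(ch for ch in text if not ch.isspace())
    # Ties are broken by the symbol itself so the result is deterministic.
    return [ch for ch, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def frequency_mapping(unknown_text, known_text):
    """Pair symbols of the two scripts by frequency rank (a 1-to-1 guess)."""
    unknown_ranked = rank_by_frequency(unknown_text)
    known_ranked = rank_by_frequency(known_text)
    # zip stops at the shorter list, so surplus symbols stay unmapped.
    return dict(zip(unknown_ranked, known_ranked))


# Toy data: "x" is the most common unknown symbol, "a" the most common known one.
guess = frequency_mapping("xxx yy z", "aaa bb c")
```

If the letter frequencies of the two scripts don't line up, as seems to be the case for the Voynich manuscript, a pairing like this produces garbage, which is the point of the objection above.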
Re: (Score:1)
Wow, thanks for that one. I ended up sidetracked for hours reading about that, and trying to fathom its meaning for myself. Coolest "something new for today" I've learned in a very long time.
Interesting (Score:1)
I guess there might be some way to handle some possible differences in script type (comparing a language written with alphabetic system to one written using a syllabary or abjad) by producing a fake alternate writing system for the known language that would be
Sigh. (Score:2)
£337 $p33|{ |)00d$ (Score:1)
Re: (Score:2)
It probably would have problems with your leet hacker speak, but it isn't that hard to decipher. Then again, since some of the output I've had from OCR resembles your text, maybe not...
Voynich ? (Score:2)
So, when are they going to apply this to the Voynich manuscript [wikipedia.org] ?
"Axe" on axe ?!? (Score:1)
the discovery of an axe that had the word “axe” written on it in Ugaritic
A conversation in Semitic times:
"What's that?"
"Dunno..." examines the object "...it says on here that it's an axe."
Re: (Score:2)
the discovery of an axe that had the word “axe” written on it in Ugaritic
A conversation in Semitic times:
"What's that?"
"Dunno..." examines the object "...it says on here that it's an axe."
Honestly, I would think that it was the name of the person who owned it myself.
Re: (Score:2)
Honestly, I would think that it was the name of the person who owned it myself.
IIRC there are other examples of axes with the word "axe" written on them in languages known at the time in the area, so it wasn't that big a leap.
Google is missing out (Score:2)
Re: (Score:1)
they stopped participating. I don't think that they need too much external help.
Re: (Score:2)
Screw the article.... (Score:3, Informative)
You want to impress me... (Score:3, Funny)
Iberian from Basque Language? (Score:1, Interesting)
The Iberian language was spoken in Spain before the Roman Empire. It has some similarities with the Basque language. The texts in Iberian are few, but I wonder whether this language could be decoded using this tool.
Yes but... (Score:1)
undecipherable languages? (Score:2)
If they are undecipherable languages, how do they verify the results are accurate?
They don't know (Score:1)
But also note that, at present, this tool best serves as an aid to those trying to decipher languages. The article states that the output has limitations that make it rather inutile for the general publ
Re: (Score:2)
If they are undecipherable languages, how do they verify the results are accurate?
In general, there are two ways to test a decipherment. The first is to compare it to a bilingual text (e.g., the Rosetta Stone). Ancient Sumerian is apparently unrelated to everything else, but there were a lot of bilinguals so the decipherment is pretty firm.
The second method is to use the decipherment to decipher a new text. For instance, the first big test for Michael Ventris's decipherment of Linear B was using it on some newly discovered tablets. Obviously there's more uncertainty with this method,
TFA is unintentionally funny (Score:1)
An incidental challenge in developing a computer system that could decipher Ugaritic (inscribed on tablet) was developing a way to digitally render Ugaritic symbols.
Riiiiight. What did they feed their software? Photographs of stone tablets?
Shaka! (Score:1)
SHAKA! When the walls fell. :(
Now try it on something else. (Score:2)
Like this [wikipedia.org].
ObVoynich (Score:2)
(Insert obligatory wishful thinking about the Voynich Manuscript here.)
That's nice, but (Score:1)