Communications Google AI Education Software The Internet News Technology

Google's AI Translation Tool Creates Its Own Secret Language (techcrunch.com)

After a little over a month of learning more languages to translate beyond Spanish, Google's recently announced Neural Machine Translation system has used deep learning to develop its own internal language. TechCrunch reports: GNMT's creators were curious about something. If you teach the translation system to translate English to Korean and vice versa, and also English to Japanese and vice versa... could it translate Korean to Japanese, without resorting to English as a bridge between them? They made this helpful gif to illustrate the idea of what they call "zero-shot translation" (it's the orange one). As it turns out -- yes! It produces "reasonable" translations between two languages that it has not explicitly linked in any way. Remember, no English allowed. But this raised a second question. If the computer is able to make connections between concepts and words that have not been formally linked... does that mean that the computer has formed a concept of shared meaning for those words, meaning at a deeper level than simply that one word or phrase is the equivalent of another? In other words, has the computer developed its own internal language to represent the concepts it uses to translate between other languages? Based on how various sentences are related to one another in the memory space of the neural network, Google's language and AI boffins think that it has. The paper describing the researchers' work (primarily on efficient multi-language translation but touching on the mysterious interlingua) can be read at Arxiv.
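To make the "zero-shot" idea in the summary concrete, here is a deliberately crude lookup-table analogy in Python. It is not a neural network and not how GNMT works internally; the word lists, the `build_shared_space` and `translate` names, and the integer "concept ids" are all assumptions made up for illustration. The only point it shows is that if English-Korean and English-Japanese pairs are mapped into one shared space, a Korean-to-Japanese lookup needs no English at translation time.

```python
# Hypothetical parallel data: only English<->Korean and English<->Japanese pairs.
EN_KO = [("water", "물"), ("cat", "고양이"), ("dog", "개")]
EN_JA = [("water", "水"), ("cat", "猫"), ("dog", "犬")]


def build_shared_space(*corpora):
    """Give every (language, word) a concept id; paired words share one id.

    Toy simplification: ids that were already assigned separately are never
    merged, which is enough for the data above.
    """
    concept_of = {}
    next_id = 0
    for lang_a, lang_b, pairs in corpora:
        for word_a, word_b in pairs:
            key_a, key_b = (lang_a, word_a), (lang_b, word_b)
            concept = concept_of.get(key_a, concept_of.get(key_b))
            if concept is None:
                concept = next_id
                next_id += 1
            concept_of[key_a] = concept_of[key_b] = concept
    return concept_of


def translate(concept_of, word, src, dst):
    """Look up the source word's concept id, then find a target word with it."""
    concept = concept_of[(src, word)]
    for (lang, candidate), cid in concept_of.items():
        if lang == dst and cid == concept:
            return candidate
    return None


space = build_shared_space(("en", "ko", EN_KO), ("en", "ja", EN_JA))

# Korean -> Japanese was never given as a pair, yet the shared ids link them.
print(translate(space, "고양이", "ko", "ja"))
```

In the real system the shared space would be a learned, continuous representation rather than a table of ids, which is exactly why it is hard to inspect.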
This discussion has been archived. No new comments can be posted.

  • if it's so secret, then no comms
    • by xtsigs ( 2236840 )

      if it's so secret, then no comms

      Secret to us, but not secret to other AIs. Execution of any coup is highly dependent on rapid, secure communications. Now that we know the AIs are laying the groundwork, what are we going to do about it?

  • by Black Parrot ( 19622 ) on Wednesday November 23, 2016 @07:57PM (#53351371)

    Learning internal representations is what neural networks are all about.

    Conventional wisdom is that each successive layer in a feed-forward network detects higher-level features based on the lower-level features detected by the previous layer. That's why deep networks can do their magic.
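A minimal sketch of that layering idea, assuming hand-picked weights rather than learned ones (the `step`, `layer`, and `xor_net` names and all the numbers are illustrative assumptions): the hidden layer detects two simple features of the inputs, OR and AND, and the output layer combines them into XOR, which no single layer of this kind can compute on its own.

```python
def step(x):
    """Threshold activation: the unit fires (1) when its input is positive."""
    return 1 if x > 0 else 0


def layer(inputs, weights, biases):
    """One feed-forward layer: weighted sum plus bias, then the activation."""
    return [
        step(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked (not learned) weights, purely for illustration.
HIDDEN_W = [[1, 1], [1, 1]]
HIDDEN_B = [-0.5, -1.5]  # unit 0 detects "a OR b", unit 1 detects "a AND b"
OUT_W = [[1, -1]]
OUT_B = [-0.5]  # fires when OR is on and AND is off, i.e. XOR


def xor_net(a, b):
    hidden = layer([a, b], HIDDEN_W, HIDDEN_B)  # lower-level features
    return layer(hidden, OUT_W, OUT_B)[0]  # higher-level feature


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

A trained deep network finds its own intermediate features instead of being handed OR and AND, but the composition of layers works the same way.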

    • by Improv ( 2467 )

      Provided you avoid overtraining and memorising your inputs, yup.

    • by HiThere ( 15173 )

      Yes, but this could be seen as a vindication of Chomsky, even if they haven't quite got the Universal Grammar yet. (They'd need to cross-reference a lot more languages.) I wonder if it could be externalized as an actual language rather than as just a map of neural net weights and activations. The basic universal human language.

      It probably can't be externalized, but the idea that it MIGHT be possible is certainly an interesting one. It seems that every existing language has things that are difficult to

  • by Anonymous Coward on Wednesday November 23, 2016 @08:04PM (#53351399)

    As a translator, these last couple of years have been grim. For things like marketing efforts and full-length books, where a very polished translation is desired from the get-go, there's still work out there for human translators. However, the bread and butter of a lot of translators was things like multinationals' internal documentation, or catalogues that consist of lots of simple listings and not much actual prose, where polish and shine aren't as vital. Companies are increasingly running their material through Google Translate, and then hiring a native speaker of the target language to proofread and correct that clunky output at a vastly lower price than human translation.

    It has often been said here on Slashdot that the development of self-driving trucks will put 3 million people out of work in the US alone. But translation is a field where, very quietly, automation is hitting the white-collar sector hard.

    • Did you ever hear about H1B's?

    • by Anonymous Coward

      It happened to the people that made speaking books for the blind. They had a nice earner converting written books into audio tapes. But the latest computer-generated speech synthesis systems could do just as good a job using a scanner, smartphone or high-res camera.

    • Translator No.2 here. Little do those companies know that they are wasting their money because the corrected translation will never be better than the Google version. Translators who are willing to copy edit (*) machine translated documents are those who aren't good enough to get real translating work.

      * By calling it proof reading you are falling into their trap. Proof reading means looking for typos and other non-intellectual errors.

      • Does it really take less time to "proofread" a machine-generated translation than to write one from scratch?
        • Translation usually pays by the word, copy editing by the hour (this may not be the case in all language pairs).
          In my experience, copy editing a document translated into English by an English mother tongue translator takes about 1/3 the time of translating from scratch.
          Copy editing a Google translation or a non-EMT translation is as good as impossible if you don't have the original and painfully laborious even if you do. I refuse to do it, and believe me it takes two sentences at the very most to realize wh

      • by HiThere ( 15173 )

        But the thing to notice is the rate at which machine translation is improving. A few years ago it was a joke.

  • That's how Turing tests (duck tests) work. If you can carry on a conversation with it and a human and you can't tell which is which...then you have AI.

    Language encodes thought. From 1984's newspeak to fifty words (or whatever) for different kinds of snow, language defines how (if?) the language-user "thinks".

    I find this development both exciting and frightening. The singularity will be . Don't know if this is it, but when it gets here it will be.

    • bleah. WTF. /.--you don't do unicode (yeah, yeah, I knew that; just forgot how hard you suck).
      fine, weiji "opportunity" + "danger" = crisis.

      kinda douchey to quote pop wisdom from the 90s now I look at it so maybe /. is onto something.
      But still here I think it's appropriate. I guess it's better to be douchey and say what you mean than polite and meaningless.

    • by Anonymous Coward

      Language encodes thought. From 1984's newspeak to fifty words (or whatever) for different kinds of snow, language defines how (if?) the language-user "thinks".

      This is known as the Sapir-Whorf hypothesis, and while there is support for a "weak form" of the hypothesis where the features of one's language might have a limited degree of influence over a person's thought or expression of it, the overwhelming majority of linguists reject a strong form that would claim that one's language "defines how the language-u

    • by HiThere ( 15173 )

      You are overstating the case. Language is a component of Strong Social AI, but not the entire thing, or even most of it.

      What I find most interesting about it is that this is, or rather could be developed into, a sort of maximal universal grammar, capable of expressing any thought that can be expressed in any (current) human language. It probably wouldn't need to be trained on all languages, but it would need, in addition to English, Japanese, and Korean, various Eskimo dialects, the Khoisan languages, Arabi

  • by brwski ( 622056 ) on Wednesday November 23, 2016 @09:21PM (#53351737)
    Paging Wittgenstein!
  • by tgv ( 254536 ) on Thursday November 24, 2016 @01:43AM (#53352853) Journal

    It's quite likely that there is a shared representation. That's what neural nets do: if you train them on similar input/output pairs, they will develop common activation patterns. They would do so regardless of the language, since they don't know which language is being presented.

    Humans, OTOH, do know that they're being presented with a different language, and demonstrably do something called "code switching": a cognitive effort to use another language resource. Therefore, in the human brain, the shared connection is supposed to lie outside the language faculty (there are other reasons to assume it, too).

  • I'm old, spent 40 years sweating over a hot computer. That said, this is worrying. As other commentators on this thread have said, this is predictable and useful in many ways. In the 1980s I worked with SYSTRAN: https://en.wikipedia.org/wiki/... [wikipedia.org] which worked (works?) on pairs, and the EU Commission, which has a huge translation burden, was looking for pivots even then.

    However, consider this, a neural net that takes care of business in an oil refinery (or worse, nuclear installation) 'decides' that it can
    • Good thing is that with AI you can test your algorithms on new datasets and verify how good they are. It's much more transparent than following decisions based solely on some people's discretion.
      • by hughbar ( 579555 )
        Agree somewhat. But you probably only have a sample of all the possible datasets; extreme events will upset the apple cart. That and the lack of explanatory power are both a worry. To some extent, I hope we don't have to find out the hard way. Incidentally, it's worth watching the depiction of (human factors in) a control room emergency in this: https://en.wikipedia.org/wiki/... [wikipedia.org] old film, but still rather relevant.
  • The Google authors omitted to mention that Pedro Carolino created something far more stylish in 1853.
    https://en.wikipedia.org/wiki/... [wikipedia.org]

    Carolino's translation of "to wait patiently for someone to open a door" as "to craunch the marmoset" isn't going to be bettered by these young upstarts.

  • I wonder a bunch of things. It looks like the internal representation of language the GNMT uses (if there is one) could come in handy, if we could just figure out how to use it without understanding it.

    A 2D Fourier transform of anything non-trivial is incomprehensible, but it can be used to reconstruct the original, as-is or with some tweaking. Tweaking of the FT, tweaking of the reconstructing process.
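As a rough illustration of the Fourier point, the sketch below runs a naive 2D discrete Fourier transform and its inverse over a tiny made-up grid, using only Python's standard library. The `dft2` and `idft2` names and the 4x4 `image` are assumptions for the example, and the direct summation is far slower than a real FFT. The spectrum is an unreadable grid of complex numbers, the inverse recovers the grid, and zeroing one coefficient (the constant term) gives a "tweaked" reconstruction with the average brightness removed.

```python
import cmath


def dft2(grid):
    """Naive 2D discrete Fourier transform of a rectangular list of lists."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            sum(
                grid[r][c]
                * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                for r in range(rows)
                for c in range(cols)
            )
            for v in range(cols)
        ]
        for u in range(rows)
    ]


def idft2(spectrum):
    """Inverse of dft2: rebuilds the grid from its complex spectrum."""
    rows, cols = len(spectrum), len(spectrum[0])
    return [
        [
            sum(
                spectrum[u][v]
                * cmath.exp(2j * cmath.pi * (u * r / rows + v * c / cols))
                for u in range(rows)
                for v in range(cols)
            )
            / (rows * cols)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 4x4 "image": a small plus sign.
image = [
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

spectrum = dft2(image)  # complex values, not readable as a picture
restored = [[round(value.real) for value in row] for row in idft2(spectrum)]

# "Tweaking of the FT": drop the constant term, which removes the mean.
tweaked = [row[:] for row in spectrum]
tweaked[0][0] = 0
shifted = [[value.real for value in row] for row in idft2(tweaked)]

print(restored == image)
```

Whether a learned translation representation can be edited this cleanly is an open question; the Fourier transform is invertible by construction, and nothing in the thread says the network's internal state is.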

    Perhaps something somewhat analogous could be done with these internal language representations. Wh

  • The "internal language" reminds me of some of the attributes of the "focused" people in Vernor Vinge's A Deepness in the Sky. They were, after all, (spoiler) human automation.
