AI News Science

'Hutter Prize' for Lossless Compression of Human Knowledge Increased to 500,000€ (hutter1.net) 65

Baldrson (Slashdot reader #78,598) writes: AI professor Marcus Hutter has gone big with his challenge to the artificial intelligence [and data compression] community, first announced on Slashdot in 2006. A 500,000€ purse now backs The Hutter Prize for Lossless Compression of Human Knowledge... Hutter's prize incrementally rewards distilling Wikipedia's storehouse of human knowledge down to its essence.
That essence is a 1-billion-character excerpt of Wikipedia called "enwik9" -- approximately the amount that a human can read in a lifetime. And 14 years ago, Baldrson wrote a Slashdot article explaining how this long-running contest has its roots in a theory which could dramatically advance the capabilities of AI: The basic theory, for which Hutter provides a proof, is that after any set of observations the optimal move by an AI is to find the smallest program that predicts those observations and then assume its environment is controlled by that program. Think of it as Ockham's Razor on steroids.
Writing today, Baldrson argues this could become a much more sophisticated Turing Test. The underlying theory is formally called Algorithmic Information Theory, or AIT; according to Hutter's "AIXI" theory, AIT is essential to Universal Intelligence.
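A rough sketch, in standard Solomonoff-induction notation rather than Hutter's full AIXI formalism, of what "find the smallest program that predicts those observations" means:

```latex
% Universal prior over observation strings x: every program p that makes a
% reference machine U print something beginning with x contributes weight
% 2^{-|p|}, where |p| is the program's length in bits.
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-|p|}
% Predict the continuation b that maximizes M(xb)/M(x).  The shortest program
% consistent with x dominates the sum, so better compression of the
% observations implies better prediction -- "Ockham's Razor on steroids".
```

Compression and prediction are two views of the same quantity here, which is why a compression benchmark can serve as an intelligence test at all.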

Hutter's judging criterion is superior to Turing tests in 3 ways:

1) It is objective
2) It rewards incremental improvements
3) It is founded on a mathematical theory of natural science.

Detailed rules for the contest and answers to frequently asked questions are available.

This discussion has been archived. No new comments can be posted.

  • Scam artists (Score:4, Insightful)

    by 110010001000 ( 697113 ) on Saturday February 22, 2020 @03:36PM (#59754928) Homepage Journal

    Why is the "AI community" completely full of scam artists?

    • Because any scientist who actually tries to solve the Hard AI problem ends up getting too drunk and depressed to speak out against nonsense?
    • Because we're at least a hundred years from Human-level AI with modern tech, and that's assuming it's even possible using transistors. The brain is way more complex than that: it changes size, shape, and connection count over time, and has aggregate effects like neurotransmitters floating semi-freely in the fluid around it. Meanwhile we have no computer chips, traditional or quantum, that have that sort of reconfiguration anywhere on their far-future horizon (and no, FPGAs aren't even close). What we have are s
      • by gweihir ( 88907 )

        Well, actually there is no scientifically sound reason to believe building a brain-equivalent would result in something intelligent either. That is unless you are a physicalist, but physicalism is religion and has no scientific basis.

        • That is unless you are a physicalist, but physicalism is religion and has no scientific basis.

          What you call "physicalism" is what everyone else calls "science".
          Your superstitious nonsense has much more in common with religion.

    • by gweihir ( 88907 )

      I don't think it is. I think it is like 95% clueless hopefuls that desperately want the idea to work and 5% scam artists profiteering from that. I am unsure about the underlying motivations, but I suspect the 95% want some artificial slaves that they can order around and look down on.

    • by hey! ( 33014 )

      Because language. We use the word "intelligence" to allude to a whole bunch of different things related to the ways humans process information. Then we act as if "intelligence" were a uniform category of things that we can measure or simulate *as a whole*.

      This naturally leads to overly broad expectations when you talk about "artificial" intelligence. Then when you find out how some specific form of AI works, you inevitably feel shortchanged.

      Take regression trees, one of the most practically successful approaches t

    • Because there are lots of bucks available at least until the next "AI Winter".
  • by retchdog ( 1319261 ) on Saturday February 22, 2020 @03:55PM (#59754956) Journal
    Why the fuck would you revamp a prize in NLP AI while retaining a restriction that rules out the past few very fruitful years of GPU/TPU-based neural net models? "Restrictions: Must run in 100 hours using a single CPU core and <10GB RAM and <100GB HDD on our test machine." [hutter1.net] I understand that practical run-time is a concern, but a restriction of 100 hours on a single CPU core is ridiculous. GPUs aren't even that expensive; a 2080Ti is cheaper (adjusted for inflation) than a C=64 was on release in 1982, and there are plenty of affordable options to rent time on the hardware.
    • Presumably to prevent the "solution" that happened in chess... where scientists were focused on getting computers to think like humans, and then suddenly, as a publicity stunt, a corporation built a huge tree-searching computer and pretended they had taught the computer to think.

      I'm not sure limiting it in this way is the right approach, but it is certainly a natural reaction.
      • right, some limitation is necessary, but when all of the best NLP research of the past few years (by far, like not even a competition...) has been deep neural net models, it's absurd to completely ban GPU usage on your test platform.
      • i mean, any problem can be stretched artificially long by restricting its implementation space.

        if you need the algorithm to run fast on a single CPU core, then there is still no effective AI for Go, since those run on GPU/TPUs.

        now, if one were ostensibly interested in getting computers to play Go at all, would they allow the use of GPUs? yes, they would. if GPUs were some scarce resource, i'd understand maybe, but uh you can buy them at walmart.

    • Look, you can compress stuff into an almost arbitrarily small size of N bytes by just computing a cryptographic hash-sum of that size over your text body, then letting your "decompressor" iterate over every possible text starting from an empty string and check whether the hash-sum comes out the same. Of course, not only would the decompression be unusably slow, the compressor would also have to make sure the N-byte hash sum is first hit when the actual text body becomes the one to be compressed, so it takes even longe
      • I'm not sure this scheme would work the way you seem to think it would work.
        • by ffkom ( 3519199 )
          The winning compressor for the 100MB corpus produced 15'284'944 bytes of compressed output. Yes, I am very sure that a cryptographic hash-sum of, let's say, 12'000'000 bytes in size would reproduce the uncompressed corpus as its first matching result when iterated. But this is impractical to test. Maybe we could start testing with a 7 byte corpus of real, grammatically correct text, let's see how big the output of "xz" (or the likes) from that is, call that number of bytes N, and then find out whether the N-1 low-order bytes of a sha256 hash sum would be reproduced for a non-corpus text when iterating through all texts up to the corpus text. I am pretty confident that "compression by iteration" will succeed with N-1 bytes and win this compression contest.
          • Reposting again for truth.

            You have no idea how stupid that is. Turing's corpse just blew dehydrated flecks of cyanide-laced apple skin into the moldy remains of his nasal cavity.

            This only works if you can iterate within the typical set of the target domain.

            The minimum fully distinguished hash length for a truly random input string is the same size as the input string, plus an anti-collision pad of about 16 bytes (ambiguous cases reduced to one part in 2^128).

            What you've actually described is a data inflation method.

          • Yes, I am very sure that a cryptographic hash-sum of let's say 12'000'000 bytes size would reproduce the uncompressed corpus as its first matching result when iterated.

            How sure are you? The probability of what you're describing is something like 10^-1000000.

            Maybe we could start testing with a 7 byte corpus of real, grammatically correct text

            Why grammatically correct? Why do you assume that?

            let's see how big the output of "xz" (or the likes) from that is, call that number of bytes N, and then find out whether the N-1 low-order bytes of a sha256 hash sum would be reproduced for a non-corpus text when iterating through all texts up to the corpus text. I am pretty confident that "compression by iteration" will succeed with N-1 bytes and win this compression contest.

            Algorithms are supposed to be general. If you have a different program for different inputs, it defeats the point of having algorithms in the first place. Having something that succeeds for one input is useless.
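For concreteness, a minimal sketch (hypothetical Python, not anyone's actual entry) of the hash-iteration scheme discussed above, with comments on why its brute-force "decompression" cannot work within any realistic limit:

```python
import hashlib
from itertools import count, product

def hash_compress(text: bytes, n: int) -> bytes:
    """'Compress' text to the n low-order bytes of its SHA-256 digest."""
    return hashlib.sha256(text).digest()[-n:]

def hash_decompress(tag: bytes) -> bytes:
    """Brute-force 'decompression': enumerate every byte string, shortest
    first, until one hashes to the stored tag.

    Two fatal problems with the scheme:
      1. Runtime grows as roughly 256**len(original) -- astronomically beyond
         any contest time limit for inputs longer than a few bytes.
      2. If the tag is shorter than the original, some *other* string will
         almost certainly match first, so the output need not even be correct.
    """
    n = len(tag)
    for length in count(0):
        for candidate in product(range(256), repeat=length):
            s = bytes(candidate)
            if hashlib.sha256(s).digest()[-n:] == tag:
                return s

# Tiny demo only -- already slow for 3-byte inputs, hopeless for enwik9:
# hash_decompress(hash_compress(b"hi", 8)) == b"hi"
```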

      • uh, yeah? no shit. so put a restriction of "100 hours on a CPU core or 20 hours on 8x2080Ti GPUs" (or whatever, obviously the 100 and 20 are tweaked). your "point" is literally a strawman. to avoid the "attack" you describe of literally searching the fucking hashspace on a multi-GB input, all you need is for the algorithm to terminate before the end of the fucking universe... limiting it to 100 hours on an i7 is arbitrary and stupid.
        • by ffkom ( 3519199 )
          You would reduce the group of potential participants to a tiny fraction of what it is today if participants were expected to have "8x2080Ti GPUs" at their disposal. The FAQ of the contest explains very comprehensively why the conditions have been set as they are, and the size of the corpus to compress has also been chosen carefully to allow for solutions that can be attempted without investing many thousands of USD.
          • frankly, to improve on Alexander Rhatushnyak's work would be an endeavor of several years of one's life. to be honest, a few thousand USD shouldn't be a problem; even Nigeria has universities with real computers in them.

            google and amazon love throwing free compute credits to get students to use their platforms.

            but this is beside the point; the loss of denying an entire family of models far outweighs the "benefit" of constraining execution to an i7 because of accessibility. this objection is a cartoonish par

            • by ffkom ( 3519199 )
              No "family of models" is denied entry. Feel free to emulate whatever GPU or RAM you miss on the available CPU and swap space.
          • okay so you tried a technical strawman and that didn't work, so now you're using some accessibility argument which is also garbage.

            maybe just give up? idk. we could both be doing better things.

            • by ffkom ( 3519199 )
              Yes, for example you could fund another competition that better fits your preference of what hardware should be thrown at the topic.

              What did not work is that you did not convince the ones running this competition that they should adopt your favored rules.
      • by epine ( 68316 )

        You have no idea how stupid that is. Turing's corpse just blew dehydrated flecks of cyanide-laced apple skin into the moldy remains of his nasal cavity.

        This only works if you can iterate within the typical set of the target domain.

        The minimum fully distinguished hash length for a truly random input string is the same size as the input string, plus an anti-collision pad of about 16 bytes (ambiguous cases reduced to one part in 2^128).

        What you've actually described is a data inflation method.

        Crank score 10/10
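The counting argument behind that hash-length claim, reconstructed as a sketch (my notation, not epine's exact derivation):

```latex
% Pigeonhole: 2^n distinct n-bit inputs cannot be told apart by fewer than
% n digest bits, so an identifying "hash" of arbitrary (incompressible) data
% must be at least as long as the data itself.  With k extra bits, the
% expected number of spurious matches among the ~2^n other candidates is
E[\text{spurious matches}] \approx 2^{n} \cdot 2^{-(n+k)} = 2^{-k},
\qquad k = 128\ \text{bits} = 16\ \text{bytes} \;\Rightarrow\; 2^{-128},
% which is the "anti-collision pad of about 16 bytes" above.
```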

      • by rtb61 ( 674572 )

        You will always run into storage problems with binary, storing stuff as off and on. Want to up storage density? Store stuff as off and set to a specific frequency. Depending upon how well you can store and keep the frequency active, polling it should be easy, same as transmitting it, and for processing, convert it back into binary. Store it in ten different frequencies and you hugely increase data storage per transistor substitute.

    • by irchans ( 527097 )

      Actually, "a restriction of 100 hours on a single CPU core is" not that ridiculous. It often takes much more than 100 hours to train a neural net, but it takes much less time to run a neural net that has already been trained.

      • by ffkom ( 3519199 )
        Also, the software winning the past 100MB contest has not seen widespread use for lossless compression outside this contest. It is already so much more optimized for size than for resource usage that allowing even more resource usage would probably just result in winners of even less generic applicability.
        • "even more resource usage"? okay, fair enough, let's denominate it in actual dollars as opposed to the arbitrary "you get 100 hours on whatever Hutter had lying around".

    • by Baldrson ( 78598 ) *

      Separating the 2 questions, 1) "Why not just use BERT?" and 2) "Why exclude parallel hardware?"

      1) BERT's algorithmic opacity leaves open an enormous "argument surface" regarding "algorithmic bias". There are >100,000 hits for "algorithmic bias [google.com]" due to inadequately principled model selection criteria used by the ML community, thence google's algorithms. Algorithmic Information Theory cuts down the argument surface to its mathematical definition of natural science:
      Solomonoff Induction. Basically

      • 1) If BERT can magically sneak in some spooky black magic, then you're admitting that enwik9 is not an adequate data set for testing, end of story. I don't understand the relevance of your ramblings about AIT and "Cartesian causality"; they sound like hokum excuses to me. (note that I understand AIT, KC, and SI at least at the level of an intro textbook; I just don't understand your argument using them.)

        Could you give a more explicit example of what you mean by "bias"? I mean, if this so-called "bias" achie

        • by Baldrson ( 78598 ) *

          retchdog writes:

          If BERT can magically sneak in some spooky black magic, then you're admitting that enwik9 is not an adequate data set for testing, end of story.

          No. All BERT must do is beat the Large Text Compression Benchmark -- a benchmark that uses enwik9 as the corpus.

    • Oh, that's easy: Please remember that the end game here is not about compression. It's about gaining insights on how AGIs might actually work. Admittedly, there are some domains where NNs create marvellous results, but in essence, they're just very powerful parameter fitting machines for preconceived models. You don't gain any insights from that. Secondly, there's not a single instance where this approach has been demonstrated to provide anything that might be related to real semantics.
      • Okay, well, you tell me then: what insights have we gained about AGI so far by hyper-optimizing the PAQ series of compressors? That's what has dominated this competition so far.

  • Annoying (Score:5, Insightful)

    by lobiusmoop ( 305328 ) on Saturday February 22, 2020 @04:13PM (#59754996) Homepage

    The Hutter Prize claims to be promoting AI development, but its demand for lossless compression goes against real AI.

      People don't memorize texts that way, they map them to pre-existing lexical knowledge as a form of lossy compression, maintaining the meaning and semantics while potentially losing the fine detail of syntax and grammar.

    Allowing a degree of loss in the compression while maintaining the meaning of the text would allow for much higher compression.

    • not really. you can think of lossless compression as "lossy compression followed by a diff that patches up the rest". the "a diff that patches up the rest" is pretty trivial, and the better the lossy compression model, the better performance you'll get anyway. so i don't see how it isn't "real AI". developing better lossy compression will mean either that the lossy compression is better or that the diff will be smaller (and hopefully both!). it's just to avoid having to dig into the weeds on technicalitie
      • uh, i'm glad i amused someone, but, yes, this is actually how it works. FLAC, for example, uses a lossy predictor based on past signal to predict the following signal, and combines this with a separate Golomb-Rice encoding for the residuals (i.e. for the value true_signal - lossy_prediction). it's not a joke. modern compression theory can handle this, and it's not a problem. if you come up with a better lossy predictor, that's great! you'll still beat someone else using the same error coding. lobiusmoop's
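A toy illustration of that "lossy predictor plus residuals" structure (a hypothetical sketch, not FLAC's actual codec):

```python
from typing import List, Tuple

def encode(samples: List[int]) -> Tuple[int, List[int]]:
    """Split a (non-empty) signal into (first sample, residuals) using a
    trivial 'previous value' predictor.  A real codec such as FLAC then
    stores the residuals with an entropy code (e.g. Golomb-Rice); the better
    the predictor, the smaller the residuals, the smaller the output."""
    residuals = [samples[i] - samples[i - 1] for i in range(1, len(samples))]
    return samples[0], residuals

def decode(first: int, residuals: List[int]) -> List[int]:
    """Invert encode(): lossy prediction + stored residual is exactly lossless."""
    out = [first]
    for r in residuals:
        out.append(out[-1] + r)
    return out

signal = [10, 12, 13, 13, 15, 20]
assert decode(*encode(signal)) == signal  # perfect round trip
```

Improving the predictor shrinks the residuals, which is why progress on lossy prediction shows up directly as progress on a lossless benchmark.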
        • You're not wrong. However, "absorb all this wiki text and regurgitate it" is a much narrower "AI" problem than "... and explain it" or "... and apply it".

          It boils down to writing an efficient, lossy text predictor. While writing a better predictor might be interesting, and might be useful for gaining insight into some form of AI, I'm not convinced it's a worthwhile exercise.

          • I'm not sure tbh. There is a significant gap between the estimated entropy of English text (based on human English-speakers' predictive ability), and the compression rate of the best compressors. I am willing to believe that "compressing English to ~1.3 bits per character" is an AI-complete [wikipedia.org] problem.

            I am deeply skeptical, however, that the Hutter prize will get us there. So far it has been dominated by very complex hand-tuned compressors implementing a lot of clever heuristics. I believe that a marginal gain
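For scale, the unit conversion behind those bits-per-character figures (my arithmetic, treating enwik9's 10^9 bytes as roughly 10^9 characters):

```latex
% A corpus of 10^9 characters compressed at b bits per character occupies
% b \cdot 10^9 / 8 bytes, so
\text{size}(b) = \frac{b \cdot 10^{9}}{8}\ \text{bytes}, \qquad
\text{size}(1.3\ \text{bit/char}) \approx 1.6 \times 10^{8}\ \text{bytes} \approx 162\ \text{MB}.
```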

      • by irchans ( 527097 )

        Upvote Parent.

      • by Megol ( 3135005 )

        Let's take some piece of information inside a text that's very simple: "a = pi * sin(n)"
        How many ways can this relation be expressed? An AI could understand it, map it into something internal, and when prompted repeat it (as it is used in the original text). If this equation isn't actually used inside the text, instead being an editing artifact, the AI could understand the text while ignoring it - just as humans skip things once they understand they aren't vital. But let's assume the information is important, why sho

        • "Even if an AI understands things this is a compression task rather than something AI related. Statistics is more important than understanding."

          yes, that's a good criticism of the Hutter prize in general, imho. it's like it's 1600 and you start a prize to reach the moon: for every meter closer to the moon one gets, they receive a few grams of gold. people might climb mountains, build tall spires and maybe build some elaborate hot air balloons, but it's obvious that these attempts are futile for actually rea

        • by Baldrson ( 78598 ) *

          Megol asks:

          But let's assume the information is important, why should the raw text be repeated instead of the myriad variants that express the same thing in a different way?

          For the same reason scientists don't throw out the raw measurements in their experiments just because they depart from theory. One can assert that the form of the expression is irrelevant to the underlying knowledge, but this is a sub-theory based on the agent's comprehension of the entire corpus. This becomes particularly relevant when at

    • by Baldrson ( 78598 ) *

      Lossy Compression Is Confirmation Bias.

      One Man's Noise Is Another Man's Ciphertext.

    • by ffkom ( 3519199 )
      Who says that "artificial intelligence" has to mimic how natural intelligence works? This is a very practical contest to get something very specific done. A compressor may be no more similar to a human brain than a "Go" program is, but that does not make it "unintelligent".
    • by Anonymous Coward

      I suspect you are a bit confused about the terms and what they are applied to.

      Your first sentence says everything else in your post is incorrect. That sentence is itself incorrect.
      The rest of your post is exactly right, and is exactly what Hutter is aiming for.

      People don't memorize texts that way, they map them to pre-existing lexical knowledge as a form of lossy compression, maintaining the meaning and semantics while potentially losing the fine detail of syntax and grammar.

      That is what is meant by lossless compression here.
      You don't memorize the exact words in 10 books on the same subject. The info in all the books gets merged into one mental model, it is NOT kept individually.

      Allowing a degree of loss in the compression while maintaining the meaning of the text would allow for much higher compression.

      The meaning is exactly what lossless is refe

    • There are a few potential problems with your suggestion. First and foremost, remember that this is not a dick-measuring contest about compression. That's not the endgame at all. It's about making progress in the field of AGIs! I personally find Hutter's hypothesis and approach very compelling: find the best possible compression for human knowledge and then try to derive insights from that. By definition, an AGI wouldn't have any issues with "noise" that's due to grammar, syntax or any other semantically non
  • The basic theory, for which Hutter provides a proof

    I don't think the word "proof" means what Hutter thinks it means.

  • Pfft... 13th century, and a religious nut. Can't possibly have any validity.

    --Sadly, most Slashdotters
  • That essence is a 1-billion-character excerpt of Wikipedia called "enwik9" -- approximately the amount that a human can read in a lifetime.

    Okay so it takes your whole lifetime to read it, and then your life is over and you can't do anything with whatever little you were able to memorize or learn of the whole thing.

    So what's the point exactly?

  • of reading a billion characters? You're not going to remember or use nearly all of it. This goes along with reading books all the time: I remember reading some article about how Bill Gates or the like was complaining that he is not going to read all of the books he wants simply because he does not have the time, even though he is currently reading like 5-10 a week. Why?

    • You're missing the point.

      Saying it's the amount one person can read in a lifetime just gives a sense of the size of the information contained. Nobody said you're *supposed* to read it. Consider it the equivalent of "amount of information a human gains in a lifetime", and we want to make an AI that contains that knowledge, but in a useful, condensed way rather than just stuffing data into a hard-drive.

      The point is that the ultimate goal here is to make an AI that contains the sum of human knowledge as a *star

      • Also, consider the stated goal is to "compress" the 1-billion characters as much as possible. i.e. even if you're going to read the output, the goal here is to reduce the lifetime's worth of raw information down to ... much less information.

        • by Megol ( 3135005 )

          ... of which the majority will be irrelevant for AI purposes instead representing layout, writing style, order of descriptions, ...

  • Why is Baldrson glorious enough to have his name hotlinked three times, and his five digit userid waved in our faces like it was his penis?

    • by ktakki ( 64573 )

      He's a classic net.kook like Willis Carto, who wants to depopulate cities because they're too Jewish, and go back to a pastoral northern European fairy tale past.

      k.

  • The simplest possible explanation is NOT most likely to be right! That is merely wishful thinking, with no basis in observed reality.

    It's just the first thing we should TRY! Because it is more efficient to try the simpler solutions first, as you are more likely to end up with the right one earlier.
    But it might be, can be, and often *is*, some crazy convoluted system that is way too complicated to be fun.

    I mean just look at quantum physics.
    Or the citrate cycle.
    They are closer to Rube-Goldberg machines, just plain sill

    • by Megol ( 3135005 )

      You don't understand the razor.

    • by fygment ( 444210 )

      You're confusing the meaning of "simplest", likely because you've only ever heard the 'lay' interpretation of Occam's law of parsimony. The actual statement is: "Entities should not be multiplied without necessity." Now do you get it?

  • Gravity waves compress and expand all human knowledge losslessly. Can I collect my prize now?
  • I think there's reasonable support for the notion that human intelligence might be a form of GAN (generative adversarial network) running on hardware which is very suitable for some domains. But that just means that it learns to generate "good enough" scenarios, not that it has any precision at all. Sure, you can add precision with a fix-up diff, but the fix-up diff would likely be many orders of magnitude larger than the encoded network's size. Past a certain point, stunning improvements in the AI capab
