'Hutter Prize' for Lossless Compression of Human Knowledge Increased to 500,000€ (hutter1.net) 65
Baldrson (Slashdot reader #78,598) writes: First announced on Slashdot in 2006, AI professor Marcus Hutter has gone big with his challenge to the artificial intelligence [and data compression] community. A 500,000€ purse now backs The Hutter Prize for Lossless Compression of Human Knowledge... Hutter's prize incrementally awards distillation of Wikipedia's storehouse of human knowledge to its essence.
That essence is a 1-billion-character excerpt of Wikipedia called "enwik9" -- approximately the amount that a human can read in a lifetime. And 14 years ago, Baldrson wrote a Slashdot article explaining how this long-running contest has its roots in a theory which could dramatically advance the capabilities of AI: The basic theory, for which Hutter provides a proof, is that after any set of observations the optimal move by an AI is to find the smallest program that predicts those observations and then assume its environment is controlled by that program. Think of it as Ockham's Razor on steroids.
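For readers who want the formal version of "find the smallest program that predicts the observations", these are the standard textbook objects behind it -- the notation below is generic, not a quotation of Hutter's own equations. Here U is a fixed universal Turing machine, x the observations so far, and ℓ(p) the length of program p:

```latex
% Kolmogorov complexity: length of the shortest program that outputs x.
K(x) = \min \{\, \ell(p) : U(p) = x \,\}
% Solomonoff's universal prior: every program whose output begins with x
% contributes weight 2^{-length}, so the shortest consistent programs dominate.
M(x) = \sum_{p \,:\, U(p)\ \text{begins with}\ x} 2^{-\ell(p)}
% Prediction of the next symbol b -- "Ockham's Razor on steroids":
\Pr(b \mid x) = \frac{M(xb)}{M(x)}
```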
Writing today, Baldrson argues this could become a much more sophisticated Turing Test. Formally it is called Algorithmic Information Theory or AIT. AIT is, according to Hutter's "AIXI" theory, essential to Universal Intelligence.
Hutter's judging criterion is superior to Turing tests in 3 ways:
1) It is objective
2) It rewards incremental improvements
3) It is founded on a mathematical theory of natural science.
Detailed rules for the contest and answers to frequently asked questions are available.
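As a rough illustration of how the incremental awards work, here is a sketch assuming the commonly cited payout rule of prize money proportional to the relative size reduction over the current record; the record size and improvement below are made up, and the linked rules are authoritative:

```python
# Hedged sketch of a proportional payout rule: the fund pays out in
# proportion to the relative improvement over the current record.
# Thresholds, rounding, and the example record size are illustrative
# assumptions, not official contest parameters.
PRIZE_FUND_EUR = 500_000

def payout(new_size_bytes: int, record_size_bytes: int) -> float:
    """Award for shrinking the record archive from record_size_bytes to new_size_bytes."""
    relative_improvement = (record_size_bytes - new_size_bytes) / record_size_bytes
    return PRIZE_FUND_EUR * relative_improvement

# Example: a hypothetical 2% improvement over a hypothetical 115,000,000-byte record.
print(payout(112_700_000, 115_000_000))  # -> 10000.0
```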
Scam artists (Score:4, Insightful)
Why is the "AI community" completely full of scam artists?
Re: (Score:2)
Well, actually there is no scientifically sound reason to believe building a brain-equivalent would result in something intelligent either. That is unless you are a physicalist, but physicalism is religion and has no scientific basis.
Re: (Score:2)
That is unless you are a physicalist, but physicalism is religion and has no scientific basis.
What you call "physicalism" is what everyone else calls "science".
Your superstitious nonsense has much more in common with religion.
Re: (Score:2)
I don't think it is. I think it is like 95% clueless hopefuls that desperately want the idea to work and 5% scam artists profiteering from that. I am unsure about the underlying motivations, but I suspect the 95% want some artificial slaves that they can order around and look down on.
Re: (Score:2)
Because language. We use the word "intelligence" to allude to a whole bunch of different things related to the ways humans process information. Then we act as if "intelligence" is a uniform category of things that we can measure or simulate *as a whole*.
This naturally leads to overly broad expectations when you talk about "artificial" intelligence. Then when you find out how some specific form of AI works, you inevitably feel shortchanged.
Take regression trees, one of the most practically successful approaches t
just use BERT? oh wait... (Score:4, Insightful)
Re: (Score:3)
I'm not sure limiting it in this way is the right approach, but it is certainly a natural reaction.
Re: (Score:2)
i mean, any problem can be stretched artificially long by restricting its implementation space.
if you need the algorithm to run fast on a single CPU core, then there is still no effective AI for Go, since those run on GPU/TPUs.
now, if one were ostensibly interested in getting computers to play Go at all, would they allow the use of GPUs? yes, they would. if GPUs were some scarce resource, i'd understand maybe, but uh you can buy them at walmart.
because "decompression by iteration" sucks (Score:2)
cyanide snortable (Score:3)
Reposting again for truth.
You have no idea how stupid that is. Turing's corpse just blew dehydrated flecks of cyanide-laced apple skin into the moldy remains of his nasal cavity.
This only works if you can iterate within the typical set of the target domain.
The minimum fully distinguished hash length for a truly random input string is the same size as the input string, plus an anti-collision pad of about 16 bytes (ambiguous cases reduced to one part in 2^128).
What you've actually described is a data inflatio
Re: (Score:2)
Yes, I am very sure that a cryptographic hash sum of, let's say, 12,000,000 bytes in size would reproduce the uncompressed corpus as its first matching result when iterated.
How sure are you? The probability of what you're describing is something like 10^-1000000.
Maybe we could start testing with a 7 byte corpus of real, grammatically correct text
Why grammatically correct? Why do you assume that?
let's see how big the output of "xz" (or the like) from that is, call that number of bytes N, and then find out whether the N-1 low-order bytes of a sha256 hash sum would be reproduced for a non-corpus text when iterating through all texts up to the corpus text. I am pretty confident that "compression by iteration" will succeed with N-1 bytes and win this compression contest.
Algorithms are supposed to be general. If you have a different program for different inputs, it defeats the point of having algorithms in the first place. Having something that succeeds for one input is useless.
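To put rough numbers on the exchange above (the corpus and hash sizes are the ones mentioned in the thread; the rest is plain counting, offered as a sketch rather than as anyone's exact argument):

```python
# Back-of-the-envelope counting for the "store a truncated hash, then
# iterate candidate texts until one matches" scheme discussed above.
# Sizes from the thread: a ~10^9-byte corpus (enwik9) and a hypothetical
# 12,000,000-byte stored hash value.
corpus_bits = 8 * 10**9      # there are 2**corpus_bits possible corpora of this length
hash_bits = 8 * 12_000_000   # the stored hash can only distinguish 2**hash_bits values

# Roughly 2**(corpus_bits - hash_bits) wrong candidates collide with the
# stored hash, so the first match found by enumeration is essentially
# never the real corpus -- nothing has actually been compressed.
log2_false_matches = corpus_bits - hash_bits
print(f"expected spurious matches: about 2**{log2_false_matches}")
# -> expected spurious matches: about 2**7904000000
```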
Re: (Score:2)
frankly, to improve on Alexander Rhatushnyak's work would be an endeavor of several years of one's life. to be honest, a few thousand USD shouldn't be a problem; even Nigeria has universities with real computers in them.
google and amazon love throwing free compute credits to get students to use their platforms.
but this is beside the point; the loss of denying an entire family of models far outweighs the "benefit" of constraining execution to an i7 because of accessibility. this objection is a cartoonish par
Re: (Score:2)
okay so you tried a technical strawman and that didn't work, so now you're using some accessibility argument which is also garbage.
maybe just give up? idk. we could both be doing better things.
Re: (Score:2)
What did not work is that you did not convince the ones running this competition that they should adopt your favored rules.
Re: (Score:2)
You have no idea how stupid that is. Turing's corpse just blew dehydrated flecks of cyanide-laced apple skin into the moldy remains of his nasal cavity.
This only works if you can iterate within the typical set of the target domain.
The minimum fully distinguished hash length for a truly random input string is the same size as the input string, plus an anti-collision pad of about 16 bytes (ambiguous cases reduced to one part in 2^128).
What you've actually described is a data inflation method.
Crank score 10/10
Re: (Score:2)
You will always run into storage problems with binary, storing stuff as off and on. Want to up storage density? Store stuff as off and set to a specific frequency. Depending upon how well you can store and keep the frequency active, polling it should be easy, same as transmitting it and, for processing, converting back into binary. Store it in ten different frequencies and you hugely increase data storage per transistor substitute.
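For scale, the gain from multi-state cells is straightforward to quantify; a worked number, not a claim about any specific hardware:

```latex
% Bits stored per cell with m distinguishable states; ten frequencies
% gives roughly 3.3x the density of a binary cell, not an order of magnitude.
\text{bits per cell} = \log_2 m, \qquad \log_2 10 \approx 3.32
```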
Re: (Score:2)
Actually, "a restriction of 100 hours on a single CPU core is" not that ridiculous. It often takes much more than 100 hours to train a neural net, but it takes much less time to run a neural net that has already been trained.
Re: (Score:2)
"even more resource usage"? okay, fair enough, let's denominate it in actual dollars as opposed to the arbitrary "you get 100 hours on whatever Hutter had lying around".
Re: (Score:2)
Separating the 2 questions, 1) "Why not just use BERT?" and 2) "Why exclude parallel hardware?"
1) BERT's algorithmic opacity leaves open an enormous "argument surface" regarding "algorithmic bias". There are >100,000 hits for "algorithmic bias [google.com]" due to inadequately principled model selection criteria used by the ML community, and thence by Google's algorithms. Algorithmic Information Theory cuts down the argument surface to its mathematical definition of natural science:
Solomonoff Induction. Basically
Re: (Score:2)
1) If BERT can magically sneak in some spooky black magic, then you're admitting that enwik9 is not an adequate data set for testing, end of story. I don't understand the relevance of your ramblings about AIT and "Cartesian causality"; they sound like hokum excuses to me. (note that I understand AIT, KC, and SI at least at the level of an intro textbook; I just don't understand your argument using them.)
Could you give a more explicit example of what you mean by "bias"? I mean, if this so-called "bias" achie
Re: (Score:2)
retchdog writes:
No. All BERT must do is beat the Large Text Compression Benchmark -- a benchmark that uses enwik9 as the corpus.
Re: (Score:2)
Okay, well, you tell me then: what insights have we gained about AGI so far by hyper-optimizing the PAQ series of compressors? That's what has dominated this competition so far.
Re: I'm an AI (Score:1)
1 STOP
Annoying (Score:5, Insightful)
The Hutter Prize claims to be promoting AI development, but its demand for lossless compression goes against real AI.
People don't memorize texts that way, they map them to pre-existing lexical knowledge as a form of lossy compression, maintaining the meaning and semantics while potentially losing the fine detail of syntax and grammar.
Allowing a degree of loss in the compression while maintaining the meaning of the text would allow for much higher compression.
Re: (Score:2)
You're not wrong. However, "absorb all this wiki text and regurgitate it" is a much narrower "AI" problem than "... and explain it" or "... and apply it".
It boils down to writing an efficient, lossy text predictor. While writing a better predictor might be interesting, and might be useful for gaining insight into some form of AI, I'm not convinced it's a worthwhile exercise.
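For context on why "text predictor" and "lossless compressor" are the same job here: an arithmetic coder spends about -log2(p) bits on a symbol the model assigned probability p, so a better predictor directly means a smaller lossless archive. A minimal sketch with a toy bigram model (the sample text and the 1e-9 floor for unseen pairs are illustrative choices, not anything from the contest):

```python
# Minimal sketch of why better prediction == better lossless compression:
# an arithmetic coder spends about -log2(p) bits on a symbol the model
# assigned probability p, so total size ~= the model's log-loss on the text.
# The bigram "model" below is a toy stand-in for a real compressor's model.
from math import log2
from collections import defaultdict

def bigram_probs(text: str):
    """Count character bigrams and convert to conditional probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {c: n / sum(nxts.values()) for c, n in nxts.items()}
        for prev, nxts in counts.items()
    }

def code_length_bits(text: str, probs) -> float:
    """Ideal lossless code length under the model (ignoring the first symbol
    and the cost of transmitting the model itself)."""
    bits = 0.0
    for prev, nxt in zip(text, text[1:]):
        p = probs.get(prev, {}).get(nxt, 1e-9)  # tiny floor for unseen pairs
        bits += -log2(p)
    return bits

sample = "the cat sat on the mat and the cat sat again"
model = bigram_probs(sample)
bits = code_length_bits(sample, model)
print(f"{bits:.1f} bits total, {bits / len(sample):.2f} bits/char")
```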
Re: (Score:2)
I'm not sure tbh. There is a significant gap between the estimated entropy of English text (based on human English-speakers' predictive ability), and the compression rate of the best compressors. I am willing to believe that "compressing English to ~1.3 bits per character" is an AI-complete [wikipedia.org] problem.
I am deeply skeptical, however, that the Hutter prize will get us there. So far it has been dominated by very complex hand-tuned compressors implementing a lot of clever heuristics. I believe that a marginal gain
Re: (Score:1)
Upvote Parent.
Re: (Score:2)
Let's take some information inside some text that's very simple: "a = pi * sin(n)"
How many ways can this relation be expressed? An AI could understand it, map it into something internal, and when prompted repeat it (in the form used in the original text). If this equation isn't actually used inside the text, instead being an editing artifact, the AI could understand the text while ignoring it - just as humans skip things after understanding they aren't vital. But let's assume the information is important, why sho
Re: (Score:2)
"Even if an AI understands things this is a compression task rather than something AI related. Statistics is more important than understanding."
yes, that's a good criticism of the Hutter prize in general, imho. it's like it's 1600 and you start a prize to reach the moon: for every meter closer to the moon one gets, they receive a few grams of gold. people might climb mountains, build tall spires and maybe build some elaborate hot air balloons, but it's obvious that these attempts are futile for actually rea
Re: (Score:2)
Megol asks:
For the same reason scientists don't throw out the raw measurements in their experiments just because they depart from theory. One can assert that the form of the expression is irrelevant to the underlying knowledge, but this is a sub-theory based on the agent's comprehension of the entire corpus. This becomes particularly relevant when at
Re: (Score:2)
Lossy Compression Is Confirmation Bias.
One Man's Noise Is Another Man's Ciphertext.
Re: (Score:1)
I suspect you are a bit confused about the terms and what they are applied to.
Your first sentence says everything else in your post is incorrect. That sentence is itself incorrect.
The rest of your post is exactly right, and is exactly what Hutter is aiming for.
People don't memorize texts that way, they map them to pre-existing lexical knowledge as a form of lossy compression, maintaining the meaning and semantics while potentially losing the fine detail of syntax and grammar.
That is what is meant by lossless compression here.
You don't memorize the exact words in 10 books on the same subject. The info in all the books gets merged into one mental model, it is NOT kept individually.
Allowing a degree of loss in the compression while maintaining the meaning of the text would allow for much higher compression.
The meaning is exactly what lossless is refe
Proof (Score:1)
The basic theory, for which Hutter provides a proof
I don't think the word "proof" means what Hutter thinks it means.
Occam's Razor (Score:1)
--Sadly, most Slashdotters
What's the point? (Score:2)
Okay so it takes your whole lifetime to read it, and then your life is over and you can't do anything with whatever little you were able to memorize or learn of the whole thing.
So what's the point exactly?
Re: (Score:2)
It's a sense of the scale of the information. More importantly, it sets a bar for "as good as a human can get".
What is the point (Score:2)
of reading a billion characters? You're not going to remember or use nearly all of it. This goes along with reading books all the time: I remember reading some article about how Bill Gates or the like was complaining that he is not going to read all of the books he wants simply because he does not have the time, and he is currently reading like 5-10 a week. Why?
Re: (Score:2)
You're missing the point.
Saying it's the amount one person can read in a lifetime is just a sense of the size of the information contained. Nobody said you're *supposed* to read it. Consider it to be equivalent of "amount of information a human gains in a lifetime", and we want to make an AI that contains that knowledge, but in a useful, condensed way rather than just stuffing data into a hard-drive.
The point is that the ultimate goal here is to make an AI that contains the sum of human knowledge as a *star
Re: (Score:2)
Also, consider the stated goal is to "compress" the 1-billion characters as much as possible. i.e. even if you're going to read the output, the goal here is to reduce the lifetime's worth of raw information down to ... much less information.
Re: (Score:2)
... of which the majority will be irrelevant for AI purposes, instead representing layout, writing style, order of descriptions, ...
Go home, Baldrson. You're drunk. (Score:2)
Why is Baldrson glorious enough to have his name hotlinked three times, and his five digit userid waved in our faces like it was his penis?
Re: (Score:2)
He's a classic net.kook like Willis Carto, who wants to depopulate cities because they're too Jewish, and go back to a pastoral northern European fairy tale past.
k.
But Occam's razor is a fallacy! (Score:2)
The simplest possible explanation is NOT most likely to be right! That is merely wishful thinking, with no basis in observed reality.
It's just the first we should TRY! Because it is more efficient to first try the simpler solutions, as you are more likely to end up with the right one earlier.
But the true explanation might be, can be, and often *is* some crazy convoluted system that is way too complicated to be fun.
I mean just look at quantum physics.
Or the citrate cycle.
They are closer to Rube-Goldberg machines, just plain sill
Re: (Score:2)
You don't understand the razor.
Re: (Score:2)
You're confusing the meaning of "simplest", likely because you've only ever heard the lay interpretation of Occam's law of parsimony. The actual statement is: "Entities should not be multiplied without necessity." Now do you get it?
Lossless compression == Gravity wave (Score:2)
Is this Time Cube 2.0? (Score:2)
I think there's reasonable support for the notion that human intelligence might be a form of GAN (generative adversarial network) running on hardware which is very suitable for some domains. But that just means that it learns to generate "good enough" scenarios, not that it has any precision at all. Sure, you can add precision with a fix-up diff, but the fix-up diff would likely be many orders of magnitude larger than the encoded network's size. Past a certain point, stunning improvements in the AI capab