Boiling Down Books, Algorithmically

Posted by timothy
from the infallible-is-a-very-strong-word dept.
destinyland writes "A year ago, Aaron Stanton harangued Google over his new project, a web site analyzing patterns in books to generate infallible recommendations. In March he finally finished a prototype which he showed to Google, Yahoo, and Amazon, and he's just announced that he's finally received a big contract which 'gives us a great deal of potential data to work with.' The 25-year-old's original prototype examined over 200 books, plotting 729,000 data points across 30,293 scenes — but its universe of analyzed novels is about to become much, much bigger."

  • by Anonymous Coward on Sunday July 06, 2008 @08:00PM (#24078577)

    "not just those that can afford it."

    Shit Bud, you make it sound like it's the 1200s. Books ARE cheap. Books are just another thing to compete for your money; sometimes they win, sometimes they lose. Like with those bankrupt families that have a 50" plasma screen and a couple Navigators in the driveway. They've made their choices. Personally, I've chosen books. No need to assault anything or anybody; there are no barriers other than our own (assuming you're a white male, of course).

    Say, you aren't one of those I-want-everything-given-to-me-for-free computer type people, are you? Well, if you are, fuck off.

  • by martin-boundary (547041) on Sunday July 06, 2008 @08:33PM (#24078763)
    It always depends on which part of the statistical landscape the algorithm is good at modelling.

    It may be that what makes a book great is hard to identify, but what makes a book really bad is much easier to identify. In that case, such an algorithm won't help with recommending high quality works for you to read, but it could be very useful in saving you from wasting your time with obviously bad books (ie it would help with initial triage).

    Remember, there are a lot more bad books than good books, so if you had to go through all the books to find the good ones, then you'd spend most of your time just looking at bad books and rejecting them.

  • by blahplusplus (757119) on Sunday July 06, 2008 @08:34PM (#24078769)

    "That's very promising. But the fact remains that publishers such as Elsevier own the copyright to many decades-worth of scientific literature. And they're not about to give any of it away."

    Then I submit that the scientific community create a project website to buy the rights to these works. I've come up with many ways of funding such an endeavor. The barrier would primarily be geometric (population size vs amount of money each person could donate/give/invest in such a venture) and the attitudes of the people themselves.

  • by Chineseyes (691744) on Sunday July 06, 2008 @08:56PM (#24078921)
    You do realize that doing something like this publicly would backfire in the worst ways imaginable? You would immediately increase the value of the works, and some incredibly wealthy person or corporation may just buy everything outright in the hope that you pay him/her even more money than you had originally planned.
  • by Anonymous Coward on Sunday July 06, 2008 @09:24PM (#24079125)

    All scientists publish their papers both legitimately and illegitimately through an underground site. I imagine such men have the intelligence to do so.

    There are many ways to do this and no, they don't have to be legal when you're dealing with commercial tyranny.

  • by Anonymous Coward on Sunday July 06, 2008 @09:44PM (#24079243)
    Intelligent people don't just "read books"; they get their information from everywhere. Even on a bank line they can still get information, because they pay attention to everything; they don't have to struggle to learn something. Ask any Mensan if it's not the way they learn stuff. Sure, there are some subjects that need deep study, but for the knowledge that comes in handy in everyday tasks, observation is the way to go.
  • Who is Joe? (Score:5, Interesting)

    by mustafap (452510) on Sunday July 06, 2008 @09:48PM (#24079269)

    There is one persistent son of a bitch on their forum, Joe, who seems to be their nemesis. I wonder what his angle is.

    Other than that, I like their approach - involve the community *really* early on.

    Apart from Joe.

  • by Anonymous Coward on Sunday July 06, 2008 @09:56PM (#24079325)

    His prototype sounds in a way like Netflix's suggestion system for movies, where you vote for your favorites and it suggests other ones based on your liking. But books are much more complicated, so I can see how his detailed analysis tool can really be the ultimate suggestion tool. I wonder if people will use this to discover copyright infringement on a new level. Hmm... my book and your book are a 99.5% match. Gee, where did the .5% discrepancy occur? My character is a 19 yr old hobo, so is yours. My story is about him eventually becoming a successful company executive by pimping himself out to different high-powered women. My character's name is Matt, yours is Mike. Aha.

  • by Sheafification (1205046) on Sunday July 06, 2008 @10:11PM (#24079445)

    As a member-in-training of the scientific community, I think you'll find that most scientists agree with you. Unfortunately the system right now is hard to break out of. You need to publish in a reputable journal for job evaluation and tenure purposes, but many reputable journals are under the thumb of the publishers.

    In mathematics there have been several mass resignations of journal editorial boards in protest over the price. These editors usually then go on to form a brand new, cheaper journal in the same area. So some progress is being made. I can't say what has been happening in the other sciences though.

  • thhhpt! (Score:3, Interesting)

    by themushroom (197365) on Sunday July 06, 2008 @10:26PM (#24079559) Homepage

    Whoever modded this to 'troll' never took the English classes I had. Yo.

  • by Skreems (598317) on Sunday July 06, 2008 @10:34PM (#24079617) Homepage
    I don't get this, though. The idea of "top sellers, best reviewed, genre classics" already exists, and this invention adds nothing to it. On the other hand, the idea of finding books you should read but don't know about seems a problem particularly poorly suited to an automated solution. This is what personal recommendations absolutely excel at, because no algorithm can gauge the cultural impact of a work of art, or the level of craft involved in its making.
  • by Metasquares (555685) on Sunday July 06, 2008 @10:56PM (#24079741) Homepage

    Basically. There's no advantage to observation over learning with a focused objective, but I think the key point is that learning is an unconscious process that is primarily carried out intuitively. You can direct your attention towards a subject and think a great deal, but you can't direct your intuition - all you can do is foster an appropriate environment. I've thought of it as a sort of receptiveness for new ideas (which I think are exogenous but are learned only after personal interpretation).

    I would question whether this is how all of the gifted learn, however. I know a lot of gifted people who nevertheless think they can somehow coerce themselves to learn things through sheer conscious effort, without intuition ever taking over - people who make their primary goal thinking about something vs. understanding it, if that makes any sense. If they keep drilling enough, they eventually get whatever it is they were trying to learn, but it tends to take a long time, usually leaves them exhausted, and is swiftly forgotten. Unsurprisingly, these people have all ended up building small, narrowly focused knowledge bases.

    That's not to say that learning difficult material is easy. It's a struggle irrespective of intelligence, and if learning a particular topic comes easily, there's always something harder. Until you intuitively understand something, conscious thought may be your only way of comprehending it - and if you don't intuitively understand a concept, thinking about it is hard work.

    But that's just my own experience.

  • by ruin20 (1242396) on Monday July 07, 2008 @12:15AM (#24080217)
    In most things we evolve, not leap to new horizons. I find that most of the time I choose to read a book because I like its similarities; I like the book because of its differences. Like traditional sci-fi to apocalyptic sci-fi to steam punk to biohacking to cyberspace to crypto. I never would have read the Cryptonomicon if I hadn't read I, Robot and can say today that I have a better appreciation for one from the other.

    Typically the way we learn and get good at just about everything is that we go a little bit beyond where we're comfortable and we sustain an effort there. After a while our comfort level moves. Just like if I read enough on one subject typically I'll get caught up with a tangent subject and eventually move into that.

  • by Virtual_Raider (52165) on Monday July 07, 2008 @03:17AM (#24081007) Homepage

    the idea of finding books you should read but don't know about seems a problem particularly poorly suited to an automated solution.

    Er... -1,Wrong* : You don't seem to be considering the impact of statistical analysis and Very Large Sets of Data (C)(TM). It's becoming increasingly possible not only to know that 125K other people all over the world bought books B, C and D along with book A that you purchased, but now you can also index and analyse their content so it will be even easier to fine tune.

    Imagine this: On the first iteration (first purchase) it can only out-of-the-blue recommend to you those books more consistently purchased along with the one you chose. But on subsequent transactions it can remember what you bought and compare the contents of the books. Now if you bought The Silmarillion, Kontakto and The Unfolding of Language over time, it would be possible to suggest that you read Shakespeare's works in their original Klingon once it realizes that you are equally interested in languages as in fictional civilizations.

    I agree with you that the day an algorithm can make value judgements on the artistic merits of any work is still far ahead, but there was just recently a story about this Firefox plug-in that summarizes user reviews. Combine the two and...

    * Didn't we have this conversation before, or is it just a popular .sig? If there was a "-1,Wrong" moderation, you would be told that the info is wrong but you would lose any insight provided by a direct reply from somebody who bothers to correct you AND post the right facts. With Slashdot being a discussion forum, it's in its best interest to actually promote discussion, so you most likely will never see that mod option implemented.
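
The first step described in the comment above (recommending the books most consistently purchased along with the one you chose) can be sketched in a few lines. This is only an illustrative sketch, not the method used by the project in the story or by any real store: the function names `build_co_purchase_counts` and `recommend`, and the placeholder titles "A" through "D", are assumptions made up for the example, and the content-comparison step the comment goes on to describe is not modelled here.

```python
from collections import Counter
from itertools import permutations


def build_co_purchase_counts(baskets):
    """Count how often each ordered pair of titles shares a purchase history."""
    counts = {}
    for basket in baskets:
        # set() so a title bought twice by one customer is counted once
        for a, b in permutations(set(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def recommend(counts, owned, top_n=3):
    """Rank unowned titles by how often they were bought with owned ones."""
    scores = Counter()
    for title in owned:
        for other, n in counts.get(title, {}).items():
            if other not in owned:
                scores[other] += n
    return [title for title, _ in scores.most_common(top_n)]


# Hypothetical purchase histories; the titles are placeholders.
baskets = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "D"],
    ["B", "C"],
]
counts = build_co_purchase_counts(baskets)
print(recommend(counts, {"A"}))
```

With only raw co-purchase counts, best sellers tend to dominate every list, which is roughly the objection raised earlier in the thread; comparing book contents, as the comment suggests, would be one way to get past that.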

  • by smitty_one_each (243267) * on Monday July 07, 2008 @09:33AM (#24082907) Homepage Journal
    What I find more fascinating than your observation is that there appears to be no filtration of noise from the signal.
    Given a relatively free petri dish for information to slosh around in, there seems a shocking lack of condensation of real knowledge out of all the crap.
    Wikipedia seems like a step of sorts in the preferred direction.
  • by Austerity Empowers (669817) on Monday July 07, 2008 @01:41PM (#24086363)

    Sure, but I know their secret, because I'm often one of them. The secret is that I've solved the problem already, some day before, while researching some curiosity I had. It may not have been that exact problem, it may not even have been in that field, but chances are I had invested work in solving something similar, just not when the audience was watching.

    I've known a lot of really smart people; the only real differentiation was the level of curiosity. The more curious would spend time understanding (by reading, hypothesizing and verifying), and would in turn be better prepared for new/unexpected challenges later. By adulthood, curious people who had the time, resources and mentors to work out their questions can be quite impressive.

    I wholly reject this notion that you can develop some incredible insight and problem solving skills by "drawing in mana" at the bank line. If you're solving a good problem in line at the bank, it's because you've already read and researched the subject before and you just need some time to mull the problem over and use the other side of your brain.

  • by acheron12 (1268924) on Monday July 07, 2008 @02:42PM (#24087175)

    Which open mediums offer peer review?
