
Programming Collective Intelligence

Joe Kauzlarich writes "In 2006, the online movie rental store Netflix announced a $1 million prize for whoever could write a movie recommendation algorithm that offered a ten percent improvement over its own. As of this writing, the intriguingly named Gravity and Dinosaurs team holds first place by a slim margin of 0.07 percent over BellKor, with an 8.82 percent improvement on the Netflix benchmark. So the question remains: how do they write these so-called recommendation algorithms? A new O'Reilly book gives us a thorough introduction to the basics of this and similar lucrative sciences." Keep reading for the rest of Joe's review.
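For context on what "percent improvement" means here: the Netflix Prize scored entries by root-mean-square error (RMSE) between predicted and actual ratings, and an improvement is the relative reduction in that error against Netflix's own baseline. A minimal sketch of the arithmetic; the function names and ratings below are made up for illustration.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def percent_improvement(baseline_rmse, new_rmse):
    """Relative RMSE reduction, as a percentage of the baseline's error."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Hypothetical ratings, purely for illustration.
actual = [4, 3, 5, 2, 4]
baseline = [3.5, 3.5, 3.5, 3.5, 3.5]
improved = [3.8, 3.2, 4.4, 2.6, 3.9]
print(rmse(baseline, actual), rmse(improved, actual))
print(f"{percent_improvement(1.00, 0.90):.1f}%")  # 10.0%
```

In other words, a ten percent improvement means cutting the baseline's RMSE by a tenth, which is why leaderboard margins are quoted in hundredths of a percent.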
Programming Collective Intelligence
author Toby Segaran
pages 334
publisher O'Reilly Media Inc.
rating 9/10
reviewer Joe Kauzlarich
ISBN 9780596529321
summary Introduction to data mining algorithms and techniques
Among the chief ideological mandates of the Church of Web 2.0 is that users need not click around to locate information when that information can be brought to them. This is achieved by leveraging 'collective intelligence': in the case of recommendation systems, computationally analyzing the statistical patterns of past users to make the most accurate possible guesses about the desires of present users. Amazon, Google and certainly many other organizations, in addition to Netflix, have successfully edged out more traditional competitors on this basis, the latter failing to pay attention to users' shopping patterns and forcing customers to find products by trial and error, as they would in, say, a Costco. As a further illustration, if I go to the movie shelf at Best Buy and look under 'R' for Rambo, no one is going to come up to me and say that the Die Hard Trilogy now has a special-edition release on DVD and is on sale. I'd have to accidentally pass the 'D' section and be looking in that direction to notice it. Amazon would immediately tell me, without bothering to mention that Gone With The Wind has a new special edition.
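The "statistical patterns of past users" idea can be sketched as user-based collaborative filtering, one common approach: measure how similarly two users rate the movies they have both seen (here with Pearson correlation), then recommend unseen movies weighted by those similarities. The ratings, user names and function names are hypothetical and not taken from the book.

```python
import math

# Hypothetical ratings: user -> {movie: stars}. Illustrative data only.
ratings = {
    "alice": {"Rambo": 4, "Die Hard": 5, "Gone With The Wind": 1},
    "bob": {"Rambo": 5, "Die Hard": 5, "Gone With The Wind": 2, "Lethal Weapon": 5},
    "carol": {"Gone With The Wind": 5, "Casablanca": 5, "Rambo": 1},
}

def pearson(a, b):
    """Pearson correlation over the movies both users have rated."""
    shared = [m for m in a if m in b]
    n = len(shared)
    if n < 2:
        return 0.0
    xs = [a[m] for m in shared]
    ys = [b[m] for m in shared]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # no variation, so correlation is undefined
    return cov / (sx * sy)

def recommend(ratings, user):
    """Rank unseen movies by a similarity-weighted average of others' ratings."""
    totals, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = pearson(ratings[user], theirs)
        if sim <= 0:
            continue  # skip users with unrelated or opposite taste
        for movie, stars in theirs.items():
            if movie not in ratings[user]:
                totals[movie] = totals.get(movie, 0.0) + sim * stars
                weights[movie] = weights.get(movie, 0.0) + sim
    scores = [(totals[m] / weights[m], m) for m in totals]
    return sorted(scores, reverse=True)

print(recommend(ratings, "alice"))
```

Alice rates like Bob and unlike Carol, so she is offered Lethal Weapon but not Casablanca; the sign of the similarity does the work a shop assistant never would.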

Programming Collective Intelligence is far more than a guide to building recommendation systems. Author Toby Segaran is not a commercial product vendor, but a director of software development for a computational biology firm, doing data-mining and algorithm design (so apparently there is more to these 'algorithms' than just their usefulness in recommending movies?). Segaran takes us on a friendly and detailed tour through the field's toolchest, covering the following topics in some depth:
Recommendation Systems
Discovering Groups
Searching and Ranking
Document Filtering
Decision Trees
Price Models
Genetic Programming
... and a lot more

As you can see, the subject matter stretches into the higher levels of mathematics and academia, but Segaran keeps the book intelligible to most software developers, and the examples are written in easy-to-follow Python. Further chapters cover more advanced topics, such as optimization techniques, and many of the more complex algorithms are deferred to the appendix.

The third chapter of the book, 'Discovering Groups,' deserves some explanation and may show you how the book can be of use in day-to-day software design. Suppose you have two sets of data related by a 'JOIN'. For example, certain customers may spend more time browsing certain subsets of movies. 'Discovering Groups' refers to the computational process of recognizing these patterns and sorting the data into groups. For music or movies, these groups would represent genres. The marketing team may thus learn that jazz enthusiasts buy more music at sale prices than listeners of contemporary rock do, or that listeners of late-'60s jazz also listen to '70s prog, or similar trends.
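One standard way to sort data into groups is k-means clustering. The sketch below uses that algorithm as an illustration (it is not presented as the book's own code); each listener is described by two hypothetical features, weekly hours of jazz and of rock, and the first k points serve as the starting centroids to keep the example deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: assign to nearest centroid, recompute centroids, repeat."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids

# Hypothetical listeners: (hours of jazz, hours of rock) per week.
listeners = [(9, 1), (1, 9), (8, 2), (2, 8), (9, 2), (1, 8)]
groups, centers = kmeans(listeners, k=2)
print(groups, centers)
```

The result splits the listeners into a jazz-heavy group and a rock-heavy group; with real data, the interesting part is inspecting what each group's members have in common.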

The applications of the tools Programming Collective Intelligence provides are certainly broader than I can imagine. Insurance companies, airlines and banks are all part of massive industries that rely on precise knowledge of consumer trends and can certainly make use of the data-mining techniques introduced in this book.

I have no major complaints about the book, particularly because it fills a gap in popular knowledge with no precursor that I'm aware of. Presentation-wise, even though Python is easy to read, pseudo-code is more timeless and even easier to read. You can't cut and paste from a paper book into a Python interpreter anyway. It may have been more appropriate to use pseudo-code in print and keep the example code on the website (I'm sure it's there anyway).

If you ever find yourself browsing or referencing your algorithms text from college or even seriously studying algorithms for fun or profit, then I would highly recommend this book depending on your background in mathematics and computer science. That is, if you have a strong background in the academic study of related research, then you might look elsewhere, but this book, certainly suitable as an undergraduate text, is probably the best one for relative beginners that is going to be available for a long time.

  • by Jynx77 ( 974092 ) on Wednesday April 16, 2008 @12:50PM (#23092534)
    I was initially intrigued by recommendation algorithms. Sadly, it's easy to get them up to a certain point and then almost impossible to make them any better. At least for movies. Netflix rates almost everything between 2.5 and 4 stars. The movies it rates 1 or 2 stars, I wouldn't have considered watching anyway. It never rates anything 5 stars. And for things between 3 and 4 stars, I seem just as likely to really like a 3-star item as I am to dislike a 4-star item. So why is Netflix paying a million bucks to change that 3 to a 3.1 or 2.9?
  • by Animats ( 122034 ) on Wednesday April 16, 2008 @12:57PM (#23092618) Homepage

    There are now 35535 entries in the Netflix competition. If they all used roughly the same algorithm, with some randomness in the tuning variables, we'd expect to see results about like what we've seen. I think we're looking at noise here.

    The same phenomenon shows up with mutual funds. Some outperform the market, some don't, but prior year results are not good predictors of future results.
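The noise argument can be illustrated with a toy simulation: give every team the same true error, add a little random scoring noise, and a ranked leaderboard with a clear "winner" still appears. The team count, true RMSE and noise level below are hypothetical, and this only illustrates the claim; it does not show that the real leaderboard is noise.

```python
import random
import statistics

def simulate_leaderboard(n_teams, true_rmse, noise_sd, seed=0):
    """Every team has identical true skill; only scoring noise differs."""
    rng = random.Random(seed)
    scores = sorted(rng.gauss(true_rmse, noise_sd) for _ in range(n_teams))
    return scores  # lowest RMSE first, as on a leaderboard

# Hypothetical parameters, chosen only for illustration.
scores = simulate_leaderboard(n_teams=1000, true_rmse=0.90, noise_sd=0.005)
print(f"best {scores[0]:.4f}, runner-up {scores[1]:.4f}, "
      f"median {statistics.median(scores):.4f}")
```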

  • "As of this writing" by Anonymous Coward on Wednesday April 16, 2008 @01:37PM (#23093194)
    When was this written? According to the leaderboard, http://www.netflixprize.com//leaderboard BellKor is leading by 0.26 and has been leading for several months.
  • by WindowlessView ( 703773 ) on Wednesday April 16, 2008 @01:48PM (#23093308)

    I was initially intrigued by recommendation algorithms.

    Me too. Last time this topic rolled around I took a brief look at the Netflix competition and was disappointed. The star rating system was limited but more importantly there was a remarkable lack of data. Many of the teams that edged out some improvement did so by importing lots of data from other sources - with lots of holes in that process - and trying to discern patterns from that.

    On the whole, the exercise seems to be a variation on what happened a couple of decades ago, when so many people bought a PC because they planned to become the next stock market whiz by throwing a neural net at basic NYSE daily data. With fancy algorithms and math constructs being all the rage these days (dare I say a bit of a fad?), it behooves us to remember that they are far from the whole story. It helps to have some useful data with which to make connections. No matter how fancy the algorithm, you aren't going to harvest rice in a desert.

  • by wintermute42 ( 710554 ) on Wednesday April 16, 2008 @03:46PM (#23094758) Homepage

    The Netflix competition, in principle, is an example of an interesting class of prediction algorithms. There is a lot of good work in academia in this area, and on the face of it one might be surprised that no one has beaten Netflix yet.

    Unfortunately, Netflix restricts the data that can be applied to prediction. You have to use their data, which includes only movie title and genre. A much better job could be done if something like the Internet Movie Database were fused with the title selection information. This would allow the algorithm to predict based on actors, directors and detailed genre. For example, I see all movies directed by John Woo. Given that I've seen all of his movies, it's not hard to predict that I'm going to see his next movie.
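    The director example can be sketched as a single content-based feature: join a viewing history against external metadata and score a new release by the share of that director's catalogued films the user has already watched. The metadata table and function names are hypothetical stand-ins for an IMDb-style source, not a real IMDb API.

```python
from collections import Counter

# Hypothetical metadata table standing in for an IMDb-style source.
director_of = {
    "The Killer": "John Woo",
    "Hard Boiled": "John Woo",
    "Face/Off": "John Woo",
    "Gone With The Wind": "Victor Fleming",
    "Casablanca": "Michael Curtiz",
}

def director_affinity(watched, director_of):
    """Fraction of each director's catalogued films the user has watched."""
    catalogue = Counter(director_of.values())
    seen = Counter(director_of[m] for m in watched if m in director_of)
    return {d: seen[d] / catalogue[d] for d in catalogue}

def predict_interest(new_movie_director, watched, director_of):
    """Score a new release by affinity for its director (0.0 if unknown)."""
    return director_affinity(watched, director_of).get(new_movie_director, 0.0)

watched = ["The Killer", "Hard Boiled", "Face/Off", "Casablanca"]
print(predict_interest("John Woo", watched, director_of))        # 1.0
print(predict_interest("Victor Fleming", watched, director_of))  # 0.0
```

    A real system would blend a feature like this with rating-based signals rather than rely on it alone.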
