
News for nerds, stuff that matters


Build a Better Netflix, Win a Million Dollars?

Posted by CmdrTaco on Mon Oct 02, 2006 09:56 AM
from the trumps-our-redesign-contest dept.
An anonymous reader writes "In a quest to better movie recommendations, Netflix is opening their database (nytimes, registration and first child required) to users to try to craft a better recommendation technology. The problem is not easy. Says one researcher: 'You're competing with 15 years of really smart people banging away at the problem.'" Recommender systems are a genuinely interesting problem, and this is likely very interesting data to play with.

Related Stories

Netflix Prize Competitor Already Beats Netflix 174 comments
Baldrson writes "Within the first week of the announcement of The Netflix Prize a team has already beaten Netflix's own movie recommendation algorithm. This is pretty impressive given the previously quoted researcher who said: 'You're competing with 15 years of really smart people banging away at the problem.' The team is WXYZConsulting.com, apparently registered by a data mining professor named Yi Zhang. Congratulations are in order for Netflix and Prof. Zhang's team who are demonstrating, yet again, the power of prizes to accelerate progress."
Netflix Now Offers Instant Online Movie Streaming 247 comments
An anonymous reader writes "If you're the owner of a video rental store, it may be time to start thinking about getting into a different business, according to ZDNet. Netflix, the online movie rental service, is offering a new feature that allows its subscribers to instantly view movies and TV shows on their PC. From the article: 'Following a one-time, under-60-second installation of a simple browser applet, most subscribers' movie selections will begin playing in their Web browser in as little as 10 to 15 seconds. Movies can be paused and a position bar gives viewers the ability to immediately jump to any point in the movie. In all, the instant watching feature requires only Internet connectivity with a minimum of one megabit per second of bandwidth.' These movies are in addition to the standard DVDs you can have at home, it should be pointed out. You can see a demonstration of the service at the Hacking Netflix blog." Only a small percentage of customers have it available at the moment, but they hope to roll it out to everyone within six months.
Developers: Psychologist Beating Math Nerds in Race to Netflix Prize 205 comments
s1d writes "An almost-anonymous British psychologist named Gavin Potter has suddenly risen to the top of the Netflix prize charts. With his very first attempt, he got a score which took the BellKor team seven months to reach. Currently at a score of 8.07, he has only five teams ahead of him now in the race for the ultimate Netflix algorithm. 'Potter says his anonymity is mostly accidental. He started that way and didn't come out into the open until after Wired found him. "I guess I didn't think it was worth putting up a link until I had got somewhere," he says, adding that he'd been seriously posting under the name of his venture capital and consulting firm, Mathematical Capital, for two months before launching "Just a guy." When he started competing, he posted to his blog: "Decided to take the Netflix Prize seriously. Looks kind of fun. Not sure where I will get to as I am not an academic or a mathematician. However, being an unemployed psychologist I do have a bit of time."'"
Science: Interest Still High In the Netflix Algorithm Competition 77 comments
circletimessquare brings us an update to the status of the million-dollar Netflix competition to develop a better algorithm for movie recommendations. We've discussed aspects of the competition since it started two years ago, but the New York Times has a lengthy overview of where it stands now. "The Netflix competition is still going strong, with a vibrant, competitive roster of some 30,000 programmers around the globe hard at work trying to win the prize. The Times provides a look at some of the more obsessive searchers, such as Len Bertoni, a semi-retired computer scientist near Pittsburgh who logs 20 hours a week on the problem, oftentimes with the help of his children. There's also Martin Chabbert in Montreal: 'After the kids are asleep and I've packed the lunches for school, I come down at 9 in the evening and work until 11 or 12.' The article gets into the history of the search algorithm Netflix currently uses, and explores the hot commodity called 'singular value decomposition' that serves as the basis for most of the algorithms in competition."
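In Prize discussions, "singular value decomposition" usually means learning a low-rank factorization of the sparse user-by-movie rating matrix, not running a textbook SVD on it. Here is a minimal sketch of that idea. It assumes a toy list of (user, movie, stars) triples, and the function names, learning rate and regularization values are illustrative choices, not anyone's actual Prize code.

```python
import math
import random

def factorize(ratings, n_factors=2, lr=0.05, reg=0.02, epochs=300, seed=0):
    """Fit rating ~ mu + p[user] . q[movie] by stochastic gradient descent.

    ratings: list of (user, movie, stars) triples (a hypothetical toy format,
    not the layout of the real Prize data files).
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    p, q = {}, {}
    for u, m, _ in ratings:
        p.setdefault(u, [rng.gauss(0, 0.1) for _ in range(n_factors)])
        q.setdefault(m, [rng.gauss(0, 0.1) for _ in range(n_factors)])
    for _ in range(epochs):
        for u, m, r in ratings:
            err = r - (mu + sum(a * b for a, b in zip(p[u], q[m])))
            for k in range(n_factors):
                pu, qm = p[u][k], q[m][k]
                # Gradient step on squared error with L2 regularization
                p[u][k] += lr * (err * qm - reg * pu)
                q[m][k] += lr * (err * pu - reg * qm)
    return mu, p, q

def predict(model, user, movie):
    mu, p, q = model
    return mu + sum(a * b for a, b in zip(p[user], q[movie]))

def training_rmse(model, ratings):
    return math.sqrt(sum((r - predict(model, u, m)) ** 2 for u, m, r in ratings) / len(ratings))
```

Each user and each movie is summarized by only a few numbers, so the model can predict ratings for user/movie pairs it never saw, which is the part a recommender actually needs.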
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • by garcia (6573) on Monday October 02 2006, @10:00AM (#16276999) Homepage
    If no one wins within a year, Netflix will award $50,000 to whoever makes the most progress above a 1 percent improvement, and will award the same amount each year until someone wins the grand prize.

    But if someone does win within a year they will still have the ability to use others' code, free of charge, as part of their product.

    The article doesn't say, but how will you know whether your code is making better choices than their existing system? I wouldn't submit my code unless I was sure I was going to win. Then again, I'm not a gambler or a coder ;)
    • Re: (Score:3, Informative)

      by curunir (98273) *
      From the rules, it looks like your submission isn't code, it's a processed dataset. The only terms for winning are that you have to explain your method to them (so that they don't get bitten by a horribly obfuscated entry) and non-exclusively license your submission to Netflix (it looks like you retain copyright and can license it to others if you so choose).

      But that seems pretty reasonable... you only have to hand over your code if you win; otherwise you're only submitting the results of yo
    • Re: (Score:3, Informative)

      If you read Netflix's prize site [netflixprize.com], you'll find that they give clear-cut, well-defined statistical requirements for winning. The level of detail they go into is impressive; it's clear that they want real engineers on this, and that they're willing to get seriously specific to make sure people know what's what.
        • Re: (Score:3, Insightful)

          by Sparr0 (451780)
          Because each option you add cuts the number of responses in half. A vast majority of users use the one-rating system. Almost no one would fill out a 20-question survey about every movie they watch.
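On the question of how you know you are beating the existing system: the Prize scored submissions by root mean squared error (RMSE) between predicted and actual star ratings on a withheld set, and "improvement" meant a percentage reduction of that error relative to Netflix's own Cinematch system. You can estimate this locally by holding back some of the training ratings and scoring yourself on them. Below is a minimal sketch of the arithmetic; the function names are hypothetical.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual star ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two equal-length, non-empty lists")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def improvement_pct(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to a baseline system."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse
```

Under this measure, a "1 percent improvement" means the new system's RMSE is 1% lower than the reference system's, not that 1% more recommendations are correct.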
  • I officially announce I will be entering BigAtticHouse's Vectorspace Database into the melee. At least to see what might come of it.
  • by jimstapleton (999106) on Monday October 02 2006, @10:02AM (#16277015) Journal
    Says one researcher: "You're competing with 15 years of really smart people banging away at the problem."


    So the professionals have been working at it for a long time. Is it safe to assume some teenage or early-college hacker will succeed within two weeks?
  • Simple (Score:5, Funny)

    by Anonymous Coward on Monday October 02 2006, @10:02AM (#16277023)
    if (user.getGender() == Person.MALE) {
        recommendation = MovieGenre.PORN;
    } else {
        recommendation = MovieGenre.CHICKFLICK;
    }

    And of course, Slashdot must have sensed my post: my image word is "pervert".
    • Re:Simple (Score:5, Funny)

      by kelzer (83087) on Monday October 02 2006, @11:07AM (#16278055) Homepage

      Old Version:

      if (user.getGender() == Person.MALE) {
          recommendation = MovieGenre.PORN;
      } else {
          recommendation = MovieGenre.CHICKFLICK;
      }

      New Version, sure to win the million bucks:

      if (user.getGender() == Person.MALE
              && user.getOrientation() == Person.STRAIGHT) {
          recommendation = MovieGenre.PORN;
      } else {
          recommendation = MovieGenre.CHICKFLICK;
      }

  • ...except, instead of making it open to the community (which is not a bad idea, I must say), I thought of having Google do it. This is, perhaps, IMHO, a much better idea. Now, what we really need is a Movie Genome Project, much like the Music Genome Project that led to Pandora.
    • by Boone^ (151057)
      Pandora's recommendations are really spot on. I rarely have to give one the thumbs down.
  • go see porn sites (Score:3, Interesting)

    by LiquidCoooled (634315) on Monday October 02 2006, @10:03AM (#16277035) Homepage Journal
    They have decent tech for building similar/recommended alternative pages.
    Especially the newer blog-style pages where there's a gallery and a small selection underneath.

    Not that I would know of course.
    • What kind of horrible person are you, to make a statement like this and not link to an example of the tech in action... you know, for illustrative purposes.
  • Suggestion (Score:5, Insightful)

    by 99BottlesOfBeerInMyF (813746) on Monday October 02 2006, @10:03AM (#16277045)
    As a Netflix user I have one suggestion that could make their recommendation system much better: make it aware of the connection between the seasons of a series. That is to say, if you rent season 1 of something, suggest season 2, not season 4 (even if season 4 has better review ratings). If I mark season 1 of something as "not interested" instead of giving it a user rating, don't put every other season of that same show at the top of my recommendations. I mean, how many times do I have to tell you I don't want to see any season of "Friends" ever, even if you pay me?
      • Just a nitpick... If I mark, say, season 1 of series X as Not Interested, maybe it means I already own it and have no need to rent it, but still might want to see season 2. Of course, if I marked it as 1-star (which I assume means "utter crap"), then as you said, it should shut the hell up about the rest of the series.

        I disagree. If you have it, you presumably have watched it and should give it a rating. You do have interest in it, or you would not have bought it. So things you mark as 1 star should prob

        • Re: (Score:2, Funny)

          by 955301 (209856)
          I didn't like Star Wars:Episode I very much. Episode 4 was great though.
          • I didn't like Star Wars:Episode I very much. Episode 4 was great though.

            Right, so you might mark episode I, (technically number 4 by release order and prequels generally suck so I think this should be the ordering mechanism) as 2 stars or even 1. You wouldn't mark it as not interested, since from your comment you were interested enough to watch it. If, however, you were so disinterested in episode I so as to mark it as not interested (meaning you did not watch it and don't ever want to) then the chances

        • Re: (Score:3, Insightful)

          by Xentor (600436)
          Hmm, I see your point.

          I was about to mention that I mark things as Not Interested when I own them, to avoid being recommended the rest (usually because I prefer to buy series I like and rent actual movies), but then I realized that fits perfectly into what you said.

          Point conceded.
          • by truthsearch (249536) on Monday October 02 2006, @10:56AM (#16277861) Homepage Journal
            Point conceded.

            For the record, this is a turning point in slashdot history. I'll forever remember where I was when I first saw those words in a slashdot comment. (Which of course is at work, sitting through a boring meeting.)
        • by rthille (8526)
          Right, the way I think of it is that I had to be interested enough to watch the movie to find out it was a 1-star. 'Not interested' means that, based only on what I heard about it, I knew I didn't want to see it. It could be a fantastically done movie, say 'Remains of the Day' or something, but it's not something I'm interested in. That is more indicative of the kind of movies I'm interested in than a 1-star rating. After all, it could be that the premise/plot-line of movie that I wanted to see, but the wr
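The season-ordering suggestion at the top of this thread is easy to state as a post-filter on whatever ranked list a recommender produces. Here is a minimal sketch. It assumes each title carries a hypothetical series name and season number, and it takes the original poster's view that "Not Interested" on any season hides the whole series. How Netflix actually models series is not described here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Title:
    series: str  # hypothetical series identifier, e.g. "Friends"
    season: int

def series_aware_filter(candidates, watched, not_interested):
    """Keep the ranked order of candidates, but drop any series marked Not
    Interested and suggest only the next unseen season of each series."""
    blocked = {t.series for t in not_interested}
    last_seen = {}
    for t in watched:
        last_seen[t.series] = max(last_seen.get(t.series, 0), t.season)
    result = []
    for t in candidates:
        if t.series in blocked:
            continue  # one "not interested" silences the whole series
        if t.season != last_seen.get(t.series, 0) + 1:
            continue  # e.g. after season 1, offer season 2 rather than season 4
        result.append(t)
    return result
```

Given candidates for seasons 2 and 4 of a show after season 1 was rented, only season 2 survives, even if season 4 was ranked higher.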
  • Privacy issues? (Score:3, Interesting)

    by Vultan (468899) on Monday October 02 2006, @10:05AM (#16277063)
    How will they handle privacy issues? Don't the same issues appear here that appeared with the AOL data this summer? With enough ratings you can narrow down to a specific person, and then find out about all the pr0n that this person has been getting as well.
    • The AOL search ratings were different because the searches could include things like cities, proper names, phone numbers, and other such pieces of identifiable information. The movie ratings have none of that. You might be able to dig through the list and find the person who rated "Goat donkey pr0n" highly and laugh at them, but there's no information there that'll tell you who it was.
    • Re: (Score:3, Informative)

      by Cruise_WD (410599)
      From http://www.netflixprize.com/ [netflixprize.com] :

      To prevent certain inferences being drawn about the Netflix customer base, some of the rating data for some customers in the training and qualifying sets have been deliberately perturbed in one or more of the following ways: deleting ratings; inserting alternative ratings and dates; and modifying rating dates.

      Plus all the usual replacing of IDs and such you'd expect. Looks like they're trying to avoid a repeat of the AOL debacle at least.
    • Re: (Score:3, Insightful)

      by Shihar (153932)
      The AOL search data was an issue because you could look at search requests for places and figure out where someone was very quickly. If I use Google to plot the route to the nearest IKEA or porn store, it is a pretty simple matter to trace back who someone is. Short of some serious stupidity, I can't imagine Netflix giving away any information useful for identity theft. A list of movies is highly unlikely to lead to anyone's address or identity.
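The quoted rules list three kinds of perturbation: deleting ratings, inserting alternative ratings and dates, and modifying rating dates. The rules do not say how Netflix chose what to change. The following is a purely hypothetical sketch of two of those operations (deletion and date shifting). The record layout, probabilities and shift window are assumptions, and inserting fake ratings is left out.

```python
import random
from datetime import date, timedelta

def perturb(ratings, delete_p=0.05, redate_p=0.05, max_shift_days=30, seed=None):
    """Return a perturbed copy of (user_id, movie_id, stars, rating_date) tuples.

    Each rating is dropped with probability delete_p; each surviving rating
    has its date shifted by up to max_shift_days with probability redate_p.
    """
    rng = random.Random(seed)
    out = []
    for user, movie, stars, day in ratings:
        if rng.random() < delete_p:
            continue  # delete the rating entirely
        if rng.random() < redate_p:
            day += timedelta(days=rng.randint(-max_shift_days, max_shift_days))
        out.append((user, movie, stars, day))
    return out

# Hypothetical sample records
sample = [(1, 101, 4, date(2005, 6, 1)), (1, 102, 2, date(2005, 6, 3))]
noisy = perturb(sample, delete_p=0.2, redate_p=0.5, seed=42)
```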
  • RSSTimes (Score:5, Insightful)

    by eldavojohn (898314) * <my/.username@@@gmail.com> on Monday October 02 2006, @10:05AM (#16277065) Homepage Journal
    In a quest to better movie recommendations, Netflix is opening their database (nytimes, registration and first child required)...
    Not quite, you can find it here [nytimes.com] (or the minimalist version [nytimes.com] for anyone sick of ads).

    Why is it that the Slashdot editors are just too damn lazy to look up the RSS feed links to these pages?

    The problem is not easy. Says one researcher: "You're competing with 15 years of really smart people banging away at the problem."
    While this may be true, I wouldn't let it deter you. Collaborative filtering is a field that is far from dead. The interesting thing about collaborative filtering is that on the surface it seems pretty straightforward, but once you dig into the mechanics, there is actually a lot of room to play. Ironically, the way you display the data to the end user often determines how good a job you did.

    Allow me to take a naïve approach to this topic and say we generate a movie index for each person. I would have A Clockwork Orange and Koyaanisqatsi at 5, while The Ring 2 would be at the very low end. My friend might have similar movies. If he has A Clockwork Orange up there, you might be able to compute a Euclidean distance between us. However, this approach falls apart because no one has seen Koyaanisqatsi, and the 20 movies I've ranked highly are hard to find.

    You don't have to stop there, however. You could also database the movies I marked as "uninterested" or the movies that were presented to me but I didn't vote on. Like if I had seen the offer to mark J-Lo's latest flop but didn't, wouldn't that tell you something about me?

    So these caveats present themselves all along the way, and in the final computation you have many different strategies for this data. For example, while you might not be able to link my friend and me through movies, how far apart are we on a node network? What I mean is, if you plotted every user in their own dimension depending on the movies they ranked and attempted to compute as good a distance as possible between all users, how far would I be from my friend by hopping across these nodes? There's a lot of information to be gleaned from this sort of friend-of-a-friend collaborative approach.

    Now you need to present this information to the user. Do you just up and recommend him a movie? Do you take Amazon's approach and say "Other people did this -- so should you."? Or do you give them some sort of three dimensional flash plotting of you versus the people nearest to you? Do you allow the user to contact those closest to them? Those farthest away?

    My point is that while 15 years of research have been done, that doesn't mean there have been 15 years of testing and implementation, which, when it comes to building products, is where most of the importance lies.
    • You can trick the NY Times personally but you can't do it from a front page of a widely popular commercial site.

      I think it is the reason.

      Slashdot can't send thousands of users with a fake referrer to NY Times. That link you provided is for people using RSS readers and subscribed to NY Times RSS feed.

      I think they should talk with NY Times web team to allow slashdot readers with referrer=slashdot without needing login. They can arrange it for sure, this isn't a "no name" site.

      It would be nice for NY Times for
      • Re: (Score:3, Informative)

        by cei (107343)
        Correction: No one has stayed awake through Koyaanisqatsi.

        (FWIW, Powaqqatsi was a better flick, IMHO)
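The two ideas in eldavojohn's comment, a Euclidean distance between users and a "how many hops apart are we" measure for users who share no movies, can be sketched together. This is a minimal illustration under assumed conventions. Distance is computed only over co-rated movies, and two users are linked when that distance is at most a threshold of our own choosing. The user names and ratings are made up.

```python
import math
from collections import deque

def euclidean_distance(a, b):
    """Distance between two users' ratings over the movies both have rated.

    Returns None when they share no movies at all: the sparsity problem
    with rarely seen titles such as Koyaanisqatsi.
    """
    common = a.keys() & b.keys()
    if not common:
        return None
    return math.sqrt(sum((a[m] - b[m]) ** 2 for m in common))

def hop_distance(ratings, start, goal, max_dist):
    """Breadth-first search over a user graph in which two users are linked
    if they share at least one movie and their distance is <= max_dist."""
    frontier = deque([(start, 0)])
    visited = {start}
    while frontier:
        user, hops = frontier.popleft()
        if user == goal:
            return hops
        for other in ratings:
            if other in visited:
                continue
            d = euclidean_distance(ratings[user], ratings[other])
            if d is not None and d <= max_dist:
                visited.add(other)
                frontier.append((other, hops + 1))
    return None

# Hypothetical users: "me" and "friend" share no titles, but "bridge" links them.
ratings = {
    "me": {"A Clockwork Orange": 5, "Koyaanisqatsi": 5, "The Ring 2": 1},
    "bridge": {"Koyaanisqatsi": 5, "Snakes on a Plane": 2},
    "friend": {"Snakes on a Plane": 2, "The English Patient": 4},
}
```

Here `hop_distance(ratings, "me", "friend", max_dist=1.0)` finds the two users two hops apart, even though their direct distance is undefined. One caveat is that the distance only counts shared movies, so two users with a single identical rating look like perfect neighbours. Practical systems usually account for the size of the overlap, and this sketch does not.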
  • Link everyone's credit report into their movie preferences; I'll bet your complete credit history would give them a 5-10% better chance of picking your movies. But seriously...why isn't this just a regression exercise?
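Treating it as a regression is in fact where many approaches start. A common baseline predicts a rating as the global mean plus a per-user offset and a per-movie offset, fitted by regularized least squares. Here is a minimal sketch that alternates exact updates for the two sets of offsets. The data format and the regularization strength are assumptions for illustration.

```python
def fit_baseline(ratings, reg=5.0, rounds=10):
    """Fit rating ~ mu + b_user + b_movie by alternating ridge-regression updates.

    ratings: list of (user, movie, stars). Each half-step is the exact
    minimizer of squared error plus reg * (bias ** 2) for that set of biases.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)
    bu, bm = {}, {}
    for _ in range(rounds):
        # Movie offsets, holding user offsets fixed
        sums, counts = {}, {}
        for u, m, r in ratings:
            sums[m] = sums.get(m, 0.0) + r - mu - bu.get(u, 0.0)
            counts[m] = counts.get(m, 0) + 1
        bm = {m: sums[m] / (counts[m] + reg) for m in sums}
        # User offsets, holding movie offsets fixed
        sums, counts = {}, {}
        for u, m, r in ratings:
            sums[u] = sums.get(u, 0.0) + r - mu - bm[m]
            counts[u] = counts.get(u, 0) + 1
        bu = {u: sums[u] / (counts[u] + reg) for u in sums}
    return mu, bu, bm

def predict_baseline(model, user, movie):
    """Unknown users or movies fall back to an offset of zero."""
    mu, bu, bm = model
    return mu + bu.get(user, 0.0) + bm.get(movie, 0.0)
```

The regularization term shrinks offsets for movies with few ratings toward zero, so one enthusiastic rating does not make an obscure title look like a classic.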
  • by Zaphod-AVA (471116) on Monday October 02 2006, @10:14AM (#16277179)
    The problem with recommendation systems is that they use too little information to categorize their subject.

    What they need to do is copy the methods of the Music Genome Project (www.pandora.com), and list a larger set of attributes for the films. This way it can recommend films by checking many more characteristics, such as director, tone, writer, or subject.
    • by vontrotsky (667853) on Monday October 02 2006, @10:33AM (#16277447)
      The problem with recommendation systems is that they use too little information to categorize their subject.

      What they need to do is copy the methods of the Music Genome Project (www.pandora.com), and list a larger set of attributes for the films. This way it can recommend films by checking many more characteristics, such as director, tone, writer, or subject.


      In this contest, you run your own code and submit the results to Netflix to be scored. This means that you can use any other data you can compile (e.g. a Movie Genome Project) to enhance your rankings. Netflix apparently designed the contest specifically to allow this.
    • Yes, they need more characteristics of movies.

      But they also need ways to identify the characteristics of people's choices. Right now, one NetFlix account can be used by a whole family. So instead of getting 1 person's characteristic choices (teenage emo goth girl), you get those combined with the other family members (Dad's action films, Mom's chick flicks, Jr's teenage sex comedies).

      Eventually, you'd end up with a movie genome cross indexed to a sub-culture.
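The "genome" idea in this thread is content-based recommendation: describe each film by attributes such as director, tone, writer or subject, then recommend films whose attributes overlap with ones the user liked. Here is a minimal sketch. It uses Jaccard overlap of hypothetical attribute tags, and it does not describe how Pandora actually weights its attributes.

```python
def jaccard(a, b):
    """Overlap between two attribute sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_similar(liked, catalog, top_n=3):
    """Rank unseen titles by their average attribute overlap with titles the user liked."""
    if not liked:
        return []
    scores = {
        title: sum(jaccard(attrs, catalog[fav]) for fav in liked) / len(liked)
        for title, attrs in catalog.items()
        if title not in liked
    }
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]

# Hypothetical catalog with made-up attribute tags
catalog = {
    "Film A": {"director:kubrick", "tone:dark", "subject:crime"},
    "Film B": {"director:kubrick", "tone:dark", "subject:war"},
    "Film C": {"tone:light", "subject:romance"},
}
```

Since contestants ran their own code, attribute data like this could be combined with the rating data before submitting predictions.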
  • only a million? (Score:3, Interesting)

    by StandardDeviant (122674) on Monday October 02 2006, @10:17AM (#16277219) Homepage Journal
    If you can beat "15 years of really smart people", then your work product probably has more than a million dollars in value if you were to license it out to places like Amazon, eBay, Netflix, etc. Even a 1% improvement in revenues from a 1% improvement in recommendation accuracy is probably worth more than 50K, if sold to the major e-tailers. On the other hand, if you just want an interesting problem to screw around with in your spare time and don't want to go through the bother of forming a company in order to monetize that work, this is a pretty cool opportunity.
    • Re: (Score:2, Insightful)

      To win and take home either prize, your qualifying submissions must have the largest accuracy improvement verified by the Contest judges, you must share your method with (and non-exclusively license it to) Netflix, and you must describe to the world how you did it and why it works.

      So, you could take the money from Netflix, use it to start your business, then license it to the other players, too.
  • by Jimmy King (828214) on Monday October 02 2006, @10:17AM (#16277227) Homepage Journal
    I wish they'd fix the problems in the logic determining what they actually send me from my queue before fixing problems with what they recommend to me. If I've got season 1 of a show in my queue prior to season 2, don't start sending me season 2 because some disc of season 1 is unavailable (which has happened to me multiple times with both Netflix and Blockbuster Online); send me something else completely. They've got the tech to keep one season of a TV show in order, so it can't possibly be that difficult to extend that to keeping multiple seasons of a show in order.

    On top of that, don't show me that it's available in my queue but send me something else instead. While I haven't asked netflix about this, I have asked blockbuster online, and I imagine they are both doing the same thing. The disc is "available" just not at the warehouse used to ship to me personally. Instead of basing one piece of information off of total stock and one off of local stock, base them both on the stock at the warehouse shipping to me.
    • Re: (Score:3, Funny)

      by nine-times (778537)
      They've got the tech to keep one season of a tv show in order, it can't possibly be that difficult to extend that to keeping multiple seasons of a show in order.

      I thought Netflix users just ripped the movies to their hard drive for later viewing anyway?
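Jimmy King's two requests can be expressed as one shipping rule. Judge availability by the warehouse that ships to you, and never skip ahead to a later season while an earlier one is still waiting in the queue. Below is a minimal sketch. The queue item fields are hypothetical, the queue is assumed to hold only unshipped titles, and the rental services' real allocation logic is not public.

```python
from typing import NamedTuple, Optional

class QueueItem(NamedTuple):
    title: str
    series: Optional[str] = None  # hypothetical field: None for stand-alone movies
    season: int = 0

def next_to_ship(queue, local_stock):
    """Pick the first queued title that is (a) in stock at the warehouse that
    ships to this customer and (b) not a later season of a series whose
    earlier season is still waiting in the queue."""
    earliest = {}
    for item in queue:
        if item.series is not None:
            earliest[item.series] = min(earliest.get(item.series, item.season), item.season)
    for item in queue:
        if local_stock.get(item.title, 0) <= 0:
            continue  # judge availability by local stock, not company-wide stock
        if item.series is not None and item.season > earliest[item.series]:
            continue  # don't jump to season 2 while season 1 is still queued
        return item
    return None
```

With season 1 out of stock locally, this skips both seasons and ships the next stand-alone movie instead.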

  • by dduardo (592868) on Monday October 02 2006, @10:23AM (#16277301)
    If Netflix doesn't have the movie in stock it should burn the movie on demand.
  • by jfengel (409917) on Monday October 02 2006, @10:24AM (#16277319) Homepage Journal
    Any marketer will tell you that what people tell you they want and what people actually want are very different things. Even if people answer honestly, the data you gather is often unreliable: people simply don't have as good a handle on what they want as they think they do.

    Not that marketers have a better handle, but simply that people will swear up and down that they would buy a peanut-butter-filled hot dog, that they loved the one they tried, and then don't actually buy any.

    Don't believe me? Go see Snakes on a Plane. Nobody else did. (Sure, $33 million seems like a lot, but that's chump change for a major studio release these days.)

    The best improvements will come from insights gained between the lines. You may have rated The English Patient eleventeen stars, but if your next seven rentals were all episodes of The Girls Next Door, which you only rated 3 stars, it certainly looks like you want more Hugh Hefner and less Ralph Fiennes.

    The best data is the data that the subject doesn't realize he's giving you. Once you start imposing conscious choice on the ratings, you get only what they say they like, not what they really like.
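jfengel's point is that implicit signals (what you actually rent) can outweigh explicit ones (what you say you like). One simple way to act on it is to blend the two per genre. The following sketch is hypothetical: the recent-rental window, the 0.7 weight on behaviour and the genre labels are all assumptions chosen for illustration, not tuned values.

```python
from collections import Counter

def genre_affinity(rentals, ratings, genre_of, implicit_weight=0.7, window=10):
    """Blend what a user says (star ratings) with what they do (recent rentals).

    rentals: chronological list of titles; ratings: title -> stars (1-5);
    genre_of: title -> genre. Returns genre -> score in [0, 1].
    """
    recent = rentals[-window:]
    rental_share = Counter(genre_of[t] for t in recent)
    stated = {}
    for title, stars in ratings.items():
        stated.setdefault(genre_of[title], []).append((stars - 1) / 4)  # 1-5 stars -> 0-1
    scores = {}
    for g in set(rental_share) | set(stated):
        implicit = rental_share[g] / len(recent) if recent else 0.0
        explicit = sum(stated[g]) / len(stated[g]) if g in stated else 0.0
        scores[g] = implicit_weight * implicit + (1 - implicit_weight) * explicit
    return scores
```

In the comment's example, a top-rated drama followed by seven rentals of a show rated only 3 stars still leaves the show's genre with the higher score, because the rentals dominate.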
  • by OakDragon (885217) on Monday October 02 2006, @10:29AM (#16277383) Journal

    I stopped rating movies after I found that I got recommended a lot of crap. Say I rent a slasher movie that, for its genre, is artfully done. I rate it high. Now I have recommendations for a bunch of worthless, straight-to-video stuff that I really don't want to see.

    This is the real nut to crack, IMO. How do you come up with an algorithm that rates 'quality', an elusive concept that means different things to different people?

    Not to mention, I'm fickle.

  • by BMonger (68213) on Monday October 02 2006, @10:31AM (#16277413)
    I personally weigh movies on a number of different factors. I might give 3 stars to a movie because it has 4 of my favorite actors in it even if I didn't care for the plot. I might give 3 stars to a different movie with horrible acting but interesting camera angles (From Dusk Til Dawn 2). I tend to average out my ratings dependent on many things a movie has to offer.

    The problem is that this is my rating system. It works for me, but it does little good to anybody else because they are rating based purely on something else.

    I think they need to implement the ability to rate more aspects of the movie. I'm sure some people out there rate the movie poorly if their disc is scratched or the transfer quality is poor even. A simple 1 to 5 system doesn't cut it. People rate things that aren't "Was the (romance) plot good?", "Do you like this director?", "Do you like these actors?". People rate things that aren't on the box.
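A standard partial fix for "my rating system is not your rating system" is to normalize each user's ratings against that user's own habits before comparing users. Mean-centering or z-scoring makes a habitual 4-star giver comparable to a habitual 2-star giver. It does not capture *why* someone rated a film (actors, camera angles, a scratched disc), which is BMonger's deeper point. Here is a minimal sketch of the z-score version; the function name is hypothetical.

```python
from statistics import mean, pstdev

def normalize_user(ratings):
    """Re-express one user's star ratings as z-scores against that user's own
    average and spread."""
    values = list(ratings.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return {title: 0.0 for title in ratings}  # user rates everything the same
    return {title: (r - mu) / sd for title, r in ratings.items()}
```

A user who gives 5 and 4 and a user who gives 3 and 2 to the same two films end up with identical normalized ratings.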
  • And some macaroni pieces.
  • SELECT TOP 10 title
    FROM tblMovies as m, tblAdvertisers as a
    WHERE m.studio = a.studio
    ORDER BY a.adRevenue DESC

    I win.
  • by Yogs (592322) on Monday October 02 2006, @11:08AM (#16278065)
    Disclaimer: I subscribe to the same sort of service, except through blockbuster... maybe Netflix does have this feature. My wife and I share a queue... I imagine many, many of these queues are shared. We have very, very different tastes in movies. Instead of getting recommendations that suit us both (which is next to impossible), the recommendations just get very, very confused. If I could just keep my and her recommendations from tangling, we would both have an easier time.
    • Disclaimer: I subscribe to the same sort of service, except through blockbuster... maybe Netflix does have this feature. My wife and I share a queue... I imagine many, many of these queues are shared. We have very, very different tastes in movies. Instead of getting recommendations that suit us both (which is next to impossible), the recommendations just get very, very confused. If I could just keep my and her recommendations from tangling, we would both have an easier time.

      This problem is already solved.

      Wi
  • Common data (Score:3, Informative)

    by Mike Hicks (244) * <hick0088@tc.umn.edu> on Monday October 02 2006, @11:18AM (#16278217) Homepage Journal
    I see that the NYT article linked to just about everything except MovieLens [umn.edu]. I've used the site, and folks might like to try it out. It looks simple, but it's fairly nice, having some of those fun dynamic pages that are all the rage these days. One neat thing in comparison to Netflix is that it will give a projected star rating for you, rather than simply saying "Recommended".

    Of course, I'm biased since I had John Riedl as a professor in a few easy classes. I think he tried to spin off this research as a new company, but I'm not sure if it ever got off the ground.

    One thing I'd really like to see has little to do with the quality of ratings, though. I'd like to be able to keep a common database of my ratings across multiple sites. At the moment, I've rated a number of movies at Netflix, MovieLens, and IMDb, but they aren't entirely consistent. Unfortunately, two of the sites use a ten-point system (IMDb has a ten-point scale; MovieLens goes up to 5 stars, but in half-star increments), while the other uses a five-point one (maybe six if you count "Not Interested").

    Well, I'll have to poke around a bit with this stuff. I wouldn't be able to do much, though, since my level of knowledge in this arena is very limited...
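A shared ratings database would first need to put the three scales on a common footing. The simplest approach is linear rescaling to [0, 1]. Here is a minimal sketch. The scale bounds (Netflix 1 to 5, IMDb 1 to 10, MovieLens 0.5 to 5) are assumptions, and linear mapping ignores the fact that people may use each site's scale differently.

```python
# Rating scales per site as (lowest, highest). The bounds are assumptions:
# Netflix 1-5 stars, IMDb 1-10, MovieLens 0.5-5 in half-star steps.
SCALES = {"netflix": (1.0, 5.0), "imdb": (1.0, 10.0), "movielens": (0.5, 5.0)}

def to_unit(value, site):
    """Map a rating from a site's own scale onto [0, 1] by linear rescaling."""
    low, high = SCALES[site]
    if not low <= value <= high:
        raise ValueError(f"{value} is outside {site}'s {low}-{high} scale")
    return (value - low) / (high - low)

def convert(value, src, dst):
    """Translate a rating from one site's scale to another's."""
    low, high = SCALES[dst]
    return low + to_unit(value, src) * (high - low)
```

For example, a 3-star Netflix rating sits at the middle of its scale and converts to 5.5 on IMDb's 1 to 10 scale.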