This discussion has been archived. No new comments can be posted.

Bayesian Filters Predict Sundance

  • by BTO (604614) on Tuesday January 24, 2006 @11:02AM (#14548254) Homepage
    Gay = +100%
  • The Winner! (Score:2, Funny)

    by Anonymous Coward
    Tortured with health problems? You're one click away from healthy life! An amazing variety of licensed meds at one big store! Click the link and make your first step to constant relief!
  • by Big Nothing (229456) <big.nothing@bigger.com> on Tuesday January 24, 2006 @11:03AM (#14548268)
    So, a company claims that their product (or in this case, algorithm) is good?

    STOP THE PRESS!

    • by digitaldc (879047) on Tuesday January 24, 2006 @11:31AM (#14548462)
      So, a company claims that their product (or in this case, algorithm) is good?

      Well according to their algorithm, certain words such as Africa, America, American, beautiful, black, best, emotional, fascinating, great, inspired, lake, new, riveting, Sundance, sexy, story, subtitles, truth, vision, world should never be used.

      My 'kiss of death' film would be:

      "The Beautiful Lake: An African Vision of the World"

      Description: An emotional story of truth about a man from Africa who comes to America to find himself. Being a skilled carpenter, he builds a new home which is set on a beautiful lake. As we hear anecdotes of his vision of truth, a fascinating story emerges. We also learn about his riveting and inspired adventure to his new home, and we see how it impacts his once black view of the world. A great film for any Sundance enthusiast! (with sexy subtitles)

      It is almost guaranteed to bomb, before anyone even sees it!
      • From the article:

        Golden: academic, accomplished, bedroom, complex, dialogue, dream, death, focus, girl, human, high, journey, love, mother, narrative, romance, relationship, superbly, sex, ultimately.

        Therefore, coming soon to a theater near you:

        The Contortionist

        This academic work involves an accomplished contortionist, her bedroom, and many complex, dialogue-strewn dreams that focus on girl-girl scenes with animals as well as humans. Everyone is high on life in this journey through love, motherho

      • I've gone ahead and compiled a similar list with respect to /. posts: Golden: "insensitive clod," "tinfoil hat," "Soviet Russia," "overlords," and "M$" Kiss of Death: "the honorable Jack Thompson," "the RIAA acted appropriately," etc.
      • It is almost guaranteed to bomb, before anyone even sees it!

        That's what they said about "Springtime for Hitler"...
      • Challenge Problem: Write the most kiss-of-death accurate review you can for a film that actually won a Sundance award. Bonus points for avoiding all the words on the golden list.

    • Re:Shocking news! (Score:4, Insightful)

      by goombah99 (560566) on Tuesday January 24, 2006 @11:41AM (#14548532)
      Yeah, I get so tired of people publishing probability success rates without stating what the baseline is.

      For example, I could announce I have an 85% accurate weather prediction system. It's this: predict the sun will shine most of the day. Nowhere does it rain all day more than 15% of the days. So my predictor is 85% accurate.

      When you claim an accuracy you need to also give the null model accuracy or it's gibberish.
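A minimal sketch of the point above, assuming made-up data (100 days, 85 of them sunny) and hypothetical helper names: it compares a predictor's accuracy against the "null model" that always guesses the most common outcome.

```python
from collections import Counter

def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def null_model_accuracy(actual):
    """Accuracy of always guessing the most common label."""
    most_common_count = Counter(actual).most_common(1)[0][1]
    return most_common_count / len(actual)

# Hypothetical data, not from the article: 85 sunny days and 15 rainy days.
actual = ["sun"] * 85 + ["rain"] * 15
always_sun = ["sun"] * 100  # the "predict sun every day" system

model_acc = accuracy(always_sun, actual)
baseline_acc = null_model_accuracy(actual)
lift = model_acc - baseline_acc
print(f"model: {model_acc:.2f}, null model: {baseline_acc:.2f}, lift: {lift:+.2f}")
```

Here the "85% accurate" system has zero lift over the null model, which is why an accuracy figure quoted without its baseline says very little.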
      • Re:Shocking news! (Score:4, Informative)

        by sunya (101612) on Tuesday January 24, 2006 @11:55AM (#14548640) Homepage
        nowhere does it rain all day more than 15% of the days.

        Time to brush up on geography. It rains pretty much all the time in Cherrapunji [wikipedia.org].

      • For example, I could announce I have an 85% accurate weather prediction system. It's this: predict the sun will shine most of the day. Nowhere does it rain all day more than 15% of the days. So my predictor is 85% accurate.

        Actually you can claim that the sun will shine all day long. It may shine on the cloud tops instead of the ground, but it will shine.

        That and don't move to Seattle.
      • Actually, most weather systems take 4 to 5 days to move through a region, so if you simply predict "Tomorrow will be about the same as today", you will be right 80% of the time.

      • Not to mention that the 10 years they did the analysis on don't seem like a long enough time to draw any reliable conclusions.

        And keep in mind that 67.8% of all statistics are made up on the spot. ;)

        I'll bet those guys would describe a divining rod as a scientific means to find water.
  • Fuck films... (Score:3, Insightful)

    by Caspian (99221) on Tuesday January 24, 2006 @11:04AM (#14548274)
    ...let's see it predict STOCK WINNERS.
    • No kidding. Predict something that no one else can predict. Predict whether Iran will create weapons with their nuclear power facilities. Predict how long it will be before the rain forests are completely wiped out. Predict when the ozone layer will be totally depleted. Predict when Microsoft will ship a secure version of Windows. :)
    • Why stock? People bet on all sorts of things. I would be shocked if you couldn't bet on the Sundance winners. So the real question is, if they had used their predictions as a betting strategy, what would their return on investment be?

      That would give a good indicator of how much they're simply predicting the favorites (not much return) or accurately predicting surprises.
    • Re:Fuck films... (Score:5, Informative)

      by DeveloperAdvantage (923539) on Tuesday January 24, 2006 @11:28AM (#14548432) Homepage
      There are many examples of using statistics and artificial intelligence in finance (go google), including some applications to predict stock prices. Even a decade ago, books like "Neural Networks in Finance and Investing" and "Artificial Intelligence in the Capital Markets" were already published, along with hordes of books on statistics in finance (think about what Quants do).

      Of course, I don't think we can yet predict stock prices with the same 81% accuracy as in this article. And, if anyone could, they would be wise to keep it to themselves.
      • Of course, I don't think we can yet predict stock prices with the same 81% accuracy as in this article. And, if anyone could, they would be wise to keep it to themselves.


        Surely if someone worked this out they would then make money by flogging the method to other people through infomercials, public speaking and self-help books, wouldn't they?!

        I mean, that's what everyone else does that figures out how to get rich. You don't actually do it, you teach everyone else how to do it and make money out of it that
      • Of course, I don't think we can yet predict stock prices with the same 81% accuracy as in this article. And, if anyone could, they would be wise to keep it to themselves.

        quite the opposite.

        If I advertised a program that predicted stock prices with 81% accuracy, a very large number of people would buy/sell based on its results, making it self-predicting, at least in the short term.

      • If someone were to use AI to predict the stock market, and would invest on it based on those predictions, they would be very successful initially, but would also change the behaviour of the same market up until it would render the model unusable.

        I suspect this has happened several times.
    • Sorry, I don't want to have to drill a hole in my head.
    • ** REPORT RESULTS: Bayesian Query = 'STOCK WINNERS' **

      George W. Bush
      Dick Cheney
      Darl McBride
    • ...let's see it predict STOCK WINNERS.

      Oh that is easy. Just ask Google.

      However, you may not be able to afford more than one stock.
    • Hey, that sounds like a good idea for a film! I bet people would like to watch other people fucking on the screen! I'm a-gonna win the next Sundance!
  • Bring a decibel meter and a stopwatch and find the films with the loudest and longest:

    1) Laughter
    2) Applause
    3) Standing Ovations afterward


    This simple method will give you a good idea of who will be the winners.
  • Unimpressed (Score:2, Insightful)

    by Big Nothing (229456)
    "Our engineers were thinking that determining whether a movie is good or bad could be similar to determining whether e-mail is spam or not," said Unspam Chief Executive Prince, 31, who loves the festival and uses it as a recruiting tool. "We had the last 10 years of the festival's film guides, which are like inputs, and then a bunch of outputs, like how many people saw a film, did it win anything at Sundance, did it have commercial success. If you could figure out the pattern between the inputs and the outp
    • Re:Unimpressed (Score:2, Insightful)

      by Raistlin77 (754120)
      That depends. If it predicts and filters 84% of all spam, then it can't be anything but good. However, if 84% of what it predicts and filters is indeed spam, then 16% was not and was filtered needlessly - that's bad.
    • by garcia (6573)
      I'm not a Spam guru so please excuse me if I'm wrong, but isn't 81% a horrible result? Perhaps not for movie prediction but in Spam filtering?

      Perhaps they should use spam filtering for weather reporting. That way, the "dart throwing monkeys" will end up with more accurate results than they do now. "There's a 30% chance of rain." I have always wondered if a passing grade in meteorologist college coursework was 30% or better.
      • Argh, I hate feeding the trolls... Anyway, meteorological data is very difficult to predict. Simple storms and weather patterns can easily be altered within a couple of days' time. The number of influences on weather is enormous, and simple ideas such as the butterfly effect (not the movie) can create huge effects within a month's time. Yes, the weather channel doesn't have the best success rate, but considering the number of molecules involved in weather fronts, our computers aren't exactly suited to the j
      • I'm unimpressed with your own weather-watching skills, not to mention your math. Have you ever taken a lesson in probability? 30% is about 1/3. So a 30% chance of rain means that about 2/3 of the time, it isn't going to rain! You might even get bright, clear skies.
    • Re:Unimpressed (Score:3, Informative)

      by Vann_v2 (213760)
      The problem is that saying it is "81% successful" is meaningless. Typically one would use a two-fold measure of success for these sorts of applications: precision and recall. In the case of spam, the precision of your algorithm would be the number of correctly marked emails over the total number of emails marked, and the recall would be the number of correctly marked emails over the number of emails that are actually spam.

      In terms of search this is perhaps more clear, so consider Google. You issue Google
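The two definitions given above for the spam case can be sketched as follows. The mailbox contents and the function name `precision_recall` are assumptions made up for illustration; nothing here comes from the article.

```python
def precision_recall(predicted_spam, actual_spam):
    """Return (precision, recall) for two equal-length lists of booleans."""
    # Correctly marked emails: marked as spam and actually spam.
    true_positives = sum(p and a for p, a in zip(predicted_spam, actual_spam))
    marked = sum(predicted_spam)      # total emails the filter marked
    really_spam = sum(actual_spam)    # total emails that are actually spam
    precision = true_positives / marked if marked else 0.0
    recall = true_positives / really_spam if really_spam else 0.0
    return precision, recall

# Hypothetical mailbox of 10 messages: 5 are spam, the filter marks 4,
# and 3 of the marked messages really are spam.
actual    = [True, True, True, True, True, False, False, False, False, False]
predicted = [True, True, True, False, False, True, False, False, False, False]

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy mailbox the filter gets 7 of 10 messages right overall, yet that single figure hides both the one legitimate email it flagged and the two spam messages it missed.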
  • Filter Mods (Score:5, Funny)

    by Anonymous Coward on Tuesday January 24, 2006 @11:12AM (#14548323)
    Angsty +2
    Depressing +2
    Happy or Inspirational -1
    Featuring characters of a marginalized societal group +10
    Featuring characters of a majority societal group -10
    Making those majority characters feel guilty +20
    Political Agenda +10
    Social Agenda +10
    Leftist Social & Political Agenda +50
    Non-acting acting +3
    Use of black and white film +1
    Sense of Humor -5
    Comedy film -100
    Intellectual +1
    Pseudo-intellectual +30
    Director dresses in all black +4
    Actors dress in all black +10
    Actors dress in all black and do interpretive dance to Philip Glass music while speaking German backwards +20
    Audience participates and dances with the actors in above scenario +1000
    Would actually generate box office revenue -100
    Good movie that would appeal to more than a niche audience -20
  • by Anonymous Coward
    Prince and his crew came up with two lists: words that "make you golden" or are "the kiss of death."

    Kiss of death: Africa, America, American, beautiful, black, ...

    Prince went on to comment they were surprised to come up with the first racist Bayesian filter in their career.
  • Fit your stereotype? (Score:5, Interesting)

    by 246o1 (914193) on Tuesday January 24, 2006 @11:19AM (#14548374)
    From TFA (words in the description that help or hurt it): "Golden: academic, accomplished, bedroom, complex, dialogue, dream, death, focus, girl, human, high, journey, love, mother, narrative, romance, relationship, superbly, sex, ultimately. Kiss of death: Africa, America, American, beautiful, black, best, emotional, fascinating, great, inspired, lake, new, riveting, Sundance, sexy, story, subtitles, truth, vision, world." So, they want complex, academic films about girl-mother relationships with a strong narrative of romance and sex. Nothing about beautiful black people in Africa or America with any sort of interest in visions, truth, or the world, especially if said black people are sexy and live near a great, nay, the best lake.
    • Oh get off it. You have a list of words completely taken out of context and you're turning it into some "everyone at Sundance is racist" nonsense.

      The real difference between the two lists is that the first list is more concrete and the second is more abstract. I'm not surprised to hear that fascinating, beautiful and emotional are in the list. Those words are the hackneyed descriptions of every art house critic's favorite film. People are probably sick of hearing them and ignore them like a David Manning r
    • I did once analyse titles of British TV programmes [membled.com] to find keywords that make me more or less likely to want to watch a show.
    • ...especially if said black people are sexy and live near a great, nay, the best lake.

      People are trying to make a case that some predictor words for winning a Sundance award (America, beautiful, Africa, black, ...etc.) imply that films about black Africans or beautiful America win awards.

      What everyone is missing is that these terms are NOT RELATED in the Bayesian filter - they are just words. It is the human brain that is incorrectly associating 'America' with 'beautiful' or 'black' with 'Africa' or 'black
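To illustrate the "they are just words" point, here is a rough sketch of a naive Bayes style word scorer. It is not Unspam's code; the toy descriptions, labels and function names are invented, and the class prior is left out for brevity. Each word contributes its own log-odds, and the score for a description is simply the sum, so the filter never learns that two words belong together.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (text, label) pairs, label is 'win' or 'lose'."""
    counts = {"win": Counter(), "lose": Counter()}
    for text, label in docs:
        counts[label].update(text.lower().split())
    return counts

def word_log_odds(word, counts, vocab_size):
    """Log-odds that `word` favors 'win', with add-one (Laplace) smoothing."""
    p_win = (counts["win"][word] + 1) / (sum(counts["win"].values()) + vocab_size)
    p_lose = (counts["lose"][word] + 1) / (sum(counts["lose"].values()) + vocab_size)
    return math.log(p_win / p_lose)

def score(text, counts, vocab_size):
    """Sum of per-word log-odds; word order and co-occurrence are ignored."""
    return sum(word_log_odds(w, counts, vocab_size) for w in text.lower().split())

# Made-up toy descriptions, not data from the article.
docs = [
    ("complex journey of a mother", "win"),
    ("a girl and her dream", "win"),
    ("beautiful lake in africa", "lose"),
    ("riveting story of america", "lose"),
]
counts = train(docs)
vocab_size = len(set(counts["win"]) | set(counts["lose"]))

pair = score("beautiful africa", counts, vocab_size)
swapped = score("africa beautiful", counts, vocab_size)
separate = score("beautiful", counts, vocab_size) + score("africa", counts, vocab_size)
print(pair, swapped, separate)
```

All three printed values are the same: the pair scores exactly as the two words would score on their own, in either order.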
  • They picked "Somebodies" as a successful drama. Drama? Not really; definitely a flat-out comedy. Good stuff. Sundance is cool, but their ticketing system is getting progressively worse. This year I paid $5 for the chance to pick movies in a ½ hour slot 3 days after the box office opened. Didn't get even one of my first choices; essentially got what was left.
  • by bhima (46039) <Bhima.Pandava@NOspaM.gmail.com> on Tuesday January 24, 2006 @11:27AM (#14548429) Journal
    I've been thinking about this for a while...

    Someone should develop a client side Bayesian Filter / Moderation system for Slashdot.

    Think about it...

    A sizable portion of people around here are not consistently assholes so it doesn't really make sense to add them to a "foe" list.
    Frequently things are in strange topics so it doesn't make sense to ignore whole topics.
    Not all new members are trolls so modding all new members down doesn't make sense either.
    And the current moderation system is subjected to other people's current peeves and political leanings.

    And please don't tell me to do it, I'm an embedded developer not a web developer... I have no idea where to even begin with it.
    • Yeah, I've wanted a site like digg/slashdot that worked like this for a while: users can vote on anything, and then anything you haven't voted on is given a score that is calculated according to how the people who most consistently vote in agreement with you score the story/comment. The site is custom-tailored to what you want. People who like stupid crap will mod up stupid crap and get more stupid crap because other people who like stupid crap will have modded up the same stupid crap and more, while people
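The scoring idea in the comment above could be sketched roughly like this. Note that this is agreement-weighted voting (a simple form of collaborative filtering), not a Bayesian filter, and the user names, comment IDs and functions are all hypothetical.

```python
def agreement(votes_a, votes_b):
    """Fraction of commonly voted items on which two users agree (0.0 if none)."""
    shared = set(votes_a) & set(votes_b)
    if not shared:
        return 0.0
    return sum(votes_a[item] == votes_b[item] for item in shared) / len(shared)

def predicted_score(user, item, all_votes):
    """Agreement-weighted average of other users' votes (+1 or -1) on `item`."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, votes in all_votes.items():
        if other == user or item not in votes:
            continue
        weight = agreement(all_votes[user], votes)
        weighted_sum += weight * votes[item]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0

# Hypothetical votes: +1 = mod up, -1 = mod down.
all_votes = {
    "alice": {"c1": 1, "c2": -1, "c3": 1},
    "bob":   {"c1": 1, "c2": -1, "c3": 1, "c4": 1},
    "carol": {"c1": -1, "c2": 1, "c3": 1, "c4": -1},
}

estimate = predicted_score("alice", "c4", all_votes)
print(estimate)
```

Bob always agrees with alice and carol mostly disagrees, so bob's upvote on c4 dominates and the estimate for alice comes out positive.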
    • by Billosaur (927319) * <wgrother@optRABB ... minus herbivore> on Tuesday January 24, 2006 @11:49AM (#14548599) Journal
      And the current moderation system is subjected to other people's current peeves and political leanings.

      Which is what makes it so much fun!

      Seriously, it's wonderful that Bayesian filters are useful, but why put blinders on? Slashdot would simply cease to be interesting if you could will away anything you didn't like. Intelligent discourse requires an airing of all sides of an issue and theoretically this can lead to consensus building, if the best parts of all ideas are combined. Of course you're going to get people with very little to say, or very little between the ears, muddying the waters -- the challenge is to take the disparate elements and meld them to something coherent. Superfluous elements will be winnowed out and hopefully the end product is something most people can agree on.

      Of course this is Slashdot, the Internet equivalent of a bar brawl. The rough-and-tumble of this kind of forum is what keeps it interesting and more importantly, as much as we are infuriated by those who don't agree with us, makes us think.

      • I don't think a Bayesian moderation system would necessarily prevent you from seeing any opposing viewpoints. I often moderate up comments that I disagree with if the submitter has an interesting point, or if it is necessary for some intelligent reply to make sense. I often wish /. had a view where I could see all 3+ comments as well as all their parents. That way, if someone makes an insightful reply I can read what they were replying to without having to open the parent link in another tab or browse at -1.
      • by bhima (46039)
        I think you are looking at it the wrong way:

        Using the current mod system on Slashdot you are using someone else's blinders.
        Using the Friend / Foe system you are using a static subset.

        Less than 20% of the comments around here are either meaningful, thought provoking, or relevant... I want to see those that truly are interesting and between the current mod system and the outright volume I can't in the amount of time I'm willing to spend reading Slashdot.

        Slashdot is not like the Internet equivalent of a bar br
        • Using the current mod system on Slashdot you are using someone else's blinders.
          Using the Friend / Foe system you are using a static subset.

          True, though the system is flexible enough that I'm not required to mod categories and/or people up or down. I've determined over time that adding/subtracting points based on relationships here is a double-edged sword. I often actually want to see what people who don't like me are saying, to get some sense of why and to challenge them on a fundamental level, if I can

      • I was under the impression that bar brawls were physical and considering the average /. reader, we would lose. Perhaps this should be the Internet equivalent of an angry debate.
    • And please don't tell me to do it, I'm an embedded developer not a web developer... I have no idea where to even begin with it.

      But CGIs are embedded scripts! ;)
  • BUY Ch 3ap \/iag r a 0n1i ne - n0 prescr1pti0n r3quir3d!!!!
  • A better thing (Score:2, Interesting)

    by tessonec (620168)
    This [slashdot.org] was a far better (and open source) application of Bayesian filters
  • Does it portray women as victims? +3

    Does it star a beautiful actress with ugly makeup? +1

    Does it deal with weighty issues? +1

    Is it science fiction? -3

    Does it show how minority groups are oppressed? +2

    Does it star people from a minority group who haven't received Oscars for a few years? +2

    Did you cry? +2

    Was it made by an action movie director turned serious? +2

    Does it deal with weighty issues albeit by stringing together a sequence of time-worn cliches? +2

    Is it an action movie made by a serious director? -2

    Is
  • by xxxJonBoyxxx (565205) on Tuesday January 24, 2006 @11:35AM (#14548490)
    I'm not sure what kind of crack-simulator Slashdot put into its related stories selector, but some kind of Bayesian filter to figure out the relationship might be helpful.

    For example...

    Ask Slashdot: State of WLAN Support on Linux?
    Related...
        IT: Microsoft Spending $120M To Look Smaller
        Games: Defying Review Aggregation
        Games: Competitive Gaming Hits the Mainstream

    WTF?
  • It's possible they omitted some data when creating their model so that it could be used later in testing (I didn't see in the article whether this was the case). But it is quite possible that the 81% result was based on predicting results that were used in building the model (the article says they used historical data to build the model and then tried to predict historical results to test the model), which would render the results meaningless. Let's see what it predicts and compare it
  • As well as POPFile's multi-category email filtering, I sell a commercial component [extravalent.com] that does multi-category Bayesian filtering for companies to embed in their own software. Bayesian and other statistical techniques are going to be cropping up everywhere there's text to analyze.

    John.
  • Wow, document classification with Bayes nets. How fresh is that??! I wonder how many more of these we'll see? I liked this version better: http://www.pitchformula.com/ [pitchformula.com] He took it a step further and actually MADE art based on those kinds of predictions.
  • ...predicting Slashdupes?
  • I was amused by something in the article that said that too many adjectives in the description ("riveting!") is a predictor of a negative outcome for a film. That reminds me of a rule of thumb for restaurants that a friend suggested -- if the name of the dish is full of adjectives, it'll taste bad. Amusingly, I just did a Google search for "restaurant menu adjectives", and most of the hits on the first page were for middle-school lesson plans where kids add adjectives to menus to make the food seem more app
  • Give me a Bayesian filter that will predict horse races. Now there's something I can use.

    What are the odds on Sundance in the 5th?

  • 'I voted for you - did you vote for me' - at least that's what the blog says ;-)

    http://efrenramirez.imeem.com/photo/0MCW7w6O/K184B j6EJ60T_ [imeem.com]
  • By releasing their results early, they have biased this year's results with media attention. Imagine a judge making a decision between two films that she liked. Does he pick the same one as the computer? If a computer can do her job what does the world need him for? So she picks the movie that was not predicted.

    The correct methodology would have been to entrust the results to a third party to be released after the event was over.

    By the way, I am aware that my judge is a he/she, it is Sundance afte
  • What these folks are doing is cute, but simply boils down to seeking correlations between variables. I'm sure I don't need to remind Slashdot readers that correlation and causality are not the same thing. Correlations can give you clues, but are not the real meat of the problem.

    If variables are correlated, the mechanics of that correlation might be due to some underlying common cause. Without understanding the underlying cause (if it exists), you are simply groping in the dark, hoping the interplay betw

  • Uh, they made a really common mistake in neural nets. Success rate on test data != actual success rate.

    81% success when you run it back on your test data is meaningless. In fact, any number is meaningless when you apply it to your test data. I could get 100% success just by checking whether the name of a movie matches a name in the test set and spitting the stored result back out.
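The "100% by matching names" trick can be sketched like this. The film titles and outcomes are placeholders, and `memorizer` is a hypothetical stand-in for any model scored on the same data it was built from.

```python
def memorizer(training_set):
    """A 'model' that just looks up titles it has already seen."""
    known = dict(training_set)
    # Titles it has never seen get a fixed guess of False (no award).
    return lambda title: known.get(title, False)

def accuracy(model, examples):
    """Fraction of (title, won) examples the model gets right."""
    return sum(model(title) == won for title, won in examples) / len(examples)

# Hypothetical (title, won_award) pairs; the titles are placeholders.
past_years = [("Film A", True), ("Film B", False), ("Film C", True), ("Film D", False)]
held_out = [("Film E", True), ("Film F", True), ("Film G", False), ("Film H", True)]

model = memorizer(past_years)
seen_acc = accuracy(model, past_years)
unseen_acc = accuracy(model, held_out)
print("on the data it was built from:", seen_acc)
print("on held-out data:", unseen_acc)
```

The lookup table is perfect on the films it has memorized and nearly useless on films it has not seen, so only the held-out number says anything about future festivals.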

    Here's the relevant quote:
    "[t]esting the system with known data from previous years, we have established an approximately 81% typical accuracy rate on a year-by-year b
  • From TFA:

    "and then a bunch of outputs, like how many people saw a film, did it win anything at Sundance, did it have commercial success. If you could figure out the pattern between the inputs and the outputs, then you could actually predict future winners."


    Where Sundance is concerned, you run the filter one way to determine if it wins, and then reverse the good/bad word lists to see if it will have commercial success.
  • Please check it out, mods. MAHALO
  • If it were my project, I'd probably feed the script through instead of the reviews.

Heuristics are bug ridden by definition. If they didn't have bugs, then they'd be algorithms.
