Movies Media

Bayesian Filters Predict Sundance

JohnGrahamCumming writes "The LA Times reports on a company's use of Bayesian filtering to predict the winners at the Sundance Film Festival. They use a modified POPFile email filter and claim an 81% success rate."
This discussion has been archived. No new comments can be posted.

  • Re:Fuck films... (Score:5, Informative)

    by DeveloperAdvantage ( 923539 ) on Tuesday January 24, 2006 @11:28AM (#14548432) Homepage
    There are many examples of using statistics and artificial intelligence in finance (go google), including some applications to predict stock prices. Even a decade ago, books like "Neural Networks in Finance and Investing" and "Artificial Intelligence in the Capital Markets" were already published, along with hordes of books on statistics in finance (think about what Quants do).

    Of course, I don't think we can yet predict stock prices with the same 81% accuracy as in this article. And, if anyone could, they would be wise to keep it to themselves.
  • by Kagura ( 843695 ) on Tuesday January 24, 2006 @11:42AM (#14548552)
    Where do you see the word "related" or any of its equivalents? As far as I can tell, every story's position is based on the time it is posted to the front page.
  • Their web site states that the 81% number was "year on year", which I interpret to mean that they used the data from year n - 1 to predict year n.

    John.
  • Re:Shocking news! (Score:4, Informative)

    by sunya ( 101612 ) on Tuesday January 24, 2006 @11:55AM (#14548640) Homepage
    "nowhere does it rain all day more than 15% of the days."

    Time to brush up on geography. It rains pretty much all the time in Cherrapunji [wikipedia.org].

  • Re:Unimpressed (Score:3, Informative)

    by Vann_v2 ( 213760 ) on Tuesday January 24, 2006 @01:56PM (#14549853) Homepage
    The problem is that saying it is "81% successful" is meaningless. Typically one would use a two-fold measure of success for these sorts of applications: precision and recall. In the case of spam, the precision of your algorithm would be the number of correctly marked emails over the total number of emails marked, and the recall would be the number of correctly marked emails over the number of emails that are actually spam.

    In terms of search this is perhaps clearer, so consider Google. You issue Google a search query and it returns a bunch of results. Precision measures how many of the results returned are actually relevant, and recall measures how many of the relevant results were actually returned. One could get 100% precision by returning just one result which could be verified as relevant (or, in the above case, verified as spam), and one could get 100% recall by simply returning everything. Oftentimes one takes the harmonic mean of the two, called the F-score (more precisely, the F1 score), as an overall measure of the success of the algorithm. In other instances one might want to favor precision over recall or vice versa.

    I think they probably mean "81% precision," but a low recall means that you'll have many spam emails which are not marked. Of course, if they mean the opposite, then low precision could mean many marked emails which are not spam!
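
The precision/recall distinction in the comment above can be made concrete with a small sketch. The function name `precision_recall_f1` and the message IDs below are invented for illustration; nothing here comes from the company's filter or from POPFile. It only shows how the two extreme strategies mentioned above (mark one sure thing, or mark everything) score.

```python
def precision_recall_f1(marked, actual):
    """Score a set of marked items against the set of truly positive items.

    marked: set of item IDs the filter flagged (e.g. as spam)
    actual: set of item IDs that really are positive
    Both names are hypothetical, chosen for this example only.
    """
    correct = len(marked & actual)  # correctly marked items
    precision = correct / len(marked) if marked else 0.0
    recall = correct / len(actual) if actual else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # F1 is the harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up data: 10 messages, of which messages 1-5 are actually spam.
actual_spam = {1, 2, 3, 4, 5}

# Strategy A: mark a single message that is certainly spam.
print(precision_recall_f1({1}, actual_spam))

# Strategy B: mark every message as spam.
print(precision_recall_f1(set(range(1, 11)), actual_spam))
```

Strategy A gets perfect precision but only one fifth of the spam is caught; strategy B gets perfect recall but only half of what it marks is spam. Neither number alone says much, which is the point being made about a bare "81%".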

