Math News Politics

Math Indicates Pollster Is Forging Results 319

An anonymous reader writes "Nate Silver suggests the political pollster Strategic Vision is 'cooking the books. And whoever is doing so is doing a pretty sloppy job.' Silver crunched five years worth of their polling data, and found their reported results followed a suspicious pattern which traditionally suggests fraud. The five-year distribution of the numbers 'is not random. It's not close to random.' The polling firm had already been reprimanded by the American Association for Public Opinion Research for failing to disclose their methodology, though the firm argues they did comply with the organization's request. Their response to Silver's accusation? 'We have a call in to our attorney on this and fully intend to take action that will vindicate us.'"

  • by postmortem ( 906676 ) on Friday September 25, 2009 @07:33PM (#29545559) Journal

    a. you can't post
    b. if you do manage to post, post goes to wrong topic!

  • Horseshit (Score:4, Funny)

    by Saint Stephen ( 19450 ) on Friday September 25, 2009 @07:43PM (#29545597) Homepage Journal

    I call total, 100%, biased, fuck me up the ass horseshit on this inane accusation. Lies, damn lies, and statistics.

    • 'We have a call in to our attorney on this and fully intend to take action that will vindicate us.'

      Looks like they plan on a retraction from the author...

      • by phil reed ( 626 )
        They can plan all they want. They aren't likely to get one. I read the article. Nate Silver spent as much text on disclaimers about his analysis as he did discussing the analysis.
  • Ah ha! (Score:2, Funny)

    by NoYob ( 1630681 )
    "[W]e categorically deny them and will refute them."

    So, which category do they deny? The category of truth or the category of lying?

    • Re: (Score:2, Informative)

      by etymxris ( 121288 )

      Not sure if you're trying to make a pun, but "categorical" in this case means "without exception." For example, Kant talks about categorical and hypothetical imperatives. Categorical imperatives you do always without exception (such as never lying, according to Kant anyway). Hypothetical imperatives are what you do based on the situation (CPR is appropriate only when someone is not breathing, for example).

  • So what? (Score:4, Funny)

    by msauve ( 701917 ) on Friday September 25, 2009 @07:48PM (#29545617)
    Polls show that 78.6% of all statistics are made up on the spot.
  • by klapaucjusz ( 1167407 ) on Friday September 25, 2009 @08:08PM (#29545709) Homepage

    I'm not sure I understand what Silver is claiming about the data.

    He shows that the distribution of second digits in the results of Pollster's polls doesn't follow a uniform distribution -- and from that he somehow deduces it's not random.

    If you look at the figure in the second article, it looks to my untrained eyes like a gaussian curve with maximum around 8 -- since when are gaussians not random?

    • Just to be clear -- you cannot expect second digits in what are two-digit results to be uniform. You can expect fourth digits to be uniform, but that data is not available.
    • Re: (Score:3, Informative)

      > since when are gaussians not random?

      That's exactly the problem he's pointing out. The second digit should be a UNIFORM distribution if it came from real data. If the digits are gaussian that indicates that either

      • there's some process accounting for a gaussian distribution that he doesn't know about (and he does consider that possibility) or
      • the numbers are cooked by a human being who has a preference for 8's over other digits.
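
The uniformity claim in this sub-thread can be checked with a chi-square goodness-of-fit test. What follows is a minimal sketch, assuming the poll results are whole-number percentages. The function name and both samples are invented for illustration; they are not Strategic Vision's numbers, and this is not necessarily the calculation Silver ran.

```python
from collections import Counter

# Critical value of the chi-square distribution with 9 degrees of freedom
# at the 5% significance level (standard table value).
CHI2_CRIT_DF9_05 = 16.919


def trailing_digit_chi2(percentages):
    """Chi-square statistic for the trailing digits of integer percentages,
    tested against a uniform distribution over the digits 0-9."""
    digits = [p % 10 for p in percentages]
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


# Hypothetical samples, made up for this example only.
even_sample = list(range(30, 60)) * 4          # each digit appears 12 times
skewed_sample = [37, 38, 47, 48, 57, 58] * 20  # only 7s and 8s appear

print(trailing_digit_chi2(even_sample))
print(trailing_digit_chi2(skewed_sample) > CHI2_CRIT_DF9_05)
```

A statistic above the critical value only says the digits are unlikely to be uniform. It does not say why, which is the point the two bullets above make.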
  • improbable (Score:4, Interesting)

    by drDugan ( 219551 ) on Friday September 25, 2009 @08:09PM (#29545713) Homepage

    Reading TFA, Nate's analysis implies that there is a systematic bias toward some last digits in the overall poll percentages aggregated over many disparate topics.

    What seems so improbable (to me) is that someone grossly "cooking the books" like this - literally not doing the poll, or tallying any numbers at all, but instead simply reporting fake results for press - would be so stupid as to make up the results manually instead of using a computer in some way. What, some guy in an office reading other polls and saying "gee I think the number will be 45%"?

    If this kind of bias really has been introduced by manually creating and publishing the results (as the analysis seems to imply), then it will be easy to track down and prove with further digging into the data, interviewing people who made the calls or took the data, etc. However, accepting such an explanation would require a level of stupidity on the part of the principals in this company so extreme that I find such a scenario an improbable explanation for the results presented.

    • Re: (Score:3, Insightful)

      by cptdondo ( 59460 )

      But here's the deal:

      You do the poll. You have to; you can't just make up the numbers. Sooner or later someone would figure out you don't have a phone bank.

      But the poll numbers come up as 46 for, 43 against, and the rest undecided.

      Now you can't go and say, 98 for, 1 against, and 1 undecided; that's what the communists do and everyone knows they're lying.

      But you report it as 47 for, 42 against, and the rest undecided. Now you've falsified your data, but you think in a way that's hard to catch. You bump th

      • Re: (Score:3, Informative)

        However, I'm unconvinced that this is some sort of smoking gun; Silver needs to really run this sort of simplistic analysis on a lot of other polls and see if there in fact is a bias towards a 47 - 43 split with 10% undecided. That actually sounds about right for a lot of the polls I remember in the last election.

        If you read the TFA, Nate addresses this. He states that his data--SV LLC's polling results--are selected from a wide, wide, wide variety of topics, not just necessarily the highly divisive ones where there may be a relatively even split between two choices.

        Moreover, (as Nate states) over enough data, even the effect of the undecided percentage on the trailing digit should be random.

        • Moreover, (as Nate states) over enough data, even the effect of the undecided percentage on the trailing digit should be random.

          Except that in this case, the trailing digit is merely the second digit. A bias in the second digit of what is after all highly biased data (you don't have a lot of 98-2 results in polls) is not unlikely, even in samples much larger than what he's using. Not saying that the company is honest, but Silver's argument is not sufficient to condemn them.
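
How much the trailing digit of a two-digit result departs from uniform depends on how widely the results are spread. A rough sketch, assuming (purely for illustration) that results are a normal variable rounded to a whole percentage; the mean of 46 and the three spreads are hypothetical choices, not estimates from any real polls.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal(mu, sigma) variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def trailing_digit_probs(mu, sigma):
    """Exact trailing-digit distribution of a normal variable rounded to a
    whole percentage in 0..100 (renormalised over that range)."""
    probs = [0.0] * 10
    for k in range(0, 101):
        # Probability that the rounded result equals k.
        p = normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)
        probs[k % 10] += p
    total = sum(probs)
    return [p / total for p in probs]


def max_min_ratio(mu, sigma):
    """Ratio of the most likely to the least likely trailing digit."""
    probs = trailing_digit_probs(mu, sigma)
    return max(probs) / min(probs)


# Narrow, moderate and wide spreads around a hypothetical mean of 46%.
for sigma in (2, 4, 12):
    print(sigma, round(max_min_ratio(46, sigma), 3))
```

Under this toy model a tight cluster of results gives strongly non-uniform trailing digits, while a wide spread gives digits that are very close to uniform. Which case applies to a given pollster's data is an empirical question the sketch cannot answer.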

      • all i get out of reading the article is that silver has a bee in his bonnet and doesn't like the firm in question. anyone who's done a statistics course knows numbers can be twisted and played with to come out with just about any answer. i'd be very surprised if ALL pollsters do this.
    • by ceoyoyo ( 59147 )

      It is a PR firm.

    • Re: (Score:3, Interesting)

      by internic ( 453511 )

      I don't know, this sort of reminds me of a recent case of fraud in Physics [wikipedia.org]. If a PhD physicist can make such a mistake, it doesn't seem totally unbelievable to me that a polling firm might. Also, you have to ask yourself if they ever actually expected their results to come under much scrutiny.

  • Handwaving math. (Score:4, Informative)

    by Gorobei ( 127755 ) on Friday September 25, 2009 @08:11PM (#29545725)

    Nate Silver does great analysis at the first order multiple-linear-regression level -- he outperformed all the other polls/predictors in 2008 iirc.

    He sucks at meta-analysis though, in that he just doesn't understand the math. His 2008 monte-carlo stuff gave good results, but was just a bad reinvention of averaging. His recent foray into analyzing stock returns was interesting but 0-information (i.e. useless.)

    Now he's mentioning Benford's law, but playing with trailing digits. Then he handwaves a non-normal result with an appeal to "it looks wrong." Come on, give us some real math here!

    That said, he's probably right, but he's given us no math to support his claim.

    • by ceoyoyo ( 59147 )

      I did it: http://news.slashdot.org/comments.pl?sid=1382853&cid=29546021 [slashdot.org]

      It's not quite as unlikely as he says, (half a million to one instead of millions to one) but Strategic Vision is almost certainly sampling something that is not what the rest of the industry is sampling.

    • Re:Handwaving math. (Score:4, Informative)

      by Artifakt ( 700173 ) on Friday September 25, 2009 @11:43PM (#29546491)

      Benford's law is sometimes called the First Digit law. It deals with cases where numbers are not equally probable, but rather lower integers are more common than higher ones. A good example of such a number is the first digit of street addresses. There are many short streets that only have a 100's block, and only a portion are long enough to also have a 200's block, fewer to have a 300's block, and so on, so the first digit is not equally likely to be, say, a 4 or a 7, rather there will be more fours than sevens. Some stock market numbers should fit Benford's law, and there are plenty of other cases with real world applications.

      However, the law in extended form does work for second or higher digits, or cases where the most likely value for a digit is not 1. Take the IRS for example. Last year, the standard deduction for married filing jointly was an even $10,000. Many people didn't bother to itemize schedule A unless it got them at least a couple of hundred extra back. So there were many people who claimed $10,2XX on their itemized returns, a few less that claimed in the $10,3XX and so on. $10,0XX or $10,1XX values probably weren't the most common, because a lot of people probably didn't bother to gather all the records needed and do all the paperwork if they thought it was only going to get them, say, an extra $27 or even $104.

      The IRS could, and probably does use Benford's law to look for number patterns that may indicate fraud, but for some of those numbers, it's the second or later digit that they should start at. (They won't publicly discuss whether they have any sorting/flagging software that is Benford's law based. I suspect they do as it would be foolish not to take advantage of the math here, but I have absolutely no proof other than that I use some of the same math in a private role, and it's been damned useful a couple of times in spotting a client trying to get me involved with something shady, so it should work equally well for the government.)

      So, using Benford's law for second or other trailing digits is legitimate. I can't tell from the article whether Nate Silver is doing everything else correctly, but the extension to a particular trailing digit isn't itself a flaw, and I could come up with a good psychological argument why humans might fudge the second digit by a point or two, but only when it isn't already an 8 or 9, so as not to make the 10's digit roll, so focusing on digit 2 could certainly be justified (as could focusing on the second digit to the right of a decimal point for precision results, by much the same logic).
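
For reference, the textbook form of Benford's law gives the first digit d a probability of log10(1 + 1/d), and the usual generalization to the second digit sums the same expression over every possible first digit. A short sketch of both follows; the function names are made up here, and this standard generalization may not be exactly the "extended form" the comment above has in mind.

```python
import math


def benford_first_digit(d):
    """Benford probability that the leading digit equals d (1-9)."""
    return math.log10(1 + 1 / d)


def benford_second_digit(d):
    """Benford probability that the second digit equals d (0-9),
    summed over all possible leading digits."""
    return sum(math.log10(1 + 1 / (10 * first + d)) for first in range(1, 10))


for d in range(1, 10):
    print("first digit", d, round(benford_first_digit(d), 4))

for d in range(0, 10):
    print("second digit", d, round(benford_second_digit(d), 4))
```

The second-digit probabilities are much flatter than the first-digit ones, so Benford-style deviations in a second digit are harder to detect and need more data.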

    • Nothing new (Score:5, Interesting)

      by hawk ( 1151 ) <hawk@eyry.org> on Saturday September 26, 2009 @01:31AM (#29546859) Journal

      I did a statistical analysis of the year 2000 "recount" almost 9 years ago, looking at the counties with "unusual" results.

      There were six counties in which the changed votes didn't fit the normal bell curve, four benefiting Gore and two Bush.

      Both of Bush's and one of Gore's had rules in which replacement ballots were made for idiot voters who used an X rather than filling the bubble, explaining them.

      One of Gore's had machine problems in the recount and stuck with the original figures.

      And then there were the two counties, which accounted for the lion's share of the "correction" from the recount.

      One of them was 50 standard deviations out--so far out that it is less likely than winning the California Lottery every week for thirteen weeks running . . .

      I wasn't the only one to notice the oddity, but the sad fact is that no one cares . . .

      hawk

  • ... and pollster's statistics
  • Their response to Silver's accusation? 'We have a call in to our attorney on this and fully intend to take action that will vindicate us.'"

    Generally, I would expect a logical course of action from an honest and transparent firm would be to hire a statistician to vindicate themselves. Lawyers don't make a reputable firm appear any less reputable.

    • Re: (Score:3, Insightful)

      by Frosty Piss ( 770223 )

      Lawyers don't make a reputable firm appear any less reputable.

      Lawyers don't make a reputable firm appear any more reputable.

      • I don't know about that ...

        I would say that lawyers make a reputable law firm more reputable than a law firm with no lawyers.

    • I don't see how Silver has committed any sort of tort against them.

      This is not slander. He's just said "I've mined their data and these are the results. This smells somewhat like a rat. This needs looking into." He was very careful to avoid any direct accusation of impropriety, only saying "This is what patterns like this often mean".

  • There's not enough eigenvectors in this thread...

    http://controls.engin.umich.edu/wiki/index.php/Occasionally_dishonest_casino:_crimes_or_just_noise%3F

  • 'We have a call in to our attorney on this and fully intend to take action that will vindicate us.'

    By all means, don't back up your numbers with mathematical proof of your own.
    I can't wait to see how they try to sue mathematical laws and formulae.

    Protip: Winning a lawsuit doesn't make you any less a liar.

  • Bothered Slightly (Score:4, Interesting)

    by mathimus1863 ( 1120437 ) on Friday September 25, 2009 @10:04PM (#29546161)
    I've been following Nate ever since the 2008 elections, and I've much enjoyed his analysis. Being a mathematician, I can spot BS math, but Nate usually does a decent job with no BS. But this article has so many analytical gaps that I feel awkward supporting him this time, even though the article as a whole is convincing. To make such a bold claim as he is, I would've expected him to assess this more completely. He did no comparisons to other pollsters, and sampled data that is not IID (independent and identically distributed), i.e. if a boolean poll has 49% for one side (9), the other answer has to be 51% (1). The last digits (1 and 9) are completely dependent. Not all polls are boolean, but there will still be correlations, and many polls in the sample are boolean. Not only that, but he mis-applied the reference to Benford's Law. I know he knows what Benford's law is, because he's had multiple other posts about it, but he got it dead wrong in this article.

    I'm glad there is someone sufficiently mathematical to look for things like this and have a wide enough audience to be heard, but I wish he'd taken some more time to look at more control groups and do some confidence intervals before sticking his head into a potential legal mess.
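
The dependence described above is easy to make concrete. A small sketch, assuming a strictly two-way poll with no undecideds so that the two results sum to 100; the helper name is invented for this example.

```python
def complement_digit(a):
    """Trailing digit of the other side's result in a two-way poll
    where the two percentages sum to 100."""
    return (100 - a) % 10


# Every (digit, digit) pair that can occur across all possible splits.
pairs = {(a % 10, complement_digit(a)) for a in range(1, 100)}

print(sorted(pairs))
print(len(pairs))
```

Only 10 of the 100 conceivable digit pairs can occur, so the second number in such a poll carries no independent information about trailing digits. Counting both numbers as separate observations would overstate the sample size in any test that assumes independence.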
  • I have been programming accounting software for almost fifteen years and the first nasty lesson I learned was that data can be presented in unlimited ways and if you want to get paid you better make it look good. Changing the scale, oversampling, skewing the questions and all sorts of other nasty tricks are now par for the course.

    We now have well respected polls contradicting each other by double digits because of the politicizing of any information that might change voters' opinions. I never thought that I wou
