Math News Politics

Math Indicates Pollster Is Forging Results 319

An anonymous reader writes "Nate Silver suggests the political pollster Strategic Vision is 'cooking the books. And whoever is doing so is doing a pretty sloppy job.' Silver crunched five years' worth of their polling data, and found their reported results followed a suspicious pattern which traditionally suggests fraud. The five-year distribution of the numbers 'is not random. It's not close to random.' The polling firm had already been reprimanded by the American Association for Public Opinion Research for failing to disclose their methodology, though the firm argues they did comply with the organization's request. Their response to Silver's accusation? 'We have a call in to our attorney on this and fully intend to take action that will vindicate us.'"
This discussion has been archived. No new comments can be posted.

  • by klapaucjusz ( 1167407 ) on Friday September 25, 2009 @09:08PM (#29545709) Homepage

    I'm not sure I understand what Silver is claiming about the data.

    He shows that the distribution of second digits in the results of Pollster's polls doesn't follow a uniform distribution -- and from that he somehow deduces it's not random.

    If you look at the figure in the second article, it looks to my untrained eyes like a Gaussian curve with its maximum around 8 -- since when are Gaussians not random?

  • improbable (Score:4, Interesting)

    by drDugan ( 219551 ) on Friday September 25, 2009 @09:09PM (#29545713) Homepage

    Reading TFA, Nate's analysis implies that there is a systematic bias toward some last digits in the overall poll percentages aggregated over many disparate topics.

    What seems so improbable (to me) is that if someone really were grossly "cooking the books" like this - literally not doing the poll or tallying any numbers at all, but instead simply reporting fake results to the press - they would be so stupid as to make up the results manually instead of using a computer in some way. What, some guy in an office reading other polls and saying "gee, I think the number will be 45%"?

    If this kind of bias really has been introduced by manually creating and publishing the results (as the analysis seems to imply), then it will be easy to track down and prove with further digging into the data, interviewing people who made the calls or took the data, etc. However, accepting such an explanation would require a level of stupid on the part of the principals in this company that is so extreme that I find such a scenario an improbable explanation for the results presented.

  • by plague911 ( 1292006 ) on Friday September 25, 2009 @09:38PM (#29545837)
    Strategic Vision is a Republican pollster, meaning that when a Republican politician wants a poll about a particular question, they give Strategic Vision some money and Strategic Vision does a poll. This can be either internal polling, to give them an idea of how the "battle" is going, or polling for general consumption. And yes, Strategic Vision is big enough to matter, but they are just the tip of the iceberg of misleading "R" pollsters.

    In general there are some Republican, some Independent, and some Democratic pollsters; however, all of their results are supposed to be scientific. The idea is: does a poll for internal consumption really help if it tells you that you are going to win easily on election day, only to have it be a landslide against you? The answer is no.

    The reasons why this is dangerous are manifold. 1) Due to its supposed scientific nature, polling has been used to make public policy decisions. 2) It can influence people's opinions. 3) It can influence a senator's or some other politician's choices while they are in power.

    Here is a perfect example of this. A certain Republican senator from Maine is considering whether she should support a public option, so she wants to see what the citizens of her state think about the topic. She hires Strategic Vision to do a poll for her. Strategic Vision comes back and says 60% of your state's citizens are against it. She goes, "Wow, I guess I'm not supporting that bill." In reality it's 60% the other way. From this the senator decides not to support the bill and it does not pass.

    I will be as blunt as possible. I am accusing Rasmussen, Strategic Vision and other Republican pollsters of deliberately lying to the American people in order to alter the public debate. If you follow the math they have been consistently off for years. If you want to just look at the last election cycle, Rasmussen etc. all had the results a lot tighter than the results on election day. This could just be poor polling on their part, but I will offer exhibit B.

    Since health care reform has been a topic in the news the difference between the several Republican pollsters and "everyone else" has been steadily growing. I firmly believe that the insurance industry has been paying these pollsters to lower their numbers for the Democrats to push them to drop health care reform.

    Yes, the Democrats' poll numbers have been sliding somewhat across the board. However, if you look at the data from the Republican sources, they have numbers significantly different from those of the "Independent and Democratic" pollsters.

    Overall I want to say this: "dishonest polling" helps no one. It may help push a certain agenda temporarily, but it can also cause those who support it to lose elections... Look at the results from 2008: THE REPUBLICAN PARTY IS BEING MISLED BY ITS OWN POLLSTERS AND IT IS COSTING THEM ELECTIONS
  • Bothered Slightly (Score:4, Interesting)

    by mathimus1863 ( 1120437 ) on Friday September 25, 2009 @11:04PM (#29546161)
    I've been following Nate ever since the 2008 elections, and I've much enjoyed his analysis. Being a mathematician, I can spot BS math, and Nate usually does a decent job with no BS. But this article has so many analytical gaps that I feel awkward supporting him this time, even though the article as a whole is convincing. For someone making such a bold claim, I would've expected him to assess this more completely. He did no comparisons to other pollsters, and he sampled data that is not IID (independent and identically distributed). E.g., if a boolean poll has 49% for one side (last digit 9), the other answer has to be 51% (last digit 1): the last digits (1 and 9) are completely dependent. Not all polls are boolean, but there will still be correlations, and many polls in the sample are boolean. Not only that, but he misapplied the reference to Benford's Law. I know he knows what Benford's law is, because he's had multiple other posts about it, but he got it dead wrong in this article.

    I'm glad there is someone sufficiently mathematical to look for things like this and have a wide enough audience to be heard, but I wish he'd taken some more time to look at more control groups and compute some confidence intervals before sticking his head into a potential legal mess.
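
The dependence described in this comment can be checked by brute force. This is a minimal sketch, assuming a two-way poll reported in whole percentages that sum to exactly 100 (no undecideds, no third parties); the helper names are made up for illustration and are not from Silver's analysis.

```python
def trailing_digit(pct):
    """Last digit of a whole-number percentage."""
    return pct % 10

def complement_digit(d):
    """Last digit forced on the other side when the two sides sum to 100."""
    return (10 - d) % 10

# Every possible two-way split a / (100 - a).
pairs = [(a, 100 - a) for a in range(1, 100)]

# The second side's last digit is fully determined by the first side's.
fully_dependent = all(
    trailing_digit(b) == complement_digit(trailing_digit(a)) for a, b in pairs
)
print("second digit determined by the first:", fully_dependent)
print("49 / 51 ->", trailing_digit(49), "and", trailing_digit(51))
```

Under that assumption, counting both sides of each poll doubles the sample size without adding independent observations, which would make a test statistic look more extreme than the data justify. Real polls with undecideds only weaken the dependence; they do not remove it.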
  • Re:improbable (Score:3, Interesting)

    by internic ( 453511 ) on Friday September 25, 2009 @11:58PM (#29546357)

    I don't know, this sort of reminds me of a recent case of fraud in Physics [wikipedia.org]. If a PhD physicist can make such a mistake, it doesn't seem totally unbelievable to me that a polling firm might. Also, you have to ask yourself if they ever actually expected their results to come under much scrutiny.

  • by Artifakt ( 700173 ) on Saturday September 26, 2009 @12:04AM (#29546381)

    Statistically Impossible may well have meaning. In Cosmology, various people at various times (Hawking, Guth, Dirac, and Einstein (in the late 40's, working with Minkowski and Godel)) all found that they had to write a few pages on whether very improbable events were distinguishable from zero probability events before they could justify using some of their math. All were working on their own takes on the origin of the Cosmos problem at the time. Most of them decided that any event with a probability of less than 1 in the whole lifetime of the Cosmos was 'statistically impossible' and not just 'improbable'. Rosen later argued that it was better to phrase it in terms of less than 1 during that part of the cosmos's lifetime when entropy was low enough to allow other events of that same energetic magnitude to happen normally rather than the whole lifetime, and others have debated the point various ways, but it's still common to call some things statistically impossible when doing fundamental cosmology.
          Oh, and I need a new spoon.

  • by Anonymous Coward on Saturday September 26, 2009 @12:09AM (#29546391)

    I've just checked, and it's pretty easy to generate last-digit distributions that look a great deal like the one shown for Strategic Vision. If you assume they poll on contentious issues (which are divided close to 50/50 in population opinion) and that there are a small number of nonrespondents, then you get distributions with lots of 49s and 48s, and fewer 41s. My sample histograms even reproduce the spike at 0, and the peak at 7 or 8. This is 10 lines of code in python:

    from pylab import *
    mnvar = 2 # deviation from 50/50 for each question
    nonresp = 3 # mean nonrespondents on each side
    ssize = 10000 # number of questions
    a1 = floor(normal(50, mnvar, [ssize//2])) # first group answers (// keeps the size an integer)
    a2 = 100-a1 # the second group, their opponents
    a1 -= poisson(nonresp, [ssize//2]) # nonrespondents in the first group
    a2 -= poisson(nonresp, [ssize//2]) # nonrespondents in the second group
    a = concatenate([a1,a2]) # put them all together
    hist(mod(a,10)) # histogram of the last digits

    Obviously, I didn't choose any numbers by hand. It seems at least reasonable that pollsters might focus on questions that are close to evenly divided in the population. So, while there's no excuse for not publishing your methods, there is at least one innocent, and quite plausible, explanation for this distribution.
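
One way to put a number on "how far from uniform" a last-digit histogram is would be a Pearson chi-square statistic against equal expected counts. The sketch below is not Silver's method and not the commenter's code: it uses only the standard library, two toy samples generated here (uniform last digits, and results clustered near 50% with the nonrespondent step left out), and the standard 0.001 critical value for 9 degrees of freedom. All function names are illustrative.

```python
import math
import random
from collections import Counter

def last_digit_counts(values):
    """Count how often each last digit 0-9 occurs among whole-number results."""
    counts = Counter(int(v) % 10 for v in values)
    return [counts.get(d, 0) for d in range(10)]

def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts per bin."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Toy sample 1: whole percentages whose last digits are uniform.
uniform_values = [rng.randrange(100) for _ in range(n)]
# Toy sample 2: results clustered near 50% (assumed spread of 2 points).
clustered_values = [math.floor(rng.gauss(50, 2)) for _ in range(n)]

uniform_stat = chi_square_uniform(last_digit_counts(uniform_values))
clustered_stat = chi_square_uniform(last_digit_counts(clustered_values))

CRITICAL_0_001 = 27.877  # chi-square critical value, 9 degrees of freedom, p = 0.001
print(f"uniform sample:   chi2 = {uniform_stat:.1f}")
print(f"clustered sample: chi2 = {clustered_stat:.1f}")
print("clustered sample rejects uniformity:", clustered_stat > CRITICAL_0_001)
```

A large statistic only says the digits are not uniform. As the comment above argues, it cannot by itself say whether the cause is fabrication or simply questions that cluster near an even split.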

  • by nixish ( 1390127 ) on Saturday September 26, 2009 @12:44AM (#29546497)
    When looking for fraud, Silver was not looking at the poll numbers but the raw data numbers themselves (essentially hundreds of thousands of numbers, if not millions). Out of all the raw numbers, when analyzed there should not be any distribution. But the numbers were slanted towards 6 & 8, suggesting (proving perhaps) tampering. There's plenty of sound theory in this. Just look it up.
  • by khchung ( 462899 ) on Saturday September 26, 2009 @02:20AM (#29546803) Journal

    I find it disturbing, too, that the media just reports the polling companies' results, without reporting things like what questions were asked, in what order, how the poll was conducted or who commissioned it, all of which can have a big effect on the results. A lot of "push polling" goes on, especially when the polls are commissioned by special interest groups, business associations, unions or political parties themselves.

    tl,dr. (Too long, didn't read).

    Unfortunately, for most of the world, this will be the response from most readers if the media took the time to report on the details of the poll.

    Although, really, in the internet age, the media could add a link so anyone interested could see the details of the poll. However, I suspect doing so would just expose to the world how ignorant/lazy the reporters are, because you may find most polls are either horribly slanted or extremely poorly designed (to the point that it will be obvious the poll was designed to mislead).

    For example, I recall seeing a newspaper headline saying ">80% of women have been sexually assaulted at least once". Surprised at this, I RTFA, and it turned out the "poll" was done by an NGO aimed at helping rape victims, and they "polled" 8 (eight) of their staff to get this result. My view of that newspaper (and reporters/editors in general) dropped a few notches after that.

  • Nothing new (Score:5, Interesting)

    by hawk ( 1151 ) <hawk@eyry.org> on Saturday September 26, 2009 @02:31AM (#29546859) Journal

    I did a statistical analysis of the year 2000 "recount" almost 9 years ago, looking at the counties with "unusual" results.

    There were six counties in which the changed votes didn't fit the normal bell curve, four benefiting Gore and two Bush.

    Both of Bush's and one of Gore's had rules in which replacement ballots were made for idiot voters who used an X rather than filling the bubble, explaining them.

    One of Gore's had machine problems in the recount and stuck with the original figures.

    And then there were the two counties, which accounted for the lion's share of the "correction" from the recount.

    One of them was 50 standard deviations out--so far out that it is less likely than winning the California Lottery every week for thirteen weeks running . . .

    I wasn't the only one to notice the oddity, but the sad fact is that no one cares . . .

    hawk

  • by wrook ( 134116 ) on Saturday September 26, 2009 @05:02AM (#29547235) Homepage

    IANAS (I am not a statistician), but according to Wikipedia, Benford's law applies to the distribution of the first digit. It has a logarithmic distribution. This makes complete sense since the probability for certain numbers will be higher than others (i.e., in telephone bills, the 1 is probably much more likely since there are a lot of people with $100+ phone bills). But they are discussing the *2nd* digit. This should be uniform unless it's a very strange dataset.
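
For reference, Benford's law gives the first digit d a probability of log10(1 + 1/d), and the matching second-digit probabilities come from summing over every possible first digit. The sketch below just evaluates those two standard formulas; the function names are illustrative. Whether Benford's law applies to poll percentages at all is a separate question, since they do not span several orders of magnitude.

```python
import math

def benford_first_digit():
    """P(first digit = d) under Benford's law, for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def benford_second_digit():
    """P(second digit = d2) under Benford's law, summed over first digits 1..9."""
    return {
        d2: sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))
        for d2 in range(10)
    }

first = benford_first_digit()
second = benford_second_digit()

print("first digit: ", {d: round(p, 3) for d, p in first.items()})
print("second digit:", {d: round(p, 3) for d, p in second.items()})
```

Under Benford's law the second digit is much flatter than the first, but still not exactly uniform: 0 comes out at roughly 12% and 9 at roughly 8.5%.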

  • Re:Bothered Slightly (Score:2, Interesting)

    by ytm ( 892332 ) on Saturday September 26, 2009 @06:38AM (#29547469) Homepage

    i.e. if a boolean poll has 49% for one side (9) the other answer has to be 51% (1) The last digits (1 and 9) are completely dependent.

    He could just as well pick the lowest number of the two and check distribution of 0-5 digits. I have a bigger problem with his analysis, from TFA:

    I did not include "non-response responses" like "other" or "undecided", nor did I include a tally for third-party candidates in races between the two major parties.

    Given the dependence between the possible outcomes of the poll, I'm more curious about results with this data included.

  • by ceoyoyo ( 59147 ) on Saturday September 26, 2009 @10:10AM (#29548139)

    From his posting, he talked to SV about their refusal to reveal their methodology, then decided to test to see whether their results showed any suspicious bias. He was specifically testing SV and not searching for any pollster.

    You're right, if he tested multiple pollsters then he'd have to correct for multiple comparisons. Even so, you'd expect results as bad as SV's about one time in half a million. There aren't that many major pollsters, so you could detect results skewed as badly as SV's to a high confidence level using a data mining technique.

  • by Rockoon ( 1252108 ) on Saturday September 26, 2009 @12:05PM (#29548779)

    From his posting, he talked to SV about their refusal to reveal their methodology, then decided to test to see whether their results showed any suspicious bias. He was specifically testing SV and not searching for any pollster.

    I suspect that refusal to reveal methodology is quite common, given that most are agenda-driven. Did he only speak to SV, or did he speak to lots of pollsters who refuse to reveal methodology?

    Even so, you'd expect results as bad as SV's about one time in half a million. There aren't that many major pollsters, so you could detect results skewed as badly as SV's to a high confidence level using a data mining technique.

    But there are LOTS of ways (infinite, really) to "test" data, so even if there are only 50 pollsters, you can still end up with millions of chances of finding arbitrary million-to-one outliers (where a lack of outliers would actually be suspicious!)

    Is this second-digit test a common test for normal distribution, or is it an unusual method?
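
The multiple-comparisons point in the last two comments can be made concrete. This is a rough sketch assuming the tests are independent, which real tests run on the same data would not be; the one-in-half-a-million figure is the one quoted above, and the function names are illustrative.

```python
import math

def prob_at_least_one(p, m):
    """Chance of at least one false alarm in m independent tests,
    each with false-alarm probability p: 1 - (1 - p)**m."""
    return -math.expm1(m * math.log1p(-p))

def bonferroni_threshold(alpha, m):
    """Per-test threshold that keeps the overall false-alarm rate at or below alpha."""
    return alpha / m

p = 1 / 500_000  # "about one time in half a million", as quoted above

for m in (1, 50, 1_000_000):
    print(f"{m:>9} tests -> P(at least one fluke) = {prob_at_least_one(p, m):.6f}")

print("per-test threshold for 50 tests at alpha = 0.05:", bonferroni_threshold(0.05, 50))
```

With one test per pollster and about 50 pollsters, a one-in-half-a-million result stays very unlikely to be a fluke. With a million different ways of slicing the data, a fluke at that level becomes more likely than not, which is the objection raised in the reply.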

  • BRILLIANT! (Score:1, Interesting)

    by Anonymous Coward on Saturday September 26, 2009 @12:20PM (#29548841)

    Every time a conservative does something fraudulent or immoral--which is constantly--all you have to do is scream, "OMG Teh Democrats Did It Too!" and that fixes it! Even if it's not really true. It deflects attention from your cherished Party and makes you feel a little better about yourself. Bravo!

  • by Brian Gordon ( 987471 ) on Saturday September 26, 2009 @03:42PM (#29549877)

    I don't really have an opinion on gun control but I think this is wrong:

    People murdered each other before guns were invented, removing them might make a few cases go away but won't impact the vast majority of homicides.

    Premeditated murders maybe, but crime in general is greatly assisted by the availability of guns. The problem is that they're just so powerful. If you go into a bank with a knife and start waving it around and telling people to get on the ground they're just going to run away. But pull out a gun and everyone 10 meters around is going to obey every word because you can kill them instantly.

    And people are defenseless against a gun but they can at least run or throw a chair or punch an attacker with a knife. And gun killings are easy and impersonal while with a knife the attacker has to struggle and get covered in blood and listen to screams or whatever.. much nastier

    Swords are a problem I guess but they're impossible to carry concealed
