
Statistics Losing Ground To CS, Losing Image Among Students

Posted by Unknown Lamer
from the big-bad-data dept.
theodp (442580) writes: Unless some things change, UC Davis Prof. Norman Matloff worries that the Statistician could be added to the endangered species list. "The American Statistical Association (ASA) leadership, and many in Statistics academia," writes Matloff, "have been undergoing a period of angst the last few years. They worry that the field of Statistics is headed for a future of reduced national influence and importance, with the feeling that: [1] The field is to a large extent being usurped by other disciplines, notably Computer Science (CS). [2] Efforts to make the field attractive to students have largely been unsuccessful."

Matloff, who has a foot in both the Statistics and CS camps, says, "The problem is not that CS people are doing Statistics, but rather that they are doing it poorly. Generally the quality of CS work in Stat is weak. It is not a problem of quality of the researchers themselves; indeed, many of them are very highly talented. Instead, there are a number of systemic reasons for this, structural problems with the CS research 'business model'." So, can Statistics be made more attractive to students? "Here is something that actually can be fixed reasonably simply," suggests no-fan-of-TI-83-pocket-calculators-as-a-computational-vehicle Matloff. "If I had my druthers, I would simply ban AP Stat, and actually, I am one of those people who would do away with the entire AP program. Obviously, there are too many deeply entrenched interests for this to happen, but one thing that can be done for AP Stat is to switch its computational vehicle to R."

  • by aaaaaaargh! (1150173) on Wednesday August 27, 2014 @09:48AM (#47765013)

    Quite the opposite is the case. Unless we are talking about experiments with terabytes of data, most software packages are complete overkill anyway; you could do your statistics with a pocket calculator instead. The problem is the conceptual work. Most institutes and individual scientists would be much better off if they employed a well-trained full-time statistician, provided they were interested in correct and robust results rather than getting one more pilot study published as soon as possible (which will in turn be based on an insignificantly small non-random sample using an inadequate model).

  • by 93 Escort Wagon (326346) on Wednesday August 27, 2014 @09:51AM (#47765041)

    You're arguing against a point he didn't make. He didn't say those terms were recently created - he stated they recently were added to the "IT trade lexicon", which is true.

    10-15 years ago you didn't hear people bandying about the terms "big data" or "data science".

  • by Anonymous Coward on Wednesday August 27, 2014 @10:49AM (#47765681)

    It's a funny coincidence this appeared on Slashdot, as I was just reading about this issue and discussing it with my colleagues.

    I'm a statistics researcher in an applied field (university academic research) that suffers its own image problem, and my impression is that what we're witnessing in many STEM areas are problems with stereotyping in science, and marketing fads. I'm not sure that I disagree with what you're saying, but I think that there's another stereotype operating as well that cuts at the field of statistics in a second direction.

    As you point out, there are the sort of applied consulting statisticians who are probably getting increased competition from "data scientists."

    On the other side of the issue, though, you have complaints about theory-focused statisticians who really don't understand how to implement their developments computationally, who are also getting increased competition from "data scientists." This has been mentioned in a number of blog posts in various places, and I see it as much more the driver of "data science" as a banner than competition with consulting statisticians. E.g., CS individuals who feel they can do Hadoop and so forth, and who have had enough stats training, probably in undergrad, that they feel like they can just sort of usurp the statistics from the statisticians. They see the theory as irrelevant or something.

    The problem as I see it is that individuals who identify as "data scientists" don't really understand that the theory has to come from somewhere, and they fail to appreciate the issues that come up when dealing with uncertainty. It's like everyone in the field has some undergraduate-engineering-student level understanding of statistics, and doesn't have to deal with thorny data collection designs, complex inferences, or replicability of findings. The sort of scenario that's motivated "data science" is essentially this: an extremely large dataset involving relatively simple classification or prediction questions about observational data where there's really no scrutiny about generalizability or the meaning of the results. This problem scenario is why they got involved instead of a statistician in the first place: because the bottleneck was the size of the dataset, not the analysis scenario.

    All of the attempts to distinguish "data science" from statistics, it seems to me, are based on stereotypes or misunderstandings about statistics, as you point out, or on extremely short-sighted perspectives on science and math. Computational statistics has been a core part of statistics for decades (there are journals devoted to the topic), and you can find peer-reviewed articles on all sorts of computational problems in statistics (e.g., the use of GPUs in estimation problems, how to approach optimization with distributed processors, etc.). The idea that statistics is all theory, and that statisticians don't understand computational issues, is naive or reflects a very stereotyped view of statistics (or I pity their experiences in high school and college--it sounds like they got a poor education in statistics).

    This isn't to pooh-pooh the contributions of CS--they're critical. But I hate the banner of "data science"--not only is the term stupid and redundant (how can you have science without data? What other kind of science is there?), it's based on ignorant stereotypes about statistics as a field.

    To me, this speaks to a longer term problem in CS, which is CS essentially discovering what's been going on in other fields and reinventing the wheel over and over again. I don't see this necessarily in CS academic departments, but I do see it where there's some interface with the business world. It's coming up now with statistics, it's come up before with social sciences and economics, it's come up with AI and neuroscience, it's come up with genomics, it comes up over and over again. It speaks to a sort of arrogance or autism in the field's culture, where they act as if their unawareness of a phenomenon means that no one has ever researched it before.

    Ughh... think about statistics as the mathematics of uncertainty, and see how far you get with deemphasizing that. Damn, I hate society sometimes. I need a walk.
