Statistics Losing Ground To CS, Losing Image Among Students
Posted by Unknown Lamer
from the big-bad-data dept.
theodp (442580) writes Unless some things change, UC Davis Prof. Norman Matloff worries that the Statistician could be added to the endangered species list. "The American Statistical Association (ASA) leadership, and many in Statistics academia," writes Matloff, "have been undergoing a period of angst the last few years. They worry that the field of Statistics is headed for a future of reduced national influence and importance, with the feeling that: [1] The field is to a large extent being usurped by other disciplines, notably Computer Science (CS). [2] Efforts to make the field attractive to students have largely been unsuccessful."
Matloff, who has a foot in both the Statistics and CS camps, says, "The problem is not that CS people are doing Statistics, but rather that they are doing it poorly. Generally the quality of CS work in Stat is weak. It is not a problem of quality of the researchers themselves; indeed, many of them are very highly talented. Instead, there are a number of systemic reasons for this, structural problems with the CS research 'business model'." So, can Statistics be made more attractive to students? "Here is something that actually can be fixed reasonably simply," suggests no-fan-of-TI-83-pocket-calculators-as-a-computational-vehicle Matloff. "If I had my druthers, I would simply ban AP Stat, and actually, I am one of those people who would do away with the entire AP program. Obviously, there are too many deeply entrenched interests for this to happen, but one thing that can be done for AP Stat is to switch its computational vehicle to R."
As a statistician (Score:3, Interesting)
As a statistician, you should know better than to make your point with a succession of anecdotes such as:
- A few years ago, for instance, I attended a talk by a machine learning specialist who had just earned her PhD at one of the very top CS Departments in the world. She had taken a Bayesian approach to the problem she worked on, and I asked her why she had chosen that specific prior distribution. She couldn’t answer – she had just blindly used what her thesis adviser had given her – and moreover, she was baffled as to why anyone would want to know why that prior was chosen.
- But there is no substitute for precise thinking, and in my experience, many (nominally) successful CS researchers in Stat do not have a solid understanding of the fundamentals underlying the problems they work on. For example, a recent paper in a top CS conference incorrectly stated that the logistic classification model cannot handle non-monotonic relations.
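For anyone wondering why that last claim is wrong: a logistic model is linear in its features, not necessarily in the raw variable. Here is a minimal Python sketch, my own illustration and not taken from Matloff's post or the paper he mentions. The function name and the coefficients b0, b1, b2 are made up for the example; adding a squared term is one common way to get a non-monotonic fit, assumed here purely for demonstration.

```python
import math

def logistic_prob(x, b0=2.0, b1=0.0, b2=-1.0):
    """P(y = 1 | x) for a logistic model whose linear predictor includes x**2.

    The coefficients are hypothetical, chosen only to make the shape obvious.
    """
    z = b0 + b1 * x + b2 * x * x  # still linear in the features (1, x, x**2)
    return 1.0 / (1.0 + math.exp(-z))

# The predicted probability rises and then falls as x increases:
probs = [logistic_prob(x) for x in (-3, 0, 3)]
print([round(p, 3) for p in probs])  # [0.001, 0.881, 0.001]
```

The probability peaks at x = 0 and drops off on both sides, so the relation between x and the outcome is not monotonic even though the model is an ordinary logistic one.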
Statistical Practitioners need to Modernize (Score:5, Interesting)
I am a researcher in medical informatics, and statistics is a huge part of my job, though I am not a classically-trained statistician.
First, I would like to offer a stark contrast between two types of statistician: 1) statisticians of the old mold who are wedded to SAS and related tools and 2) research statisticians who employ modern methods such as Bayesian statistics and rather advanced calculus. The former tend to mold all problems into what is available in the canon of SAS routines, while the latter are capable of creating custom models that suit the problem at hand.
Then, there is a new breed of scientist -- the data scientist -- who tends to use black-box machine learning methods and the classical techniques, as programs such as SAS and R have "democratized" the field. I agree with the common gripe of many traditionally-trained statisticians who object that these "data scientists" tend not to understand the statistical background of these computer codes. In fact, it is easy to download R onto one's computer and start firing data through, with little regard for the merits of the model or its results. (Not all data scientists are like this; I'm simply stating a general observation.)
Another problem with statistics is that it can be very confusing to understand just what things like p-values mean. A first course in statistics leaves many with a bad taste: it comes across as either terribly confusing or rather boring. In my opinion, this is because of traditional (frequentist) statistics, which originated with luminaries such as Fisher and Pearson.
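To show what I mean about p-values, here is a small Python sketch of my own. The coin-flip scenario and the function name are invented for illustration. It computes an exact one-sided p-value: the probability of seeing a result at least as extreme as the observed one, assuming the null hypothesis (a fair coin) is true.

```python
from math import comb

def one_sided_p_value(heads, flips, p_null=0.5):
    """P(X >= heads) when X ~ Binomial(flips, p_null)."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )

# Hypothetical data: 8 heads in 10 flips of a supposedly fair coin.
p = one_sided_p_value(8, 10)
print(round(p, 4))  # 0.0547
```

Note what the number is not: it is not the probability that the coin is fair. It only says how surprising the data would be if the coin were fair, which is exactly the distinction that trips people up.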
The "action" today is in Bayesian statistics. This formulation allows for statistical concepts to be expressed in ways that (I believe) most people can understand. But executing Bayesian statistics mandates that one understand the underlying formulation of models; in general, they are not black-box methods. Furthermore, they can be quite computationally expensive for large data.
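As a toy example of why one has to understand the model formulation, here is a Python sketch of the textbook Beta-Binomial conjugate update. The data (7 successes in 10 trials), the two priors, and the function names are all made up for illustration; real problems rarely have a closed form this convenient.

```python
def beta_binomial_posterior(a, b, successes, trials):
    """Update a Beta(a, b) prior with binomial data; returns posterior (a, b)."""
    return a + successes, b + (trials - successes)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Same hypothetical data, two different priors on the success probability.
flat = beta_binomial_posterior(1, 1, successes=7, trials=10)         # uniform prior
skeptical = beta_binomial_posterior(10, 10, successes=7, trials=10)  # centered on 0.5

print(round(beta_mean(*flat), 3))       # 0.667
print(round(beta_mean(*skeptical), 3))  # 0.567
```

The same data give noticeably different answers under the two priors, so the choice of prior is part of the model and has to be defensible, not just inherited.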
Statistics is suffering from perceptions of being a button-pushing, boring profession. As has happened in many other fields (e.g. computational chemistry and CFD), computer programs have democratized the field so that those who have not had years of dedicated study and training can execute statistical models. In my experience, this can be a good thing, or a very bad thing. Another issue is that there is a significant build-up of half a century of code and protocols in both industry (think big business analysis) and government agencies (think FDA).
But modern statistics is actually a hot field. Provided that one understands the background, and is willing to go the extra mile to write custom code, the rewards are endless.
Re:AP? (Score:2, Interesting)
Getting rid of it is just an attempt to waste students' time and extract more money from them by forcing them to take more university courses.
I suspect his complaint is that in high school, AP Statistics is taught by math teachers. In college, classes are taught by professors who specialize in statistics. This goes along with his general complaint that people in other disciplines don't take the time to really understand how statistics works. Of course, the same problem exists in college statistics courses. You can take a one-semester survey course or the two-semester theory course. He'd prefer that everyone took the two-semester course and that it was rigorously graded.
He may be right about AP Statistics though. Taking statistics in high school means that most people will have forgotten it by the time they get to advanced courses that use statistical methods. This leads to students learning statistics from the professors in those advanced courses (who are not focused on statistical rigor). Statistics is a sophomore/junior level class, where most other AP classes substitute for freshman classes.
I would tend to agree with you about the other AP classes though. There's no such thing as a "calculus professor" -- calculus is taught by a mathematics professor who is likely interested in something very different. It doesn't make much difference whether it is taught in a small high school class or a large college lecture.