Laying the Groundwork For Data-Driven Science
aarondubrow writes The ability to collect and analyze massive amounts of data is transforming science, industry and everyday life. But what we've seen so far is likely just the tip of the iceberg. As part of an effort to improve the nation's capacity in data science, NSF today announced $31 million in new funding to support 17 innovative projects under the Data Infrastructure Building Blocks (DIBBs) program, including data infrastructure for education, ecology and geophysics. "Each project tests a critical component in a future data ecosystem in conjunction with a research community of users," said Irene Qualters, division director for Advanced Cyberinfrastructure at NSF. "This assures that solutions will be applied and use-inspired."
Horrible attempt to communicate to a broad audience (Score:1)
This sounds suspiciously like something written by someone with an online MBA: "Each project tests a critical component in a future data ecosystem in conjunction with a research community of users," said Irene Qualters, division director for Advanced Cyberinfrastructure at NSF. "This assures that solutions will be applied and use-inspired."
If we want the public to continue to support federal funding of the sciences we have to do better than this. I understand the point, but this is needlessly laden with buzz-phrases and it is clumsy.
Re: (Score:2)
This sounds suspiciously like something written by someone with an online MBA: "Each project tests a critical component in a future data ecosystem in conjunction with a research community of users," said Irene Qualters, division director for Advanced Cyberinfrastructure at NSF. "This assures that solutions will be applied and use-inspired."
If we want the public to continue to support federal funding of the sciences we have to do better than this. I understand the point, but this is needlessly laden with buzz-phrases and it is clumsy.
I understand your point about the technobabble. However, Ms. Qualters' résumé appears to be somewhat less fluffy [nsf.gov] than the quote would suggest.
pork, politics? (Score:2)
from TFA:
"In fiscal year (FY) 2014, its budget is $7.2 billion. NSF funds reach all 50 states through grants to nearly 2,000 colleges, universities and other institutions. Each year, NSF receives about 50,000 competitive requests for funding, and makes about 11,500 new funding awards. NSF also awards about $593 million in professional and service contracts yearly."
and: "awards support research in 22 states"
This particular investment is a tiny fraction of the budget. A low priority.
Note that each congressper
Difficult, not impossible. (Score:4, Insightful)
If the NSF grant process is like the one for NASA, there's still a little bit of flexibility for the program manager after they've gotten the scores.
I know because I was on a panel that specifically gave two proposals 'poor' reviews (the lowest possible), and the program manager asked us to consider changing it. In this case, he's a rather nice guy, and it may just be that he didn't want to have to write the 'your proposal sucks' letter to them ... but those of us on the panel knew that there is _no_ way for them to fund a 'poor'. They have leeway with any other score, and could give something with a marginal rating some seed money (fund 'em for a year, so they might be able to put in a more competitive bid next round).
We told the program manager that no, we wanted to make sure that there was no possible way that those two proposals could get funded.
Re: (Score:2)
In this case, they are targeting computer-related things and grouping them under the name "Data Infrastructure Building Blocks." The actual funding goes toward things as diverse as a MOOC and some kind of scientific library for supercomputers.
The problem with data driven science.. (Score:2)
... is that data isn't evidence. And the simple fact that most people don't understand that simply underscores the danger of it.
Now, science must be empirical. It must be based on observation, experimentation, and the results should drive theory.
However, something that has been worrying for years is a lazy tendency for people... scientists included... to grab a data set, point out some correlating variables, and then conclude a discovery... or propose a theory that is supposed to be taken seriously.
That is
Re:The problem with data driven science.. (Score:5, Insightful)
The problem with data driven science... is that data isn't evidence.
Correlative statistics are not evidence.
I think you are confusing "evidence" with "proof". Data, and more specifically, the patterns in data, most certainly are evidence. If that were not true, then there would be no reason to even try doing science.
Having data isn't an accomplishment.
Any scientist who has spent years obtaining a hard-won dataset would strongly disagree with you. Consider, for example, the ground-breaking data generated a few years ago by the Human Genome Project, or the current explosion of data about exoplanets. These data most certainly do represent substantial intellectual and technical accomplishments. Now, if what you mean is that simply downloading someone else's data from the Web is not an accomplishment, then I agree with you.
Scientists need to be willing to get their hands dirty and get the data themselves.
I think you will find that, in the hard sciences at least, that's usually how it's done. The researchers who write the papers are usually the same people who were involved in collecting the data. However, for very large-scale studies (e.g., global biodiversity research), there is no way that a single scientist, or even a single research team, could gather all of the necessary data. In these cases, the only way to make the research tractable is to integrate multiple datasets.
Your points about the importance of understanding where the data one uses in a study came from, how they were collected, and any potential biases are all well taken. However, ignoring any of these factors is simply sloppy science, no matter whether the researchers collected the data themselves or someone else collected it.
Re: (Score:2)
We can play word games if you like... I'm quite good at them. But I frankly find the prospect to be boring. So I'll just win.
Data (/ˈdeɪtə/ DAY-tə, /ˈdætə/ DA-tə, or /ˈdɑːtə/ DAH-tə)[1] is a set of values of qualitative or quantitative variables; restated, pieces of data are individual pieces of information. Data in computing (or data processing) is represented in a structure that is often tabular (represented b
Re: (Score:2)
No, I don't have a different definition of "data". My point was that your original post repeatedly confuses "evidence" and "proof". As I said, data, and more specifically, the patterns in data (correlative statistics are one example), are used as evidence all of the time in science. That is, in a nutshell, how science works. Data provide evidence, not proof, for or against alternative hypotheses. The strength of the evidence depends on the strength of the data, which encompasses all of the potential da
Re: (Score:2)
The patterns in data are not data. The data is not the analysis of the data which would be a pattern in the data.
Your lack of basic reading comprehension has exhausted my patience with you.
Good day.
Re: (Score:2)
The patterns in data are not data. The data is not the analysis of the data which would be a pattern in the data.
Okay. Against what, exactly, are you arguing? When, at any point, have I claimed that "the data are the analysis of the data" or any such nonsense?
Let me remind you of one of your original claims:
Correlative statistics are not evidence.
Do you not understand that "patterns in data" includes correlative statistics? If not, let me make this clear: You originally claimed that neither data, nor the patterns in data, are "evidence". I've tried to explain why, to scientists, patterns in data, including correlative statistics, most certainly are e
Re: (Score:2)
Data is typically the result of measurements
Data is simply information. Data does not imply analysis or even meaning.
So you are saying that measurements have no meaning. That is why I object to your argument.
Data [...] is a set of values
Data is a set of values. DNA is a set of values. DNA is data. That data is important.
Data is simply information.
Information is good. Every book is "simply information".
Re: (Score:2)
Measurements CAN have meaning... but they do not require meaning. I can take measurements that don't mean anything all day. I can set the output of my phone's accelerometer to output to a spreadsheet... then record the data. Will the data mean anything? Furthermore, I can take data in completely irresponsible ways and it will still be data. I can cherry-pick my results and it is still data.
Data in and of itself is not meaningful. Data must be collected in a certain way to preserve its integrity and then it m
Re: (Score:1)
I can set the output of my phone's accelerometer to output to a spreadsheet... then record the data. Will the data mean anything?
Of course it will! The data will convey how much your phone has moved. Why do you think that this data is meaningless? Perhaps you mean "useful"?
Re: (Score:2)
I'm not getting in a word game with you.
When I say meaningful, I refer to whether that data automatically tells you something of significance.
If you want to use the word "useful"... we can see if your use of it is the same as mine. But I've a very low tolerance for semantics games.
Re: (Score:1)
On the other hand, if you'd like an example of data which is (nearly?) universally recognized as meaningless, how about
Re: The problem with data driven science.. (Score:1)
Unless, of course, you happen to have a need for some random data. Then it's useful.
Re: (Score:2)
I'll just use that random data set to justify everything then.
If that argument I just made doesn't make sense it is because your counter to my argument made no sense.
That is how different the two things we're talking about are right now.
So thank you for your response, but you're not talking about what I'm talking about and I'm not going to talk about what you want me to talk about. I was talking about something that I cared about as relates to this topic. And your attempt to strawman my argument while not u
Re: The problem with data driven science.. (Score:1)
One question though, does your happiness have anything to do with this semantics argument? If so, don't write them off as boring. Seek them out! Follow your bliss!
Re: (Score:2)
I find some satisfaction in turning silly rhetorical tactics against their users.
I am, by intention if nothing else, a hyper-rationalist. I hold reason and the mental framework that supports and allows for reason to be of primary importance in any discussion. Without them we will not have a reasonable discussion and will not know if what we have discussed was or was not reasonable.
This means defining what you're talking about, keeping a clear image of what our goals are in the discussion, and then working
Re: (Score:1)
Even data without pristin
Re: (Score:2)
Data absent analysis isn't meaningful. And just because you analyzed data doesn't mean the analysis is meaningful.
Data only obtains meaning by establishing cause and effect via the data. Absent that you have white noise.
Re: (Score:1)
Re: (Score:2)
I told you I wasn't going to have a semantics argument with you. If you want to define the word "is" to mean "pineapple", that is your business.
Good day, sir.
Re: (Score:1)
Data is simply information. Information is meaningful.
But yes, have a nice day.
Re: (Score:2)
Actually I didn't.
This is what I said when you started with this foolishness:
"I'm not getting in a word game with you.
When I say meaningful, I refer to whether that data automatically tells you something of significance.
If you want to use the word "useful"... we can see if your use of it is the same as mine. But I've a very low tolerance for semantics games."
I am not going to debate you on the definition of basic English words that you seem determined to turn into an argument about nothing.
I will simply s
Re: (Score:1)
Best of luck finding something to interest you.
Re: (Score:2)
Wrong.
Full quote:
"We can play word games if you like... I'm quite good at them. But I frankly find the prospect to be boring. So I'll just win.
Data (/ˈdeɪtə/ DAY-tə, /ˈdætə/ DA-tə, or /ˈdɑːtə/ DAH-tə)[1] is a set of values of qualitative or quantitative variables; restated, pieces of data are individual pieces of information. Data in computing (or data processing) is represented in a structure that is of
Re: (Score:1)
information
[in-fer-mey-shuh] noun
1. knowledge communicated or received concerning a particular fact or circumstance
Have a nice day!
Re: (Score:2)
You fail at reading comprehension. Nowhere in that definition does it imply the data was analyzed in any manner.
You were always wrong and I think you know it. The sad thing is that you think I don't know it... or that confusing the issue will make you less wrong. At best you might confuse me... but you'd still be wrong.
You didn't even accomplish that though.
Kindly put your thumb at a 90 degree angle, lift off your seat about eight inches, pull down your pants, place your hand thumb up on the seat perpendic
Re: The problem with data driven science.. (Score:1)
I'll take vulgarity as conceit.
Re: The problem with data driven science.. (Score:1)
Re: (Score:2)
Right... because when Galileo insulted people who said the Earth was the center of the solar system, it meant he was wrong.
Logical fallacies for the win, fucktard?
I'm insulting you because you deserve to be treated with contempt and after trolling me for all these posts with your idiocy there isn't any courtesy left.
As to my requirements for meaning, I know yours didn't include implicit qualities that are required for their proper function. That was me... because I'm not a moron.
Seriously... Fuck off.
Re: The problem with data driven science.. (Score:1)
Re: (Score:3)
I also see a trend where people look for correlations, find them, and then draw conclusions without any proof of causation. It strikes me most in economics. Public policy is set based on those correlations.
It is very counterintuitive, but correlation research means nothing, especially in economics. Correlation research would be an amusing way to spend your time and get to know some variables, but correlation research is being used to influence people. Repeat after me: correlation means n
Re: (Score:3)
Re: (Score:1)
Re: (Score:3)
Data isn't evidence, but it can be used to find useful hypotheses, starting points for further research.
Remember:
The most exciting phrase to hear in science, the one that heralds new discoveries, is not “Eureka” but “That’s funny...”
(Isaac Asimov)
Re: (Score:2)
Albert Einstein, for one example, is not known for acquiring any of the raw physical data from which he did his work on the Photoelectric Effect, Special Relativity, or General Relativity. All he did was sit at his desk and write equations based on data acquired by others.
Re: (Score:2)
And if most scientists stuck to mathematics then they could get away with that too. Do anything else and you're going to need a bit more.
Walrus census (Score:2)
Perhaps they should start with an annual walrus census (see story following this one).
science driven science? (Score:5, Insightful)
There is nothing wrong with good old "science driven science" where people think, do experiments, and think again.
science driven science? (Score:3)
Science may be data-driven, but historically scientists have not been trained to be good data custodians. They know reasonably well how to use data, but they don't know how to store it, label it, transfer it, etc. Go pick an article from 5 years ago which is data-heavy and try to get the original dataset from the authors: 95 times out of a hundred you'll spend a month emailing people and you'll end up with nothing. Four more out of the 100 you'
Re: (Score:2)
Save Money (Score:2)
I can save the NSF a bunch of money with this initiative. There's a data center in Utah that's not being used (for anything legal) with a huge amount of data storage capacity. The NSF should have it.