Laying the Groundwork For Data-Driven Science

aarondubrow writes: The ability to collect and analyze massive amounts of data is transforming science, industry and everyday life. But what we've seen so far is likely just the tip of the iceberg. As part of an effort to improve the nation's capacity in data science, NSF today announced $31 million in new funding to support 17 innovative projects under the Data Infrastructure Building Blocks (DIBBs) program, including data infrastructure for education, ecology and geophysics. "Each project tests a critical component in a future data ecosystem in conjunction with a research community of users," said Irene Qualters, division director for Advanced Cyberinfrastructure at NSF. "This assures that solutions will be applied and use-inspired."
  • This sounds suspiciously like something written by someone with an online MBA: "Each project tests a critical component in a future data ecosystem in conjunction with a research community of users," said Irene Qualters, division director for Advanced Cyberinfrastructure at NSF. "This assures that solutions will be applied and use-inspired."

    If we want the public to continue to support federal funding of the sciences we have to do better than this. I understand the point, but it is needlessly laden with buzz-phrases and it is clumsy.

    • This sounds suspiciously like something written by someone with an online MBA: "Each project tests a critical component in a future data ecosystem in conjunction with a research community of users," said Irene Qualters, division director for Advanced Cyberinfrastructure at NSF. "This assures that solutions will be applied and use-inspired."

      If we want the public to continue to support federal funding of the sciences we have to do better than this. I understand the point, but it is needlessly laden with buzz-phrases and it is clumsy.

      I understand your point about the technobabble. However, Ms. Qualters' résumé appears to be somewhat less fluffy [nsf.gov] than the quote would suggest.

  • from TFA:
    "In fiscal year (FY) 2014, its budget is $7.2 billion. NSF funds reach all 50 states through grants to nearly 2,000 colleges, universities and other institutions. Each year, NSF receives about 50,000 competitive requests for funding, and makes about 11,500 new funding awards. NSF also awards about $593 million in professional and service contracts yearly."

    and: "awards support research in 22 states"

    This particular investment is a tiny fraction of the budget. A low priority.

    Note that each congressper
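For scale, a rough check of the "tiny fraction" claim using the two figures quoted here (the $31 million in new DIBBs awards from the story and the $7.2 billion FY2014 NSF budget), assuming the whole award amount is counted against a single year's budget:

```latex
\frac{\$31\ \text{million}}{\$7.2\ \text{billion}} = \frac{31}{7200} \approx 0.0043 \approx 0.43\%
```

If the awards are paid out over several years, the per-year share would be smaller still.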

  • The problem with data driven science... is that data isn't evidence. And the fact that most people don't understand that only underscores the danger of it.

    Now, science must be empirical. It must be based on observation, experimentation, and the results should drive theory.

    However, something that has been worrying me for years is a lazy tendency among people... scientists included... to grab a data set, point out some correlated variables, and then declare a discovery... or propose a theory that is supposed to be taken seriously.

    That is

    • by binarstu ( 720435 ) on Thursday October 02, 2014 @12:09AM (#48044397)

      The problem with data driven science... is that data isn't evidence.

      Correlative statistics are not evidence.

      I think you are confusing "evidence" with "proof". Data, and more specifically, the patterns in data, most certainly are evidence. If that were not true, then there would be no reason to even try doing science.

      Having data isn't an accomplishment.

      Any scientist who has spent years obtaining a hard-won dataset would strongly disagree with you. Consider, for example, the ground-breaking data generated a few years ago by the Human Genome Project, or the current explosion of data about exoplanets. These data most certainly do represent substantial intellectual and technical accomplishments. Now, if what you mean is that simply downloading someone else's data from the Web is not an accomplishment, then I agree with you.

      Scientists need to be willing to get their hands dirty and get the data themselves.

      I think you will find that, in the hard sciences at least, this is usually how it's done. The researchers who write the papers are usually the same people who were involved in collecting the data. However, for very large-scale studies (e.g., global biodiversity research), there is no way that a single scientist, or even a single research team, could gather all of the necessary data. In these cases, the only way to make the research tractable is to integrate multiple datasets.

      Your points about the importance of understanding where the data one uses in a study came from, how they were collected, and any potential biases are all well taken. However, ignoring any of these factors is simply sloppy science, no matter whether the researcher collected the data himself or herself, or someone else collected it.

      • We can play word games if you like... I'm quite good at them. But I frankly find the prospect to be boring. So I'll just win.

        Data (/ˈdeɪtə/ DAY-tə, /ˈdætə/ DA-tə, or /ˈdɑːtə/ DAH-tə)[1] is a set of values of qualitative or quantitative variables; restated, pieces of data are individual pieces of information. Data in computing (or data processing) is represented in a structure that is often tabular (represented b

        • No, I don't have a different definition of "data". My point was that your original post repeatedly confuses "evidence" and "proof". As I said, data, and more specifically, the patterns in data (correlative statistics are one example), are used as evidence all of the time in science. That is, in a nutshell, how science works. Data provide evidence, not proof, for or against alternative hypotheses. The strength of the evidence depends on the strength of the data, which encompasses all of the potential da

          • The patterns in data are not data. The data is not the analysis of the data, which would be a pattern in the data.

            Your lack of basic reading comprehension has exhausted my patience with you.

            Good day.

            • The patterns in data are not data. The data is not the analysis of the data which would be a pattern in the data.

              Okay. Against what, exactly, are you arguing? When, at any point, have I claimed that "the data are the analysis of the data" or any such nonsense?

              Let me remind you of one of your original claims:

              Correlative statistics are not evidence.

              Do you not understand that "patterns in data" includes correlative statistics? If not, let me make this clear: You originally claimed that neither data, nor the patterns in data, are "evidence". I've tried to explain why, to scientists, patterns in data, including correlative statistics, most certainly are e

        • by AK Marc ( 707885 )

          Data is typically the result of measurements
          Data is simply information. Data does not imply analysis or even meaning.

          So you are saying that measurements have no meaning. That is why I object to your argument.

          Data [...] is a set of values

          Data is a set of values. DNA is a set of values. DNA is data. That data is important.

          Data is simply information.

          Information is good. Every book is "simply information".

          • Measurements CAN have meaning... but they do not require meaning. I can take measurements that don't mean anything all day. I can set my phone's accelerometer to output to a spreadsheet... then record the data. Will the data mean anything? Furthermore, I can take data in completely irresponsible ways and it will still be data. I can cherry-pick my results and it is still data.

            Data in and of itself is not meaningful. Data must be collected in a certain way to preserve its integrity and then it m

            • by mick129 ( 126225 )

              I can set the output of my phone's accelerometer to output to a spreadsheet... then record the data. Will the data mean anything?

              Of course it will! The data will convey how much your phone has moved. Why do you think that this data is meaningless? Perhaps you mean "useful"?

              • I'm not getting in a word game with you.

                When I say meaningful, I refer to whether that data automatically tells you something of significance.

                If you want to use the word "useful"... we can see if your use of it is the same as mine. But I've a very low tolerance for semantics games.

                • by mick129 ( 126225 )
                  Some people use their accelerometer data as a pedometer, some don't. Is the data meaningful or not? Whether data is meaningful, useful, or significant (all are fine with me) is subjective. Accelerometer data is not meaningful to you? Fine. But you are not the arbiter of what is meaningful.

                  On the other hand, if you'd like an example of data which is (nearly?) universally recognized as meaningless, how about /dev/random > data.txt?
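To illustrate the pedometer use mentioned above, a minimal Python sketch, assuming three-axis accelerometer samples in m/s² and a simple threshold-crossing rule. The function name, the 11.0 m/s² threshold, and the minimum gap between steps are hypothetical values chosen for illustration, not how any particular phone actually counts steps.

```python
import math


def count_steps(samples, threshold=11.0, min_gap=3):
    """Count steps as upward crossings of `threshold` by the acceleration magnitude.

    samples:   list of (x, y, z) readings in m/s^2 (hypothetical units/format)
    threshold: magnitude a footfall must exceed (hypothetical value)
    min_gap:   minimum number of samples between two counted steps, so that
               jitter around the threshold is not counted twice
    """
    steps = 0
    last_step = -min_gap  # allow a step at the very start of the recording
    prev_mag = None
    for i, (x, y, z) in enumerate(samples):
        mag = math.sqrt(x * x + y * y + z * z)
        crossed_up = prev_mag is not None and prev_mag < threshold <= mag
        if crossed_up and i - last_step >= min_gap:
            steps += 1
            last_step = i
        prev_mag = mag
    return steps


# Hypothetical samples: at rest, gravity alone gives a magnitude of about 9.8;
# each footfall briefly pushes the magnitude above the threshold.
rest, peak = (0.0, 0.0, 9.8), (0.0, 0.0, 12.0)
samples = [rest, peak, rest, rest, peak, rest, rest, peak, rest]
print(count_steps(samples))  # 3
```

The raw numbers are the same whether or not anyone runs this; the step count is one meaning someone chose to extract from them.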
                  • > how about /dev/random > data.txt
                    Unless, of course, you happen to have a need for some random data. Then it's useful.
                  • I'll just use that random data set to justify everything then.

                    If that argument I just made doesn't make sense it is because your counter to my argument made no sense.

                    That is how different the two things we're talking about are right now.

                    So thank you for your response, but you're not talking about what I'm talking about and I'm not going to talk about what you want me to talk about. I was talking about something that I cared about as it relates to this topic. And your attempt to strawman my argument while not u

                    • Well, I think I've caught the flu, but yes, I'm generally happy. Thanks for the concern. Glad to hear you are too, buddy.

                      One question though, does your happiness have anything to do with this semantics argument? If so, don't write them off as boring. Seek them out! Follow your bliss!
                    • I find some satisfaction in turning silly rhetorical tactics against their users.

                      I am, by intention if nothing else, a hyper-rationalist. I hold reason, and the mental framework that supports and allows for reason, to be of primary importance in any discussion. For without them we will not have a reasonable discussion and will not know whether what we have discussed was or was not reasonable.

                      This means defining what you're talking about, keeping a clear image of what our goals are in the discussion, and then working

                    • by mick129 ( 126225 )
                      I agree that defining terms is important. FYI, I think you should consult an external reference for a definition of meaningful, as you did much higher up the thread with "data". Your usage of the word seems unusually specific, particularly the "automatically" part referring to data without further analysis. Collecting and analyzing data is what researchers do, so requiring data to be useful without analysis is baffling. Several commenters have tried to correct your definition.

                      Even data without pristin
                    • Data absent analysis isn't meaningful. And just because you analyzed data doesn't mean the analysis is meaningful.

                      Data only obtains meaning by establishing cause and effect via the data. Absent that you have white noise.

                    • by mick129 ( 126225 )
                      Data's meaning is made accessible through analysis. Absent analysis, its meaning may be obscured or unintelligible, but it's in there regardless of whether we observe it. The data sharing initiative from the article is giving future researchers the chance to find that meaning in the data.
                    • I told you I wasn't going to have a semantics argument with you. If you want to define the word "is" to mean "pineapple", that is your business.

                      Good day, sir.

                    • by mick129 ( 126225 )
                      Actually, you said if it was a semantics argument you would "just win".

                      Data is simply information. Information is meaningful.

                      But yes, have a nice day.
                    • Actually I didn't.

                      This is what I said when you started with this foolishness:
                      "I'm not getting in a word game with you.

                      When I say meaningful, I refer to whether that data automatically tells you something of significance.

                      If you want to use the word "useful"... we can see if your use of it is the same as mine. But I've a very low tolerance for semantics games."

                      I am not going to debate you on the definition of basic English words that you seem determined to turn into an argument about nothing.

                      I will simply s

                    • by mick129 ( 126225 )
                      This has been a semantic argument since the first reply telling you that you're using words wrong. Your response: [slashdot.org] "We can play word games if you like... I'm quite good at them. But I frankly find the prospect to be boring. So I'll just win."

                      Best of luck finding something to interest you.
                    • Wrong.

                      Full quote:
                      "We can play word games if you like... I'm quite good at them. But I frankly find the prospect to be boring. So I'll just win.

                      Data (/ˈdeɪtə/ DAY-tə, /ˈdætə/ DA-tə, or /ˈdɑːtə/ DAH-tə)[1] is a set of values of qualitative or quantitative variables; restated, pieces of data are individual pieces of information. Data in computing (or data processing) is represented in a structure that is of

                    • by mick129 ( 126225 )
                      Data is simply information. Information contains meaning.

                      information
                      [in-fer-mey-shuh] noun
                      1. knowledge communicated or received concerning a particular fact or circumstance

                      Have a nice day! :)
                    • You fail at reading comprehension. Nowhere in that definition does it imply the data was analyzed in any manner.

                      You were always wrong and I think you know it. The sad thing is that you think I don't know it... or that confusing the issue will make you less wrong. At best you might confuse me... but you'd still be wrong.

                      You didn't even accomplish that though.

                      Kindly put your thumb at a 90 degree angle, lift off your seat about eight inches, pull down your pants, place your hand thumb up on the seat perpendic

                    • I wasn't the one who required analysis for there to be meaning, that was your position.

                      I'll take vulgarity as conceit.
                    • Hmm, my dictionary says I meant conceding.
                    • Right... because when Galileo insulted people who said the Earth was the center of the universe, it meant he was wrong.

                      Logical fallacies for the win, fucktard?

                      I'm insulting you because you deserve to be treated with contempt and after trolling me for all these posts with your idiocy there isn't any courtesy left.

                      As to my requirements for meaning, I know yours didn't include implicit qualities that are required for their proper function. That was me... because I'm not a moron.

                      Seriously... Fuck off.

                    • "She still moves" isn't an insult.
    • by slimme ( 84675 )

      I also see a trend where people look for correlations, find correlations, and then draw some conclusion without any proof of causation. It strikes me most in economics. Public policy is set based on those correlations.

      It is very counterintuitive, but correlation research means nothing, especially in economics. Correlation research would be an amusing way to spend your time and get to know some variables, but correlation research is being used to influence people. Repeat after me: correlation means n

      • It is not correct to say that correlations mean nothing. The fallacy most arguments fall victim to is this: given some correlation between X and Y and some correlation between Y and Z, someone mistakenly infers a correlation, or even causation, between X and Z. Correlations are meaningful in that we know some relationship exists, and the relationship becomes stronger the more acute the angle between the two (mean-centered) variables becomes. Correlations are not a solution, but they are small pieces of a big puzzle. If we could gather enough data and
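A small sketch of the X/Y/Z fallacy described above, in Python with made-up numbers chosen purely for illustration: X and Z are constructed to be uncorrelated, and Y is simply their sum, so Y correlates with each of them while X and Z do not correlate at all. The `pearson` helper is a plain, hypothetical implementation of the Pearson coefficient (the cosine of the angle between the mean-centered vectors), not any particular library's API.

```python
import math


def pearson(a, b):
    """Pearson correlation: cosine of the angle between the mean-centered vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    cov = sum(p * q for p, q in zip(da, db))
    norm = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    return cov / norm


# Illustrative data (made up): X and Z are orthogonal, and Y = X + Z.
x = [1, -1, 1, -1]
z = [1, 1, -1, -1]
y = [xi + zi for xi, zi in zip(x, z)]  # [2, 0, 0, -2]

print(round(pearson(x, y), 3))  # 0.707
print(round(pearson(z, y), 3))  # 0.707
print(round(pearson(x, z), 3))  # 0.0
```

Correlation is not transitive in general: moderate X/Y and Y/Z correlations (about 0.71 each here) are compatible with X and Z being completely uncorrelated, let alone causally linked.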
    • Data isn't evidence, but it can be used to find useful hypotheses, starting points for further research.

      Remember:

      The most exciting phrase to hear in science, the one that heralds new discoveries, is not “Eureka” but “That’s funny...”

      (Isaac Asimov)

    • Albert Einstein, for one example, is not known for acquiring any of the raw physical data from which he did his work on the Photoelectric Effect, Special Relativity, or General Relativity. All he did was sit at his desk and write equations based on data acquired by others.

      • And if most scientists stuck to mathematics then they could get away with that too. Do anything else and you're going to need a bit more.

  • Perhaps they should start with an annual walrus census (see story following this one).
     

  • by louic ( 1841824 ) on Thursday October 02, 2014 @02:34AM (#48044807)
    All science is data driven. Without data there is no hypothesis, and without a hypothesis there is nothing to test (falsify). This is just more hype, like nanotechnology or, now, nanobiotechnology. Nearly all molecules are nanoscale: their size is measured in nanometers, and in the same way all science is data driven.

    There is nothing wrong with good old "science driven science" where people think, do experiments, and think again.
    • This particular push may not be effective, but it's not hype.

      Science may be data-driven, but historically scientists have not been trained to be good data custodians. They know reasonably well how to use data, but they don't know how to store it, label it, transfer it, etc. Go pick an article from 5 years ago which is data-heavy and try to get the original dataset from the authors: 95 times out of a hundred you'll spend a month emailing people and you'll end up with nothing. Four more out of the 100 you'
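As an illustration of the store/label/transfer point, a minimal Python sketch using only the standard library: the data are written as CSV alongside a JSON "sidecar" that describes the columns and units and records a SHA-256 checksum, so whoever receives the files years later can check that they arrived intact. The file layout, field names, and the helpers `save_dataset` and `verify_dataset` are hypothetical, not the format of any particular repository or of the NSF program in the story.

```python
import csv
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def save_dataset(csv_path, header, rows, description, units):
    """Write rows to CSV plus a JSON sidecar (same name, .json) with metadata and a checksum."""
    csv_path = Path(csv_path)
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    meta = {
        "file": csv_path.name,
        "description": description,
        "columns": header,
        "units": units,
        "sha256": sha256_of(csv_path),
    }
    meta_path = csv_path.with_suffix(".json")
    meta_path.write_text(json.dumps(meta, indent=2))
    return meta_path


def verify_dataset(meta_path):
    """Recompute the data file's checksum and compare it with the sidecar's record."""
    meta_path = Path(meta_path)
    meta = json.loads(meta_path.read_text())
    return sha256_of(meta_path.parent / meta["file"]) == meta["sha256"]


if __name__ == "__main__":
    # Hypothetical example data, written to a throwaway directory.
    with tempfile.TemporaryDirectory() as d:
        meta = save_dataset(Path(d) / "counts.csv", ["site", "count"],
                            [["A", 12], ["B", 7]],
                            "Hypothetical survey counts", {"count": "individuals"})
        print(verify_dataset(meta))  # True
```

The checksum only proves the bytes are unchanged; the description and units fields are what let a later reader know what those bytes mean.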

    • by radl33t ( 900691 )
      Let's just eliminate all specificity and refer to everything as physics. Sounds useful to me! Anything else is just more hype for charlatans and fools! Nuance and differentiation are for the birds!
  • I can save the NSF a bunch of money with this initiative. There's a data center in Utah that's not being used (for anything legal) with a huge amount of data storage capacity. The NSF should have it.
