Open Source News

Big Data's Invisible Open Source Community

itwbennett writes "Hadoop, Hive, Lucene, and Solr are all open source projects, but if you were expecting the floors of the Strata Conference to be packed with intense, bootstrapping hackers you'd be sorely disappointed. Instead, says Brian Proffitt, 'community' where Big Data is concerned is 'acknowledged as a corporate resource,' something companies need to contribute back to. 'There is no sense of the grass-roots, hacker-dominated communities that were so much a part of the Linux community's DNA,' says Proffitt."

  • Sorry (Score:4, Insightful)

    by discord5 ( 798235 ) on Thursday March 01, 2012 @09:03PM (#39216311)

    My basem^H^H^H^H^H hacker cave simply doesn't have any room for a storage array in the PB order.

  • by blahplusplus ( 757119 ) on Thursday March 01, 2012 @09:13PM (#39216365)

    ... must face the fact that lots of code is boring to maintain and update. Not to mention that unless you are independently wealthy, contributing to open source is a drain on one's time and resources. No one should really be concerned that many corporations see value in open source; it's like seeing value in roads or sewers. There is much code that, just like roads and sewers, would be hard to maintain on a volunteer basis.

  • by Anonymous Coward on Thursday March 01, 2012 @09:46PM (#39216505)

    "There is no sense of the grass-roots, hacker-dominated communities that were so much a part of the Linux community's DNA"

    This is for one simple reason: most hackers don't need "BigData".

    Perhaps if the typical hacker had a cluster of servers to play with, this would change. But as long as most hackers are bound to using a single personal computer, they're just not going to be very concerned with clusterware.

    They're also not concerned with plenty of other things that are essential to big corporations, like payroll software and CRM (customer relationship management) software.

  • Re:Sorry (Score:3, Insightful)

    by martin-boundary ( 547041 ) on Thursday March 01, 2012 @10:19PM (#39216679)

    Well, you really shouldn't be debugging code on petabyte datasets to begin with. If there's a bug that shows, there's a minimal dataset on which the bug shows, and that's the dataset you can ask for help with.

    In general, you should always develop code on a tiny sample of the dataset. Once it's fully debugged and works correctly, you apply it to your petabyte dataset (a minimal sketch of that workflow follows below).
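
The comment above describes a common big-data workflow: debug against a small, reproducible sample, then run the identical logic over the full dataset. Here is a minimal Python sketch of that idea; the events.log file name and the word_count job are hypothetical stand-ins, not anything from the discussion.

```python
# Hedged sketch of "develop on a tiny sample first". The input file
# name and the word_count job are hypothetical placeholders.
import random

def sample_lines(path, k=10_000, seed=42):
    """Reservoir-sample k lines from an arbitrarily large text file
    in a single streaming pass (no need to load the whole file)."""
    rng = random.Random(seed)
    reservoir = []
    with open(path) as f:
        for i, line in enumerate(f):
            if i < k:
                reservoir.append(line)
            else:
                j = rng.randint(0, i)
                if j < k:
                    reservoir[j] = line
    return reservoir

def word_count(lines):
    """Stand-in for the real analysis job: count word frequencies."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

if __name__ == "__main__":
    # Debug and validate the job on a small, reproducible sample first...
    sample = sample_lines("events.log", k=10_000)
    top = sorted(word_count(sample).items(), key=lambda kv: -kv[1])[:10]
    print(top)
    # ...and only once it behaves correctly, submit the same logic
    # (e.g. as a Hadoop Streaming or Spark job) against the full data.
```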

  • by scheme ( 19778 ) on Friday March 02, 2012 @12:06AM (#39217247)

    OTOH I'm sure Hadoop and friends would be very useful for the LHC and other big science projects, but they are mostly taxpayer funded and are fighting to keep the dollars they're getting, not looking for new ways to spend them.

    HDFS is already used by CMS (one of the detectors at the LHC) to store and manage data on distributed filesystems at various regional centers. After all, when you are generating multiple petabytes each year and need to process it and keep various subsets of it around for analysis by various groups, you need filesystems that can handle multiple PB of files. And yes, I believe patches are being fed upstream as necessary. Other filesystems being used in the US include lustre, dcache, and xrootdfs. (A rough sketch of checking HDFS usage follows this comment.)

    Although funding is an issue, continuing to run and analyze data from the LHC means that money needs to be spent to buy more storage and servers as needed, and to pay people to develop and maintain the systems needed to distribute and analyze all the data being generated. Having multiple PB of particle collision data is useless if you can't analyze it and look for interesting events.
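
As a rough illustration of the kind of capacity bookkeeping such a site does, here is a hedged Python sketch that tallies how much data sits under an HDFS path using pyarrow's Hadoop filesystem binding. The namenode address and the /store/cms path are made-up placeholders, not the actual CMS layout, and a working Hadoop client configuration (libhdfs) is assumed.

```python
# Hedged sketch: sum up how much data a site keeps under one HDFS path.
# The namenode host and the /store/cms path are hypothetical examples.
from pyarrow import fs

def directory_size_tb(hdfs, path):
    """Recursively sum file sizes under an HDFS directory, in terabytes."""
    selector = fs.FileSelector(path, recursive=True)
    infos = hdfs.get_file_info(selector)
    total_bytes = sum(i.size for i in infos if i.type == fs.FileType.File)
    return total_bytes / 1e12

if __name__ == "__main__":
    # Requires libhdfs and a valid Hadoop client configuration on this host.
    hdfs = fs.HadoopFileSystem("namenode.example.org", 8020)
    print(f"/store/cms holds ~{directory_size_tb(hdfs, '/store/cms'):.1f} TB")
```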
