Open Source News

Big Data's Invisible Open Source Community

itwbennett writes "Hadoop, Hive, Lucene, and Solr are all open source projects, but if you were expecting the floors of the Strata Conference to be packed with intense, bootstrapping hackers, you'd be sorely disappointed. Instead, says Brian Proffitt, 'community' where Big Data is concerned is 'acknowledged as a corporate resource', something companies need to contribute back to. 'There is no sense of the grass-roots, hacker-dominated communities that were so much a part of the Linux community's DNA,' says Proffitt."
  • by oneiros27 ( 46144 ) on Thursday March 01, 2012 @10:57PM (#39216873) Homepage

    Internet Archive's last published generation Petabox (now more than a year old, so they were using smaller drives) would take two racks, which is still reasonable, but you could probably fit it in a single rack with today's drives. A Backblaze Pod is 42 disks in 4U, so you could do it yourself and, assuming you can get enough large disks after that whole flooding thing, get a PB in a single rack easily. The Sun Thumper took 48 disks in 4U ... I don't know if the X4540 ever supported larger than 1TB disks, though.

    My department just got a Nexsan E60 in yesterday ... 60 3TB disks in 4U, so you can squeeze 1.8PB raw in a 42U rack. (usable space ... still more than a PB, even with spares.)

    So, space isn't the issue ... power and cooling may be, though.
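    The rack arithmetic in the comment above can be checked with a rough sketch like the one below. The function name `raw_rack_capacity_tb` is made up for illustration, and it assumes the whole 42U rack is filled with storage chassis (no switches or head nodes) and that 1 PB = 1000 TB. The 3TB figure for the Backblaze Pod is also an assumption; only the Nexsan disk size is stated in the comment.

```python
def raw_rack_capacity_tb(disks_per_chassis, disk_tb, chassis_u=4, rack_u=42):
    """Raw (pre-RAID, pre-spares) capacity in TB of one rack of identical chassis.

    Assumes the entire rack is given over to storage chassis; partial
    chassis slots left over at the top of the rack are ignored.
    """
    chassis_per_rack = rack_u // chassis_u  # 42U // 4U = 10 chassis
    return chassis_per_rack * disks_per_chassis * disk_tb


# Nexsan E60: 60 x 3TB disks in 4U (figures from the comment).
nexsan_tb = raw_rack_capacity_tb(60, 3)
print(f"Nexsan E60 rack: {nexsan_tb / 1000} PB raw")

# Backblaze Pod: 42 disks in 4U; the 3TB disk size is an assumption here.
pod_tb = raw_rack_capacity_tb(42, 3)
print(f"Backblaze Pod rack: {pod_tb / 1000} PB raw")
```

    Usable space depends on RAID layout and spares, which the sketch deliberately leaves out.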

  • by evilviper ( 135110 ) on Thursday March 01, 2012 @10:58PM (#39216875) Journal

    This is for one simple reason: most hackers don't need "BigData".

    Perhaps if the typical hacker had a cluster of servers to play with, this would change.

    "Most hackers" don't need a lot of things that are, nevertheless, developed as successful open source projects. Anybody think there's a huge audience for DReaM?

    Storage is getting big... Even a tiny shop can afford obscene amounts of storage. Each 2U server can have 6 x 2TB SATA (3.5") drives pretty inexpensively. As soon as you've got a dataset that needs more space than you can store on one of those, you'd benefit from these "big data" solutions, rather than the standby (more expensive) solution of "throw in a monster SAN".

    And you don't even need that much infrastructure. Virtual server (cloud) providers aren't very expensive, particularly when you don't care about SLA, and will give you as big of a cluster "to play with" as you could want.
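    To put a number on "more space than you can store on one of those", here is a back-of-the-envelope sketch. The helper `servers_needed` is hypothetical, and the replication factor of 3 is an assumption borrowed from the HDFS default rather than anything stated in the comment; filesystem overhead and free-space headroom are ignored.

```python
import math


def servers_needed(dataset_tb, disks_per_server=6, disk_tb=2, replication=3):
    """Number of 2U-style servers needed to hold a dataset.

    Each server contributes disks_per_server * disk_tb of raw space
    (6 x 2TB = 12TB with the figures from the comment). The dataset is
    stored `replication` times, as a distributed filesystem would do.
    """
    raw_per_server_tb = disks_per_server * disk_tb
    total_raw_tb = dataset_tb * replication
    return math.ceil(total_raw_tb / raw_per_server_tb)


# A 100TB dataset, replicated three times, on 12TB servers.
print(servers_needed(100))

# Without replication, anything over 12TB already spills onto a second box.
print(servers_needed(13, replication=1))
```

    The second call is the point the comment makes: a single cheap server tops out quickly, and past that you are managing a cluster either way.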
