
Spark Advances From Apache Incubator To Top-Level Project

rjmarvin writes "The Apache Software Foundation announced that Spark, the open-source cluster-computing framework for Big Data analysis, has graduated from the Apache Incubator to a top-level project. A project management committee will guide the project's day-to-day operations, and Databricks cofounder Matei Zaharia will be appointed VP of Apache Spark. Spark runs programs 100x faster than Apache Hadoop MapReduce in memory, and it provides APIs that enable developers to rapidly develop applications in Java, Python or Scala, according to the ASF."

  • by Anonymous Coward
    Generally when Spark advances you get engine knock.
  • Only thing -- where do I get my big data?
  • Solr claims to be, yet strictly fails to be, a drop-in search engine for your website.

    A former employer of mine, who didn't have a clue about Linux, Java or Open Source, bet the farm on Solr speeding up the report generation for his online service.

    I don't want to tell you who this employer is because they provide a valuable service to the business community. But the owner of the company is a raging alcoholic, who devoted at least an hour at the end of each day for not having gotten Solr up and running yet, d

    • the target market for Solr is the "enterprise". big corporations who have developers and operations people on staff with heavy duty skills.

      don't cry because you can't handle it

    • by jockm ( 233372 )

      So do you judge every Apache project this way? Are Apache, Tomcat, Commons, Batik, CouchDB, etc etc etc all crap until proven otherwise because of Solr? Apache is a collection of projects, maintained by different people.

      And not to trash your friend's company, but he picked a technology without trying it out first? Then that company had bigger problems than Solr. Nor would I judge Solr by that story (I have never used Solr, nor am I involved with it in any way).

  • Spark runs programs 100x faster than Apache Hadoop MapReduce in memory

    And Tachyon, another component of Matei's Berkeley Data Analytics Stack, boosts [datascienceassn.org] Spark another factor of 2-8x by sidestepping JVM garbage collection issues.

  • On one carefully selected benchmark, discounting a lot of things that matter (like data movement), Spark performs better than Hadoop. Tech reports generated by the authors suggest that this is a corner case and that Spark performance varies wildly. Don't believe the hype.

    • I think the major advantage to using Spark isn't just in the performance but in using libraries such as MLbase/MLlib. Is this not correct? While I realize R is widely adopted in the industry, MLlib seems to be catching up very fast.
