Open Source Software Apache

Databricks Open-Sources Delta Lake To Make Data Lakes More Reliable (techcrunch.com)

Databricks, the company founded by the original developers of the Apache Spark big data analytics engine, today announced that it has open-sourced Delta Lake, a storage layer that brings ACID transactions to data lakes, making it easier to ensure data integrity as new data flows into an enterprise's repository. TechCrunch reports: Delta Lake, which has long been a proprietary part of Databricks' offering, is already in production use by companies like Viacom, Edmunds, Riot Games and McGraw Hill. The tool provides the ability to enforce specific schemas (which can be changed as necessary), to create snapshots and to ingest streaming data or backfill the lake as a batch job. Delta Lake also uses the Spark engine to handle the metadata of the data lake (which by itself is often a big data problem). Over time, Databricks also plans to add an audit trail, among other things.

What's important to note here is that Delta Lake runs on top of existing data lakes and is compatible with the Apache Spark APIs. The company is still looking at how the project will be governed in the future. "We are still exploring different models of open source project governance, but the GitHub model is well understood and presents a good trade-off between the ability to accept contributions and governance overhead," said Ali Ghodsi, co-founder and CEO at Databricks. "One thing we know for sure is we want to foster a vibrant community, as we see this as a critical piece of technology for increasing data reliability on data lakes. This is why we chose to go with a permissive open source license model: Apache License v2, same license that Apache Spark uses." To invite this community, Databricks plans to take outside contributions, just like the Spark project.
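
For readers unfamiliar with the ideas in the summary, here is a rough, toy illustration in plain Python of what "enforce a schema, commit batches atomically, and read snapshots" can mean. It is only a sketch under stated assumptions: `ToyTable`, `append`, and `snapshot` are hypothetical names invented for this example, the data lives in memory, and none of it reflects Delta Lake's actual API or storage format.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory table; NOT Delta Lake's API."""
    schema: dict                              # column name -> expected Python type
    log: list = field(default_factory=list)   # one entry per committed batch

    def append(self, rows):
        """Validate the whole batch first, then commit it as one log entry."""
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match schema")
            for name, expected in self.schema.items():
                if not isinstance(row[name], expected):
                    raise ValueError(f"column {name!r} expects {expected.__name__}")
        # Reached only if every row passed, so a batch is all-or-nothing.
        self.log.append([dict(r) for r in rows])
        return len(self.log)  # version number after this commit

    def snapshot(self, version=None):
        """Return the rows as of a given version (default: latest)."""
        if version is None:
            version = len(self.log)
        return [row for batch in self.log[:version] for row in batch]


table = ToyTable(schema={"user": str, "score": int})
v1 = table.append([{"user": "ann", "score": 3}])
v2 = table.append([{"user": "bob", "score": 5}])

# A batch containing one bad row is rejected as a whole; nothing is committed.
try:
    table.append([{"user": "cat", "score": 1}, {"user": "dan", "score": "oops"}])
    rejected = False
except ValueError:
    rejected = True
```

After this runs, `table.snapshot(v1)` returns only the first batch, while `table.snapshot()` returns both committed batches and none of the rejected rows. A real system also has to handle concurrent writers and durable storage, which this sketch ignores.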
