Databricks is putting more substance behind its data lakehouse model, with a new SQL Analytics service, revealed Nov. 12, that is part of the company’s Unified Data Analytics Platform.
The data lakehouse is a concept that the data science and engineering vendor has been advocating over the course of 2020 as a technical architecture that combines the best elements of data lake and data warehouse models.
The technology foundation for Databricks’ vision of the lakehouse is an open source project known as Delta Lake, which is currently hosted by The Linux Foundation. In June, Databricks expanded on Delta Lake with the launch of its Delta Engine, which adds Spark 3.0-based data queries and caching to the lakehouse.
The Databricks SQL Analytics service brings Delta Engine into the Databricks platform to help customers use the lakehouse model. The new service also integrates technologies from data visualization vendor Redash, which Databricks acquired in June.
While Databricks unveiled the SQL Analytics service today, it will be available only as a preview starting Nov. 18. The vendor said it expects general availability to follow in early 2021.
Why the data lakehouse concept works
The lakehouse concept that is at the core of the Databricks service makes good sense to Hyoun Park, CEO and principal analyst at Amalgam Insights.
Park said the lakehouse that Databricks advocates rests on the idea that data lakes, which are collections of data sources in a variety of formats, need to be both governed and analytically available so that lakehouse users can make sound data-based decisions.
“The data warehouse has been an extremely powerful tool for unlocking analytics but is becoming slightly outdated in an era that data is everywhere, being created all the time, and stored in a wide variety of formats,” Park said. “In this context, the idea of a lakehouse as a data lake that performs data warehouse-like purposes is an important step forward for the analytics community.”
How Databricks SQL Analytics advances the lakehouse
In Park’s view, the new Databricks SQL Analytics service is significant because it bridges gaps for data analysts and data scientists who need to bring semistructured data into both analytic and data science efforts quickly, with the governance and performance needed for production environments.
“With this service, Databricks increases the availability of data lakes for analytic usage and helps unlock the insights and guidance currently hidden in semistructured data sources that have traditionally been difficult to both connect and analyze in context of traditional structured data sources,” Park said.
Arsalan Tavakoli, senior vice president of field engineering at Databricks, explained that the SQL Analytics service enhances the SQL query capabilities that Databricks has long provided with its platform.
Tavakoli noted that there is now better support for business intelligence tools, with connectors to help users access lakehouse data. Query performance under concurrent load has also improved, so more users can query a given data source at the same time.
Redash integration provides lakehouse visibility
Databricks SQL Analytics marks the official debut of Redash technology in the Databricks stack. Redash is an open source tool that lets users query a data set through a SQL interface.
Tavakoli said that since the acquisition, Databricks has been working on security and performance improvements for production workloads. He added that with the Redash integration, Databricks users can now choose to get a data science view or a SQL Analytics view of data within the same environment.
“It’s not a standalone product now. Redash has now been fully integrated within Databricks,” Tavakoli said. “So, we’re pulling in all the Redash capabilities, but natively integrated with versioning, security and collaboration and deeply integrating it with all of the underlying infrastructure.”