Cazena on Thursday released its Instant AWS Data Lake service, providing an integrated set of capabilities to help organizations get started quickly with an AWS cloud data lake.
With a cloud data lake, data is ingested and stored, but organizations often find it difficult to then use that data, which has led to a rise in technologies that aim to make it useful.
The cloud data lake vendor, based in Waltham, Mass., is now updating its approach to help AWS customers in particular use the cloud data lake model. The Instant AWS Data Lake is a tailored offering of Cazena’s Instant Data Lake platform, which helps to connect data with AWS services including Amazon MSK (Managed Streaming for Apache Kafka), Amazon EMR, Athena, Glue and Amazon SageMaker.
Among Cazena’s users is Bardess, an analytics services provider based in Randolph, N.J.
Daniel Parton, lead data scientist at Bardess, said the firm’s data science team was often slowed down by insufficient or nonexistent tooling.
Bardess had been trying to assemble its own machine learning environment, with cloud infrastructure, data platforms and data science workbenches, which was a time-consuming project. Parton noted that by engaging with Cazena, Bardess was able to get the capabilities it needed as a service.
How the Instant Cloud Data Lake helps Bardess
With the Instant AWS Data Lake release in particular, Parton noted that there are now more tools and capabilities available to Bardess’ data scientists.
“Cazena allows us to focus on designing the data architecture, doing exploratory data analysis, and starting to train models to support the data science project, without worrying about the technical infrastructure,” he said.
As an example of how Bardess is using Cazena, Parton noted that the company recently solved a challenging anomaly detection problem in time-series data streaming from multiple sensors at a water treatment plant.
“Cazena enabled us to build and orchestrate an end-to-end analytics pipeline with mixed workloads, cloud stacks and tools, all from within the data lake, and do it within a few hours,” he said.
Parton explained that the pipeline for the deployment included streaming data, transformation and data normalization from the various streams using Glue and landing the data in an S3 bucket. That data ingest pipeline then enabled Python code to train a forecasting and anomaly detection machine learning model in Amazon SageMaker.
“The data volumes involved, and the need to install various libraries and tools, would have taken us weeks to accomplish without Cazena,” Parton said.
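As a rough illustration of the modeling step Parton describes, the sketch below normalizes a sensor stream and flags readings that stray far from a simple rolling-mean forecast. It is not Bardess' or Cazena's code: the function names, the sample readings, the window size and the threshold are all assumptions made for this example, and a production pipeline would read the Glue output from S3 and train the model in SageMaker rather than run in plain Python.

```python
from statistics import mean, pstdev


def normalize(values):
    """Rescale one sensor stream to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant stream carries no variation to rescale.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def flag_anomalies(values, window=5, threshold=3.0, min_spread=1e-9):
    """Return the indices of readings that deviate from a rolling forecast.

    The forecast for reading i is the mean of the previous `window`
    readings. A reading is flagged when its distance from that forecast
    exceeds `threshold` times the standard deviation of the same window.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        forecast = mean(history)
        # Floor the spread so a perfectly flat window cannot divide the scale to zero.
        spread = max(pstdev(history), min_spread)
        if abs(values[i] - forecast) > threshold * spread:
            anomalies.append(i)
    return anomalies


# Hypothetical readings from a single sensor, with one obvious spike.
readings = [10.0, 10.1, 9.9, 10.0, 10.2, 10.1, 25.0, 10.0, 9.9]
spikes = flag_anomalies(normalize(readings))
print(spikes)
```

With several sensors, the same two functions could be applied to each stream in turn, so that streams measured in different units are compared on a common scale before any readings are flagged.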
Cazena’s path to the Instant Cloud Data Lake
The Instant Cloud Data Lake model is the second generation of Cazena’s platform for helping organizations better use data.
Cazena was founded in 2013, after CEO and founder Prat Moghe left data warehouse vendor Netezza, which was acquired by IBM in 2010.
Moghe said he carried his Netezza experience to Cazena. Netezza was a pioneer in the simplification of the data warehouse industry, offering the technology as an appliance to make it easier for organizations to use.
The same basic idea of making a complex technology more consumable is at the core of Cazena’s model, Moghe said.
The first generation of Cazena’s platform was called the Open SaaS Data Platform. It was an attempt to orchestrate and optimize any cloud stack to build a cloud data lake in a service-based approach.
The Instant Cloud Data Lake is the evolution of Cazena’s platform, which Moghe said further simplifies the process of building a cloud data lake that is useful for analytics and data science.
“Our mission is to make cloud data lakes easy for all enterprises,” Moghe said.