Cloudera adds data engineering capability to enable DataOps

Big data vendor Cloudera is growing its portfolio with a series of efforts aimed at enabling a DataOps model.

Earlier this month, the company, based in Santa Clara, Calif., announced new and upcoming features for its Cloudera Data Platform, including Cloudera Data Engineering and Cloudera Data Visualization. The Data Engineering service uses Apache Spark for data queries and the Apache Airflow platform for workflow monitoring. The Data Visualization offering is based on technology from Cloudera’s 2019 acquisition of Arcadia Data, which provides reporting and charting functionality.

Cloudera Data Engineering is generally available now; Cloudera Data Visualization is in technical preview.

According to Doug Henschen, an analyst at Constellation Research, Cloudera makes a good case for the breadth and depth of capabilities it can deliver without the heavy lifting of knitting together multiple point solutions, like databases, analytics environments and streaming tools. That said, he added that Cloudera also knows it still has work to do on simplifying its platform to lower the cost of ownership and maximize value for customers looking to support data engineering, as well as data science, data warehousing and operational database use cases.

How Cloudera Data Engineering enables DataOps



Talend evolves enterprise data integration efforts

Integrating data from one location with another has long been a primary challenge of data management.

Data typically resides in multiple locations and formats, and bringing it all together in a form that is useful to organizations is difficult. Among the various approaches to enterprise data integration are data virtualization, in which data stays in its original location, and data loading into a centralized location.

A standout among vendors in the data integration market is Talend, based in Redwood City, Calif. Talend has grown its suite of tools in recent years. It acquired Stitch Inc. in 2018, adding a data loader tool to its portfolio to complement its own extract, transform and load platform offerings. The vendor has continued to develop the Stitch technology since the acquisition and is pushing forward on broader efforts to enable data trust as well.

In this Q&A, Laurent Bride, CTO and COO of Talend, outlines the evolution of the company in recent years and provides insight into the state of data integration today.

What’s the difference between data loading with Stitch and Talend’s other data integration tools?


Laurent Bride: Stitch is really focused on quick data ingestion.


Ahana releases managed Cloud for Presto service

Data analytics startup Ahana released its new Ahana Cloud for Presto system, providing a managed service for organizations using Presto.

The new managed Presto service is now in preview and is set to be generally available by the end of 2020. Presto is an open source distributed SQL query engine that competes against Apache Spark.

Ahana, based in San Mateo, Calif., emerged from stealth mode on June 2 and revealed its first offering on June 30 with the availability of the open source PrestoDB in the AWS Marketplace. While the initial Ahana product provides support for the open source Presto project, Ahana Cloud for Presto is a managed service that goes beyond the open source project.

“Managing Presto environments can be a bit daunting for many, especially as clusters grow to meet the speed, size and scale of data,” said Mike Leone, senior analyst at ESG.

Leone said Ahana’s managed service has the potential to address the perceived challenges of getting started with Presto. He noted that Ahana has a slick user interface that can be used to ramp up deployments quickly as well as help with ongoing management. As Ahana’s engineering team grows, continuing to infuse automation and


How Riot Games upped its enterprise data governance game

League of Legends is among the most popular online games in the world. The game generates a lot of data that its developer, Riot Games, needs to manage and govern.

That data includes game and player data, as well as enterprise and corporate data. As a result, Riot Games must deal with a myriad of enterprise data governance challenges, including data ownership.

Managing data at the video game vendor, based in Los Angeles, is the task of Chris Kudelka, technical product manager of data governance.

“As we started to grow, we had data ownership problems,” Kudelka said. “A lot of really well-meaning people in the company were producing data so they could measure, understand and do the right thing for the player.”

As Riot Games has grown, the biggest challenge has become a lack of clarity about who owns the data in the enterprise and what it was originally intended for. Another challenge is that Riot Games has grown its portfolio beyond League of Legends to include games such as Legends of Runeterra and Valorant, with more in the works.

So the company has begun to deal with data management across multiple game titles, as well as understanding the


K2View takes aim at DataOps with new funding

Organizations typically store user data in many different places, often making it a challenge to get a complete view of all the data.

Among the myriad approaches for consolidating data is ingesting data into a data warehouse or data lake to bring different sources together. Startup K2View, based in Dallas and Tel Aviv, Israel, takes a different approach with its fabric platform that aims to unify all sources of data for a given user or entity. It’s an approach that uses what the company calls micro-databases, in which each database holds all the data from different sources for one specific user.

On Aug. 11, K2View revealed that it raised $28 million to continue to build out and advance its technologies, which fit into a growing segment of the market commonly referred to as DataOps (Data Operations). In this Q&A, Achi Rotem, CEO and co-founder of K2View, discusses his views on DataOps and the challenges of data management at scale.

Why are you now raising money amid the disruption of the COVID-19 pandemic?

Achi Rotem: We didn’t feel like we should take anyone’s money before. We wanted to be absolutely sure there is a market and that we had


Aerospike Connect improves integrations with NoSQL database

NoSQL database vendor Aerospike released a series of enhancements that enable better data integration and accelerate data analysis for machine learning workloads.

The Aerospike Connect updates, unveiled Sept. 15, include enhanced integrations with Apache Spark, Apache Kafka, Java Message Service and Apache Pulsar.

The connectors make it easier for users to move data from different sources in and out of the Aerospike database. The updates are an evolution of the company’s initial release of Aerospike Connect in March 2019, which included the first versions of the Spark and Kafka connectors. The updated connectors also benefit from the Aerospike 5.1 database update that became generally available in July.

With Aerospike Connect, the vendor is providing its users with a low-friction entry into existing environments, said James Curtis, an analyst at S&P Global.

“Aerospike is part of a larger trend with NoSQL vendors becoming enablers to analytics, whereas in the past most NoSQL databases focused primarily on operational/transactional workloads,” Curtis said. “The company’s Spark connector, including its updates, is a big part of that strategy.”

The Aerospike Connect integrations provide an accelerated approach for users to get data into different types of systems.

Aerospike Connect benefits from Cross-Datacenter Replication

