NoSQL database vendor Aerospike released a series of enhancements that enable better data integration and accelerate data analysis for machine learning workloads.
The enhancements center on the Aerospike Connect connectors, which make it easier for users to move data between the Aerospike database and other data sources. The updates are an evolution of the company’s initial release of Aerospike Connect in March 2019, which included the first versions of the Spark and Kafka connectors. The updated connectors also benefit from the Aerospike 5.1 database update that became generally available in July.
With Aerospike Connect, the vendor is providing its users with a low-friction entry into existing environments, said James Curtis, an analyst at S&P Global.
“Aerospike is part of a larger trend with NoSQL vendors becoming enablers to analytics, whereas in the past most NoSQL databases focused primarily on operational/transactional workloads,” Curtis said. “The company’s Spark connector, including its updates, is a big part of that strategy.”
Aerospike Connect benefits from Cross-Datacenter Replication
The Aerospike 5.0 database debuted in May 2020 and was updated to version 5.1 in July.
The enhanced connectors build on features in the latest Aerospike database releases, which improve real-time updates and performance across all of the connectors.
One of the key additions in the Aerospike Database 5 series is support for Cross-Datacenter Replication (XDR). In addition to XDR, the Aerospike Database 5 series introduced global distributed transactions. Both features help users run Aerospike across multiple sites and clouds with high performance for different applications, including financial payments.
Spark improvements will help AI models
The Spark connector in the Aerospike Connect update has been improved to speed up AI model generation.
When developers run algorithms for AI model generation, they tend to try to fit all of the data being analyzed in memory in order to generate a model quickly, said Srini Srinivasan, chief product officer and co-founder of Aerospike.
Before the new Spark connector, users tended to copy data from an Aerospike real-time database to another database, such as HBase, and then run a Spark process on top of that. The problem with that scenario is that users needed to copy data, which takes time and compute resources.
With the new connector, Srinivasan said, his company has built DataFrame-based access to the Aerospike database from Spark. The new connector aligns the parallel execution of Spark with the parallel execution of Aerospike. The net result is that users can generate AI models faster and with more real-time data, because the data isn’t being copied to a supplementary system.
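To picture what aligning the two layers of parallelism can mean in practice: Aerospike divides each namespace into 4,096 data partitions, so a parallel reader can hand each of its tasks a distinct slice of those partitions. The Python sketch below is illustrative only. The function name `split_partitions` and the even-split strategy are assumptions made for this example, not the connector's actual code or API.

```python
from typing import List, Tuple

# Each Aerospike namespace is divided into 4096 data partitions.
AEROSPIKE_PARTITIONS = 4096


def split_partitions(num_tasks: int,
                     total: int = AEROSPIKE_PARTITIONS) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges of partition IDs, one per task.

    Hypothetical helper: spreads partitions as evenly as possible so that
    every parallel task scans its own non-overlapping slice.
    """
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    num_tasks = min(num_tasks, total)  # avoid tasks with nothing to read
    base, extra = divmod(total, num_tasks)
    ranges = []
    start = 0
    for i in range(num_tasks):
        # The first `extra` tasks take one additional partition each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    for task_id, (lo, hi) in enumerate(split_partitions(8)):
        print(f"task {task_id}: partitions {lo}..{hi - 1}")
```

With eight tasks, each one would read 512 partitions directly from the database, with no intermediate copy into a second system.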
Apache Pulsar comes to Aerospike Connect, and Presto is next
While Aerospike previously had connectors for Kafka and Spark, the Pulsar connector is entirely new. Pulsar is an event streaming technology that is often seen as an alternative to Apache Kafka.
“There’s a lot of activity in the Pulsar space and we were getting a lot of requests for it,” Srinivasan said. “We feel that the Pulsar connector will be used heavily at scale, based on some of the input we have seen in the market.”
In addition to Pulsar, Aerospike is now working on a Presto connector that is currently in beta. Presto is an increasingly popular open source SQL query engine that is often seen as a competitor to Spark.