The role of DataOps engineer is on the rise.
As more organizations adopt an analytics approach to decision-making, the amount of data they accumulate can often be overwhelming. As a result, in recent years data-driven organizations have developed methodologies for managing their data, methodologies loosely lumped together under the umbrella term DataOps, which is short for data operations.
Someone, however, is needed to coordinate those DataOps practices and ensure that the organization’s data is properly managed and developed so that it’s trustworthy when ready for consumption. Within the last couple of years, that need has led to the advent of the DataOps engineer.
Often former software engineers with experience in DevOps — the collaborative approach to application development and IT operations — DataOps engineers work with data scientists and data engineers to make sure that data is properly managed throughout the analytics process.
Joe Hilleary, a research analyst at Eckerson Group, a consulting firm based in Hingham, Mass., has followed the growth of the DataOps movement within analytics, and subsequently the development of the DataOps engineer.
He recently discussed DataOps and the emerging role of DataOps engineer, including the role’s origins and how it can change what organizations can do with their data.
How do you define DataOps?
Joe Hilleary: At its core, it’s a methodology for developing data solutions. It’s a philosophy we started seeing a few years back that’s really starting to gain steam. It’s patterned off of DevOps to some extent and looks at how to make iterative approaches to data development a reality. A lot of that starts with rethinking some of the basics and moving away from a waterfall system to doing continuous implementation and continuous development of data pipelines. DataOps is the philosophy that takes technology, people and processes and brings them together to make that [continuous development] happen.
A lot of the early buzz came out of DataKitchen — [CEO] Chris Bergh over there is a real pioneer for DataOps, and there are several other players in the space. They started working around this concept a few years back.
How can DataOps help organizations — what can its adoption enable?
Hilleary: The problem is that it takes too long to get data analytics. Right now, we hear from clients and we hear from vendors that when they start a project — doing new analysis and building out new pipelines for a department — they’re talking about it taking months between the time a business user asks a question and when they can get the answer to that question. That’s really not sustainable. You need to get answers faster and you need to have trust in those answers. DataOps addresses both parts of that. It helps accelerate those timelines, and it also focuses on ensuring quality so that six months down the line the numbers are trustworthy and there’s accountability and the data quality is good.
One of the key tenets of DataOps is consistent testing. Throughout the pipeline and throughout development, tests are layered into both production and development environments. Those tests help show where things break and how they can be fixed, and that in turn garners trust and buy-in from executives and data consumers at the end of the line. They understand why this data is correct.
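As a rough illustration of what layering tests into a pipeline can look like, here is a minimal Python sketch. It is not taken from the interview or from any vendor's product: the stage names, the `amount` field and the helper functions are all hypothetical, and a real pipeline would typically rely on an orchestration or testing tool rather than hand-rolled checks. The idea it shows is simply that each stage carries its own tests, so a failure names the stage where the data broke.

```python
from typing import Callable

Rows = list[dict]
# A stage is (name, transform, tests run on the transform's output).
# This structure is an assumption made for the sketch.
Stage = tuple[str, Callable[[Rows], Rows], list[Callable[[Rows], None]]]


class DataTestError(Exception):
    """Raised when a data test fails; the message names the failing stage."""


def drop_incomplete(rows: Rows) -> Rows:
    # Hypothetical transform: discard rows that have no amount.
    return [r for r in rows if r.get("amount") is not None]


def to_cents(rows: Rows) -> Rows:
    # Hypothetical transform: add an integer amount_cents column.
    return [{**r, "amount_cents": round(r["amount"] * 100)} for r in rows]


def check_no_missing_amount(rows: Rows) -> None:
    missing = [r for r in rows if r.get("amount") is None]
    if missing:
        raise ValueError(f"{len(missing)} row(s) have no amount")


def check_cents_non_negative(rows: Rows) -> None:
    negative = [r for r in rows if r["amount_cents"] < 0]
    if negative:
        raise ValueError(f"{len(negative)} row(s) have a negative amount")


def run_pipeline(rows: Rows, stages: list[Stage]) -> Rows:
    # Run each transform, then immediately run that stage's tests,
    # so a failure points at the stage that produced the bad data.
    for name, transform, tests in stages:
        rows = transform(rows)
        for test in tests:
            try:
                test(rows)
            except ValueError as exc:
                raise DataTestError(
                    f"stage '{name}' failed {test.__name__}: {exc}"
                ) from exc
    return rows


STAGES: list[Stage] = [
    ("drop_incomplete", drop_incomplete, [check_no_missing_amount]),
    ("to_cents", to_cents, [check_cents_non_negative]),
]

raw = [
    {"order_id": 1, "amount": 12.5},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 3.0},
]
result = run_pipeline(raw, STAGES)
```

Because the same `STAGES` list can be run against development data and production data alike, the checks travel with the pipeline instead of living in one person's head, which is the kind of accountability the answer above describes.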
What is the job of the DataOps engineer?
Hilleary: To some extent, they’ve been around for years, but we haven’t called them DataOps engineers. Some of their responsibilities were spread out before now, but essentially the DataOps engineer is responsible for the environment in which data development takes place. They’re building the tools that data engineers and data analysts are using within that development workflow. I [recently looked] at four leading vendors that have produced software platforms that support different aspects of DataOps through communications tools, pipeline builders and automators, and that kind of thing. One of the key personas we were starting to see emerge who was using these platforms is the DataOps engineer, someone who is full-time dedicated to thinking about how things are going to be developed. They’re not working with the data directly, but they’re highly technical individuals who are building the infrastructure for the data development.
Which are the four leading players in DataOps platforms you were looking at?
Hilleary: DataKitchen was one. DataOps.Live is a group out of the U.K. Unravel is doing a lot of work around DataOps functionality — particularly on the monitoring side, which is one of those key aspects of DataOps. And then Zaloni as well is sort of an all-in-one platform for building data pipelines.
When an organization hires a DataOps engineer or team of DataOps engineers, how does it change what that organization can do with its data?
Hilleary: What you’re doing when you’re implementing a DataOps methodology is you’re freeing up engineers to do more analytics. They’re spending less time going back and fixing things, less time trying to keep all the balls up in the air communicating back and forth over email and trying to send files and spreadsheets back and forth. By building a concrete infrastructure from the ground up and implementing your development strategy on top of that, you’re able to give your engineers and analysts the time to actually work on analytics. You’re getting more done with the same number of people because you’re spending that time in advance to do it right.
You mentioned that DataOps engineers have been around to some extent but without the title, so are those people actually doing everything a DataOps engineer would do or just taking on some of the responsibilities?
Hilleary: I think we see that on a lot of data teams there’s already someone who’s trying to do some of these things, who’s concerned about people writing very similar code and emailing things back and forth, tweaking a couple of lines and then sending it off. They’re starting to think of other ways to do some things like developing a Git repository, which is a solution we see a lot of people reaching for, or simply keeping track of who has what assets. There’s someone playing that role already, but they don’t have that official capacity to allow them to take it to the next step.
When did the role of DataOps engineer actually develop?
Hilleary: Over the last year, maybe year and a half. Now we’re starting to see companies like AstraZeneca, some of the big pharmaceutical companies and big tech companies, starting to hire DataOps engineers. If you go on LinkedIn these days you can find listings, and there’s more and more talking about DataOps in job listings even if they’re still calling the roles things like software developers for data teams. We’re at the stage of familiarity with DataOps now — which itself is only about four or five years old — that there’s enough familiarity with the concepts, with the tenets of that methodology that companies are starting to hire dedicated people to implement those strategies. But we’re very much still on the front end.
If we’re still at the front end, are only organizations on the cutting edge of analytics hiring DataOps engineers or is the DataOps role becoming more widespread?
Hilleary: I would say we’re definitely more in that prior camp. You’ve got massive companies with huge quantities of data that are thinking about these data problems all the time that are starting to hire DataOps engineers. Some smaller outfits where data is critical to their business — think about gaming outfits or betting industries where they’re dealing with massive volumes of data but maybe there are only six people in the shop, or maybe really data-intensive startups — are thinking about these things when they’re dealing with huge quantities of data, when data is the service they provide. Other than that, it’s mostly big companies that generate massive quantities of data and see that as a competitive edge.
Who becomes a DataOps engineer? Colleges and universities aren’t yet providing courses in DataOps engineering, so how does someone acquire the necessary skills?
Hilleary: Typically, they’re software engineers. The vast majority of the ones that I’ve come across come from a software engineering background. A lot of them come directly from a DevOps background; they were DevOps engineers who became familiar with agile methodologies and agile development within a software context. They get hired by companies that put them on data teams in a support role, building software for the data team, and it quickly becomes apparent to them that data development is lagging four or five years behind software development in terms of thinking about development methodologies. They’ll stumble into DataOps and sort of claim that title for their own. Other times they’ve been hired specifically because someone higher up in their organization has realized the need.
But the typical background is a software engineer who is versatile. These are technical positions the vast majority of the time, though that is becoming less so because some of these platforms I mentioned before lower the bar for who can fill this role for the team. The other background we see is data engineers, people on these teams who are already doing some of the things a DataOps engineer does, who see first-hand the frustrations and pain points in data development, and who are taking the next steps. They’re moving away from the data itself and thinking about how the data is being developed.
What skills make someone qualified for the role of DataOps engineer?
Hilleary: I think there are three core components for what makes a great DataOps engineer. First, they need the technical skills, being able to develop things. The second thing is knowledge of the data itself; domain knowledge is still really important for solving a lot of these problems, so that means understanding the eccentricities of the particular kinds of data they’re working with and what the needs are within that organization. It’s not a cookie-cutter, one-size-fits-all solution when you’re trying to implement these methodologies, so having those two components is important.
The third component is the social aspect. Technology is a big part of DataOps, but it’s equally people and processes, so the ability to convince analysts and engineers that a DataOps methodology is going to be a benefit is a hurdle for a lot of organizations that are trying to implement DataOps. They’ve done something one way for so long. There used to be the cowboy coder, a lone wolf producing code for a data pipeline, and there’s sort of a mystique to it. DataOps is a transition away from artisanal data development, where you build something one time and it’s labor intensive, toward industrializing it so it’s repeatable and works like an assembly line. In some respects, coders see DataOps as less glorious, so that can be a hard hurdle to get over. There’s a real personal skill set that’s needed. But then what we’ve found is that once a DataOps methodology does get implemented, analysts and engineers realize how much time it saves them and how much easier it makes their jobs. Still, it’s that convincing part at the front end that really takes a lot of social skills.
Looking forward, how will the role of DataOps engineer evolve?
Hilleary: I think where we’re going to see this going is that it’s going to get less technical. Right now, there are some things you can get out of the box, but the data platforms that are coming out now are all pretty recent (vendors have really started developing them only within the last couple of years), so those products are still very much in development and being improved. So, what we’ll see over time is the bar drop on how much a DataOps engineer is doing on their own versus how much they’re doing through a tool. You’ll get people who aren’t as technical who can use a drag-and-drop interface to orchestrate all the tools in a pipeline. It’s starting to be a more visual user interface rather than requiring a lot of knowledge of coding. That’s down the line, though. We’re not there yet.
How important is DataOps to organizations now and how important will it be in the future?
Hilleary: The only thing I would throw out there about DataOps engineers is I think it’s important to realize why dedicating an entire technical person to this endeavor is worthwhile. It doesn’t have to even be their full-time job. In smaller groups, it can be a part-time position, but it’s important to really lay it out on paper as someone’s responsibility. Someone needs to own the DataOps transformation for it to be successful. A lot of times what we’re seeing in the early stages is that it’s spread across five, six or seven people who do different bits of it, and that can work, but it will only get you so far. At some point, there needs to be more directionality, and that comes with having a dedicated person or group of people — depending on the size of the project — who are thinking about this as a critical element of their job.