A Good Data Analytics Program Relies on a Good Data Ops Process

The need for a strong Data Ops process is often undervalued, and often misunderstood, when it comes to data analytics projects. Simply put, Data Ops is DevOps (the set of practices that combines software development and IT operations) applied to data. It is the process of operationalizing data, and it addresses a core idea: every time you deploy or make a change, you need to be mindful of the data that is already in place and of the potential impact of the changes being promoted.

When proper attention isn't paid to the underlying Data Ops process, a host of issues can arise, some of them with serious consequences:

You push a change that breaks something in production

This is every data team's worst nightmare. Even worse is not having a process that tells you 1) what change was introduced and 2) how to remove it. Without a line of sight into which changes are being deployed, you have no way to quickly address the newly introduced issue. It starts as a development problem, but it quickly becomes a business problem because you can start losing your business audience. If your customers don't trust your system and its underlying processes, especially once they begin seeing corrupt data in real time, the credibility of your entire data program is called into question. And it is being questioned over something a clear, tested and documented process could have prevented.

Slow delivery of fixes and enhancements

If you don't have a solid process in place and you're seeing inaccurate data, fixing issues and delivering enhancements will take a very long time. The result is that you'll be looking at bad (or incomplete) data for longer. Treat the deployment process itself as part of your overall data program. One way to test it is to deploy a zero-code change, a release with no functional changes, purely to exercise the deployment process. Is the process working as it should, or is the process itself introducing the wrong things into production?

You’ve removed the ability to do a hot-fix

Issues are inevitable, and dev teams need to be able to jump in quickly and apply a hot-fix to address the immediate problem. Without a Data Ops process, though, a hot-fix made directly in production may never make it back into the code being deployed. As a result, you risk reintroducing the same bug on your next deployment.

Human error and cost

No matter how careful people are, mistakes get made. A Data Ops process is built to remove as much human error as possible from your data analytics program. The less human error, the more accurate your data and the more reliable your program. People are also expensive, and processes can help reduce that cost: the more people involved in a deployment, the more expensive it is. Remove the manual steps from your data analytics program, and you'll have a better, cheaper and faster program.

If you’re unsure about the current state of your Data Ops process, ask your team these three questions. The answers will tell you all that you need to know.

- What is our current process for getting data changes into production? Is it consistent and well documented?
- Are there isolated development and test environments where work is being done?
- Do people have admin access to production to make changes? Is there a process in place to prevent people from pushing their own changes into production (i.e., what governance sits between development and deployment)?

If your Data Ops process is not well understood, it may be introducing inconsistencies into your data. Those inconsistencies lead your customers to doubt the quality of their information and to question whether they can trust what they see as a source of truth. Build a better process, and you'll move faster and remain trustworthy in your customers' eyes. You'll also know you've built a single version of the truth that can be relied on to make critical business decisions.

Dave Taddei

SVP of Global Data Analytics Strategy