The Art of Choosing the Right Database

“Seldom can one database fit the needs of multiple distinct use cases. The days of the one-size-fits-all monolithic database are behind us” – Werner Vogels, CTO & VP at Amazon.com.

When we ask customers how they came to choose their database engine over the alternatives, we surprisingly often get answers such as: “This is the technology we know how to use”, or “This seems to be trendy nowadays and we believe it to be the new best thing”.

While these are far from the best reasons for choosing a technology, the answers are understandable. Selecting the most appropriate database for your use cases is not an easy task nowadays. Not only are there several types of databases (Relational, Key-Value, Document, Graph, Wide Column, Time Series, Ledger, In-Memory, Search, and the list keeps growing), but each type may come in dozens of varieties, each with its own strengths, weaknesses, and elaborate list of features. As tempting as it may be to go with what you already know, choosing the right tool for the job should be the only consideration when picking one database technology over another. As with any technology, you should think about your application first, and then pick the database solution that best supports its requirements and data model.

By choosing the right database to address a particular problem or set of problems, you will break away from rigid, one-size-fits-all monolithic databases, and be free to design apps that suit your business needs. This strategy will help you get more value from your data, reduce organizational complexity, and allow your developers to remain focused on what they do best.

How to pick the right one?

The first step in designing a new database solution is to recognize the anticipated access patterns and to understand the nature of your data, such as its volume, growth, types, formats, and retention. The next step is to address performance, security, scalability, and availability needs.

Here are some examples of questions you will need to answer during this process:

What is your workload type? Is it operational or analytical? Is it read-heavy or write-heavy?
Will access patterns be fixed or will there be a need to support random and ad hoc queries?
What are the data storage size requirements? The data growth rate? The average object size? And max object size?

What are the latency and throughput requirements?
What are the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) policies in case of a disaster?
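
Some of these questions come down to simple arithmetic. The short Python sketch below projects storage needs from an object count, an average object size, and a monthly growth rate. The function name and every number in it are hypothetical and only illustrate the calculation; your own figures should come from measurements or business forecasts.

```python
def projected_storage_gb(object_count, avg_object_bytes, monthly_growth_rate, months):
    """Estimate storage in decimal GB after `months` of compound monthly growth.

    All inputs are illustrative assumptions, not measurements.
    """
    current_bytes = object_count * avg_object_bytes
    future_bytes = current_bytes * (1 + monthly_growth_rate) ** months
    return future_bytes / 1e9


# Hypothetical workload: 50 million objects of 2 KB each, growing 5% per month.
today = projected_storage_gb(50_000_000, 2_000, 0.05, months=0)
in_two_years = projected_storage_gb(50_000_000, 2_000, 0.05, months=24)
print(f"today: {today:.0f} GB, in 24 months: {in_two_years:.0f} GB")
```

Even a rough estimate like this can show whether a dataset that fits comfortably on a single node today is likely to still fit there in two years.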

As mentioned above, in order to choose the right database, the desired applications and their requirements have to be taken into account. However, here are a few rules of thumb that may help you to orient yourself:

For structured datasets that need a strict schema and complicated queries, Relational databases will be the most valuable.
When speed matters more than consistency (for example, when sub-millisecond latency is required), NoSQL databases are often a better choice than SQL databases.
When schema flexibility is needed due to frequently evolving data sources, it will be more effective to choose NoSQL databases.
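
The rules of thumb above can be written down as a small decision helper. This is only a sketch: the `Requirements` fields and the `suggest_family` function are hypothetical names invented for illustration, and the order of the checks is an assumption. A real decision would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # Hypothetical fields that mirror the rules of thumb above.
    strict_schema: bool
    complex_queries: bool
    latency_over_consistency: bool
    evolving_schema: bool


def suggest_family(req: Requirements) -> str:
    """Return a rough starting point, not a final decision."""
    # Assumption: latency and schema flexibility are checked first.
    if req.latency_over_consistency or req.evolving_schema:
        return "NoSQL"
    if req.strict_schema and req.complex_queries:
        return "Relational"
    return "No clear rule applies; look at the access patterns in more detail"


# Example: a reporting system with a fixed schema and heavy joins.
reporting = Requirements(strict_schema=True, complex_queries=True,
                         latency_over_consistency=False, evolving_schema=False)
print(suggest_family(reporting))
```

The fall-through case is deliberate: when none of the rules clearly applies, the honest answer is to go back to the questions listed earlier.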

However, there is no reason to restrict yourself to one “Chosen One”. It is entirely feasible to use multiple database technologies for various subsystems, to serve different functions.

A real-world case study

As part of my position as a data engineer at AllCloud, I worked on a project for a financial services corporation that sought to create a new fraud monitoring system. The goal of the project was to find a suitable storage solution that could run simple queries to reveal relationships between financial transactions and identify whether an individual has used the same identity-related information as in previous cases of fraud.

When I was first introduced to the project, the company was working with Relational databases, which posed a variety of challenges. To complete the project with the existing technology, several tables with multiple foreign keys would have been required. Traversing data of this kind in SQL would need nested queries and complicated joins that would quickly have become unmanageable. And with data size increasing over time, performance would have dropped sharply.
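
To give a feel for the problem, here is a minimal sketch using Python's built-in sqlite3 module. The schema, table names, and sample rows are all hypothetical and are not the client's actual data model. The query finds people who share an identity attribute with someone from a known fraud case, which is a single "hop" in the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE identifier (id INTEGER PRIMARY KEY, kind TEXT, value TEXT);
    CREATE TABLE person_identifier (
        person_id INTEGER REFERENCES person(id),
        identifier_id INTEGER REFERENCES identifier(id)
    );
    CREATE TABLE fraud_case (
        id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(id)
    );

    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO identifier VALUES (10, 'phone', '555-0100'),
                                  (11, 'email', 'c@example.com');
    INSERT INTO person_identifier VALUES (1, 10), (2, 10), (3, 11);
    INSERT INTO fraud_case VALUES (100, 1);
""")

# One hop: who shares an identifier with a person from a known fraud case?
rows = conn.execute("""
    SELECT DISTINCT p2.name, i.kind, i.value
    FROM fraud_case fc
    JOIN person_identifier pi1 ON pi1.person_id = fc.person_id
    JOIN identifier i          ON i.id = pi1.identifier_id
    JOIN person_identifier pi2 ON pi2.identifier_id = i.id
    JOIN person p2             ON p2.id = pi2.person_id
    WHERE p2.id <> fc.person_id
""").fetchall()
print(rows)
```

Even this single hop takes four joins. Following the chain one step further (people who share an identifier with those people) means repeating the join pattern again, which is how such queries become hard to write and maintain.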

Relying on the “think about your application first and find the most fitting database later” approach, it was decided to switch to a Graph database. Graph DBs are useful for connected, contextual, relationship-driven data. With a Graph database it is easy to create relationships between data and quickly query these relationships – and this was exactly what we were looking for.

Since Graph databases can represent how entities relate through actions, ownership, parentage, etc., they have clear advantages over Relational databases in cases such as this one. Making the switch saved our client much time, resources, and trouble.
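
The idea can be illustrated without any database at all. The sketch below is plain Python, not a graph database and not the system we built. It models people and identity attributes as nodes, "uses" relationships as edges, and walks the graph breadth-first. All names and data are hypothetical. A real Graph database would express the same traversal in its own query language.

```python
from collections import deque

# Hypothetical graph: people and identity attributes are nodes, "uses" links are edges.
edges = [
    ("person:alice", "phone:555-0100"),
    ("person:bob", "phone:555-0100"),
    ("person:bob", "email:b@example.com"),
    ("person:dave", "email:b@example.com"),
    ("person:carol", "email:c@example.com"),
]
known_fraud = {"person:alice"}

# Build an undirected adjacency map.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def linked_people(start, max_hops):
    """Breadth-first search: people reachable from `start` within `max_hops` edges."""
    seen = {start}
    queue = deque([(start, 0)])
    found = set()
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
                if neighbour.startswith("person:"):
                    found.add(neighbour)
    return found


suspects = set()
for person in known_fraud:
    suspects |= linked_people(person, max_hops=4)
print(sorted(suspects))
```

Going from direct links to links-of-links only requires changing `max_hops`; the traversal itself stays the same, which is the property that makes relationship-driven data a natural fit for a graph model.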

Focus on your business, not on technology

If you want to focus on what matters most, your clients, Cloud-based databases offer the most agile and cost-effective solutions.

Traditional databases require companies to provision all of the underlying infrastructure and resources necessary to manage their databases, which naturally results in future legacy constraints. Cloud-based databases, on the other hand, free you from exactly these constraints, while offering a plethora of optimization and security capabilities. This not only results in significantly reduced operating costs, but also allows for maintaining better data privacy standards and control.

Cloud databases are perfect for customers who wish to build their software without being troubled by infrastructure constraints. If you want help deciding which database best suits your needs, or want to learn more about how Cloud-based databases grant you flexibility while reducing costs, please feel free to contact us directly.

Idan Petel

Data Engineer