Choosing Databases: The Wise Way
Choose how to store data wisely - Every dev ever.
We have come so far in the field of managing our data physically. From storing them as files, tables and as documents, and now it's even possible to store it as graphs. Not only the way data is stored has evolved, but the techniques have also become pocket friendly.
So, if the cost is not the issue then what
It's how your applications use that data.
Some applications need faster retrieval. Some need ACID assurance. Some need the latest write in the latest read. All these things are primary and should be considered when you're choosing your data storage medium.
If you're looking for help, follow this guide, mark your points and you will eventually get the right DB for you.
People are generally aware of two types of databases:
...but in this guide, we will draw a comparison between three types:
- Graph( Also a type of NoSQL but needs some special attention, keeping its unique characteristics in mind.)
Let's see how each DB stores an entity(real-world object) and its attributes(properties of that object):
SQL stores Entities as Tables and attributes as columns in that table.
No SQL stores Entities as either Document, Key-Value store, Table, etc. and attributes as Keys in the document, Keys in a key-value store and Columns in the table.
In Graphs, Entities are stored as Nodes and attributes are assigned as attributes to these nodes.
Fetch queries are the ones that decide the fate of your application. If the retrieval of data is fast, your application is fast and if retrieval is slow, you know you have to optimize the query or regret choosing that database.
SQL itself is quite fast but things get complicated when the data you have to retrieve is split across multiple tables and, in worst-case scenarios, across multiple databases. You are forced to use joins which, if happens on more than 2 tables, reduces your response time to a great extent. That's why, gone are the days of normalizing your data model. We are fine with redundancy if that improves the performance of queries.
NoSQL didn't change the way of storing data just for the sake of its name. It did that in order to avoid heavy use of joins. NoSQL emphasizes storing the related data in the same object, instead of referencing it using a foreign key, though we will have data redundancy, which in turn doesn't give a dime about the cost of storage (not that it matters nowadays ~ cheap storage).
"I am speed" is what a Graph based database says, but we have to keep in mind the limitation of its use cases. Graph database fails to deliver in applications where we fetch range based data. But for any social platform, analytics website, security-management applications and the sorts, Graph database is the ideal for faster retrieval of results.
When dealing with transactions (operations on multiple entities at the same time), ACID compliance is a must for databases.
ACID stands for
Atomicity - Either all operations in the transaction take place or none at all.
Consistency - Database to remain consistent before and after the transaction.
Isolation - Operations between two or more transactions are independent of each other.
Durability - Doesn't lose the effect of successful transaction even in case of system failure
The reason why SQL is most widely used is not only because of the initial investment in the development of SQL, but also because it is ACID compliant.
NoSQL made an entry in the industry with the tradeoff between faster retrieval and ACID compliance. NoSQL, originally, is non-ACID-compliant but there are some databases like MongoDB, which have tried to make it ACID-compliant in their newer releases.
Graphs are already a kind of NoSQL, so you can't expect them to be ACID compliant either. But there are some Graph databases too which are making effort towards it. One example would be the Neo4j. They already claim their Graph database is ACID compliant.
Does this database scale well? It is already the first thing that pops up in your mind when choosing the stack for your application. It's obvious that we have to be foresighted about future databases. Otherwise, migrating data to another database at a later stage will give you a tough time.
There are 2 types of scaling
Vertical Scaling - Increase the capacity like memory, storage, and bandwidth of the network on that machine. Horizontal Scaling - Use another instance of a database on another server. (Distributed database)
SQL is vertically scalable. As you can see clearly, we can't go on increasing the limit of the existing storage infinitely. Each machine has some limitations and that is the scaling limit of our SQL databases.
NoSQL is a database that scales well using the latter approach. This scaling comes at a compromise of consistency according to CAP (only two things can be achieved amongst Consistency, Availability and Partial Tolerance in a distributed data environment) theorem.
Since it is a kind of NoSQL database, we expect the graph to be horizontally scalable as well. Neo4j and ArangoDB are examples of such a graph database.
Go for SQL if:
- Your queries don't get complicated with multiple joins.
- You need robust ACID support.
Horizontal scaling is fine with your application.
NoSQL can be considered at times where:
- You know the queries will get complicated with multiple joins in SQL and,
Your application doesn't have a need for ACID transactions (though some databases are starting to support multiple ACID transactions nowadays).
Graphs are ideal:
- In an application where an object has a hierarchy of nested related objects. The graph database would be faster than any other database and the best option.
- To analyze data. So, you can expect the graph database to be extremely helpful in the case of data analysis application and predicting future trends.
With the advancement in each type of database. It is possible that the shortcomings they have now won't exist in the future or they might have an alternate solution. If you can avail all the functionalities that your application needs in one database, you must just check the stability of the version and if it's fine, you are good to go.
If you are able to choose your database using this article, go ahead and leave some likes and comments.
Also, focus on data modeling now. 😄
Thank you for reading.