Over the weekend, I read "Designing Data-Intensive Applications" by Martin Kleppman. I began with "Data Models and Query Languages" as this is one of my favorite areas. Data modelling is a fundamental component of software development as well.
Martin explains how data models evolve with time, giving rise to new sets of technologies. The relational database (Think: Tables, Rows, Columns) has been in the industry for over 30 years and remains relevant. Although the relationship database is quite mature, but it has its own set of issues.
With almost all the languages are object-oriented (OO). However, the underlying relational data store is not based on OO. This creates a gap for having a smooth conversion between objects to relational structures. This is called “Impedance Mismatch”. We solve this problem by adding a new layer of object-relational mapping(ORM) frameworks like Activerecord and Datamappers.
The decision to use SQL or NoSQL is mainly driven by the relationship between data entities and how to access it. To go a bit more detail, we need a short primer on the relationship between the entities.
1). One-to-One Relationship (A: Human, B: Brain = A human has a brain)
2). One-to-Many Relationships (A: Book, B: Pages = A book has multiple pages)
3). Many-to-Many Relationship (A: Author, B: Books = Author can write multiple books and book can be written by multiple authors)
The relational database is very good at solving the problem of multiple-to-multiple relationship using “joins” intensively. If your data has a one-to-many relationship, then Document Database (NoSQL) is more appropriate.
With the relational structure, the dataset is normalized (duplicates are removed) to several tables and referenced by foreign keys. In the NoSQL world, each document is a full-fledged entity. It creates a different approach for accessing data based on where the data is stored.
Conventional SQL databases have a query optimizer (e.g. Oracle’s Cost Based Optimizer - CBO). All you need to do is tell the database (via SQL) what data you need, and it will figure out the best possible way to get the data using the CBO. This is not the case with the no SQL world, the ‘access path’ needs to be defined in the application code. In short, ‘How to fetch’ logic is taken care by the Database in relational world and in case of NoSQL the application code requires to worry about.
Schema flexibility is another major differentiating factors between SQL and NoSQL. The relational world imposes a clear schematic structure during the design phase itself and any change in the structure is generally considered expensive. NoSQL offers Schema-on-Read. This makes it a lot easier to change the data format as data evolves.
Although there are so many differences between these two models, most relational databases offer JSON support to manage NoSQL use cases. On NoSQL, databases support “join” for Many-to-Many Relationship use cases.
Often data relationships are not binary, some are optimized for many-to-many and some for one-to-many type. In an ideal world, we need both performed by a system. Currently, we are solving this problem by a phenomenon called polyglot persistence using multiple database each optimized for a specific use case.
We are clearly seeing the convergence of feature between these two realms. Database world is seeing a shift from ‘specialization cycle’ to ‘aggregation cycle’.