To SQL (relational) or not to SQL (NoSQL) that is the question

March 19, 2012

This document is work in progress. I have been asked to review this issue and I am providing this info in a draft form.
This post focused on MongoDB; you will find these issues related with other NoSQL DBs, but the particulars may be different.

As always, the answer to the question “do we go relational DBMS or NoSql?” is: ‘it depends’. And it depends on a number of issues. I will try to enumerate and address those issues in the posting. I don’t assume that I have covered all issues; you should not either.

Before we begin, I am going to ask a question of the reader, “do you have a data description document?” Something in writing (written down, not just in your head) that describes your data requirements and how you will be using your data. I know that some people are thinking that I want you to have a E-R diagram fully developed, that is not the case. I just want the reader to have some sense what data they will be working with, how is that data related (or not), what they need to do with the data, etc. If you know those things then you will be able to determine what issues are relevant to you and how important are they.

All data is related. If your data is not related then you may/may not need a DB to store your data.  Generally, in all  non-trivial, large-scale, productions deployments of data stores, there is some piece of data (entity or attribute) that is related to another. Therefore, the idea that NoSQL databases are schema-less may not be the best way of thinking about your data.  I believe that you will want/need to develop some sort of schema – description of the data, how is organized, how is it related and how it will be used.
Note: I do not suggest creating a E-R diagram as it will drive you down the ‘relational’ path. However, you will want that description to define your collection and documents if you decide go down the ‘NoSQL’ path.

Relationships Between Entities
This is a very tricky question, because the answer depends on what data entities you have and how are they related. Again, having a schema makes addressing this issue easier. If you find that you data (read entities) have a large number of relationships, then NoSQL may not be the best solution.
I suggest that you create a very high-level E-R diagram and then take the entities and see if you can ‘easily’ refactor the schema into a MongoDB schema – how efficiently and effectively and can you embed the related entities into objects and arrays inside a BSON document. Also, this ‘embedding’ will be supported by client side linking … more on that later.

While Mongo does support some [built in] atomic operations (e.g. findAndModidy), it currently it does not guarantee atomic operation on a single documents. If you have a schema where a number of entities need to be atomically updated at the same time (read transactions) then Mongo is not right for you

Consistency – Single DB
RBRMS are strongly consistent by design; most allow table and/or row level locking.
MongoDB (current version) is strongly consisted because it has one global lock for read/write operations. There is some talk of have global, collection level locking. Bottom line, if your DB is going to be to a significant number of concurrent read/write operation then Mongo may not be the best solution for you.

WYOW Consistency (single server)
One proposed work around is to leverage MongoDB’s atomic update semantics with an optimistic concurrency solution. This comprises four basic steps: 1. read a document; 2. modify the document (i.e. present it to the user in a web form); 3. validate that the document hasn’t changed; 4. commit or abandon the user’s update.
Note: There are a number of posts regarding read-your-own-write consistency that that would be good to review is this is a large issues for you.

Consistency – Distributed DBMS
For ‘industrial strength’ DBMS this is a solved problem. For example, Oracle has RAC. If you really, really need it then it may be worth the money, but be very sure you need it as it is a very expensive solution.
MongoDB does not offer master-master replication or multi-version concurrency. In other words, writes always go to the same server in a replica set. By default, even reads from secondaries are disabled so the default behavior is that you communicate only with one server at a time.   This may need to be its own posting as this is a complicated issue.  More on this later.

Many RDBMS have standardized on SQL (ANSI) and are generally consistent. However, your stored procedures are not portable.
MongoDB has a relatively rich set of data access and manipulation commands. The find (select) command returns cursors. However, the language is particular to Mongo.

Both RDBMS and MongoDB support the declaration and use of indexes.

Scale-out is relatively easy. Scale reads by using replica sets. Scale writes by using sharding (auto balancing). There are issues with Sharding that need to be understood. More on that later.

MongoDB is ‘free’. However, there are a number of ‘free’ RDBMS. But, as always, you need to factor in the costs for development and production support – which are non-trivial.

This is a challenge for a new player like MongoDB. The administrative tools are pretty immature when compared with a product like MySQL.