Intro to Document-Oriented NoSQL Databases

This is the first post in the series about comparing MongoDB with Couchbase, which are currently the two most popular document stores. Before we start comparing them, we have to say something about document-oriented databases in general and how they fit into the NoSQL ecosystem, just to put things into perspective. This introductory post is about that.

I had been working with MongoDB for a year and a half before I recently started using Couchbase. MongoDB is the first database that I actually learned how to use properly. I learned how indexes work. I learned how to optimize queries and what database scalability means. So I’m kind of biased here; take my comparison with a grain of salt. Especially if I say that MongoDB is the best database ever.

But enough about me. What about MongoDB and Couchbase? Usually, when we’re talking about NoSQL databases (the irony is that Couchbase uses a dialect of SQL as its query language), we start with the famous CAP theorem, the one that states: “Consistency, Availability, Partition tolerance – pick two”. But in this case we’re not going to do that. This “picking”, some would say, is usually just picking between consistency and availability. And not even that: consistency and availability are not discrete states. You can tune MongoDB, which is usually grouped with the databases that favor consistency over availability, to be more available and less consistent. The same goes for Couchbase, which, like MongoDB, is traditionally placed at the consistency end of the spectrum. On the other hand, you can also tune Cassandra, which sits in the “availability” corner of the CAP triangle, to be more consistent. So, it’s a range, as represented by this kindergarten-level drawing:

[Drawing: the spectrum from consistency to availability]
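The drawing aside, the "range" idea can also be sketched in a few lines of code. In quorum-replicated stores such as Cassandra, a common rule of thumb is that a read is guaranteed to see the latest acknowledged write when the number of replicas that acknowledge a write plus the number consulted on a read exceeds the total number of replicas. The function name, the setting labels, and the 3-replica cluster below are assumptions made up for illustration; this is not any database's API.

```python
def quorums_overlap(replicas: int, write_acks: int, read_acks: int) -> bool:
    """True when every read quorum must share a replica with every write quorum."""
    if not (1 <= write_acks <= replicas and 1 <= read_acks <= replicas):
        raise ValueError("acks must be between 1 and the replica count")
    return write_acks + read_acks > replicas


# Hypothetical settings for a cluster that keeps 3 copies of each record:
# (replicas that must acknowledge a write, replicas consulted on a read).
SETTINGS = {
    "write to one, read from one": (1, 1),
    "quorum writes and reads": (2, 2),
    "write to all, read from one": (3, 1),
}

for name, (w, r) in SETTINGS.items():
    verdict = "reads see the latest write" if quorums_overlap(3, w, r) else "reads may be stale"
    print(f"{name}: {verdict}")
```

The first setting answers fastest and survives the most node failures, but a read may hit a replica that has not received the write yet. The other two give up some availability in exchange for consistency, which is exactly the dial described above.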


Popularity Contest

Popularity contests, though usually stupid (or are they?), are important when we’re deciding which tools to use. I might like programming in Lisp, but will I be able to easily find ninjas who are willing to throw parentheses at the problems? Not likely.

At the time of writing this post (August 2016), MongoDB was the most popular NoSQL database, and 4th overall. The second most popular NoSQL database was Cassandra (7th overall). Couchbase was the second most popular document store (with only MongoDB being more popular).

Job ads on confirm this research. On the 30th of August, MongoDB was in the lead with 16 active job postings; there were 6 active job ads for Cassandra, 6 for Redis, and 3 for Couchbase.

Why do people use document stores?

Why are document stores gaining traction?

In my opinion, the reason is twofold:

1. Flexible schema

Document stores do not enforce a fixed schema. Documents can be unstructured or semi-structured, and a flexible schema allows that. It’s easy to change the underlying object, and (usually) there is no need to write migration scripts. This speeds up development, especially in the prototyping stage, when you don’t even know what your product is, let alone what the access patterns are. This is a document store’s main advantage over, let’s say, Cassandra, which requires you to have clearly defined access patterns in order to use it effectively.
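Here is a minimal sketch of what "no fixed schema" looks like in practice. The collection is modeled as a plain Python list of JSON-like documents; the field names and the `emails` helper are made up for illustration and are not MongoDB or Couchbase API calls.

```python
import json

# A hypothetical "users" collection. Each document has its own shape:
# no ALTER TABLE, no migration script when a new field shows up.
users = [
    {"_id": 1, "name": "Ada"},
    {"_id": 2, "name": "Linus", "email": "linus@example.com"},
    {"_id": 3, "name": "Grace", "address": {"city": "Arlington"}, "tags": ["navy"]},
]


def emails(collection):
    """Collect the emails that exist; the reading code must tolerate missing fields."""
    return [doc["email"] for doc in collection if "email" in doc]


print(json.dumps(users[2], indent=2))
print(emails(users))
```

The flip side is visible in `emails`: since the database does not enforce the shape, the application has to handle documents that lack a field.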

2. Marketed for horizontal scalability

The idea is that, to increase performance under heavy load, you can simply add more servers (nodes) at the click of a button. The data will be evenly distributed among the nodes, and the application can just continue to work as if there were only one node. The database server will handle (most of) the other stuff related to distributing the data among the nodes and collecting it back. “But, what if I told you”, one might say, “that this can be achieved with relational databases as well?” Yes, of course, it could be done. Git can be used as a database as well, but that’s not what it was designed for. Let’s use the right tool for the job. Otherwise, we can agree to just use PHP for everything.
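To give a rough feel for how data can end up evenly spread across nodes, here is a deliberately simplified sketch that hashes each document key and maps it to a node. The node names and the `node_for` function are hypothetical, and this is not how MongoDB or Couchbase actually place data; it only illustrates the general idea of hash-based distribution.

```python
import hashlib
from collections import Counter

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def node_for(key: str, nodes=NODES) -> str:
    """Pick a node for a key by hashing the key and taking it modulo the node count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


# Distribute 9000 made-up document keys and count how many land on each node.
keys = [f"user::{i}" for i in range(9000)]
spread = Counter(node_for(k) for k in keys)
print(spread)
```

The application only ever asks for a key; which node holds it is decided behind the scenes. A plain modulo like this would reshuffle most keys whenever a node is added, which is one reason real databases put more machinery behind that button.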

Scalability is achieved by ditching the concept of ACID transactions, at least the ones that span multiple documents. No transactions means that:

  1. you can easily scale horizontally, because the database does not have to worry about locking tables on different machines
  2. you have to model your data according to how it will usually be read: embedding is preferred over normalization in MongoDB and Couchbase. Embed whenever it makes sense.
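The second point is easier to see with a small example. The sketch below models the same blog post twice, once normalized and once embedded, using plain Python dictionaries. All names and data are invented for illustration, and no real database driver is involved.

```python
# Normalized: two "collections", stitched together by the application at read time.
authors = {"a1": {"name": "Jane Doe"}}
posts = [{"_id": "p1", "title": "Intro to document stores", "author_id": "a1"}]


def read_post_normalized(post_id):
    post = next(p for p in posts if p["_id"] == post_id)
    author = authors[post["author_id"]]  # second lookup, possibly on another node
    return {"title": post["title"], "author": author["name"]}


# Embedded: one document shaped like the page that displays it.
posts_embedded = {
    "p1": {
        "title": "Intro to document stores",
        "author": {"name": "Jane Doe"},
        "comments": [{"by": "reader42", "text": "Nice post"}],
    }
}


def read_post_embedded(post_id):
    doc = posts_embedded[post_id]  # a single lookup returns everything
    return {"title": doc["title"], "author": doc["author"]["name"]}


print(read_post_normalized("p1"))
print(read_post_embedded("p1"))
```

Both reads return the same result, but the embedded version touches one document. The price is duplication: if the author's name is copied into every post, renaming the author means updating every one of those posts.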

JSON is the new Table

To conclude this post, I’d like to make a small addition to Atwood’s law (which says that any application that can be written in JavaScript, will eventually be written in JavaScript). The addition is: any data that can be stored as JSON will eventually be stored as JSON.

Mark my words! Or don’t, in case it doesn’t happen.

That’s all for now. Stay tuned. In the next post I will actually start comparing MongoDB against Couchbase.