This is going to be a bit of a rant, prompted by a recent conversation in which someone was considering MongoDB.

I was just reading MongoDB Set to Become the ‘New Default’ Database

Just… wow. Quite a bold statement there. To save people giving details on the form (another personal bugbear of mine… so I filled it with junk) - here’s the link to the relevant piece.

HIGH PERFORMANCE BENCHMARKING: MongoDB and NoSQL Systems

First things first, let’s pick apart the minor error in the press release that eWeek clearly didn’t check.

All tests were performed with 400M records distributed across three servers, which represents a data set larger than RAM.

Ok…

Our setup consisted of one database server and one client server to ensure the YCSB client was not competing with the database for resources. Both servers were identical.

And…

Load 20M records using the “load” phase of YCSB

So that’d be mistake one… it wasn’t three servers at all. That is a gross error, because the read statistics for Cassandra would be way off as a result. In fact, they say as much in the Conclusions.

We focused on single server performance in these tests. Multi-server deployments address high availability and scale out for all three databases. We believe that this introduces a different set of considerations, and that the trade offs may be quite different.

My point is that it looks like the creators of MongoDB have commissioned and paid for this report. If they haven’t, then really the press release and the news around it is tripe, and if they have… where’s the notification of bias?

It’s worth adding that the three databases tested are completely different! Cassandra, MongoDB and CouchBase each have very different use cases, so it’s not really fair to pit them against each other. Pitting MongoDB against CouchDB would be fairer. CouchBase is really CouchDB but prettier, with a very clever caching front end on it.

I have deployed a large Cassandra setup and a very large CouchDB setup. I wouldn’t use either one for the other’s workload.

Rant over…