Investigating NoSQL, from a SQL Perspective
Peter Södergren, Björn Englund
Abstract
This report investigates the current branches among the databases available
today. What do the new, non-relational databases have to offer? How do they
differ from one another? What are their pros and cons?
In the first part of the report we cover more specifically what these benefits
are, across a broader spectrum of databases. Among other things, we look at
characteristics such as sharding and replication. As examples, we also look
closely at five databases and compare them with each other: Dynamo (Amazon S3),
BigTable (Google App Engine), CouchDB, MongoDB and Neo4j.
In the second part we run a performance test on three of the databases, namely
CouchDB, MongoDB and Neo4j. For this test we use data that is best described as
the kind found in a social network, containing persons, blog posts, relations,
events and comments.
The purpose of this investigation is to look into who could benefit from putting
their SQL database in the attic and starting to use a non-relational database
instead.
Beyond explaining the different characteristics of non-relational databases, we
present a table in the results section where we compare the five databases with
each other. The results chapter also has a section covering the general pros
and cons of using SQL versus NoSQL databases, depending on what type of data
you store.
The performance test results show that MongoDB was the fastest, because it was the database best suited to our type of data and queries. Neo4j also performed well, especially with regard to execution speed as a function of data size. CouchDB produced the slowest execution times, since our data and queries suited the database poorly. This meant that CouchDB had to send large amounts of data to our program for external filtering, and thus performed worse.