NewSQL: overcoming limitations of relational and NoSQL databases

Technologies behind databases have never been so variegated as today. After the rise of relational SQL-based databases, new needs have emerged, with the consequences of fostering a plethora of new database models, e.g. the NoSQL ones. Today, new models are being proposed by researchers, with the emerging NewSQL databases. In this article, we will recap the talk delivered by Miguel Ángel Fajardo, VP of Technology at Geoblink, during Codemotion Rome 2019. Fajardo provided an interesting and detailed overview of the evolution of database technologies from the ’60s to today, finally introducing the NewSQL databases.

The beginning of database technologies

The main problem with this idea was in the limitation of this structure: trees do not allow to have cycles, and consequently it was not possible (or at least it is pretty difficult) to have more complex connections among the different data stored inside such a database.

To overcome such limitations, in 1970 Ted Codd published a paper to present his original idea of a relational model of data, which laid the foundations of many of the modern databases.

In the relational model, data is divided into tables, which in turn can be connected by means of relations (which gave the name to the whole model).

On top of this model, between the ’80s and ’90s a lot of technologies were developed around a relational database. In particular, we witnessed an impressive number of relational database management systems (RDBMS), including the well-known MySQL, Oracle, PostgreSQL, SB2, SQL Server and many others. All these technologies implemented the original relational database model, improving it by means of several peculiarities:

  • relational databases allow for easily modeling data
  • a high number of clients have been developed for interacting with such databases, in turn contributing to the wide adoption of such a model — which also means a big community of users
  • RDBMS implement ACID transactions. This means that every transaction is: atomic (transactions are either success or failure), consistent (only valid data is saved), isolated (concurrent transactions do not affect each other), durable (committed data is persisted on disk)

Web 2.0 and the emerging of NoSQL

In this context, the problem of scaling ACID transactions came up as one of the main difficulties to overcome. In particular, the need for horizontal escalation, with the consequent distribution of data in different physical machines, do not allow to reliably and effectively implement such transactions. Moreover, there is also the need to perform analytics over petabytes of data, and provide high availability across different regions.

To solve all these peculiar needs of big data, two different classes of approaches have been put in place: distributed processing (represented by technologies such as Hadoop, Flink, Apache Spark and many others) and NoSQL databases. The latter is a set of quite variegated classes of database, with the common feature of being non-relational — indeed NoSQL has been often reported as an acronym for Non-SQL, or Not Only SQL.

NoSQL databases can be divided in the following classes:

  • key-values stores: in such databases, data is represented as part of a big “dictionary”, stored in the form of key-value entries. These databases are commonly used for user session data, component configuration, or to provide fast access by caching data. However, queries can be very complex to build in some cases, due to the specific structure of the DB. Examples of key-values stores are Redis, Amazon DynamoDB and Riak
  • column-oriented DB: this class of databases is particularly suited for real time analytics, since it allows us to easily (and effectively) aggregate a lot of of data. However, column-oriented DBs do not perform well in cases of queries against a few rows. Moreover, they do not allow for flexible data schemas. Cassandra, Hypertable and Apache Hbase are probably the most representative instances of column-oriented databases
  • document-oriented DB: these databases store data in the form of documents, which are often parts of sets known as collections. Documents allow for storing unstructured data and to represent models with many layers. Drawbacks in this case are the difficulties in making joins of such heterogeneous data. For similar reasons, queries can generally be not very flexible. The most representative database of this class is of course MongoDB, although there are other examples, such as Couchbase or RavenDB
{
_id: <ObjectId1>,
username: "123xyz",
contact: {
phone: "123-456-7890",
email: "xyz@example.com"
},
access: {
level: 5,
group: "dev"
}
}
  • graph DB: these databases are particularly useful when the original connections among data can be mapped into a graph structure. This is the case for social networks data, or for those cases where algorithms based on routing or fraud/disease spreading detection need to be run. The main drawbacks of such approach are the difficulties in aggregating data, as well as in performing analytics. Examples of graph databases are Neo4j, ArangoDB and OrientDB

In general, the advantages of NoSQL model are easy scalability, high availability across different DCs, support for different data types and analytics. However, NoSQL also has drawbacks: they are generally more complex and expensive to maintain, require specific set up for the given set of queries, often rely on specific query languages and are fundamentally not transactional.

NewSQL

  • consistency: every read receives the most recent write or an error
  • availability: every request receives a (non-error) response — without the guarantee that it contains the most recent write
  • partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

To overcome this limitation, in 2012 a research team from Google proposed a new class of relational databases which is currently known as NewSQL. These databases seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system. NewSQL databases are also horizontally scalable and allow for high availability.

The mechanisms that allow us to have all these features implemented in a single database are quite complicated and each implementation is different from the others. Basically, such systems employ a distributed cluster of shared-nothing nodes, in which each node manages a subset of the data. This requires the implementation of components that are responsible for concurrency control, flow control and distributed query processing. Of course, the distributed nature of NewSQL database requires the adoption of proper consensus protocols, as well as the timing synchronisation (e.g. for keeping transaction order).

There are many implementation of such databases class. Among them, it is worth mentioning Google Spanner, Amazon Aurora, Cockroach, Azure Cosmos DB, MemSQL and VoltDB.

--

--

We help tech communities to grow worldwide, providing top-notch tools and unparalleled networking opportunities.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Codemotion

We help tech communities to grow worldwide, providing top-notch tools and unparalleled networking opportunities.