ArangoDB Use Case

TABLE OF CONTENTS

ArangoDB Use Case

Introduction

As a software development company, we very often work on complex applications that need to handle lots of data. Recently, on one of our projects, we’ve faced a challenge – We had a lot of data on many levels and we had to be able to operate directly on these documents.

Do you want to know how we resolved our problem?

Join us on our journey to find out.

But first, let me introduce our background.

Our technologies

At Brainhub we specialize in building apps with JavaScript and we do so using the following technologies:

On the backend:
NodeJS as the platform
Express or Koa as an HTTP library/framework
Various DBMS which fit a particular need, mostly MongoDB and Redis
RabbitMQ for queuing
Consul for microservices registration
GitLab CI or Circle CI as a continuous integration tool
Google App Engine, Bluemix or own servers for deployment

‍

On the web frontend:
React
Redux
Redux Saga
Scss
On the mobile frontend:
React Native
On the desktop frontend:
Electron

What kind of problem we have

We have a lot of data on many levels, which means, in a document model, many levels of nested documents. Moreover, we have to be able to operate directly on these nested documents (children, grandchildren, great-grandchildren etc.).

We have to create an API not only for our frontend but also for external integrations. The user should be able to send a JSON schema, which is later used for validation of provided data when creating or updating, and it’s also used to join documents from various collections.

An example of a simple JSON schema:

{
 "$schema": "http://json-schema.org/draft-04/schema#",
 "title": "profiles",
 "description": "Profile Schema",
 "type": "object",
 "properties": {
   "id": {
     "type": "string",
     "description": "ID"
   },
   "address": {
     "type": "string",
     "description": "Address"
   },
   "email": {
     "type": "string",
     "description": "E-mail address"
   },
   "firstname": {
     "type": "string",
     "description": "First name"
   },
   "lastname": {
     "type": "string",
     "description": "Last name"
   },
   "transactions": {
     "type": "array",
     "description": "List of transactions connected to the profile",
     "items": {
       "title": "transactions",
       "type": "object",
       "properties": {
         "id": {
           "type": "string",
           "description": "ID"
         },
         "orderTotal": {
           "type": "string",
           "description": "Total value of the transaction"
         },
         "invoices": {
           "type": "array",
           "description": "List of invoices related to the transaction",
           "items": {
             "title": "invoices",
             "type": "object",
             "properties": {
               "id": {
                 "type": "string",
                 "description": "ID"
               },
               "discountPercent": {
                 "type": "string",
                 "description": "Discount percent"
               },
               "itemNo": {
                 "type": "string",
                 "description": "Item ID"
               }
             }
           }
         }
       }
     }
   }
 }
}

So there are the following solutions:

Document database with foreign keys inside the documents:
Operating on data with many queries
Operating with a single query if a DBMS permits it
Relational database:
Manual serialization/deserialization
ORM
Graph database
Multi-model document-graph database

What type of DBMS we expect

We would like a DBMS which satisfies:

Open source
High performance
Good support for JavaScript/NodeJS
Good community
Supporting ACID (atomicity, consistency, isolability, durability)

Other useful features are:

Multi-model
Powerful query language

Potential DBMS to choose

We love open source solutions, so we eliminated DBMS such as Oracle, SQL Server, DB2 and for licensing issues MySQL.

We made a comparison of many No-SQL DBMS (not only for this project but also to have some overview for other projects, the data is as of February 18th, 2018):

We took some DBMS from the top of the rank above but eliminating:

LevelDB – is a great DBMS but designed for text storage, no document storage, which we need
PouchDB – though it is higher in this rank than CouchDB, we decided to consider CouchDB instead because PouchDB is generally designed for backend-frontend synchronization, which we don’t need in our app, but CouchDB is a DBMS used typically on the backend side
Memcached – allows caching data like Redis but after more detailed research, it looks like Redis is the undisputed winner over Memcached. Moreover, we’re looking for a document-based DBMS, so we don’t want to consider more key-value DBMS than Redis
Firebase Realtime Database – not an Open Source and generally seems not to be enough good to take the risk of having some bugs which we cannot fix
Neo4j – designed especially for Java but not NodeJS and is not document-based

Moreover, among the SQL DBMS, we decided to include only PostgreSQL in our research because it makes it possible to store JSON-like data, which we need.

Based on the table above and other research, DBMS that seem to suit our needs the most are:

MongoDB – the most popular document DBMS:
Advantages:
NodeJS developers have the greatest knowledge of this DBMS
Great clustering options
Disadvantages:
Missing joins – very important for our data; MongoDB Aggregation Framework is very limited and hard to debug
Missing transactions – we need ACID (though planned in MongoDB 4.0)
Missing expressive, dedicated query language – queries only in JSON
Issues can be reported only in Jira but not GitHub, which is less user-friendly for the Open Source community
CouchDB:
Advantages:
RESTful API
Disadvantages:
Missing transactions – we need ACID
RethinkDB:
Advantages:
ReQL
Distributed
Disadvantages:
Relatively slow performance
ArangoDB:
Advantages:
Great community – very helpful team on Slack
Multi-model
AQL
Transactions
Sharding and replication
Disadvantages:
Still not very popular, so it’s practically impossible to find developers experienced in ArangoDB
Relatively slow writes
Redis:
Advantages:
Super fast
Popular – among the non-relational databases – only MongoDB is more popular
The most stars on GitHub among all DBMS
Disadvantages:
Not designed for durable persistence (as default everything is kept inside RAM)
No query language
Very limited queries (only very basic operators like get or incr`)
PostgreSQL:
Advantages:
Very popular
Supports SQL
Supports many kinds of data like multi-dimensional arrays and user-defined types
Is proved to be very mature in production
Disadvantages:
Running JavaScript on PostgreSQL is not out-off-the-box (requires an extension) and JavaScript is never the main language used on PostgreSQL server because it has to be always embedded inside the SQL syntax
Though user-defined functions and data types are very useful, their syntax seems to be from an old epoch like Fortran or Pascal
PostgreSQL json and jsonb types have low performance in comparison with document databases
Sharding in PostgreSQL is not so easy – it requires either using a fork of the official database engine or an implementation in the application layer
Like other relational DBMS requires a schema which slows down creation of a simple MVP microservice

Why we chose ArangoDB

ArangoDB seems to be something like MongoDB (we have the most experience in MongoDB) with some extra features. Of course it lacks some MongoDB features like the Aggregation Framework but, in reality, this one is not lacking but replaced with something more user-friendly – AQL + joins.

ArangoDB like MongoDB provides clustering, though the ArangoDB clustering has not proven to work stably on production. One of the key factors was a very active community. It has a very low ratio of open issues to the total number of issues. Moreover, everyone can easily access ArangoDB Slack where the support team is very helpful, and also in Stackoverflow they give adequate responses.

Another reason was that ArangoDB is a multi-model DBMS, which is useful as we were planning to extend our documents with using graphs.

How we have used ArangoDB

We have used the following ArangoDB features:

AQL
Transactions
Admin UI (only for some debugging purposes)

We are potentially planning to use in the future:

Graphs

In the ArangoDB shell, we found a very useful feature which doesn’t exist in MongoDB. To learn AQL, no data in the collections was needed because it’s possible to type something like this:

db._query('for i in [1,2,3] return i * i')

Because one of the requirements was to build the data from many collections using the provided JSON schema, we were looking for an ArangoDB query builder.

We found something which was rather unpopular and lacked many features, so we created our own ArangoDB query builder.

We created an abstract interface, so when replacing ArangoDB into another DBMS, only the inner implementation would be changed.

An example code of our query builder:

const QueryBuilder = () => {
 const priv = {
   // private fields and methods
 };

 const pub = {
   getQueryTree() {
     return priv.queryTree;
   },

   fromSchema(schema) {
     priv.mainCollectionName = schema.title;
     priv.queryTree.loop = `FOR ${schema.title}Item in ${schema.title}`;
     priv.queryTree.sorting = `SORT ${schema.title}Item.id`;
     // some more code

     return pub;
   },

   withLimit(offset, count) {
     // some code
   },

   byId(id) {
     // some code
   },

   byIdentifiers(identifiers) {
     // some code
   },

   byParentId(collectionName, id) {
     // some code
   },

   toAQL() {
     return [
       priv.toString(),
       priv.bindings,
     ];
   },
 };

 return pub;
};

export default QueryBuilder;

‍

We have created some JavaScript code which runs on the ArangoDB server, and we use this code for most transactions.

// this function is run on the ArangoDB server, and, thus, it cannot use all es6 features
const dbProcedure = (params) => {
 const db = require('internal').db;

 const updateProperObject = () => {
   const collection = db._collection(params.myCollectionName);

   return collection.updateByExample({ id: params.newObject.id }, params.newObject);
 };

 const removeObjects = (collectionName) => {
   params.childrenIdsToBeRemoved[collectionName].forEach((id) => {
     db._collection(collectionName).removeByExample({ id });
   });
 };

 /*
  * The remaining public and private methods
  */

 const actions = {
   create,
   update,
   remove,
   override,
 };

 return actions[params.action]();
};

export default dbProcedure;

‍
It was pretty cool that we were able to use some popular libraries like Lodash even on the database server.

ArangoDB provides an HTTP framework named Foxx which simplifies creating microservices that connect to ArangoDB.

However, we decided not to use Foxx because we didn’t want our microservices to be dependent on a database (like it is while using MongoDB + Mongoose or direct connecting between frontend and CouchDB REST API). Instead, we created abstract models which internally use ArangoDB.

This approach proved to be a good choice because later we had to replace ArangoDB into Redis in some critical places of the system in order to remove bottlenecks that were hindering the overall performance.

Our benefits from using ArangoDB

Thanks to AQL, writing and debugging queries was much easier than when using the MongoDB Aggregation Framework. Also, the transactions were very helpful. Even running JavaScript code on the database server was easier than in MongoDB because it was possible to use some libraries like Lodash.

Our drawbacks from using ArangoDB

Unfortunately, ArangoDB was not fast enough to handle a very large number of write/reads in a short period of time (however, even with other DBMS, e.g. MongoDB or MySQL, writing data to the hard disk would be too slow). That’s why in some microservices we decided to replace ArangoDB into Redis, which works in memory.

However, even in this situation, using ArangoDB was a better choice than MongoDB + Mongoose because Mongoose models usually make the entire architecture dependent on DBMS.

On the other hand, we created an abstract model using ArangoDB, so replacing DBMS into another was relatively easy.

Take note that MongoDB can be used with such an abstract model as well – so I just say that ArangoDB + our abstract models can be much better than MongoDB + Mongoose.

Importance of key characteristics of ArangoDB

Table with key characteristic of ArangoDB.

Frequently Asked Questions

No items found.

Our promise

Every year, Brainhub helps founders, leaders and software engineers make smart tech decisions. We earn that trust by openly sharing our insights based on practical software engineering experience.

Authors

Bianka Pluszczewska

Tech Editor

Software development enthusiast with 8 years of professional experience in this industry.

Bianka Pluszczewska

Tech Editor

Software development enthusiast with 8 years of professional experience in this industry.

ArangoDB Use Case