09 June 2019/ database-internals

NoSQL vs SQL

Relational Databases are persistence software where Data is stored in terms of relations. For Example, let’s take an Employee of an Organization.

Employee

Id
FirstName
LastName
Email

Organization

Id
Name
Revenue
Domain

Now an organization will have many employees and an employee belongs to one particular organization, this implicit relationship between an organization and an employee is modeled in RDBMS by creating two tables Organization and Employee, attributes of both the models will become columns in the table and each and every employee/organization will be rows within the table.

Say for example there is an employee Foo who works for Bar there will be a row in the Employee table for Foo and a corresponding row for Bar in the Organization table. The relationship between them is modelled as a foreign key constraint. The owning entity of the relationship will hold the foreign key. Let’s see how to decide upon the owner of the relationship, In our case its An employee working for an organization the relationship can be traversed in both the ways either from the employee side or the organization side, but the cardinality of the relationship changes based on how to traverse the relationship

Employee -> Department 1 -> 1
Department -> Employee 1 -> *

Given this kind of relationship is clearly concrete and is not gonna change over time, modelling this in a RDBMS is a very efficient approach.

In comparision to the above model, an employee will also have a profile which consists of the following sections

Skills
Projects
Hobbies

Let’s say we were to model this in a RDMS system, we would create separate tables one for each of Skills, Projects, Hobbies and we would have a 1->many relationship between the employee and the respective sections. Now let’s assume we have a very nice project manager(Sarcastic) who comes up to you and asks for all the Employees who have a cool technology Java(Sarcasm again) as their primary skill. You being a very good hacker come up with an API which does a join operation on the employee table and the skill table filter by the skill name and returns the matched employees. Let’s also assume that on an average each employee has 5 skills and there are around 10K employees this gives you 50K rows in the join table. Even if you had index on skill name this query is gonna eat up lot of resources in your database. Now as the scale increases the API is gonna become slower and slower(as long as you don’t upscale your DB instance that is).

How do we solve this Problem? NoSQL to the rescue. The basic principle of NoSQL is that it does not assume any structure of the data which you insert. The corresponding modelling for the above problem in a NoSQL realm would be like

  "employee": {
    "skills": [
      {
        "name": "java",
        "proficiency": "beginner",
        "type": "primary"
      },
      {
        "name": "python",
        "proficiency": "beginner",
        "type": "secondary"
      },
      {
        "name": "erlang",
        "proficiency": "beginner",
        "type": "secondary"
      },
      
    ]
  }
}

this model is very advantageous because

object is similar to a popular language’s Object model(You know JavaScript is popular right? No Pun intended!)
You can add more skills to the skills list as and when an employee upgrades himself.
You could search for a skill inside the skills list without performing joins(Binary search will result in O(log(n)) time remember). We could accomplish this because we are not expecting any structure in our schema, you could say that there is a structure within the skills list, a skill has a name, proficiency, type i.e. That’s true but the skills list itself is not of fixed length and you could add another attribute to each of the skill object and NoSQL databases would accept it just fine they come with no expectations (Expectations hurt right?).

Thus NoSQL vs SQL/RDMS is purely dependent on the use case at hand, if you have models which are inscribed on stone and are assured that they aren’t gonna change you should definitely use an RDBMS solution, but if the model is gonna change the attributes aren’t fixed and are bound to vary then a NoSQL solution is the best bet. How all of this is accomplished under the hood is a topic for another blog post, so until then stay tuned!

Happy Coding!

NoSQL vs SQL

Behaviour Parameterization with lambda expressions

The trouble with distributed systems part 1