Graph Databases as a Replacement to Relational Databases.
Graph databases are not new: sites like LinkedIn and Facebook are built on highly connected data that is not managed on traditional RDBMS (Relational Database Management System) infrastructure. Graph DB technology is being rapidly commoditized, with platforms like Neo4j and OrientDB leading the way. Proponents believe graph databases will become a new de facto standard for developing all sorts of business and online applications once the inertia of 30+ years of RDBMS thinking is broken down.
When told that a graph database is the ideal way to store highly connected data, most people just shrug and say that problem was solved years ago by the RDBMS platforms; they have 'Relational' in the name after all, right there as the first letter of the acronym! This post is an attempt to explain what makes graph databases a better choice for many applications.
First, let's take a look at an example. Say you have a permissioning service that manages permissions for various systems, grouped by roles, which in turn have a list of functions. In a relational model, we may end up with tables for 'System', 'Role', 'Function', and 'Person', with additional join tables for 'Role_Function' and 'Person_Role'. A typical query against this model would be to determine which functions Person 'A' has permissions for in System 'X'. The most basic T-SQL implementation would be something like:
```sql
-- FUNCTION is a reserved keyword in T-SQL, so the table name is bracketed.
SELECT [Function].Id, [Function].Name
FROM [Function]
INNER JOIN Role_Function ON Role_Function.FunctionId = [Function].Id
INNER JOIN Role ON Role.Id = Role_Function.RoleId
INNER JOIN System ON System.Id = Role.SystemId
INNER JOIN Person_Role ON Person_Role.RoleId = Role.Id
INNER JOIN Person ON Person.Id = Person_Role.PersonId
WHERE Person.Name = 'A'
AND System.Name = 'X'
```
Of course, add some reasonable complexity, such as functions that imply permissions to other functions (to edit a record you need to be able to view or search for it), or profiles linked to positions rather than people, and you end up with an explosion in the JOIN factory: the T-SQL becomes many times more complicated.
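To make the relational model above concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than SQL Server. The sample rows, the `Function_Implies` table, and the helper `names` are assumptions made purely for illustration; they are not part of the original model. The second query shows one way implied functions could be handled, with a recursive common table expression (SQLite spells this `WITH RECURSIVE`; T-SQL syntax differs slightly).

```python
import sqlite3

SCHEMA = """
CREATE TABLE System     (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Role       (Id INTEGER PRIMARY KEY, Name TEXT, SystemId INTEGER);
CREATE TABLE [Function] (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Person     (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Role_Function (RoleId INTEGER, FunctionId INTEGER);
CREATE TABLE Person_Role   (PersonId INTEGER, RoleId INTEGER);
-- Hypothetical extra table: holding FunctionId also grants ImpliedFunctionId.
CREATE TABLE Function_Implies (FunctionId INTEGER, ImpliedFunctionId INTEGER);

INSERT INTO System VALUES (1, 'X'), (2, 'Y');
INSERT INTO Role VALUES (1, 'Editor', 1), (2, 'Admin', 2);
INSERT INTO [Function] VALUES (1, 'Edit'), (2, 'View'), (3, 'Search'), (4, 'Delete');
INSERT INTO Person VALUES (1, 'A'), (2, 'B');
INSERT INTO Role_Function VALUES (1, 1), (2, 4);     -- Editor: Edit, Admin: Delete
INSERT INTO Person_Role VALUES (1, 1), (1, 2);       -- A is Editor (X) and Admin (Y)
INSERT INTO Function_Implies VALUES (1, 2), (2, 3);  -- Edit => View => Search
"""

# The basic six-table join: functions granted directly through roles.
DIRECT = """
SELECT DISTINCT f.Name
FROM [Function] f
INNER JOIN Role_Function rf ON rf.FunctionId = f.Id
INNER JOIN Role r ON r.Id = rf.RoleId
INNER JOIN System s ON s.Id = r.SystemId
INNER JOIN Person_Role pr ON pr.RoleId = r.Id
INNER JOIN Person p ON p.Id = pr.PersonId
WHERE p.Name = ? AND s.Name = ?
ORDER BY f.Name
"""

# Same join, plus a recursive step that follows Function_Implies.
EFFECTIVE = """
WITH RECURSIVE Granted(FunctionId) AS (
    SELECT rf.FunctionId
    FROM Role_Function rf
    INNER JOIN Role r ON r.Id = rf.RoleId
    INNER JOIN System s ON s.Id = r.SystemId
    INNER JOIN Person_Role pr ON pr.RoleId = r.Id
    INNER JOIN Person p ON p.Id = pr.PersonId
    WHERE p.Name = ? AND s.Name = ?
    UNION
    SELECT fi.ImpliedFunctionId
    FROM Function_Implies fi
    INNER JOIN Granted g ON g.FunctionId = fi.FunctionId
)
SELECT f.Name
FROM [Function] f
INNER JOIN Granted g ON g.FunctionId = f.Id
ORDER BY f.Name
"""


def names(conn, sql, person, system):
    """Run a query with (person, system) parameters and return function names."""
    return [row[0] for row in conn.execute(sql, (person, system))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
direct = names(conn, DIRECT, "A", "X")
effective = names(conn, EFFECTIVE, "A", "X")
print(direct)
print(effective)
```

With this sample data the direct query returns only `Edit`, while the recursive version also picks up `View` and `Search` through the implication chain. Note how much query text one extra requirement added.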
In a graph world, however, each row in each table simply becomes a node in the graph. The Person 'A' node would have an 'IS_IN_ROLE' relationship to a number of Roles, which would be linked to Systems by a 'HAS_ROLE' relationship and to Functions by an 'INCLUDES' relationship. Functions could relate to each other in a hierarchy. You could add Profile nodes that a Person holds and that have Roles of their own, and so on. In outline, the graph looks something like this: Person -IS_IN_ROLE-> Role <-HAS_ROLE- System, and Role -INCLUDES-> Function.
With graph technology come new query languages and syntaxes. Neo4j provides a very elegant language, Cypher, which lets you query the graph very succinctly. E.g. our complex, non-performant T-SQL statement might look like this in a graph world:
```cypher
MATCH (:System {Name:"X"})-->(r:Role)-[]->(f:Function), (p:Person {Name:"A"})-[]->(r) RETURN DISTINCT f
```
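To give a feel for what such a pattern match does, here is a toy in-memory sketch in Python. It is not how Neo4j is implemented; the `Graph` class, its method names, and the sample nodes are all hypothetical. It looks up the two named starting nodes, then only follows direct references from them: roles related to both the System and the Person, then the functions related to those roles.

```python
from collections import defaultdict


class Graph:
    """Toy property graph: every node keeps direct references to its neighbours."""

    def __init__(self):
        self.nodes = {}               # node id -> (label, properties)
        self.out = defaultdict(set)   # node id -> ids of directly related nodes
        self.name_index = {}          # (label, Name) -> node id, used only to anchor

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)
        if "Name" in props:
            self.name_index[(label, props["Name"])] = node_id

    def relate(self, src, dst):
        self.out[src].add(dst)

    def anchor(self, label, name):
        """Index lookup for a fixed starting point of the query."""
        return self.name_index[(label, name)]

    def related(self, node_id, label):
        """Directly related nodes carrying the given label."""
        return {n for n in self.out[node_id] if self.nodes[n][0] == label}


def functions_for(graph, person_name, system_name):
    """Mirror of the pattern: (System)-->(r:Role)-->(f:Function), (Person)-->(r)."""
    system = graph.anchor("System", system_name)
    person = graph.anchor("Person", person_name)
    roles = graph.related(system, "Role") & graph.related(person, "Role")
    functions = set()
    for role in roles:
        functions |= graph.related(role, "Function")
    return sorted(graph.nodes[f][1]["Name"] for f in functions)


g = Graph()
g.add_node("sx", "System", Name="X")
g.add_node("sy", "System", Name="Y")
g.add_node("pa", "Person", Name="A")
g.add_node("editor", "Role", Name="Editor")
g.add_node("admin", "Role", Name="Admin")
g.add_node("edit", "Function", Name="Edit")
g.add_node("view", "Function", Name="View")
g.add_node("delete", "Function", Name="Delete")
g.relate("sx", "editor")      # HAS_ROLE
g.relate("sy", "admin")       # HAS_ROLE
g.relate("pa", "editor")      # IS_IN_ROLE
g.relate("pa", "admin")       # IS_IN_ROLE
g.relate("editor", "edit")    # INCLUDES
g.relate("editor", "view")    # INCLUDES
g.relate("admin", "delete")   # INCLUDES

print(functions_for(g, "A", "X"))
```

The only index lookups are the two `anchor` calls; everything after that is a walk over neighbouring nodes, so unrelated roles and functions are never examined.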
- One of the obvious benefits of graph DBs is the kind of query they support easily. Such queries often DO NOT need to change even when the structure of the graph itself changes, which greatly speeds up development.
- Another benefit is performance. In the T-SQL world, many index lookups are needed to find data in separate tables to JOIN on. In the graph world, each node holds direct references to its related nodes, so traversing the graph (given known starting points like the Person with Name "A" and the System with Name "X") is very fast: it only ever considers related nodes to see if they match the query. In fact, although graph DBs support indexes, they are generally used only to 'anchor' the query to fixed starting points in the graph, not to find the data being retrieved.
- Flexibility to requirements changes. In an agile development world (which is everywhere now really, right?) graph databases accommodate changes to requirements far more easily. The rise of ORMs was due largely to the impedance mismatch between object-oriented development and the RDBMS data storage structure. Graph DBs reduce this issue by allowing data to be stored in a way that more closely matches the code. In fact, graph DBs do not strictly have schemas (though this is somewhat dependent on the technology used): there is nothing to prevent one node representing a Person from having an 'Eye Colour' attribute and another from having a 'Height' attribute. Obviously, for use in business applications, you will expect some conformity, but this is held and defined in code rather than in a separate DB schema as with an RDBMS.
- Deployment of changes is also simplified. Though there are gotchas to look out for with the lack of a schema-driven model, you are free to add and remove nodes and relationships dynamically, meaning you could re-organize the structure of the graph in a live environment.
- The most obvious drawback is the lack of mainstream support. Graph technology is still seen as new and untrusted by both enterprise architects and developers. This will change over time as exposure increases.
- The market has not yet stabilized: even the most prominent players have not settled on a standardized query language or codebase (e.g. Neo4j has recently deprecated its original APIs).
- There are some applications where a 'good old' RDBMS is still more suitable. Any application with serious aggregation/number-crunching requirements, or where the structure of the information is very static, not highly related, and not subject to frequent change, is probably still going to be developed on an RDBMS backend.
- Reporting requirements are also probably better served by a properly structured reporting cube maintained separately from the graph. This is actually also true of systems running on an RDBMS, but since T-SQL aggregates data well, reporting and transactional requirements are often supported by a single DB. If you are a reporting purist, this is in some ways another benefit of the graph DB, as it forces us to think about the reporting requirements separately from the transactional requirements of the system.
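The flexibility points in the list above (nodes of the same kind carrying different attributes, conformity defined in code, and structure that can be re-organized without rewriting queries) can be sketched with a toy in-memory graph in Python. Everything here is an assumption for illustration: the `REQUIRED` rule table, the helper names, and the sample nodes are hypothetical and not taken from any particular graph database.

```python
from collections import defaultdict, deque

nodes = {}                 # node id -> {"label": ..., plus arbitrary properties}
edges = defaultdict(set)   # node id -> ids of directly related nodes

# Conformity rules live in application code, not in a database schema.
REQUIRED = {"Person": {"Name"}}


def add_node(node_id, label, **props):
    nodes[node_id] = {"label": label, **props}


def relate(src, dst):
    edges[src].add(dst)


def validate(node_id):
    """True if the node carries every property its label requires."""
    node = nodes[node_id]
    return not (REQUIRED.get(node["label"], set()) - set(node))


def reachable(start, label):
    """Breadth-first walk: names of all nodes with `label` reachable from start."""
    seen, queue, found = {start}, deque([start]), set()
    while queue:
        current = queue.popleft()
        for nxt in edges[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
                if nodes[nxt]["label"] == label:
                    found.add(nodes[nxt]["Name"])
    return sorted(found)


# Two Person nodes with different optional attributes, one missing its Name.
add_node("a", "Person", Name="A", EyeColour="brown")
add_node("b", "Person", Name="B", Height=180)
add_node("c", "Person", Height=175)

add_node("editor", "Role", Name="Editor")
add_node("edit", "Function", Name="Edit")
relate("a", "editor")
relate("editor", "edit")
before = reachable("a", "Function")

# Requirement change: introduce Profile nodes, added while the graph is "live".
add_node("manager", "Profile", Name="Manager")
add_node("approver", "Role", Name="Approver")
add_node("approve", "Function", Name="Approve")
relate("a", "manager")
relate("manager", "approver")
relate("approver", "approve")
after = reachable("a", "Function")   # same query, new structure

print(before, after)
```

The `reachable` query is not touched when Profile nodes are introduced; it simply finds more functions because more paths now exist. The price is that rules such as "every Person has a Name" must be checked by the application, as `validate` does here.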
| Criteria | RDBMS | Object-oriented database | Graph database (NoSQL database) |
|---|---|---|---|
| Data storage | Data is stored in the form of rows and columns. Since RDBMS does not store relationships, it requires less space. | Stores data as objects as well as methods to use it. Needs higher space in comparison to RDBMS. | Graph data is kept in store files, each of which contains data for a specific part of the graph, such as nodes, relationships, labels, and properties |
| Flexibility: adaptability to change | Changing the table design may require a complete rebuild. The schema can be altered after the database is deployed, but this can take significantly more time than in a graph database. | New objects can easily be constructed from existing objects. | Allows the addition of new nodes and relationships without compromising the existing network |
| Query language | Standardized query language (SQL) | A standard language exists but is rarely implemented | No standard query language yet |
| Query performance | Performs well for simple, structured data but is not well suited to data with many many-to-many relationships. | No join is required, as objects can be accessed by using pointers | Performs exceptionally well for traversal queries, highly interconnected data, and deep and complex queries. |
| Integrity constraints: Rule that defines the set of consistent database states or changes of state or both. | There are four types of integrity constraints: Domain Constraint, Entity Integrity constraints, Referential Integrity Constraint, Key Constraint | Consistency constraints have not been fully implemented. Provides only a limited number of features for integrity constraints | Integrity constraints support is still under development in a graph database |
| Maturity and level of support: how thoroughly tested it is | Development of the RDBMS dates back to 1974, making it one of the oldest and most reliable kinds of DBMS. | Development is still going on therefore a sufficient number of Programmers and Database Administrators are available for OODBMS. | Development of graph databases boomed in 1998; however, they are still not widely adopted. |
| Ease of programming | The common language makes transitioning between implementations easier | Offers direct and extensive support for OO programming. | Graph databases are language-specific and have their own APIs, making transitioning between graph databases difficult |
| Security | Contains extensive support for ACL-based security and built-in multi-user support | Most object-oriented databases do not support authorization | Contains some access control list (ACL) security mechanisms but lacks support for multi-user environments (both are handled at the application level) |
| Scalability | Relational model operations cannot be extended because the relational data model has a fixed number of SQL operations. | Provides full support for advanced applications. A set of processes can be extended easily | Functionality can be extended easily by using APIs and plugins. |
From the above we can conclude that the choice of database depends on the type of data at hand. If the data is fairly simple and structured, a relational database is enough; for highly interrelated data, a graph database would be a wise choice, as it is highly efficient for deep and complex analysis. For handling complex graphical and hypermedia data and advanced applications, an object-oriented database system can be an alternative to a traditional database.
- Graham Pearson. graph-databases-replace-rdbms-technologies. https://www.linkedin.com/pulse/graph-databases-replace-rdbms-technologies-graham-pearson
- Angles, R. A Comparison of Current Graph Database Models. 2012 IEEE 28th International Conference on Data Engineering Workshops, IEEE, 2012. https://www.researchgate.net/publication/261076480_A_Comparison_of_Current_Graph_Database_Models
- Vicknair, C.; Macias, M.; Zhao, Z.; Nan, X.; Chen, Y. & Wilkins, D. A comparison of a graph database and a relational database. Proceedings of the 48th Annual Southeast Regional Conference on - ACM SE '10, ACM Press, 2010. https://www.researchgate.net/publication/220996559_A_comparison_of_a_graph_database_and_a_relational_database_A_data_provenance_perspective
- Aziz, T.; Haq, E.-u. & Muhammad, D. Performance-based Comparison between RDBMS and OODBMS. International Journal of Computer Applications, Foundation of Computer Science, 2018, 180, 42-46. https://www.researchgate.net/publication/323218317_Performance_based_Comparison_between_RDBMS_and_OODBMS