Graph Database

sitashma rajbhandari edited this page Jan 16, 2021 · 62 revisions

Graph Database Management Systems

Introduction

The limitations of traditional databases become more evident when dealing with complex data such as clinical data, which has thousands of instances and complex relationships among them. Such associative data requires complex queries to retrieve precise and in-depth information, which can be resource-intensive and time-consuming.

In addition, traditional databases such as relational databases fail to meet the changing requirements of current application domains, which demand flexibility and high performance as basic requirements of database management tools.

To cope with new projects, especially those where the relations are as important as the entities, a new database technology called the graph database was developed, and numerous powerful graph databases are available on the market today.

Graph Database Technologies

As the name suggests, a graph database stores data as well as its relations in a graph-like structure, which makes traversing highly connected and complex data faster and easier. Depending on the application and its requirements, there are plenty of graph databases to choose from, each with its own specialty and key features.
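To make the idea of storing relations next to the data concrete, the following is a minimal sketch in plain Python, not tied to any of the products discussed here. The node IDs, labels, and relation names are hypothetical. It keeps each node's outgoing relations in an adjacency list, so a traversal only follows stored links instead of searching the whole dataset.

```python
from collections import deque

# Hypothetical clinical mini-graph: nodes carry properties,
# edges carry a relation type (all names are illustrative only).
nodes = {
    "p1": {"label": "Patient", "name": "Patient 1"},
    "p2": {"label": "Patient", "name": "Patient 2"},
    "d1": {"label": "Diagnosis", "name": "Type 2 diabetes"},
    "m1": {"label": "Medication", "name": "Metformin"},
}
edges = [
    ("p1", "HAS_DIAGNOSIS", "d1"),
    ("p2", "HAS_DIAGNOSIS", "d1"),
    ("d1", "TREATED_WITH", "m1"),
]


def build_adjacency(edge_list):
    """Group outgoing edges by source node so neighbors can be looked up directly."""
    adjacency = {}
    for source, relation, target in edge_list:
        adjacency.setdefault(source, []).append((relation, target))
    return adjacency


def reachable(adjacency, start, max_hops):
    """Breadth-first traversal: map each node within max_hops to its hop distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbor in adjacency.get(current, []):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    return distances


adjacency = build_adjacency(edges)
result = reachable(adjacency, "p1", max_hops=2)
print(result)
```

Edges are treated as directed here, so starting from `p1` the traversal reaches the diagnosis and the medication but not `p2`.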

Top graph databases according to an internet search: ArangoDB, Neo4j, OrientDB, AllegroGraph, Amazon Neptune, DataStax, HyperGraphDB, InfiniteGraph, sones, and Filament.

In the following section, we list the most frequently used and most distinctive graph databases and then compare them.

ArangoDB: A multi-model graph database (key-value pairs, graphs, and JSON documents) that can be accessed with one declarative query language, AQL. All three data models can be used together and scaled horizontally to build highly efficient applications. [1]

Neo4j: Can be considered the leading graph database; it uses Cypher as its query language. It implements an object-oriented API and offers Neo4j Bloom for visual exploration, along with APOC procedures and graph analytics algorithm libraries that extend the functionality of Cypher. [2]

AllegroGraph: A multi-model graph technology (it combines document and graph technologies) with high performance for highly complex and distributed data. AllegroGraph can handle billions of nodes effectively by combining efficient memory management with disk-based storage. [3]

HyperGraphDB: A general-purpose, open-source graph database based on generalized hypergraphs, with a two-layered architecture for data organization (a primitive storage layer and a model layer). [4]

OrientDB: A distributed, multi-model graph database with support for replication, implemented in Java.

Amazon Neptune

DataStax

InfiniteGraph: A distributed graph database implemented in Java with a core in C++, typically used in areas such as network management, healthcare, cybersecurity, bioinformatics, and social networking.

Motivation

To determine the most suitable graph database for the clinical knowledge graph, it is crucial to compare the databases based on the set of features they have in common. We will also consider the key features of each graph database.

Comparison of Graph Database Models

Based on whether a graph database implements a query language, an API, and a GUI.

Unlike traditional databases, graph databases do not have a standard query language, so each graph database offers its own. For data operations and manipulation, graph databases usually provide APIs, and it is a plus if a graph database also provides a GUI.

| Graph Database | Query language | API | GUI |
|---|---|---|---|
| ArangoDB | AQL | Java only | Web interface |
| Neo4j | Cypher | Java only | Web interface |
| AllegroGraph | SPARQL | Java, Python | |
| OrientDB | SQL or Gremlin | Java only | Web interface |
| Amazon Neptune | SPARQL or Gremlin | Supports open graph APIs | - |
| DataStax | | | |
| HyperGraphDB | SQL-styled | Java | - |
| InfiniteGraph | Gremlin | Java | - |

Based on data storage and supporting features

To compare the databases from the perspective of data storage, we consider main memory, external memory, and online backup, as well as whether indexes are supported. Since graph databases deal with huge amounts of data, external memory storage can be considered the main requirement.

Graph Database Main Memory External Memory Online backup Indexes
ArangoDB
Neo4j
AllegroGraph
OrientDB
Amazon Neptune
DataStax
HyperGraphDB
InfiniteGraph -

Based on the supported Data models and Graph type.

In the most general sense, a data model is a collection of conceptual tools used to model representations of real-world entities and the relations among these entities.

| Graph Database | Data model | Graph type |
|---|---|---|
| ArangoDB | Besides a key-value store and a document store, ArangoDB also supports a graph store | |
| Neo4j | Uses a graph data model | Property graph |
| AllegroGraph | A closed-source triplestore designed to store Resource Description Framework (RDF) triples, a standard format for linked data | |
| OrientDB | Supports graph, object, document, and key/value data models | |
| Amazon Neptune | Supports graph models and RDF | |
| DataStax | | |
| HyperGraphDB | Supports a directed hypergraph model | Hypergraph |
| InfiniteGraph | | |
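The difference between the property graph model and the RDF triple model in the table above can be illustrated with a small sketch in plain Python. The identifiers, the `http://example.org/` namespace, and the helper names are assumptions made for illustration and do not reflect the storage format of any listed product.

```python
# Assumed example namespace; real RDF data would use meaningful IRIs.
BASE = "http://example.org/"

# Property-graph style: nodes and edges are records that carry key/value properties.
patient_node = {"id": "p1", "label": "Patient", "properties": {"age": 54}}
diagnosis_edge = {"source": "p1", "type": "HAS_DIAGNOSIS", "target": "d1"}


def node_to_triples(node, base=BASE):
    """Express a property-graph node as RDF-style (subject, predicate, object) triples."""
    subject = base + node["id"]
    triples = [(subject, "rdf:type", base + node["label"])]
    for key, value in node["properties"].items():
        # Each property becomes its own triple with a literal object.
        triples.append((subject, base + key, value))
    return triples


def edge_to_triple(edge, base=BASE):
    """Express a property-graph edge as one triple linking two resources."""
    return (base + edge["source"], base + edge["type"], base + edge["target"])


triples = node_to_triples(patient_node) + [edge_to_triple(diagnosis_edge)]
for triple in triples:
    print(triple)
```

A plain triple has no slot for properties on the relation itself, which is one practical difference to keep in mind when choosing between the two models.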

Based on support for sharding and ACID transactions

Sharding is the ability of the database to break a large dataset into smaller parts called shards, which are easier to manage and faster to query.

| Graph Database | Sharding | ACID compliant |
|---|---|---|
| ArangoDB | Yes | Yes |
| Neo4j | No | Yes |
| AllegroGraph | Yes | Yes |
| OrientDB | Yes | Yes |
| Amazon Neptune | No | Yes |
| DataStax | | |
| HyperGraphDB | | Not durable |
| InfiniteGraph | No | Yes |
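Sharding as described above can be sketched with a simple hash-based assignment of vertices to shards. This is a generic illustration in plain Python, not the partitioning scheme of any database in the table; the function names and the vertex IDs are hypothetical.

```python
import hashlib


def shard_for(vertex_id, shard_count):
    """Map a vertex ID to a shard index using a stable hash (same input, same shard)."""
    digest = hashlib.sha256(vertex_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def partition(vertex_ids, shard_count):
    """Split a collection of vertex IDs into shard_count smaller lists."""
    shards = {index: [] for index in range(shard_count)}
    for vertex_id in vertex_ids:
        shards[shard_for(vertex_id, shard_count)].append(vertex_id)
    return shards


def count_cut_edges(edges, shard_count):
    """Count edges whose endpoints land on different shards (these need cross-shard hops)."""
    return sum(
        1
        for source, target in edges
        if shard_for(source, shard_count) != shard_for(target, shard_count)
    )


vertex_ids = [f"patient-{number}" for number in range(10)]
edges = [(vertex_ids[i], vertex_ids[i + 1]) for i in range(9)]

shards = partition(vertex_ids, shard_count=3)
for index, members in shards.items():
    print(index, members)
print("cut edges:", count_cut_edges(edges, shard_count=3))
```

Hashing spreads vertices evenly but ignores graph structure, so connected vertices often end up on different shards. That trade-off is one reason sharding is harder for graphs than for independent records.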

Use cases

| Graph Database | Use cases |
|---|---|
| ArangoDB | adb |
| Neo4j | neo |
| AllegroGraph | Geo-temporal reasoning, social network analysis |
| OrientDB | Fraud detection, network/IT operations, graph search, recommendation engines, forensic analysis |
| Amazon Neptune | Recommendation engines, fraud detection, knowledge graphs, drug discovery |
| DataStax | - |
| HyperGraphDB | - |
| InfiniteGraph | Network management, cyber security, bioinformatics |

Functionality restrictions of Graph Databases

  1. Most commercial graph databases do not have a declarative query interface, which means they lack query optimization capabilities.
  2. Most graph databases do not have distributed data management, which means partitioning and distributing data across a network is not supported.
  3. The graph model is often restricted because the options for defining a data schema and constraints are limited, which can result in data inconsistencies.
  4. They support a procedure that sequentially writes data from multiple buffers to a single data stream, or reads data from a data stream into multiple buffers; i.e., most databases do not support horizontal scaling.

Other limitations

  1. In most graph databases, it is difficult to efficiently extract a graph from non-graph data stores.
  2. Since most real-world graphs are very dynamic and generate large volumes of data at a rapid rate, it is challenging to store the historical trace compactly and execute queries efficiently at the same time.
  3. Current graph databases prioritize low-latency query execution over high-throughput data analytics.
  4. Parallelisation is crucial in the context of big graphs so that the data can be handled efficiently; however, only a few graph databases implement parallelisation properly.
  5. Graph databases are inefficient when the graph datasets being queried are heterogeneous, incomplete, or inconsistent.
  6. Most graph databases lack a GUI for quickly adding new nodes and setting labels, properties, and relationships with a click.
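The first limitation above, extracting a graph from a non-graph data store, can be illustrated with a minimal sketch in plain Python. The two row lists stand in for relational tables; the table names, column names, and the `HAS_DIAGNOSIS` relation are hypothetical. The sketch assumes a foreign key can be turned directly into an edge, which real data often does not allow so cleanly.

```python
# Hypothetical relational rows; patient_id in diagnoses acts as a foreign key.
patients = [
    {"patient_id": 1, "name": "Patient A"},
    {"patient_id": 2, "name": "Patient B"},
]
diagnoses = [
    {"diagnosis_id": 10, "patient_id": 1, "code": "E11"},
    {"diagnosis_id": 11, "patient_id": 2, "code": "I10"},
]


def rows_to_graph(patient_rows, diagnosis_rows):
    """Turn table rows into nodes and foreign-key references into edges."""
    nodes = {}
    edges = []
    for row in patient_rows:
        nodes[("Patient", row["patient_id"])] = {"name": row["name"]}
    for row in diagnosis_rows:
        diagnosis_key = ("Diagnosis", row["diagnosis_id"])
        patient_key = ("Patient", row["patient_id"])
        nodes[diagnosis_key] = {"code": row["code"]}
        if patient_key not in nodes:
            # A dangling foreign key has no node to attach to; skip the edge.
            continue
        edges.append((patient_key, "HAS_DIAGNOSIS", diagnosis_key))
    return nodes, edges


graph_nodes, graph_edges = rows_to_graph(patients, diagnoses)
print(len(graph_nodes), "nodes,", len(graph_edges), "edges")
```

Even in this small case every table needs its own mapping rule, which hints at why the extraction is hard to do efficiently at scale.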

Resources

  1. "ArangoDB," https://www.arangodb.com/features-may-2018/
  2. "Neo4j," http://neo4j.org/
  3. "AllegroGraph," https://allegrograph.com/products/allegrograph/
  4. Iordanov, "HyperGraphDB: a generalized graph database," in Proceedings of the 2010 international conference on Web-age information management (WAIM). Springer-Verlag, 2010, pp. 1-4. https://www.researchgate.net/publication/225204980_HyperGraphDB_A_Generalized_Graph_Database
  5. "Amazon Neptune," https://aws.amazon.com/neptune/
  6. "InfiniteGraph," http://infinitegraph.com/
  7. Renzo Angles. A comparison of current graph database models. In 2012 IEEE 28th International Conference on Data Engineering Workshop. IEEE, Apr. 2012. https://www.researchgate.net/publication/261076480_A_Comparison_of_Current_Graph_Database_Models
  8. Diogo Fernandes and Jorge Bernardino. Graph databases comparison: AllegroGraph, ArangoDB, InfiniteGraph, Neo4j, and OrientDB. In Proceedings of the 7th International Conference on Data Science, Technology and Applications. SCITEPRESS - Science and Technology Publications, 2018. https://www.scitepress.org/Papers/2018/69102/69102.pdf
  9. Deepak Singh Rawat and Navneet Kumar Kashyap. Graph Database: A complete GDBMS Survey. IJIRST - International Journal for Innovative Research in Science & Technology. May 2017 (online). http://www.ijirst.org/articles/IJIRSTV3I12047.pdf