Graph Database

sitashma rajbhandari edited this page Jan 17, 2021 · 62 revisions

Graph Database Management Systems

Introduction

The limitations of traditional databases become more apparent when dealing with complex data such as clinical data, which has thousands of instances and complex relationships among them. Such associative data requires complex queries to retrieve precise, in-depth information, which can be resource-intensive and time-consuming.

In addition, traditional databases such as relational databases fail to meet the changing requirements of current application domains, which demand flexibility and high performance as basic features of database management tools.

To cope with new projects, especially those where the relations are as important as the entities, a new database technology called the graph database was developed, and numerous powerful graph databases are available on the market today.

Graph Database Technologies

As its name suggests, a graph database stores data as well as its relations in a graph-like structure, which makes traversing highly connected and complex data faster and easier. Depending on the application and its requirements, there are plenty of graph databases to choose from, each with its own specialty and key features.
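
To illustrate why a graph-like structure makes traversal straightforward, here is a minimal sketch in Python. It is not how any of the databases listed below are implemented; the adjacency-list layout, the node names, and the function `reachable_within` are all assumptions made for illustration. Each node keeps direct references to its neighbours, so following a relationship is a lookup rather than a join.

```python
from collections import deque

# Hypothetical toy data: each key is a node, each value lists its direct neighbours.
GRAPH = {
    "patient:1": ["diagnosis:flu", "doctor:7"],
    "diagnosis:flu": ["drug:oseltamivir"],
    "doctor:7": ["clinic:A"],
    "drug:oseltamivir": [],
    "clinic:A": [],
}


def reachable_within(graph, start, max_hops):
    """Breadth-first search: map each node reachable from `start`
    in at most `max_hops` hops to its hop distance."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


one_hop = reachable_within(GRAPH, "patient:1", 1)
two_hops = reachable_within(GRAPH, "patient:1", 2)
```

With the toy data above, `one_hop` contains the patient plus its diagnosis and doctor, while `two_hops` also reaches the drug and the clinic.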

Top graph databases according to an internet search

ArangoDB, Neo4j, OrientDB, AllegroGraph, Amazon Neptune, DataStax, HyperGraphDB, InfiniteGraph, sones, Filament, Titan, GraphDB

In the following section, we list the most frequently used and distinctive graph databases and compare them.

ArangoDB: A fully managed, multi-model graph database (key-value pairs, graphs, and JSON documents) that can be accessed with one declarative query language, AQL. All three data models can be used together and scaled horizontally to build highly efficient applications. [1]

Neo4j: Can be considered the leading graph database; it uses Cypher as its query language. It implements an object-oriented API and offers Neo4j Bloom for visual exploration, as well as APOC procedures and graph analytics algorithm libraries that extend the functionality of Cypher. [2]

AllegroGraph: A multi-model graph technology (it combines document and graph technologies) with high performance on highly complex and distributed data. It can handle billions of nodes effectively by combining efficient memory management with disk-based storage. [3]

HyperGraphDB: A general-purpose, open-source graph database based on generalized hypergraphs, with a two-layered architecture for data organization (a primitive storage layer and a model layer). [4]

OrientDB: A distributed, multi-model graph database that supports schema-less, schema-full, and schema-mixed modes. [5]

Amazon Neptune: A fully managed graph database that is fast, reliable, and highly secure, and that simplifies building and running applications on highly connected datasets. [6]

InfiniteGraph: A distributed graph database implemented in Java with a C++ core, typically used in areas such as network management, healthcare, cybersecurity, bioinformatics, and social networking. [7]

TigerGraph: The self-proclaimed most scalable graph database for enterprises, built for real-time big graphs and designed to cope with massive amounts of data and supply real-time analytics. It is based on a native parallel graph, which overcomes the limitations of a general native graph by enabling faster data loading, faster execution of graph algorithms, and real-time streaming updates and insertions. [8]

Motivation

To determine the most suitable graph database for the clinical knowledge graph, it is crucial to compare the databases based on a set of features common to them. We also consider the key features of each graph database.

Comparison of Graph Database Models

Based on whether a graph database implements a database language, API, and GUI.

Unlike traditional databases, graph databases do not have a standard query language, so each graph database offers its own. For data operations and manipulation, graph databases usually provide APIs, and a built-in GUI is a plus.

| Graph Database | Query language | API | GUI |
| --- | --- | --- | --- |
| ArangoDB | AQL | Java | - |
| Neo4j | Cypher | Java | - |
| AllegroGraph | SPARQL | Java, Python, Perl, C# | |
| OrientDB | SQL or Gremlin | Java | - |
| Amazon Neptune | SPARQL or Gremlin | Supports open graph APIs | - |
| HyperGraphDB | SQL-styled | Java | - |
| InfiniteGraph | Gremlin | Java | - |
| TigerGraph | GSQL | RESTful HTTP/JSON API | |

Based on data storing and support for data storing

To compare the databases from the perspective of data storage, we considered main memory, external memory, and online backup, as well as whether indexes are supported. Since graph databases deal with huge amounts of data, external memory storage can be considered the main requirement.

Graph Database Main Memory External Memory Backup Indexes
ArangoDB
Neo4j
AllegroGraph
OrientDB
Amazon Neptune
HyperGraphDB -
InfiniteGraph -
TigerGraph

Based on the supported Data model.

Data in a graph database can be modeled as a property graph, a hypergraph, or a triple store. In the property graph model, data is maintained as nodes, relationships, and properties. What differentiates the property graph model from the simple graph model is that it allows relationships to have properties as well.
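
The property graph model can be sketched in a few lines of Python. This is only an illustrative in-memory structure, not the storage format of any database compared here; the class names `Node`, `Relationship`, and `PropertyGraph`, as well as the sample data, are hypothetical. The point it shows is that relationships carry their own properties (here `dose_mg`), just like nodes do.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    source: str
    target: str
    rel_type: str
    # Relationships may carry properties too; this is what
    # distinguishes a property graph from a simple graph.
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}          # node_id -> Node
        self.relationships = []  # list of Relationship

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_relationship(self, source, target, rel_type, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before they are linked")
        self.relationships.append(
            Relationship(source, target, rel_type, properties)
        )

    def outgoing(self, node_id, rel_type=None):
        """Relationships leaving node_id, optionally filtered by type."""
        return [
            r for r in self.relationships
            if r.source == node_id and (rel_type is None or r.rel_type == rel_type)
        ]


# Hypothetical clinical example.
graph = PropertyGraph()
graph.add_node("p1", "Patient", name="Alice")
graph.add_node("d1", "Drug", name="Aspirin")
graph.add_relationship("p1", "d1", "TAKES", dose_mg=100)
```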

Another frequently used data model is the triple store model, also called RDF (Resource Description Framework). RDF organizes data in a subject-predicate-object format. Each element (subject, predicate, object) is stored independently as a node, and the elements are logically linked.
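
As a rough illustration of the subject-predicate-object format, the sketch below stores triples as plain Python tuples and matches them against a pattern, with `None` acting as a wildcard. The identifiers (`patient:1`, `hasDiagnosis`, and so on) and the helper `match` are made up for this example; real triple stores use IRIs and are queried with SPARQL rather than a Python function.

```python
# Hypothetical triples: (subject, predicate, object).
TRIPLES = {
    ("patient:1", "hasDiagnosis", "disease:flu"),
    ("patient:1", "takesDrug", "drug:oseltamivir"),
    ("drug:oseltamivir", "treats", "disease:flu"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern, sorted for stable output.
    A value of None matches anything in that position."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


about_patient = match(TRIPLES, subject="patient:1")
linked_to_flu = match(TRIPLES, obj="disease:flu")
```

Here `about_patient` holds every statement whose subject is the patient, and `linked_to_flu` holds every statement that points at the disease, regardless of predicate.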

Some graph databases also store data as hypergraphs, in which a link can connect any number of nodes, making it possible to model data more compactly (reducing the complexity of the representation). [4]
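
The compactness of hypergraphs can be sketched as follows. This is a toy structure, not HyperGraphDB's actual model; the class `Hypergraph`, its methods, and the sample treatment record are assumptions for illustration. One hyperedge links four nodes at once, whereas connecting every pair of those nodes with ordinary binary edges would take six edges.

```python
class Hypergraph:
    def __init__(self):
        self.hyperedges = {}  # edge name -> frozenset of member nodes

    def add_hyperedge(self, name, members):
        """A single hyperedge may connect any number of nodes."""
        self.hyperedges[name] = frozenset(members)

    def edges_of(self, node):
        """Names of all hyperedges that contain the node."""
        return sorted(
            name for name, members in self.hyperedges.items() if node in members
        )

    def pairwise_edge_count(self, name):
        """Number of binary edges needed to link every pair of members
        of this hyperedge in a simple graph: n * (n - 1) / 2."""
        n = len(self.hyperedges[name])
        return n * (n - 1) // 2


# Hypothetical example: one treatment event involving four entities.
hg = Hypergraph()
hg.add_hyperedge(
    "treatment:1", ["patient:1", "doctor:7", "drug:aspirin", "clinic:A"]
)
binary_edges_needed = hg.pairwise_edge_count("treatment:1")
```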

| Graph Database | Data model | Graph type |
| --- | --- | --- |
| ArangoDB | Besides key-value store and document store, ArangoDB also supports graph store | Property graph |
| Neo4j | Uses a graph data model and native graph storage | Property graph |
| AllegroGraph | A closed-source triplestore designed to store Resource Description Framework (RDF) triples, a standard format for linked data | Property graph, Hypergraph |
| OrientDB | Supports graph, object, document, and key/value data models | Property graph |
| Amazon Neptune | Supports graph models and RDF | Property graph |
| HyperGraphDB | Supports a directed hypergraph model | Hypergraph |
| InfiniteGraph | Labeled directed multigraph | Property graph |
| TigerGraph | Uses a graph data model and native graph storage | Property graph |

Based on support for sharding and whether the database is ACID-compliant.

Sharding is the ability of a database to break a large dataset into smaller parts, called shards, which are easier to manage and faster to access.
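
A minimal sketch of the idea, assuming simple hash-based sharding of node keys: each key is hashed and assigned to one of a fixed number of shards. The functions `shard_for` and `partition` and the sample keys are hypothetical, and the databases compared below use their own partitioning strategies, which typically also try to keep connected nodes on the same shard.

```python
import zlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable CRC32 hash,
    so the same key always lands on the same shard."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def partition(keys, num_shards):
    """Split the keys into num_shards lists, one per shard."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


# Hypothetical node keys.
node_keys = [f"patient:{i}" for i in range(1, 11)]
shards = partition(node_keys, 3)
```

Every key ends up in exactly one shard, and a lookup only needs to recompute `shard_for` to know which shard to ask.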

| Graph Database | Sharding | ACID-compliant |
| --- | --- | --- |
| ArangoDB | Yes | Yes |
| Neo4j | No | Yes |
| AllegroGraph | Yes | Yes |
| OrientDB | Yes | Yes |
| Amazon Neptune | No | Yes |
| HyperGraphDB | No | Not durable |
| InfiniteGraph | No | Yes |
| TigerGraph | Yes | Yes |

Based on support for Graph Visualization.

To interpret the data, good visualization is a critical component that a graph database should include. The ease of visualizing the graph should also be considered.

| Graph Database | Graph Visualization |
| --- | --- |
| ArangoDB | Graph Viewer included |
| Neo4j | Neo4j Bloom included |
| AllegroGraph | Uses Gruff's visualization capability via a web browser |
| OrientDB | Offers a graph editor to visualize and edit graphs; a visual tool such as Gephi can also be used [12] |
| Amazon Neptune | Allows visualization of graphs using the Neptune Workbench |
| HyperGraphDB | Requires a third-party tool to visualize the graph |
| InfiniteGraph | InfiniteGraph Visualizer can be used [13] |
| TigerGraph | Visual Query Builder and Explore Graph page |

Use Cases

Generally, graph databases can be used in the areas of social network analysis, recommendation, and fraud detection. Besides these common use cases, each graph database can have specific use cases as well, which are listed below.

ArangoDB Neo4j AllegroGraph OrientDB Amazon Neptune HyperGraphDB InfiniteGraph TigerGraph
Dependency Management, Identity & Access Management, Master Data Management Artificial Intelligence, Machine learning, Social network analysis Geospatial, temporal reasoning, Network/IT operations, Graph search Forensic Analysis Network security, Knowledge Graphs, Drug Discovery Natural language processing, Semantic Web Search Network Management, Cyber Security, Bioinformatics Customer 360, AI, and machine learning, Real-Time Monitoring and Control of Dynamic Networks, Internet of Things

Functionality restrictions of Graph Databases

  1. Most commercial graph databases do not have a declarative query interface, which ultimately means they lack query optimization abilities.
  2. Most graph databases do not have distributed data management, which means that partitioning and distributing data across a network is not supported.
  3. The graph model is often restricted: the possibilities for defining data schemas and constraints are limited, which can result in data inconsistencies.
  4. Most graph databases lack vectored operations, i.e. procedures that sequentially write data from multiple buffers to a single data stream or read data from a data stream into multiple buffers; as a result, most do not support horizontal scaling.

Other limitations

  1. In most graph databases, it is difficult to efficiently extract a graph from non-graph data stores.
  2. Since most real-world graphs are very dynamic and generate large volumes of data at a rapid rate, it is challenging to store the historical trace compactly and execute queries efficiently at the same time.
  3. Current graph databases prioritize low-latency query execution over high-throughput data analytics.
  4. Parallelisation is crucial for big graphs, whose data cannot be handled efficiently by a single server; however, only a few graph databases implement it properly.
  5. Graph databases are inefficient when the graph datasets being queried are heterogeneous, incomplete, or inconsistent.
  6. Most graph databases lack a GUI for quickly adding new nodes and setting labels, properties, and relationships with a click. [14]

Resources

  1. "ArangoDB," https://www.arangodb.com/features-may-2018/
  2. "Neo4j," http://neo4j.org/
  3. "AllegroGraph," https://allegrograph.com/products/allegrograph/
  4. Iordanov, "HyperGraphDB: A Generalized Graph Database," in Proceedings of the 2010 International Conference on Web-Age Information Management (WAIM). Springer-Verlag, 2010, pp. 1-4. https://www.researchgate.net/publication/225204980_HyperGraphDB_A_Generalized_Graph_Database
  5. "OrientDB," https://orientdb.com/docs/last/index.html
  6. "Amazon Neptune," https://aws.amazon.com/neptune/
  7. "InfiniteGraph," http://infinitegraph.com/
  8. Yu Xu, Victor Lee, Mingxi Wu, Gaurav Deshpande, Alin Deutsch. Native Parallel Graphs: The Next Generation of Graph Database for Real-Time Deep Link Analytics, 2018. https://www.tigergraph.com/wp-content/uploads/2018/09/Native-Parallel-Graphs-The-Next-Generation-of-Graph-Database-for-Real-Time-Deep-Link-Analytics.pdf
  9. Renzo Angles. A Comparison of Current Graph Database Models. In 2012 IEEE 28th International Conference on Data Engineering Workshop. IEEE, Apr. 2012. https://www.researchgate.net/publication/261076480_A_Comparison_of_Current_Graph_Database_Models
  10. Diogo Fernandes and Jorge Bernardino. Graph Databases Comparison: AllegroGraph, ArangoDB, InfiniteGraph, Neo4j, and OrientDB. In Proceedings of the 7th International Conference on Data Science, Technology and Applications. SCITEPRESS - Science and Technology Publications, 2018. https://www.scitepress.org/Papers/2018/69102/69102.pdf
  11. Deepak Sigh Rawat and Navneet Kumar Kashyap. Graph Database: A Complete GDBMS Survey. IJIRST - International Journal for Innovative Research in Science & Technology, May 2017 (online). http://www.ijirst.org/articles/IJIRSTV3I12047.pdf
  12. "OrientDB," https://orientdb.com/docs/3.0.x/studio/working-with-data/graph-editor/
  13. InfiniteGraph. Meaningful Visualization of Connected Data. https://www.objectivity.com/meaningful-visualizations-of-connected-data/
  14. Pokorný, J. Graph Databases: Their Power and Limitations. Computer Information Systems and Industrial Management, Springer International Publishing, 2015, pp. 58-69. https://www.researchgate.net/publication/297735790_Graph_Databases_Their_Power_and_Limitations
