Graph Database

Graph Database Management Systems

Introduction

The limitations of traditional databases become more apparent when dealing with complex data such as clinical data, which has thousands of instances and complex relationships among them. Such highly connected data requires complex queries (in a relational database, typically many joins) to retrieve precise, in-depth information, which can be resource-intensive and time-consuming.

In addition, traditional databases such as relational databases fail to meet the changing requirements of current application domains, which demand flexibility and high performance as basic features of database management tools.

To cope with new projects, especially those where the relationships are as important as the entities, a new database technology called the graph database was developed, and today numerous powerful graph databases are available on the market.

Graph Database Technologies

As its name suggests, a graph database stores data together with its relationships in a graph-like structure, which makes traversing highly connected, complex data faster and easier: related records are reached by following stored links rather than by computing joins at query time. Depending on the application and its requirements, there are many graph databases to choose from, each with its own specialties and key features.
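A minimal sketch of why traversal is cheap in a graph model, assuming a hypothetical toy clinical graph held in a Python dictionary (the node names and the `reachable_within` helper are illustrative, not part of any product): each node keeps a direct list of its neighbours, so a multi-hop query is a breadth-first walk over stored links rather than a series of table joins.

```python
from collections import deque

# Hypothetical toy clinical graph: each node stores a direct list of its
# neighbours, so a traversal follows links instead of joining tables.
graph = {
    "patient:1": ["condition:diabetes"],
    "patient:2": ["condition:diabetes", "condition:hypertension"],
    "condition:diabetes": ["drug:metformin"],
    "condition:hypertension": ["drug:lisinopril"],
    "drug:metformin": [],
    "drug:lisinopril": [],
}


def reachable_within(graph, start, max_hops):
    """Breadth-first traversal: nodes reachable from start in at most max_hops steps, with their hop count."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = depth + 1
                queue.append(neighbour)
    del seen[start]  # report only the other nodes
    return seen


print(reachable_within(graph, "patient:2", 2))
# {'condition:diabetes': 1, 'condition:hypertension': 1, 'drug:metformin': 2, 'drug:lisinopril': 2}
```

The cost of each hop depends only on the neighbours of the current node, not on the total size of the dataset, which is the property the paragraph above relies on.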

Top graph databases according to an internet search

ArangoDB, Neo4j, OrientDB, AllegroGraph, Amazon Neptune, DataStax, HyperGraphDB, InfiniteGraph, sones, Filament, Titan, GraphDB

In the following section, we describe the most frequently used and most distinctive graph databases and then compare them.

ArangoDB: ArangoDB is a native multi-model database (key-value pairs, graphs, and JSON documents) in which all data models can be accessed with one declarative query language, AQL. All three data models can be combined and scaled horizontally to build highly efficient applications.[1]

Neo4j: Often considered the leading graph database, Neo4j uses Cypher as its query language. It implements an object-oriented (Java) API and offers Neo4j Bloom for visual exploration, as well as APOC procedures and graph analytics algorithm libraries that extend the functionality of Cypher.[2]

AllegroGraph: AllegroGraph is a multi-model graph technology (combining document and graph technologies) with high performance for highly complex and distributed data. It can handle billions of nodes effectively by combining efficient memory management with disk-based storage.[3]

HyperGraphDB: HyperGraphDB is a general-purpose, open-source graph database based on generalized hypergraphs, with a two-layered architecture for data organization: a primitive storage layer and a model layer.[4]

OrientDB: OrientDB is a distributed, multi-model graph database that supports schema-less, schema-full, and schema-mixed modes.[5]

Amazon Neptune: A fully managed graph database service that is fast, reliable, and highly secure, and that simplifies building and running applications that work with highly connected datasets.[6]

InfiniteGraph: InfiniteGraph is a distributed graph database implemented in Java with a C++ core, typically used in areas such as network management, healthcare, cybersecurity, bioinformatics, and social networking.[7]

TigerGraph: TigerGraph describes itself as the most scalable graph database for enterprises. It was built for real-time big graphs and designed to handle massive amounts of data and provide real-time analytics. It is based on a native parallel graph architecture, which overcomes the limitations of a general native graph by enabling faster data loading, faster execution of graph algorithms, and real-time streaming updates and insertions.[8]
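To illustrate how ArangoDB's description above combines the document and graph models, here is a minimal in-memory sketch. It assumes hypothetical collections (`patients`, `drugs`, and an edge collection `prescribed`) and a hypothetical `outbound` helper; the only ArangoDB convention borrowed is that edge documents point to vertex documents through their `_from` and `_to` attributes. It does not talk to a real server.

```python
# Hypothetical in-memory stand-ins for two ArangoDB document collections
# ("patients", "drugs") and one edge collection ("prescribed").
# Documents are plain JSON-like dicts; edge documents link them via "_from"/"_to".
patients = {"patients/p1": {"_key": "p1", "name": "Patient 1", "age": 54}}
drugs = {
    "drugs/metformin": {"_key": "metformin", "class": "biguanide"},
    "drugs/insulin": {"_key": "insulin", "class": "hormone"},
}
prescribed = [
    {"_from": "patients/p1", "_to": "drugs/metformin", "dose_mg": 500},
    {"_from": "patients/p1", "_to": "drugs/insulin", "dose_units": 10},
]
documents = {**patients, **drugs}


def outbound(vertex_id, edges):
    """One-hop outbound traversal: return (edge document, target document) pairs.

    Roughly what an AQL traversal such as
        FOR v, e IN 1..1 OUTBOUND 'patients/p1' prescribed RETURN { drug: v._key, edge: e }
    would produce against real collections.
    """
    return [(e, documents[e["_to"]]) for e in edges if e["_from"] == vertex_id]


for edge, drug in outbound("patients/p1", prescribed):
    print(drug["_key"], drug["class"])
```

The vertices stay ordinary documents that can be queried by attribute, while the edge documents add the graph structure on top of them.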

Motivation

To determine the most suitable graph database for the clinical knowledge graph, it is crucial to compare the databases on a common set of features. We also consider the key features of each graph database.

Comparison of Graph Database Models

Based on whether a graph database implements a query language, an API, and a GUI

Unlike relational databases, which share SQL as a standard, graph databases have no standard query language, so each graph database offers its own. For data operations and manipulation, graph databases usually provide APIs, and a built-in GUI is an added advantage.

Graph Database	Query Language	API	GUI
ArangoDB	AQL	Java	-
Neo4j	Cypher	Java	-
AllegroGraph	SPARQL	Java, Python, Perl, C#	✓
OrientDB	SQL or Gremlin	Java	-
Amazon Neptune	SPARQL or Gremlin	Supports open graph APIs	-
HyperGraphDB	SQL-style	Java	-
InfiniteGraph	Gremlin	Java	-
TigerGraph	GSQL	RESTful HTTP/JSON API	✓

Based on data storage support

To compare the databases from the perspective of data storage, we consider support for main-memory storage, external-memory storage, and online backup, as well as whether indexes are supported. Since graph databases deal with huge amounts of data, external-memory storage can be considered a main requirement.
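Index support matters mostly for finding the starting nodes of a query. A minimal sketch of the difference, assuming hypothetical nodes and hypothetical `scan`/`lookup` helpers (no specific database's index structure is implied): without an index every node is inspected, while a property index maps a value directly to the matching node ids.

```python
from collections import defaultdict

# Hypothetical nodes: id -> properties (including a label).
nodes = {
    "p1": {"label": "Patient", "age": 54},
    "p2": {"label": "Patient", "age": 61},
    "p3": {"label": "Patient", "age": 54},
    "d1": {"label": "Drug", "name": "metformin"},
}


def scan(label, key, value):
    """No index: inspect every node, O(n) per lookup."""
    return sorted(
        node_id for node_id, props in nodes.items()
        if props["label"] == label and props.get(key) == value
    )


# Property index built once: (label, property) -> value -> set of node ids.
index = defaultdict(lambda: defaultdict(set))
for node_id, props in nodes.items():
    for key, value in props.items():
        if key != "label":
            index[(props["label"], key)][value].add(node_id)


def lookup(label, key, value):
    """With the index: two dictionary lookups instead of a full scan."""
    return sorted(index.get((label, key), {}).get(value, set()))


print(scan("Patient", "age", 54), lookup("Patient", "age", 54))
```

Both functions return the same answer; the index trades extra storage and update work for lookups that do not grow with the number of nodes.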

Graph Database	Main Memory	External Memory	Backup	Indexes
ArangoDB	✓	✓	✓	✓
Neo4j	✓	✓	✓	✓
AllegroGraph	✓	✓	✓	✓
OrientDB	✓	✓	✓	✓
Amazon Neptune	✓	✓	✓	✓
HyperGraphDB	✓	✓	-	✓
InfiniteGraph	-	✓	✓	✓
TigerGraph	✓	✓	✓	✓

Based on the supported data model

Data in a graph database can be modeled as a property graph, a hypergraph, or a triple store. In the property graph model, data is maintained as nodes, relationships, and properties. What differentiates the property graph model from the simple graph model is that relationships can have properties as well.
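A minimal sketch of the property graph model, assuming hypothetical `Node` and `Relationship` classes and made-up clinical data: nodes carry labels and properties, and, unlike in a simple graph, the relationship itself also carries properties (here a dose and a start date).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    rel_type: str
    start: str  # node_id of the source node
    end: str    # node_id of the target node
    # Properties on the relationship itself: the feature a simple graph lacks.
    properties: dict = field(default_factory=dict)


# Hypothetical example data.
nodes = {
    "p1": Node("p1", {"Patient"}, {"age": 54}),
    "d1": Node("d1", {"Drug"}, {"name": "metformin"}),
}
relationships = [
    Relationship("PRESCRIBED", "p1", "d1", {"dose_mg": 500, "since": "2021-03-01"}),
]


def related(node_id, rel_type):
    """Return (target node, relationship properties) for outgoing relationships of one type."""
    return [
        (nodes[r.end], r.properties)
        for r in relationships
        if r.start == node_id and r.rel_type == rel_type
    ]


for drug, rel_props in related("p1", "PRESCRIBED"):
    print(drug.properties["name"], rel_props["dose_mg"])  # metformin 500
```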

Another frequently used data model is the triple store model, also called RDF (Resource Description Framework). RDF organizes data as subject-predicate-object triples. Each element of a triple (subject, predicate, object) is stored independently, and the three are logically linked.
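A minimal sketch of a triple store, assuming hypothetical `ex:` identifiers and a hypothetical `match` helper: every fact is one subject-predicate-object triple, and a query is a set of triple patterns in which unknown positions act as wildcards, similar in spirit to variables in a SPARQL pattern. This is a toy illustration, not how any of the compared products store triples.

```python
# Hypothetical triples using prefixed names; a real RDF store would use full IRIs.
triples = {
    ("ex:patient1", "ex:hasCondition", "ex:Diabetes"),
    ("ex:patient2", "ex:hasCondition", "ex:Diabetes"),
    ("ex:patient2", "ex:hasCondition", "ex:Hypertension"),
    ("ex:Diabetes", "ex:treatedBy", "ex:Metformin"),
}


def match(s=None, p=None, o=None):
    """Return matching triples, sorted; None acts as a wildcard, like a SPARQL variable."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Two patterns joined on the condition: "which patients have a condition treated by Metformin?"
conditions = {t[0] for t in match(p="ex:treatedBy", o="ex:Metformin")}
patients = sorted({t[0] for c in conditions for t in match(p="ex:hasCondition", o=c)})
print(patients)  # ['ex:patient1', 'ex:patient2']
```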

Some graph databases also store data as hypergraphs, in which a single link (a hyperedge) can connect any number of nodes, allowing data to be modeled more compactly and reducing the complexity of the representation.[4]
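A minimal sketch of why hyperedges can be more compact, assuming a hypothetical `HyperEdge` class and a made-up prescription fact: one hyperedge links four entities directly, whereas a simple graph needs an extra intermediate "event" node plus one binary edge per entity to express the same fact. HyperGraphDB's actual storage layout is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperEdge:
    label: str
    members: tuple  # any number of node ids; order gives the direction/roles


# Hypothetical fact: one prescription event involving four entities.
prescription = HyperEdge(
    "PRESCRIPTION", ("patient:1", "drug:metformin", "doctor:7", "visit:42")
)


def to_binary_edges(edge):
    """Encode the same fact in a simple graph: one extra 'event' node plus a binary edge per member."""
    event_node = f"{edge.label}#event"
    return event_node, [(event_node, member) for member in edge.members]


event_node, binary_edges = to_binary_edges(prescription)
print(f"hypergraph: 1 edge over {len(prescription.members)} nodes")
print(f"simple graph: 1 extra node ({event_node}) and {len(binary_edges)} edges")
```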

Graph Database	Data model	Graph type
ArangoDB	Besides key-value and document stores, ArangoDB also supports a graph store	Property graph
Neo4j	Neo4j uses a graph data model and native graph storage	Property graph
AllegroGraph	AllegroGraph is a closed-source triplestore designed to store Resource Description Framework (RDF) triples, a standard format for linked data	Property graph, Hypergraph
OrientDB	Supports graph, object, document, and key/value data models	Property graph
Amazon Neptune	Supports the property graph model and RDF	Property graph, Triple store (RDF)
HyperGraphDB	Supports a directed hypergraph model	Hypergraph
InfiniteGraph	Labeled directed multigraph	Property graph
TigerGraph	Uses a graph data model and native graph storage	Property graph

Based on sharding support and ACID compliance

Sharding is the ability of a database to split a large dataset into smaller parts called shards, which are easier to manage, can be placed on different servers, and can be processed faster.
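A minimal sketch of hash-based sharding, assuming a hypothetical three-shard cluster and a hypothetical `shard_for` function (real products use their own placement strategies): each node id is mapped deterministically to a shard. It also shows why sharding a graph is harder than sharding independent records, since an edge whose endpoints land on different shards needs a network hop to traverse.

```python
import hashlib

NUM_SHARDS = 3  # hypothetical cluster of three servers


def shard_for(node_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a node id to a shard with a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(node_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


node_ids = [f"patient:{i}" for i in range(10)]
shards = {s: [] for s in range(NUM_SHARDS)}
for node_id in node_ids:
    shards[shard_for(node_id)].append(node_id)

# Edges whose endpoints land on different shards need a network hop to traverse.
edges = [("patient:0", "patient:1"), ("patient:2", "patient:3"), ("patient:4", "patient:5")]
cross_shard = [(a, b) for a, b in edges if shard_for(a) != shard_for(b)]

for s, members in shards.items():
    print(s, members)
print("cross-shard edges:", cross_shard)
```

Hashing spreads nodes evenly but ignores connectivity; graph-aware partitioning tries to keep tightly connected nodes on the same shard to reduce cross-shard edges.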

Graph Database	Sharding	ACID compliant
ArangoDB	Yes	Yes
Neo4j	No	Yes
AllegroGraph	Yes	Yes
OrientDB	Yes	Yes
Amazon Neptune	No	Yes
HyperGraphDB	No	Not durable
InfiniteGraph	No	Yes
TigerGraph	Yes	Yes

Based on support for graph visualization

Good visualization is a critical component that a graph database should include for interpreting the data. How easily the graph can be visualized should also be considered.

Graph Database	Graph Visualization
ArangoDB	Graph Viewer included
Neo4j	Neo4j Bloom included
AllegroGraph	Uses Gruff's visualization capabilities via a web browser
OrientDB	Offers a graph editor to visualize and edit graphs
Amazon Neptune	Allows graph visualization using the Neptune Workbench
HyperGraphDB	No
InfiniteGraph	No
TigerGraph	Yes

Use Cases

Graph Database	Use Cases
ArangoDB	Dependency management, identity and access management, master data management
Neo4j	Artificial intelligence, machine learning, recommendation, social network analysis
AllegroGraph	Geo-temporal reasoning, social network analysis
OrientDB	Fraud detection, network/IT operations, graph search, recommendation engines, forensic analysis
Amazon Neptune	Recommendation engines, fraud detection, knowledge graphs, drug discovery
HyperGraphDB	Natural language processing, semantic web search
InfiniteGraph	Network management, cybersecurity, bioinformatics
TigerGraph	Fraud detection, customer 360, AI and machine learning, real-time monitoring and control of dynamic networks, Internet of Things

Functionality Restrictions of Graph Databases

Most commercial graph databases do not have a declarative query interface, which means they lack query optimization capabilities.
Most graph databases lack distributed data management, i.e., they do not support partitioning and distributing data across a network.
The graph model is often restricted: the possibilities for defining data schemas and constraints are limited, which can result in data inconsistencies.
They rely on a procedure that sequentially writes data from multiple buffers to a single data stream, or reads data from a data stream into multiple buffers; in other words, most graph databases do not support horizontal scaling.
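The procedure described in the last item is commonly known as vectored, or scatter/gather, I/O. A minimal illustration, assuming a Unix-like system where Python's `os.writev` and `os.readv` are available and using a temporary file as the hypothetical data stream: a gather write sends several buffers to one stream in a single call, and a scatter read splits one stream back into several buffers. Everything happens on a single machine, which is the contrast with horizontal (multi-server) scaling.

```python
import os
import tempfile

# Gather write: three separate buffers go into one file with a single call.
buffers = [b"node:", b"patient:1", b"\n"]

with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    written = os.writev(fd, buffers)
    os.lseek(fd, 0, os.SEEK_SET)

    # Scatter read: one stream is split back into pre-sized buffers, in order.
    parts = [bytearray(len(b)) for b in buffers]
    read = os.readv(fd, parts)

print(written, read, [bytes(p) for p in parts])
```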

Other Limitations

In most graph databases, it is difficult to efficiently extract a graph from non-graph data stores.
Because most real-world graphs are highly dynamic and generate large volumes of data at a rapid rate, it is challenging to store their historical trace compactly while also executing queries efficiently.
Current graph databases prioritize low-latency query execution over high-throughput data analytics.
Parallelization is crucial for big graphs, which often cannot be handled efficiently by a single server; however, only a few graph databases implement parallelization properly.
Graph databases are inefficient when the graph datasets being queried are heterogeneous, incomplete, or inconsistent.
Most graph databases lack a GUI for quickly adding new nodes and setting labels, properties, and relationships with a click.

Resources

1. "ArangoDB," https://www.arangodb.com/features-may-2018/
2. "Neo4j," http://neo4j.org/
3. "AllegroGraph," https://allegrograph.com/products/allegrograph/
4. Borislav Iordanov. "HyperGraphDB: A Generalized Graph Database." In Proceedings of the 2010 International Conference on Web-Age Information Management (WAIM). Springer-Verlag, 2010, pp. 1-4. https://www.researchgate.net/publication/225204980_HyperGraphDB_A_Generalized_Graph_Database
5. "OrientDB"
6. "Amazon Neptune," https://aws.amazon.com/neptune/
7. "InfiniteGraph," http://infinitegraph.com/
8. Yu Xu, Victor Lee, Mingxi Wu, Gaurav Deshpande, and Alin Deutsch. "Native Parallel Graphs: The Next Generation of Graph Database for Real-Time Deep Link Analytics." 2018. https://www.tigergraph.com/wp-content/uploads/2018/09/Native-Parallel-Graphs-The-Next-Generation-of-Graph-Database-for-Real-Time-Deep-Link-Analytics.pdf
9. Renzo Angles. "A Comparison of Current Graph Database Models." In 2012 IEEE 28th International Conference on Data Engineering Workshops. IEEE, April 2012. https://www.researchgate.net/publication/261076480_A_Comparison_of_Current_Graph_Database_Models
10. Diogo Fernandes and Jorge Bernardino. "Graph Databases Comparison: AllegroGraph, ArangoDB, InfiniteGraph, Neo4j, and OrientDB." In Proceedings of the 7th International Conference on Data Science, Technology and Applications. SCITEPRESS - Science and Technology Publications, 2018. https://www.scitepress.org/Papers/2018/69102/69102.pdf
11. Deepak Singh Rawat and Navneet Kumar Kashyap. "Graph Database: A Complete GDBMS Survey." IJIRST, International Journal for Innovative Research in Science & Technology, May 2017. http://www.ijirst.org/articles/IJIRSTV3I12047.pdf
