Graph Database
The limitations of traditional databases become more apparent when dealing with complex data such as clinical data, which has thousands of instances and complex relationships among them. Such associative data requires complex queries to retrieve precise and in-depth information, which can be resource-intensive and time-consuming.
In addition, traditional databases such as relational databases fail to meet the changing requirements of current application domains, which demand flexibility and high performance as basic requirements of database management tools.
To cope with new projects, especially those where the relations are as important as the entities, a new database technology called the graph database was developed, which ultimately resulted in the numerous powerful graph databases available on the market today.
As its name suggests, a graph database stores the data as well as its relations in a graph-like structure, which makes traversing highly connected and complex data faster and easier. Depending on the application and its requirements, there are plenty of graph databases to choose from, each with its own specialty and key features:
ArangoDB, Neo4j, OrientDB, AllegroGraph, Amazon Neptune, DataStax, HyperGraphDB, InfiniteGraph, sones, Filament, Titan, GraphDB
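To illustrate why storing relations next to the data makes traversal cheap, here is a minimal Python sketch. It is not taken from any of the databases listed above; the toy clinical graph, the node names, and the `reachable` helper are all hypothetical. Each node maps directly to its neighbours, so following a relation is a dictionary lookup rather than a join.

```python
from collections import deque

# Hypothetical toy graph: every node stores its outgoing relations directly.
graph = {
    "patient:1": ["diagnosis:diabetes", "drug:metformin"],
    "diagnosis:diabetes": ["drug:metformin", "drug:insulin"],
    "drug:metformin": [],
    "drug:insulin": [],
}


def reachable(graph, start, max_hops):
    """Breadth-first traversal: map each node within max_hops of start to its hop count."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return seen


if __name__ == "__main__":
    print(reachable(graph, "patient:1", 2))
```

The cost of each hop depends only on the number of neighbours of the current node, not on the total size of the dataset, which is the property the paragraph above refers to.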
The following sections describe the most frequently used and most distinctive graph databases and then compare them.
ArangoDB: A fully managed, multi-model graph database (key-value pairs, graphs, and JSON documents) that can be accessed with a single declarative query language, AQL. All three data models can be used together and scaled horizontally to build highly efficient applications. [1]
Neo4j: Can be considered the leading graph database; it uses Cypher as its query language. It provides an object-oriented API, Neo4j Bloom for visual exploration, and APOC procedures and graph analytics algorithm libraries that extend the functionality of Cypher. [2]
AllegroGraph: A multi-model graph technology (it combines document and graph technologies) with high performance on highly complex and distributed data. AllegroGraph can handle billions of nodes effectively by combining efficient memory management with disk-based storage. [3]
HyperGraphDB: A general-purpose, open-source graph database based on generalized hypergraphs, with a two-layered architecture for data organization (a primitive storage layer and a model layer). [4]
OrientDB: A distributed, multi-model graph database that supports schema-less, schema-full, and schema-mixed modes. [5]
Amazon Neptune: A fully managed graph database that is fast, reliable, and highly secure, and that simplifies building and running applications with highly connected datasets. [6]
InfiniteGraph: A distributed graph database implemented in Java with a C++ core, typically used in areas such as network management, healthcare, cybersecurity, bioinformatics, and social networking. [7]
TigerGraph: The self-proclaimed most scalable graph database for enterprises, built for real-time big graphs and designed to cope with massive amounts of data and deliver real-time analytics. It is based on a native parallel graph, which overcomes the limitations of a general native graph by enabling faster data loading, faster execution of graph algorithms, and real-time streaming updates and insertions. [8]
To determine the most suitable graph database for the clinical knowledge graph, it is crucial to compare the databases on a set of common features. We will also consider the key features of each graph database.
Unlike traditional databases, graph databases do not have a standard query language, so each graph database offers its own. For data operations and manipulation, graph databases usually provide APIs, and a built-in GUI is a plus.
| Graph Database | Querying language | API | GUI |
|---|---|---|---|
| ArangoDB | AQL | Java | - |
| Neo4j | Cypher | Java | - |
| AllegroGraph | SPARQL | Java, Python, Perl, C# | ✓ |
| OrientDB | SQL or Gremlin | Java | - |
| Amazon Neptune | SPARQL or Gremlin | Supports open graph APIs | - |
| HyperGraphDB | SQL-style | Java | - |
| InfiniteGraph | Gremlin | Java | - |
| TigerGraph | GSQL | RESTful HTTP/JSON API | ✓ |
To compare the databases from the perspective of data storage, we considered main memory, external memory, and online backup, as well as whether indexes are supported. Since graph databases deal with huge amounts of data, external memory storage can be considered the main requirement.
| Graph Database | Main Memory | External Memory | Backup | Indexes |
|---|---|---|---|---|
| ArangoDB | ✓ | ✓ | ✓ | ✓ |
| Neo4j | ✓ | ✓ | ✓ | ✓ |
| AllegroGraph | ✓ | ✓ | ✓ | ✓ |
| OrientDB | ✓ | ✓ | ✓ | ✓ |
| Amazon Neptune | ✓ | ✓ | ✓ | ✓ |
| HyperGraphDB | ✓ | ✓ | - | ✓ |
| InfiniteGraph | - | ✓ | ✓ | ✓ |
| TigerGraph | ✓ | ✓ | ✓ | ✓ |
Data in a graph database can be modeled as a property graph, a hypergraph, or a triple store. In the property graph model, data is maintained as nodes, relationships, and properties. What differentiates the property graph model from the simple graph model is that it allows relationships to have properties as well.
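As a rough illustration of the property graph model, the following Python sketch represents nodes and relationships as plain data classes. The class names, field names, and the clinical example values are assumptions made for this sketch and do not correspond to the storage format of any particular database. The point to notice is that the relationship carries its own properties.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    source: str  # id of the start node
    target: str  # id of the end node
    rel_type: str
    # In a property graph, relationships can hold properties too.
    properties: dict = field(default_factory=dict)


# Hypothetical example data.
patient = Node("p1", "Patient", {"age": 54})
drug = Node("d1", "Drug", {"name": "metformin"})
prescribed = Relationship("p1", "d1", "PRESCRIBED", {"dose_mg": 500})


def relationships_with(rels, rel_type, **props):
    """Return relationships of a given type whose properties match all given values."""
    return [
        r for r in rels
        if r.rel_type == rel_type
        and all(r.properties.get(k) == v for k, v in props.items())
    ]


if __name__ == "__main__":
    print(relationships_with([prescribed], "PRESCRIBED", dose_mg=500))
```

In a simple graph the `PRESCRIBED` link could only say that the two nodes are connected; here the dose is stored on the link itself.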
Another frequently used data model is the triple store model, also referred to as the RDF (Resource Description Framework) model. RDF organizes data as subject-predicate-object triples. Each element (subject, predicate, object) is stored independently as a node, and the elements are logically linked.
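The subject-predicate-object idea can be sketched in a few lines of Python. This is only an in-memory illustration with made-up identifiers; it does not use real RDF IRIs or SPARQL, and the `match` helper is a hypothetical stand-in for a pattern query where `None` acts as a wildcard.

```python
# Hypothetical triples: (subject, predicate, object).
triples = {
    ("patient:1", "hasDiagnosis", "diabetes"),
    ("patient:1", "takesDrug", "metformin"),
    ("metformin", "treats", "diabetes"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern, sorted; None matches anything."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


if __name__ == "__main__":
    # Everything known about patient:1.
    print(match(triples, subject="patient:1"))
    # Everything that points at "diabetes".
    print(match(triples, obj="diabetes"))
```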
Some graph databases also store data as hypergraphs, in which a link (hyperedge) can connect any number of nodes, allowing data to be modeled more compactly (reducing the complexity of the representation). [4]
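A small Python sketch can make the compactness argument concrete. The hyperedge names and members below are hypothetical, and the representation (a dictionary of frozensets) is just one simple way to model a hypergraph, not the layout HyperGraphDB uses. One hyperedge over n nodes replaces the n(n-1)/2 pairwise edges an ordinary graph would need to connect the same group.

```python
# Hypothetical hyperedges: each one connects an arbitrary set of nodes.
hyperedges = {
    "combination_therapy": frozenset({"patient:1", "drug:metformin", "drug:insulin"}),
    "tumor_board": frozenset({"patient:1", "doctor:a", "doctor:b", "doctor:c"}),
}


def edges_containing(hyperedges, node):
    """Names of all hyperedges that include the given node, sorted."""
    return sorted(name for name, members in hyperedges.items() if node in members)


def pairwise_edge_count(members):
    """Number of binary edges needed to connect every pair in the same group."""
    n = len(members)
    return n * (n - 1) // 2


if __name__ == "__main__":
    print(edges_containing(hyperedges, "patient:1"))
    print(pairwise_edge_count(hyperedges["tumor_board"]))
```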
| Graph Database | Data model | Graph type |
|---|---|---|
| ArangoDB | Besides key-value store and document store, ArangoDB also supports graph store | Property graph |
| Neo4j | Uses a graph data model and native graph storage | Property graph |
| AllegroGraph | A closed-source triplestore designed to store Resource Description Framework (RDF) triples, a standard format for linked data | Property graph, Hypergraph |
| OrientDB | Supports graph, object, document, and key/value data models | Property graph |
| Amazon Neptune | Supports graph models and RDF | Property graph |
| HyperGraphDB | Supports a directed hypergraph model | Hypergraph |
| InfiniteGraph | Labeled directed multigraph | Property graph |
| TigerGraph | Uses a graph data model and native graph storage | Property graph |
Sharding is the ability of a database to break a large dataset into smaller parts, called shards, which are easier to manage and faster to query.
| Graph Database | Sharding | ACID compliant |
|---|---|---|
| ArangoDB | Yes | Yes |
| Neo4j | No | Yes |
| AllegroGraph | Yes | Yes |
| OrientDB | Yes | Yes |
| Amazon Neptune | No | Yes |
| HyperGraphDB | No | Not durable |
| InfiniteGraph | No | Yes |
| TigerGraph | Yes | Yes |
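The idea of sharding described above can be sketched with simple hash-based partitioning. This is a generic illustration only: the function names are hypothetical, and none of the databases in the table is claimed to shard this way (graph databases typically also have to decide how to handle relationships that cross shards, which this sketch ignores).

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Split keys into num_shards lists according to shard_for."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


if __name__ == "__main__":
    node_ids = [f"patient:{i}" for i in range(10)]
    for index, shard in enumerate(partition(node_ids, 3)):
        print(index, shard)
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash`, because the built-in string hash is randomized per process and would assign keys to different shards on each run.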
To interpret the data, good visualization is a critical component that a graph database should include. The ease of visualizing the graph should also be considered.
| Graph Database | Graph Visualization |
|---|---|
| ArangoDB | Graph Viewer included |
| Neo4j | Neo4j Bloom included |
| AllegroGraph | Uses Gruff's visualization capability via web browser |
| OrientDB | Offers a graph editor to visualize and edit the graphs |
| Amazon Neptune | Allows visualization of graphs using the Neptune Workbench |
| HyperGraphDB | No |
| InfiniteGraph | No |
| TigerGraph | Yes |
| ArangoDB | Neo4j | AllegroGraph | OrientDB | Amazon Neptune | HyperGraphDB | InfiniteGraph | TigerGraph |
|---|---|---|---|---|---|---|---|
| Dependency Management, Identity & Access Management, Master Data Management | Artificial Intelligence, Machine Learning, Recommendation, Social Network Analysis | Geo-temporal Reasoning, Social Network Analysis | Fraud Detection, Network/IT Operations, Graph Search, Recommendation Engines, Forensic Analysis | Recommendation Engines, Fraud Detection, Knowledge Graphs, Drug Discovery | Natural Language Processing, Semantic Web Search | Network Management, Cyber Security, Bioinformatics | Fraud Detection, Customer 360, AI and Machine Learning, Real-Time Monitoring and Control of Dynamic Networks, Internet of Things |
- Most commercial graph databases do not have a declarative query interface, which ultimately means they lack query optimization abilities.
- Most graph databases do not have distributed data management, which means that partitioning and distributing data across a network is not supported.
- The graph model is often restricted because the possibilities for defining data schemas and constraints are limited, which results in data inconsistencies.
- They support a procedure that sequentially writes data from multiple buffers to a single data stream, or reads data from a data stream into multiple buffers; i.e., most databases do not support horizontal scaling.
- In most graph databases, it is difficult to efficiently extract a graph from non-graph data stores.
- Since most real-world graphs are very dynamic and generate large volumes of data at a rapid rate, it is challenging to store the historical trace compactly and execute queries efficiently at the same time.
- Current graph databases prioritize low-latency query execution over high-throughput data analytics.
- Parallelization is crucial in the context of big graphs so that the data can be handled efficiently by one server; however, only a few graph databases implement parallelization properly.
- Graph databases are inefficient if the graph datasets being queried are heterogeneous, incomplete, or inconsistent.
- Most graph databases lack a GUI for quickly adding new nodes and setting labels, properties, and relationships with a click.
1. "ArangoDB," https://www.arangodb.com/features-may-2018/
2. "Neo4j," http://neo4j.org/
3. "AllegroGraph," https://allegrograph.com/products/allegrograph/
4. Iordanov, "HyperGraphDB: A Generalized Graph Database," in Proceedings of the 2010 International Conference on Web-Age Information Management (WAIM). Springer-Verlag, 2010, pp. 1-4. https://www.researchgate.net/publication/225204980_HyperGraphDB_A_Generalized_Graph_Database
5. "OrientDB"
6. "Amazon Neptune," https://aws.amazon.com/neptune/
7. "InfiniteGraph," http://infinitegraph.com/
8. Yu Xu, Victor Lee, Mingxi Wu, Gaurav Deshpande, Alin Deutsch. Native Parallel Graphs: The Next Generation of Graph Database for Real-Time Deep Link Analytics, 2018. https://www.tigergraph.com/wp-content/uploads/2018/09/Native-Parallel-Graphs-The-Next-Generation-of-Graph-Database-for-Real-Time-Deep-Link-Analytics.pdf
9. Renzo Angles. A Comparison of Current Graph Database Models. In _2012 IEEE 28th International Conference on Data Engineering Workshops._ IEEE, Apr 2012. https://www.researchgate.net/publication/261076480_A_Comparison_of_Current_Graph_Database_Models
10. Diogo Fernandes and Jorge Bernardino. Graph Databases Comparison: AllegroGraph, ArangoDB, InfiniteGraph, Neo4j, and OrientDB. In Proceedings of the 7th International Conference on Data Science, Technology and Applications. SCITEPRESS - Science and Technology Publications, 2018. https://www.scitepress.org/Papers/2018/69102/69102.pdf
11. Deepak Singh Rawat and Navneet Kumar Kashyap. Graph Database: A Complete GDBMS Survey. IJIRST - International Journal for Innovative Research in Science & Technology, May 2017 (online). http://www.ijirst.org/articles/IJIRSTV3I12047.pdf