Named graph organization
As different people have used csv2rdf4lod-automation, two complementary approaches to naming and populating named graphs have arisen:
- Source-based named graphs
- Content-based named graphs
Because of its nature, csv2rdf4lod-automation provides a source-based organization for the named graphs it creates in the triple stores it loads: the graphs in an endpoint are named according to the same three-attribute (source, dataset, version) scheme as the datasets. We do this to minimize uncertainty about where a dataset is, because answering three rigid questions leads to its name and location. However, many consumers "do not care" where the data came from; for most of them, this is a secondary consideration. Content-based organization is better suited to specific applications and use cases. One concern with content-based organization is that a multitude of domain-specific and individual perspectives can be applied to how the content "should" be organized. Instead of asking and answering just three questions, a consumer of a content-based organization might need to answer an inordinate number of questions to determine a graph's name and where to find it.
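To illustrate how three answers can determine a graph name, here is a minimal sketch in Python. The URI layout (`<base>/source/<source>/dataset/<dataset>/version/<version>`), the function name `source_graph_name`, and the example identifiers are assumptions made for illustration; check the project's own naming documentation for the exact pattern it uses.

```python
from urllib.parse import quote


def source_graph_name(base_uri: str, source: str, dataset: str, version: str) -> str:
    """Build a named-graph URI from the (source, dataset, version) attributes.

    The path layout used here is an assumed convention for illustration,
    not a statement of what csv2rdf4lod-automation emits.
    """
    parts = {"source": source, "dataset": dataset, "version": version}
    for label, value in parts.items():
        if not value:
            raise ValueError(f"{label} identifier must not be empty")
    # Percent-encode each identifier so it is safe as a single path segment.
    path = "/".join(f"{label}/{quote(value, safe='')}" for label, value in parts.items())
    return f"{base_uri.rstrip('/')}/{path}"


if __name__ == "__main__":
    # Hypothetical identifiers, chosen only to show the shape of the result.
    print(source_graph_name("http://example.org", "data-gov", "1008", "2010-Jul-21"))
```

Because the name is a pure function of the three attributes, anyone who knows a dataset's source, dataset identifier, and version can compute its graph name without querying the endpoint.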
Fortunately, starting with a source-based organization can provide a solid foundation for growing and changing content-based organization needs. Arbitrary graph naming schemes can be used, and the resulting graphs can be populated in either of two ways:
- pvloading entire dump files from source-based datasets
- pvloading queries that draw from source-based datasets into the content-based graphs
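The two options above can be approximated in plain SPARQL 1.1 Update, which may help clarify the difference between them. The sketch below is not how pvload itself works; it only builds the update strings, and the function names and graph URIs are hypothetical. The first function copies every triple from a source-based graph into a content-based graph, while the second copies only the triples that match a caller-supplied pattern.

```python
def copy_whole_graph(source_graph: str, content_graph: str) -> str:
    """SPARQL 1.1 Update that adds all triples of one graph to another.

    Analogous to loading an entire dump into a content-based graph.
    """
    return f"ADD GRAPH <{source_graph}> TO GRAPH <{content_graph}>"


def copy_by_pattern(source_graph: str, content_graph: str, pattern: str) -> str:
    """SPARQL 1.1 Update that copies only the triples matching `pattern`.

    Analogous to populating a content-based graph from a query over a
    source-based graph. `pattern` is a basic graph pattern such as
    "?s a <http://example.org/vocab/Hospital>".
    """
    return (
        f"INSERT {{ GRAPH <{content_graph}> {{ {pattern} }} }}\n"
        f"WHERE  {{ GRAPH <{source_graph}> {{ {pattern} }} }}"
    )


if __name__ == "__main__":
    # Hypothetical graph names, used only for illustration.
    src = "http://example.org/source/data-gov/dataset/1008/version/2010-Jul-21"
    dst = "http://example.org/graph/hospitals"
    print(copy_whole_graph(src, dst))
    print(copy_by_pattern(src, dst, "?s a <http://example.org/vocab/Hospital>"))
```

In both cases the source-based graph is left untouched, so the content-based graph can be rebuilt or reorganized later without losing the original data.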
The advantage of starting with source-based organization for the graphs in a triple store is that it provides consistency. The advantage of creating content-based graphs within the same triple store is that data consumers can access the data that interests them faster and with less distraction. The advantage of constructing content-based graphs drawn from source-based graphs is that we maintain the provenance required to trace all the way back to the original source.