Dataset granularities: Abstract vs. Versioned vs. Layer
timrdf edited this page Mar 14, 2013 · 26 revisions
When csv2rdf4lod converts tabular data to RDF, it also asserts metadata about the RDF using the conversion vocabulary. One type of metadata that it asserts is void:Dataset details, which lets us group collections of triples in a hierarchical fashion. csv2rdf4lod groups triples into three specific types of void:Dataset:
- conversion:AbstractDataset (subclass of void:Dataset)
- conversion:VersionedDataset (subclass of void:Dataset that is a void:subset of conversion:AbstractDataset)
- conversion:LayerDataset (subclass of void:Dataset that is a void:subset of conversion:VersionedDataset)
So, the "largest" void:Dataset in the list above is conversion:AbstractDataset, and the "smallest" is conversion:LayerDataset.
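The nesting described above can be sketched in a few lines of Python. This is only an illustration, not csv2rdf4lod's own code: the `Dataset` class, the `triples` helper, and the names `:abstract`, `:versioned`, and `:layer` are hypothetical placeholders. It follows the VoID convention that `void:subset` points from the containing dataset to the dataset it contains.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Dataset:
    """A void:Dataset with the smaller datasets it contains (hypothetical helper)."""
    name: str       # placeholder prefixed name, e.g. ":abstract"
    rdf_type: str   # one of the three conversion: classes
    subsets: List["Dataset"] = field(default_factory=list)


def triples(ds: Dataset) -> Iterator[Tuple[str, str, str]]:
    """Walk the hierarchy top-down, yielding (subject, predicate, object)."""
    yield (ds.name, "a", ds.rdf_type)
    for sub in ds.subsets:
        # void:subset points from the larger dataset to the smaller one.
        yield (ds.name, "void:subset", sub.name)
        yield from triples(sub)


# Build the hierarchy from smallest to largest.
layer = Dataset(":layer", "conversion:LayerDataset")
versioned = Dataset(":versioned", "conversion:VersionedDataset", [layer])
abstract = Dataset(":abstract", "conversion:AbstractDataset", [versioned])

for s, p, o in triples(abstract):
    print(f"{s} {p} {o} .")
```

Each printed line reads as a Turtle-style statement, with the abstract dataset at the top and the layer dataset at the bottom of the chain.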
As described in [the naming phase](Conversion process phase: name), three provenance-related aspects are used to organize any third-party data that are retrieved. Each aspect is given a short, URI-friendly identifier string:
- source identifier - indicates the person/agent/organization that provided the data. For example, "epa-gov" provides a variety of datasets.
- dataset identifier - indicates the logical group of data. For example, the EPA distinguishes between the datasets "photochemical-assessment-monitoring-stations-pams" and "historical-radnet-air-quality-data".
- version identifier - indicates the version of a particular dataset. For example, one curator could retrieve version "r23", a designation from the EPA itself, while another could label the version "2013-Mar-14" because the EPA did not provide a version designation for what it published.
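As a rough sketch of how the three identifiers fit together, the Python below bundles them into one value, checks that each is URI-friendly, and composes dataset URIs from them. Several things here are assumptions not stated on this page: the `DatasetName` class is hypothetical, "URI-friendly" is taken to mean letters, digits, and single hyphens, the `source/.../dataset/.../version/...` path layout is assumed for illustration, and `http://example.org` is a placeholder base URI.

```python
import re
from typing import Dict, NamedTuple

# Assumption: "URI-friendly" means runs of letters/digits joined by single hyphens.
URI_FRIENDLY = re.compile(r"^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*$")


class DatasetName(NamedTuple):
    """The three provenance-related identifiers (hypothetical helper)."""
    source: str    # who provided the data, e.g. "epa-gov"
    dataset: str   # the logical group of data
    version: str   # the version of that dataset

    def validate(self) -> "DatasetName":
        """Raise ValueError if any identifier is not URI-friendly."""
        for label, value in self._asdict().items():
            if not URI_FRIENDLY.match(value):
                raise ValueError(f"{label} identifier {value!r} is not URI-friendly")
        return self

    def uris(self, base: str = "http://example.org") -> Dict[str, str]:
        """Compose URIs under an assumed path layout (illustrative only)."""
        abstract = f"{base}/source/{self.source}/dataset/{self.dataset}"
        return {
            "abstract": abstract,
            "versioned": f"{abstract}/version/{self.version}",
        }


name = DatasetName(
    source="epa-gov",
    dataset="historical-radnet-air-quality-data",
    version="2013-Mar-14",
).validate()

for kind, uri in name.uris().items():
    print(kind, uri)
```

Under this layout the versioned URI simply extends the abstract one, which mirrors the way a versioned dataset sits inside its abstract dataset.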