
Dataset granularities: Abstract vs. Versioned vs. Layer

timrdf edited this page Mar 14, 2013 · 26 revisions

When csv2rdf4lod converts tabular data to RDF, it also asserts metadata about the RDF using the conversion vocabulary. One type of metadata that it asserts is void:Dataset details, which lets us group collections of triples in a hierarchical fashion. csv2rdf4lod groups triples into three specific types of void:Dataset:

  • conversion:AbstractDataset (subclass of void:Dataset)
    • conversion:VersionedDataset (subclass of void:Dataset that is a void:subset of conversion:AbstractDataset)
      • conversion:LayerDataset (subclass of void:Dataset that is a void:subset of conversion:VersionedDataset)

So, the "largest" void:Dataset in the list above is conversion:AbstractDataset, and the "smallest" is conversion:LayerDataset.
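The nesting can be sketched as a handful of triples. This is only an illustration, not csv2rdf4lod's actual output: the three dataset URIs are hypothetical, the conversion namespace IRI is assumed, and triples are modeled as plain Python tuples rather than with an RDF library. Note that in VoID the statement "X void:subset Y" says that Y is a subset of X, so the links point from the larger dataset to the smaller one.

```python
# VOID is the standard VoID namespace. CONVERSION is assumed here to be the
# namespace of the csv2rdf4lod conversion vocabulary.
VOID = "http://rdfs.org/ns/void#"
CONVERSION = "http://purl.org/twc/vocab/conversion/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical dataset URIs, for illustration only.
abstract = "http://example.org/dataset/abstract"
versioned = "http://example.org/dataset/abstract/version/1"
layer = "http://example.org/dataset/abstract/version/1/layer/1"

triples = [
    (abstract, RDF_TYPE, CONVERSION + "AbstractDataset"),
    (versioned, RDF_TYPE, CONVERSION + "VersionedDataset"),
    (layer, RDF_TYPE, CONVERSION + "LayerDataset"),
    # "X void:subset Y" means Y is a subset of X.
    (abstract, VOID + "subset", versioned),
    (versioned, VOID + "subset", layer),
]


def subsets_of(dataset, triples):
    """Return every dataset reachable from `dataset` by following void:subset."""
    found = []
    frontier = [dataset]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == VOID + "subset" and o not in found:
                found.append(o)
                frontier.append(o)
    return found
```

Calling `subsets_of(abstract, triples)` walks down from the "largest" dataset to the "smallest", while `subsets_of(layer, triples)` finds nothing below the layer.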

As described in [the naming phase](Conversion process phase: name), three provenance-related aspects are used to organize any third-party data that is retrieved. Each aspect is given a short, URI-friendly identifier string:

  • source identifier - indicates the person/agent/organization that provided the data. For example, "epa-gov" provides a variety of datasets.
  • dataset identifier - indicates the logical group of data. For example, the EPA distinguishes between the datasets "photochemical-assessment-monitoring-stations-pams" and "historical-radnet-air-quality-data".
  • version identifier - indicates the version of a particular dataset. For example, one person might retrieve version "r23", a designation from the EPA itself, while another might retrieve version "2013-Mar-14" because the EPA did not provide a version designation for that data.
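The three identifiers above can be sketched as a small value type. This is a minimal illustration under stated assumptions, not csv2rdf4lod's implementation: the `DatasetName` class, the character set treated as "URI-friendly", and the path layout returned by `path()` are all hypothetical.

```python
import re
from dataclasses import dataclass

# Assumption: "URI-friendly" is taken to mean letters, digits, '.', '_' and '-'.
_URI_FRIENDLY = re.compile(r"[A-Za-z0-9._-]+")


@dataclass(frozen=True)
class DatasetName:
    source: str   # who provided the data, e.g. "epa-gov"
    dataset: str  # the logical group of data
    version: str  # the version of that dataset

    def __post_init__(self):
        # Reject any identifier containing characters outside the assumed set.
        for label, value in (
            ("source", self.source),
            ("dataset", self.dataset),
            ("version", self.version),
        ):
            if not _URI_FRIENDLY.fullmatch(value):
                raise ValueError(f"{label} identifier is not URI-friendly: {value!r}")

    def path(self):
        """Hypothetical path layout that nests the three identifiers."""
        return f"source/{self.source}/dataset/{self.dataset}/version/{self.version}"


name = DatasetName("epa-gov", "historical-radnet-air-quality-data", "2013-Mar-14")
print(name.path())
# prints: source/epa-gov/dataset/historical-radnet-air-quality-data/version/2013-Mar-14
```

Keeping the three aspects as separate fields, rather than as one pre-joined string, makes it easy to list every version of a dataset or every dataset from a source.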
