
Use Case: DOIs among LOD

timrdf edited this page Jan 10, 2013 · 39 revisions

What is first

What we will cover

This page describes how to use DataFAQs to find other LOD Cloud data sources that describe a set of publications identified by Digital Object Identifiers (DOIs). The same pattern could be applied more generally to the values of any identifying property (i.e., properties that are often inverse functional).

Let's get to it!

The DataOne project is working with documents and datasets that are tagged with indexing terms, e.g.:

"status"|"doi:10.5063/AA/nceas.226.3"
"status"|"doi:10.5063/AA/nceas.227.15"

"snow"|"doi:10.6073/AA/knb-lter-arc.1423.1"

"%285"|"doi:10.6073/AA/knb-lter-fce.111.5"
"%285"|"doi:10.6073/AA/knb-lter-fce.112.5"
"%285"|"doi:10.6073/AA/knb-lter-fce.108.5"

With some quick csv2rdf4lod'ing, we can get good linked data URIs that reuse bibo:

<http://dx.doi.org/10.5063/AA/nceas.227.15>
   dcterms:isReferencedBy <http://localhost/source/patrice/dataset/index-term-doi-pairs/version/2013-Jan-09> ;
   void:inDataset <http://localhost/source/patrice/dataset/index-term-doi-pairs/version/2013-Jan-09> ;
   a index-term-doi-pairs_vocab:DigitalObject ;
   bibo:doi "10.5063/AA/nceas.227.15" ;
   ov:csvRow "2"^^xsd:integer .

<http://dx.doi.org/10.6073/AA/knb-lter-arc.1423.1>
   dcterms:isReferencedBy <http://localhost/source/patrice/dataset/index-term-doi-pairs/version/2013-Jan-09> ;
   void:inDataset <http://localhost/source/patrice/dataset/index-term-doi-pairs/version/2013-Jan-09> ;
   a index-term-doi-pairs_vocab:DigitalObject ;
   bibo:doi "10.6073/AA/knb-lter-arc.1423.1" ;
   ov:csvRow "4"^^xsd:integer .
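The mapping from the pipe-delimited rows to the subjects above can be illustrated with a small Python sketch. This is not csv2rdf4lod itself, which is driven by enhancement parameters; the function name `doi_pairs_to_turtle` is hypothetical, and the sketch emits only the `bibo:doi` triple rather than the full description shown above.

```python
import csv
import io

# Base of the DOI URIs used in the example output above.
DOI_RESOLVER = "http://dx.doi.org/"
BIBO_PREFIX = "@prefix bibo: <http://purl.org/ontology/bibo/> ."


def doi_pairs_to_turtle(text):
    """Return Turtle with one subject per distinct DOI found in the input.

    Hypothetical helper: expects rows shaped like "term"|"doi:10.x/..." .
    """
    dois = []
    for row in csv.reader(io.StringIO(text), delimiter="|", quotechar='"'):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        value = row[1].strip()
        # Drop the "doi:" scheme prefix so only the bare DOI remains.
        doi = value[len("doi:"):] if value.startswith("doi:") else value
        if doi and doi not in dois:
            dois.append(doi)

    blocks = [BIBO_PREFIX]
    for doi in dois:
        blocks.append(f'<{DOI_RESOLVER}{doi}>\n   bibo:doi "{doi}" .')
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = (
        '"status"|"doi:10.5063/AA/nceas.226.3"\n'
        '"status"|"doi:10.5063/AA/nceas.227.15"\n'
    )
    print(doi_pairs_to_turtle(sample))
```

The sketch deduplicates DOIs so that a DOI listed under several indexing terms still yields a single subject.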

Since we want to model this as good Linked Data, we should choose a better source identifier and dataset identifier than the ones in the URIs above, following the conventions:

  • source identifier: (what organization produced this data?)
  • dataset identifier: (what would the organization call it?)
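To show where the two identifiers end up, here is a minimal sketch that composes a versioned dataset URI. The URI pattern is inferred from the example URIs on this page, not taken from csv2rdf4lod's documentation, and the function name `version_uri` is hypothetical.

```python
def version_uri(base, source_id, dataset_id, version_id):
    """Compose a versioned dataset URI from its identifiers.

    Assumption: the /source/.../dataset/.../version/... layout is read off
    the example URIs on this page.
    """
    return f"{base}/source/{source_id}/dataset/{dataset_id}/version/{version_id}"


if __name__ == "__main__":
    # Reproduces the dataset URI used in the Turtle example.
    print(version_uri("http://localhost", "patrice",
                      "index-term-doi-pairs", "2013-Jan-09"))
```

Changing the source identifier or dataset identifier changes every URI minted under that dataset, which is why it is worth settling them before committing the conversion.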

Then, we can commit the source data and the enhancement parameters (per these conventions) into an existing csv2rdf4lod node such as LOGD, whose conversion data root is here.

DataFAQs revolves around two things: Datasets and FAqT Services. The data shown above is the one dataset that we want to work with, i.e., "evaluate". We'll also need to create some FAqT Services to fulfill our use case of finding other descriptions of the same Digital Objects elsewhere in the LOD Cloud.

First, we'll make sure that we have described the dataset appropriately.

Next, we need to model the inputs and outputs for each FAqT Service. The input to a FAqT Service is a dcat:Dataset, but we can and should add additional descriptions that are appropriate for our use case. The services need to perform two kinds of lookup:

  • For each http://dx.doi.org URI, SPARQL-query each Bubble's SPARQL endpoint to see if it is described there.
  • For each bibo:doi value, SPARQL-query each Bubble's SPARQL endpoint to see if it is described there.
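The two lookups above can be sketched as SPARQL queries built in Python. This is only one possible shape for the queries, not the FAqT Service implementation; the function names are hypothetical, and the sketch stops at building the request URL so that the caller decides how to issue the HTTP request.

```python
from urllib.parse import urlencode

BIBO = "http://purl.org/ontology/bibo/"


def ask_uri_described(uri):
    """ASK whether the endpoint holds any triple with this URI as subject."""
    return f"ASK {{ <{uri}> ?p ?o }}"


def select_by_doi(doi):
    """SELECT the subjects that carry the given DOI as a plain bibo:doi literal.

    No escaping is done, so the DOI is assumed to contain no quotes.
    """
    return (
        f"PREFIX bibo: <{BIBO}>\n"
        f'SELECT DISTINCT ?s WHERE {{ ?s bibo:doi "{doi}" }}'
    )


def request_url(endpoint, query):
    """Build a SPARQL-protocol GET URL for the given endpoint and query."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    doi = "10.6073/AA/knb-lter-arc.1423.1"
    print(ask_uri_described("http://dx.doi.org/" + doi))
    print(select_by_doi(doi))
    # http://example.org/sparql is a placeholder endpoint, not a real bubble.
    print(request_url("http://example.org/sparql", select_by_doi(doi)))
```

The second query matches only an exact plain literal, so a source that stores the DOI with a "doi:" prefix, different casing, or a datatype would be missed unless the query is loosened.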

The final result that we are looking for would look something like the following. The three void:inDataset values on the DOI URI would be asserted if that same URI is found in those datasets. The owl:sameAs would be asserted when we find a different URI with the same bibo:doi value in another source. For each such distinct URI, we also use void:inDataset to indicate which LOD bubble it came from. The URIs in this example are notional.

<http://dx.doi.org/10.6073/AA/knb-lter-arc.1423.1>
   void:inDataset <http://datahub.io/dataset/twc-logd>,
                  <http://datahub.io/dataset/dbpedia>,
                  <http://datahub.io/dataset/vivo-indiana-university>;
   owl:sameAs <http://ieee.rkbexplorer.com/id/10.6073/AA/knb-lter-arc.1423.1>;
.
<http://ieee.rkbexplorer.com/id/10.6073/AA/knb-lter-arc.1423.1>
   void:inDataset <http://datahub.io/dataset/rkb-explorer-ieee>;
.
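A rough sketch of how lookup results could be turned into the triples above follows. It emits N-Triples rather than Turtle to keep the code short; the function names and the shape of the inputs are assumptions, and the URIs in the usage example are the notional ones from this page.

```python
VOID_IN_DATASET = "http://rdfs.org/ns/void#inDataset"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def triple(s, p, o):
    """Format one N-Triples statement whose object is a URI."""
    return f"<{s}> <{p}> <{o}> ."


def describe_matches(doi_uri, datasets_with_uri, uris_by_doi):
    """Build result triples for one DOI URI.

    datasets_with_uri: datasets whose endpoint described doi_uri itself.
    uris_by_doi: hypothetical dict mapping each URI found via the bibo:doi
                 lookup to the dataset it was found in.
    """
    triples = [triple(doi_uri, VOID_IN_DATASET, d) for d in datasets_with_uri]
    for other, dataset in uris_by_doi.items():
        if other != doi_uri:
            # A different URI with the same DOI is asserted to be the same thing.
            triples.append(triple(doi_uri, OWL_SAME_AS, other))
        triples.append(triple(other, VOID_IN_DATASET, dataset))
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(triples))


if __name__ == "__main__":
    doi_uri = "http://dx.doi.org/10.6073/AA/knb-lter-arc.1423.1"
    found_in = [
        "http://datahub.io/dataset/twc-logd",
        "http://datahub.io/dataset/dbpedia",
        "http://datahub.io/dataset/vivo-indiana-university",
    ]
    others = {
        "http://ieee.rkbexplorer.com/id/10.6073/AA/knb-lter-arc.1423.1":
            "http://datahub.io/dataset/rkb-explorer-ieee",
    }
    print("\n".join(describe_matches(doi_uri, found_in, others)))
```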

Since most LOD Cloud bubbles should have SPARQL endpoints, we'll avoid accessing and loading their void:dataDumps and instead query each SPARQL endpoint from within the FAqT Service. This assumption should be verified: how many bubbles offer an endpoint, how many offer a data dump, and how many offer both or neither?
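One way to check that assumption is to tally the bubbles by which access methods their descriptions advertise. The sketch below assumes each bubble's metadata has already been gathered into a dict with optional `sparqlEndpoint` and `dataDump` keys (mirroring void:sparqlEndpoint and void:dataDump); the dict layout, function names, and sample records are all hypothetical.

```python
from collections import Counter


def classify(bubble):
    """Label a bubble by which access methods its description lists."""
    has_endpoint = bool(bubble.get("sparqlEndpoint"))
    has_dump = bool(bubble.get("dataDump"))
    if has_endpoint and has_dump:
        return "both"
    if has_endpoint:
        return "endpoint only"
    if has_dump:
        return "dump only"
    return "neither"


def tally(bubbles):
    """Count bubbles per access category."""
    return Counter(classify(b) for b in bubbles)


if __name__ == "__main__":
    # Made-up records for illustration; real values would come from the
    # bubbles' own dataset descriptions.
    bubbles = [
        {"name": "a", "sparqlEndpoint": "http://example.org/a/sparql"},
        {"name": "b", "dataDump": "http://example.org/b/dump.nt"},
        {"name": "c", "sparqlEndpoint": "http://example.org/c/sparql",
         "dataDump": "http://example.org/c/dump.nt"},
        {"name": "d"},
    ]
    for category, count in tally(bubbles).items():
        print(category, count)
```

An advertised endpoint is not necessarily a responsive one, so a fuller check would also probe each endpoint before relying on it.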

What is next
