Finding Linksets among Linked Data Bubbles

What is first

CKAN: a walk-through of how to add and annotate dataset entries (and the extra requirements of the lodcloud group).
https://github.com/jimmccusker/twc-healthdata/wiki/Listing-twc-healthdata-as-a-LOD-Cloud-Bubble

What we will cover

This page describes how to calculate VoID Linksets between a csv2rdf4lod node and every other bubble in the Linked Data Diagram, using csv2rdf4lod-automation's one-click data dump and the lodcloud "namespace" annotations on CKAN. Calculating the Linksets makes it easier to see how a bubble connects to the others, which in turn makes it easier to assert the CKAN lodcloud annotation required to get into the diagram.

Let's get to it!

To find links, we need two things:

- A list of all RDF nodes in a bubble. We can get this rather easily by running csv2rdf4lod's one-click data dump through nt-nodes.sh.
- The namespace of each Linked Data bubble, which is given by the "namespace" annotation in CKAN. For example,
  - http://datahub.io/dataset/2000-us-census-rdf's namespace is http://www.rdfabout.com/rdf/usgov/geo/, and
  - http://datahub.io/dataset/a-seobook-dataset's namespace is http://seobook.blog.com.
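nt-nodes.sh does the node listing for us. For readers without it, here is a rough Python sketch of the same idea, assuming an N-Triples dump and assuming that "nodes" means the URIs in subject and object position (predicates, blank nodes, and literals are skipped; nt-nodes.sh itself may behave differently). The function name uri_nodes is hypothetical.

```python
import re
from typing import Iterable, Set

# One N-Triples statement: subject (IRI or blank node), predicate (IRI),
# object (IRI, blank node, or literal), terminated by " .".
_STATEMENT = re.compile(
    r'^\s*(?:<(?P<s>[^>]*)>|_:\S+)\s+<[^>]*>\s+'
    r'(?:<(?P<o>[^>]*)>|_:\S+|".*)\s*\.\s*$'
)


def uri_nodes(lines: Iterable[str]) -> Set[str]:
    """Return the distinct IRIs used as subjects or objects (not predicates)."""
    nodes: Set[str] = set()
    for line in lines:
        m = _STATEMENT.match(line)
        if m is None:
            continue  # comments, blank lines, or malformed statements
        for group in ("s", "o"):
            if m.group(group) is not None:
                nodes.add(m.group(group))
    return nodes
```

Passing an open file of the one-click dump (e.g. `uri_nodes(open(path))`) yields the node list used in the rest of this page.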

We can get a bubble's namespace by POSTing its URI to a deployed instance of lift-ckan.py (such as the one used in the curl example below), which returns a good RDF description of the contorted annotations in the CKAN dataset entry.

curl -H "Content-Type: text/turtle" \
  -d '<http://datahub.io/dataset/2000-us-census-rdf> a <http://purl.org/twc/vocab/datafaqs#CKANDataset> .' \
    http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/ckan/lift-ckan

returns the following RDF triples (among others). The one we need is void:uriSpace.

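@prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#> .
@prefix ov:       <http://open.vocab.org/terms/> .
@prefix dcterms:  <http://purl.org/dc/terms/> .
@prefix void:     <http://rdfs.org/ns/void#> .

<http://datahub.io/dataset/2000-us-census-rdf> a datafaqs:CKANDataset;
    ov:shortName "US Census (rdfabout)";
    dcterms:title "2000 U.S. Census in RDF (rdfabout.com)";
    void:sparqlEndpoint <http://www.rdfabout.com/sparql>;
    void:triples 1002848918;
    void:uriSpace "http://www.rdfabout.com/rdf/usgov/geo/" .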
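A minimal Python sketch of the same lookup, assuming the lift-ckan service above is reachable and that its Turtle response writes the property as void:uriSpace (or its full IRI) followed by a plain string literal. The helper names lift_ckan_request and uri_space are hypothetical, and the regular expression is a shortcut; a real RDF parser such as rdflib would be more robust.

```python
import re
import urllib.request
from typing import Optional

# The deployed lift-ckan service used in the curl example above.
LIFT_CKAN = "http://aquarius.tw.rpi.edu/projects/datafaqs/services/sadi/ckan/lift-ckan"


def lift_ckan_request(dataset_uri: str, service: str = LIFT_CKAN) -> urllib.request.Request:
    """Build the same POST that the curl example sends (no network I/O here)."""
    body = f"<{dataset_uri}> a <http://purl.org/twc/vocab/datafaqs#CKANDataset> ."
    return urllib.request.Request(
        service,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


# Prefixed or full-IRI form of void:uriSpace, followed by a plain literal.
_URI_SPACE = re.compile(
    r'(?:void:uriSpace|<http://rdfs\.org/ns/void#uriSpace>)\s+"([^"]*)"'
)


def uri_space(turtle: str) -> Optional[str]:
    """Return the first void:uriSpace literal in a Turtle response, if any."""
    m = _URI_SPACE.search(turtle)
    return m.group(1) if m else None
```

To run the lookup for real, pass the request to urllib.request.urlopen and decode the response body before handing it to uri_space.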

Modeling the Linkset

When 50 URIs occur in both http://datahub.io/dataset/twc-healthdata and http://datahub.io/dataset/2000-us-census-rdf, we represent that overlap in VoID like this:

@prefix void: <http://rdfs.org/ns/void#> .

<http://datahub.io/dataset/twc-healthdata>
    void:subset [
        a void:Linkset;
        void:target
          <http://datahub.io/dataset/twc-healthdata>,
          <http://datahub.io/dataset/2000-us-census-rdf>;
        void:triples 50
    ]
.
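Given the node list and a dictionary mapping each bubble to its namespace, counting and serializing the Linksets might look like the following Python sketch. linkset_counts and linksets_turtle are hypothetical helpers; a plain string-prefix match stands in for "is in the namespace", and, following the VoID example above, the number of shared URIs is written as void:triples. Leave our own bubble out of the namespace dictionary, or our own URIs will show up as a Linkset to ourselves.

```python
from typing import Dict, Iterable

VOID_PREFIX = "@prefix void: <http://rdfs.org/ns/void#> ."


def linkset_counts(nodes: Iterable[str], namespaces: Dict[str, str]) -> Dict[str, int]:
    """Count how many of our URI nodes fall inside each other bubble's namespace."""
    node_list = list(nodes)  # lets a one-shot iterator be scanned once per bubble
    counts: Dict[str, int] = {}
    for bubble, namespace in namespaces.items():
        # Plain prefix match; a namespace without a trailing "/" or "#" can over-match
        # (e.g. "http://seobook.blog.com" also matches "http://seobook.blog.community/...").
        n = sum(1 for node in node_list if node.startswith(namespace))
        if n > 0:
            counts[bubble] = n
    return counts


def linksets_turtle(our_bubble: str, counts: Dict[str, int]) -> str:
    """Serialize the counts as VoID Linksets shaped like the example above."""
    if not counts:
        return ""  # no Linksets: nothing to assert
    subsets = [
        "    void:subset [\n"
        "        a void:Linkset;\n"
        f"        void:target <{our_bubble}>, <{bubble}>;\n"
        f"        void:triples {n}\n"
        "    ]"
        for bubble, n in sorted(counts.items())
    ]
    return f"{VOID_PREFIX}\n\n<{our_bubble}>\n" + " ;\n".join(subsets) + " .\n"
```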

Limitations of this approach

This approach is cheap to calculate: we don't need to go through the hassle of finding and retrieving the full data dump of every other bubble, and we have much less instance data to process. However, it misses connections where our bubble and another mention the same URIs but those URIs are not in the other bubble's namespace.
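To make the blind spot concrete, here is a small Python comparison. The helper names are hypothetical and the test data is made up: the namespace-based approach only sees shared URIs inside the other bubble's namespace, while a full intersection, which would require the other bubble's complete dump, also catches a shared URI from a third namespace such as DBpedia.

```python
from typing import Set


def namespace_links(ours: Set[str], namespace: str) -> Set[str]:
    """The cheap approach: our URIs that sit inside the other bubble's namespace."""
    return {uri for uri in ours if uri.startswith(namespace)}


def shared_uris(ours: Set[str], theirs: Set[str]) -> Set[str]:
    """A full comparison: every URI both bubbles mention (needs their complete dump)."""
    return ours & theirs


def missed_by_namespace(ours: Set[str], theirs: Set[str], namespace: str) -> Set[str]:
    """Shared URIs that the namespace-based approach cannot see."""
    return shared_uris(ours, theirs) - namespace_links(ours, namespace)
```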

What is next?

- How hard is it to get one-click data dumps for bubbles that do not use csv2rdf4lod-automation?
- What is the disparity between the manual assertion on the CKAN entry and what was actually found?
- How can we model the Linkset calculation so that it naturally provides justification for the resulting CKAN annotation? (SIO-qualifying the void:triples triple and saying it prov:wasDerivedFrom the analysis that produced it. Tie into Jim's aggregation thesis?)
- Some thoughts on How to characterize a list of RDF node URIs
- Use add-metadata.py with the CKAN lodcloud RDF vocabulary to submit the Linksets to CKAN.
