Dataset datahub io lodcloud group
(supporting Survey 3 2014 Jul 04)
The queries below run against http://lodcloud.tw.rpi.edu/sparql
http://datahub.io/group/lodcloud reports either 283 or 214 datasets, depending on whether you look at the left of the page or the middle...
In the loaded graph, 294 are typed as datafaqs:CKANDataset and 240 as void:Dataset. The query below counts the former:
prefix void: <http://rdfs.org/ns/void#>
prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#>
prefix tag: <http://www.holygoat.co.uk/owl/redwood/0.1/tags/>
select (count(distinct ?dataset) as ?count)
where {
  graph <http://purl.org/twc/lodcloud/source/datahub-io/dataset/lodcloud-group/version/2014-07-04> {
    ?dataset # <http://thedatahub.org/dataset/DBpedia>
      a datafaqs:CKANDataset
  }
}
Only 49 datasets come back when we also require their void:Linkset connectivity...
prefix void: <http://rdfs.org/ns/void#>
prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#>
prefix tag: <http://www.holygoat.co.uk/owl/redwood/0.1/tags/>
select (count(distinct ?dataset) as ?count)
where {
  graph <http://purl.org/twc/lodcloud/source/datahub-io/dataset/lodcloud-group/version/2014-07-04> {
    ?dataset # <http://thedatahub.org/dataset/DBpedia>
      a datafaqs:CKANDataset;
      void:subset ?linkset
    .
    optional { ?dataset tag:taggedWithTag ?tag }
    optional { ?dataset void:triples ?triples }
    ?linkset # <http://instances.tw.rpi.edu/id/linkset/DBpedia/e977476546bf11f68176d67246280e63>
      void:target ?target; # <http://thedatahub.org/dataset/aemet>
      void:triples ?overlap # 82
    .
    ?target a datafaqs:CKANDataset .
    filter(?dataset != ?target)
  }
}
Relax it by making the linkset pattern optional, and we get 294 again (with the void:Linkset bound when it's there...)
prefix void: <http://rdfs.org/ns/void#>
prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#>
prefix tag: <http://www.holygoat.co.uk/owl/redwood/0.1/tags/>
select (count(distinct ?dataset) as ?count)
where {
  graph <http://purl.org/twc/lodcloud/source/datahub-io/dataset/lodcloud-group/version/2014-07-04> {
    ?dataset # <http://thedatahub.org/dataset/DBpedia>
      a datafaqs:CKANDataset
    .
    optional { ?dataset tag:taggedWithTag ?tag }
    optional { ?dataset void:triples ?triples }
    optional {
      ?dataset void:subset ?linkset .
      ?linkset # <http://instances.tw.rpi.edu/id/linkset/DBpedia/e977476546bf11f68176d67246280e63>
        void:target ?target; # <http://thedatahub.org/dataset/aemet>
        void:triples ?overlap # 82
      .
      ?target a datafaqs:CKANDataset .
      filter(?dataset != ?target)
    }
  }
}
Show the datasets, how big they are, and how much each overlaps with another dataset (here). Order by descending overlap, so that those without an overlap appear at the bottom.
prefix void: <http://rdfs.org/ns/void#>
prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#>
prefix tag: <http://www.holygoat.co.uk/owl/redwood/0.1/tags/>
select ?dataset ?triples ?overlap ?target
where {
  graph <http://purl.org/twc/lodcloud/source/datahub-io/dataset/lodcloud-group/version/2014-07-04> {
    ?dataset # <http://thedatahub.org/dataset/DBpedia>
      a datafaqs:CKANDataset
    .
    optional { ?dataset tag:taggedWithTag ?tag }
    optional { ?dataset void:triples ?triples }
    optional {
      ?dataset void:subset ?linkset .
      ?linkset # <http://instances.tw.rpi.edu/id/linkset/DBpedia/e977476546bf11f68176d67246280e63>
        void:target ?target; # <http://thedatahub.org/dataset/aemet>
        void:triples ?overlap # 82
      .
      ?target a datafaqs:CKANDataset .
      filter(?dataset != ?target)
    }
  }
}
order by desc(?overlap) desc(?triples)
Skimming down the rows with no overlap, the first few datasets do not actually claim any. The first one that does is http://datahub.io/dataset/ub-mannheim-linked-data. http://thedatahub.org/dataset/taxonconcept does, too. SO DOES http://datahub.io/dataset/dbpedia :-/
A major source of the problem is probably that the domain names used for datahub are inconsistent (http://datahub.io vs. http://thedatahub.org). To check, list every node whose URI mentions ub-mannheim-linked-data:
prefix void: <http://rdfs.org/ns/void#>
prefix datafaqs: <http://purl.org/twc/vocab/datafaqs#>
prefix tag: <http://www.holygoat.co.uk/owl/redwood/0.1/tags/>
select ?n
where {
  graph <http://purl.org/twc/lodcloud/source/datahub-io/dataset/lodcloud-group/version/2014-07-04> {
    { ?n ?p [] } union
    { [] ?p ?n }
    filter(regex(str(?n), 'ub-mannheim-linked-data'))
  }
}
order by ?n
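The suspected mix of domains can also be tallied outside the endpoint. The following is a minimal sketch, assuming the graph is available as an N-Triples file. The three sample triples and the `http://example.org/linkset/1` URI are hypothetical stand-ins, not data taken from the endpoint.

```shell
# Hypothetical three-triple sample standing in for a real N-Triples export.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<http://thedatahub.org/dataset/taxonconcept> <http://rdfs.org/ns/void#subset> <http://example.org/linkset/1> .
<http://example.org/linkset/1> <http://rdfs.org/ns/void#target> <http://datahub.io/dataset/dbpedia> .
<http://datahub.io/dataset/ub-mannheim-linked-data> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/twc/vocab/datafaqs#CKANDataset> .
EOF

# Count URI occurrences under each domain (grep -o prints one match per line).
old_count=$(grep -o 'http://thedatahub\.org/' "$sample" | wc -l | tr -d ' ')
new_count=$(grep -o 'http://datahub\.io/' "$sample" | wc -l | tr -d ' ')

echo "thedatahub.org: $old_count"
echo "datahub.io: $new_count"
rm -f "$sample"
```

If both counts are non-zero on real data, dataset and linkset target URIs can fail to join simply because they use different hostnames.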
Ugh, let's hack it:
# From ~/prizms/lodcloud/data/source/datahub-io/lodcloud-group/version/2014-07-04:
# convert the retrieved sources to N-Triples, rewrite the old domain, then publish.
rdf2nt.sh source/* > manual/sources.nt
perl -pi -e 's|http://thedatahub\.org|http://datahub.io|g' manual/sources.nt
cr-publish.sh manual/sources.nt