Ermilov's wiki.publicdata.eu CSV2RDF Application
Ermilov et al. presented a wiki-based approach to crowd-sourcing the enhancements of ~9k datasets listed at http://publicdata.eu (WebSci 2012 paper).
A year after its publication, how far has the crowd-sourcing come?
This page provides a summary and review of Ermilov's wiki.publicdata.eu CSV2RDF Application.
Four accounts contributed, and the two non-author accounts provided fewer than ten contributions.
`find manual/pages -name "*.ttl" | xargs -L1 grep "wasAttributedTo" | sort -u` shows only a handful of contributors:
prov:wasAttributedTo <http://wiki.publicdata.eu/wiki/User:178.25.43.32>;
prov:wasAttributedTo <http://wiki.publicdata.eu/wiki/User:2001:638:902:2010:0:168:35:101>;
prov:wasAttributedTo <http://wiki.publicdata.eu/wiki/User:Iermilov>;
prov:wasAttributedTo <http://wiki.publicdata.eu/wiki/User:IvanErmilov>;
prov:wasAttributedTo <http://wiki.publicdata.eu/wiki/User:Soeren>;
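The listing above shows who contributed but not how much. A minimal sketch of a per-contributor tally follows, assuming the same `manual/pages` layout and that each `prov:wasAttributedTo` statement stands for one attributed edit (the page does not confirm that). The function name `count_contributions` is made up for illustration.

```shell
#!/usr/bin/env bash
# Count prov:wasAttributedTo statements per contributor URI.
# Usage: count_contributions DIR   (DIR holds the exported *.ttl pages)
# Assumption: one statement corresponds to one attributed edit.
count_contributions() {
  local dir="$1"
  find "$dir" -name '*.ttl' -print0 |
    xargs -0 grep -h 'wasAttributedTo' |
    # Keep only the URI that follows the property.
    sed -n 's/.*wasAttributedTo[[:space:]]*\(<[^>]*>\).*/\1/p' |
    sort | uniq -c | sort -rn
}
```

Running `count_contributions manual/pages` should print one line per contributor, most active first.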
Fifteen terms were reused from nine vocabularies for more than 9,000 datasets. We skip the three non-CURIEs listed below because it is not clear that they are RDF terms.
`find manual/pages -name "*.xml.ttl" | xargs -L1 grep "conversion:label" | sed 's/conversion:label//' | grep : | sed 's/^ *"/"/' | grep -v " " | sort -u`:
"cgov:fullTimeEquivalentSalary";
"cgov:lowerBound";
"cgov:upperBound";
"dce:date";
"foaf:mbox";
"foaf:name";
"foaf:phone";
"http://dbpedia.org/resource/Category:Ministerial_departments_of_the_United_Kingdom_Government";
"http://statistics.data.gov.uk/id/local-authority/32UC";
"http://www.google.co.uk";
"org:OrganizationalUnit";
"org:organization";
"org:unitOf";
"pc:supplier";
"rdf:type";
"rdfs:comment";
"skos:Amount";
"whois:Job";
- Enables community-editable mappings using an existing mechanism (MediaWiki).
- The main CKAN dataset listing site links to the mapping wiki.
- User-invokable reconversion.
Usability Shortcomings:
- The wiki page is hard to use because it is disconnected from both the original and resulting data.
- The community hasn't used the tool, even though it has been available for a year.
- The mapping wiki pages have meaningless names (e.g. http://wiki.publicdata.eu/wiki/Csv2rdf:F449751c-68d3-4f84-8fe3-5c3a4cb86c84).
Linked Data Best Practices Shortcomings:
- `curl -H "Accept: application/rdf+xml" -L http://publicdata.eu/dataset/directgov-referring-sites` returns a gzipped HTML file (appending `.rdf` works, though: http://publicdata.eu/dataset/directgov-referring-sites.rdf).
- The mappings are NOT expressed in RDF; they are expressed only as MediaWiki template arguments (and in Sparqlify behind the scenes, but those aren't available for public inspection). Although the intent is to make them easy for a novice to read and write, that does not mean that they shouldn't be lifted behind the scenes and made available as RDF for other systems to use.
- The mappings are NOT described with RDF, since each one is just a wiki page (Special:Export can be used, but it's not findable from the page itself using linked data principles). The mapping descriptions do NOT refer back to the dataset that they enhance [using RDF], and they do NOT refer to the resulting RDF conversion [using RDF].
- The namespace used (http://wiki.publicdata.eu/ontology/) for the RDF properties 404s.
- The site for the converter tool (http://sparqlify.org/wiki/Main_Page) 404s.
- The RDF conversion dump files use the N-Triples serialization but have the extension `.rdf`, which is generally reserved for the `application/rdf+xml` serialization (e.g. http://csv2rdf.aksw.org/sparqlified/f449751c-68d3-4f84-8fe3-5c3a4cb86c84_default-tranformation-configuration.rdf). This confuses even the best-of-breed RDF serialization tools.
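An extension mismatch like the one in the last item can be caught before a dump is published. The sketch below guesses the serialization from the first non-blank line of a downloaded file and compares it with the extension. It is a rough heuristic, not a parser: a leading comment line, for instance, yields `unknown`. Both function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Guess an RDF file's serialization from its first non-blank line.
# Prints "rdfxml", "ntriples", or "unknown". A heuristic, not a parser.
guess_rdf_syntax() {
  local first
  first=$(grep -m1 -v '^[[:space:]]*$' "$1" || true)
  case "$first" in
    '<?xml'*|'<rdf:RDF'*) echo rdfxml ;;
    '<'*'> <'*'> '*'.'|'_:'*' <'*'> '*'.') echo ntriples ;;
    *) echo unknown ;;
  esac
}

# Report whether the extension agrees with the sniffed syntax.
# Returns non-zero on a mismatch.
check_rdf_extension() {
  local syntax
  syntax=$(guess_rdf_syntax "$1")
  case "$1:$syntax" in
    *.rdf:rdfxml|*.nt:ntriples) echo "ok: $1 ($syntax)" ;;
    *) echo "mismatch: $1 looks like $syntax"; return 1 ;;
  esac
}
```

For a locally saved copy of the dump, `check_rdf_extension dump.rdf` would be expected to report a mismatch if the file really contains N-Triples.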
Mapping Capabilities Shortcomings:
- It can't specify a datatype for a cell's value like conversion:range does (e.g. "85" is an xsd:integer).
- It can't "promote" a cell value to a URI like conversion:range does (e.g. "http://www.google.co.nz" becomes <http://www.google.co.nz>).
- It can't type a URI to a given class like conversion:range_template/conversion:subclass_of do (e.g. <http://www.google.co.nz> is a sioc:Space).
- Its property creation strategy (put everything into http://wiki.publicdata.eu/ontology/) is not conservative enough and fosters collisions. csv2rdf4lod uses hierarchical naming based on the publishing organization, the dataset, and the version of the dataset (the so-called "[SDV naming](Conversion process phase: name)") to avoid terminology collisions while facilitating natural and incremental dataset integration.
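To make the first two gaps concrete, the sketch below renders a single cell value as an N-Triples object term: values that look like HTTP URIs are promoted to resources, integers receive an `xsd:integer` datatype, and everything else stays a plain literal. This only illustrates the effect. It is not how csv2rdf4lod or Sparqlify implement it, and the function name `cell_to_object` and its detection rules are assumptions.

```shell
#!/usr/bin/env bash
# Render one CSV cell value as an N-Triples object term.
# Illustrative only; the detection rules are assumptions, not tool behavior.
cell_to_object() {
  local value="$1"
  if printf '%s' "$value" | grep -Eq '^https?://[^[:space:]<>"]+$'; then
    # Promote URI-like values to resources.
    printf '<%s>\n' "$value"
  elif printf '%s' "$value" | grep -Eq '^[+-]?[0-9]+$'; then
    # Give integers an explicit datatype.
    printf '"%s"^^<http://www.w3.org/2001/XMLSchema#integer>\n' "$value"
  else
    # Escape backslashes and double quotes for a plain literal.
    printf '"%s"\n' "$(printf '%s' "$value" | sed 's/\\/\\\\/g; s/"/\\"/g')"
  fi
}
```

For example, `cell_to_object 85` yields a typed literal and `cell_to_object http://www.google.co.nz` yields a URI in angle brackets.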
Provenance and Metadata Shortcomings:
- (to be enumerated)