
Ermilov's wiki.publicdata.eu CSV2RDF Application

Tim L edited this page May 12, 2013 · 43 revisions

What is first

What we will cover

Ermilov et al. presented a wiki-based approach to crowd-sourcing the enhancements of ~9k datasets listed at http://publicdata.eu (WebSci 2012 paper).

A year after its publication, how far has the crowd-sourcing come?

This page provides a summary and review of Ermilov's wiki.publicdata.eu CSV2RDF Application.

Let's get to it

How many people contributed to the "crowd-source" enhancement?

Four accounts contributed, and the two non-author accounts provided fewer than ten contributions.

find manual/pages -name "*.ttl" | xargs -L1 grep "wasAttributedTo" | sort -u shows only a handful of contributors:

      prov:wasAttributedTo <http://wiki.publicdata.eu/wiki/User:178.25.43.32>;
      prov:wasAttributedTo <http://wiki.publicdata.eu/wiki/User:2001:638:902:2010:0:168:35:101>;
      prov:wasAttributedTo <http://wiki.publicdata.eu/wiki/User:Iermilov>;
      prov:wasAttributedTo <http://wiki.publicdata.eu/wiki/User:IvanErmilov>;
      prov:wasAttributedTo <http://wiki.publicdata.eu/wiki/User:Soeren>;
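
The listing above shows which accounts appear, but not how often. A minimal sketch of a per-account tally follows, assuming the same manual/pages layout of .ttl files; the function name count_attributions is hypothetical. It counts prov:wasAttributedTo statements, which approximates but may not equal the number of wiki edits each account made.

```shell
# Tally prov:wasAttributedTo statements per account, most frequent first.
# Assumption: the .ttl files live under manual/pages unless a directory is given as $1.
count_attributions() {
  dir="${1:-manual/pages}"
  find "$dir" -name "*.ttl" -exec grep -h "wasAttributedTo" {} + \
    | sed 's/^[[:space:]]*//' \
    | sort | uniq -c | sort -rn
}
```

Calling count_attributions with no argument prints one line per account, prefixed by the number of attribution statements found for it.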

How many existing vocabulary terms did the crowd-sourced enhancement produce?

Fifteen terms were reused from nine vocabularies for more than 9,000 datasets. We skip the three non-CURIEs listed below because it is not clear that they are RDF terms.

find manual/pages -name "*.xml.ttl" | xargs -L1 grep "conversion:label" | sed 's/conversion:label//' | grep : | sed 's/^ *"/"/' | grep -v " " | sort -u:

"cgov:fullTimeEquivalentSalary";
"cgov:lowerBound";
"cgov:upperBound";
"dce:date";
"foaf:mbox";
"foaf:name";
"foaf:phone";
"http://dbpedia.org/resource/Category:Ministerial_departments_of_the_United_Kingdom_Government";
"http://statistics.data.gov.uk/id/local-authority/32UC";
"http://www.google.co.uk";
"org:OrganizationalUnit";
"org:organization";
"org:unitOf";
"pc:supplier";
"rdf:type";
"rdfs:comment";
"skos:Amount";
"whois:Job";

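The term and vocabulary counts quoted above can be rechecked from the label listing. A rough sketch follows, assuming the labels arrive on stdin in the quoted one-per-line form shown above; the function names are hypothetical. It treats any label that starts with http as a non-CURIE and uses the text before the first colon as the vocabulary prefix.

```shell
# Keep only CURIE-style labels; absolute http(s) URLs are dropped.
curie_labels() {
  grep -v '^"http' | sort -u
}

# Number of distinct CURIE labels (reused terms).
count_terms() {
  curie_labels | wc -l | tr -d ' '
}

# Number of distinct prefixes (vocabularies) among the CURIE labels.
count_vocabularies() {
  curie_labels | sed 's/^"\([^:]*\):.*/\1/' | sort -u | wc -l | tr -d ' '
}
```

Piping the output of the find command above into count_terms or count_vocabularies gives the two totals; run against the eighteen labels listed, this yields 15 terms and 9 vocabularies.
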
Benefits

  • Enables community-editable mappings using an existing mechanism (MediaWiki).
  • The main CKAN dataset listing site links to the mapping wiki.
  • User-invokable reconversion.

Shortcomings

Usability:

  • The wiki-page is hard to use because it is disconnected from the original and resulting data.
  • The community hasn't used the tool, even though it has been available for use for a year.

Linked Data Best Practices:

  • curl -H "Accept: application/rdf+xml" -L http://publicdata.eu/dataset/directgov-referring-sites returns a gzipped HTML file (appending .rdf works, though: http://publicdata.eu/dataset/directgov-referring-sites.rdf).
  • The mappings are not expressed in RDF; they are only expressed as MediaWiki template arguments (and as Sparqlify mappings behind the scenes, but those aren't available for public inspection).
  • The mappings are not described with RDF, since each one is just a wiki page. They do not refer back to the dataset that they enhance, and they do not refer to the resulting RDF conversion.
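
The content negotiation problem in the first bullet can be checked without saving the response body. The sketch below is one way to do it, assuming curl is installed; both function names are hypothetical, the list of RDF media types is a small illustrative subset, and the publicdata.eu URL from the bullet may no longer resolve.

```shell
# Succeeds if a Content-Type value names one of a few common RDF serializations.
# Assumption: this short list is illustrative, not exhaustive.
is_rdf_content_type() {
  case "$1" in
    application/rdf+xml*|text/turtle*|application/n-triples*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the Content-Type served for a URL after following redirects,
# while asking for RDF/XML as in the curl command above.
negotiated_type() {
  curl -s -L -o /dev/null -H "Accept: application/rdf+xml" -w '%{content_type}' "$1"
}
```

For example, is_rdf_content_type "$(negotiated_type http://publicdata.eu/dataset/directgov-referring-sites)" would fail if the server answers with HTML, which is the behavior the bullet reports.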

Mapping capabilities:

  • It can't specify a datatype for a cell's value like conversion:range does (e.g. "85" is an xsd:integer).
  • It can't "promote" a cell value to a URI like conversion:range does (e.g. "http://www.google.co.nz" becomes http://www.google.co.nz).
  • It can't type a URI to a given class like conversion:range_template/conversion:subclass_of do (e.g. http://www.google.co.nz is a sioc:Space).
