A quick and easy conversion

timrdf edited this page Sep 17, 2012 · 61 revisions
csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

What's first?

An example

Let's say we are interested in some oil well data shown at ScraperWiki, which offers a URL for the CSV: https://scraperwiki.com/scrapers/uk-offshore-oil-wells. As data curators, we'll need to choose identifiers for our source, dataset, and version, so we choose scraperwiki-com, uk-offshore-oil-wells, and 2011-Jan-24, respectively. Knowing these values, we can make the directory:

bash-3.2$ mkdir ~/Desktop/source  # Creates the directory for all data that you collect and convert.
bash-3.2$ cd ~/Desktop/source
bash-3.2$ mkdir -p scraperwiki-com/uk-offshore-oil-wells/version/2011-Jan-24
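The directory convention generalizes to any dataset. A minimal sketch, using the three identifiers chosen in this example (substitute your own source, dataset, and version identifiers when curating another dataset):

```shell
# Sketch of the source-id/dataset-id/version/version-id convention.
# All three identifiers are the ones chosen in this example.
SOURCE_ID=scraperwiki-com
DATASET_ID=uk-offshore-oil-wells
VERSION_ID=2011-Jan-24
mkdir -p "$SOURCE_ID/$DATASET_ID/version/$VERSION_ID"
ls -d "$SOURCE_ID/$DATASET_ID/version/$VERSION_ID"
```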

Next, we'll hop into our conversion cockpit and set up shop:

~/Desktop/source
bash-3.2$ cd scraperwiki-com/uk-offshore-oil-wells/version/2011-Jan-24
bash-3.2$ mkdir source
bash-3.2$ mkdir manual

Hop into source/, grab the data (with pcurl.sh!), and hop back up to our conversion cockpit:

~/Desktop/source/scraperwiki-com/uk-offshore-oil-wells/version/2011-Jan-24
bash-3.2$ cd source/
bash-3.2$ pcurl.sh http://purl.org/twc/query/scraperwiki/uk-offshore-oil-wells -e csv
bash-3.2$ cd ..
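If pcurl.sh is not yet on your PATH, a plain curl fetch is a stopgap, though you lose the provenance file that pcurl.sh records alongside the download. A hedged sketch (the URL is the one from this example; the block only prints the command it would run):

```shell
# Fallback sketch: prefer pcurl.sh when available (it records provenance),
# otherwise fall back to plain curl.
URL='http://purl.org/twc/query/scraperwiki/uk-offshore-oil-wells'
if command -v pcurl.sh >/dev/null 2>&1; then
  FETCH="pcurl.sh $URL -e csv"
else
  FETCH="curl -sL -o uk-offshore-oil-wells.csv $URL"
fi
echo "would run: $FETCH"   # swap echo for a direct invocation to fetch
```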

[Make the conversion trigger](Conversion process phase: create conversion trigger), [pull it](Conversion process phase: pull conversion trigger) (if you hit a memory error, see FAQ), and see [what it did](A guided tour of csv2rdf4lod's Turtle dump file):

~/Desktop/source/scraperwiki-com/uk-offshore-oil-wells/version/2011-Jan-24
bash-3.2$ cr-create-convert-sh.sh -w source/uk-offshore-oil-wells.csv

bash-3.2$ ./convert-uk-offshore-oil-wells.sh

bash-3.2$ vi automatic/uk-offshore-oil-wells.csv.raw.ttl

There's your RDF --------^^ a verbatim interpretation of the tabular literals (if you hit a memory error, see FAQ). But having an enhanced version is better! Start by reviewing the Conversion process phases, one of which shows you how to make an enhancement to add to your initial conversion. conversion:Enhancement lists a good set of things you can tell the converter to make nicely-structured RDF from a relatively uninformative bucket of literals. But if you want to cheat, grab my enhancement parameters for this dataset and plop them into the manual/ directory of your conversion cockpit; then run ./convert-uk-offshore-oil-wells.sh again (it'll realize that you already ran the raw conversion and move on with enhancing).
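The cheat described above amounts to staging the downloaded parameter file and re-running the trigger. A sketch, assuming the conventional enhancement-parameter filename (`<file>.csv.e1.params.ttl` is an assumption based on csv2rdf4lod's usual naming; verify against what your trigger expects):

```shell
# Stage enhancement parameters and re-run the conversion trigger.
# The params filename below is an assumed convention; adjust to
# match the file you actually downloaded.
mkdir -p manual
PARAMS=uk-offshore-oil-wells.csv.e1.params.ttl
[ -f "$HOME/Downloads/$PARAMS" ] && cp "$HOME/Downloads/$PARAMS" manual/
ls manual/                             # confirm the params file is in place
# ./convert-uk-offshore-oil-wells.sh   # re-run; the raw phase is skipped
```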

Letting RDFa drive

These instructions repeat the steps above, but use automation driven by parameters available in an RDFa file on the web or on local disk. If you did the steps above, you don't need to do these. Note that this example requires rapper, which is discussed in Installing csv2rdf4lod automation - complete.
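rapper ships with the Raptor RDF toolkit; a quick availability check before starting (the install hint is an assumption — package names vary by platform):

```shell
# Check for rapper (from the Raptor RDF syntax library) before proceeding.
if command -v rapper >/dev/null 2>&1; then
  RAPPER_STATUS="rapper found: $(rapper --version 2>/dev/null)"
else
  RAPPER_STATUS="rapper not found: install the Raptor utilities (e.g. a raptor2 package)"
fi
echo "$RAPPER_STATUS"
```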

If you'd like to get more serious and set up a data skeleton so that anybody can set up their own version of the dataset, check out Automated creation of a new Versioned Dataset.

bash-3.2$ cd ~/Desktop/source

bash-3.2$ cr-create-dataset-dir.sh \
https://github.com/timrdf/csv2rdf4lod-automation/raw/master/bin/dup/scraperwiki-com-uk-offshore-oil-wells-2011-Jan-24.xhtml

bash-3.2$ ls -lt scraperwiki-com/uk-offshore-oil-wells/version/2011-Jan-24/source/
total 6856
-rw-r--r--  1 lebot  staff  3489521 Jan 24 20:54 uk-offshore-oil-wells.csv
-rw-r--r--  1 lebot  staff     3441 Jan 24 20:54 uk-offshore-oil-wells.csv.pml.ttl
-rw-r--r--  1 lebot  staff     1928 Jan 24 20:54 scraperwiki-com-uk-offshore-oil-wells-2011-Jan-24.xhtml
-rw-r--r--  1 lebot  staff     4278 Jan 24 20:54 scraperwiki-com-uk-offshore-oil-wells-2011-Jan-24.xhtml.pml.ttl

bash-3.2$ cd scraperwiki-com/uk-offshore-oil-wells/version/2011-Jan-24/

bash-3.2$ cr-create-convert-sh.sh -w source/uk-offshore-oil-wells.csv

bash-3.2$ ./convert-uk-offshore-oil-wells.sh

bash-3.2$ vi automatic/uk-offshore-oil-wells.csv.raw.ttl
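Either route ends with the same raw Turtle dump. Once rapper is installed, you can sanity-check the dump by counting the triples it parses (paths as in this example; the block skips gracefully when the file or rapper is missing):

```shell
# Count triples in the raw dump with rapper's -c (count) mode.
TTL=automatic/uk-offshore-oil-wells.csv.raw.ttl
if [ -f "$TTL" ] && command -v rapper >/dev/null 2>&1; then
  rapper -i turtle -c "$TTL"
else
  echo "skipping triple count: need $TTL and rapper on PATH"
fi
```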

What's next?
