
Script: cr-test-conversion.sh

timrdf edited this page Aug 7, 2011 · 108 revisions

Motivation

Since csv2rdf4lod is being continually developed, it is good to use the latest and greatest version (by using git pull). But what if some new behavior of the converter changes, producing your data differently? Well, that's a problem. And you need to know about it ASAP. Even better, I need to know about it ASAP. Ideally, I would know about the problem and fix it before I even release the next version of the converter. That way, you wouldn't have to worry about it. cr-test-conversion.sh helps you identify these problems so that you can handle them quickly. At the same time, it helps you share your explicit expectations for the converter so that I can verify that it works for you before I release another version.

Implementation

The script $CSV2RDF4LOD_HOME/bin/util/cr-test-conversion.sh is a start at tackling this challenge. Like virtually all other cr- scripts, it is invoked from any conversion cockpit. When invoked, it applies a variety of SPARQL queries to verify the converted data.

Dependencies

The testing infrastructure currently uses Jena's TDB because it lets us set up a triple store in a local directory of our choosing. See TWC's page for help installing Jena TDB. If you can successfully run tdbloader and tdbquery, then you're good to go. (If you have a burning desire to test using other triple stores, go vote for #150.)
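
A quick way to confirm the dependency is to check that both commands resolve on your PATH. The following is only a sketch; the function name `missing_tools` is made up for illustration and is not part of csv2rdf4lod.

```shell
# Hypothetical helper (not part of csv2rdf4lod): print each named command
# that cannot be found on the PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" > /dev/null 2>&1 || echo "$tool"
  done
  return 0
}

missing=$(missing_tools tdbloader tdbquery)
if [ -z "$missing" ]; then
  echo "Jena TDB commands found; ready to test."
else
  echo "Not found on PATH: $missing"
fi
```

If either name is reported, revisit the Jena TDB installation before running the tests.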

Using version-controlled csv2rdf4lod skeletons to report bugs

Version control strategies discusses how csv2rdf4lod-automation can be used within a version control system. With one in place, reporting a bug becomes easy: commit the .rq file and point others to the URL of the test on the SVN web server. For example, someone could say:

Hey, this doesn't work and I need it Real Soon!

https://scm.escience.rpi.edu/svn/public/logd-csv2rdf4lod/data/source/data-gov-au/catalog/version/2011-Jun-27/rq/test/ask/present/thing_2.rq

Listing tests in RDF using EARL: https://scm.escience.rpi.edu/svn/public/logd-csv2rdf4lod/data/source/worldbank-org/world-development-indicators/rq/test/list.ttl

An abstract dataset is under version control and has unit tests:

<http://logd.tw.rpi.edu/source/worldbank-org/dataset/world-development-indicators>
  a conversion:AbstractDataset, void:Dataset;
  a conversion:VersionControlledDataset;
  doap:repository [
    a doap:SVNRepository;
    doap:location <https://scm.escience.rpi.edu/svn/public/logd-csv2rdf4lod/data/source/worldbank-org/world-development-indicators/>;
  ];
  a conversion:UnitTestedDataset;
  conversion:testable_by [ 
     a doap:Project;
     doap:developer <http://tw.rpi.edu/instances/MaryamFazel-Zarandi>;
     doap:developer <http://tw.rpi.edu/instances/TimLebo>;
     doap:repository [ 
       a doap:SVNRepository;
       doap:location <https://scm.escience.rpi.edu/svn/public/logd-csv2rdf4lod/data/source/worldbank-org/world-development-indicators/rq/>
     ];
  ];
.

Sometimes tests can apply only to specific versions, because they must assume specific values for a specific data element. Although they aren't as broadly applicable, they are still useful. The following RDF states that a versioned dataset is under version control and has unit tests:

<http://logd.tw.rpi.edu/source/data-gov-au/dataset/catalog/version/2011-Jun-27>
  a conversion:VersionedDataset, void:Dataset;
  a conversion:VersionControlledDataset;
  doap:repository [
    a doap:SVNRepository;
    doap:location <https://scm.escience.rpi.edu/svn/public/logd-csv2rdf4lod/data/source/data-gov-au/catalog/>;
  ];
  a conversion:UnitTestedDataset;
  conversion:testable_by [ 
     a doap:Project;
     doap:developer <http://tw.rpi.edu/instances/YongmeiShi>;
     doap:developer <http://tw.rpi.edu/instances/TimLebo>;
     doap:repository [ 
       a doap:SVNRepository;
       doap:location <https://scm.escience.rpi.edu/svn/public/logd-csv2rdf4lod/data/source/data-gov-au/catalog/version/2011-Jun-27/rq/>
     ];
  ];
.

Setup

cr-test-conversion.sh --help:

usage: cr-test-conversion.sh
 --rq                   : Create initial rq/test/ask/{present,absent}/*.rq directory structure.
 --setup                : Run tests, populate the tdb/ beforehand.
 --setup {--verbose, -v}: Run tests, populate the tdb/ beforehand, and show query contents.
                        : Run tests. Needs rq/test or ../../rq/test and publish/tdb/.
 {--verbose, -v}        : Run tests. Needs same as above. Shows the query contents while testing.
 --catalog -w           : Find all rq/test and create rq/test/list.ttl rdf:typing them to earl:TestCase.
 --catalog              : Show dryrun of finding all rq/test; print hypothetical contents of rq/test/list.ttl.
 --show-catalog         : Show all rq/test/list.ttl
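
The usage text says a plain run needs `rq/test` or `../../rq/test`, plus `publish/tdb/`. Below is a sketch of that precondition check. The helper names `find_test_dir` and `ready_to_test` are made up, and preferring the nearer directory when both exist is an assumption rather than documented behavior.

```shell
# Hypothetical helper: report which test directory is visible from the
# given conversion cockpit (default: current directory).
find_test_dir() {
  local cockpit="${1:-.}"
  if [ -d "$cockpit/rq/test" ]; then
    echo "$cockpit/rq/test"
  elif [ -d "$cockpit/../../rq/test" ]; then
    echo "$cockpit/../../rq/test"
  else
    return 1
  fi
}

# Hypothetical helper: succeed only if both prerequisites are present.
ready_to_test() {
  local cockpit="${1:-.}"
  find_test_dir "$cockpit" > /dev/null && [ -d "$cockpit/publish/tdb" ]
}

if ready_to_test .; then
  echo "Tests in $(find_test_dir .) can run against publish/tdb/"
else
  echo "Missing rq/test (or ../../rq/test) or publish/tdb/; see --rq and --setup"
fi
```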

bash-3.2$ cd /source/medicare-gov/catalog

bash-3.2$ ls
version/

bash-3.2$ cr-test-conversion.sh --rq
Creating rq/test for dataset medicare-gov catalog
rq/test/ask/present
rq/test/ask/present/a-dataset-exists.rq
rq/test/ask/absent
rq/test/ask/absent/impossible.rq

bash-3.2$ ls
version/ rq/


The two sample queries (`a-dataset-exists.rq` and `impossible.rq`) take the following form. If you follow this capitalization and structure, the `--verbose` flag will be a little cleaner when executing the tests.

...
ASK
WHERE {
  GRAPH ?g {
    ...
  }
}
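
As a concrete instance of that form, the sketch below writes a `present` test whose body is the pattern shown for `a-dataset-exists.rq` in the verbose output further down this page. The prefix URIs and the line breaks are assumptions (the page lists neither), and the script writes into a scratch directory rather than a real `rq/test` so that nothing is overwritten.

```shell
# Scratch location so this sketch never touches a real rq/test directory.
test_dir="${TMPDIR:-/tmp}/rq-sketch/test/ask/present"
mkdir -p "$test_dir"

# Prefix URIs below are assumptions; adjust them to match your data.
cat > "$test_dir/a-dataset-exists.rq" <<'EOF'
prefix void:       <http://rdfs.org/ns/void#>
prefix conversion: <http://purl.org/twc/vocab/conversion/>

ASK
WHERE {
  GRAPH ?g {
    ?dataset a conversion:Dataset, void:Dataset .
  }
}
EOF

echo "Wrote $test_dir/a-dataset-exists.rq"
```

Copy the resulting file into `rq/test/ask/present/` once the pattern matches something you expect the converter to produce.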


(or on another machine, according to [[Version control strategies: only the essential minimum is needed]])

Next, we can hop into a [[conversion cockpit]] and prepare to test:

bash-3.2$ cd version/2011-Jul-18/

bash-3.2$ ls
source/ doc/ manual/ convert-catalog.sh automatic/ publish/


bash-3.2$ export CSV2RDF4LOD_PUBLISH_TDB=true

bash-3.2$ publish/bin/publish.sh
...
WARN [main] (FactoryGraphTDB.java:241) - No BGP optimizer
Load: publish/medicare-gov-catalog-2011-Jul-18.nt
34,552 triples: loaded in 2.3 seconds [15,254.7 triples/s]


### Test!

Source the `my-csv2rdf4lod-source-me.sh` for the project that you are testing against (see [my-csv2rdf4lod-source-me.sh](https://github.com/timrdf/csv2rdf4lod-automation/wiki/Script:-source-me.sh)), then reset `CSV2RDF4LOD_HOME`, `CSV2RDF4LOD_CONVERT_MACHINE_URI`, and `CSV2RDF4LOD_CONVERT_PERSON_URI` to point to your copy of the converter.
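
A sketch of that setup follows. The file locations and URIs are placeholders, not values from this page, and `unset_vars` is a made-up helper that only reports which of the three variables are still empty.

```shell
# Placeholder locations; substitute your own project and converter paths.
project_source_me="$HOME/my-project/my-csv2rdf4lod-source-me.sh"
my_converter="$HOME/csv2rdf4lod-automation"

# Load the project's settings if the file exists.
if [ -f "$project_source_me" ]; then
  . "$project_source_me"
fi

# Reset the three variables for your own copy (placeholder values).
export CSV2RDF4LOD_HOME="$my_converter"
export CSV2RDF4LOD_CONVERT_MACHINE_URI="http://example.org/id/machine/my-laptop"
export CSV2RDF4LOD_CONVERT_PERSON_URI="http://example.org/id/person/me"

# Hypothetical helper: print the names of the variables that are unset or empty.
unset_vars() {
  local name value
  for name in CSV2RDF4LOD_HOME CSV2RDF4LOD_CONVERT_MACHINE_URI CSV2RDF4LOD_CONVERT_PERSON_URI; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
  return 0
}

unset_vars
```

An empty result from `unset_vars` means all three variables carry a value; it does not verify that the values are correct.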

bash-3.2$ cr-test-conversion.sh
../../rq/test/ask/absent/impossible.rq Ask => No
../../rq/test/ask/present/a-dataset-exists.rq Ask => Yes

2 of 2 passed


If you'd like to see a bit more, use `-v` or `--verbose`:

bash-3.2$ cr-test-conversion.sh --verbose
................................................................................ ../../rq/test/ask/absent/impossible.rq (Ask => No)

  twi:TimLebo owl:sameAs twi:notTimLebo .

................................................................................ ../../rq/test/ask/present/a-dataset-exists.rq (Ask => Yes)

  ?dataset a conversion:Dataset, void:Dataset .

2 of 2 passed


### Example: Testing the GeoNames US zip code conversion

From a [[conversion cockpit]]:

bash-3.2$ find rq
rq
rq/test
rq/test/ask
rq/test/ask/absent
rq/test/ask/absent/9-to-7.rq
rq/test/ask/present
rq/test/ask/present/0-to-2.rq
rq/test/ask/present/2-to-3.rq
rq/test/ask/present/3-to-5.rq
rq/test/ask/present/3-to-7.rq
rq/test/ask/present/5-to-1.rq
rq/test/ask/present/7-to-5.rq


Run `export CSV2RDF4LOD_PUBLISH_TDB=true` before publishing so that the conversion is loaded into a TDB directory that the tests can query.

<center>
<a href="http://download.geonames.org/export/zip/US.zip">http://download.geonames.org/export/zip/US.zip</a>

<img src="https://github.com/timrdf/csv2rdf4lod-automation/raw/master/doc/images/example-geonames-zip.png" alt="diagram of enhancements to geonames zip code dump"/>
</center>

bash-3.2$ cr-test-conversion.sh -v
--!--!-!-!--!--!-!-!--!--!-!-!--!--!-!-!--!--!-!-!--!--!-!-!--!-!-/ rq/test/ask/absent/9-to-7.rq (Ask => Yes) - - - FAIL - - -

  typed_subdivision_order_3:r40040c9reference_199_VA_US geonames:parentFeature <http://logd.tw.rpi.edu/source/geonames-org/dataset/zip-us/us/typed/subdivision_order_2/199_VA_US> .

................................................................................ rq/test/ask/present/0-to-2.rq (Ask => Yes)

  zip-us-us:point_40040 
     a                       wgs:Point;
     geonames:parentFeature <http://logd.tw.rpi.edu/id/usps-com/zip/23690>;
     wgs:lat                ?lat;
     wgs:long               ?long .

................................................................................ rq/test/ask/present/2-to-3.rq (Ask => Yes)

  <http://logd.tw.rpi.edu/id/usps-com/zip/23690> geonames:parentFeature typed_place:Yorktown_VA_US .

................................................................................ rq/test/ask/present/3-to-5.rq (Ask => Yes)

  typed_place:Yorktown_VA_US geonames:parentFeature typed_subdivision_order_1:VA_US .

................................................................................ rq/test/ask/present/3-to-7.rq (Ask => Yes)

  typed_place:Yorktown_VA_US geonames:parentFeature <http://logd.tw.rpi.edu/source/geonames-org/dataset/zip-us/us/typed/subdivision_order_2/199_VA_US> .

................................................................................ rq/test/ask/present/5-to-1.rq (Ask => Yes)

  typed_subdivision_order_1:VA_US geonames:parentFeature typed_country:US .

................................................................................ rq/test/ask/present/7-to-5.rq (Ask => Yes)

  <http://logd.tw.rpi.edu/source/geonames-org/dataset/zip-us/us/typed/subdivision_order_2/199_VA_US> geonames:parentFeature typed_subdivision_order_1:VA_US .

6 of 7 passed
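
The outputs above suggest the rule the script applies, although this page never states it outright: a query under `ask/present/` is expected to answer Yes, and one under `ask/absent/` is expected to answer No. The sketch below encodes that inferred rule; `verdict` is a made-up name and this is not the script's actual code.

```shell
# Hypothetical helper encoding the rule inferred from the output above.
# $1: path to the .rq file; $2: the answer reported for it (Yes or No).
verdict() {
  local query="$1" answer="$2" expected
  case "$query" in
    */ask/present/*) expected="Yes" ;;
    */ask/absent/*)  expected="No" ;;
    *) echo "unknown"; return 0 ;;
  esac
  if [ "$answer" = "$expected" ]; then
    echo "pass"
  else
    echo "FAIL"
  fi
}

verdict rq/test/ask/absent/9-to-7.rq Yes    # prints FAIL
verdict rq/test/ask/present/0-to-2.rq Yes   # prints pass
```

Read this way, the one failure above means the converter produced a triple that the `9-to-7.rq` test says should not exist.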


### Test results vocabularies

* <http://www.w3.org/TR/EARL10/> ([diagram](https://github.com/timrdf/csv2rdf4lod-automation/blob/master/doc/ontology-diagrams/earl-2011-May-10.pdf?raw=true))
* <http://www.w3.org/2006/03/test-description> ([diagram](https://github.com/timrdf/csv2rdf4lod-automation/blob/master/doc/ontology-diagrams/w3-test-description-2006-Mar-13.pdf?raw=true))
