Script: cr-test-conversion.sh
Since csv2rdf4lod is continually being developed, it is good to use the latest version (via `git pull`). But what if a change in the converter's behavior causes your data to be produced differently? That's a problem, and you need to know about it ASAP. Even better, I need to know about it ASAP. Ideally, I would find and fix the problem before I even release the next version of the converter, so that you never have to worry about it. `cr-test-conversion.sh` helps you identify these problems so that you can handle them quickly. At the same time, it helps you share your explicit expectations for the converter so that I can verify that it works for you before I release another version.
The script $CSV2RDF4LOD_HOME/bin/util/cr-test-conversion.sh is a start at tackling this challenge. Like virtually all other cr- scripts, it is invoked from any conversion cockpit. When invoked, it applies a variety of SPARQL queries to verify the converted data.
The testing infrastructure currently uses Jena's TDB because it lets us set up a triple store in a local directory of our choosing. See TWC's page for help installing Jena TDB. If you can successfully run `tdbloader` and `tdbquery`, then you're good to go. (If you have a burning desire to test using other triple stores, go vote for #150.)
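Before running any tests, it can help to confirm that the TDB command-line tools are reachable. The following is a minimal sketch and not part of csv2rdf4lod: `missing_commands` is a hypothetical helper, and it only checks that `tdbloader` and `tdbquery` are on your `PATH`, not that they can actually load or query data.

```shell
# Hypothetical helper (not part of csv2rdf4lod):
# print the names of the given commands that are NOT found on PATH.
missing_commands() {
  missing=""
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
  done
  # Drop the leading space, if any.
  echo "${missing# }"
}

not_found=$(missing_commands tdbloader tdbquery)
if [ -z "$not_found" ]; then
  echo "TDB tools found on PATH"
else
  echo "Missing from PATH: $not_found"
fi
```

If anything is reported missing, revisit the Jena TDB installation before going further.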
Version control strategies discusses how csv2rdf4lod-automation can be used within a version control system. When using one, it becomes incredibly easy to report a bug: all one needs to do is commit the `.rq` and point others to the URL of the test on the SVN web server. For example, someone could say:
Hey, this doesn't work and I need it Real Soon!
Listing tests in RDF using EARL: https://scm.escience.rpi.edu/svn/public/logd-csv2rdf4lod/data/source/worldbank-org/world-development-indicators/rq/test/list.ttl
An abstract dataset is under version control and has unit tests:

    <http://logd.tw.rpi.edu/source/worldbank-org/dataset/world-development-indicators>
       a conversion:AbstractDataset, void:Dataset;
       a conversion:VersionControlledDataset;
       doap:repository [
          a doap:SVNRepository;
          doap:location <https://scm.escience.rpi.edu/svn/public/logd-csv2rdf4lod/data/source/worldbank-org/world-development-indicators/>;
       ];
       a conversion:UnitTestedDataset;
       conversion:testable_by [
          a doap:Project;
          doap:developer <http://tw.rpi.edu/instances/MaryamFazel-Zarandi>;
          doap:developer <http://tw.rpi.edu/instances/TimLebo>;
          doap:repository [
             a doap:SVNRepository;
             doap:location <https://scm.escience.rpi.edu/svn/public/logd-csv2rdf4lod/data/source/worldbank-org/world-development-indicators/rq/>
          ];
       ];
    .

Sometimes tests can only apply to specific versions, since they have to assume specific values for a specific data element. Although they aren't as broadly applicable, they are still useful. The following RDF encoding states that a versioned dataset is under version control and has unit tests:

    <http://logd.tw.rpi.edu/source/data-gov-au/dataset/catalog/version/2011-Jun-27>
       a conversion:VersionedDataset, void:Dataset;
       a conversion:VersionControlledDataset;
       doap:repository [
          a doap:SVNRepository;
          doap:location <https://scm.escience.rpi.edu/svn/public/logd-csv2rdf4lod/data/source/data-gov-au/catalog/>;
       ];
       a conversion:UnitTestedDataset;
       conversion:testable_by [
          a doap:Project;
          doap:developer <http://tw.rpi.edu/instances/YongmeiShi>;
          doap:developer <http://tw.rpi.edu/instances/TimLebo>;
          doap:repository [
             a doap:SVNRepository;
             doap:location <https://scm.escience.rpi.edu/svn/public/logd-csv2rdf4lod/data/source/data-gov-au/catalog/version/2011-Jun-27/rq/>
          ];
       ];
    .

`cr-test-conversion.sh --help`:

    usage: cr-test-conversion.sh
       --rq                    : Create initial rq/test/ask/{present,absent}/*.rq directory structure.
       --setup                 : Run tests, populate the tdb/ beforehand.
       --setup {--verbose, -v} : Run tests, populate the tdb/ beforehand, and show query contents.
                               : Run tests. Needs rq/test or ../../rq/test and publish/tdb/.
       {--verbose, -v}         : Run tests. Needs same as above. Shows the query contents while testing.
       --catalog -w            : Find all rq/test and create rq/test/list.ttl rdf:typing them to earl:TestCase.
       --catalog               : Show dryrun of finding all rq/test; print hypothetical contents of rq/test/list.ttl.
       --show-catalog          : Show all rq/test/list.ttl

    bash-3.2$ cd /source/medicare-gov/catalog
    bash-3.2$ ls
    version/
    bash-3.2$ cr-test-conversion.sh --rq
    Creating rq/test for dataset medicare-gov catalog
    rq/test/ask/present
    rq/test/ask/present/a-dataset-exists.rq
    rq/test/ask/absent
    rq/test/ask/absent/impossible.rq
    bash-3.2$ ls
    version/ rq/

The two sample queries (`a-dataset-exists.rq` and `impossible.rq`) take the following form. If you follow this capitalization and structure, the `--verbose` flag will be a little cleaner when executing the tests.
... ASK WHERE { GRAPH ?g { ... } }
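As an illustration, a new test can be added by dropping another `.rq` file into `rq/test/ask/present/`. The sketch below writes one from a shell here-document. The file name `a-void-dataset-exists.rq` is hypothetical, the line layout is only a guess at the recommended structure, and the two prefix URIs are assumptions that should be checked against the prefixes in the generated sample queries. The graph pattern is the one that the verbose output later on this page shows for `a-dataset-exists.rq`.

```shell
# Hypothetical test name; any *.rq under rq/test/ask/present/ is picked up as a "present" test.
test_file="rq/test/ask/present/a-void-dataset-exists.rq"
mkdir -p "$(dirname "$test_file")"

# Prefix URIs below are assumptions; compare with the generated sample queries.
cat > "$test_file" <<'EOF'
PREFIX void:       <http://rdfs.org/ns/void#>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>

ASK
WHERE {
  GRAPH ?g {
    ?dataset a conversion:Dataset, void:Dataset .
  }
}
EOF

echo "wrote $test_file"
```

Placing the same pattern under `rq/test/ask/absent/` instead would assert that the pattern must not appear in the converted data.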
(or on another machine, according to [[Version control strategies: only the essential minimum is needed]])
Next, we can hop into a [[conversion cockpit]] and prepare to test:

    bash-3.2$ cd version/2011-Jul-18/
    bash-3.2$ ls
    source/ doc/ manual/ convert-catalog.sh automatic/ publish/
    bash-3.2$ export CSV2RDF4LOD_PUBLISH_TDB=true
    bash-3.2$ publish/bin/publish.sh
    ...
    WARN [main] (FactoryGraphTDB.java:241) - No BGP optimizer
    Load: publish/medicare-gov-catalog-2011-Jul-18.nt
    34,552 triples: loaded in 2.3 seconds [15,254.7 triples/s]

### Test!
First, source the `my-csv2rdf4lod-source-me.sh` for the project that you are testing against. See [my-csv2rdf4lod-source-me.sh](https://github.com/timrdf/csv2rdf4lod-automation/wiki/Script:-source-me.sh). Then reset your `CSV2RDF4LOD_HOME`, `CSV2RDF4LOD_CONVERT_MACHINE_URI`, and `CSV2RDF4LOD_CONVERT_PERSON_URI` to point to your copy of the converter.

    bash-3.2$ cr-test-conversion.sh
    ../../rq/test/ask/absent/impossible.rq Ask => No
    ../../rq/test/ask/present/a-dataset-exists.rq Ask => Yes
    2 of 2 passed

If you'd like to see a bit more, use `-v` or `--verbose`:

    bash-3.2$ cr-test-conversion.sh --verbose
    ................................................................................
    ../../rq/test/ask/absent/impossible.rq (Ask => No)
    twi:TimLebo owl:sameAs twi:notTimLebo .
    ................................................................................
    ../../rq/test/ask/present/a-dataset-exists.rq (Ask => Yes)
    ?dataset a conversion:Dataset, void:Dataset .
    2 of 2 passed

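The closing summary line lends itself to a simple gate in your own scripts, for example before committing a new conversion. This is a minimal sketch under two assumptions: the summary is the last line of output and always has the form `N of M passed`, as in the runs shown on this page. The script's exit status is not documented here, so the sketch does not rely on it. `all_passed` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if a summary of the form "N of M passed" has N equal to M.
all_passed() {
  # Split the summary into words: N, "of", M, "passed".
  set -- $1
  [ "$#" -eq 4 ] && [ "$2" = "of" ] && [ "$4" = "passed" ] && [ "$1" = "$3" ]
}

# In a conversion cockpit (not run here):
#   all_passed "$(cr-test-conversion.sh | tail -n 1)" || echo "some tests failed"

if all_passed "2 of 2 passed"; then
  echo "all tests passed"
fi
if ! all_passed "6 of 7 passed"; then
  echo "some tests failed"
fi
```

Anything that does not look like a summary line (including empty output) is treated as a failure, which errs on the side of caution.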
### Example: Testing GovTrack
From a [[conversion cockpit]]:

    bash-3.2$ find rq
    rq
    rq/test
    rq/test/ask
    rq/test/ask/absent
    rq/test/ask/absent/9-to-7.rq
    rq/test/ask/present
    rq/test/ask/present/0-to-2.rq
    rq/test/ask/present/2-to-3.rq
    rq/test/ask/present/3-to-5.rq
    rq/test/ask/present/3-to-7.rq
    rq/test/ask/present/5-to-1.rq
    rq/test/ask/present/7-to-5.rq

Run `export CSV2RDF4LOD_PUBLISH_TDB=true` before publishing so that the conversion is loaded into a TDB directory that the tests can query.
<center>
<a href="http://download.geonames.org/export/zip/US.zip">http://download.geonames.org/export/zip/US.zip</a>
<img src="https://github.com/timrdf/csv2rdf4lod-automation/raw/master/doc/images/example-geonames-zip.png" alt="diagram of enhancements to geonames zip code dump"/>
</center>

    bash-3.2$ cr-test-conversion.sh -v
    --!--!-!-!--!--!-!-!--!--!-!-!--!--!-!-!--!--!-!-!--!--!-!-!--!-!-/
    rq/test/ask/absent/9-to-7.rq (Ask => Yes) - - - FAIL - - -
    typed_subdivision_order_3:r40040c9reference_199_VA_US geonames:parentFeature <http://logd.tw.rpi.edu/source/geonames-org/dataset/zip-us/us/typed/subdivision_order_2/199_VA_US> .
    ................................................................................
    rq/test/ask/present/0-to-2.rq (Ask => Yes)
    zip-us-us:point_40040
       a wgs:Point;
       geonames:parentFeature <http://logd.tw.rpi.edu/id/usps-com/zip/23690>;
       wgs:lat ?lat;
       wgs:long ?long .
    ................................................................................
    rq/test/ask/present/2-to-3.rq (Ask => Yes)
    <http://logd.tw.rpi.edu/id/usps-com/zip/23690> geonames:parentFeature typed_place:Yorktown_VA_US .
    ................................................................................
    rq/test/ask/present/3-to-5.rq (Ask => Yes)
    typed_place:Yorktown_VA_US geonames:parentFeature typed_subdivision_order_1:VA_US .
    ................................................................................
    rq/test/ask/present/3-to-7.rq (Ask => Yes)
    typed_place:Yorktown_VA_US geonames:parentFeature <http://logd.tw.rpi.edu/source/geonames-org/dataset/zip-us/us/typed/subdivision_order_2/199_VA_US> .
    ................................................................................
    rq/test/ask/present/5-to-1.rq (Ask => Yes)
    typed_subdivision_order_1:VA_US geonames:parentFeature typed_country:US .
    ................................................................................
    rq/test/ask/present/7-to-5.rq (Ask => Yes)
    <http://logd.tw.rpi.edu/source/geonames-org/dataset/zip-us/us/typed/subdivision_order_2/199_VA_US> geonames:parentFeature typed_subdivision_order_1:VA_US .
    6 of 7 passed

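The pass/fail rule that these runs suggest is that a query under `ask/present/` passes when the answer is Yes, and a query under `ask/absent/` passes when the answer is No. The sketch below reproduces the tally for the run above under that reading. It is an illustration only, not the script's actual implementation; `verdict` and `tally` are hypothetical names, and the input is a hand-written list of test paths and answers rather than real `tdbquery` output.

```shell
# Hypothetical helper: decide PASS/FAIL from a test's path and its ASK answer ("Yes" or "No").
verdict() {
  case "$1:$2" in
    */ask/present/*:Yes | */ask/absent/*:No) echo PASS ;;
    *) echo FAIL ;;
  esac
}

# Hypothetical helper: read "path answer" pairs from stdin and print an "N of M passed" summary.
tally() {
  passed=0
  total=0
  while read -r path answer; do
    total=$((total + 1))
    if [ "$(verdict "$path" "$answer")" = PASS ]; then
      passed=$((passed + 1))
    fi
  done
  echo "$passed of $total passed"
}

# The seven tests and answers from the run above.
tally <<'EOF'
rq/test/ask/absent/9-to-7.rq Yes
rq/test/ask/present/0-to-2.rq Yes
rq/test/ask/present/2-to-3.rq Yes
rq/test/ask/present/3-to-5.rq Yes
rq/test/ask/present/3-to-7.rq Yes
rq/test/ask/present/5-to-1.rq Yes
rq/test/ask/present/7-to-5.rq Yes
EOF
```

Only `9-to-7.rq` fails here: it sits under `ask/absent/`, so its pattern was expected to be missing, yet the converted data contains it.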
### Test results vocabularies
* <http://www.w3.org/TR/EARL10/> ([diagram](https://github.com/timrdf/csv2rdf4lod-automation/blob/master/doc/ontology-diagrams/earl-2011-May-10.pdf?raw=true))
* <http://www.w3.org/2006/03/test-description> ([diagram](https://github.com/timrdf/csv2rdf4lod-automation/blob/master/doc/ontology-diagrams/w3-test-description-2006-Mar-13.pdf?raw=true))