timrdf edited this page Jun 9, 2012 · 21 revisions

vload is a shell script that makes it a bit easier to load local RDF files into a Virtuoso triple store. It was originally created by Zhenning Shangguan for RPI's data.gov effort; I have since adopted it and added a few bells and whistles.

vload is part of the csv2rdf4lod-automation repository, so you get it with a [git clone](Installing csv2rdf4lod automation). Once it is on your PATH, running it without arguments shows its usage:

usage: vload [--target] {rdf, ttl, nt, nq} <data_file> <graph_uri> [-v | --verbose]
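If you load many files, it can help to derive the format argument from the file extension and check the full command line before actually loading anything. The following is a minimal sketch, not part of vload: the helper names vload_format_for and vload_dry_run are hypothetical, and it assumes each file's extension matches one of the four formats in the usage line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vload): map an RDF file's extension
# to one of the four format arguments that vload's usage line accepts.
vload_format_for() {
  case "$1" in
    *.rdf) echo rdf ;;
    *.ttl) echo ttl ;;
    *.nt)  echo nt ;;
    *.nq)  echo nq ;;
    *)     echo "unsupported extension: $1" >&2; return 1 ;;
  esac
}

# Hypothetical dry run: print (but do not execute) the vload command
# for a data file and a graph URI, so the arguments can be checked first.
vload_dry_run() {
  local file="$1" graph="$2" format
  format=$(vload_format_for "$file") || return 1
  printf 'vload %s %s %s\n' "$format" "$file" "$graph"
}
```

For example, vload_dry_run data/example.ttl http://example.org/graph/example prints the command vload ttl data/example.ttl http://example.org/graph/example; the file and graph names here are made up for illustration.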

Note that this script works independently of the rest of the csv2rdf4lod conversion process, so it can be used with any RDF file and any Virtuoso server without the rest of csv2rdf4lod-automation. If you are using csv2rdf4lod-automation, however, you never need to run vload yourself: the scripts created in your conversion cockpit's publish/ directory call vload with the right parameters and load the converted data into consistently organized named graphs. See Conversion process phase: publish if you are using csv2rdf4lod-automation and want to load data into a Virtuoso server.

Knowing where the load will go

Running vload with the --target flag shows the underlying isql command it will use, including the port and username it will use to connect to the Virtuoso server, as well as where it will write its log. The essential parts of how vload works are controlled by the CSV2RDF4LOD environment variables described below.

vload --target

/opt/virtuoso
/opt/virtuoso/bin/isql 1111 dba
dba
/opt/csv2rdf4lod-automation/tmp/vload/input-files/load_2012-06-09T05_19_25-04-00_13450.log

The remaining parameters should be self-explanatory: rdf, ttl, nt, or nq is the format of the RDF file <data_file>; <graph_uri> is the name of the graph that the triples will be loaded into; and -v or --verbose shows a bit more output (including the path to the log file and the contents of the log).
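If you want to check the target from another script before loading, you can parse the --target output. This is a minimal sketch with a hypothetical helper name (vload_target_summary); it assumes the four-line layout shown in the example output above (Virtuoso home, isql command with port and username, a third line that is not interpreted here, log file path), which may differ between versions of vload.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vload): summarize `vload --target` output.
# Assumes the four-line layout from the example output on this page:
#   1: Virtuoso home, 2: "<isql path> <port> <user>", 3: (ignored), 4: log path.
vload_target_summary() {
  local home isql_line log
  IFS= read -r home || return 1
  IFS= read -r isql_line || return 1
  IFS= read -r _ || return 1
  # Tolerate a final line without a trailing newline.
  IFS= read -r log || [ -n "$log" ] || return 1
  # Split the isql line on whitespace; assumes the path has no spaces or glob characters.
  set -- $isql_line
  printf 'home=%s\nisql=%s port=%s user=%s\nlog=%s\n' "$home" "$1" "$2" "$3" "$log"
}
```

Usage: vload --target | vload_target_summary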

CSV2RDF4LOD environment variables used

vload changes its behavior according to the following variables and falls back to defaults where it can when they are not set:

  • CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME is "/opt/virtuoso" by default. It is used to locate "bin/isql" if CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH is not set.
  • CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH defaults to bin/isql under the Virtuoso home directory ($virtuoso_home/bin/isql).
  • CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT defaults to 1111.
  • CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD defaults to dba.
  • $CSV2RDF4LOD_HOME/tmp/vload/input-files is the directory for logs.
  • CSV2RDF4LOD_CONVERT_DATA_ROOT is used to avoid needless file copies if Virtuoso already has permission to read the directory containing the RDF file being loaded.
  • CSV2RDF4LOD_CONCURRENCY is passed to Virtuoso when loading.
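The fallback rules for the connection settings and log directory in the list above can be illustrated with shell parameter expansion. This is a sketch of the documented defaults, not vload's actual source: the function name resolve_vload_settings is hypothetical, it assumes an empty variable is treated the same as an unset one, and it leaves out CSV2RDF4LOD_CONVERT_DATA_ROOT and CSV2RDF4LOD_CONCURRENCY.

```shell
#!/usr/bin/env bash
# Illustrative sketch (not vload's source): resolve the documented settings,
# falling back to the listed defaults when a variable is unset or empty.
resolve_vload_settings() {
  local home isql port password logdir
  home="${CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME:-/opt/virtuoso}"
  # The isql path falls back to bin/isql under the Virtuoso home.
  isql="${CSV2RDF4LOD_PUBLISH_VIRTUOSO_ISQL_PATH:-$home/bin/isql}"
  port="${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT:-1111}"
  password="${CSV2RDF4LOD_PUBLISH_VIRTUOSO_PASSWORD:-dba}"
  # No default for CSV2RDF4LOD_HOME is documented, so report it as missing.
  if [ -n "${CSV2RDF4LOD_HOME:-}" ]; then
    logdir="$CSV2RDF4LOD_HOME/tmp/vload/input-files"
  else
    logdir="(CSV2RDF4LOD_HOME not set)"
  fi
  printf 'isql=%s\nport=%s\npassword=%s\nlogs=%s\n' "$isql" "$port" "$password" "$logdir"
}
```

For example, running CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT=1112 resolve_vload_settings shows which isql binary and port a load would use against a second Virtuoso instance; comparing that with vload --target is a quick way to confirm the variables are set the way you expect.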
