
CSV2RDF4LOD environment variables (considerations for a distributed workflow)

timrdf edited this page Oct 1, 2011 · 87 revisions

Different CSV2RDF4LOD environment variables apply in different situations. Your variable settings can depend on:

  • the Project you are working on.
  • the Machine you are working on.
  • the Dataset you are working on.
  • who You are (as opposed to your team members).
  • what you are Doing (e.g., bulk conversion, developing enhancements, testing, etc.).

For example, you could be working on LOGD, LOBD, SWQP, or OrgPedia, each of which has a different CSV2RDF4LOD_BASE_URI (e.g., http://logd.tw.rpi.edu or http://health.tw.rpi.edu).

In simple environments, your my-csv2rdf4lod-source-me.sh does the job. But as you work on more projects, collaborate with others through version control, and use different machines, things can get messy. This page offers recommendations and best practices for managing these more complicated environments.

Naming conventions

In the simple case of one machine, one project, and one you, stick with my-csv2rdf4lod-source-me.sh. When you have more of any of those, the following naming conventions help organize your CSV2RDF4LOD environment variable settings according to how, when, or why they should be used. The for-, on-, as-, and when- prefixes sort nicely and indicate at a glance what kind of environment variables each file contains.

csv2rdf4lod-source-me-for-PROJECTNAME.sh
csv2rdf4lod-source-me-on-MACHINENAME.sh  
csv2rdf4lod-source-me-as-USERNAME.sh
csv2rdf4lod-source-me-when-ACTIVITYNAME.sh

Including documentation pointers

We recommend including these comments in your source-mes so people have pointers to the latest information about what they are for and how to use them.

#3 <#> a <http://purl.org/twc/vocab/conversion/CSV2RDF4LOD_environment_variables> ;
#3     rdfs:seeAlso 
#3     <http://purl.org/twc/page/csv2rdf4lod/distributed_env_vars>,
#3     <https://github.com/timrdf/csv2rdf4lod-automation/wiki/Script:-source-me.sh> .
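For illustration, a minimal project source-me that carries these comments might look like the sketch below. The file name follows the for-PROJECTNAME convention, and the single setting is the LOGD base URI mentioned earlier; a real project source-me would typically set more CSV2RDF4LOD variables than this.

```shell
#!/bin/bash
#
# csv2rdf4lod-source-me-for-logd.sh (illustrative sketch)
#
#3 <#> a <http://purl.org/twc/vocab/conversion/CSV2RDF4LOD_environment_variables> ;
#3     rdfs:seeAlso
#3     <http://purl.org/twc/page/csv2rdf4lod/distributed_env_vars>,
#3     <https://github.com/timrdf/csv2rdf4lod-automation/wiki/Script:-source-me.sh> .

# Project-level setting: every dataset converted for LOGD shares this base URI.
export CSV2RDF4LOD_BASE_URI="http://logd.tw.rpi.edu"
```

Source the file rather than executing it, so that the export takes effect in your current shell.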

Using version control

We recommend that you version control all source-mes. See "Version control strategies: only the essential minimum is needed".

Example

The following source-mes are on TWC's SVN, which is public so that others can reproduce the conversions.

csv2rdf4lod-source-me-for-logd.sh                     (a project)
csv2rdf4lod-source-me-on-gemini.sh                    (a machine)
csv2rdf4lod-source-me-on-sam.sh                       (a machine)
csv2rdf4lod-source-me-as-lebot.sh                     (a person)

The source-mes above can be checked out using the command:

svn checkout https://scm.escience.rpi.edu/svn/public/logd-csv2rdf4lod/data/source --depth=files

The following source-me is in TWC's private SVN, because it contains ports, usernames, and passwords for endpoint administration:

csv2rdf4lod-source-me-when-publishing-via-virtuoso.sh (an activity)

Grab them:

svn checkout https://scm.escience.rpi.edu/svn/private/projects/logd/config/        /mnt/raid/logd/svn/config/
svn checkout https://scm.escience.rpi.edu/svn/public/logd-csv2rdf4lod/data/source/ /mnt/raid/logd/svn/source/

Note that the second checkout retrieves the entire csv2rdf4lod data root, as well as the source-mes stored in it.

Source (in ~/.bashrc) the right mix based on what machine you're on, what project you're working on, and who you are:

alias l='ls -lt'
source /mnt/raid/logd/svn/source/csv2rdf4lod-source-me-on-gemini.sh
source /mnt/raid/logd/svn/source/csv2rdf4lod-source-me-for-logd.sh
source /mnt/raid/logd/svn/source/csv2rdf4lod-source-me-as-lebot.sh
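If the same ~/.bashrc is shared across machines and people, the choice of source-mes can be derived from the naming conventions instead of being hard-coded. The function below is a hypothetical helper, not part of csv2rdf4lod. It assumes all source-mes live in one directory, takes the machine name from bash's HOSTNAME and the user name from USER, and sources files in the same on-, for-, as- order as the example above (plus an optional when- activity), silently skipping any that do not exist. Files sourced later override settings from earlier ones.

```shell
#!/bin/bash
# Hypothetical helper (not part of csv2rdf4lod): source the conventionally
# named source-mes that exist in DIR. Later files override earlier ones.
#   usage: source_csv2rdf4lod_mes DIR PROJECT [ACTIVITY]
source_csv2rdf4lod_mes() {
    local dir="$1" project="$2" activity="${3:-}"
    local machine="${HOSTNAME%%.*}"   # short host name, e.g. "gemini"
    local user="${USER:-$(id -un)}"   # login name, e.g. "lebot"
    local candidates=(
        "$dir/csv2rdf4lod-source-me-on-$machine.sh"
        "$dir/csv2rdf4lod-source-me-for-$project.sh"
        "$dir/csv2rdf4lod-source-me-as-$user.sh"
    )
    if [ -n "$activity" ]; then
        candidates+=("$dir/csv2rdf4lod-source-me-when-$activity.sh")
    fi
    local f
    for f in "${candidates[@]}"; do
        # Skip files that do not exist for this machine, project, or user.
        if [ -f "$f" ]; then
            source "$f"
        fi
    done
}
```

In ~/.bashrc, a call such as `source_csv2rdf4lod_mes /mnt/raid/logd/svn/source logd` would then pick up the on-, for-, and as- files that match the current machine and user.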

The conversion trigger, too!

Conversion triggers can contain dataset-specific CSV2RDF4LOD environment variables, so they should also be version controlled. This way, the consumer does not need to know which data files should be converted.

CSV2RDF4LOD_CONVERT_OMIT_RAW_LAYER="true"

Because a conversion trigger is version-specific, you can apply its settings to all future versions by placing them in source/SSS/DDD/version/2source.sh. See Automated creation of a new Versioned Dataset.
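A dataset-specific value in a trigger does not disturb the rest of your environment, assuming the trigger is executed as its own script rather than sourced: the assignment applies to that run only. The sketch below demonstrates this with a throwaway stand-in for a trigger; convert-demo.sh is a hypothetical name, and the assignment inside it mirrors the excerpt above.

```shell
#!/bin/bash
# Value inherited from your source-mes (e.g., via ~/.bashrc).
export CSV2RDF4LOD_CONVERT_OMIT_RAW_LAYER="false"

workdir="$(mktemp -d)"
# Hypothetical stand-in for a dataset's conversion trigger.
cat > "$workdir/convert-demo.sh" <<'EOF'
#!/bin/bash
# Dataset-specific override, scoped to this trigger's own process.
CSV2RDF4LOD_CONVERT_OMIT_RAW_LAYER="true"
echo "inside trigger: $CSV2RDF4LOD_CONVERT_OMIT_RAW_LAYER"
EOF

# Run the trigger in a child process and capture what it saw.
trigger_value="$(bash "$workdir/convert-demo.sh")"
echo "$trigger_value"                                       # inside trigger: true
echo "after trigger: $CSV2RDF4LOD_CONVERT_OMIT_RAW_LAYER"   # after trigger: false
rm -rf "$workdir"
```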

Tracking down where a CSV2RDF4LOD environment variable is being set

$CSV2RDF4LOD_HOME/bin/util/cr-where-was-envvar-set.sh will dig through all of the source-mes to show you where a particular environment variable is set:

$ cr-where-was-envvar-set.sh --help
usage: cr-where-was-envvar-set.sh [-rc ~/.bashrc] [ [--list] | [CSV2RDF4LOD_var] [--only] ]
                -rc : the rc (.bashrc, .login, etc.) file used to source all csv2rdf4lod-source-mes.
             --list : show the source-mes that are used to set up the environment.
  [CSV2RDF4LOD_var] : a CSV2RDF4LOD_ environment variable name.
                      All variables are listed by running cr-vars.sh.
                      If not specified, defaults to CSV2RDF4LOD_HOME.
             --only : omit the CSV2RDF4LOD_ variables that are more specific than the one specified.

see https://github.com/timrdf/csv2rdf4lod-automation/wiki/CSV2RDF4LOD-environment-variables
    https://github.com/timrdf/csv2rdf4lod-automation/wiki/Script:-source-me.sh

To see which source-mes are used to set up your environment:

$ cr-where-was-envvar-set.sh -rc ~/.bashrc --list
/srv/logd/data/source/csv2rdf4lod-source-me-for-logd.sh
/srv/logd/data/source/csv2rdf4lod-source-me-on-gemini.sh
/srv/logd/data/source/csv2rdf4lod-source-me-as-lebot.sh
/srv/logd/data/source/csv2rdf4lod-source-me-when-publishing.sh
/srv/logd/config/triple-store/virtuoso/csv2rdf4lod-source-me-for-virtuoso-credentials.sh
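Conceptually, --list amounts to finding the source-me files that the rc file sources. The function below is a rough, hypothetical approximation, not the actual cr-where-was-envvar-set.sh implementation: it only recognizes literal `source` or `.` lines that name a csv2rdf4lod-source-me file, and it does not expand variables or follow nested sourcing.

```shell
#!/bin/bash
# Hypothetical approximation of `cr-where-was-envvar-set.sh -rc RC --list`.
#   usage: list_source_mes [RC_FILE]   (defaults to ~/.bashrc)
list_source_mes() {
    local rc="${1:-$HOME/.bashrc}"
    # Match lines such as "source /path/csv2rdf4lod-source-me-for-logd.sh"
    # or ". /path/csv2rdf4lod-source-me-on-gemini.sh", then print the path.
    grep -E '^[[:space:]]*(source|\.)[[:space:]]+[^[:space:]]*csv2rdf4lod-source-me[^[:space:]]*' "$rc" \
        | awk '{ print $2 }'
}
```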

To show where CSV2RDF4LOD_PUBLISH_VIRTUOSO is set while omitting more specific variables such as CSV2RDF4LOD_PUBLISH_VIRTUOSO_HOME and CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT, add --only. To see all of these "child" variables, omit the --only parameter.

$ cr-where-was-envvar-set.sh -rc ~/.bashrc CSV2RDF4LOD_PUBLISH_VIRTUOSO --only

/srv/logd/data/source/csv2rdf4lod-source-me-for-logd.sh:export CSV2RDF4LOD_PUBLISH_VIRTUOSO="true"
/srv/logd/data/source/csv2rdf4lod-source-me-for-logd.sh:export CSV2RDF4LOD_PUBLISH_VIRTUOSO="false" 
/srv/logd/config/triple-store/virtuoso/csv2rdf4lod-source-me-for-virtuoso-credentials.sh:export CSV2RDF4LOD_PUBLISH_VIRTUOSO="false" #
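The per-variable lookup can likewise be approximated by searching each source-me for export lines. The function below is a hypothetical sketch, not the implementation of cr-where-was-envvar-set.sh: it takes the variable name and the files to search, and without --only it also matches "child" variables that extend the name with an underscore (such as CSV2RDF4LOD_PUBLISH_VIRTUOSO_PORT). It only sees literal `export VAR=` lines. When a variable is assigned in several places, as CSV2RDF4LOD_PUBLISH_VIRTUOSO is above, the assignment bash executes last is the one in effect.

```shell
#!/bin/bash
# Hypothetical approximation of `cr-where-was-envvar-set.sh VAR [--only]`.
#   usage: where_envvar_set VAR [--only] FILE...
where_envvar_set() {
    local var="$1"; shift
    # By default, match VAR and its children (VAR_HOME, VAR_PORT, ...).
    local pattern="^[[:space:]]*export[[:space:]]+${var}(_[A-Za-z0-9_]*)?="
    if [ "${1:-}" = "--only" ]; then
        # Exact variable name only; ignore the more specific children.
        pattern="^[[:space:]]*export[[:space:]]+${var}="
        shift
    fi
    # -H prefixes each match with its file name, like the output above.
    grep -HE "$pattern" "$@"
}
```

For example: `where_envvar_set CSV2RDF4LOD_PUBLISH_VIRTUOSO --only /srv/logd/data/source/csv2rdf4lod-source-me-*.sh`.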
