Enhancement Parameters Reference

timrdf edited this page Jan 22, 2011 · 7 revisions

== Parameters ==

=== Essential parameters ===

==== Data collection's base URI ====

e.g. http://logd.tw.rpi.edu

  @prefix conversion: <http://purl.org/twc/vocab/conversion/> .
  @prefix :           <http://logd.tw.rpi.edu/source/data-gov/dataset/1554/version/2010-Feb-13/params/enhancement/1/> .

  :dataset a void:Dataset;
     conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
     conversion:source_identifier  "data-gov";
     conversion:dataset_identifier "1554";
     conversion:dataset_version    "2010-Feb-13";
  .

==== Data source identifier ====

e.g. data-gov, or recovery-gov

  @prefix conversion: <http://purl.org/twc/vocab/conversion/> .
  @prefix :           <http://logd.tw.rpi.edu/source/data-gov/dataset/1554/version/2010-Feb-13/params/enhancement/1/> .

  :dataset a void:Dataset;
     conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
     conversion:source_identifier  "data-gov";
     conversion:dataset_identifier "1554";
     conversion:dataset_version    "2010-Feb-13";
  .

==== Data source's identifier for dataset ====

e.g. 1623 from data.gov

  @prefix conversion: <http://purl.org/twc/vocab/conversion/> .
  @prefix :           <http://logd.tw.rpi.edu/source/data-gov/dataset/1554/version/2010-Feb-13/params/enhancement/1/> .

  :dataset a void:Dataset;
     conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
     conversion:source_identifier  "data-gov";
     conversion:dataset_identifier "1554";
     conversion:dataset_version    "2010-Feb-13";
  .

==== Dataset version ====

e.g. "2010-Feb-13"

  @prefix conversion: <http://purl.org/twc/vocab/conversion/> .
  @prefix :           <http://logd.tw.rpi.edu/source/data-gov/dataset/1554/version/2010-Feb-13/params/enhancement/1/> .

  :dataset a void:Dataset;
     conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
     conversion:source_identifier  "data-gov";
     conversion:dataset_identifier "1554";
     conversion:version_identifier "2010-Feb-13";
  .

@DEPRECATED: conversion:dataset_version is deprecated; use conversion:version_identifier instead. Status: both are still being asserted, but conversion:dataset_version will eventually go away.

=== Citing a property (using ov:csvCol and/or conv:property_name) ===

The property to enhance can be cited using either ov:csvCol or conversion:property_name. ov:csvHeader is an owl:AnnotationProperty and should only be used as an editing aid.

e.g. Dataset 1147 with existing enhancement parameters file.

  conversion:enhance [
     ov:csvCol    1;
     ov:csvHeader "blah blah"
  ];

e.g. Dataset 1146 resembles Dataset 1147, but with columns swapped. Here we want to refer to the property by its resulting local name, not by its csvCol.

  conversion:enhance [
     ov:csvHeader       "blah blah";
     conv:property_name "blah_blah";
  ];

TODO: if the csvHeader exists in the param file and does not match, report a message.

=== Raw parameters ===

==== Data file to be converted ====

A URL pointing to a local file or to a file on the web.

==== Character encoding ====

Files could be in any of the following encodings (98% coverage):

  • UTF-8
  • ISO-8859-1 (aka Latin-1)
  • ISO-8859-9
  • Windows Latin-1

All of these are ASCII-compatible (supersets of ASCII).

In the end, everything should be converted to Unicode.

e.g., Dataset 1627

<92>

Apostrophes are the common culprit.

e.g., Dataset 1530

<93>blind<94>

e.g., Dataset 1450

  raw:organization_name "Medicare y Mucho M·s"

  com.hp.hpl.jena.shared.JenaException: com.hp.hpl.jena.riot.ParseException: [Line:24239,Col:49] Unknown char: ?(183)
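The byte values shown above (<92>, <93>, <94>) are curly quotes in Windows Latin-1. A minimal sketch of converting to Unicode might try UTF-8 first and fall back to Windows-1252; the function name `decode_to_unicode` and the fallback order are assumptions for illustration, not the converter's actual behavior.

```python
def decode_to_unicode(raw: bytes) -> str:
    """Decode bytes as UTF-8, falling back to Windows-1252 (hypothetical helper)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes 0x92, 0x93 and 0x94 are curly quotes in Windows-1252.
        # A few byte values are undefined there, so replace them rather than fail.
        return raw.decode("cp1252", errors="replace")


if __name__ == "__main__":
    print(decode_to_unicode(b"\x93blind\x94"))
```

ISO-8859-9 input cannot be told apart from ISO-8859-1 this way; a real converter would need the encoding to be stated as a parameter.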

==== Structure assistance parameters ====

===== Does header start on line 1? =====

Some CSVs include #Top matter such as titles and summaries at the top. A naive conversion would attempt to produce data triples out of these non-data CSV rows. If the column headers appear on a later line, then the conversion tool needs to know to avoid producing invalid data triples.

For example, Dataset 1623 offers an Excel spreadsheet that can be converted to CSV. The header starts on line 7 and the data starts on line 8 (line numbers added):

  [1] Office of Medicare Hearings and Appeals (OMHA),,,,,,
  [2] Claims Listed by State,,,,,,
  [3] "As of January 7, 2010",,,,,,
  [4]
  [5]
  [6] "Table 1. List of Total Claims Received by Region, State, and Fiscal Year",,,,,,
  [7] Region,State,Fiscal Year 06,Fiscal Year 07,Fiscal Year 08,Fiscal Year 09,Total
  [8] Mid-Atlantic,District of Columbia,12,289,342,376,"1,019"

The structure parameters for Dataset 1623 would look like the following.

  @prefix conversion: <http://logd.tw.rpi.edu/vocab/conversion/> .
  @prefix :           <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2009-May-18/params/enhancement/1/> .

  :dataset a void:Dataset;
     conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
     conversion:source_identifier  "data-gov";
     conversion:dataset_identifier "1623";
     conversion:dataset_version    "2009-May-18";
     conversion:conversion_process [
        a conversion:RawConversionProcess;
        conversion:enhancement_identifier "1";
        conversion:enhance [
           ov:csvRow 7;
           a conversion:HeaderRow;
        ];

An opposite extreme to including top matter in a CSV is to exclude a header row. If there is no header row, then the HeaderRow should be explicitly set to 0 (file line counting starts at 1) to avoid interpreting the first data row as a header.

Other datasets that benefit from this conversion parameter include Dataset 1590, Dataset 1572, and Dataset 1574.

===== Does data start immediately after the header? =====

If not specified, the data start row defaults to conversion:HeaderRow + 1.

e.g., Dataset 1612

  @prefix conversion: <http://purl.org/twc/vocab/conversion/> .
  @prefix :           <http://logd.tw.rpi.edu/source/data-gov/dataset/1612/version/2009-May-18/params/enhancement/1/> .

  :dataset a void:Dataset;
     conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
     conversion:source_identifier  "data-gov";
     conversion:dataset_identifier "1612";
     conversion:dataset_version    "2009-May-18";
     conversion:conversion_process [
        a conversion:RawConversionProcess;
        conversion:enhancement_identifier "1";
        conversion:subject_discriminator  "ActiveDuty_MaritalStatus-total";
        conversion:enhance [
           # All row and column numbers are one-based.
           ov:csvRow 9;
           a conversion:HeaderRow;
        ];
        conversion:enhance [
           # This is not necessary since 10 = 9 + 1, but can be used to specify a different row.
           ov:csvRow 10;
           a conversion:DataStartRow;
        ];
        conversion:enhance [
           # Both DataStartRow and DataEndRow are inclusive.
           ov:csvRow 37;
           a conversion:DataEndRow;
        ];
        conversion:enhance [
           ov:csvCol 1;
           conversion:range rdfs:Literal;
        ];
        conversion:enhance [
           ov:csvCol 2;
           rdfs:label       "Pay Grade";
           conversion:label "pay grade";
           conversion:range rdfs:Resource;
        ];

===== Does data stop before the last line? =====

All rows are processed and converted to triples unless this parameter is set. Setting it avoids attempting to interpret #Bottom matter as a data row.

The #Only if column parameter can also be used to ensure that triples are produced only from data rows (and not from top matter like titles or bottom matter like footnotes).

  @prefix conversion: <http://purl.org/twc/vocab/conversion/> .
  @prefix :           <http://logd.tw.rpi.edu/source/data-gov/dataset/1322/version/2009-May-18/params/enhancement/1/> .

  :dataset a void:Dataset;
     conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
     conversion:source_identifier  "data-gov";
     conversion:dataset_identifier "1322";
     conversion:dataset_version    "2009-May-18";
     conversion:conversion_process [
        a conversion:RawConversionProcess;
        conversion:enhancement_identifier "1";
        conversion:enhance [
           ov:csvRow 61;
           a conversion:DataEndRow;
        ];
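The interplay of HeaderRow, DataStartRow, and DataEndRow can be sketched as follows. This is only an illustration of the rules stated in this section (one-based rows, inclusive start and end, start defaulting to header + 1, header 0 meaning no header); the function `select_rows` and its argument names are hypothetical and are not part of the converter.

```python
import csv
import io


def select_rows(csv_text, header_row=1, data_start_row=None, data_end_row=None):
    """Return (header, data_rows) using one-based, inclusive row numbers.

    header_row=0 means the file has no header row.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if data_start_row is None:
        data_start_row = header_row + 1   # default: data follows the header
    if data_end_row is None:
        data_end_row = len(rows)          # default: process through the last line
    header = rows[header_row - 1] if header_row > 0 else None
    data = rows[data_start_row - 1:data_end_row]
    return header, data


if __name__ == "__main__":
    sample = (
        'Title,,\n'
        '"As of January 7, 2010",,\n'
        '\n'
        'Region,State,Total\n'
        'Mid-Atlantic,District of Columbia,"1,019"\n'
        'footnote,,\n'
    )
    print(select_rows(sample, header_row=4, data_end_row=5))
```

The sketch counts parsed CSV records, so a quoted cell containing a line break would shift the numbering relative to file lines.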

===== Repeat column's previous value if empty parameter =====

CSVs that are authored by humans in Excel often use abbreviations. One such abbreviation is to leave a cell empty when its value repeats the one above it. These implicit values can be filled in by using the RepeatPreviousIfEmptyEnhancement: if, after passing the onlyIfCol test, a value is not present in a CSV row, the value from the previous row is used.

e.g., Dataset 1623

  @prefix conversion: <http://purl.org/twc/vocab/conversion/> .
  @prefix :           <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2009-May-18/params/enhancement/1/> .

  :dataset a void:Dataset;
     conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
     conversion:source_identifier  "data-gov";
     conversion:dataset_identifier "1623";
     conversion:dataset_version    "2009-May-18";
     conversion:conversion_process [
        a conversion:RawConversionProcess;
        conversion:enhancement_identifier "1";
        conversion:enhance [
           ov:csvRow 7;
           a conversion:HeaderRow;
        ];
        conversion:enhance [
           ov:csvCol 1;
           conversion:label "Region";
           conversion:range rdfs:Resource;
           a conversion:Repeat_previous_if_empty_column;
           a conversion:TypedResourcePromotion;
           conversion:range_name "Region";
        ];

Other datasets that benefit from this structural parameter include Dataset 311, and Dataset 10030.
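The fill-down behavior described in this section can be sketched in a few lines. The function name `repeat_previous_if_empty` is hypothetical, the sample rows are made up, and the onlyIfCol test is left out for brevity.

```python
def repeat_previous_if_empty(rows, col):
    """Fill empty cells in one-based column `col` with the value from the row above."""
    filled = []
    previous = ""
    for row in rows:
        row = list(row)                 # copy so the input is not modified
        if row[col - 1].strip() == "":
            row[col - 1] = previous     # implicit value: repeat the last one seen
        else:
            previous = row[col - 1]
        filled.append(row)
    return filled


if __name__ == "__main__":
    rows = [
        ["Mid-Atlantic", "District of Columbia"],
        ["", "Delaware"],
        ["Midwest", "Illinois"],
        ["", "Indiana"],
    ]
    for row in repeat_previous_if_empty(rows, col=1):
        print(row)
```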

===== More =====

Some more structural assistance parameters are listed in the enhancements section: #Structure assistance enhancements.

==== Column label parameter ====

By default, RDF predicates are created using the column titles in the CSV's header. If two columns have identical titles, distinct predicates are ensured by appending "_2" to the first duplicate, "_3" to the second duplicate, and so on. If, however, the header is missing or was missed during parsing, a substitute can be provided using the conversion:label enhancement. The conversion:label enhancement can also be used in cases where the header is unusually long and a more concise predicate is desirable. The label may contain a first capital and spaces just as one would use in rdfs:label. The conversion utility will lowercase the entire string and replace spaces with underscores.

e.g. Dataset 1450

  @prefix conversion: <http://purl.org/twc/vocab/conversion/> .
  @prefix :           <http://logd.tw.rpi.edu/source/data-gov/dataset/1450/params/enhancement/1/> .

  :dataset a void:Dataset;
     conversion:base_uri          "http://logd.tw.rpi.edu"^^xsd:anyURI;
     conversion:sourceIdentifier  "data-gov";
     conversion:datasetIdentifier "1450";
     conversion:datasetVersion    "18-May-2009";
     conversion:conversion_process [
        a conversion:EnhancementProcess;
        conversion:enhancement_identifier "1";
        conversion:enhance [
           ov:csvCol        2;
           ov:csvHeader     "STAR (*) INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE";
           conversion:label "Offers only in this state";
           conversion:range rdfs:Literal;
        ];
     ];
  .

  @prefix ds1450: <http://logd.tw.rpi.edu/source/data-gov/dataset/1450/version/2009-May-18/> .
  @prefix raw:    <http://logd.tw.rpi.edu/source/data-gov/dataset/1450/vocab/raw/> .

  raw:star_indicates_that_organization_only_offers_employer_plans_in_this_state
     ov:csvCol    "2"^^xsd:integer ;
     ov:csvHeader "STAR (*) INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE" ;
     rdfs:label   "STAR * INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE" ;
     rdfs:range   rdfs:Literal .

  ds1450:thing_2
     raw:state "Alabama" ;
     raw:star_indicates_that_organization_only_offers_employer_plans_in_this_state "*" ;

'''becomes'''

  @prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/1450/vocab/enhancement/1/> .

  e1:offers_only_in_this_state
     ov:csvCol    "2"^^xsd:integer ;
     ov:csvHeader "STAR (*) INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE" ;
     rdfs:label   "Offers only in this state" ;
     rdfs:range   rdfs:Literal .

  ds1450:thing_2
     e1:state "Alabama" ;
     e1:offers_only_in_this_state "*" .

(This property would also benefit from the #xsd:boolean enhancement to make the "*" a "true"^^xsd:boolean, a #Typed resource promotion enhancement to make "Alabama" a URI, and an ObjectSameAsEnhancement to link :Alabama to DBPedia, Geonames, and GovTrack's URI for Alabama.)

This enhancement affects the name of the predicate URI created from the column label.

Other datasets that benefit from this enhancement include Dataset 8, Dataset 9, Dataset 10, Dataset 32, Dataset 33, Dataset 59, Dataset 90, Dataset 311, Dataset 401, Dataset 402, Dataset 403, Dataset 1000, Dataset 1133, Dataset 1171, Dataset 1322, Dataset 1330, Dataset 1350, Dataset 1359, Dataset 1374, Dataset 1450, Dataset 1491, Dataset 1492, Dataset 1530, Dataset 1564, Dataset 1571, Dataset 1612, Dataset 1623, Dataset 1627, Dataset 1930, Dataset 1961.
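The naming rules described in this section (lowercase, spaces to underscores, "_2", "_3", ... for duplicates) can be sketched as below. Both function names are hypothetical. The sketch covers only the rules stated here; the Dataset 1450 example suggests the utility also drops punctuation such as "(*)", which is not modeled.

```python
def label_to_local_name(label):
    """Lowercase the label and replace spaces with underscores."""
    return label.strip().lower().replace(" ", "_")


def unique_local_names(labels):
    """Append _2, _3, ... to the first, second, ... duplicate of a name."""
    seen = {}
    names = []
    for label in labels:
        name = label_to_local_name(label)
        seen[name] = seen.get(name, 0) + 1
        names.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return names


if __name__ == "__main__":
    print(label_to_local_name("Offers only in this state"))
    print(unique_local_names(["Year", "Total", "Year", "Year"]))
```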

==== Column comment parameter ====

In the previous example for conversion:label, we renamed the long header to create a more concise predicate. However, we lost a bit of meaning about how the values should be interpreted. That long description should be preserved as an rdfs:comment, and conversion:comment will do just that.

e.g. Dataset 1450

  @prefix conversion: <http://purl.org/twc/vocab/conversion/> .
  @prefix :           <http://logd.tw.rpi.edu/source/data-gov/dataset/1450/params/enhancement/2/> .

  :dataset a void:Dataset;
     conversion:base_uri          "http://logd.tw.rpi.edu"^^xsd:anyURI;
     conversion:sourceIdentifier  "data-gov";
     conversion:datasetIdentifier "1450";
     conversion:datasetVersion    "18-May-2009";
     conversion:conversion_process [
        a conversion:EnhancementProcess;
        conversion:enhancement_identifier "2";
        conversion:enhance [
           ov:csvCol          2;
           ov:csvHeader       "STAR (*) INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE";
           conversion:comment "STAR (*) INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE";
           conversion:label   "Offers only in this state";
           conversion:range   rdfs:Literal;
        ];
     ];
  .

  @prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/1450/vocab/enhancement/1/> .

  e1:offers_only_in_this_state
     ov:csvCol    "2"^^xsd:integer ;
     ov:csvHeader "STAR (*) INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE" ;
     rdfs:label   "Offers only in this state" ;
     rdfs:range   rdfs:Literal .

'''becomes'''

  @prefix e2: <http://logd.tw.rpi.edu/source/data-gov/dataset/1450/vocab/enhancement/2/> .

  e2:offers_only_in_this_state
     ov:csvCol    "2"^^xsd:integer ;
     ov:csvHeader "STAR (*) INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE" ;
     rdfs:label   "Offers only in this state" ;
     rdfs:range   rdfs:Literal ;
     rdfs:comment "STAR (*) INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE" .

==== Sample value ====

When observing the enhancement parameters, it is useful to see a sample value. conversion:eg is an owl:AnnotationProperty that provides sample values from the column.

  :dataset a void:Dataset;
     conversion:base_uri           "http://data-gov.tw.rpi.edu"^^xsd:anyURI;
     conversion:source_identifier  "whitehouse-gov";
     conversion:dataset_identifier "visitor-records";
     conversion:dataset_version    "0510";
     conversion:conversion_process [
        a conversion:RawConversionProcess;
        conversion:enhancement_identifier "1";
        conversion:enhance [
           ov:csvCol                    13;
           ov:csvHeader                 "APPT_END_DATE";
           conversion:comment           "Date and time for which the appointment was scheduled to end";
           conversion:eg                "9/28/200911:59:00PM";
           conversion:datetime_pattern  "M/d/yy HH:mm", "M/d/yyyyhh:mm:ssaa";
           conversion:range             xsd:dateTime;
           conversion:datetime_timezone -300;
        ];

==== Domain template ====

TODO

If a property is designated as a primary key, the URI of the subject is changed to incorporate the property name and its value.

e.g. Dataset 1530

  @prefix conversion: <http://purl.org/twc/vocab/conversion/> .
  @prefix :           <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/version/2009-May-18/params/enhancement/1/> .

  :dataset a void:Dataset;
     conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
     conversion:source_identifier  "data-gov";
     conversion:dataset_identifier "1530";
     conversion:dataset_version    "2009-May-18";
     conversion:conversion_process [
        a conversion:EnhancementProcess;
        conversion:enhancement_identifier "1";
        conversion:enhance [
           ov:csvCol        1;
           ov:csvHeader     "Request ID";
           conversion:label "Request ID";
           conversion:range rdfs:Literal;
           a conversion:PrimaryKeyEnhancement;
        ];

  @prefix ds1530: <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/version/2009-May-18/> .
  @prefix raw:    <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/vocab/raw/> .

  <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/version/2009-May-18/thing_1>
     raw:request_id     "07-F-0001";
     raw:requester_name "Connolly, Ward" .

'''becomes'''

  @prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/vocab/enhancement/1/> .

  <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/version/2009-May-18/request_id/07-F-0001>
     e1:request_id     "07-F-0001";
     e1:requester_name "Connolly, Ward" .

Other datasets that benefit from this enhancement include Dataset 32, Dataset 1627, and Dataset 1530.

===== Composite domain template =====

  @prefix conversion: <http://purl.org/twc/vocab/conversion/> .
  @prefix :           <http://logd.tw.rpi.edu/source/data-gov/dataset/financial-yahoo/version/2009-May-18/params/enhancement/1/> .

  :dataset a void:Dataset;
     conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
     conversion:source_identifier  "data-gov";
     conversion:dataset_identifier "financial-yahoo";
     conversion:dataset_version    "2009-May-18";
     conversion:conversion_process [
        a conversion:RawConversionProcess;
        conversion:enhancement_identifier "1";
        conversion:enhance [
           conversion:domain_template "[#1]-[#3]-[#4]"; # Only one is needed; they are equivalent.
        ];
        conversion:enhance [
           ov:csvCol 1;
           conversion:domain_template "[.]-[#3]-[#4]";  # Only one is needed; they are equivalent.
        ];

  @prefix financial-yahoo: <http://logd.tw.rpi.edu/source/data-gov/dataset/financial-yahoo/version/2009-May-18/> .
  @prefix raw:             <http://logd.tw.rpi.edu/source/data-gov/dataset/financial-yahoo/vocab/raw/> .

  financial-yahoo:thing_2
     raw:symbol          "CLK10.NYM" ;
     raw:last_trade_date "4/7/2010" ;
     raw:last_trade_time "8:48am" ;
     ov:csvRow           2 .

'''becomes'''

  @prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/financial-yahoo/vocab/enhancement/1/> .

  financial-yahoo:CLK10_NYM_4_7_2010_8_48am
     raw:symbol          "CLK10.NYM" ;
     raw:last_trade_date "4/7/2010" ;
     raw:last_trade_time "8:48am" ;
     ov:csvRow           2 .

The special case conversion:domain_template "[.]" is equivalent to a conversion:PrimaryKeyEnhancement.

==== Subject discriminator parameter ====

If a dataset has multiple CSVs, converting each will result in the same names for different rows from each file. This can be avoided by tucking in an extra level to the #dataset identifier, but it then becomes impossible to query for all rows that came from a particular file.

e.g., Dataset 1350 has appe.xls with two tabs "APP A" and "App e" that can be exported to CSV.

conv:dataset_identifier "1350/app-a"; ... conv:dataset_identifier "1350/app-e";

Other datasets that benefit from this enhancement include Dataset 326, Dataset 1612, and Dataset 10030.

see also #Annotation triple parameter.

=== Enhancement parameters ===

==== Enhancement identifier ====

  :dataset a void:Dataset;
     conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
     conversion:source_identifier  "dfid-gov-uk";
     conversion:dataset_identifier "sid-2009";
     conversion:dataset_version    "2009-Nov-10";
     conversion:conversion_process [
        a conversion:EnhancementProcess;
        conversion:enhancement_identifier "1";
        conversion:enhance [
           ov:csvCol        1;
           ov:csvHeader     "";
           conversion:label "Country";
        ];
     ];
  .

  @prefix sid-2009: <http://logd.tw.rpi.edu/source/dfid-gov-uk/dataset/sid-2009/version/2009-Nov-10/> .
  @prefix raw:      <http://logd.tw.rpi.edu/source/dfid-gov-uk/dataset/sid-2009/vocab/raw/> .

sid-2009:thing_1 raw:column_1 "Algeria" .

'''becomes'''

@prefix e1: <http://logd.tw.rpi.edu/source/dfid-gov-uk/dataset/sid-2009/vocab/enhancement/1/> .

sid-2009:thing_1 e1:country "Algeria" .

==== Enhancement author ====

  :dataset a void:Dataset;
     conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
     conversion:source_identifier  "dfid-gov-uk";
     conversion:dataset_identifier "sid-2009";
     conversion:dataset_version    "2009-Nov-10";
     conversion:conversion_process [
        a conversion:EnhancementProcess;
        conversion:enhancement_identifier "1";
        conversion:subject_discriminator  "america";
        conversion:author <http://logd.tw.rpi.edu/wiki/Special:URIResolver/Tim_Lebo>;

==== Patterns vs. Templates ====

In this conversion vocabulary, ''patterns'' are specified to '''guide the parsing''' of an original input value, while ''templates'' are used to '''construct''' literals and URIs to assert in the resulting RDF.

The following conversion vocabulary predicates specify '''patterns''':

The following conversion vocabulary predicates can specify '''templates''':

The following conversion vocabulary predicates '''cannot''' specify '''templates''':

  • conversion:range_name (only a local label may be specified, use conversion:subclass_of to link to the external class)

==== Template variables ====

Templates may specify variables that will be populated with values relevant to the input and conversion parameters.

===== Input template variables =====

Three other variables can be used in the template:

  • [e] - the dataset's enhancement_identifier
  • [r] - the row of this value
  • [c] - the column of this value

===== Enhancement parameters template variables =====

Four namespace variables can be used in the template:

  • [/] - the value of #Data collection's base URI.
  • [/s] - [/] with "source/" and the value of #Data source identifier appended.
  • [/sd] - [/s] with "dataset/" and the value of #Data source's identifier for dataset appended.
  • [/sdv] - [/sd] with "version/" and the value of #Dataset version appended.

Two further variables refer to the current column:

  • [@] - The local name of the property created for the current column. (NOTE: not implemented)
  • [T] - The conversion:range_name of the property created for the current column. (NOTE: not implemented)
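How the four namespace variables build on one another can be sketched as follows. The function `namespace_variables` is hypothetical, and the placement of the "/" separators is an assumption inferred from the example URIs on this page (e.g. .../source/data-gov/dataset/1450/version/2009-May-18/).

```python
def namespace_variables(base_uri, source_id, dataset_id, version_id):
    """Build the [/], [/s], [/sd], and [/sdv] values from the essential parameters."""
    root = base_uri.rstrip("/") + "/"          # assumption: [/] ends with a slash
    s = f"{root}source/{source_id}/"
    sd = f"{s}dataset/{dataset_id}/"
    sdv = f"{sd}version/{version_id}/"
    return {"[/]": root, "[/s]": s, "[/sd]": sd, "[/sdv]": sdv}


if __name__ == "__main__":
    variables = namespace_variables("http://logd.tw.rpi.edu", "data-gov", "1554", "2010-Feb-13")
    for name, value in variables.items():
        print(name, value)
```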

===== Row's other values variables =====

[.] - The value of the cell.

The values of a row's columns can be referenced using either the column index or the local name of the property created for the column. When referencing the column index, a '#' precedes the integer. When referencing the property's local name, an '@' precedes the local name. For example, the following references the value in the first column:

[#1] - the value of the cell in column 1.

And the following references the value of the column with header "Property local name":

[@property_local_name] - the value of the cell in the column that became the predicate property_local_name*.

  • NOTE: if multiple columns become named the same property, this will be more than one value.
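Substituting the row-value variables into a template can be sketched as follows. The function `expand_template` is hypothetical. It handles only [.], [#n], and [@name], leaves any other bracketed variable untouched, and takes the first matching column for [@name] even though the note above says several values are possible.

```python
import re


def expand_template(template, row, cell_value, names=()):
    """Fill [.], [#n] and [@name] from one CSV row.

    row        -- list of cell strings for the current row
    cell_value -- value of the current cell, used for [.]
    names      -- property local names, parallel to row, used for [@name]
    """
    names = list(names)

    def substitute(match):
        token = match.group(1)
        if token == ".":
            return cell_value
        if re.fullmatch(r"#\d+", token):
            return row[int(token[1:]) - 1]          # column indexes are one-based
        if token.startswith("@") and token[1:] in names:
            return row[names.index(token[1:])]      # first column with that name
        return match.group(0)                       # leave other variables as they are

    return re.sub(r"\[([^\[\]]+)\]", substitute, template)


if __name__ == "__main__":
    row = ["CLK10.NYM", "", "4/7/2010", "8:48am"]
    print(expand_template("[#1]-[#3]-[#4]", row, cell_value=row[0]))
```

The expanded string still contains characters such as "." and "/"; turning it into a URI local name is a separate step that this sketch does not attempt.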

==== Domain template (renaming "thing_2") ====

In #Example Input 1, the first column names the president being described, but his URI becomes:

http://logd.tw.rpi.edu/data-gov/dataset/1627/version/02-05-2010/thing_1

Although URIs are to be "opaque" and "thing_1" is "just as good" as "George_Washington", developers are still human and could use a break.

When enhancing CSV:

  conv:enhance [
     ov:csvCol    1;
     ov:csvHeader "Name";
     conv:range   rdfs:Resource;
     a conv:Primary_key;
  ];

When enhancing raw RDF:

  conv:enhance [
     conv:property_name "name";
     conv:range         rdfs:Resource;
     a conv:Primary_key;
  ];

  http://logd.tw.rpi.edu/data-gov/dataset/1627/version/02-05-2010/thing_1

becomes

  http://logd.tw.rpi.edu/data-gov/dataset/1627/version/02-05-2010/name/George_Washington

Note that the property local name is incorporated into the URI.

(unfinished extension: multi-value key. Described by conv:primaryKeys ( 1 3 4 ) multiple a conv:Primary_key with implicit ordering imposed by ov:csvCol ordering.)

TODO: assert that the property used for primary key is subproperty dc:identifier.

===== A column's value should be used for subject's local name =====

TODO

  conversion:enhance [
     conversion:domain_template "[#1]";
  ];

e.g., the minimal conversion of ftp://ftp.bls.gov/pub/time.series/gp/gp.charact from Dataset 326 looks like:

  @prefix ov:    <http://open.vocab.org/terms/> .
  @prefix d326:  <http://logd.tw.rpi.edu/dataset/326/gp.charact/thing_> .
  @prefix d326p: <http://logd.tw.rpi.edu/property/326/gp.charact/> .
  @prefix d326v: <http://logd.tw.rpi.edu/value/326/gp.charact/> .

http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/ {

<http://logd.tw.rpi.edu/dataset/326/gp.charact/'''thing_2'''> 
        d326p:column_1 "0020" ;
        d326p:column_2 "White, 16+;" ;
        ov:csvRow      "2"^^<http://www.w3.org/2001/XMLSchema#int> 
. 

<http://logd.tw.rpi.edu/dataset/326/gp.charact/'''thing_3'''> 
        d326p:column_1 "0062" ;
        d326p:column_2 "men, 16+;" ;
        ov:csvRow      "3"^^<http://www.w3.org/2001/XMLSchema#int> 
.

}

An identifying tag of "326" is used for the data, while "326/gp.charact/" is used from the source's identifying tag for the supporting files.

The names "thing_2" and "thing_3" are created because they are the 2nd and 3rd data entry and a class name was not provided as an enrichment parameter. If the primary key column parameter of "1" is given, the following names are used:

http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/ {

<http://logd.tw.rpi.edu/dataset/326/gp.charact/'''0020'''> 
        d326p:column_1 "0020" ;
        d326p:column_2 "White, 16+;" ;
        ov:csvRow      "2"^^<http://www.w3.org/2001/XMLSchema#int> 
.

<http://logd.tw.rpi.edu/dataset/326/gp.charact/'''0062'''> 
        d326p:column_1 "0062" ;
        d326p:column_2 "men, 16+;" ;
        ov:csvRow      "3"^^<http://www.w3.org/2001/XMLSchema#int> 
.

}

NOTE: predicates created from primary key columns should be asserted as rdfs:subPropertyOf dc:identifier (this will allow UIs to avoid rendering this triple, because its value is already in the URI).

===== A column's value is a URI and should be used as the subject =====

TODO

  conversion:enhance [
     conversion:domain_template "[#1]";
  ];

Note: These are the same parameters as above, but the value is recognized as a URI and is treated as one instead of as a literal.

While augmenting Dataset 326, the CSV is:

  http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/2732,women,16+,"service occupations;"
  http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/2719,women,16+,"technicians and related support;"
  http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/2745,women,16+,"transportation and material moving;"
  http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/0084,women,16+;,""
  http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/0086,women,16-19;,""
  http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/0099,women,White,"16+;"

The result should be:

  <http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/2732>
     p:col_1 "women";
     p:col_2 "16+";
     p:col_3 "service occupations;" .
  ...

Use of #Superproperty of all predicates created would also make sense, given that the positions change (e.g., "16+"):

  http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/2352,16+,adminstrative,"support including clerical"
  http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/2336,16+,executive,"adminstrative, and managerial"

===== Several columns' values should be combined for subject's local name =====

==== Column label parameter ====

See #Column label parameter.

==== Range template ====

The default conversion:range_template for un-typed resource promotions is:

  conversion:range_template "[/sdv]value-of/[@]/[.]";

The default conversion:range_template for typed resource promotions is:

  conversion:range_template "[/sdv]typed/[T]/[.]";

(See #Template variables for how they are used within templates.)

==== Datatype promotion parameters ====

Accepted values for datatype casting, and their frequency of use (in number of columns):

  • xsd:nonNegativeInteger (78)
  • xsd:integer (306)
  • xsd:gYear (6)
  • xsd:decimal (1125)
  • xsd:dateTime (12)
  • xsd:date (12)
  • xsd:boolean (7)
  • rdfs:Resource (120)
  • rdfs:Literal (52)

===== xsd:integer =====

  @prefix conversion: <http://purl.org/twc/vocab/conversion/> .
  @prefix :           <http://logd.tw.rpi.edu/source/data-gov/dataset/8/version/2010-May-19/params/enhancement/1/> .

  :dataset a void:Dataset;
     conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
     conversion:source_identifier  "data-gov";
     conversion:dataset_identifier "8";
     conversion:dataset_version    "2010-May-19";
     conversion:conversion_process [
        a conversion:EnhancementProcess;
        conversion:enhancement_identifier "1";
        ...
        conversion:enhance [
           ov:csvCol          5;
           ov:csvHeader       "RANK";
           conversion:label   "RANK";
           conversion:comment "Annual rank of the 8-hour daily max.";
           conversion:range   xsd:integer;
        ];

  @prefix ds8: <http://logd.tw.rpi.edu/source/data-gov/dataset/8/version/2010-May-19/> .
  @prefix raw: <http://logd.tw.rpi.edu/source/data-gov/dataset/8/vocab/raw/> .

  ds8:thing_1
     raw:rank  "117" ;
     ov:csvRow "2"^^xsd:integer .

'''becomes'''

  @prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/8/vocab/enhancement/1/> .

  ds8:thing_1
     e1:rank   "117"^^xsd:integer ;
     ov:csvRow "2"^^xsd:integer .

====== Multiplier parameter ======

e.g. Dataset 326 provides counts of people in units of one thousand. The essential conversion of Dataset 326 looks like:

<http://logd.tw.rpi.edu/dataset/326/> {

<http://logd.tw.rpi.edu/dataset/326/thing_1>
       p326:series_id "GPU00100000E0000";
       p326:year      "1981";
       p326:period    "A01";
       p326:value     "1491";
       ov:csvRow      1;
.

p326:period 
       rdfs:range    rdfs:Literal;
       ov:csvHeader "period";
       ov:csvCol     3;
.

}

Providing a multiplier to scale value during conversion:

<http://logd.tw.rpi.edu/data-gov/conversionParams/326/enrichment/1> {

[] conv:dataset_source        "data-gov" .
[] conv:source_identifier     "326" .
[] conv:enrichment_identifier "1" .

[] ov:csvCol 4;
   enrichment:range      xsd:int;
   enrichment:multiplier 1000 .

}

Would result in:

@prefix p326e1: <http://logd.tw.rpi.edu/data-gov/property/326/enrichment/1/> .

<http://logd.tw.rpi.edu/dataset/326/> {

<http://logd.tw.rpi.edu/dataset/326/thing_1> 
       p326e1:series_id "GPU00100000E0000";
       p326e1:year      "1981";
       p326e1:period    "A01";
       p326e1:value      1491000 ;
       ov:csvRow         1;
.

}

===== xsd:boolean =====

The default recognized lexical representations are (case insensitive): 'yes', 'no', 'true', 'false', '0', and '1'. The conv:boolean_true and conv:boolean_false properties may be used to add additional lexical forms.

If any new lexical forms are provided, all defaults are overridden. This is to avoid misinterpretation (for example, HINTS 2005 uses 1 and 2 for true and false).
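The override rule can be sketched as follows. The function `parse_boolean` and its arguments are hypothetical stand-ins for conv:boolean_true and conv:boolean_false; treating the custom forms as case insensitive and returning None for unrecognized values are assumptions of this sketch.

```python
DEFAULT_TRUE = {"yes", "true", "1"}
DEFAULT_FALSE = {"no", "false", "0"}


def parse_boolean(value, extra_true=(), extra_false=()):
    """Interpret a cell as a boolean; return None if it is not recognized."""
    if extra_true or extra_false:
        # Any custom lexical form overrides all of the defaults.
        true_forms = {form.lower() for form in extra_true}
        false_forms = {form.lower() for form in extra_false}
    else:
        true_forms, false_forms = DEFAULT_TRUE, DEFAULT_FALSE
    lexical = value.strip().lower()
    if lexical in true_forms:
        return True
    if lexical in false_forms:
        return False
    return None


if __name__ == "__main__":
    print(parse_boolean("Yes"))
    # A survey coding 1 = true and 2 = false, as in the HINTS 2005 example:
    print(parse_boolean("2", extra_true=["1"], extra_false=["2"]))
```

Without the override, a "1 and 2" coding would silently read every "1" as true and leave every "2" unrecognized, which is the misinterpretation the rule is meant to prevent.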

e.g., Dataset 1571

  :dataset a void:Dataset;
     conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
     conversion:source_identifier  "data-gov";
     conversion:dataset_identifier "1571";
     conversion:dataset_version    "2010-Apr-08";
     conversion:conversion_process [
        a conversion:EnhancementProcess;
        conversion:enhancement_identifier "1";
        conversion:interpret [
           conversion:symbol         " - ";
           conversion:interpretation conversion:null;
        ];
        conversion:enhance [
           ov:csvRow 9;
           a conversion:HeaderRow;
        ];
        conversion:enhance [
           ov:csvRow 29;
           a conversion:DataEndRow;
        ];
        conversion:enhance [
           ov:csvCol        1;
           ov:csvHeader     " Year ";
           conversion:label "Year";
           conversion:range xsd:gYear;
        ];
        conversion:enhance [
           ov:csvCol        2;
           ov:csvHeader     "";
           conversion:label "Illegal act was responsible";
           conversion:range xsd:boolean;
           conversion:interpret [
              conversion:symbol         "*";
              conversion:interpretation true;
           ];
        ];

  @prefix ds1571: <http://logd.tw.rpi.edu/source/data-gov/dataset/1571/version/2010-Apr-08/> .
  @prefix raw:    <http://logd.tw.rpi.edu/source/data-gov/dataset/1571/vocab/raw/> .

  ds1571:thing_5
     raw:year     "1994" ;
     raw:column_2 "*" .

'''becomes'''

  @prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/1571/vocab/enhancement/1/> .

  ds1571:thing_5
     e1:year                        "1994"^^xsd2:gYear ;
     e1:illegal_act_was_responsible "true"^^xsd:boolean .

Other datasets that benefit from this enhancement include Dataset 1450 (:offers_only_in_this_state */), Dataset 1171 (:chairperson Yes/No), Dataset 1491 (:pa_program_declared Yes/No), and Dataset 1492 (:education_applicant Yes/No).

===== xsd:gYear =====

  :dataset a void:Dataset;
     conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
     conversion:source_identifier  "epa-gov-mcmahon-ethan";
     conversion:dataset_identifier "environmental-reports";
     conversion:dataset_version    "2011-Jan-12";
     conversion:conversion_process [
        a conversion:RawConversionProcess;
        conversion:enhancement_identifier "1";
        conversion:subject_discriminator  "enviro-reports-and-indicators";
        conversion:enhance [
           ov:csvRow 2;
           a conversion:HeaderRow;
        ];
        conversion:enhance [
           ov:csvCol          4;
           ov:csvHeader       "Year";
           conversion:label   "Year";
           conversion:comment "";
           conversion:range   xsd:gYear;
        ];

  @prefix environmental-reports-enviro-reports-and-indicators: <http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/enviro-reports-and-indicators/version/2011-Jan-12/> .
  @prefix raw: <http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/vocab/raw/> .
  @prefix e1:  <http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/vocab/enhancement/1/> .

  environmental-reports-enviro-reports-and-indicators:thing_3
     dcterms:isReferencedBy <http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/version/2011-Jan-12> ;
     raw:id_no        "16" ;
     raw:title        "City of Bowie State of the Environment Report" ;
     raw:organization "Department of Planning and Economic Development" ;
     raw:year         "2009" ;

'''becomes'''

  environmental-reports-enviro-reports-and-indicators:thing_3
     dcterms:isReferencedBy <http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/version/2011-Jan-12> ;
     e1:id_no        "16" ;
     e1:title        "City of Bowie State of the Environment Report" ;
     e1:organization "Department of Planning and Economic Development" ;
     e1:year         "2009"^^xsd2:gYear ;

===== xsd:date pattern processing =====

e.g. Dataset 1627

@prefix conversion: http://purl.org/twc/vocab/conversion/ . @prefix : http://logd.tw.rpi.edu/source/data-gov/dataset/1627/version/2010-Apr-09/params/enhancement/1/ .

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "1627"; conversion:dataset_version "2010-Apr-09"; conversion:conversion_process [ a conversion:EnhancementProcess; conversion:enhancement_identifier "1"; ... conversion:enhance [ ov:csvCol 4; ov:csvHeader "Received Date"; conversion:label "Received Date"; conversion:range xsd:date; conversion:date_pattern "MM/dd/yy"; # Java style - currently implemented conversion:date_pattern "%m/%d/%y"; # strftime style - desirable implementation ];

Both Perl's strftime and [http://java.sun.com/j2se/1.5.0/docs/api/java/text/SimpleDateFormat.html Java's pattern] should be accepted.

@prefix ds1627: http://logd.tw.rpi.edu/source/data-gov/dataset/1627/version/2010-Apr-09/ . @prefix raw: http://logd.tw.rpi.edu/source/data-gov/dataset/1627/vocab/raw/ .

ds1627:thing_1 raw:received_date "12/19/07" .

'''becomes'''

@prefix e1: http://logd.tw.rpi.edu/source/data-gov/dataset/1627/vocab/enhancement/1/ .

ds1627:thing_1 e1:received_date "2007-12-19"^^xsd:date .

Other datasets that benefit from this enhancement include Dataset 957, Dataset 1171, Dataset 1350, Dataset 1359, Dataset 1374, Dataset 1492, Dataset 1530, Dataset 1577, and Dataset 1627.

TODO: xsd:date timezone (specified in number of minutes from GMT [http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/datatype/XMLGregorianCalendar.html#getTimezone%28%29])

In future revisions, 'date_pattern' will be replaced with simply 'pattern'. The notion of "date" will be indicated by the conversion:range.
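The strftime-style pattern from the Dataset 1627 example can be tried out with Python's standard library. This is a minimal sketch of the promotion, not the converter's implementation (which currently accepts the Java-style pattern); `promote_date` is a hypothetical helper name.

```python
from datetime import datetime


def promote_date(raw_value: str, strftime_pattern: str) -> str:
    """Parse raw_value with a strftime-style pattern and return an xsd:date lexical form."""
    return datetime.strptime(raw_value, strftime_pattern).date().isoformat()


# The raw literal and pattern from the Dataset 1627 example.
print(promote_date("12/19/07", "%m/%d/%y"))
```

A value that does not match the pattern raises `ValueError`, which a caller could use to leave the literal unpromoted.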

===== xsd:dateTime pattern processing ===== Dates are formatted in myriad ways. Fortunately, there are common conventions for the components of a date. Promoting a literal to an xsd:date or xsd:dateTime requires a pattern for parsing the value correctly.

e.g. Dataset 10025

  conv:enhance [
     a conv:DateTimePromotionEnhancement ;
     conv:property_name "toa" ;
     <font color="#FF0000">conv:datetime_pattern</font>  "%m/%d/%y %H:%M";
     <font color="#FF0000">conv:datetime_timezone</font> "-05:00";
  ] ;

  conversion:enhance [
     ov:csvCol         7;
     ov:csvHeader     "TOA";
     <font color="#FF0000">conv:range</font>                     xsd:dateTime;
     <font color="#FF0000">conv:datetime_pattern</font>         "M/d/yy HH:mm";
     <font color="#FF0000">conv:datetime_timezone_offset</font>  -300;
     conversion:label   "Time of Arrival";
     conversion:comment "Time of Arrival";
  ];

raw:toa "12/23/09 11:08" '''becomes''' e1:toa "2009-12-23T11:08:00-05:00"^^xsd:dateTime

(NOTE: Perl (above) vs. Java: conversion:datetime_pattern "M/d/yy HH:mm"; [http://java.sun.com/j2se/1.5.0/docs/api/java/text/SimpleDateFormat.html]) (timezone is specified in number of minutes from GMT [http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/datatype/XMLGregorianCalendar.html#getTimezone%28%29])

In future revisions, 'datetime_pattern' will be replaced with simply 'pattern'. The notion of "datetime" will be indicated by the conversion:range.
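To make the combination of a pattern and a timezone offset in minutes concrete, here is a small Python sketch. It uses the strftime-style pattern and the -300 minute offset from the "TOA" example; `promote_datetime` is a hypothetical name and the code only illustrates the intended result, not how the converter is written.

```python
from datetime import datetime, timedelta, timezone


def promote_datetime(raw_value: str, strftime_pattern: str, offset_minutes: int) -> str:
    """Parse raw_value and attach a fixed offset given in minutes from GMT."""
    naive = datetime.strptime(raw_value, strftime_pattern)
    aware = naive.replace(tzinfo=timezone(timedelta(minutes=offset_minutes)))
    return aware.isoformat()  # xsd:dateTime lexical form with offset


print(promote_datetime("12/23/09 11:08", "%m/%d/%y %H:%M", -300))
```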

====== Multiple patterns for same range type ====== If two patterns are used to produce the same range, both can be given as values of the same conversion:datetime_pattern property:

  conversion:enhance [
     ov:csvCol        7;
     ov:csvHeader    "TOA";
     conversion:range            xsd:dateTime;
     conversion:datetime_pattern "M/d/yy HH:mm", <font color="#FF0000">"M/d/yyyyhh:mm:ssaa"</font>;
     conversion:datetime_timezone -300;
     conversion:label   "Time of Arrival";
     conversion:comment "Time of Arrival";
  ];

If two patterns are used to produce different ranges, two enhancements need to be made.

  conversion:enhance [
     ov:csvCol        11;
     ov:csvHeader    "APPT_MADE_DATE";
     <font color="#FF0000">conversion:range            xsd:dateTime;
     conversion:datetime_pattern "M/d/yy HH:mm";</font>
     conversion:datetime_timezone -300;
     conversion:comment "Date the Appointment was made.";
  ];
  conversion:enhance [
     ov:csvCol        11;
     ov:csvHeader    "APPT_MADE_DATE";
     <font color="#FF0000">conversion:range            xsd:date;
     conversion:date_pattern "M/d/yy";</font>
     conversion:comment "Date the Appointment was made.";
  ];

Other datasets that benefit from this enhancement include Dataset 8, Dataset 9, Dataset 32 ("Thursday, May 13, 2010 05:11:55 UTC"), Dataset 1171, and Dataset 1491 ("1999/09/24 17:45:00").

===== Different patterns for the same column leading to different datatypes ===== Date values for the same column may appear in multiple formats, each of which needs its own enhancement and range.

e.g. Dataset 10025

  conversion:enhance [
     ov:csvCol        11;
     ov:csvHeader    "APPT_MADE_DATE";
     conversion:range xsd:dateTime;
     <font color="#FF0000">conversion:datetime_pattern "M/d/yy HH:mm";
     conversion:datetime_pattern "M/d/yy";
     conversion:datetime_timezone -300</font>;
     conversion:comment "Date the Appointment was made.";
  ];
  conversion:enhance [
     ov:csvCol        11;
     ov:csvHeader    "APPT_MADE_DATE";
     conversion:range xsd:date;
     <font color="#FF0000">conversion:date_pattern "M/d/yy"</font>;
     conversion:comment "Date the Appointment was made.";
  ];

dsvisitor-records:thing_231 raw:appt_start_date "12/14/09" . dsvisitor-records:thing_232 raw:appt_start_date "12/3/09 18:30" .

'''become'''

dsvisitor-records:thing_231 e1:appt_start_date "2009-12-14"^^xsd:date . dsvisitor-records:thing_232 e1:appt_start_date "2009-12-03T18:30:00-05:00"^^xsd:dateTime .
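One way to picture a column whose values match different patterns is to try each pattern in turn and let the first match decide the datatype. The Python sketch below does this under stated assumptions: the `PATTERNS` list and the `promote` function are hypothetical, the patterns are written strftime-style, and the fallback of keeping the raw literal when nothing matches is a guess at reasonable behavior rather than documented converter behavior.

```python
from datetime import datetime, timedelta, timezone

# (strftime pattern, datatype) pairs tried in order; one pair per enhancement.
PATTERNS = [
    ("%m/%d/%y %H:%M", "xsd:dateTime"),
    ("%m/%d/%y", "xsd:date"),
]


def promote(raw_value: str, offset_minutes: int = -300):
    """Return (lexical value, datatype) for the first pattern that parses raw_value."""
    for pattern, datatype in PATTERNS:
        try:
            parsed = datetime.strptime(raw_value, pattern)
        except ValueError:
            continue  # this pattern does not fit; try the next one
        if datatype == "xsd:date":
            return parsed.date().isoformat(), datatype
        tz = timezone(timedelta(minutes=offset_minutes))
        return parsed.replace(tzinfo=tz).isoformat(), datatype
    return raw_value, None  # assumption: unmatched values stay plain literals


print(promote("12/14/09"))
print(promote("12/3/09 18:30"))
```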

===== Email, phone, zip, etc ===== e.g., Dataset 1450

ov:csvCol 10; 1-205-930-5520

===== wgs:lat and wgs:long ===== e.g. Dataset 32

==== Resource promotion parameters ==== (promoting resources is a good thing for Linked Data.)

Setting the conversion:range to rdfs:Resource, without any further parameters, will do one of two things. If the value is already a URI (detected heuristically by the presence of "://"), the value will be cast to a URI. If the value is not a URI, one will be created using the predicate-scoped URI construction technique. The former behavior can be requested explicitly by typing the enhancement as conversion:CastResourcePromotion, while the latter can be requested explicitly by typing the enhancement as conversion:PredicateScopedResourcePromotion.

e.g., Dataset 1564 benefits from the default '''resource casting'''

@prefix conversion: http://purl.org/twc/vocab/conversion/ . @prefix : http://logd.tw.rpi.edu/source/data-gov/dataset/1564/version/2009-May-18/params/enhancement/1/ .

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "1564"; conversion:dataset_version "2009-May-18"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "1"; conversion:enhance [ ov:csvCol 8; ov:csvHeader "Link"; conversion:label "Link"; conversion:comment "The Link field provides a link to the Structured Product Labeling (SPL) information associated with each animal drug product listed electronically."; conversion:range rdfs:Resource; ];

@prefix raw: http://logd.tw.rpi.edu/source/data-gov/dataset/1564/vocab/raw/ . @prefix ds1564: http://logd.tw.rpi.edu/source/data-gov/dataset/1564/version/2009-May-18/ .

ds1564:thing_1 raw:link "http://www.accessdata.fda.gov/spl/data/fcac4de8-8e3a-4108-a98b-f206626020cc/fcac4de8-8e3a-4108-a98b-f206626020cc.xml" ; ov:csvRow "2"^^xsd:integer .

'''becomes'''

@prefix e1: http://logd.tw.rpi.edu/source/data-gov/dataset/1564/vocab/enhancement/1/ .

ds1564:thing_1 e1:link <http://www.accessdata.fda.gov/spl/data/fcac4de8-8e3a-4108-a98b-f206626020cc/fcac4de8-8e3a-4108-a98b-f206626020cc.xml> ; ov:csvRow "2"^^xsd:integer .

e.g., Dataset 8 benefits from the default '''predicate-scoped resource promotion'''

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "8"; conversion:dataset_version "2010-May-19"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "1"; conversion:enhance [ ov:csvCol 1; ov:csvHeader "SITE_ID"; conversion:label "SITE_ID"; conversion:comment "Site identification code."; conversion:range rdfs:Resource; ];

@prefix ds8: http://logd.tw.rpi.edu/source/data-gov/dataset/8/version/2010-May-19/ . @prefix raw: http://logd.tw.rpi.edu/source/data-gov/dataset/8/vocab/raw/ .

ds8:thing_1 raw:site_id "ANL146" ; ov:csvRow "2"^^xsd:integer .

'''becomes'''

@prefix e1: http://logd.tw.rpi.edu/source/data-gov/dataset/8/vocab/enhancement/1/ .

ds8:thing_1 e1:site_id <http://data-gov.tw.rpi.edu/source/data-gov/dataset/8/value-of/site_id/ANL146> ; ov:csvRow "2"^^xsd:integer .

Other datasets that benefit from this enhancement include Dataset 8, Dataset 9, Dataset 10, Dataset 32, Dataset 311, Dataset 401, Dataset 402, Dataset 403, Dataset 957, Dataset 1000, Dataset 1146, Dataset 1147, Dataset 1148, Dataset 1149, Dataset 1171, Dataset 1322, Dataset 1330, Dataset 1350, Dataset 1356, Dataset 1359, Dataset 1374, Dataset 1450, Dataset 1491, Dataset 1492, Dataset 1530, Dataset 1554, Dataset 1564, Dataset 1577, Dataset 1612, Dataset 1623, and Dataset 1627.
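The default behavior for conversion:range rdfs:Resource (cast when the value looks like a URI, otherwise mint a predicate-scoped URI) can be sketched as follows. The function names are hypothetical, the URI layout copies the ''value-of'' shape shown in the examples, and the character mangling in `local_name` is an assumption inferred from examples such as "Connolly, Ward" becoming Connolly_Ward; the converter's actual rules may differ.

```python
import re


def local_name(value: str) -> str:
    """Assumed mangling: runs of characters other than letters, digits, '-' become '_'."""
    return re.sub(r"[^A-Za-z0-9\-]+", "_", value).strip("_")


def promote_resource(value, base_uri, source_id, dataset_id, property_name):
    """Cast URI-like values; otherwise build a predicate-scoped URI."""
    if "://" in value:
        return value  # resource casting: the value is already a URI
    return (f"{base_uri}/source/{source_id}/dataset/{dataset_id}"
            f"/value-of/{property_name}/{local_name(value)}")


print(promote_resource("ANL146", "http://logd.tw.rpi.edu", "data-gov", "8", "site_id"))
print(promote_resource("http://example.org/a.xml", "http://logd.tw.rpi.edu",
                       "data-gov", "1564", "link"))
```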

===== Predicate-scoped resource promotion parameter ===== Per 3 May discussions:

  http://logd.tw.rpi.edu/source/data-gov/dataset/1147/type/badge-number/78026 . # A typed resource promotion
  vs
  http://logd.tw.rpi.edu/source/data-gov/dataset/1147/value/bdgnbr/78026 . # A predicate-scoped resource promotion.
  vs
  http://logd.tw.rpi.edu/source/data-gov/dataset/1147/78026 # All promotions go to same value space

(badge-number is a class, bdgnbr is a property)

another example when subject discriminators are used (multiple files in a dataset - e.g., Dataset 10030):

  http://logd.tw.rpi.edu/source/dfid-gov-uk/dataset/sid-2009/africa/type/country/Algeria # A typed resource promotion
  vs
  http://logd.tw.rpi.edu/source/dfid-gov-uk/dataset/sid-2009/africa/value/recipient_country/Algeria # A predicate-scoped resource promotion.
  vs
  http://logd.tw.rpi.edu/source/dfid-gov-uk/dataset/sid-2009/africa/Algeria # All promotions go to same value space

Predicate-scoped promotion is a type of crutch promotion that uses the name of the column, instead of the value of another column, to create the URI of the value.

dsvisitor-records:thing_1 raw:bdgnbr "78026" . becomes dsvisitor-records:thing_1 e1:bdgnbr http://logd.tw.rpi.edu/whitehouse-gov/dataset/visitor-records/version/2010-Mar-26/bdgnbr/78026 .

This takes less effort than Typed Resource Promotion, since the user does not need to specify the type.

===== Cast resource promotion parameter ===== The value in the column is already a URL or URI and simply needs to be cast to a resource instead of a literal.

Casting is the default when conversion:range rdfs:Resource is used and the value contains "://". An example is available at #Resource promotion parameters. However, casting can be requested explicitly by typing the enhancement as conversion:CastResourcePromotion.

Augmenting the Dataset 1564 example from above with an explicit request to cast:

@prefix conversion: http://purl.org/twc/vocab/conversion/ . @prefix : http://logd.tw.rpi.edu/source/data-gov/dataset/1564/version/2009-May-18/params/enhancement/1/ .

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "1564"; conversion:dataset_version "2009-May-18"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "1"; conversion:enhance [ ov:csvCol 8; ov:csvHeader "Link"; conversion:label "Link"; conversion:comment "The Link field provides a link to the Structured Product Labeling (SPL) information associated with each animal drug product listed electronically."; a conversion:CastResourcePromotion; conversion:range rdfs:Resource; ];

Other datasets that benefit from this enhancement include Dataset 92 and Dataset 1564.

===== Typed resource promotion ===== e.g., Dataset 1530

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "1530"; conversion:dataset_version "2009-May-18"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "1"; conversion:enhance [ ov:csvCol 2; ov:csvHeader "Requester Name"; conversion:label "Requester Name"; conversion:range rdfs:Resource;

      <font color="#777777">a conversion:TypedResourcePromotion;</font>
      conversion:<font color="#FF0000">range_name "Requester"</font>;
   ];

@prefix ds1530: http://logd.tw.rpi.edu/source/data-gov/dataset/1530/version/2009-May-18/ . @prefix raw: http://logd.tw.rpi.edu/source/data-gov/dataset/1530/vocab/raw/ .

ds1530:thing_1 raw:request_id "07-F-0001"; raw:requester_name "Connolly, Ward" .

'''becomes'''

@prefix e1: http://logd.tw.rpi.edu/source/data-gov/dataset/1530/vocab/enhancement/1/ .

ds1530:thing_1 e1:request_id "07-F-0001"; e1:requester_name <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/type/requester/Connolly_Ward> .

<http://logd.tw.rpi.edu/source/data-gov/dataset/1530/'''type/requester/Connolly_Ward'''> a <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/Requester>; rdfs:label "Connolly, Ward" .

Other datasets that benefit from this enhancement include Dataset 10025 ("Badge").
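The triples produced by a typed resource promotion can be sketched in Python. This is an illustration only: `typed_promotion` is a hypothetical name, the ''type/'' URI layout and the lower-casing of the range name are read off the Dataset 1530 output, and the character mangling is an assumption inferred from "Connolly, Ward" becoming Connolly_Ward.

```python
import re


def typed_promotion(value: str, dataset_uri: str, range_name: str):
    """Return (resource URI, triples describing it) for a typed promotion."""
    slug = re.sub(r"[^A-Za-z0-9\-]+", "_", value).strip("_")  # assumed mangling
    resource = f"{dataset_uri}/type/{range_name.lower()}/{slug}"
    cls = f"{dataset_uri}/{range_name}"
    triples = [
        (resource, "rdf:type", cls),      # instance of the class named by range_name
        (resource, "rdfs:label", value),  # original cell value kept as the label
    ]
    return resource, triples


uri, triples = typed_promotion(
    "Connolly, Ward",
    "http://logd.tw.rpi.edu/source/data-gov/dataset/1530",
    "Requester",
)
print(uri)
for triple in triples:
    print(triple)
```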

===== Range template resource promotion ===== Range templates use #Template variables to specify the triple object to assert. They apply to both literal and resource objects.

TODO: This may be better named Object template.

====== Local promotion ====== Often, a single value is insufficient to uniquely identify a concept. This is a problem when promoting values to URIs -- if the URIs happen to match, then they are presumed to be the same thing. Incorporating additional values when constructing a URI can avoid this situation. A single value can use other values as a "crutch" to promote itself to a unique URI.

(Note: Range template promotion used to be called Crutch resource promotion.)

e.g. Dataset 1374 mentions city names, but their URIs should incorporate their state to ensure uniqueness.

@prefix conversion: http://purl.org/twc/vocab/conversion/ . @prefix : http://logd.tw.rpi.edu/source/data-gov/dataset/1374/version/2010-May-17/params/enhancement/1/ .

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "1374"; conversion:dataset_version "2010-May-17"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "1"; ... conversion:enhance [ ov:csvCol 2; ov:csvHeader "City"; conversion:range rdfs:Resource;

      a conversion:RangeTemplateResourcePromotion;
      conversion:<font color="#FF0000">range_template "[#2]-[#3]"</font>;        # only one pattern is required;
      conversion:<font color="#FF0000">range_template "[@city]-[@state]"</font>; # these four are equivalent.
      conversion:<font color="#FF0000">range_template "[#2]-[@state]"</font>;    # '#', '@', and '.' references can be mixed in the same pattern.
      conversion:<font color="#FF0000">range_template "[.]-[#3]"</font>;         # Period is used as a short hand for "value of this property".
   ];

@prefix ds1374: http://logd.tw.rpi.edu/source/data-gov/dataset/1374/version/2010-May-03/ . @prefix raw: http://logd.tw.rpi.edu/source/data-gov/dataset/1374/vocab/raw/ .

ds1374:thing_1 raw:city "Elmwood Park"; raw:state "IL"; .

'''becomes'''

@prefix e1: http://logd.tw.rpi.edu/source/data-gov/dataset/1374/vocab/enhancement/1/ .

ds1374:thing_1 e1:city <http://logd.tw.rpi.edu/source/data-gov/dataset/1374/value-of/city/Elmwood_Park-IL>; e1:state "IL"; .

<http://logd.tw.rpi.edu/source/data-gov/dataset/1374/'''value-of/city/Elmwood_Park-IL'''> rdfs:label "Elmwood Park" .

Range template promotions end up in one of two sections in the local dataset namespace (to create a URI outside of the local namespace, use a #Template resource promotion parameter). In the example above, the "City" column was promoted to an rdfs:Resource with no additional ResourcePromotion specified. So, the default PredicateScopedResourcePromotion was used, which constructs the '''value-of/city/''' style URI. In the example below, a TypedResourcePromotion was further specified beyond the rdfs:Resource range, resulting in the construction of the '''typed/city/''' style URI. In the ''value-of'' case, ''city'' mentions the property name, while in the ''typed'' case, ''city'' is a lower-case version of the class local name given by conversion:range_name.

@prefix conversion: http://purl.org/twc/vocab/conversion/ . @prefix : http://logd.tw.rpi.edu/source/data-gov/dataset/1374/version/2010-May-17/params/enhancement/1/ .

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "1374"; conversion:dataset_version "2010-May-17"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "2"; ... conversion:enhance [ ov:csvCol 2; ov:csvHeader "City"; conversion:range rdfs:Resource;

      a conversion:TypedResourcePromotion;
      conversion:<font color="#FF0000">range_name "City"</font>;

      a conversion:CrutchResourcePromotion;
      conversion:<font color="#FF0000">range_template "[#2]-[#3]"</font>;        # only one pattern is required;
      conversion:<font color="#FF0000">range_template "[@city]-[@state]"</font>; # these four are equivalent.
      conversion:<font color="#FF0000">range_template "[#2]-[@state]"</font>;    # '#', '@', and '.' references can be mixed in the same pattern.
      conversion:<font color="#FF0000">range_template "[.]-[#3]"</font>;         # Period is used as a short hand for "value of this property".
   ];

@prefix ds1374: http://logd.tw.rpi.edu/source/data-gov/dataset/1374/version/2010-May-03/ . @prefix e1: http://logd.tw.rpi.edu/source/data-gov/dataset/1374/vocab/enhancement/1/ .

ds1374:thing_1 e1:city <http://logd.tw.rpi.edu/source/data-gov/dataset/1374/'''value-of/city'''/Elmwood_Park-IL>; e1:state "IL"; .

<http://logd.tw.rpi.edu/source/data-gov/dataset/1374/'''value-of/city/'''Elmwood_Park-IL> rdfs:label "Elmwood Park" .

'''becomes'''

@prefix e2: http://logd.tw.rpi.edu/source/data-gov/dataset/1374/vocab/enhancement/2/ .

ds1374:thing_1 e2:city <http://logd.tw.rpi.edu/source/data-gov/dataset/1374/typed/city/Elmwood_Park-IL>; e2:state "IL"; .

<http://logd.tw.rpi.edu/source/data-gov/dataset/1374/'''typed/city/'''Elmwood_Park-IL> rdfs:label "Elmwood Park" .

Using a label_pattern will produce a different rdfs:label than what is used to create the crutched URI. The result of the label_pattern will also be used instead of the raw value when looking up owl:sameAs relations during #ObjectSameAsEnhancement_parameter.

@prefix conversion: http://purl.org/twc/vocab/conversion/ . @prefix : http://logd.tw.rpi.edu/source/data-gov/dataset/1374/version/2010-May-17/params/enhancement/1/ .

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "1374"; conversion:dataset_version "2010-May-17"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "3"; ... conversion:enhance [ ov:csvCol 2; ov:csvHeader "City"; conversion:range rdfs:Resource;

      a conversion:TypedResourcePromotion;
      conversion:range_name "City";

      a conversion:CrutchResourcePromotion;
      conversion:range_template "[#2]-[#3]";        # only one pattern is required;
      conversion:range_template "[@city]-[@state]"; # these four are equivalent.
      conversion:range_template "[#2]-[@state]";    # '#', '@', and '.' references can be mixed in the same pattern.
      conversion:range_template "[.]-[#3]";         # Period is used as a short hand for "value of this property".

      <font color="#FF0000">conversion:label_pattern  "[@city], [@state]"</font>;
   ];

@prefix ds1374: http://logd.tw.rpi.edu/source/data-gov/dataset/1374/version/2010-May-03/ . @prefix e2: http://logd.tw.rpi.edu/source/data-gov/dataset/1374/vocab/enhancement/2/ .

ds1374:thing_1 e2:city <http://logd.tw.rpi.edu/source/data-gov/dataset/1374/typed/city/Elmwood_Park-IL>; e2:state "IL"; .

<http://logd.tw.rpi.edu/source/data-gov/dataset/1374/'''typed/city/'''Elmwood_Park-IL> rdfs:label "Elmwood Park" .

'''becomes'''

@prefix e3: http://logd.tw.rpi.edu/source/data-gov/dataset/1374/vocab/enhancement/3/ .

ds1374:thing_1 e3:city <http://logd.tw.rpi.edu/source/data-gov/dataset/1374/typed/city/Elmwood_Park-IL>; e3:state "IL"; .

<http://logd.tw.rpi.edu/source/data-gov/dataset/1374/'''typed/city/'''Elmwood_Park-IL> rdfs:label "Elmwood Park, IL" .

e.g. Dataset 1146 mentions county identifiers ''within'' a state. For example, county "000" in CA and county "000" in PA are different counties. Thus, the state needs to be incorporated into the URI created for the county.

:thing_1 raw:state "01"; raw:county "001" . '''becomes''' :thing_1 e1:county http://logd.tw.rpi.edu/source/SSS/dataset/DDD/01/001 .

      <font color="#FF0000">conv:range_template "[#1]-[#4]"</font>;

Other datasets that benefit from this enhancement parameter include Dataset 1330 (District/State).
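The three reference styles used in range templates ('#' for a column number, '@' for a column name, '.' for the value of the current column) can be illustrated with a short Python sketch. `expand_template` and its arguments are hypothetical; the sketch only substitutes values into the template and leaves out URI mangling and namespace handling.

```python
import re


def expand_template(template: str, row: dict, headers: dict, current_col: int) -> str:
    """row maps 1-based column numbers to cell values; headers maps names to column numbers."""
    def resolve(match):
        ref = match.group(1)
        if ref == ".":
            return row[current_col]        # value of this property
        if ref.startswith("#"):
            return row[int(ref[1:])]       # value of the numbered column
        if ref.startswith("@"):
            return row[headers[ref[1:]]]   # value of the named column
        return match.group(0)              # unknown reference: leave untouched
    return re.sub(r"\[([^\]]+)\]", resolve, template)


row = {2: "Elmwood Park", 3: "IL"}
headers = {"city": 2, "state": 3}

for template in ("[#2]-[#3]", "[@city]-[@state]", "[#2]-[@state]", "[.]-[#3]"):
    print(expand_template(template, row, headers, current_col=2))
```

The same substitution applied to a label pattern such as "[@city], [@state]" gives the label text.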

====== External promotion ====== A template may be specified that is populated with a column's value.

(one value is not a crutch)

  conv:enhance [
     ov:csvRow              1;
     ov:csvHeader          "";
     conv:property_name    "my_prop";
     <font color="#FF0000"><nowiki>conv:range_template "http://some.other.org/instances/[.]"</nowiki></font>;
  ] ;

:thing_1 raw:my_prop "hi" . '''becomes''' :thing_1 e1:my_prop http://some.other.org/instances/hi .

  conv:enhance [
     ov:csvRow              1;
     ov:csvHeader          "";
     conv:property_name    "my_prop";
     <font color="#FF0000"><nowiki>conv:range_template "[/sdv]/[.]"</nowiki></font>;
  ] ;

TODO: how are multiple values inserted? How are the property names cited, how are the property columns cited? What happens when there is a column named 'value'?

e.g. Dataset 326 with CSVs resembling RDBMS tables.

e.g. The raw conversion of Dataset 326 looks like:

http://logd.tw.rpi.edu/dataset/326/thing_1

    p326:series_id "GPU00100000E0000";
    p326:year      "1981";
    p326:period    "A01";
    p326:value     "1491";
    ov:csvRow      1; 

.

p326:period ov:csvCol 3; ov:csvHeader "period"; rdfs:range rdfs:Literal; .

"A01" is referring to the period from gp.period

@prefix ns1: http://logd.tw.rpi.edu/dataset/326/gp.period/ . @prefix ns2: http://logd.tw.rpi.edu/property/326/gp.period/ .

ns1:A01 ns2:period "A01"; ns2:period_abbr "ANN"; ns2:period_name "Annual"; ov:csvRow 1; .

p326:period "A01"; becomes p326:period http://logd.tw.rpi.edu/dataset/326/gp.period/A01;

The enrichment parameters to do this would be:

http://logd.tw.rpi.edu/data-gov/conversionParams/326/enrichment/1 { [] essential:dataset-source "data-gov" . [] essential:sources-identifier "326" . [] enrichment:enrichment-level "1" .

conv:enhance [
   ov:csvCol 3;

   a conv:
   conv:promotionNamespace <http://logd.tw.rpi.edu/dataset/326/gp.period/> ;
];

}

(can behave like foreign key)

TODO: if a template is used, should the internal URI be created as well?

===== Resource column bundling promotion parameter ===== Values can be bundled by an ''implicit'' resource (as in Dataset 10025) or an ''existing'' resource (as in Dataset 1147). In the case of an implicit resource a URI is minted to bundle the values, while in the existing case the URI used to bundle values is derived from a value present in another column of the csv row.

====== Implicit column bundle resource promotion ======

The first, middle, and last names of visitors are cited in Dataset 10025. Each of these values is describing an unmodeled person. Using an implicit bundle will create a URI for this person, which will be described with the bundled first, middle, and last name properties. When the implicit bundle incorporates these properties, it loses its association to the remaining elements of the csv row -- the row needs to connect to the new implicit bundle. The property conversion:property_name is used to cite the local name of the property from the row resource to the new ImplicitBundle.

The type created from conversion:type_name (e.g., "Person") can be subclassed to an external class (e.g. foaf:Person) using #Subclass enhancement.

@prefix conversion: http://purl.org/twc/vocab/conversion/ . @prefix : http://logd.tw.rpi.edu/whitehouse-gov/dataset/visitor-records/params/enhancement/1/ .

:visitor_bundle a conversion:ImplicitBundle; conversion:property_name "visitor"; conversion:type_name "Person"; .

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "whitehouse-gov"; conversion:dataset_identifier "visitor-records"; conversion:dataset_version "2010-Mar-26"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "1"; conversion:enhance [ ov:csvCol 1; ov:csvHeader "NAMELAST"; conversion:range rdfs:Literal; conversion:label "Last name"; conversion:bundled_by :visitor_bundle; ]; conversion:enhance [ ov:csvCol 2; ov:csvHeader "NAMEFIRST"; conversion:range rdfs:Literal; conversion:label "First name"; conversion:bundled_by :visitor_bundle; ]; conversion:enhance [ ov:csvCol 3; ov:csvHeader "NAMEMID"; conversion:range rdfs:Literal; conversion:label "Middle name"; conversion:bundled_by :visitor_bundle; ]; ]; .

@prefix visitor-records: http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/version/2010-Mar-26/ . @prefix raw: http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/vocab/raw/ . @prefix e1: http://logd.tw.rpi.edu/source/whitehouse-gov/dataset/visitor-records/vocab/enhancement/1/ .

visitor-records:thing_1 raw:last_name "AABY"; raw:first_name "DONETT"; raw:middle_name "L"; .

'''becomes'''

visitor-records:thing_1 e1:visitor visitor-records:implicit_visitor_1; . visitor-records:implicit_visitor_1 rdf:type vocab:Person; e1:last_name "AABY"; e1:first_name "DONETT"; e1:middle_name "L" .
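The reshaping performed by an implicit bundle can be sketched in Python as a function from one row to a list of triples. The function name and its parameters are hypothetical, and the `implicit_<property>_<row>` naming simply mirrors the example output above; it is not a statement about how the converter names bundles in general.

```python
def implicit_bundle(row_subject, row_number, property_name, type_name,
                    bundled_values, ns="visitor-records:"):
    """Move bundled_values off the row and onto a newly named bundle resource."""
    bundle = f"{ns}implicit_{property_name}_{row_number}"
    triples = [
        (row_subject, f"e1:{property_name}", bundle),  # row points at the bundle
        (bundle, "rdf:type", f"vocab:{type_name}"),    # bundle typed by type_name
    ]
    # The bundled columns now describe the bundle, not the row.
    triples.extend((bundle, predicate, value)
                   for predicate, value in bundled_values.items())
    return triples


for triple in implicit_bundle(
    "visitor-records:thing_1", 1, "visitor", "Person",
    {"e1:last_name": "AABY", "e1:first_name": "DONETT", "e1:middle_name": "L"},
):
    print(triple)
```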

TODO: version/2009-Oct-02/PROPERTY_NAME/thing_1 (1 == row's thing_1) (when implicit bundle is not typed)

TODO: version/2009-Oct-02/PROPERTY_NAME/type_1 (1 == row's thing_1) (when implicit bundle is typed with type_name)

e.g., Dataset 1450

@prefix : http://logd.tw.rpi.edu/source/data-gov/dataset/1450/params/enhancement/1/ .

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "1450"; conversion:dataset_version "2009-May-18"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "1";

   conversion:enhance [
      ov:csvCol         7;
      ov:csvHeader     "LAST NAME";
      conversion:range  rdfs:Literal;
   ];
   conversion:enhance [
      ov:csvCol         8;
      ov:csvHeader     "FIRST NAME";
      conversion:range  rdfs:Literal;
   ];
   conversion:enhance [
      ov:csvCol         9;
      ov:csvHeader     "MI";
      conversion:range  rdfs:Literal;
   ];

];

.

@prefix ds1450: http://csv2rdf.org/source/data-gov/dataset/1450/version/18-May-2009/ . @prefix raw: http://csv2rdf.org/source/data-gov/dataset/1450/vocab/raw/ .

ds1450:thing_1 raw:state "Alabama"; raw:legal_entity_name "ACCENDO INSURANCE COMPANY"; raw:organization_name "RxAmerica"; raw:organization_description "PDP"; raw:title "Director of Medicare Services"; raw:last_name "Low"; raw:first_name "Jeff"; raw:phone "1-801-961-6251"; raw:fax "1-801-961-6313"; raw:email "jeff.low@rxamerica.com"; raw:street_address "221 N. Charles Lindbergh Dr."; raw:city "SLC"; raw:state_2 "UT"; raw:zip "84116"; ov:csvRow 2; .

'''becomes'''

@prefix e1: http://csv2rdf.org/source/data-gov/dataset/1450/vocab/enhancement/1/ . @prefix legal-entity: http://csv2rdf.org/source/data-gov/dataset/1450/version/18-May-2009/legal-entity/ . @prefix organization: http://csv2rdf.org/source/data-gov/dataset/1450/version/18-May-2009/organization/ . @prefix state: http://csv2rdf.org/source/data-gov/dataset/1450/version/18-May-2009/state/ . @prefix city: http://csv2rdf.org/source/data-gov/dataset/1450/version/18-May-2009/city/ . @prefix title: http://csv2rdf.org/source/data-gov/dataset/1450/version/18-May-2009/title/ .

ds1450:thing_1 e1:address ds1450:implicit_address_1; e1:legal_entity_name legal-entity:ACCENDO_INSURANCE_COMPANY; e1:organization_description "PDP" ; e1:organization_name organization:RxAmerica; e1:point_of_contact ds1450:implicit_point_of_contact_1; e1:state state:Alabama; ov:csvRow 2; .

ds1450:implicit_address_1 e1:city city:SLC; e1:state_2 state:UT; e1:street_address "221 N. Charles Lindbergh Dr."; e1:zip "84116" .

legal-entity:ACCENDO_INSURANCE_COMPANY rdfs:label "ACCENDO INSURANCE COMPANY" .

organization:RxAmerica rdfs:label "RxAmerica" .

ds1450:implicit_point_of_contact_1 e1:email "jeff.low@rxamerica.com"; e1:fax "1-801-961-6313"; e1:first_name "Jeff"; e1:last_name "Low"; e1:phone "1-801-961-6251"; e1:title title:Director_of_Medicare_Services .

title:Director_of_Medicare_Services rdfs:label "Director of Medicare Services" .

====== Existing column bundle resource promotion parameter ====== Values can be bundled by an implicit resource (as in Dataset 10025) or an inline resource (as in Dataset 1147). In the case of an implicit resource a URI is created to bundle the values, while in the inline case no URI is created and the values are associated with a resource that was promoted from an existing value.

(TODO: because of the poor quality of 1147, a new example needs to be used. -Tim)

e.g., Dataset 1147

@prefix : http://logd.tw.rpi.edu/source/data-gov/dataset/1147/params/enhancement/1/ .

:dataset a void:Dataset; conv:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conv:source_identifier "data-gov"; conv:dataset_identifier "1147"; conv:dataset_version "2009-Oct-08"; conv:conversion_process [ conversion:enhance [ ov:csvCol 3; ov:csvHeader "State_Code_Dest";

     <font color="#FF0000">conversion:range rdfs:Resource</font>; # Inline bundles must be promoted to resources.

     a conv:TypedResourcePromotionEnhancement; # This is not required for inline bundling.
     conv:range_name "state";
  ];
  conv:enhance [
     ov:csvCol     5;
     ov:csvHeader "State_Abbrv";
     conv:range    rdfs:Literal;

     <font color="#FF0000">a conv:ExistingBundleEnhancement;
     conv:bundled_by [ ov:csvCol 3 ]</font>;
  ];
];

.

@prefix ds1147: http://logd.tw.rpi.edu/source/data-gov/dataset/1147/version/2009-Oct-08/ . @prefix raw: http://logd.tw.rpi.edu/source/data-gov/dataset/1147/vocab/raw/ . @prefix e1: <http://logd.tw.rpi.edu/source/

ds1147:thing_6 raw:state_code_origin "01"; raw:state_abbrv "AL"; .

'''becomes'''

ds1147:thing_6 e1:state_code_origin http://logd.tw.rpi.edu/source/data-gov/dataset/1147/version/2009-Oct-08/state/01;

e1:state_abbrv does NOT describe ds1147:thing_6

.

http://logd.tw.rpi.edu/source/data-gov/dataset/1147/version/2009-Oct-08/state/01 e1:state_abbrv "AL"; rdfs:label "01"; .

(TODO: one-level vs hierarchical)

Other datasets that benefit from this enhancement include Dataset 1492.

===== Global resource promotion parameter =====

A #Typed_resource_promotion_parameter is usually the better choice.

e.g., Dataset 1530

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "1530"; conversion:dataset_version "2009-May-18"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "1"; conversion:enhance [ ov:csvCol 2; ov:csvHeader "Requester Name"; conversion:label "Requester Name"; conversion:range rdfs:Resource; a conversion:GlobalResourcePromotionEnhancement; ];

@prefix ds1530: http://logd.tw.rpi.edu/source/data-gov/dataset/1530/version/2009-May-18/ . @prefix raw: http://logd.tw.rpi.edu/source/data-gov/dataset/1530/vocab/raw/ .

ds1530:thing_1 raw:request_id "07-F-0001"; raw:requester_name "Connolly, Ward" .

'''becomes'''

@prefix e1: http://logd.tw.rpi.edu/source/data-gov/dataset/1530/vocab/enhancement/1/ . @prefix ds1530_value: http://logd.tw.rpi.edu/source/data-gov/dataset/1530/ .

ds1530:thing_1 e1:request_id "07-F-0001" ; e1:requester_name ds1530_value:Connolly_Ward .

dsvisitor-records:thing_1 raw:bdgnbr "78026" .

'''becomes'''

dsvisitor-records:thing_1 e1:bdgnbr http://logd.tw.rpi.edu/whitehouse-gov/dataset/visitor-records/version/2010-Mar-26/value/78026 .

The "value" segment of the promoted URI is a hard-coded default.

===== Linked Data resource promotion parameters =====

A possible scheme for where TWC will place its mapping files:

http://logd.tw.rpi.edu/source/tetherless/mapping/dbpedia-states/2010-Apr-29.ttl

See ObjectSameAsEnhancement for a current list of links_via mapping files.

====== ObjectSameAsEnhancement parameter ====== Objects that are promoted to Resources may be linked to external resources. A list of RDF files containing mappings and a list of predicates to query those mappings may be specified (with conv:links_via and conv:subject_of, respectively). All predicates listed by conv:subject_of will be used in all files listed by conv:links_via. To express more granular control, use multiple ObjectSameAsEnhancements listing different files and predicates.

@prefix dcterms: http://purl.org/dc/terms/ . @prefix conversion: http://purl.org/twc/vocab/conversion/ . @prefix : http://logd.tw.rpi.edu/source/data-gov/dataset/1356/version/2009-Dec-03/params/enhancement/1/ .

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "1356"; conversion:dataset_version "2009-Dec-03"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "1"; conversion:enhance [ ov:csvCol 1; rdfs:label "State_Code";

      <font color="#777777">a conversion:ObjectSameAsEnhancement;</font>
      conversion:range rdfs:Resource;
      conversion:links_via <http://www.rpi.edu/~lebot/lod-links/state-fips-dbpedia.ttl>,
                           <http://www.rpi.edu/~lebot/lod-links/state-fips-geonames.ttl>,
                           <http://www.rpi.edu/~lebot/lod-links/state-fips-govtrack.ttl>;
      conversion:subject_of dcterms:identifier;

      conversion:range_name          "State"; 
      <font color="#777777">a conversion:TypedResourcePromotionEnhancement;</font>

      conversion:predicate      geonames:parentFeature;
      conversion:object         <http://www.dbpedia.org/resource/United_States>;
   ];

@prefix ds1356: http://logd.tw.rpi.edu/source/data-gov/dataset/1356/version/2009-Dec-03/ . @prefix raw: http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/raw/ .

ds1356:thing_2 raw:state_code "01" ; .

'''becomes'''

@prefix e1: http://logd.tw.rpi.edu/source/data-gov/dataset/1356/vocab/enhancement/1/ .

ds1356:thing_2 e1:state_code http://logd.tw.rpi.edu/source/data-gov/dataset/1356/typed/state/01 ; .

http://logd.tw.rpi.edu/source/data-gov/dataset/1356/typed/state/01 a ds1356_vocab:state ; rdfs:label "01" ; owl:sameAs http://dbpedia.org/resource/Alabama , http://www.rdfabout.com/rdf/usgov/geo/us/AL , http://sws.geonames.org/4829764/ .

e.g. Dataset 1147

data-gov-1147-2009-Oct-08.csv.e1.params.ttl: @prefix dcterms: http://purl.org/dc/terms/ . @prefix : http://logd.tw.rpi.edu/source/data-gov/dataset/1147/params/enhancement/1/ .

:dataset a void:Dataset;

conv:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
conv:source_identifier  "data-gov";
conv:dataset_identifier "1147";
conv:dataset_version    "2009-Oct-08";

conv:conversion_process [
   a conv:RawConversionProcess;
   conv:enhancement_identifier "1";

   conv:enhance [
      ov:csvCol           1;
      ov:csvHeader       "State_Code_Origin";
      conv:property_name "state_code_origin";
      conv:range          rdfs:Resource;

      <font color="#FF0000">a conv:TypedResourcePromotionEnhancement;
      conv:range_name    "state";

      a conv:ObjectSameAsEnhancement;
      conv:links_via <http://www.rpi.edu/~lebot/lod-links/state-fips-dbpedia.ttl>,
                     <http://www.rpi.edu/~lebot/lod-links/state-fips-geonames.ttl>,
                     <http://www.rpi.edu/~lebot/lod-links/state-fips-govtrack.ttl>;
      conv:subject_of dcterms:identifier;</font>
   ];
];

.

@prefix dcterms: http://purl.org/dc/terms/ .

state-fips-dbpedia.ttl: http://dbpedia.org/resource/Alabama dcterms:identifier "AL", "01", "Alabama", "ALABAMA", "alabama" .

state-fips-geonames.ttl: http://sws.geonames.org/4829764/ dcterms:identifier "AL", "01", "Alabama", "ALABAMA", "alabama" .

state-fips-govtrack.ttl: http://www.rdfabout.com/rdf/usgov/geo/us/AL dcterms:identifier "01", "AL", "Alabama", "ALABAMA", "alabama" .

@prefix ds1147: http://logd.tw.rpi.edu/source/data-gov/dataset/1147/version/2009-Oct-08/ .

ds1147:thing_1 raw:state_code_origin "01".

'''becomes'''

@prefix e1: http://logd.tw.rpi.edu/source/data-gov/dataset/1147/vocab/enhancement/1/ .

ds1147:thing_1 e1:state_code_origin http://logd.tw.rpi.edu/source/data-gov/dataset/1147/type/state/01 .

http://logd.tw.rpi.edu/source/data-gov/dataset/1147/type/state/01 rdfs:label "01"; owl:sameAs http://dbpedia.org/resource/Alabama, http://sws.geonames.org/4829764/, http://www.rdfabout.com/rdf/usgov/geo/us/AL; .
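The lookup described for ObjectSameAsEnhancement can be sketched in a few lines of Python. This is only an illustration of the matching rule (every predicate listed by subject_of is tried in every file listed by links_via), not the converter's actual code. The function name `same_as_links` and the in-memory `MAPPINGS` structure, which stands in for the parsed mapping files, are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, literal object)

DCTERMS_IDENTIFIER = "http://purl.org/dc/terms/identifier"

# Hypothetical in-memory stand-in for the three links_via mapping files.
MAPPINGS: Dict[str, List[Triple]] = {
    "state-fips-dbpedia.ttl": [
        ("http://dbpedia.org/resource/Alabama", DCTERMS_IDENTIFIER, "AL"),
        ("http://dbpedia.org/resource/Alabama", DCTERMS_IDENTIFIER, "01"),
    ],
    "state-fips-geonames.ttl": [
        ("http://sws.geonames.org/4829764/", DCTERMS_IDENTIFIER, "01"),
    ],
    "state-fips-govtrack.ttl": [
        ("http://www.rdfabout.com/rdf/usgov/geo/us/AL", DCTERMS_IDENTIFIER, "01"),
    ],
}


def same_as_links(value: str,
                  mapping_files: Dict[str, List[Triple]],
                  predicates: Iterable[str]) -> List[str]:
    """Return subjects that carry `value` via any of `predicates` in any file."""
    wanted = set(predicates)
    links: List[str] = []
    for triples in mapping_files.values():
        for subject, predicate, obj in triples:
            if predicate in wanted and obj == value and subject not in links:
                links.append(subject)
    return links
```

Each returned URI would become the object of an owl:sameAs triple on the promoted resource, as in the Alabama example.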

(TODO: consider materializing from row to external same-as resources: :thing_1 e1:state_code_origin http://dbpedia.org/resource/Alabama . On one hand it would allow loading this RDF into a store with dbpedia data and querying directly across them without having to know there are sameAs links. On the other, if you just wanted to query for { ds1147:thing_1 e1:state_code_origin ?origin }, materializing the sameAs assertions would return you several results instead of just one (which might be expected). )

Mock-ups to illustrate other multi-typings:

conversion:enhance [ a conv:ObjectSameAsEnhancement, conv:DefaultResourcePromotion; ov:csvCol 7; conv:property_name "state"; conv:range rdfs:Resource; conv:links_via http://url.to/my_mappings.rdf; conv:subject_of dcterms:identifier; ];

If no other type is given, PredicateScopedResourcePromotion is used:

conversion:enhance [ a conv:ObjectSameAsEnhancement; ov:csvCol 7; conv:property_name "state"; conv:range rdfs:Resource; conv:links_via http://url.to/my_mappings.rdf; conv:subject_of dcterms:identifier; ];

Put into ontology: (ObjectSameAsEnhancement rdfs:subClassOf ResourcePromotionResource)

Other datasets that benefit from this enhancement include Dataset 1330.

prefix owl: http://www.w3.org/2002/07/owl# select distinct ?o where { ?o owl:sameAs ?e }

====== SubjectSameAsEnhancement parameter ======

conversion:enhance [ a conv:SubjectSameAsEnhancement, conv:TypedResourcePromotionEnhancement; ov:csvCol 7; conv:property_name "state"; conv:range rdfs:Resource; conv:links_via http://url.to/my_mappings.rdf; conv:subject_of dcterms:identifier;

# For TypedResourcePromotionEnhancement
conv:type          "state";

];

The same multi-typing applies as for ObjectSameAsEnhancement.

e.g., nuclear reactor 957?

===== Codebook ===== Many datasets use abbreviated codes instead of citing things directly. Codebook enhancements describe how input values occurring within a column should be interpreted. This functionality could also be known as a "Data Dictionary". Codebook enhancements feature one or more conversion:interpret triples that cite a (conversion:symbol - conversion:interpretation) pairing. If the conversion:symbol appears as a value in the csv for the particular column, the conversion:interpretation will be output instead. This works when the conversion:range is either rdfs:Literal or rdfs:Resource. In the resource promotion case, the interpretation is used as the basis for promotion instead of the input value.

A common practice is to have one enhancement that decodes the codes into literal expansions and a second enhancement that promotes the interpretations to resources. This allows for easy inspection and query at the same time that it allows third-party description augmentation. Note, however, that resource promotion can be done as a single step by asserting the conversion:range.

====== Codebook Literal Promotion ====== Input values can be replaced with their interpretations by specifying one or more conversion:Interpretations on a particular conversion:Enhancement. If no conversion:range is asserted -- or if it is asserted to be rdfs:Literal -- the output triple will be a literal. The example below shows this case.

e.g., Dataset 1930

@prefix conversion: http://purl.org/twc/vocab/conversion/ . @prefix : http://logd.tw.rpi.edu/source/data-gov/dataset/1930/version/1st-anniversary/params/enhancement/1/ .

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "1930"; conversion:dataset_version "1st-anniversary"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "1"; ... conversion:enhance [ ov:csvCol 4; ov:csvHeader "can_off"; conversion:label "Candidate Office"; conversion:range rdfs:Literal; conversion:comment "office abbreviation"; conversion:comment "P=President; S=Senate; H=House"; conversion:interpret [ conversion:symbol "S"; conversion:interpretation "Senate"; ]; conversion:interpret [ conversion:symbol "P"; conversion:interpretation "President"; ]; conversion:interpret [ conversion:symbol "H"; conversion:interpretation "House"; ]; ];

@prefix ds1930: http://logd.tw.rpi.edu/source/data-gov/dataset/1930/version/1st-anniversary/ . @prefix raw: http://logd.tw.rpi.edu/source/data-gov/dataset/1930/vocab/raw/ .

ds1930:thing_2 raw:can_off "H" .

'''becomes'''

@prefix e1: http://logd.tw.rpi.edu/source/data-gov/dataset/1930/vocab/enhancement/1/ .

ds1930:thing_2 e1:candidate_office "House" .
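The symbol-to-interpretation substitution in this example amounts to a dictionary lookup. The sketch below is illustrative only. The names `CODEBOOK` and `interpret` are hypothetical, and passing unmatched values through unchanged is an assumption, since the text only states what happens when a symbol matches.

```python
from typing import Dict

# Pairs taken from the conversion:interpret blocks of the Dataset 1930 example.
CODEBOOK: Dict[str, str] = {"S": "Senate", "P": "President", "H": "House"}


def interpret(value: str, codebook: Dict[str, str]) -> str:
    """Return the interpretation for a known symbol.

    Assumption: a value with no matching symbol is output unchanged.
    """
    return codebook.get(value, value)
```

With these parameters, `interpret("H", CODEBOOK)` yields the literal used as the object of e1:candidate_office above.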

====== Codebook Resource Promotion ====== conversion:Interpretations are also used when the conversion:Enhancement's conversion:range is rdfs:Resource.

@prefix conversion: http://purl.org/twc/vocab/conversion/ . @prefix : http://logd.tw.rpi.edu/source/data-gov/dataset/1930/version/1st-anniversary/params/enhancement/1/ .

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "1930"; conversion:dataset_version "1st-anniversary"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "1"; ... conversion:enhance [ ov:csvCol 4; ov:csvHeader "can_off"; conversion:label "Candidate Office"; conversion:range rdfs:Resource; conversion:comment "office abbreviation"; conversion:comment "P=President; S=Senate; H=House"; conversion:interpret [ conversion:symbol "S"; conversion:interpretation "Senate"; ]; conversion:interpret [ conversion:symbol "P"; conversion:interpretation "President"; ]; conversion:interpret [ conversion:symbol "H"; conversion:interpretation "House"; ]; ];

@prefix ds1930: http://logd.tw.rpi.edu/source/data-gov/dataset/1930/version/1st-anniversary/ . @prefix raw: http://logd.tw.rpi.edu/source/data-gov/dataset/1930/vocab/raw/ .

ds1930:thing_2 raw:can_off "H" .

'''becomes'''

@prefix e1: http://logd.tw.rpi.edu/source/data-gov/dataset/1930/vocab/enhancement/1/ .

ds1930:thing_2 e1:candidate_office <http://logd.tw.rpi.edu/source/data-gov/dataset/1930/value-of/candidate_office/House> .

(Note that conversion:range_name may be used to promote "H" to <http://logd.tw.rpi.edu/source/data-gov/dataset/1930/typed/candidate_office/House>)
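A rough Python sketch of how the interpretation, rather than the raw symbol, feeds URI construction in the resource case. The URI layout is copied from the two URIs shown in this section. The helper names, and the rule that turns a label into a local name (runs of non-alphanumeric characters become a single underscore), are assumptions made for illustration.

```python
import re
from typing import Dict, Optional

CODEBOOK: Dict[str, str] = {"S": "Senate", "P": "President", "H": "House"}
DATASET_BASE = "http://logd.tw.rpi.edu/source/data-gov/dataset/1930"


def local_name(label: str) -> str:
    """Assumed slug rule: collapse non-alphanumeric runs into '_'."""
    return re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_")


def promote(value: str,
            codebook: Dict[str, str],
            property_name: str,
            range_name: Optional[str] = None) -> str:
    """Interpret the symbol first, then mint the resource URI from the result."""
    label = codebook.get(value, value)
    if range_name is not None:
        # Typed promotion, as in the note on conversion:range_name.
        return f"{DATASET_BASE}/typed/{range_name}/{local_name(label)}"
    return f"{DATASET_BASE}/value-of/{property_name}/{local_name(label)}"
```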

A special case of this structure is also used in #Interprets as null parameter to describe a conversion process or enhancement.

Other datasets that benefit from this enhancement include Dataset 9 and Dataset 1564.

Potential features: codebook with regex for symbol and interpretation (e.g. SEC)

The script distinct-values-2-symbol-interps.pl helps create the symbol/interpretation parameters for a given property by querying a sparql endpoint for its distinct values.

==== Composite value parameter ==== e.g., Dataset 10025

This parameter combines several values, such as first, middle, and last name, into a new value for a '''new''' predicate. The same effect can be achieved with a literal range_template, but doing so overrides one of the contributing values' predicates.

NOTE: This is not completed.

==== Splitting value parameter ====

===== Delimiter parameters =====

====== Object delimiter parameter ======

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "epa-gov-mcmahon-ethan"; conversion:dataset_identifier "environmental-reports"; conversion:dataset_version "2011-Jan-12"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "1"; #conversion:subject_discriminator "enviro-reports-and-indicators"; conversion:interpret [ conversion:symbol ""; conversion:interpretation conversion:null; ]; conversion:enhance [ ov:csvRow 2; a conversion:HeaderRow; ]; conversion:enhance [ ov:csvCol 19; ov:csvHeader "Environmental Indicator Scale"; conversion:label "Environmental Indicator Scale"; conversion:comment ""; conversion:delimits_object ",\s*"; conversion:range rdfs:Literal; ];

@prefix environmental-reports-enviro-reports-and-indicators: http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/enviro-reports-and-indicators/version/2011-Jan-12/ . @prefix raw: http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/vocab/raw/ . @prefix e1: http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/vocab/enhancement/1/ .

environmental-reports-enviro-reports-and-indicators:report_206 dcterms:identifier "report_206" ; dcterms:isReferencedBy http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/version/2011-Jan-12 ; a environmental-reports_vocab:Report , foaf:Document ; raw:environmental_indicator_scale "Municipality, County, State, Regional, Watershed" ;

'''becomes'''

environmental-reports-enviro-reports-and-indicators:report_206 dcterms:identifier "report_206" ; dcterms:isReferencedBy http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/version/2011-Jan-12 ; a environmental-reports_vocab:Report , foaf:Document ; e1:environmental_indicator_scale "Municipality" , "County" , "State" , "Regional" , "Watershed" ;
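The effect of conversion:delimits_object can be reproduced with a regular-expression split. This is a sketch using Python's `re` module; the converter's own regex dialect may differ in details, and the function name `split_objects` is hypothetical. Dropping empty fragments is an assumption.

```python
import re
from typing import List


def split_objects(value: str, delimiter_pattern: str) -> List[str]:
    """Split one cell value into several object values.

    Assumption: empty fragments (e.g. from a trailing delimiter) are dropped.
    """
    return [part for part in re.split(delimiter_pattern, value) if part]


scales = split_objects("Municipality, County, State, Regional, Watershed", r",\s*")
```

Each element of `scales` corresponds to one object of e1:environmental_indicator_scale in the enhanced output.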

====== Subject delimiter parameter ====== Not implemented. Motivated by the satellite collision use case.

===== Substring split parameter ===== e.g.

@prefix stack-heights: http://tw2.tw.rpi.edu/source/cordad-at-rpi-edu/dataset/stack-heights/version/2010-Jul-14/ .

stack-heights:thing_2 raw:state_and_county_fips_code "09009" .

'''becomes'''

stack-heights:thing_2 e1:state_fips_code "09"; e1:county_fips_code "009"; .

NOTE: This is not completed.
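Since this parameter is not implemented, the following Python sketch only illustrates the intended result for the FIPS example. The function name and the (name, start, end) field specification are hypothetical, not a proposed parameter syntax.

```python
from typing import Dict, List, Tuple

FieldSpec = Tuple[str, int, int]  # (property name, start offset, end offset)


def split_fixed_width(value: str, fields: List[FieldSpec]) -> Dict[str, str]:
    """Cut one value into named substrings by character offsets."""
    return {name: value[start:end] for name, start, end in fields}


FIPS_FIELDS: List[FieldSpec] = [
    ("state_fips_code", 0, 2),   # first two digits
    ("county_fips_code", 2, 5),  # last three digits
]
```

Slicing keeps the parts as strings, so leading zeros such as those in "09" and "009" survive.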

==== Annotation triple enhancement ====

Note: Annotation triples typed to scovo:Item become #Cell-based conversion.

===== Annotating all columns ===== e.g., Dataset 10030 has implicit giving country of UK, Dataset 1554 has implicit giving country of US.

conversion:conversion_process [ conversion:enhance [ a conversion:AnnotateSubjectEnhancement; conversion:predicate some:givingCountry; conversion:object geo:UK; ];

e.g., SEC financial reports, the reporting company is implicit.

conversion:conversion_process [ conversion:enhance [ a conversion:AnnotateSubjectEnhancement; conversion:predicate "company"; conversion:object "AAPL"; ]; ];

The object may be a resource, but the predicate must be a string. The predicate is created in the usual enhancement namespace, so make sure it does not collide with other properties derived from columns.

conversion:conversion_process [ conversion:enhance [ a conversion:AnnotateSubjectEnhancement; conversion:predicate "company"; conversion:object :AAPL; ];

See also #Subject discriminator parameter.

===== Annotating specific columns ===== conversion:enhance [ ov:csvCol 1; conversion:predicate "Predicate for subjects pointing to column 1"; conversion:object "Value of predicate"; ];

See #Multi-dimensional for combination with a cell-based conversion.

==== Ontology parameters ====

===== Subject type enhancement ===== e.g., Dataset 1530 :dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "1530"; conversion:dataset_version "2009-May-18"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "1"; conversion:enhance [ ov:csvCol 1; ov:csvHeader "Request ID"; conversion:label "Request ID"; conversion:range rdfs:Literal;

      conversion:<font color="#FF0000">domain_name "FOIA Request"</font>;
   ];

@prefix ds1530: http://logd.tw.rpi.edu/source/data-gov/dataset/1530/version/2009-May-18/ . @prefix raw: http://logd.tw.rpi.edu/source/data-gov/dataset/1530/vocab/raw/ .

ds1530:thing_1 raw:request_id "07-F-0001"; raw:requester_name "Connolly, Ward" .

'''becomes'''

@prefix e1: http://logd.tw.rpi.edu/source/data-gov/dataset/1530/vocab/enhancement/1/ .

ds1530:thing_1 rdf:type <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/vocab/FOIA_Request>; e1:request_id "07-F-0001"; e1:requester_name "Connolly, Ward" .

<http://logd.tw.rpi.edu/source/data-gov/dataset/1530/vocab/FOIA_Request> rdf:type rdfs:Class; .

conversion:domain_name can go on any property enhancement that is not bundled (see #Resource column bundling promotion parameter).

e.g., Dataset 1491

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "1491"; conversion:dataset_version "2009-May-18"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "1"; conversion:enhance [ ov:csvCol 1; conv:property_name "disaster_number"; conversion:domain_name "Disaster"; ]; ]; .

ds1491:thing_1 e1:disaster_number "1303"^^xsd:integer; e1:declaration_date "1999-09-24T17:45:00-04:00"^^xsd:dateTime; e1:title "HURRICANE FLOYD MAJOR DISASTER DECLARATIONS"; e1:incident_begin_date "1999-09-16T00:00:00-04:00"^^xsd:dateTime; .

'''becomes'''

ds1491:thing_1 rdf:type <http://logd.tw.rpi.edu/source/data-gov/dataset/1491/vocab/Disaster>; e1:disaster_number "1303"^^xsd:integer; e1:declaration_date "1999-09-24T17:45:00-04:00"^^xsd:dateTime; e1:title "HURRICANE FLOYD MAJOR DISASTER DECLARATIONS"; e1:incident_begin_date "1999-09-16T00:00:00-04:00"^^xsd:dateTime; .

<http://logd.tw.rpi.edu/source/data-gov/dataset/1491/vocab/Disaster> rdf:type rdfs:Class; .

Other datasets that benefit from this enhancement parameter include Dataset 1491, Dataset 1492, Dataset 1530, Dataset 1627.
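The class URIs in the two examples follow a visible pattern: the dataset's vocab namespace plus the domain_name with spaces replaced by underscores. A small Python sketch of that pattern, inferred from the examples only (the function name `class_uri` is hypothetical, and other characters in a domain_name may be handled differently by the converter):

```python
def class_uri(base_uri: str, source: str, dataset: str, domain_name: str) -> str:
    """Build the internal class URI implied by a conversion:domain_name."""
    local = domain_name.strip().replace(" ", "_")
    return f"{base_uri}/source/{source}/dataset/{dataset}/vocab/{local}"
```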

===== Superproperty enhancement =====

====== Referenced by URI ====== The parameter is the URI of one superproperty (e.g. my:predicate).

:col_1e1 rdfs:subPropertyOf my:predicate . :col_2e1 rdfs:subPropertyOf my:predicate .

Mint new properties every time and make them subproperties of the superproperty. E.g., not like Dataset_326#another.

Open question: whether to state all triples entailed by the superproperty or just the rdfs:subPropertyOf axiom.

:dataset a void:Dataset; conv:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conv:source_identifier "data-gov"; conv:dataset_identifier "0"; conv:dataset_version "2009-May-18"; conv:conversion_process [ conversion:enhance [ ov:csvCol 3; conv:property_name "origin_state"; conv:subproperty_of tw:state; ]; ]; .

(note: ov:csvCol currently needs to be specified, but shouldn't have to be.)

====== Referenced by Template ====== #Template variables may also be used to refer to the superproperty.

@prefix conversion: http://purl.org/twc/vocab/conversion/ . @prefix : http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/version/2010-Aug-09/params/enhancement/1/ .

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "nci-nih-gov"; conversion:dataset_identifier "tobacco-law-coverage"; conversion:dataset_version "2010-Aug-09"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "2"; conversion:subject_discriminator "table1-anrf-zt"; conversion:enhance [ ov:csvCol 8; ov:csvHeader "FIPS"; conversion:bundled_by [ ov:csvCol 2 ]; conversion:label "FIPS"; conversion:range todo:Literal; conversion:subproperty_of "[/]vocab/fips_code"; ];

@prefix e1: http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/vocab/enhancement/1/ .

value_of_region:Anchorage_AK e1:fips "02-0140" ; base_vocab:fips_code "02-0140" .

'''becomes'''

@prefix e2: http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/vocab/enhancement/2/ . @prefix base_vocab: http://logd.tw.rpi.edu/vocab/ .

value_of_region:Anchorage_AK e2:fips "02-0140" ; base_vocab:fips_code "02-0140" .
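The output above asserts the value under both the column's own property and the superproperty. A minimal Python sketch of that behaviour follows. Whether the converter also emits the rdfs:subPropertyOf axiom alongside the data triples is an assumption here, and the function name is hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def with_superproperty(subject: str, prop: str, value: str,
                       superprop: str) -> List[Triple]:
    """Emit the value under both properties, plus the subproperty axiom."""
    return [
        (subject, prop, value),
        (subject, superprop, value),               # materialized entailment
        (prop, "rdfs:subPropertyOf", superprop),   # assumed to be emitted too
    ]
```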

===== Subclass enhancement ===== The URIs for classes created during conversion are "internal" to the dataset's vocabulary. This avoids premature collisions with similar terms that have different meanings and representation intentions. A Subclass enhancement types instances to classes from more popular ontologies in addition to the default typing to an internal class.

The subclass enhancement can be applied to classes created by implicit bundles (see #Implicit column bundle resource promotion).

e.g., Dataset 1350

@prefix foaf: http://xmlns.com/foaf/0.1/ . @prefix wgs: http://www.w3.org/2003/01/geo/wgs84_pos# . @prefix conversion: http://purl.org/twc/vocab/conversion/ . @prefix : http://logd.tw.rpi.edu/source/data-gov/dataset/1350/version/2009-May-18/params/enhancement/1/ .

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "1350"; conversion:dataset_version "2009-May-18"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "1"; conversion:subject_discriminator "appe-app-e"; conversion:enhance [ ov:csvCol 1; ov:csvHeader "Licensee"; conversion:label "Licensee"; conversion:range rdfs:Resource; conversion:range_name "Licensee"; ]; conversion:enhance [ conversion:class_name "Licensee"; conversion:subclass_of foaf:Organization; ]; conversion:enhance [ ov:csvCol 2; ov:csvHeader "Location"; conversion:label "Location"; conversion:range rdfs:Resource; conversion:range_name "Location"; ]; conversion:enhance [ conversion:class_name "Location"; conversion:subclass_of wgs:SpatialThing; ];

@prefix raw: http://logd.tw.rpi.edu/source/data-gov/dataset/1350/vocab/raw/ . @prefix ds1350-appe-app-e: http://logd.tw.rpi.edu/source/data-gov/dataset/1350/appe-app-e/version/2009-May-18/ .

ds1350-appe-app-e:thing_1 raw:licensee "Aerotest" ; raw:location "San Ramon, CA" .

'''becomes'''

@prefix e1: http://logd.tw.rpi.edu/source/data-gov/dataset/1350/vocab/enhancement/1/ . @prefix ds1350_vocab: http://logd.tw.rpi.edu/source/data-gov/dataset/1350/vocab/ . @prefix value_of_licensee: http://logd.tw.rpi.edu/source/data-gov/dataset/1350/appe-app-e/value-of/licensee/ . @prefix value_of_location: http://logd.tw.rpi.edu/source/data-gov/dataset/1350/appe-app-e/value-of/location/ .

ds1350-appe-app-e:thing_1 e1:licensee typed_licensee:Aerotest ; e1:location typed_location:San_Ramon_CA .

typed_licensee:Aerotest a ds1350_vocab:Licensee, foaf:Organization ; rdfs:label "Aerotest" .

typed_location:San_Ramon_CA a ds1350_vocab:Location, wgs:SpatialThing ; rdfs:label "San Ramon, CA" .

The conversion:domain_name and conversion:range_name properties provide the local label names to create two classes: http://logd.tw.rpi.edu/source/data-gov/dataset/1554/vocab/FOIA_Request http://logd.tw.rpi.edu/source/data-gov/dataset/1554/vocab/Requester

The conversion:class_name property cites the local label name of the class that should be subclassed.

Classes in the internal dataset namespace are created when using the properties conv:domain_name (via #Subject type parameter), conv:range_name (via #Typed resource promotion parameter), and conv:type_name (via #Implicit column bundle resource promotion parameter).

Other datasets that benefit from this enhancement include Dataset 1530 (Requester->foaf:Person), and Dataset 1492 (Applicant->foaf:Agent).

====== Subclassing to SSS's vocabulary ====== While the previous section showed how to type instances to both internal and external vocabularies, the external classes were provided explicitly. However, the URI design of the instances and vocabularies allows an incremental integration of datasets from the same source, and integration of datasets across sources. In this case, templates are used to specify the class within conversion:base_uri value space. These vocabularies thus fall on a continuum from "internal" to "external" as more broadly-scoped vocabularies are created.

This avoids hard-coding the superclass URI, so that it stays relative to the internal namespace.

e.g. Dataset 1450:

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "1450"; conversion:dataset_version "2009-May-18"; conversion:conversion_process [ a conversion:RawConversionProcess; conversion:enhancement_identifier "1"; conversion:enhance [ ov:csvCol 1; ov:csvHeader "STATE";

      conversion:range rdfs:Resource;
      conversion:range_name "State";

      conversion:links_via <http://www.rpi.edu/~lebot/lod-links/state-fips-dbpedia.ttl>,
                           <http://www.rpi.edu/~lebot/lod-links/state-fips-geonames.ttl>,
                           <http://www.rpi.edu/~lebot/lod-links/state-fips-govtrack.ttl>;
      conversion:subject_of dcterms:identifier;
   ];
   conversion:enhance [
      conversion:class_name "State";
      conversion:subclass_of <http://geonames/FAKE/vocab/US_State>;
   ];
   conversion:enhance [
      conversion:class_name "State";
      conversion:subclass_of "<font color="#FF0000">[/s]vocab/State</font>";
   ];
   conversion:enhance [
      conversion:class_name "State";
      conversion:subclass_of "<font color="#FF0000">[/]vocab/State</font>";
   ];

'''becomes'''

@prefix ds1450_vocab: http://tw2.tw.rpi.edu/source/data-gov/dataset/1450/vocab/ .

typed_state:Alabama
   a ds1450_vocab:State ,                                  # This is in the dataset-specific ontology
     <http://logd.tw.rpi.edu/source/data-gov/vocab/State> , # This is in Tetherless World's ontology for "data-gov".
     <http://logd.tw.rpi.edu/vocab/State> ,                 # This is in Tetherless World's ontology
     <http://geonames/FAKE/vocab/US_State> ;                # This is in an external ontology
   rdfs:label "Alabama" .

For the enhancement parameters above, the variables in the template would evaluate to the following:

[/] - http://logd.tw.rpi.edu/

[/s] - http://logd.tw.rpi.edu/source/data-gov/

[/sd] - http://logd.tw.rpi.edu/source/data-gov/dataset/1450/

[/sdv] - http://logd.tw.rpi.edu/source/data-gov/dataset/1450/version/2009-May-18/
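The template evaluation can be sketched in Python as plain string substitution. This is an illustration, not the converter's implementation. The function names are hypothetical, and the sketch assumes that every variable, including [/], expands to a URI ending in a slash, which is what the expected output <http://logd.tw.rpi.edu/vocab/State> implies.

```python
from typing import Dict


def template_variables(base_uri: str, source: str,
                       dataset: str, version: str) -> Dict[str, str]:
    """Compute the four template variables from the essential parameters."""
    root = base_uri.rstrip("/") + "/"   # assumption: [/] ends with a slash
    s = f"{root}source/{source}/"
    sd = f"{s}dataset/{dataset}/"
    sdv = f"{sd}version/{version}/"
    return {"[/]": root, "[/s]": s, "[/sd]": sd, "[/sdv]": sdv}


def expand(template: str, variables: Dict[str, str]) -> str:
    """Replace each template variable with its URI prefix."""
    # Longest names first, as a precaution against partial matches.
    for name in sorted(variables, key=len, reverse=True):
        template = template.replace(name, variables[name])
    return template


VARS = template_variables("http://logd.tw.rpi.edu", "data-gov", "1450", "2009-May-18")
```

Expanding "[/s]vocab/State" and "[/]vocab/State" with `VARS` gives the source-level and base-level State classes from the example.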

(see #Patterns vs. Templates for a discussion on templates.)

==== Structure assistance enhancements ====

===== Interprets as null enhancement ===== Certain values are used to express that there is no value for a relationship. These can be ignored by setting the "interpret as null" enhancement parameter, so that the null values do not interfere with the actual values. Triples are not asserted for values that should be interpreted as null. The null value can be interpreted for all columns or for a specific column.

Note that this structure is also used in #Codebook Resource Promotion parameter, where it is attached to an individual enhancement rather than to the conversion process.

e.g., Dataset 1530

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "data-gov"; conversion:dataset_identifier "1530"; conversion:dataset_version "2009-May-18"; conversion:conversion_process [ conversion:interpret [ conversion:symbol "-", "- "; conversion:interpretation conversion:null; ]; ]; .

@prefix raw: http://logd.tw.rpi.edu/source/data-gov/dataset/1530/vocab/raw/ .

ds1530:thing_1 raw:organization "-" . ds1530:thing_2538 raw:closed_date "- " .

'''becomes'''

''no triple asserted''

Other datasets that benefit from this enhancement include Health Information National Trends Survey 2005 ("#NULL!"), Dataset 10030 (" - "), Dataset 1330 ("?? Total").

An interesting extension to this enhancement would be to add a pattern for what to interpret as null.

====== Column-specific null interpretations ====== The above example showed how to interpret a symbol as null for all columns. This behavior can be set for a specific column by moving the interpretation to a single enhancement.

:dataset a void:Dataset; conv:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conv:source_identifier "data-gov"; conv:dataset_identifier "1530"; conv:dataset_version "2009-May-18"; conv:conversion_process [ conv:enhance [ ov:csvCol 1; conv:interpret [ conv:symbol "-", "- "; conv:interpretation conv:null; ]; ]; ]; .
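The difference between process-wide and column-specific null symbols can be sketched in Python as follows. The function name and data shapes are hypothetical. Exact string matching is an assumption, suggested by "-" and "- " being listed as separate symbols.

```python
from typing import Dict, List, Optional, Set, Tuple


def non_null_triples(subject: str,
                     row: Dict[int, str],
                     global_nulls: Set[str],
                     column_nulls: Optional[Dict[int, Set[str]]] = None
                     ) -> List[Tuple[str, int, str]]:
    """Return (subject, column, value) for every cell that is not null.

    global_nulls apply to all columns; column_nulls only to the keyed column.
    """
    column_nulls = column_nulls or {}
    triples = []
    for column, value in row.items():
        if value in global_nulls or value in column_nulls.get(column, set()):
            continue  # no triple is asserted for a null value
        triples.append((subject, column, value))
    return triples
```

With `global_nulls={"-", "- "}` every column is filtered; with `column_nulls={1: {"-", "- "}}` only column 1 is.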

Other datasets that benefit from this enhancement include Dataset 1491.

===== Only if column enhancement ===== The processing of an entire row can be omitted using the Only if column enhancement type. When an enhancement is typed to this class, its column must contain a non-empty value for the row to be processed.

e.g. Dataset 10030

@prefix rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# . @prefix rdfs: http://www.w3.org/2000/01/rdf-schema# . @prefix xsd: http://www.w3.org/2001/XMLSchema# . @prefix void: http://rdfs.org/ns/void# . @prefix ov: http://open.vocab.org/terms/ . @prefix conversion: http://purl.org/twc/vocab/conversion/ . @prefix : http://logd.tw.rpi.edu/dfid-gov-uk/dataset/10030/params/enhancement/1/ .

:dataset a void:Dataset; conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI; conversion:source_identifier "dfid-gov-uk"; conversion:dataset_identifier "10030"; conversion:dataset_version "12-Jan-2010"; conversion:conversion_process [ conversion:enhancement_identifier "1"; conversion:headerRow 4; conversion:dataStartRow 8; conversion:dataEndRow 150; conversion:enhance [ ov:csvCol "1"^^xsd:integer; conversion:range rdfs:Resource; ]; conversion:enhance [ ov:csvCol "2"^^xsd:integer; a conversion:Only_if_column; conversion:range xsd:date; ]; ]; .
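The row filter implied by Only_if_column can be sketched in Python. The names are hypothetical, and treating whitespace-only cells as empty is an assumption, since the text only says "non-empty".

```python
from typing import Dict, Iterable, List

Row = Dict[int, str]  # column number -> cell value


def rows_to_process(rows: Iterable[Row], only_if_columns: Iterable[int]) -> List[Row]:
    """Keep a row only if every Only_if_column cell holds a non-empty value."""
    required = list(only_if_columns)
    return [
        row for row in rows
        if all(row.get(column, "").strip() for column in required)
    ]
```

For the parameters above, `only_if_columns` would be `[2]`, so a row whose second column is blank produces no triples at all.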

TODO: accept pattern for what should be accepted beyond non-empty.

===== Splitting value into multiple values ===== e.g. Dataset 1171

column 19, "ChangedBy", Jocelyn Rowe, jrowe@usaid.gov, 202-712-4002

e.g. keywords

e.g. Dataset 1340 has a field called STCNTY that is "A five digit number representing the state and county in which the institution is physically located. The first two digits represent the FIPS state numeric code and the last three digits represent the FIPS county numeric code." A way for the program to parse fields that were joined together like that in the CSV file might be helpful.
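For the STCNTY case, the split itself is a fixed-width cut. A minimal Python sketch follows, assuming the value is always exactly five digits; `split_stcnty` is a hypothetical helper, not part of csv2rdf4lod, and the sample value is only illustrative.

```python
from typing import Tuple


def split_stcnty(stcnty: str) -> Tuple[str, str]:
    """Split a five-digit STCNTY value into (FIPS state code, FIPS county code)."""
    if len(stcnty) != 5 or not stcnty.isdigit():
        raise ValueError(f"expected five digits, got {stcnty!r}")
    # Codes stay strings so that leading zeros are preserved.
    return stcnty[:2], stcnty[2:]


state, county = split_stcnty("36083")
print(state, county)  # 36 083
```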

===== Omitting columns =====

@prefix conversion: <http://purl.org/twc/vocab/conversion/> .
@prefix :           <http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/version/2010-Aug-09/params/enhancement/1/> .

:dataset a void:Dataset;
   conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
   conversion:source_identifier  "nci-nih-gov";
   conversion:dataset_identifier "tobacco-law-coverage";
   conversion:dataset_version    "2010-Aug-09";
   conversion:conversion_process [
      a conversion:RawConversionProcess;
      conversion:enhancement_identifier "1";
      conversion:subject_discriminator  "table1-anrf-zt";
      conversion:enhance [
         ov:csvCol          7;
         ov:csvHeader       "WRB";
         a conversion:Omitted;
         conversion:label   "WRB";
         conversion:comment "Work Restaurants Bars";
         conversion:range   rdfs:Literal;
      ];

@prefix tobacco-law-coverage-table1-anrf-zt: <http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/table1-anrf-zt/version/2010-Aug-09/> .
@prefix raw: <http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/vocab/raw/> .

tobacco-law-coverage-table1-anrf-zt:thing_2
   raw:work    "Yes" ;
   raw:restaur "Yes" ;
   raw:bars    "Yes" ;
   raw:wrb     "YesYesYes" ;

'''becomes'''

@prefix e1: <http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/vocab/enhancement/1/> .

tobacco-law-coverage-table1-anrf-zt:thing_2
   e1:work    "Yes" ;
   e1:restaur "Yes" ;
   e1:bars    "Yes" ;

===== Example Resource =====

conversion:enhance [
   ov:csvRow 3;
   a conversion:ExampleResource;
];

==== Human navigation enhancement ====
Although navigating linked data is useful for RDF crawlers, it is not ideal for human navigation. Linked data browsers exist, but they leave users isolated from communities that share an interest in the data. The Human navigation enhancement provides a pointer to a human-centric web site that allows communities to develop around the topics in the linked data produced by the conversion.

Both row/cells and the promoted resources obtain this additional description. (Note: the current csv2rdf4lod implementation only adds to row/cell).

e.g. Dataset 311:

:dataset a void:Dataset;
   conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
   conversion:source_identifier  "data-gov";
   conversion:dataset_identifier "311";
   conversion:dataset_version    "2009-May-18";
   conversion:conversion_process [
      a conversion:RawConversionProcess;
      conversion:enhancement_identifier "1";
      conversion:enhance [
         conversion:human_redirect "[/]retrieveRDF?uri=";
         # http://logd.tw.rpi.edu/retrieveRDF?uri=http://tw2.tw.rpi.edu/source/nci-nih-gov/file/state-tobacco-tax/typed/state/California

         # This allows pointing to a site outside of base_uri:
         #conversion:human_redirect "http://someother.tw.rpi.edu/retrieveRDF?uri=";
      ];

@prefix ds311: <http://tw2.tw.rpi.edu/source/data-gov/dataset/311/version/2009-May-18/> .

ds311:thing_5 dcterms:isReferencedBy <http://tw2.tw.rpi.edu/source/data-gov/dataset/311/version/2009-May-18> .

'''becomes'''

ds311:thing_5 dcterms:isReferencedBy <http://tw2.tw.rpi.edu/source/data-gov/dataset/311/version/2009-May-18> .
<http://tw2.tw.rpi.edu/retrieveRDF?uri=http://tw2.tw.rpi.edu/source/data-gov/dataset/311/version/2009-May-18/thing_5> foaf:primaryTopic ds311:thing_5 .
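How the redirect URI in the output could be assembled from the template is sketched below in Python. It assumes, based only on the example above, that "[/]" expands to the base URI followed by a slash and that the resource URI is appended unescaped; `human_redirect_uri` is a hypothetical name, and the sketch uses the tw2 host that appears in the example output.

```python
def human_redirect_uri(template: str, base_uri: str, resource_uri: str) -> str:
    """Expand a human_redirect template and append the resource URI.

    Assumption: "[/]" stands for the base URI plus a trailing slash.
    A template without "[/]" is taken as an absolute prefix.
    """
    prefix = template.replace("[/]", base_uri.rstrip("/") + "/")
    return prefix + resource_uri


thing_5 = "http://tw2.tw.rpi.edu/source/data-gov/dataset/311/version/2009-May-18/thing_5"

local = human_redirect_uri("[/]retrieveRDF?uri=", "http://tw2.tw.rpi.edu", thing_5)
remote = human_redirect_uri(
    "http://someother.tw.rpi.edu/retrieveRDF?uri=", "http://tw2.tw.rpi.edu", thing_5
)
print(local)
print(remote)
```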

=== Aggregate parameters ===
When a dataset contains multiple CSV files that share the same parameters (e.g., datatype ranges for columns), it would be convenient to have each parameter file point to a central parameter file.

e.g. Dataset 1330 (not really, the two tabs are just reorderings)

=== Cell-based conversion ===
See Row-based vs Cell-based csv2rdf4lod conversions.

cell-ify-eparams.awk can help automate the enhancement parameter creation.

The following parameters request a cell-based, as opposed to row-based, conversion:

@prefix scovo:      <http://purl.org/NET/scovo#> .
@prefix conversion: <http://purl.org/twc/vocab/conversion/> .
@prefix :           <http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/state-tobacco-tax/version/2001-Jan-01/params/enhancement/1/> .

:dataset a void:Dataset;
   conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
   conversion:source_identifier  "nci-nih-gov";
   conversion:dataset_identifier "state-tobacco-tax";
   conversion:dataset_version    "2010-Mar-29";
   conversion:conversion_process [
      a conversion:RawConversionProcess;
      conversion:enhancement_identifier "1";
      conversion:enhance [
         ov:csvRow 2;
         a conversion:HeaderRow;
      ];
      conversion:enhance [
         ov:csvRow 53;
         a conversion:DataEndRow;
      ];
      conversion:enhance [
         ov:csvCol        1;
         ov:csvHeader     "";
         conversion:label "State Order";
         conversion:range xsd:integer;
         conversion:bundled_by [ ov:csvCol 2 ];
      ];
      conversion:enhance [
         ov:csvCol        2;
         ov:csvHeader     "";
         conversion:label "State";
         conversion:range rdfs:Resource;
         conversion:range_name "State";
         conversion:links_via <http://www.rpi.edu/~lebot/lod-links/state-fips-dbpedia.ttl>,
                              <http://www.rpi.edu/~lebot/lod-links/state-fips-geonames.ttl>,
                              <http://www.rpi.edu/~lebot/lod-links/state-fips-govtrack.ttl>;
         conversion:subject_of dcterms:identifier;
         conversion:domain_name "Annual tax average";
      ];
      conversion:enhance [
         ov:csvCol         3;
         ov:csvHeader      "2000";
         a scovo:Item;
         conversion:label  "Year";             # Property from cell URI to "2000"^^xsd:gYear
         conversion:object "2000"^^xsd:gYear;
         conversion:range  xsd:decimal;        # Range of property "out of page"
      ];

@prefix state-tobacco-tax_vocab: <http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/state-tobacco-tax/vocab/> .
@prefix raw: <http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/state-tobacco-tax/vocab/raw/> .

state-tobacco-tax:thing_3
   raw:column_1 "1" ;
   raw:column_2 "Alabama" ;
   <http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/state-tobacco-tax/vocab/raw/2000> "16.5¢" ;
   ov:csvRow 3 .

'''becomes'''

@prefix e1: <http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/state-tobacco-tax/vocab/enhancement/1/> .

state-tobacco-tax:annual_tax_average_3_3
   a state-tobacco-tax_vocab:Annual_tax_average ;
   e1:state  typed_state:Alabama ;
   e1:year   "2000"^^xsd:gYear ;
   rdf:value "16.5"^^xsd:decimal ; # TODO: should be in e1.
   ov:csvRow "3"^^xsd:integer ;
   ov:csvCol "3"^^xsd:integer .

  • If the conversion:object predicate is omitted, the object will be a Resource named using the original column header (e.g. hhs chsi).
    • This misses using the header as a literal automatically, but one should not go out of one's way to keep something a literal, especially something important enough to be listed in the header.
  • A conversion:object value of "[/sd]/value-of/[@]/[.]" will omit the subject discriminator when naming the Resource.
  • The conversion:object can be a template, e.g. conversion:object "[/sd]typed/council/[H]"; will type-promote the header outside of the subject discriminator.
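The shift from one subject per row to one subject per cell can be sketched in Python. This is an illustration only, not the converter's logic: `cellify` and its parameters are hypothetical, the local name pattern (domain name, row, column) is inferred from the annual_tax_average_3_3 example above, and cell values are left as raw strings rather than cast to the declared range.

```python
from typing import Dict, List


def cellify(
    row_number: int,
    row: List[str],
    headers: List[str],
    cell_columns: List[int],
    context_columns: Dict[str, int],
    domain_name: str = "thing",
) -> List[Dict[str, object]]:
    """Describe each cell in `cell_columns` (1-based) as its own record."""
    local = domain_name.lower().replace(" ", "_")
    cells = []
    for col in cell_columns:
        cell: Dict[str, object] = {
            "name": f"{local}_{row_number}_{col}",
            "csvRow": row_number,
            "csvCol": col,
            "header": headers[col - 1],
            "value": row[col - 1],
        }
        # Columns that are not cell-based describe every cell in the row.
        for label, context_col in context_columns.items():
            cell[label] = row[context_col - 1]
        cells.append(cell)
    return cells


headers = ["", "", "2000"]
row_3 = ["1", "Alabama", "16.5¢"]
cells = cellify(3, row_3, headers, cell_columns=[3],
                context_columns={"state": 2}, domain_name="Annual tax average")
print(cells[0]["name"])  # annual_tax_average_3_3
```

With further year columns listed in `cell_columns`, the same row would yield one record per year.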

Candidates for cell-based conversion: Dataset 1612, Dataset 10030, Dataset 1554, Dataset 401, Dataset 402

(SEC company financial reports - http://viewerprototype1.com/viewer choose a company, and "export to Excel")

==== Multi-dimensional ====

e.g. Dataset 1612

@prefix conversion: <http://purl.org/twc/vocab/conversion/> .
@prefix :           <http://logd.tw.rpi.edu/source/data-gov/dataset/1612/version/2009-May-18/params/enhancement/1/> .

:dataset a void:Dataset;
   conversion:base_uri           "http://logd.tw.rpi.edu"^^xsd:anyURI;
   conversion:source_identifier  "data-gov";
   conversion:dataset_identifier "1612";
   conversion:dataset_version    "2009-May-18";
   conversion:conversion_process [
      a conversion:RawConversionProcess;
      conversion:enhancement_identifier "2";
      conversion:subject_discriminator  "air_force";
      conversion:enhance [
         ov:csvRow 9;
         a conversion:HeaderRow
      ];
      conversion:enhance [
         ov:csvCol         3;
         ov:csvHeader      "single male without children";
         conversion:label  "Gender";
         a scovo:Item;
         conversion:object "Male" ;
         conversion:range  xsd:nonNegativeInteger;
      ];
      conversion:enhance [
         ov:csvCol            3;
         a scovo:Item;
         conversion:predicate "Married";
         conversion:object    false ;
      ];
      conversion:enhance [
         ov:csvCol            3;
         a scovo:Item;
         conversion:predicate "Has Children";
         conversion:object    false ;
      ];

@prefix raw:              <http://logd.tw.rpi.edu/source/data-gov/dataset/1612/vocab/raw/> .
@prefix ds1612-air_force: <http://logd.tw.rpi.edu/source/data-gov/dataset/1612/air_force/version/2009-May-18/> .

ds1612-air_force:thing_10 raw:column_3 "8,127" ; ov:csvRow 10 .

'''becomes'''

ds1612-air_force:thing_10_3
   e2:gender       "Male" ;
   e2:has_children "false"^^xsd:boolean ;
   e2:married      "false"^^xsd:boolean ;
   rdf:value       "8127"^^xsd:integer ;
   ov:csvRow       10 ;
   ov:csvCol       3 ;
   ov:subjectDiscriminator <http://logd.tw.rpi.edu/source/data-gov/dataset/1612/discriminator/air_force> .
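The example above attaches several enhancements to the same column, each contributing one dimension of the cell. A hedged Python sketch of that combination follows; `describe_cell` and the dictionary layout are hypothetical, and stripping the thousands separator is an assumption about how "8,127" becomes 8127.

```python
from typing import Dict, List


def describe_cell(raw_value: str, enhancements: List[Dict[str, object]]) -> Dict[str, object]:
    """Merge the dimensions declared for one column into a single cell description."""
    description: Dict[str, object] = {}
    for enhancement in enhancements:
        description[str(enhancement["predicate"])] = enhancement["object"]
    # Assumption: integer values are obtained by dropping thousands separators.
    description["value"] = int(raw_value.replace(",", ""))
    return description


# The three enhancements declared for column 3 in the parameters above.
column_3 = [
    {"predicate": "gender", "object": "Male"},
    {"predicate": "married", "object": False},
    {"predicate": "has_children", "object": False},
]
cell = describe_cell("8,127", column_3)
print(cell["value"])  # 8127
```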

=== Enhancements to consider adding ===

  • type the promoted resource of column 1 to the local name mentioned in column 2.

  • omit a column entirely (Dataset 1612 column 1)

    conversion:enhance [
       ov:csvCol        1;
       conversion:range rdfs:Literal;
       a conversion:Omitted;
    ];
    
  • column-specific interpretations (is this already handled?):

    conversion:interpret [
       ov:csvCol                 1;
       conversion:symbol         "";
       conversion:interpretation conversion:null;
    ];
