Enhancement Parameters Reference
PLEASE NOTE: Although this page worked well within a MediaWiki wiki, it does NOT behave well here on GitHub. This page is therefore transitioning to Suggested ordering of predicates on a conversion:Enhancement. Please use that new page and refer to this old page only for historical purposes.
The four essential parameters are used to establish the namespace for the entities named during conversion. All four are required even for the most basic conversion. For a discussion of the "3-part" naming scheme used, see csv2rdf4lod's 3 Part Paradigm for Naming Datasets: source, dataset, and version.
@prefix conversion: <http://purl.org/twc/vocab/conversion/> .
@prefix : <http://logd.tw.rpi.edu/source/data-gov/dataset/1554/version/2010-Feb-13/params/enhancement/1/> .
:dataset a void:Dataset;
conversion:<font color="#FF0000">base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI</font>;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1554";
conversion:dataset_version "2010-Feb-13";
.
e.g. data-gov, or recovery-gov
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:<font color="#FF0000">source_identifier "data-gov"</font>;
conversion:dataset_identifier "1554";
conversion:dataset_version "2010-Feb-13";
.
e.g. 1623 from data.gov
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:<font color="#FF0000">dataset_identifier "1554"</font>;
conversion:dataset_version "2010-Feb-13";
.
e.g. "2010-Feb-13"
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1554";
conversion:<font color="#FF0000">version_identifier "2010-Feb-13"</font>;
.
@DEPRECATED: conversion:dataset_version; use conversion:version_identifier. status: both still being asserted, but conversion:dataset_version will go away eventually.
The property to enhance can be cited using either the ov:csvCol or conversion:property_name properties. The ov:csvHeader is an owl:AnnotationProperty and should only be used as an editing aid.
e.g. Dataset 1147 with existing enhancement parameters file.
conversion:enhance [
ov:csvCol 1;
ov:csvHeader "blah blah"
];
e.g. Dataset 1146 that resembles Dataset 1147, but with columns swapped. Want to just refer to the property by the resulting local name, not by the csvCol.
conversion:enhance [
ov:csvHeader "blah blah";
conv:property_name "blah_blah";
];
TODO: if the csvHeader exists in the param file and does not match, report a message.
URL to local file or web
Files could be in any of the following encodings (98% coverage):
- UTF-8
- ISO-8859-1 (aka Latin-1)
- ISO-8859-9
- Windows Latin-1
In the end, everything is converted to Unicode.
e.g., Dataset 1627
<C2><92>
apostrophes are the common culprit
e.g., Dataset 1530
<93>blind<94>
e.g., Dataset 1450
raw:organization_name "Medicare y Mucho M·s" com.hp.hpl.jena.shared.JenaException: com.hp.hpl.jena.riot.ParseException: [Line:24239,Col:49] Unknown char: ?(183)
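The encoding candidates listed above can be tried in order until one decodes cleanly. The following is a minimal Python sketch of that idea, not the converter's actual implementation: the function name decode_to_unicode and the fallback order are assumptions. ISO-8859-1 is tried last because it accepts every byte sequence, and ISO-8859-9 is left out of the automatic order as a simplification of this sketch.

```python
# Candidate encodings, most restrictive first. "cp1252" is Python's name
# for Windows Latin-1. The order is an assumption of this sketch.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "iso-8859-1")


def decode_to_unicode(raw, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding_name) for the first encoding that decodes raw."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of the candidate encodings matched")


# The curly quotes around "blind" are bytes 0x93 and 0x94 in Windows Latin-1.
text, used = decode_to_unicode(b"\x93blind\x94")
```

Because the second return value names the encoding that matched, a caller can log it when a file turns out not to be UTF-8.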
Some CSVs include #Top matter such as titles and summaries at the top. A naive conversion would attempt to produce data triples out of these non-data CSV rows. If the column headers appear on a later line, then the conversion tool needs to know to avoid producing invalid data triples.
For example, Dataset 1623 offers an Excel spreadsheet that can be converted to csv. The header starts on line 7 and the data starts on line 8: (line numbers added)
[1] Office of Medicare Hearings and Appeals (OMHA),,,,,,
[2] Claims Listed by State,,,,,,
[3] "As of January 7, 2010",,,,,,
[4]
[5]
[6] "Table 1. List of Total Claims Received by Region, State, and Fiscal Year",,,,,,
[7] Region,State,Fiscal Year 06,Fiscal Year 07,Fiscal Year 08,Fiscal Year 09,Total
[8] Mid-Atlantic,District of Columbia,12,289,342,376,"1,019"
The structure parameters for Dataset 1623 would look like the following.
@prefix conversion: <http://logd.tw.rpi.edu/vocab/conversion/> .
@prefix : <http://logd.tw.rpi.edu/source/data-gov/dataset/1623/version/2009-May-18/params/enhancement/1/> .
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1623";
conversion:dataset_version "2009-May-18";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "1";
<font color="#FF0000">conversion:enhance [
ov:csvRow 7;
a conversion:HeaderRow;
];</font>
An opposite extreme to including top matter in a CSV is to exclude a header row. If there is no header row, then the HeaderRow should be explicitly set to 0 (file line counting starts at 1) to avoid interpreting the first data row as a header.
Other datasets that benefit from this conversion parameter include Dataset 1590, Dataset 1572, and Dataset 1574.
The data start row defaults to conversion:HeaderRow + 1 if not specified.
e.g., Dataset 1612
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1612";
conversion:dataset_version "2009-May-18";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "1";
conversion:subject_discriminator "ActiveDuty_MaritalStatus-total";
conversion:enhance [ # All row and column numbers are one-based.
ov:csvRow <font color="#FF0000">9</font>;
a conversion:<font color="#FF0000">HeaderRow</font>;
];
conversion:enhance [ # This is not necessary since 10 = 9 + 1, but can be used to specify a different row.
ov:csvRow <font color="#FF0000">10</font>;
a conversion:<font color="#FF0000">DataStartRow</font>;
];
conversion:enhance [ # Both DataStartRow and DataEndRow are inclusive.
ov:csvRow <font color="#FF0000">37</font>;
a conversion:<font color="#FF0000">DataEndRow</font>;
];
conversion:enhance [
ov:csvCol 1;
conversion:range rdfs:Literal;
];
conversion:enhance [
ov:csvCol 2;
rdfs:label "Pay Grade";
conversion:label "pay grade";
conversion:range rdfs:Resource;
];
All rows are processed and converted to triples unless this parameter is set.
The #Only if column enhancement parameter can also be used to ensure that triples are produced only from data rows (and not from top matter like titles or bottom matter like footnotes).
Avoids attempting to interpret #Bottom matter as a data row.
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1322";
conversion:dataset_version "2009-May-18";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "1";
<font color="#FF0000">conversion:enhance [
ov:csvRow 61;
a conversion:DataEndRow;
];</font>
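The interaction of the three row parameters (HeaderRow, DataStartRow, DataEndRow) can be sketched in Python as follows. This is only an illustration of the rules stated on this page (one-based numbering, inclusive bounds, data start defaulting to the header row plus one, and a header row of 0 meaning "no header"). The function name data_rows and the default header_row=1 are assumptions, not part of the converter.

```python
def data_rows(lines, header_row=1, data_start_row=None, data_end_row=None):
    """Return (header, rows) from a list of file lines.

    Row numbers are one-based; data_start_row and data_end_row are inclusive.
    header_row=0 means the file has no header row.
    """
    if data_start_row is None:
        data_start_row = header_row + 1      # default: the row after the header
    if data_end_row is None:
        data_end_row = len(lines)            # default: process through the last row
    header = lines[header_row - 1] if header_row > 0 else None
    return header, lines[data_start_row - 1:data_end_row]


sample = ["Title", "", "Region,State", "Mid-Atlantic,DC", "South,VA", "Footnote"]
header, rows = data_rows(sample, header_row=3, data_end_row=5)
```

Here the title lines and the footnote are skipped, leaving only the two data rows.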
see conversion:Repeat_previous_if_empty_column
Some more structural assistance parameters are listed in the enhancements section: #Structure assistance enhancements.
By default, RDF predicates are created using the column titles in the CSV's header. If two columns have identical titles, distinct predicates are ensured by appending "_2" to the first duplicate, "_3" to the second duplicate, and so on. If, however, the header is missing or was missed during parsing, a substitute can be provided using the conversion:label enhancement. The conversion:label enhancement can also be used in cases where the header is unusually long and a more concise predicate is desirable. The label may contain a first capital and spaces just as one would use in rdfs:label. The conversion utility will lower case the entire string and replace spaces with underscores.
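The naming rules in the paragraph above can be sketched in Python as follows. This is an illustration only, not the converter's code: the function names are hypothetical, and dropping punctuation such as "(*)" is an assumption inferred from the Dataset 1450 output shown on this page rather than a documented rule.

```python
import re


def local_name(label):
    """Lower-case the label and replace spaces with underscores.

    Removing other punctuation is an assumption of this sketch.
    """
    cleaned = re.sub(r"[^a-z0-9_ ]", "", label.lower())
    return "_".join(cleaned.split())


def predicate_names(headers):
    """Give duplicate headers distinct names by appending _2, _3, and so on."""
    seen = {}
    names = []
    for header in headers:
        base = local_name(header)
        seen[base] = seen.get(base, 0) + 1
        names.append(base if seen[base] == 1 else f"{base}_{seen[base]}")
    return names


names = predicate_names(["State", "Offers only in this state", "State"])
```

The first duplicate of "State" receives the suffix "_2", matching the rule described above.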
e.g. Dataset 1450
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1450";
conversion:dataset_version "18-May-2009";
conversion:conversion_process [
<font color="#777777">a conversion:EnhancementProcess;</font>
conversion:enhancement_identifier "1";
conversion:enhance [
ov:csvCol 2;
ov:csvHeader "STAR (*) INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE";
conversion:<font color="#FF0000">label "Offers only in this state"</font>;
conversion:range rdfs:Literal;
];
];
.
raw:star_indicates_that_organization_only_offers_employer_plans_in_this_state
ov:csvCol "2"^^xsd:integer ;
ov:csvHeader "STAR (*) INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE" ;
rdfs:label "STAR * INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE" ;
rdfs:range rdfs:Literal .
<font color="#0000FF">ds1450:thing_2</font>
raw:state "Alabama" ;
raw:star_indicates_that_organization_only_offers_employer_plans_in_this_state "*" ;
'''becomes'''
@prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/1450/vocab/enhancement/1/> .
e1:<font color="#FF0000">offers_only_in_this_state</font>
ov:csvCol "2"^^xsd:integer ;
ov:csvHeader "STAR (*) INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE" ;
rdfs:label "<font color="#FF0000">Offers only in this state</font>" ;
rdfs:range rdfs:Literal .
<font color="#0000FF">ds1450:thing_2</font>
e1:state "Alabama" ;
e1:offers_only_in_this_state "*" .
(This property would also benefit from the xsd:boolean cast enhancement to make the "*" a "true"^^xsd:boolean, a #Typed resource promotion enhancement to make "Alabama" a URI, and an ObjectSameAsEnhancement to link :Alabama to DBpedia, Geonames, and GovTrack's URI for Alabama.)
This would affect the name of the predicate URI created from the column label.
Other datasets that benefit from this enhancement include Dataset 8, Dataset 9, Dataset 10, Dataset 32, Dataset 33, Dataset 59, Dataset 90, Dataset 311, Dataset 401, Dataset 402, Dataset 403, Dataset 1000, Dataset 1133, Dataset 1171, Dataset 1322, Dataset 1330, Dataset 1350, Dataset 1359, Dataset 1374, Dataset 1450, Dataset 1491, Dataset 1492, Dataset 1530, Dataset 1564, Dataset 1571, Dataset 1612, Dataset 1623, Dataset 1627, Dataset 1930, Dataset 1961.
In the previous example for conversion:label, we renamed the long header to create a more concise predicate. However, we lost a bit of meaning for how the values should be interpreted. That long description should be preserved as an rdfs:comment, and conversion:comment will do just that.
e.g. Dataset 1450
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1450";
conversion:dataset_version "18-May-2009";
conversion:conversion_process [
<font color="#777777">a conversion:EnhancementProcess;</font>
conversion:enhancement_identifier "2";
conversion:enhance [
ov:csvCol 2;
ov:csvHeader "STAR (*) INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE";
conversion:<font color="#FF0000">comment "STAR (*) INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE"</font>;
conversion:label "Offers only in this state";
conversion:range rdfs:Literal;
];
];
.
e1:offers_only_in_this_state
ov:csvCol "2"^^xsd:integer ;
ov:csvHeader "STAR (*) INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE" ;
rdfs:label "Offers only in this state" ;
rdfs:range rdfs:Literal .
'''becomes'''
@prefix e2: <http://logd.tw.rpi.edu/source/data-gov/dataset/1450/vocab/enhancement/2/> .
e2:offers_only_in_this_state
ov:csvCol "2"^^xsd:integer ;
ov:csvHeader "STAR (*) INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE" ;
rdfs:label "Offers only in this state" ;
rdfs:range rdfs:Literal ;
<font color="#FF0000">rdfs:comment "STAR (*) INDICATES THAT ORGANIZATION ONLY OFFERS EMPLOYER PLANS IN THIS STATE"</font> .
When observing the enhancement parameters, it is useful to see a sample value. conversion:eg is an owl:AnnotationProperty that provides sample values from the column.
:dataset a void:Dataset;
conversion:base_uri "http://data-gov.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "whitehouse-gov";
conversion:dataset_identifier "visitor-records";
conversion:dataset_version "0510";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "1";
conversion:enhance [
ov:csvCol 13;
ov:csvHeader "APPT_END_DATE";
conversion:comment "Date and time for which the appointment was scheduled to end";
<font color="#FF0000">conversion:eg</font> "9/28/200911:59:00PM";
conversion:datetime_pattern "M/d/yy HH:mm", "M/d/yyyyhh:mm:ssaa";
conversion:range xsd:dateTime;
conversion:datetime_timezone -300;
];
TODO
If a property is designated as a primary key, the URI of the subject is changed to incorporate the property name and its value.
e.g. Dataset 1530
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1530";
conversion:dataset_version "2009-May-18";
conversion:conversion_process [
<font color="#777777">a conversion:EnhancementProcess;</font>
conversion:enhancement_identifier "1";
conversion:enhance [
ov:csvCol 1;
ov:csvHeader "Request ID";
conversion:label "Request ID";
conversion:range rdfs:Literal;
<font color="#FF0000">a conversion:PrimaryKeyEnhancement</font>;
];
<http://logd.tw.rpi.edu/source/data-gov/dataset/1530/version/2009-May-18/<font color="#0000FF">thing_1</font>>
raw:request_id "07-F-0001";
raw:requester_name "Connolly, Ward" .
'''becomes'''
@prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/vocab/enhancement/1/> .
<http://logd.tw.rpi.edu/source/data-gov/dataset/1530/version/2009-May-18/<font color="#FF0000">request_id/07-F-0001</font>>
e1:request_id "07-F-0001";
e1:requester_name "Connolly, Ward" .
Other datasets that benefit from this enhancement include Dataset 32, Dataset 1627, and Dataset 1530.
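The effect of a primary key on the subject URI can be sketched in Python as follows. This is a hedged illustration of the before/after example for Dataset 1530, not the converter's implementation. The function name subject_uri is hypothetical, and replacing spaces with underscores in the key value is an assumption based on the George_Washington example elsewhere on this page.

```python
def subject_uri(version_namespace, row_index, key_property=None, key_value=None):
    """Build a subject URI within the dataset version's namespace.

    Without a primary key the row gets a generic thing_N name; with one,
    the property local name and the cell value are incorporated instead.
    """
    if key_property is None:
        return f"{version_namespace}thing_{row_index}"
    # Assumption: spaces in the key value become underscores.
    return f"{version_namespace}{key_property}/{key_value.replace(' ', '_')}"


ns = "http://logd.tw.rpi.edu/source/data-gov/dataset/1530/version/2009-May-18/"
default_uri = subject_uri(ns, 1)
keyed_uri = subject_uri(ns, 1, "request_id", "07-F-0001")
```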
See conversion:domain_template
If a dataset has multiple CSVs, converting each will result in the same names for different rows from each file. This can be avoided by tucking in an extra level to the #dataset identifier, but it then becomes impossible to query for all rows that came from a particular file.
e.g., Dataset 1350 has appe.xls with two tabs "APP A" and "App e" that can be exported to CSV.
conv:dataset_identifier "1350/app-a"; ... conv:dataset_identifier "1350/app-e";
Other datasets that benefit from this enhancement include Dataset 326, Dataset 1612, and Dataset 10030.
see also #Annotation triple parameter.
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "dfid-gov-uk";
conversion:dataset_identifier "sid-2009";
conversion:dataset_version "2009-Nov-10";
conversion:conversion_process [
<font color="#777777">a conversion:EnhancementProcess;</font>
<font color="#FF0000">conversion:enhancement_identifier "1"</font>;
conversion:enhance [
ov:csvCol 1;
ov:csvHeader "";
conversion:label "Country";
];
];
.
<font color="#0000FF">sid-2009:thing_1</font>
raw:column_1 "Algeria" .
'''becomes'''
@prefix <font color="#FF0000">e1</font>: <http://logd.tw.rpi.edu/source/dfid-gov-uk/dataset/sid-2009/vocab/<font color="#FF0000">enhancement/1/</font>> .
<font color="#0000FF">sid-2009:thing_1</font>
e1:country "Algeria" .
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "dfid-gov-uk";
conversion:dataset_identifier "sid-2009";
conversion:dataset_version "2009-Nov-10";
conversion:conversion_process [
<font color="#777777">a conversion:EnhancementProcess;</font>
conversion:enhancement_identifier "1";
conversion:subject_discriminator "america";
conversion:author <http://logd.tw.rpi.edu/wiki/Special:URIResolver/Tim_Lebo>;
In this conversion vocabulary, patterns are specified to guide the parsing of an original input value, while templates are used to construct literals and URIs to assert in the resulting RDF.
The following conversion vocabulary predicates specify patterns:
The following conversion vocabulary predicates can specify templates:
- conversion:domain_template
- conversion:range_template
- conversion:subclass_of
- conversion:subproperty_of
- conversion:human_redirect
- conversion:range_name (only a local label may be specified, use conversion:subclass_of to link to the external class)
Templates may specify variables that will be populated with values relevant to the input and conversion parameters.
Three other variables can be used in the template:
[e] - the dataset's enhancement_identifier
[r] - the row of this value
[c] - the column of this value
Four namespace variables can be used in the template:
[/] - the value of conversion:base_uri.
[/s] - [/] with "source/" and the value of conversion:source_identifier appended.
[/sd] - [/s] with "dataset/" and the value of conversion:dataset_identifier appended.
[/sdv] - [/sd] with "version/" and the value of conversion:version_identifier appended.
[@] - The local name of the property created for the current column. (NOTE: not implemented)
[T] - The conversion:range_name of the property created for the current column. (NOTE: not implemented)
[.] - The value of the cell.
The values of a row's columns can be referenced using either the column index or the local name of the property created for the column. When referencing the column index, a '#' precedes the integer. When referencing the property's local name, an '@' precedes the local name. For example, the following references the value in the first column:
[#1] - the value of the cell in column 1.
And the following references the value of the column with header "Property local name":
* NOTE: if multiple columns become named the same property, this will be more than one value.
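The template variables described above can be illustrated with a small Python sketch. It is not the converter's implementation: the function names are hypothetical, only the namespace variables, [e], [r], [c], [.], and [#N] are handled, and the assumption that each namespace variable ends with a trailing slash is inferred from how the templates on this page concatenate path segments.

```python
import re


def namespace_variables(base_uri, source_id, dataset_id, version_id):
    """Build the four namespace variables; each is assumed to end with '/'."""
    root = base_uri.rstrip("/") + "/"
    s = f"{root}source/{source_id}/"
    sd = f"{s}dataset/{dataset_id}/"
    sdv = f"{sd}version/{version_id}/"
    return {"/": root, "/s": s, "/sd": sd, "/sdv": sdv}


def expand_template(template, variables, row_values):
    """Replace each [name] in the template; [#N] reads one-based column N."""
    def substitute(match):
        name = match.group(1)
        if name.startswith("#"):
            return row_values[int(name[1:]) - 1]
        return variables[name]  # raises KeyError for unsupported variables
    return re.sub(r"\[([^\]]+)\]", substitute, template)


variables = namespace_variables("http://logd.tw.rpi.edu", "data-gov", "8", "2010-May-19")
variables.update({"e": "1", "r": "2", "c": "1", ".": "ANL146"})
uri = expand_template("[/sd]value-of/site_id/[.]", variables, ["ANL146"])
```

Raising an error for an unknown variable, rather than leaving it in place, is a design choice of this sketch that makes misspelled variable names visible early.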
In #Example Input 1, the first column names the president being described, but his URI becomes:
http://logd.tw.rpi.edu/data-gov/dataset/1627/version/02-05-2010/thing_1
Although URIs are to be "opaque" and "thing_1" is "just as good" as "George_Washington", developers are still human and could use a break.
When enhancing CSV:
conv:enhance [
ov:csvCol 1;
ov:csvHeader "Name";
<font color="#777777">conv:range rdfs:Resource;</font>
<font color="#FF0000">a conv:Primary_key</font>;
];
When enhancing raw RDF:
conv:enhance [
conv:property_name "name";
conv:range rdfs:Resource;
<font color="#FF0000">a conversion:Primary_key</font>;
];
http://logd.tw.rpi.edu/data-gov/dataset/1627/version/02-05-2010/thing_1
becomes
http://logd.tw.rpi.edu/data-gov/dataset/1627/version/02-05-2010/name/George_Washington
Note that the property local name is incorporated into the URI.
(unfinished extension: multi-value key. Described by conv:primaryKeys ( 1 3 4 ) multiple a conv:Primary_key with implicit ordering imposed by ov:csvCol ordering.)
TODO: assert that the property used for the primary key is a subproperty of dc:identifier.
TODO
conversion:enhance [ conversion:domain_template "[#1]"; ];
e.g., the minimal conversion of ftp://ftp.bls.gov/pub/time.series/gp/gp.charact from Dataset 326 looks like:
<http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/> {
<http://logd.tw.rpi.edu/dataset/326/gp.charact/'''thing_2'''>
d326p:column_1 "0020" ;
d326p:column_2 "White, 16+;" ;
ov:csvRow "2"^^<http://www.w3.org/2001/XMLSchema#int>
.
<http://logd.tw.rpi.edu/dataset/326/gp.charact/'''thing_3'''>
d326p:column_1 "0062" ;
d326p:column_2 "men, 16+;" ;
ov:csvRow "3"^^<http://www.w3.org/2001/XMLSchema#int>
.
}
An identifying tag of "326" is used for the data, while "326/gp.charact/" is used from the source's identifying tag for the supporting files.
The names "thing_2" and "thing_3" are created because they are the 2nd and 3rd data entry and a class name was not provided as an enrichment parameter. If the primary key column parameter of "1" is given, the following names are used:
<http://logd.tw.rpi.edu/dataset/326/gp.charact/'''0020'''>
d326p:column_1 "0020" ;
d326p:column_2 "White, 16+;" ;
ov:csvRow "2"^^<http://www.w3.org/2001/XMLSchema#int>
.
<http://logd.tw.rpi.edu/dataset/326/gp.charact/'''0062'''>
d326p:column_1 "0062" ;
d326p:column_2 "men, 16+;" ;
ov:csvRow "3"^^<http://www.w3.org/2001/XMLSchema#int>
.
}
NOTE: predicates created from primary key columns should be asserted as rdfs:subPropertyOf dc:identifier (this will allow UIs to avoid rendering this triple b/c it is in the URI)
TODO
conversion:enhance [ conversion:domain_template "[#1]"; ];
Note: These are the same parameters as above, but the value is recognized as a URI and is treated as one instead of just a literal.
While augmenting Dataset 326, CSV is:
<http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/2732>,women,16+,"service occupations;"
<http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/2719>,women,16+,"technicians and related support;"
<http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/2745>,women,16+,"transportation and material moving;"
<http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/0084>,women,16+;,""
<http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/0086>,women,16-19;,""
<http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/0099>,women,White,"16+;"
Result should be:
<http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/2732>
p:col_1 "women";
p:col_2 "16+";
p:col_3 "service occupations;"
.
...
use of #Superproperty of all predicates created would also make sense, given that the positions change (e.g., "16+"):
<http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/2352>,16+,adminstrative,"support including clerical"
<http://logd.tw.rpi.edu/data-gov/dataset/326/gp.charact/2336>,16+,executive,"adminstrative, and managerial"
The default conversion:range_template for un-typed resource promotions is:
conversion:range_template "[/svd]value-of/[@]/[.]";
The default conversion:range_template for typed resource promotions is:
conversion:range_template "[/svd]typed/[T]/[.]";
(See #Template variables for how they are used within templates.)
Accepted values for datatype casting, and their frequency of use (in number of columns):
- xsd:nonNegativeInteger (78)
- xsd:integer (306)
- xsd:gYear (6)
- xsd:decimal (1125)
- xsd:dateTime (12)
- xsd:date (12)
- xsd:boolean (7)
- rdfs:Resource (120)
- rdfs:Literal (52)
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "8";
conversion:dataset_version "2010-May-19";
conversion:conversion_process [
<font color="#777777">a conversion:EnhancementProcess;</font>
conversion:enhancement_identifier "1";
...
conversion:enhance [
ov:csvCol 5;
ov:csvHeader "RANK";
conversion:label "RANK";
conversion:comment "Annual rank of the 8-hour daily max.";
<font color="#FF0000">conversion:range xsd:integer</font>;
];
<font color="#0000FF">ds8:thing_1</font>
raw:rank "117" ;
ov:csvRow "2"^^xsd:integer .
'''becomes'''
@prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/8/vocab/enhancement/1/> .
<font color="#0000FF">ds8:thing_1</font>
e1:rank "117"^^<font color="#FF0000">xsd:integer</font> ;
ov:csvRow "2"^^xsd:integer .
e.g. Dataset 326 provides counts of people in units of one thousand. The essential conversion of Dataset 326 looks like:
p326:period
rdfs:range rdfs:Literal;
ov:csvHeader "period";
ov:csvCol 3;
.
}
The default recognized lexical representations are (case insensitive): 'yes', 'no', 'true', 'false', '0', and '1'. The conv:boolean_true and conv:boolean_false properties may be used to add additional lexical forms.
If any new lexical forms are provided, all defaults are overridden. This avoids misinterpretation (for example, HINTS 2005 uses 1 and 2 for true and false).
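The override rule for boolean lexical forms can be sketched in Python as follows. This is an illustration under stated assumptions, not the converter's code: the function name to_boolean is hypothetical, the extra_true and extra_false arguments stand in for conv:boolean_true and conv:boolean_false, and returning None for unrecognized values and trimming surrounding whitespace are choices made only for this sketch.

```python
# Default lexical forms named on this page, compared case-insensitively.
DEFAULT_TRUE = {"yes", "true", "1"}
DEFAULT_FALSE = {"no", "false", "0"}


def to_boolean(value, extra_true=(), extra_false=()):
    """Interpret a cell value as a boolean, or return None if unrecognized.

    If any custom forms are given, the defaults are ignored entirely.
    """
    custom = bool(extra_true) or bool(extra_false)
    true_forms = {v.lower() for v in extra_true} if custom else DEFAULT_TRUE
    false_forms = {v.lower() for v in extra_false} if custom else DEFAULT_FALSE
    key = value.strip().lower()   # whitespace trimming is an assumption
    if key in true_forms:
        return True
    if key in false_forms:
        return False
    return None


# A survey coding 1 as true and 2 as false: "yes" is no longer recognized.
coded = [to_boolean(v, extra_true=["1"], extra_false=["2"]) for v in ("1", "2", "yes")]
```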
e.g., Dataset 1571
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1571";
conversion:dataset_version "2010-Apr-08";
conversion:conversion_process [
<font color="#777777">a conversion:EnhancementProcess;</font>
conversion:enhancement_identifier "1";
conversion:interpret [
conversion:symbol " - ";
conversion:interpretation conversion:null;
];
conversion:enhance [
ov:csvRow 9;
a conversion:HeaderRow;
];
conversion:enhance [
ov:csvRow 29;
a conversion:DataEndRow;
];
conversion:enhance [
ov:csvCol 1;
ov:csvHeader " Year ";
conversion:label "Year";
conversion:range xsd:gYear;
];
conversion:enhance [
ov:csvCol 2;
ov:csvHeader "";
conversion:label "Illegal act was responsible";
conversion:range <font color="#FF0000">xsd:boolean;
conversion:interpret [
conversion:symbol "*";
conversion:interpretation true;
];</font>
];
<font color="#0000FF">ds1571:thing_5</font>
raw:year "1994" ;
raw:column_2 "*" .
'''becomes'''
@prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/1571/vocab/enhancement/1/> .
<font color="#0000FF">ds1571:thing_5</font>
e1:year "1994"^^xsd2:gYear ;
e1:illegal_act_was_responsible "<font color="#FF0000">true</font>"^^xsd:boolean .
Other datasets that benefit from this enhancement include Dataset 1450 (:offers_only_in_this_state */), Dataset 1171 (:chairperson Yes/No), Dataset 1491 (:pa_program_declared Yes/No), and Dataset 1492 (:education_applicant Yes/No).
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "epa-gov-mcmahon-ethan";
conversion:dataset_identifier "environmental-reports";
conversion:dataset_version "2011-Jan-12";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "1";
conversion:subject_discriminator "enviro-reports-and-indicators";
conversion:enhance [
ov:csvRow 2;
a conversion:HeaderRow;
];
conversion:enhance [
ov:csvCol 4;
ov:csvHeader "Year";
conversion:label "Year";
conversion:comment "";
<font color="#FF0000">conversion:range xsd:gYear</font>;
];
environmental-reports-enviro-reports-and-indicators:thing_3
dcterms:isReferencedBy <http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/version/2011-Jan-12> ;
raw:id_no "16" ;
raw:title "City of Bowie State of the Environment Report" ;
raw:organization "Department of Planning and Economic Development" ;
raw:year "2009" ;
'''becomes'''
environmental-reports-enviro-reports-and-indicators:thing_3
dcterms:isReferencedBy <http://logd.tw.rpi.edu/source/epa-gov-mcmahon-ethan/dataset/environmental-reports/version/2011-Jan-12> ;
e1:id_no "16" ;
e1:title "City of Bowie State of the Environment Report" ;
e1:organization "Department of Planning and Economic Development" ;
<font color="#FF0000">e1:year "2009"^^xsd2:gYear</font> ;
e.g. Dataset 1627
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1627";
conversion:dataset_version "2010-Apr-09";
conversion:conversion_process [
<font color="#777777">a conversion:EnhancementProcess;</font>
conversion:enhancement_identifier "1";
...
conversion:enhance [
ov:csvCol 4;
ov:csvHeader "Received Date";
conversion:label "Received Date";
conversion:range <font color="#FF0000">xsd:date;
conversion:date_pattern "MM/dd/yy"</font>; # Java style - currently implemented
<font color="#FF0000">conversion:date_pattern "%m/%d/%y"</font>; # strftime style - desirable implementation
];
Both Perl's strftime style and Java's SimpleDateFormat style should be accepted.
<font color="#0000FF">ds1627:thing_1</font>
raw:received_date "12/19/07" .
'''becomes'''
@prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/1627/vocab/enhancement/1/> .
<font color="#0000FF">ds1627:thing_1</font>
e1:received_date "2007-12-19"^^xsd:date .
Other datasets that benefit from this enhancement include Dataset 957, Dataset 1171, Dataset 1350, Dataset 1359, Dataset 1374, Dataset 1492, Dataset 1530, Dataset 1577, and Dataset 1627.
TODO: the xsd:date timezone is specified in number of minutes from GMT (http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/datatype/XMLGregorianCalendar.html#getTimezone%28%29)
In future revisions, 'date_pattern' will be replaced with simply 'pattern'. The notion of "date" will be indicated by the conversion:range.
Dates are formatted in myriad ways. Fortunately, there are common conventions for the components of a date. Promoting a literal to an xsd:date or xsd:dateTime requires a pattern that should be used to parse the value correctly.
e.g. Dataset 10025
conv:enhance [
a conv:DateTimePromotionEnhancement ;
conv:property_name "toa" ;
<font color="#FF0000">conv:datetime_pattern</font> "%m/%d/%y %H:%M";
<font color="#FF0000">conv:datetime_timezone</font> "-05:00";
] ;
conversion:enhance [
ov:csvCol 7;
ov:csvHeader "TOA";
<font color="#FF0000">conv:range</font> xsd:dateTime;
<font color="#FF0000">conv:datetime_pattern</font> "M/d/yy HH:mm";
<font color="#FF0000">conv:datetime_timezone_offset</font> -300;
conversion:label "Time of Arrival";
conversion:comment "Time of Arrival";
];
raw:toa "12/23/09 11:08"
'''becomes'''
e1:toa "2009-12-23T11:08:00-05:00"^^xsd:dateTime
(NOTE: perl (above) vs java: conversion:datetime_pattern "M/d/yy HH:mm"; http://java.sun.com/j2se/1.5.0/docs/api/java/text/SimpleDateFormat.html) (timezone is specified in number of minutes from GMT http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/datatype/XMLGregorianCalendar.html#getTimezone%28%29)
In future revisions, 'datetime_pattern' will be replaced with simply 'pattern'. The notion of "datetime" will be indicated by the conversion:range.
If two patterns are used to produce the same range, both can be listed in a single enhancement:
conversion:enhance [
ov:csvCol 7;
ov:csvHeader "TOA";
conversion:range xsd:dateTime;
conversion:datetime_pattern "M/d/yy HH:mm", <font color="#FF0000">"M/d/yyyyhh:mm:ssaa"</font>;
conversion:datetime_timezone -300;
conversion:label "Time of Arrival";
conversion:comment "Time of Arrival";
];
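Parsing a value against several candidate patterns and attaching a timezone given in minutes from GMT can be sketched in Python as follows. This is an illustration only: it uses Python's strftime-style directives (the "desirable" style mentioned on this page) rather than the Java patterns currently implemented, the function name to_xsd_datetime is hypothetical, and returning None when no pattern matches is an assumption of the sketch.

```python
from datetime import datetime, timedelta, timezone


def to_xsd_datetime(value, patterns, offset_minutes):
    """Return an xsd:dateTime lexical form, trying each pattern in order.

    offset_minutes is the timezone offset from GMT in minutes (e.g. -300).
    """
    tz = timezone(timedelta(minutes=offset_minutes))
    for pattern in patterns:
        try:
            parsed = datetime.strptime(value, pattern)
        except ValueError:
            continue  # this pattern does not fit; try the next one
        return parsed.replace(tzinfo=tz).isoformat()
    return None


patterns = ["%m/%d/%y %H:%M", "%m/%d/%Y %H:%M:%S"]
first = to_xsd_datetime("12/23/09 11:08", patterns, -300)
second = to_xsd_datetime("12/23/2009 11:08:30", patterns, -300)
```

The first value matches the first pattern, while the second value falls through to the second pattern; both produce the same xsd:dateTime range.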
If two patterns are used to produce different ranges, two enhancements need to be made.
conversion:enhance [
ov:csvCol 11;
ov:csvHeader "APPT_MADE_DATE";
<font color="#FF0000">conversion:range xsd:dateTime;
conversion:datetime_pattern "M/d/yy HH:mm";</font>
conversion:datetime_timezone -300;
conversion:comment "Date the Appointment was made.";
];
conversion:enhance [
ov:csvCol 11;
ov:csvHeader "APPT_MADE_DATE";
<font color="#FF0000">conversion:range xsd:date;
conversion:date_pattern "M/d/yy";</font>
conversion:comment "Date the Appointment was made.";
];
Other datasets that benefit from this enhancement include Dataset 8, Dataset 9, Dataset 32 ("Thursday, May 13, 2010 05:11:55 UTC"), Dataset 1171, and Dataset 1491 ("1999/09/24 17:45:00").
Date values for the same column may appear in multiple formats.
e.g. Dataset 10025
conversion:enhance [
ov:csvCol 11;
ov:csvHeader "APPT_MADE_DATE";
conversion:range xsd:dateTime;
<font color="#FF0000">conversion:datetime_pattern "M/d/yy HH:mm";
conversion:datetime_pattern "M/d/yy";
conversion:datetime_timezone -300</font>;
conversion:comment "Date the Appointment was made.";
];
conversion:enhance [
ov:csvCol 11;
ov:csvHeader "APPT_MADE_DATE";
conversion:range xsd:date;
<font color="#FF0000">conversion:date_pattern "M/d/yy"</font>;
conversion:comment "Date the Appointment was made.";
];
'''become'''
dsvisitor-records:thing_231 e1:appt_start_date "2009-12-14"^^xsd:date .
dsvisitor-records:thing_232 e1:appt_start_date "2009-12-14T18:30"^^xsd:dateTime .
e.g., Dataset 1450
ov:csvCol 10; 1-205-930-5520
e.g. Dataset 32
(promoting resources is a good thing for Linked Data.)
Setting the conversion:range to rdfs:Resource, without any further parameters, will do one of two things. If the value is already a URI (guessing via containing "://"), the value will be cast to a URI. If the value is not a URI, one will be created using the predicate-scoped URI construction technique. The former behavior can be requested explicitly by typing the enhancement to type conversion:CastResourcePromotion, while the latter behavior can be requested explicitly by typing the enhancement to type conversion:PredicateScopedResourcePromotion.
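The default decision between casting and predicate-scoped promotion can be sketched in Python as follows. This is a hedged illustration, not the converter's implementation: the function name promote_to_resource is hypothetical, the namespace is passed in explicitly because this page's examples differ on which namespace variable is used, and no URI escaping of the value is attempted.

```python
def promote_to_resource(value, property_local_name, namespace):
    """Return the URI for a cell value whose range is rdfs:Resource.

    Values that already look like URIs (they contain "://") are cast as-is;
    anything else gets a predicate-scoped URI under the given namespace.
    """
    if "://" in value:
        return value
    return f"{namespace}value-of/{property_local_name}/{value}"


namespace = "http://data-gov.tw.rpi.edu/source/data-gov/dataset/8/"
cast = promote_to_resource("http://www.accessdata.fda.gov/spl/data/example.xml", "link", namespace)
scoped = promote_to_resource("ANL146", "site_id", namespace)
```

The URL passed to the first call is a shortened placeholder, not the actual Dataset 1564 value.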
e.g., Dataset 1564 benefits from the default resource casting
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1564";
conversion:dataset_version "2009-May-18";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "1";
conversion:enhance [
ov:csvCol 8;
ov:csvHeader "Link";
conversion:label "Link";
conversion:comment "The Link field provides a link to the Structured Product Labeling (SPL) information associated with each animal drug product listed electronically.";
conversion:<font color="#FF0000">range rdfs:Resource</font>;
];
<font color="#0000FF">ds1564:thing_1</font> raw:link "http://www.accessdata.fda.gov/spl/data/fcac4de8-8e3a-4108-a98b-f206626020cc/fcac4de8-8e3a-4108-a98b-f206626020cc.xml" ; ov:csvRow "2"^^xsd:integer .
'''becomes'''
@prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/1564/vocab/enhancement/1/> .
<font color="#0000FF">ds1564:thing_1</font> e1:link <font color="#FF0000"><</font>http://www.accessdata.fda.gov/spl/data/fcac4de8-8e3a-4108-a98b-f206626020cc/fcac4de8-8e3a-4108-a98b-f206626020cc.xml<font color="#FF0000">></font> ; ov:csvRow "2"^^xsd:integer .
e.g., Dataset 8 benefits from the default predicate-scoped resource promotion
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "8";
conversion:dataset_version "2010-May-19";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "1";
conversion:enhance [
ov:csvCol 1;
ov:csvHeader "SITE_ID";
conversion:label "SITE_ID";
conversion:comment "Site identification code.";
conversion:<font color="#FF0000">range rdfs:Resource</font>;
];
<font color="#0000FF">ds8:thing_1</font> raw:site_id "ANL146" ; ov:csvRow "2"^^xsd:integer .
'''becomes'''
@prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/8/vocab/enhancement/1/> .
<font color="#0000FF">ds8:thing_1</font> e1:site_id <http://data-gov.tw.rpi.edu/source/data-gov/dataset/8/<font color="#FF0000">value-of/site_id/</font>ANL146> ; ov:csvRow "2"^^xsd:integer .
Other datasets that benefit from this enhancement include Dataset 8, Dataset 9, Dataset 10, Dataset 32, Dataset 311, Dataset 401, Dataset 402, Dataset 403, Dataset 957, Dataset 1000, Dataset 1146, Dataset 1147, Dataset 1148, Dataset 1149, Dataset 1171, Dataset 1322, Dataset 1330, Dataset 1350, Dataset 1356, Dataset 1359, Dataset 1374, Dataset 1450, Dataset 1491, Dataset 1492, Dataset 1530, Dataset 1554, Dataset 1564, Dataset 1577, Dataset 1612, Dataset 1623, and Dataset 1627.
Per 3 May discussions:
<http://logd.tw.rpi.edu/source/data-gov/dataset/1147/type/badge-number/78026> . # A typed resource promotion
vs
<http://logd.tw.rpi.edu/source/data-gov/dataset/1147/value/bdgnbr/78026> . # A predicate-scoped resource promotion.
vs
<http://logd.tw.rpi.edu/source/data-gov/dataset/1147/78026> # All promotions go to same value space
(badge-number is a class, bdgnbr is a property)
Another example, when subject discriminators are used (multiple files in a dataset, e.g., Dataset 10030):
<http://logd.tw.rpi.edu/source/dfid-gov-uk/dataset/sid-2009/africa/type/country/Algeria> # A typed resource promotion
vs
<http://logd.tw.rpi.edu/source/dfid-gov-uk/dataset/sid-2009/africa/value/recipient_country/Algeria> # A predicate-scoped resource promotion.
vs
<http://logd.tw.rpi.edu/source/dfid-gov-uk/dataset/sid-2009/africa/Algeria> # All promotions go to same value space
A type of crutch promotion that uses the name of the column instead of the value of another column to create the URI of the value.
dsvisitor-records:thing_1 raw:bdgnbr "78026" .
'''becomes'''
dsvisitor-records:thing_1 e1:bdgnbr <http://logd.tw.rpi.edu/whitehouse-gov/dataset/visitor-records/version/2010-Mar-26/bdgnbr/78026> .
This takes less effort than Typed Resource Promotion, since the user does not need to specify the type.
The value in the column is already a URL or URI and simply needs to be cast to a resource instead of a literal.
Casting is the default when conversion:range rdfs:Resource is used and the value contains "://". An example is available at #Resource promotion parameters. However, casting can be explicitly requested by typing the enhancement as conversion:CastResourcePromotion.
Augmenting the Dataset 1564 example above with an explicit request to cast:
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1564";
conversion:dataset_version "2009-May-18";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "1";
conversion:enhance [
ov:csvCol 8;
ov:csvHeader "Link";
conversion:label "Link";
conversion:comment "The Link field provides a link to the Structured Product Labeling (SPL) information associated with each animal drug product listed electronically.";
a conversion:<font color="#FF0000">CastResourcePromotion</font>;
conversion:<font color="#FF0000">range rdfs:Resource</font>;
];
Other datasets that benefit from this enhancement include Dataset 92 and Dataset 1564.
e.g., Dataset 1530
<font color="#777777">a conversion:TypedResourcePromotion;</font>
conversion:<font color="#FF0000">range_name "Requester"</font>;
];
<font color="#0000FF">ds1530:thing_1</font>
raw:request_id "07-F-0001";
raw:requester_name "Connolly, Ward" .
'''becomes'''
@prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/vocab/enhancement/1/> .
<font color="#0000FF">ds1530:thing_1</font>
e1:request_id "07-F-0001";
e1:requester_name <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/<font color="#FF0000">type/requester/Connolly_Ward</font>> .
<http://logd.tw.rpi.edu/source/data-gov/dataset/1530/'''type/requester/Connolly_Ward'''>
a <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/<font color="#FF0000">Requester</font>>;
rdfs:label "Connolly, Ward" .
Other datasets that benefit from this enhancement include Dataset 10025 ("Badge").
Range templates use #Template variables to specify the triple object to assert. They apply to both literal and resource objects.
TODO: This may be better named Object template.
Often, a single value is insufficient to uniquely identify a concept. This is a problem when promoting values to URIs -- if the URIs happen to match, then they are presumed to be the same thing. Incorporating additional values when constructing a URI can avoid this situation. A single value can use other values as a "crutch" to promote itself to a unique URI.
(Note, Range template promotion used to be called Crutch resource promotion).
e.g. Dataset 1374 mentions city names, but their URIs should incorporate their state to ensure uniqueness.
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1374";
conversion:dataset_version "2010-May-17";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "1";
...
conversion:enhance [
ov:csvCol 2;
ov:csvHeader "City";
conversion:range rdfs:Resource;
a conversion:RangeTemplateResourcePromotion;
conversion:<font color="#FF0000">range_template "[#2]-[#3]"</font>; # only one pattern is required;
conversion:<font color="#FF0000">range_template "[@city]-[@state]"</font>; # these four are equivalent.
conversion:<font color="#FF0000">range_template "[#2]-[@state]"</font>; # '#', '@', and '.' references can be mixed in the same pattern.
conversion:<font color="#FF0000">range_template "[.]-[#3]"</font>; # Period is used as a short hand for "value of this property".
];
<font color="#0000FF">ds1374:thing_1</font> raw:city "Elmwood Park"; raw:state "IL"; .
'''becomes'''
@prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/1374/vocab/enhancement/1/> .
<font color="#0000FF">ds1374:thing_1</font> e1:city <http://logd.tw.rpi.edu/source/data-gov/dataset/1374/<font color="#FF0000">value-of/city/Elmwood_Park-IL</font>>; e1:state "IL"; .
<http://logd.tw.rpi.edu/source/data-gov/dataset/1374/'''value-of/city/Elmwood_Park-IL'''> rdfs:label "Elmwood Park" .
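A minimal Python sketch of how the four equivalent templates above could resolve against one row. It is an illustration under stated assumptions, not the converter's code: `expand_template` is a hypothetical name, "@" references are assumed to match column headers case-insensitively, column numbers are assumed to be 1-based, and the first column's header and value are made up.

```python
import re


def expand_template(template, row, headers, current_col):
    """Expand [#n], [@name], and [.] references against a single CSV row.

    row and headers are Python lists; column numbers in templates are 1-based.
    """
    lowered = [h.lower() for h in headers]

    def resolve(match):
        ref = match.group(1)
        if ref == ".":
            return row[current_col - 1]           # value of the column being enhanced
        if ref.startswith("#"):
            return row[int(ref[1:]) - 1]          # value of column n
        if ref.startswith("@"):
            return row[lowered.index(ref[1:].lower())]  # value of the named column
        return match.group(0)                     # leave unknown references untouched

    return re.sub(r"\[([^\]]+)\]", resolve, template)


if __name__ == "__main__":
    headers = ["Id", "City", "State"]             # "Id" is a placeholder first column
    row = ["1", "Elmwood Park", "IL"]
    for template in ("[#2]-[#3]", "[@city]-[@state]", "[#2]-[@state]", "[.]-[#3]"):
        print(expand_template(template, row, headers, current_col=2))
```

Each of the four templates expands to the same string for this row, which is why only one of them needs to be asserted.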
Range template promotions end up in one of two sections of the local dataset namespace (to create a URI outside of the local namespace, use a #Template resource promotion parameter). In the example above, the "City" column was promoted to an rdfs:Resource with no additional ResourcePromotion specified, so the default PropertyScopedResourcePromotion was used, which constructs the value-of/city/ style URI. In the example below, a TypedResourcePromotion is specified in addition to the rdfs:Resource range, resulting in the typed/city/ style URI. In the value-of case, city is the property name; in the typed case, city is a lower-case version of the class local name given by conversion:range_name.
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1374";
conversion:dataset_version "2010-May-17";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "<font color="#FF0000">2</font>";
...
conversion:enhance [
ov:csvCol 2;
ov:csvHeader "City";
conversion:range rdfs:Resource;
a conversion:TypedResourcePromotion;
conversion:<font color="#FF0000">range_name "City"</font>;
a conversion:CrutchResourcePromotion;
conversion:<font color="#FF0000">range_template "[#2]-[#3]"</font>; # only one pattern is required;
conversion:<font color="#FF0000">range_template "[@city]-[@state]"</font>; # these four are equivalent.
conversion:<font color="#FF0000">range_template "[#2]-[@state]"</font>; # '#', '@', and '.' references can be mixed in the same pattern.
conversion:<font color="#FF0000">range_template "[.]-[#3]"</font>; # Period is used as a short hand for "value of this property".
];
<font color="#0000FF">ds1374:thing_1</font> e1:city <http://logd.tw.rpi.edu/source/data-gov/dataset/1374/'''value-of/city'''/Elmwood_Park-IL>; e1:state "IL"; .
<http://logd.tw.rpi.edu/source/data-gov/dataset/1374/'''value-of/city/'''Elmwood_Park-IL> rdfs:label "Elmwood Park" .
'''becomes'''
@prefix e2: <http://logd.tw.rpi.edu/source/data-gov/dataset/1374/vocab/enhancement/2/> .
<font color="#0000FF">ds1374:thing_1</font> e2:city <http://logd.tw.rpi.edu/source/data-gov/dataset/1374/<font color="#FF0000">typed/city/</font>Elmwood_Park-IL>; e2:state "IL"; .
<http://logd.tw.rpi.edu/source/data-gov/dataset/1374/'''typed/city/'''Elmwood_Park-IL> rdfs:label "Elmwood Park" .
Using a label_pattern will produce a different rdfs:label than what is used to create the crutched URI. The result of the label_pattern will also be used instead of the raw value when looking up owl:sameAs relations during #ObjectSameAsEnhancement.
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1374";
conversion:dataset_version "2010-May-17";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "<font color="#FF0000">3</font>";
...
conversion:enhance [
ov:csvCol 2;
ov:csvHeader "City";
conversion:range rdfs:Resource;
a conversion:TypedResourcePromotion;
conversion:range_name "City";
a conversion:CrutchResourcePromotion;
conversion:range_template "[#2]-[#3]"; # only one pattern is required;
conversion:range_template "[@city]-[@state]"; # these four are equivalent.
conversion:range_template "[#2]-[@state]"; # '#', '@', and '.' references can be mixed in the same pattern.
conversion:range_template "[.]-[#3]"; # Period is used as a short hand for "value of this property".
<font color="#FF0000">conversion:label_pattern "[@city], [@state]"</font>;
];
<font color="#0000FF">ds1374:thing_1</font> e2:city <http://logd.tw.rpi.edu/source/data-gov/dataset/1374/typed/city/Elmwood_Park-IL>; e2:state "IL"; .
<http://logd.tw.rpi.edu/source/data-gov/dataset/1374/'''typed/city/'''Elmwood_Park-IL> rdfs:label "Elmwood Park" .
'''becomes'''
@prefix e3: <http://logd.tw.rpi.edu/source/data-gov/dataset/1374/vocab/enhancement/3/> .
<font color="#0000FF">ds1374:thing_1</font> e3:city <http://logd.tw.rpi.edu/source/data-gov/dataset/1374/typed/city/Elmwood_Park-IL>; e3:state "IL"; .
<http://logd.tw.rpi.edu/source/data-gov/dataset/1374/'''typed/city/'''Elmwood_Park-IL> rdfs:label <font color="#FF0000">"Elmwood Park, IL"</font> .
e.g. Dataset 1146 mentions county identifiers within a state. For example, county "000" in CA and county "000" in PA are different counties. Thus, the state needs to be incorporated into the URI created for the county.
:thing_1 raw:state "01";
raw:county "001" .
'''becomes'''
:thing_1 e1:county <http://logd.tw.rpi.edu/source/SSS/dataset/DDD/01/001> .
<font color="#FF0000">conv:range_template "[#1]-[#4]"</font>;
Other datasets that benefit from this enhancement parameter include Dataset 1330 (District/State).
A pattern may be specified to populate with a column's value.
(one value is not a crutch)
conv:enhance [
ov:csvRow 1;
ov:csvHeader "";
conv:property_name "my_prop";
<font color="#FF0000">conv:range_template "http://some.other.org/instances/[.]"</font>;
] ;
:thing_1 raw:my_prop "hi" . '''becomes''' :thing_1 e1:my_prop <http://some.other.org/instances/hi> .
conv:enhance [
ov:csvRow 1;
ov:csvHeader "";
conv:property_name "my_prop";
<font color="#FF0000">conv:range_template "[/sdv]/[.]"</font>;
] ;
TODO: How are multiple values inserted? How are the property names cited, and how are the property columns cited? What happens when there is a column named 'value'?
e.g. Dataset 326 with CSVs resembling RDBMS tables.
e.g. The raw conversion of Dataset 326 looks like:
p326:series_id "GPU00100000E0000";
p326:year "1981";
p326:period "A01";
p326:value "1491";
ov:csvRow 1;
.
p326:period
ov:csvCol 3;
ov:csvHeader "period";
rdfs:range rdfs:Literal;
.
"A01" is referring to the period from gp.period
ns1:A01
ns2:period "A01";
ns2:period_abbr "ANN";
ns2:period_name "Annual";
ov:csvRow 1;
.
p326:period "A01"; becomes p326:period <http://logd.tw.rpi.edu/dataset/326/gp.period/A01>;
The enrichment parameters to do this would be:
conv:enhance [
ov:csvRow 3;
a conv:
conv:promotionNamespace <http://logd.tw.rpi.edu/dataset/326/gp.period/> .
];
}
(can behave like foreign key)
TODO: if a template is used, should the internal URI be created as well?
Values can be bundled by an implicit resource (as in Dataset 10025) or an existing resource (as in Dataset 1147). In the case of an implicit resource a URI is minted to bundle the values, while in the existing case the URI used to bundle values is derived from a value present in another column of the csv row.
The better thing to use is a #Typed_resource_promotion_parameter.
e.g., Dataset 1530
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1530";
conversion:dataset_version "2009-May-18";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "1";
conversion:enhance [
ov:csvCol 2;
ov:csvHeader "Requester Name";
conversion:label "Requester Name";
conversion:range rdfs:Resource;
<font color="#FF0000">a conversion:GlobalResourcePromotionEnhancement;</font>
];
<font color="#0000FF">ds1530:thing_1</font>
raw:request_id "07-F-0001";
raw:requester_name "Connolly, Ward" .
'''becomes'''
@prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/vocab/enhancement/1/> .
@prefix ds1530_value: <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/> .
<font color="#0000FF">ds1530:thing_1</font>
e1:request_id "07-F-0001";
e1:requester_name ds1530_value:Connolly_Ward .
dsvisitor-records:thing_1 raw:bdgnbr "78026" .
'''becomes'''
dsvisitor-records:thing_1 e1:bdgnbr <http://logd.tw.rpi.edu/whitehouse-gov/dataset/visitor-records/version/2010-Mar-26/value/78026> .
"value" is a hard-coded default.
(possible) scheme for where TWC will place their mapping files:
http://logd.tw.rpi.edu/source/tetherless/mapping/dbpedia-states/2010-Apr-29.ttl
See ObjectSameAsEnhancement for a current list of links_via mapping files.
Many datasets use abbreviated codes instead of citing things directly. Codebook enhancements describe how input values occurring within a column should be interpreted. This functionality could also be known as a "Data Dictionary". Codebook enhancements feature one or more conversion:interpret triples that cite a (conversion:symbol - conversion:interpretation) pairing. If the conversion:symbol appears as a value in the csv for the particular column, the conversion:interpretation will be output instead. This works when the conversion:range is either rdfs:Literal or rdfs:Resource. In the resource promotion case, the interpretation is used as the basis for promotion instead of the input value.
A common practice is to have one enhancement that decodes the codes into literal expansions and a second enhancement that promotes the interpretations to resources. This allows for easy inspection and query at the same time that it allows third-party description augmentation. Note, however, that resource promotion can be done as a single step by asserting the conversion:range.
Input values can be replaced with their interpretations by specifying one or more conversion:Interpretations on a particular conversion:Enhancement. If no conversion:range is asserted -- or if it is asserted to be rdfs:Literal -- the output triple will be a literal. The example below shows this case.
e.g., Dataset 1930
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1930";
conversion:dataset_version "1st-anniversary";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "1";
...
conversion:enhance [
ov:csvCol 4;
ov:csvHeader "can_off";
conversion:label "Candidate Office";
conversion:range rdfs:Literal;
conversion:comment "office abbreviation";
conversion:comment "P=President; S=Senate; H=House";
<font color="#FF0000">conversion:interpret [
conversion:symbol "S";
conversion:interpretation "Senate";
];
conversion:interpret [
conversion:symbol "P";
conversion:interpretation "President";
];
conversion:interpret [
conversion:symbol "H";
conversion:interpretation "House";
];</font>
];
<font color="#0000FF">ds1930:thing_2</font> raw:can_off "H" . '''becomes''' @prefix e1: <http://data-gov.tw.rpi.edu/source/data-gov/dataset/1930/vocab/enhancement/1/> . <font color="#0000FF">ds1930:thing_2</font> e1:candidate_office "<font color="#FF0000">House</font>" .
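The symbol lookup behind a codebook enhancement can be sketched in Python as a dictionary lookup. The names `CODEBOOK` and `interpret` are hypothetical, and passing unmatched values through unchanged is an assumption, since this page does not say what happens to values that have no conversion:symbol.

```python
# Symbol -> interpretation pairs taken from the Dataset 1930 example above.
CODEBOOK = {
    "S": "Senate",
    "P": "President",
    "H": "House",
}


def interpret(value, codebook):
    """Return the interpretation for a symbol.

    Assumption: values without a matching symbol are passed through unchanged.
    """
    return codebook.get(value, value)


if __name__ == "__main__":
    for raw in ("H", "S", "X"):
        print(raw, "->", interpret(raw, CODEBOOK))
```

When the range is rdfs:Resource, the interpreted string rather than the raw symbol would be the input to URI construction.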
conversion:Interpretations are also used when the conversion:Enhancement's conversion:range is rdfs:Resource.
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1930";
conversion:dataset_version "1st-anniversary";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "1";
...
conversion:enhance [
ov:csvCol 4;
ov:csvHeader "can_off";
conversion:label "Candidate Office";
conversion:range <font color="#FF0000">rdfs:Resource</font>;
conversion:comment "office abbreviation";
conversion:comment "P=President; S=Senate; H=House";
<font color="#FF0000">conversion:interpret [
conversion:symbol "S";
conversion:interpretation "Senate";
];
conversion:interpret [
conversion:symbol "P";
conversion:interpretation "President";
];
conversion:interpret [
conversion:symbol "H";
conversion:interpretation "House";
];</font>
];
<font color="#0000FF">ds1930:thing_2</font> raw:can_off "H" .
'''becomes'''
@prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/1930/vocab/enhancement/1/> .
<font color="#0000FF">ds1930:thing_2</font> e1:candidate_office <http://logd.tw.rpi.edu/source/data-gov/dataset/1930/<font color="#FF0000">value-of/candidate_office/</font>House> .
(Note that conversion:range_name may be used to promote "H" to <http://logd.tw.rpi.edu/source/data-gov/dataset/1930/<font color="#FF0000">typed/candidate_office/</font>House>)
A special case of this structure is also used in #Interprets as null parameter to describe a conversion process or enhancement.
Other datasets that benefit from this enhancement include Dataset 9 and Dataset 1564.
Potential features: codebook with regex for symbol and interpretation (e.g. SEC)
The script distinct-values-2-symbol-interps.pl helps create the symbol/interpretation parameters for a given property by querying a SPARQL endpoint for its distinct values.
e.g., Dataset 10025 combining first, middle and last name into a new value for a new predicate.
This can be done with a literal range_template, but will override one of the contributing values' predicates.
NOTE: This is not completed. See conversion:delimits_object not implemented.
Satellite collision use case.
e.g.
@prefix stack-heights: <http://tw2.tw.rpi.edu/source/cordad-at-rpi-edu/dataset/stack-heights/version/2010-jul-14> .
stack-heights:thing_2 raw:state_and_county_fips_code "09009" .
'''becomes'''
stack-heights:thing_2 e1:state_fips_code "09"; e1:county_fips_code "009"; .
NOTE: This is not completed.
Note: Annotation triples typed to scovo:Item become [Converting].
e.g., Dataset 10030 has implicit giving country of UK, Dataset 1554 has implicit giving country of US.
conversion:conversion_process [ conversion:enhance [ <font color="#777777">a conversion:AnnotateSubjectEnhancement;</font> ]; ];
e.g., SEC financial reports, the reporting company is implicit.
conversion:conversion_process [ conversion:enhance];
The object may be a resource, but the predicate must be a string. A predicate will be created in the usual enhancement namespace, so make sure to not collide with other properties derived from columns.
conversion:conversion_process [ conversion:enhance][ <font color="#777777">a conversion:AnnotateSubjectEnhancement;</font> ];
See also #Subject discriminator parameter.
See #Multi-dimensional for combination with a cell-based conversion.
e.g., Dataset 1530
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1530";
conversion:dataset_version "2009-May-18";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "1";
conversion:enhance [
ov:csvCol];
'''becomes'''
@prefix e1: <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/vocab/enhancement/1/> .
<font color="#0000FF">ds1530:thing_1</font>
<font color="#FF0000">rdf:type</font> <http://logd.tw.rpi.edu/source/data-gov/dataset/1530/<font color="#FF0000">vocab/foia_request</font>>;
e1:request_id "07-F-0001";
e1:requester_name "Connolly, Ward" .
<http://logd.tw.rpi.edu/source/data-gov/dataset/1530/<font color="#FF0000">vocab/foia_request</font>>
rdf:type rdfs:Class;
.
conversion:domain_name can go on any property enhancement that is not bundled (see #Resource column bundling promotion parameter).
e.g., Dataset 1491
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1491";
conversion:dataset_version "2009-May-18";
conversion:conversion_process [
a];
.
<font color="#0000FF">ds1491:thing_1</font>
<font color="#FF0000">rdf:type</font> <http://logd.tw.rpi.edu/source/data-gov/dataset/1491/<font color="#FF0000">vocab/Disaster></font>;
e1:disaster_number "1303"^^xsd:integer;
e1:declaration_date "1999-09-24T17:45:00-04:00"^^xsd:dateTime;
e1:title "HURRICANE FLOYD MAJOR DISASTER DECLARATIONS";
e1:incident_begin_date "1999-09-16T00:00:00-04:00"^^xsd:dateTime;
.
<http://logd.tw.rpi.edu/source/data-gov/dataset/1491/<font color="#FF0000">vocab/Disaster></font>
rdf:type rdfs:Class;
.
Other datasets that benefit from this enhancement parameter include Dataset 1491, Dataset 1492, Dataset 1530, and Dataset 1627.
param is the URI of one superproperty (e.g. my:predicate).
:col_1e1 rdfs:subPropertyOf my:predicate . :col_2e1 rdfs:subPropertyOf my:predicate .
Mint new properties every time and subproperty them. E.g., not like Dataset_326#another.
Open question: state all triples entailed by the superproperty, or just the rdfs:subPropertyOf axiom?
:dataset a void:Dataset;
conv:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conv:source_identifier "data-gov";
conv:dataset_identifier "0";
conv:dataset_version "2009-May-18";
conv:conversion_process [
conversion:enhance];
.
(note: ov:csvCol currently needs to be specified, but shouldn't have to be.)
A template may also be used to refer to the super property.
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "nci-nih-gov";
conversion:dataset_identifier "tobacco-law-coverage";
conversion:dataset_version "2010-Aug-09";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "2";
conversion:subject_discriminator "table1-anrf-zt";
conversion:enhance [
ov:csvCol];
'''becomes'''
@prefix e2: <http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/vocab/enhancement/2/> .
@prefix base_vocab: <http://logd.tw.rpi.edu/vocab/> .
value_of_region:Anchorage_AK e2:fips "02-0140" ; base_vocab:fips_code "02-0140" .
While the previous section showed how to type instances to both internal and external vocabularies, the external classes were provided explicitly. However, the URI design of the instances and vocabularies allows an incremental integration of datasets from the same source, and integration of datasets across sources. In this case, templates are used to specify the class within conversion:base_uri value space. These vocabularies thus fall on a continuum from "internal" to "external" as more broadly-scoped vocabularies are created.
This avoids hard-coding the superclass URI, so that it is relative to the internal namespace.
e.g. Dataset 1450:
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1450";
conversion:dataset_version "2009-May-18";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "1";
conversion:enhance [
ov:csvCol];
conversion:enhance [
conversion:class_name];
conversion:enhance [
conversion:class_name];
conversion:enhance [
conversion:class_name];
typed_state:Alabama
a ds1450_vocab:State , # This is in the dataset-specific ontology
<<font color="#FF0000">http://logd.tw.rpi.edu/source/data-gov/vocab/state</font>> , # This is in Tetherless World's ontology for "data-gov".
<<font color="#FF0000">http://logd.tw.rpi.edu/vocab/state</font>> , # This is in Tetherless World's ontology
<http://geonames/fake/vocab/us_state> ; # This is in an external ontology
rdfs:label "Alabama" .
For the enhancement parameters above, the variables in the template would evaluate to the following:
[/] - http://logd.tw.rpi.edu
[/s] - http://logd.tw.rpi.edu/source/data-gov/
[/sd] - http://logd.tw.rpi.edu/source/data-gov/dataset/1450/
[/sdv] - http://logd.tw.rpi.edu/source/data-gov/dataset/1450/version/2009-May-18/
(see #Patterns vs. Templates for a discussion on templates.)
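How the four namespace variables follow from the essential parameters can be sketched in Python. The function name `template_variables` and the dictionary keys are illustrative choices; the resulting values simply reproduce the ones listed above for Dataset 1450.

```python
def template_variables(base_uri, source, dataset, version):
    """Derive the four namespace variables from the essential conversion parameters."""
    s = f"{base_uri}/source/{source}/"
    sd = f"{s}dataset/{dataset}/"
    sdv = f"{sd}version/{version}/"
    return {"/": base_uri, "/s": s, "/sd": sd, "/sdv": sdv}


if __name__ == "__main__":
    variables = template_variables(
        "http://logd.tw.rpi.edu", "data-gov", "1450", "2009-May-18"
    )
    for name, value in variables.items():
        print(f"{name} - {value}")
```

Each variable extends the previous one, which is what lets a class be scoped to the whole site, a source, a dataset, or a single version.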
e.g. Dataset 1171
column 19, "ChangedBy", Jocelyn Rowe, jrowe@usaid.gov, 202-712-4002
e.g. keywords
e.g. Dataset 1340 has a field called STCNTY that is "A five digit number representing the state and county in which the institution is physically located. The first two digits represent the FIPS state numeric code and the last three digits represent the FIPS county numeric code." A way for the program to parse fields that were joined together like that in the CSV file might be helpful.
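The split described for STCNTY can be sketched in Python. The function name `split_stcnty` is hypothetical, and the sketch assumes the code is always exactly five digits; it keeps both parts as strings so that leading zeros survive.

```python
def split_stcnty(code):
    """Split a five-digit state+county code into (state FIPS, county FIPS).

    Both parts stay strings so that leading zeros are preserved.
    """
    code = code.strip()
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"expected a five-digit code, got {code!r}")
    return code[:2], code[2:]


if __name__ == "__main__":
    state, county = split_stcnty("09009")
    print(state, county)
```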
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "nci-nih-gov";
conversion:dataset_identifier "tobacco-law-coverage";
conversion:dataset_version "2010-Aug-09";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "1";
conversion:subject_discriminator "table1-anrf-zt";
conversion:enhance [
ov:csvCol];
'''becomes'''
@prefix e1: <http://logd.tw.rpi.edu/source/nci-nih-gov/dataset/tobacco-law-coverage/vocab/enhancement/1/> .
tobacco-law-coverage-table1-anrf-zt:thing_2 e1:work "Yes" ; e1:restaur "Yes" ; e1:bars "Yes" ;
conversion:enhance [ ov:csvRow];
Navigating linked data is useful for RDF crawlers, but it is not ideal for humans. Linked data browsers exist, but they leave users isolated from communities that share an interest in the data. The Human navigation enhancement provides a pointer to a human-centric web site that allows the development of communities surrounding the topics in the linked data produced by the conversion.
Both row/cells and the promoted resources obtain this additional description. (Note: the current csv2rdf4lod implementation only adds to row/cell).
e.g. Dataset 311:
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "311";
conversion:dataset_version "2009-May-18";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "1";
conversion:enhance [
<font color="#FF0000">conversion:human_redirect "[/]retrieveRDF?uri=";</font>
];
'''becomes'''
ds311:thing_5
dcterms:isReferencedBy <http://tw2.tw.rpi.edu/source/data-gov/dataset/311/version/2009-may-18>
.
<http://tw2.tw.rpi.edu/retrieverdf?uri=http://tw2.tw.rpi.edu/source/data-gov/dataset/311/version/2009-may-18/thing_5>
foaf:primaryTopic ds311:thing_5
.
When multiple csvs are in a dataset and share the same parameters (e.g., datatype ranges for columns) it would be convenient to have each param file point to a central param file.
e.g. Dataset 1330 (not really, the two tabs are just reorderings)
Converting with cell-based subjects
e.g. Dataset 1612
:dataset a void:Dataset;
conversion:base_uri "http://logd.tw.rpi.edu"^^xsd:anyURI;
conversion:source_identifier "data-gov";
conversion:dataset_identifier "1612";
conversion:dataset_version "2009-May-18";
conversion:conversion_process [
a conversion:RawConversionProcess;
conversion:enhancement_identifier "2";
conversion:subject_discriminator "air_force";
conversion:enhance [
ov:csvRow];
conversion:enhance [
ov:csvCol];
conversion:enhance [
ov:csvCol];
conversion:enhance [
ov:csvCol];
'''becomes'''
ds1612-air_force:thing_10_3
e2:gender "Male" ;
e2:has_children "false"^^xsd:boolean ;
e2:married "false"^^xsd:boolean ;
rdf:value "8127"^^xsd:integer ;
ov:csvRow 10 ;
ov:csvCol 3 ;
ov:subjectDiscriminator <http://logd.tw.rpi.edu/source/data-gov/dataset/1612/discriminator/air_force> .
- type the promoted resource of column 1 to the local name mentioned in column 2.
- column-specific interpretations:
conversion:interpret [ ov:csvCol];