conversion:object_search
After some initial enhancements, twapperkeeper's CSV row (full input file here):
High Volume Stock: stock analysis website - $ABC - http://www.dojispace.com/stock-picks/amerisourcebergen-stock-price-ABC.aspx,,timlisten27,14522987982098432,130595362,en,<a href="http://www.dojispace.com" rel="nofollow">Stock Screener</a>,http://s.twimg.com/a/1291760612/images/default_profile_0_normal.png,,0,0,Tue 14 Dec 2010 03:32:04 +0000,1292297524
can become:
stocks:tweet_14522987982098432
   dcterms:identifier "tweet_14522987982098432" ;
   dcterms:isReferencedBy
      <http://logd.tw.rpi.edu/source/twapperkeeper-com/dataset/stocks/version/2011-Mar-26> ;
   a stocks_vocab:Tweet , sioctypes:MicroblogPost ;
   sioc:content
      "High Volume Stock: stock analysis website - $ABC - http://www.dojispace.com/stock-picks/amerisourcebergen-stock-price-ABC.aspx" ;
But we'd like to not have to regex a tweet to find which stocks it mentions; we'd like to precompute it so we can query it as triples. This can be done with conversion:object_search, which specifies a regex to search the object of a triple, and -- for each match -- the predicate and object to assert on the original subject. (full enhancements file here.)
conversion:enhance [
   ov:csvCol 1;
   ov:csvHeader "text";
   conversion:domain_name "Tweet";
   conversion:domain_template "tweet_[#4]";
   conversion:equivalent_property sioc:content;
   #conversion:label "text";
   conversion:comment "";
   conversion:range rdfs:Literal;
   conversion:object_search [
      conversion:eg "is website - $ABC - http:";
      conversion:regex "\\\\$([^\\\\s]*)";
      conversion:predicate foaf:topic;
      conversion:object "$[\\1]";
   ];
   conversion:object_search [
      conversion:regex "\\\\$([^\\\\s]*)";
      conversion:predicate sioc:topic;
      conversion:object "http://dbpedia.org/resource/[\\1]";
   ];
   conversion:object_search [
      conversion:regex "\\\\$([^\\\\s]*)";
      conversion:predicate foaf:homepage;
      conversion:object "[/sd][\\\\1]";
   ];
];
adds the following triples to those shown above (full output file here):
@prefix stocks_global_value: <http://logd.tw.rpi.edu/source/twapperkeeper-com/dataset/stocks/> .
stocks:tweet_14522987982098432
   foaf:topic "$ABC" ;
   foaf:homepage stocks_global_value:ABC ;
   sioc:topic dbpedia:ABC ;
Note that the enhancements use template variables to construct new values (see Using template variables to construct new values), with additional [\1] variables that come from the captured groups in the regex.
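The behaviour described above (one asserted triple per regex match, with the captured group filled into the object template) can be sketched outside the converter. The Python below is only an illustration, not csv2rdf4lod's implementation: the function name object_search, the tuple representation of triples, and the use of Python's re module are assumptions, and the pattern is written as \$([^\s]*), which is what the escaped regex above is assumed to reduce to. The [/sd] template variable is not modelled.

```python
import re

def object_search(subject, cell_value, regex, predicate, template):
    """Hypothetical helper: one (subject, predicate, object) triple per match.

    The [\\1] placeholder in the template is replaced by the first captured
    group of each match.
    """
    triples = []
    for match in re.finditer(regex, cell_value):
        obj = template.replace("[\\1]", match.group(1))
        triples.append((subject, predicate, obj))
    return triples

subject = "stocks:tweet_14522987982098432"
tweet = ("High Volume Stock: stock analysis website - $ABC - "
         "http://www.dojispace.com/stock-picks/"
         "amerisourcebergen-stock-price-ABC.aspx")
pattern = r"\$([^\s]*)"  # a literal '$' followed by non-whitespace characters

topic = object_search(subject, tweet, pattern, "foaf:topic", "$[\\1]")
dbpedia = object_search(subject, tweet, pattern, "sioc:topic",
                        "http://dbpedia.org/resource/[\\1]")
print(topic)
print(dbpedia)
```

Because the helper iterates over every match, a tweet that mentions several ticker symbols would yield one triple per symbol for each object_search block.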
Take the entire value of the cell and construct a URL with it:
conversion:enhance [
   ov:csvCol 3;
   ...
   conversion:equivalent_property dcterms:identifier;
   conversion:range rdfs:Literal;
   ...
   conversion:object_search [
      conversion:regex "^(.*)$";
      conversion:predicate foaf:homepage;
      conversion:object "http://www.ncbi.nlm.nih.gov/pubmed/[\\\\1]";
   ];
This will produce:
<http://bio2rdf.org/pubmed:11587856>
   dcterms:identifier "11587856" ;
   foaf:homepage <http://www.ncbi.nlm.nih.gov/pubmed/11587856> ;
from the line in gene2pubmed:
205920 3927647 11587856
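As a rough illustration of why ^(.*)$ captures the whole cell, the same match can be reproduced in Python. This is a sketch under assumptions, not the converter's code: it assumes the gene2pubmed line is whitespace-separated and that ov:csvCol counts columns from 1, as the first example on this page suggests.

```python
import re

line = "205920 3927647 11587856"      # the gene2pubmed line shown above
csv_col = 3                           # ov:csvCol 3, assumed to be 1-indexed
cell = line.split()[csv_col - 1]      # third column: the PubMed identifier

match = re.search(r"^(.*)$", cell)    # anchored at both ends: group 1 is the entire cell
homepage = "http://www.ncbi.nlm.nih.gov/pubmed/" + match.group(1)
print(homepage)
```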
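NOTE: The following approach, which uses conversion:object_search to split a delimited cell into several values, should only be used in degenerate cases where conversion:delimits_object cannot do the job.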
conversion:enhance [
   ov:csvCol 14;
   ...
   conversion:object_search [
      conversion:regex "([^,]+), ";
      conversion:predicate dcterms:subject;
      conversion:object "[\\1]";
   ];
   conversion:object_search [
      conversion:regex ", ([^,]+)$"; # If you have a single regex, feel free to email me.
      conversion:predicate dcterms:subject;
      conversion:object "[\\1]";
   ];
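To see why two patterns are used here, the following Python sketch applies both to a made-up cell value. The sample string and variable names are assumptions for illustration only, and Python's re module may not behave identically to the regex engine the converter uses.

```python
import re

cell = "alpha, beta, gamma"                     # hypothetical comma-separated cell

# First pattern: every item that is followed by ", " (all but the last).
all_but_last = re.findall(r"([^,]+), ", cell)

# Second pattern: the item after the final ", " at the end of the cell.
last = re.findall(r", ([^,]+)$", cell)

subjects = all_but_last + last
print(subjects)

# A cell holding a single value contains no ", ", so neither pattern matches.
single = re.findall(r"([^,]+), ", "alpha") + re.findall(r", ([^,]+)$", "alpha")
print(single)
```

In this sketch a single-valued cell yields no matches at all, which is one reason conversion:delimits_object is the preferred route when it applies.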
See also: Using template variables to construct new values, for use within values of conversion:object_search.