conversion:object_search

timrdf edited this page May 11, 2011 · 41 revisions

See conversion:Enhancement.

Example: Searching Tweets for mentions of Stocks.

After some initial enhancements, twapperkeeper's CSV row (full input file here):

High Volume Stock: stock analysis website - $ABC - http://www.dojispace.com/stock-picks/amerisourcebergen-stock-price-ABC.aspx,,timlisten27,14522987982098432,130595362,en,<a href="http://www.dojispace.com" rel="nofollow">Stock Screener</a>,http://s.twimg.com/a/1291760612/images/default_profile_0_normal.png,,0,0,Tue 14 Dec 2010 03:32:04 +0000,1292297524

can become:

stocks:tweet_14522987982098432 
   dcterms:identifier "tweet_14522987982098432" ;
   dcterms:isReferencedBy 
   <http://logd.tw.rpi.edu/source/twapperkeeper-com/dataset/stocks/version/2011-Mar-26> ;
   a stocks_vocab:Tweet , sioctypes:MicroblogPost ;
   sioc:content 
"High Volume Stock: stock analysis website - $ABC - http://www.dojispace.com/stock-picks/amerisourcebergen-stock-price-ABC.aspx" ;

But we'd rather not apply a regex to every tweet at query time to find which stocks it mentions; we'd like to precompute the mentions so they can be queried as triples. This can be done with conversion:object_search, which specifies a regex to search the object of a triple and, for each match, the predicate and object to assert on the original subject. (full enhancements file here.)

      conversion:enhance [
         ov:csvCol          1;
         ov:csvHeader       "text";
         conversion:domain_name "Tweet";
         conversion:domain_template "tweet_[#4]";
         conversion:equivalent_property sioc:content;
         #conversion:label   "text";
         conversion:comment "";
         conversion:range   rdfs:Literal;
         conversion:object_search [
            conversion:eg        "is website - $ABC - http:";
            conversion:regex     "\\\\$([^\\\\s]*)";
            conversion:predicate foaf:topic;
            conversion:object    "$[\\1]";
         ];
         conversion:object_search [
            conversion:regex     "\\\\$([^\\\\s]*)";
            conversion:predicate sioc:topic;
            conversion:object    "http://dbpedia.org/resource/[\\1]";
         ];
         conversion:object_search [
            conversion:regex     "\\\\$([^\\\\s]*)";
            conversion:predicate foaf:homepage;
            conversion:object    "[/sd][\\\\1]";
         ];
      ];

adds the following triples to those shown above (full output file here):

@prefix stocks_global_value: <http://logd.tw.rpi.edu/source/twapperkeeper-com/dataset/stocks/> .

stocks:tweet_14522987982098432
   foaf:topic   "$ABC" ;
   foaf:homepage stocks_global_value:ABC ;
   sioc:topic    dbpedia:ABC .

Note that the enhancements use template variables to construct new values (see Using template variables to construct new values), with additional [\1] variables whose values come from the captured groups in the regex.
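To make the match-then-substitute behavior concrete, here is a minimal Python sketch of the idea. It is not the converter's actual implementation: the function name `object_search` and its parameters are hypothetical, and the sketch only handles `[\N]` captured-group variables (it does not expand other template variables such as `[/sd]` or `[#4]`).

```python
import re

def object_search(cell_value, regex, object_template):
    """Hypothetical illustration: return one expanded object string
    per regex match found in cell_value."""
    results = []
    for match in re.finditer(regex, cell_value):
        expanded = object_template
        # Replace each [\N] placeholder with the text of captured group N.
        for n in range(1, match.re.groups + 1):
            expanded = expanded.replace("[\\%d]" % n, match.group(n) or "")
        results.append(expanded)
    return results

# The tweet text from the example above.
tweet = ("High Volume Stock: stock analysis website - $ABC - "
         "http://www.dojispace.com/stock-picks/amerisourcebergen-stock-price-ABC.aspx")

# A dollar sign followed by any run of non-whitespace characters.
pattern = r"\$([^\s]*)"

print(object_search(tweet, pattern, r"$[\1]"))
print(object_search(tweet, pattern, r"http://dbpedia.org/resource/[\1]"))
```

Each returned string corresponds to one object that would be asserted with the configured predicate on the tweet's subject; a tweet mentioning several stocks would yield several strings.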

Example: Reusing the cell's value

Take the entire value of the cell and construct a URL with it:

      conversion:enhance [
         ov:csvCol          3;
         ...
         conversion:object_search [
            conversion:regex     "^(.*)$";
            conversion:predicate foaf:homepage;
            conversion:object    "http://www.ncbi.nlm.nih.gov/pubmed/[\\1]";
         ];
      ];
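As a rough illustration of what the `^(.*)$` pattern does, the Python sketch below captures the whole cell value as group 1 and substitutes it into the object template. The helper name `expand_whole_cell` and the sample cell value `12345678` are hypothetical and chosen only for illustration; this is not the converter's own code.

```python
import re

def expand_whole_cell(cell_value, object_template):
    """Hypothetical illustration: capture the entire cell value as
    group 1 and substitute it for [\\1] in the object template."""
    match = re.match(r"^(.*)$", cell_value)
    if match is None:
        # For example, a value containing an internal line break,
        # since "." does not match newlines by default.
        return None
    return object_template.replace(r"[\1]", match.group(1))

# "12345678" is a made-up cell value standing in for a PubMed identifier.
print(expand_whole_cell("12345678", r"http://www.ncbi.nlm.nih.gov/pubmed/[\1]"))
```

Because the pattern is anchored at both ends, it matches at most once per cell, so at most one foaf:homepage object is produced per row.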

See also
