
Executive summary

timrdf edited this page Mar 4, 2011 · 33 revisions

Our methodology for converting tabular literals to a Resource Description Framework (RDF) representation enables answers to novel questions that previously could not be answered, because it explicitly connects previously disconnected datasets so that they can be (and have been) queried in a uniform fashion.

(todo :-)

  • Provenance-inspired naming of datasets and the entities they describe (using "the essential three": source, dataset, and version).
  • Minimal effort to obtain initial RDF from tabular formats. Get what you need and quickly move on to the rest of your application.
  • Declarative interpretation parameters control resulting RDF structure.
  • Parallels RDFS and OWL axioms, but applies to tabular literals instead of existing RDF.
  • Provides backwards-compatible enhancements to the initial verbatim RDF interpretation (using a layered predicate design).
  • Leverages previous enhancement parameters via an include mechanism.
  • Leverages RDF output of previous conversions as enhancement parameters for subsequent conversions.
  • Abbreviated description of resulting structure (no need to dig into custom code).
  • Uniform treatment and results across dataset applications.
  • No immediate need to worry about how to name resources (cf. Krextor).
  • No immediate concern over how to name vocabulary classes and predicates; the defaults are sensible (cf. Krextor).
  • Clean CURIE handling makes the resulting RDF slightly easier to read (cf. Krextor).
  • Correctly oriented paradigm: look forward and tweak the end result rather than look back and pick pieces out; everything gets through by default (cf. Krextor).
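To make the "verbatim interpretation" idea above concrete, here is a minimal sketch of converting a tabular row to N-Triples, with subject and predicate URIs built from the source/dataset/version naming triple. The base URI, the `thing_N` subject pattern, the `vocab/` predicate path, and the parameter shape are all illustrative assumptions, not csv2rdf4lod's actual conventions.

```python
# Hypothetical sketch: verbatim conversion of CSV cells to N-Triples.
# BASE, the URI layout, and all naming choices are assumptions for
# illustration; they are NOT csv2rdf4lod's real output structure.
import csv
import io

BASE = "http://example.org"  # assumed base URI

def convert(csv_text, source, dataset, version):
    """Emit one N-Triples line per (row, column) cell, verbatim."""
    triples = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # "The essential three": source, dataset, version scope every URI.
    ds = f"{BASE}/source/{source}/dataset/{dataset}/version/{version}"
    for row_num, row in enumerate(reader, start=1):
        subject = f"{ds}/thing_{row_num}"
        for col_name, cell in zip(header, row):
            # Verbatim: column header becomes a dataset-scoped predicate,
            # cell value becomes a plain literal. Enhancements would layer
            # typed, linked structure on top of this without removing it.
            local = col_name.strip().lower().replace(" ", "_")
            predicate = f"{ds}/vocab/{local}"
            triples.append(f'<{subject}> <{predicate}> "{cell}" .')
    return triples

csv_text = "Agency Name,State\nEPA,DC\nUSGS,VA\n"
for t in convert(csv_text, "example-gov", "agencies", "2011-Mar-04"):
    print(t)
```

The point of the sketch is the paradigm in the last bullet: every cell gets through by default as a dataset-scoped triple, and enhancement parameters then refine the result rather than select what to extract.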

Evidence

  • Number of triples in the verbatim interpretation parameters vs. number of triples in the enhanced interpretation parameters.
  • Percentage increase in parameter size from verbatim to enhanced, compared to the percentage increase in the number of output triples from verbatim to enhanced.
  • Number of triples in the verbatim interpretation's output vs. number of triples in the enhanced interpretation's output.
  • Vocabulary reuse distribution in the verbatim output vs. in the enhanced output.
  • Vocabulary "depth": dataset-scoped vocabulary ranks low; widely reused vocabularies such as FOAF rank high.
  • Connectivity to other datasets via shared entities, owl:sameAs, and common predicates/classes.
  • Histogram over conversion:num_invocation_logs.
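The first two metrics above reduce to simple count comparisons. A minimal sketch, with made-up illustrative numbers (not measured results from any conversion):

```python
# Hypothetical sketch of the parameter-size vs. output-size evidence
# metrics. All counts below are invented for illustration only.

def pct_increase(before, after):
    """Percentage increase from `before` to `after`."""
    return 100.0 * (after - before) / before

# Illustrative counts, not real data:
verbatim_param_triples, enhanced_param_triples = 10, 45
verbatim_output_triples, enhanced_output_triples = 1200, 3100

param_growth = pct_increase(verbatim_param_triples, enhanced_param_triples)
output_growth = pct_increase(verbatim_output_triples, enhanced_output_triples)

print(f"parameters grew {param_growth:.0f}%, output grew {output_growth:.0f}%")
```

The interesting comparison is the ratio: a small percentage growth in declarative parameters yielding a large percentage growth in (better-structured) output triples would support the minimal-effort claim in the summary above.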
