
Executive summary

timrdf edited this page Mar 5, 2011 · 33 revisions

Our methodology for converting tabular literals to the Resource Description Framework (RDF) enables answers to novel questions by establishing explicit connections among previously disconnected datasets. We provide a simple, minimal-effort entry path to "just getting it done", so that applications can be rapidly developed without concern for a wealth of information design considerations that our system handles without user effort. In addition to providing a "quick and easy" way to get RDF, the information design is prepared to provide backward-compatible, iterative improvements of the data as time and needs permit.

The benefits of our initial lightweight organizational structure become invaluable for managing the number and heterogeneity of the datasets that one may accumulate. To incorporate an additional dataset, the only information required from a human curator is three local identifiers: one for the source from which they obtained the data, one for the dataset by which the source refers to its data, and one for the version of the concrete artifacts obtained from the source.
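The three curator-supplied identifiers can be composed directly into dataset URIs. The sketch below illustrates the idea; the base URI, the path pattern, and the example identifiers are all assumptions for illustration, not the converter's fixed layout.

```python
# Illustrative sketch: minting a dataset URI from "the essential three"
# identifiers (source, dataset, version). BASE and the /source/.../dataset/
# .../version/... path pattern are hypothetical, chosen for readability.

BASE = "http://example.org"  # hypothetical base URI


def dataset_uri(source_id: str, dataset_id: str, version_id: str) -> str:
    """Compose a dataset URI from the three curator-supplied identifiers."""
    return f"{BASE}/source/{source_id}/dataset/{dataset_id}/version/{version_id}"


# Hypothetical example: a fuel-economy dataset obtained from epa-gov.
uri = dataset_uri("epa-gov", "fuel-economy", "2011-Mar-05")
```

Because each identifier is scoped by the one before it (versions within a dataset, datasets within a source), new versions and new datasets can be added without renaming anything already published.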

(todo :-)

  • Provenance-inspired naming of datasets and the entities they describe (using "the essential three": source, dataset, and version).
  • Minimal effort to obtain initial RDF from tabular formats. Get what you need and quickly move on to the rest of your application.
  • Declarative interpretation parameters control resulting RDF structure.
  • Parallels RDFS and OWL axioms, but applies to tabular literals instead of existing RDF.
  • Provides backwards-compatible enhancements to the initial verbatim RDF interpretation (using layered predicate design).
  • Leverages previous enhancement parameters via an include mechanism.
  • Leverages RDF output of previous conversions as enhancement parameters for subsequent conversions.
  • Abbreviated description of resulting structure (no need to dig into custom code).
  • Uniform treatment and results across dataset applications.
  • No immediate need to worry about what to name resources (cmp. Krextor).
  • No immediate concern for how to name vocabulary classes and predicates (really nice defaults). (cmp. Krextor)
  • Nice CURIE handling (slightly easier-to-read RDF). (cmp. Krextor)
  • Correctly oriented paradigm: look forward and tweak the end result instead of looking back and picking pieces out; everything gets through by default. (cmp. Krextor)
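The "minimal effort" entry point above can be pictured as a verbatim conversion: every row becomes a subject and every column header becomes a predicate, with no naming decisions required from the user. This is only a sketch of that idea; the base URI, function names, and CSV sample are illustrative assumptions, not the converter's actual API.

```python
# Minimal sketch of a "verbatim" tabular-to-RDF interpretation: one
# triple per non-empty cell, subjects named by row number and predicates
# named by column header. All names here are hypothetical.
import csv
import io

BASE = "http://example.org/dataset/example"  # hypothetical base URI


def verbatim_triples(csv_text: str):
    """Yield (subject, predicate, literal) triples, one per non-empty cell."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader, start=1):
        subject = f"{BASE}/row/{i}"
        for header, value in row.items():
            if value:  # skip empty cells
                yield (subject, f"{BASE}/column/{header}", value)


sample = "name,city\nAlice,Troy\nBob,Albany\n"
triples = list(verbatim_triples(sample))
```

Later enhancement parameters could then map a column predicate such as `column/name` to a shared vocabulary term without disturbing the verbatim triples, which is what makes the improvements backwards-compatible.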

Evidence

  • number of triples in the verbatim interpretation parameters vs. number of triples in the enhanced interpretation parameters.
  • percentage increase in the size of the interpretation parameters from raw to enhanced, compared to the percentage increase in the number of triples from raw to enhanced.
  • number of triples in the verbatim interpretation vs. number of triples in the enhanced interpretation.
  • vocabulary reuse distribution in verbatim vs. vocabulary reuse distribution in enhanced.
  • vocabulary "depth" - dataset-scoped vocabulary is too low; widely reused vocabulary such as FOAF is high.
  • connectivity to other datasets via shared entities, owl:sameAs, common predicates/classes.
  • histogram of conversion:num_invocation_logs
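One of the evidence metrics above, the vocabulary reuse distribution, could be computed by grouping a conversion's predicates by namespace and comparing the verbatim and enhanced counts. The predicate lists below are hypothetical examples, not measured data.

```python
# Sketch of the vocabulary reuse metric: count how many predicates fall
# in each namespace, for a verbatim vs. an enhanced conversion. The
# example predicate URIs are hypothetical.
from collections import Counter


def namespace(uri: str) -> str:
    """Truncate a URI after its last '/' or '#' to get its namespace."""
    cut = max(uri.rfind("/"), uri.rfind("#"))
    return uri[: cut + 1]


def vocab_distribution(predicates):
    """Map each namespace to the number of predicates drawn from it."""
    return Counter(namespace(p) for p in predicates)


# Verbatim output uses only dataset-scoped column predicates; the
# enhanced output replaces one with a shared FOAF term.
verbatim = ["http://example.org/ds/column/name", "http://example.org/ds/column/city"]
enhanced = ["http://xmlns.com/foaf/0.1/name", "http://example.org/ds/column/city"]
```

A shift of mass from the dataset-scoped namespace toward shared vocabularies such as FOAF would be the signal of increased vocabulary reuse.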
