
Data Quality

Tim L edited this page Jun 19, 2013 · 40 revisions

Quality data...

  • ... is structured similarly to dataset X using a uniform vocabulary.
  • ... is structured similarly to dataset X.
  • ... I [dis]agree with.
  • ... I understand.
  • ... is complete.
  • ... explicitly connects to the data currently portrayed in visual artifact X. (e.g. the two pages of a book that are currently visible)
  • ... explicitly connects to the data portrayed in visual artifact X. (e.g. an entire book that has yet to be opened)
  • ... explicitly connects to dataset X.
  • ... I find interesting.
  • ... explicitly connects to other datasets. (i.e. TBL-5)
  • ... is in RDF that I can retrieve as a dump.
  • ... is in RDF that I can retrieve via SPARQL query.
  • ... is in RDF that I can retrieve with dereferenceable URIs.
  • ... is in RDF that I can retrieve.
  • ... is in RDF. (i.e. TBL-4)
  • ... I can retrieve and is machine processable using my own (or open) tools. (i.e. TBL-3)
  • ... I can retrieve and is machine processable. (i.e. TBL-2)
  • ... I may and can retrieve. (i.e. TBL-1)
  • ... I may retrieve. (i.e. open)
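Read from the bottom up, the items tagged TBL-1 to TBL-5 follow Tim Berners-Lee's 5-star linked open data scheme, in which each star presupposes the ones below it. The Python sketch below encodes that cumulative reading, assuming a dataset can be summarized by a handful of boolean facts. `DatasetProfile`, its fields, and `tbl_stars` are hypothetical names chosen for illustration, and the finer-grained rungs (dump vs. SPARQL vs. dereferenceable URIs, connections to particular datasets or visual artifacts) are left out.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    # Hypothetical boolean facts about a dataset; field names are illustrative only.
    open_license: bool = False             # "I may retrieve" (open)
    retrievable: bool = False              # "I can retrieve"
    machine_processable: bool = False      # TBL-2
    open_format: bool = False              # processable with my own (or open) tools, TBL-3
    is_rdf: bool = False                   # TBL-4
    links_to_other_datasets: bool = False  # TBL-5


# Rungs ordered from least to most demanding.
TBL_RUNGS = [
    ("TBL-1", lambda d: d.open_license and d.retrievable),
    ("TBL-2", lambda d: d.machine_processable),
    ("TBL-3", lambda d: d.open_format),
    ("TBL-4", lambda d: d.is_rdf),
    ("TBL-5", lambda d: d.links_to_other_datasets),
]


def tbl_stars(d: DatasetProfile) -> int:
    """Count consecutive rungs satisfied from the bottom; stop at the first failure."""
    stars = 0
    for _name, check in TBL_RUNGS:
        if not check(d):
            break
        stars += 1
    return stars


print(tbl_stars(DatasetProfile(open_license=True, retrievable=True,
                               machine_processable=True)))  # 2
```

Because the rungs are cumulative, an RDF dataset that nobody may retrieve still scores zero in this sketch.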

Situating this list:

Related work

Hausenblas 2008

  • Number of triples for each dataset
  • Number of datasets
  • Number of interlinks [from each dataset [to each other dataset]]
  • Accessibility: via data dump, via SPARQL query, or via crawling
  • For each property in a dataset, the number of extra-namespace URI values that are dereferenceable
  • Density: number of extra-namespace URI values in a dataset / size of the dataset
  • Density: number of extra-namespace URI values in a dataset / number of instances of a given class in the dataset
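Most of these counts reduce to simple passes over a dataset's triples. A minimal Python sketch, assuming each dataset is identified by a namespace prefix, triples are plain `(subject, predicate, object)` string tuples, any object starting with `http://` or `https://` is a URI, an interlink is a triple whose object lies in another dataset's namespace, and "size of the dataset" means its triple count. The function names and the `a.example`/`b.example` data are hypothetical. The dereferenceability check from the per-property item is omitted because it needs live HTTP requests, so only the per-property counting half is shown.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def is_uri(term):
    # Simplifying assumption: anything that is not an http(s) URI is a literal.
    return term.startswith(("http://", "https://"))


def is_extra_namespace(term, namespace):
    # A URI value that lies outside the dataset's own namespace.
    return is_uri(term) and not term.startswith(namespace)


def extra_namespace_values(triples, namespace):
    return [o for _, _, o in triples if is_extra_namespace(o, namespace)]


def extra_namespace_by_property(triples, namespace):
    # Per-property counts of extra-namespace values (dereferencing not checked).
    return Counter(p for _, p, o in triples if is_extra_namespace(o, namespace))


def interlinks(triples, target_namespace):
    # Triples whose object URI falls inside another dataset's namespace.
    return sum(1 for _, _, o in triples if is_uri(o) and o.startswith(target_namespace))


def interlink_matrix(datasets):
    # Interlink counts from each dataset to each other dataset.
    return {(src, dst): interlinks(triples, dst)
            for src, triples in datasets.items()
            for dst in datasets if dst != src}


def density_by_size(triples, namespace):
    # Size of the dataset is taken to be its number of triples.
    return len(extra_namespace_values(triples, namespace)) / len(triples)


def density_by_class(triples, namespace, cls):
    # Denominator: distinct subjects typed as `cls` via rdf:type.
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    return len(extra_namespace_values(triples, namespace)) / len(instances)


# Hypothetical toy datasets.
A, B = "http://a.example/", "http://b.example/"
datasets = {
    A: [
        (A + "alice", RDF_TYPE, A + "Person"),
        (A + "bob", RDF_TYPE, A + "Person"),
        (A + "alice", OWL_SAME_AS, B + "alice"),
        (A + "bob", A + "knows", A + "alice"),
        (A + "bob", A + "name", "Bob"),
    ],
    B: [(B + "alice", B + "name", "Alice")],
}

print(len(datasets), {ns: len(ts) for ns, ts in datasets.items()})
print(interlink_matrix(datasets))
print(density_by_size(datasets[A], A), density_by_class(datasets[A], A, A + "Person"))  # 0.2 0.5
```

Prefix matching on namespaces is the weakest assumption here; real datasets often mix several namespaces, so a production version would need an explicit list of namespaces per dataset.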

Tummarello 2007

  • Distribution of URIs over documents
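One way to read "distribution of URIs over documents" is to count, for each URI, how many distinct documents mention it, and then histogram those counts. A minimal Python sketch under that reading, which is illustrative and not necessarily the cited paper's exact method; the input format (a mapping from a document identifier to the URIs it mentions) and the function names are hypothetical.

```python
from collections import Counter


def uri_document_frequency(documents):
    """Map each URI to the number of distinct documents that mention it."""
    freq = Counter()
    for uris in documents.values():
        freq.update(set(uris))  # a URI repeated within one document counts once
    return freq


def frequency_histogram(freq):
    """Map k to the number of URIs that occur in exactly k documents."""
    return Counter(freq.values())


# Hypothetical documents and URIs.
docs = {
    "doc1": ["http://a.example/u1", "http://a.example/u2", "http://a.example/u1"],
    "doc2": ["http://a.example/u1", "http://a.example/u3"],
    "doc3": ["http://a.example/u1"],
}
freq = uri_document_frequency(docs)
print(sorted(frequency_histogram(freq).items()))  # [(1, 2), (3, 1)]
```

The same counting works at the dataset level by keying the input on dataset instead of document.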

Ding 2005

  • Distribution of URIs over documents
  • Interlinking

Wang 2006

  • Schema-level gauges

Helena's survey
