
Querying datasets created by csv2rdf4lod

timrdf edited this page Mar 15, 2011 · 51 revisions

Other pages showing queries

What version of the converter was used?

Show a timeline of which version of the converter was used and how many datasets were converted with it (results):

PREFIX dcterms:    <http://purl.org/dc/terms/>
PREFIX doap:       <http://usefulinc.com/ns/doap#>
PREFIX pmlp:       <http://inference-web.org/2.0/pml-provenance.owl#>
PREFIX pmlj:       <http://inference-web.org/2.0/pml-justification.owl#>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
PREFIX void:       <http://rdfs.org/ns/void#>

SELECT DISTINCT (max(?date) AS ?modified) ?engine ?revision (count(?dataset) AS ?count)
WHERE {
  GRAPH <http://logd.tw.rpi.edu/vocab/Dataset>  {

    ?dataset a conversion:VersionedDataset; 
             void:dataDump ?dumpFile .
    optional { ?dataset dcterms:modified ?date }

    ?ns pmlj:hasConclusion ?dumpFile;
        pmlj:isConsequentOf [
           a pmlj:InferenceStep;
           pmlj:hasInferenceEngine ?engine
        ]
    .
    optional { ?engine doap:revision      ?revision   }
  }
} group by ?engine ?revision order by ?modified ?count
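
To turn the result of this query into an actual timeline, a client has to flatten the endpoint's response into rows. A minimal sketch in Python, assuming the endpoint returns the W3C SPARQL Query Results JSON format; the function name `rows_from_sparql_json` and the sample response (engine URIs, revisions, counts) are hypothetical and are not taken from the live endpoint.

```python
import json


def rows_from_sparql_json(doc):
    """Flatten a SPARQL JSON results document into a list of dicts.

    Variables left unbound by an OPTIONAL clause come back as None.
    """
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        rows.append({v: (binding[v]["value"] if v in binding else None)
                     for v in variables})
    return rows


# Hypothetical response; every value below is illustrative only.
SAMPLE = json.loads("""
{
  "head": {"vars": ["modified", "engine", "revision", "count"]},
  "results": {"bindings": [
    {"modified": {"type": "literal", "value": "2011-01-10T12:00:00-05:00"},
     "engine":   {"type": "uri",     "value": "http://example.org/engine/a"},
     "revision": {"type": "literal", "value": "r100"},
     "count":    {"type": "literal", "value": "12"}},
    {"engine":   {"type": "uri",     "value": "http://example.org/engine/b"},
     "count":    {"type": "literal", "value": "3"}}
  ]}
}
""")

timeline = rows_from_sparql_json(SAMPLE)
for row in timeline:
    print(row["modified"], row["engine"], row["revision"], row["count"])
```

The second sample row has no `modified` or `revision` binding, which is what the two `optional` clauses in the query would produce for an engine without that metadata.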

Drafts

Is there a way to know which datasets are fully loaded in the SPARQL endpoint?

Was there any update on this question? Does running the SPARQL query below at http://logd.tw.rpi.edu/sparql return the complete list of loaded datasets?

PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
PREFIX void:       <http://rdfs.org/ns/void#>

SELECT ?g (sum(?triples) AS ?estimated_triples)
WHERE {
  GRAPH ?g {
    ?g void:subset ?subdataset .
    ?subdataset conversion:num_triples ?triples .
    FILTER regex(str(?g), "data-gov")
  }
}
GROUP BY ?g
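
For running a query like this from a script instead of the web form, here is a minimal sketch in Python that builds (but does not send) an HTTP GET request following the SPARQL Protocol. The endpoint URL is the one named in the question; the helper name `build_request` is hypothetical, the query is written in standard SPARQL 1.1 syntax, and whether the endpoint honours the `Accept` header is an assumption.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

ENDPOINT = "http://logd.tw.rpi.edu/sparql"  # endpoint named in the question

QUERY = """\
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
PREFIX void:       <http://rdfs.org/ns/void#>
SELECT ?g (sum(?triples) AS ?estimated_triples)
WHERE {
  GRAPH ?g {
    ?g void:subset ?subdataset .
    ?subdataset conversion:num_triples ?triples .
    FILTER regex(str(?g), "data-gov")
  }
}
GROUP BY ?g
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request; the query goes in `query=`."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for JSON results; support for this media type is assumed.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
# urllib.request.urlopen(request) would send it; that step is left out here.
```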

bad sources:

PREFIX dcterms:    <http://purl.org/dc/terms/>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>

SELECT (count(distinct ?organization) AS ?count)
WHERE {    
  graph <http://logd.tw.rpi.edu/vocab/Dataset> {      
    ?dataset a conversion:Dataset;
             dcterms:source ?organization .
    filter(!regex(str(?organization),".*provenance_file.*"))
  }
}

Latest version of a dataset

results:

PREFIX void:    <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?subset ?modified
WHERE {
  GRAPH <http://logd.tw.rpi.edu/vocab/Dataset> {
    <http://logd.tw.rpi.edu/source/data-gov/dataset/92> void:subset ?subset .
    optional { ?subset dcterms:modified ?modified }
  }
} ORDER BY DESC(?modified)
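
Because `dcterms:modified` is optional in this query, some subsets may come back without a date. A minimal sketch in Python of picking the latest version on the client side while ignoring undated rows; the helper name `latest_subset`, the example.org URIs, and the timestamps are hypothetical, and the dates are assumed to be ISO 8601 strings with a numeric UTC offset.

```python
from datetime import datetime


def latest_subset(rows):
    """Return the row with the most recent 'modified' value, or None.

    Rows whose 'modified' is missing or None are skipped.
    """
    dated = [row for row in rows if row.get("modified")]
    if not dated:
        return None
    return max(dated, key=lambda row: datetime.fromisoformat(row["modified"]))


# Hypothetical rows, shaped like the ?subset / ?modified result columns.
ROWS = [
    {"subset": "http://example.org/subset/a",
     "modified": "2010-08-30T09:00:00-04:00"},
    {"subset": "http://example.org/subset/b",
     "modified": "2011-01-12T15:30:00-05:00"},
    {"subset": "http://example.org/subset/c", "modified": None},
]

latest = latest_subset(ROWS)
print(latest["subset"])
```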

Latest dump file

Alvaro is using this query http://logd.tw.rpi.edu/query/logd-data-list-latest-dump-file-for-dataset.sql to obtain the latest dump file for a dataset. However, dump files appear for only some datasets (see http://logd.tw.rpi.edu/datasets).

PREFIX foaf:       <http://xmlns.com/foaf/0.1/>
PREFIX dcterms:    <http://purl.org/dc/terms/>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
PREFIX void:       <http://rdfs.org/ns/void#>

SELECT distinct ?dataset ?dump_file
WHERE {

 graph <http://logd.tw.rpi.edu/vocab/Dataset> {
       ?dataset
            a conversion:Dataset;
            void:subset ?version .
       ?version a conversion:VersionedDataset .

  optional {
   ?version void:subset  ?layer .
   {
    {
     ?layer 
            void:dataDump ?dump_file ;
            dcterms:created ?creationtime .
    }
    UNION
    {
     ?discriminator conversion:num_triples ?triples .
     ?layer  
             void:dataDump ?dump_file ;
             dcterms:created ?creationtime .
    }
   }
  }
 }
}
ORDER BY DESC(?creationtime)

Historical notes

(a few more sprinkled around)

Provenance queries

Trying to get to the parameter files (so we can count their triples to quantify the effort needed to create them).

Use case: find the parameters used during the conversion. (Querying this is currently difficult and needs to be made easier.)

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX pmlp: <http://inference-web.org/2.0/pml-provenance.owl#>
PREFIX pmlj: <http://inference-web.org/2.0/pml-justification.owl#>

PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
SELECT distinct ?conclusion
WHERE {
  GRAPH <http://logd.tw.rpi.edu/vocab/Dataset>  {
    ?versioned rdf:first ?thing .
    ?thing pmlj:hasConclusion ?conclusion .
    ?conclusion pmlp:hasFormat <http://inference-web.org/registry/FMT/RDFAbstractSyntax.owl#RDFAbstractSyntax> .
  }
}

Finding all of the sources:

PREFIX pmlp: <http://inference-web.org/2.0/pml-provenance.owl#>
PREFIX irw:  <http://www.ontologydesignpatterns.org/ont/web/irw.owl#>

SELECT ?url
WHERE {
  GRAPH <http://logd.tw.rpi.edu/vocab/Dataset> {
    ?url a pmlp:Source .
    OPTIONAL { ?url irw:redirectsTo ?none }
    FILTER(!bound(?none))
  }
}
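
The `optional` plus `filter(!bound(...))` combination in this query is the usual SPARQL idiom for negation: keep every `pmlp:Source` that has no `irw:redirectsTo` statement. A minimal sketch in Python of the same selection over an in-memory list of triples; the helper name `sources_without_redirect` and the example.org URIs are hypothetical, and prefixed names are kept as plain strings for brevity.

```python
def sources_without_redirect(triples):
    """Return sorted subjects typed pmlp:Source that have no irw:redirectsTo."""
    sources = {s for (s, p, o) in triples
               if p == "rdf:type" and o == "pmlp:Source"}
    redirected = {s for (s, p, o) in triples if p == "irw:redirectsTo"}
    # Set difference plays the role of OPTIONAL + FILTER(!bound(?none)).
    return sorted(sources - redirected)


# Hypothetical data: /a redirects to /b, so only /b and /c are kept.
TRIPLES = [
    ("http://example.org/a", "rdf:type", "pmlp:Source"),
    ("http://example.org/a", "irw:redirectsTo", "http://example.org/b"),
    ("http://example.org/b", "rdf:type", "pmlp:Source"),
    ("http://example.org/c", "rdf:type", "pmlp:Source"),
]

final_sources = sources_without_redirect(TRIPLES)
print(final_sources)
```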
