Assisting vocabulary selection
How does DataFAQs play a role in vocabulary selection? Would DataFAQs be used as part of an iterative process?
Yes. And Yes.
The vocabulary that one chooses to model their domain is critically important. Although many vocabularies may adequately communicate the topic of our interests, some vocabularies have more practical value than others.
To take an example from our most recent conversion, consider two alternate RDF forms of the same tabular row:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix local_vocab: <http://logd.tw.rpi.edu/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/vocab/> .
@prefix e1: <http://logd.tw.rpi.edu/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/vocab/enhancement/1/> .
@prefix biographical-directory-of-the-united-states-congress: <http://localhost/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/> .
@prefix value_of_state: <http://localhost/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/value-of/state/> .
@prefix : <http://logd.tw.rpi.edu/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/version/2012-Jan-04/> .
:congressperson_49
   dcterms:isReferencedBy <http://logd.tw.rpi.edu/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/version/2012-Jan-04> ;
   void:inDataset <http://logd.tw.rpi.edu/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/version/2012-Jan-04> ;
   a local_vocab:Congressperson , foaf:Person ;
   foaf:firstName "John" ;
   foaf:family_name "BULL" ;
   e1:congress biographical-directory-of-the-united-states-congress:congress_0 ;
   foaf:memberOf biographical-directory-of-the-united-states-congress:congress_0 ; # sic
   foaf:workInfoHomepage <http://bioguide.congress.gov/scripts/biodisplay.pl?index=B001047> ,
                         <http://bioguide.congress.gov/scripts/guidedisplay.pl?index=B001047> ,
                         <http://bioguide.congress.gov/scripts/bibdisplay.pl?index=B001047> ;
   con:preferredURI biographical-directory-of-the-united-states-congress:B001047 ;
   prov:specializationOf biographical-directory-of-the-united-states-congress:B001047 ;
   e1:doc "2012-01-04T02:12:01" ;
   dbpediaprop:state value_of_state:SC ;
.

value_of_state:SC
   dcterms:identifier "SC" ;
   rdfs:label "SC" ;
   owl:sameAs dbpedia:South_Carolina ,
              <http://sws.geonames.org/4597040/> ,
              govtrackusgov:SC .
Many semantic web developers would agree that some of the modeling above is slightly better than the modeling that follows:
@prefix : <http://logd.tw.rpi.edu/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/version/2012-Jan-04/> .
@prefix raw: <http://logd.tw.rpi.edu/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/vocab/raw/> .

:thing_49
   dcterms:isReferencedBy <http://localhost/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/version/2012-Jan-04> ;
   void:inDataset <http://localhost/source/congress-gov/dataset/biographical-directory-of-the-united-states-congress/version/2012-Jan-04> ;
   raw:first_name "John" ;
   raw:last_name "BULL" ;
   raw:congress "0" ;
   raw:p_url "http://bioguide.congress.gov/scripts/biodisplay.pl?index=B001047" ;
   raw:doc "2012-01-04T02:12:01" ;
   raw:state "SC" ;
   raw:death "1802" ;
   raw:birth "1740c" ;
   raw:party " " ;
   raw:position "ContCong" ;
   raw:c_yr "" ;
   ov:csvRow "49"^^xsd:integer .
But what, exactly, is better about it? Well, lots of things. Different people care about different aspects of the difference shown above. Some claims about quality may include:
- foaf:firstName is way better than raw:first_name because 400 systems recognize it and display it.
- raw:p_url as a URI and label is incomprehensible to anyone who did not build this database. And its value is a literal, which means that RDF agents will not know that it can be resolved on the web. Using foaf:workInfoHomepage is way better because it already exists to associate a person with their work homepages. And systems recognize FOAF already. And people know FOAF already.
- e1:congress is way better than raw:congress because its value is a URI that can be further described. Being stuck with raw:congress's value "0" is very uninformative. What do I do with zero? At the very least, we can type biographical-directory-of-the-united-states-congress:congress_0 and start describing its temporal interval, etc.
- ACK! Someone started using foaf:memberOf, when that URI is not defined in the FOAF namespace! That violates Linked Data principles. On the other hand, it's pretty obvious what it is -- it's the inverse of foaf:member, and we can use it and have systems recognize it even without the FOAF Elite defining it in their vocabulary. Practicality can trump principles. Depending on who you ask.
- We might not know what local_vocab:Congressperson is, but at least we know it's a kind of person (foaf:Person). We can work with that.
- dbpediaprop:state :SC is way better than raw:state "SC" because lots of people run to DBpedia for example data, so more people will start using dbpediaprop:state. But when more people start using it without clear, established rules, they'll use it inconsistently. So the relation will have many meanings and runs the risk of becoming meaningless.
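Two of the claims above (whether a predicate comes from a vocabulary that tools already recognize, and whether a web address is stuck in a literal) are mechanical enough to check automatically. The following is a minimal sketch of such a check, not DataFAQs code: the tuple layout, the `review` function, and the contents of `RECOGNIZED_NAMESPACES` are assumptions made for illustration, and a different stakeholder would plug in a different namespace list.

```python
from urllib.parse import urlparse

# Assumption: the namespaces that one particular stakeholder's tools understand.
RECOGNIZED_NAMESPACES = (
    "http://xmlns.com/foaf/0.1/",
    "http://purl.org/dc/terms/",
    "http://rdfs.org/ns/void#",
)

FOAF = "http://xmlns.com/foaf/0.1/"
RAW = ("http://logd.tw.rpi.edu/source/congress-gov/dataset/"
       "biographical-directory-of-the-united-states-congress/vocab/raw/")
HOMEPAGE = "http://bioguide.congress.gov/scripts/biodisplay.pl?index=B001047"


def looks_like_web_url(value):
    """True if a string parses as an http(s) URL with a host."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def review(triples):
    """Summarize a list of (subject, predicate, object, object_is_literal) tuples.

    Returns the share of triples whose predicate is in a recognized namespace,
    and the predicates whose literal values look like resolvable web URLs.
    """
    total = 0
    recognized = 0
    url_literals = []
    for _subject, predicate, obj, object_is_literal in triples:
        total += 1
        if predicate.startswith(RECOGNIZED_NAMESPACES):
            recognized += 1
        if object_is_literal and looks_like_web_url(obj):
            url_literals.append(predicate)
    share = recognized / total if total else 0.0
    return {"recognized_share": share, "url_literals": url_literals}


# A few triples from each of the two modelings shown earlier.
raw_row = [
    (":thing_49", RAW + "first_name", "John", True),
    (":thing_49", RAW + "last_name", "BULL", True),
    (":thing_49", RAW + "p_url", HOMEPAGE, True),
]
enhanced_row = [
    (":congressperson_49", FOAF + "firstName", "John", True),
    (":congressperson_49", FOAF + "family_name", "BULL", True),
    (":congressperson_49", FOAF + "workInfoHomepage", HOMEPAGE, False),
]

print(review(raw_row))
print(review(enhanced_row))
```

Under these assumptions, the raw modeling scores zero recognized predicates and is flagged for raw:p_url, while the enhanced modeling passes both checks. The result reflects only this stakeholder's namespace list; it is one perspective, not a verdict.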
DataFAQs is not designed to declare authoritative quality of the datasets it comes by. Instead, it is a framework to allow interested stakeholders to express, survey, and understand the aspects of quality that they and others value. This increased community understanding -- accelerated by automated, asynchronous feedback -- provides the basis for stakeholders to make better, more informed decisions about the vocabulary that they use. Those decisions are based on concrete, qualitative information that is provided by the community, for the community. DataFAQs just connects all of the dots, accumulates perspectives on datasets, and allows you to explore what the community thinks about your dataset.
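The idea of accumulating perspectives rather than issuing a single grade can be sketched in a few lines. This is an illustration only and does not reflect the actual DataFAQs interfaces: the `survey` function, the stakeholder names, and the two evaluators are hypothetical, and each dataset is reduced to the set of predicates it uses.

```python
def survey(datasets, evaluators):
    """Apply every stakeholder's evaluator to every dataset.

    Each opinion is stored separately; nothing is averaged into one score.
    """
    opinions = {}
    for dataset_name, predicates in datasets.items():
        opinions[dataset_name] = {
            stakeholder: evaluate(predicates)
            for stakeholder, evaluate in evaluators.items()
        }
    return opinions


# Hypothetical stakeholders, each encoding what they value.
evaluators = {
    # Satisfied if the dataset uses at least one FOAF term.
    "foaf_app_developer": lambda preds: any(p.startswith("foaf:") for p in preds),
    # Objects to foaf:memberOf, which the FOAF namespace does not define.
    "linked_data_purist": lambda preds: "foaf:memberOf" not in preds,
}

datasets = {
    "enhanced": {"foaf:firstName", "foaf:family_name", "foaf:memberOf", "e1:congress"},
    "raw": {"raw:first_name", "raw:last_name", "raw:congress"},
}

opinions = survey(datasets, evaluators)
for name, by_stakeholder in opinions.items():
    print(name, by_stakeholder)
```

In this toy run the two stakeholders disagree about both datasets, which is the point: the output is a record of who values what, and the dataset publisher decides which opinions matter for their use.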
DataFAQs can and will be used to assist vocabulary selection.
It is important to remember that DataFAQs is not only a resource that provides "grades" for datasets that you point it to. More importantly, it is a framework that allows any stakeholder to reflect their needs, interests, or preferences when it comes to the quality of any dataset.