
Script: pcurl.py

timrdf edited this page Jan 3, 2012 · 46 revisions

$CSV2RDF4LOD_HOME/bin/util/pcurl.py is Jim McCusker's reimplementation of pcurl.sh that adds FRBR stacks and HTTP-in-RDF descriptions. He has included it as part of csv2rdf4lod-automation. Applications of this utility are described in the following publications:

Usage

bash-3.2$ pcurl.py --help
usage: pcurl.py [--help|-h] [--format|-f xml|turtle|n3|nt] [url ...]

Download a URL and compute Functional Requirements for Bibliographic Resources
(FRBR) stacks using cryptographic digests for the resulting content.

Refer to http://purl.org/twc/pub/mccusker2012parallel
for more information and examples.

optional arguments:
 url            url to compute a FRBR stack for.
 -h, --help     Show this help message and exit.
 -f, --format   File format for FRBR stacks. One of xml, turtle, n3, or nt.
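pcurl.py names the content it retrieves by a cryptographic digest. The following is a minimal Python sketch of that idea, not pcurl.py's actual code. It assumes SHA-256 with padded URL-safe base64 and the `tag:tw.rpi.edu,2011:manifestation:sha-256-` prefix, both inferred from the frbr:exemplarOf value in the example later on this page. The helper names `manifestation_uri`, `local_name` and `fetch` are hypothetical. The RDF output (FRBR stack and HTTP-in-RDF description) is omitted entirely.

```python
import base64
import hashlib
import os
import urllib.parse
import urllib.request

# Assumption: URI pattern inferred from the frbr:exemplarOf example on this page,
# not taken from pcurl.py's source code.
MANIFESTATION_PREFIX = "tag:tw.rpi.edu,2011:manifestation:sha-256-"


def manifestation_uri(content: bytes) -> str:
    """Name a bitstream by its SHA-256 digest, encoded as padded URL-safe base64."""
    digest = hashlib.sha256(content).digest()
    return MANIFESTATION_PREFIX + base64.urlsafe_b64encode(digest).decode("ascii")


def local_name(url: str) -> str:
    """Hypothetical helper: save under the last segment of the URL path."""
    return os.path.basename(urllib.parse.urlparse(url).path)


def fetch(url: str) -> tuple[bytes, dict[str, str]]:
    """Retrieve a URL and return its body plus the HTTP response headers."""
    with urllib.request.urlopen(url) as resp:
        return resp.read(), dict(resp.headers.items())
```

For the raw.github.com URL used in the example on this page, `local_name` yields `pcurl.py`, which matches the file that appears in the directory listing there.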

fstack.py is closely related to pcurl.py. pcurl.py retrieves a URL and records its FRBR stack. fstack.py, by contrast, creates a FRBR stack for an existing local file.

bash-3.2$ fstack.py --help
usage: fstack.py [--help|-h] [--stdout|-c] [--format|-f xml|turtle|n3|nt] [--print-item] [--print-manifestation] [--print-expression] [--print-work] [-] [file ...]

Compute Functional Requirements for Bibliographic Resources (FRBR)
stacks using cryptographic digests.

Refer to http://purl.org/twc/pub/mccusker2012parallel
for more information and examples.

optional arguments:
 file                  File to compute a FRBR stack for.
 -                     Read content from stdin and print FRBR stack to stdout.
 -h, --help            Show this help message and exit.
 -c, --stdout          Print frbr stacks to stdout.
 --no-paths            Only output path hashes, not actual paths.
 -f, --format          File format for FRBR stacks. xml, turtle, n3, or nt.
 --print-item          Print URI of the Item and quit.
 --print-manifestation Print URI of the Manifestation and quit.
 --print-expression    Print URI of the Expression and quit.
 --print-work          Print URI of the Work and quit.
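fstack.py computes the same kind of digest for content that is already on disk, or for stdin when given `-`. The sketch below shows one way to do this in Python, not fstack.py's implementation. It streams the input in chunks so that large files need not fit in memory. It reuses the URI prefix assumed from the example on this page and prints something loosely analogous to `--print-manifestation`. The names `digest_stream` and `manifestation_of` are hypothetical, and none of fstack.py's other options are reproduced.

```python
import base64
import hashlib
import sys
from typing import BinaryIO

# Assumption: same URI pattern as the frbr:exemplarOf example on this page.
MANIFESTATION_PREFIX = "tag:tw.rpi.edu,2011:manifestation:sha-256-"


def digest_stream(stream: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Hash a binary stream incrementally and return a manifestation-style URI."""
    sha = hashlib.sha256()
    # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        sha.update(chunk)
    return MANIFESTATION_PREFIX + base64.urlsafe_b64encode(sha.digest()).decode("ascii")


def manifestation_of(source: str) -> str:
    """Digest a local file, or stdin when source is "-" (fstack.py's convention)."""
    if source == "-":
        return digest_stream(sys.stdin.buffer)
    with open(source, "rb") as f:
        return digest_stream(f)


if __name__ == "__main__":
    # Hypothetical usage: python manifestation_sketch.py mypcurl.py
    for arg in sys.argv[1:]:
        print(manifestation_of(arg))
```

The digest depends only on the bytes read. Running this on pcurl.py and on a byte-for-byte copy therefore prints the same URI.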

Example

The following command retrieves the latest pcurl.py script and stores it in your current directory. pcurl.py also writes a second file, pcurl.py.prov.ttl, that describes the provenance of the retrieved file.

bash-3.2$ pcurl.py https://raw.github.com/timrdf/csv2rdf4lod-automation/master/bin/util/pcurl.py
bash-3.2$ ls
pcurl.py.prov.ttl		pcurl.py

If the retrieved file is later copied or renamed, $CSV2RDF4LOD_HOME/bin/util/fstack.py can recognize the association between the downloaded file and the one you now have:

bash-3.2$ cp pcurl.py mypcurl.py
bash-3.2$ fstack.py mypcurl.py
bash-3.2$ ls
pcurl.py.prov.ttl	pcurl.py		mypcurl.py		mypcurl.py.prov.ttl

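The provenance files show that the different files pcurl.py and mypcurl.py have the same bitstream. For example, the frbr:Item for pcurl.py points to its frbr:Manifestation through frbr:exemplarOf: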

<tag:tw.rpi.edu,2011:filed:SVbQMPyfteayT_XeWKRnygrxhqoAMncsgdRwexQtugw=/sha-256-gvr2NDAF7C0HOGuGFEoYwIbs7mQit_TABy8hQJHIlhU=/pcurl.py>
   a frbr:Item;
   nfo:fileUrl <file:////Users/lebot/pcurl.py>,
               <pcurl.py>;
   dct:modified "2012-01-03T11:05:33"^^xsd:dateTime;
   frbr:exemplarOf <tag:tw.rpi.edu,2011:manifestation:sha-256-81X-JdHSWIdGwDaFk8Mlv8iW_TqlUpG2UCZh1ue04HU=>;

pcurl.py and mypcurl.py are different frbr:Items with the same frbr:Manifestation and frbr:Expression.
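This distinction can be reproduced in a few lines of Python. A byte-for-byte copy gets the same digest-based Manifestation, while its location changes. The sketch assumes an Item can be keyed by its absolute file path, which is a simplification of the `tag:tw.rpi.edu,2011:filed:` Item URIs shown above. The file contents and the helper names `manifestation_uri` and `item_key` are made up for illustration.

```python
import base64
import hashlib
import os
import shutil
import tempfile


def manifestation_uri(path: str) -> str:
    """Digest-based name for the bitstream stored at path (assumed URI pattern)."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).digest()
    return ("tag:tw.rpi.edu,2011:manifestation:sha-256-"
            + base64.urlsafe_b64encode(digest).decode("ascii"))


def item_key(path: str) -> str:
    """Hypothetical stand-in for an Item identifier: the file's absolute location."""
    return os.path.abspath(path)


with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "pcurl.py")
    copy = os.path.join(workdir, "mypcurl.py")
    with open(original, "wb") as f:
        f.write(b"print('placeholder content')\n")  # made-up contents
    shutil.copyfile(original, copy)  # like `cp pcurl.py mypcurl.py`

    same_manifestation = manifestation_uri(original) == manifestation_uri(copy)
    distinct_items = item_key(original) != item_key(copy)

print(same_manifestation, distinct_items)  # prints: True True
```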

More than just message digests (MD5, SHA-1, etc.)

If any character of mypcurl.py changes, its frbr:Item will have a different frbr:Manifestation and a different frbr:Expression from those of pcurl.py. This happens because these more abstract notions cannot be identified automatically for procedural Python instructions, so any change to the bytes is treated as a change to the content.
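A quick way to see why is to compare two made-up versions of a Python line. They behave identically but differ by a single space, and they still produce unrelated SHA-256 digests. A byte-level digest has no way to recognize them as the same Expression.

```python
import hashlib

# Two made-up variants that run identically; only the spacing differs.
before = b"print('hello')\n"
after = b"print( 'hello')\n"

digest_before = hashlib.sha256(before).hexdigest()
digest_after = hashlib.sha256(after).hexdigest()

# Equivalent behavior, but nothing in the two digests reveals it.
print(digest_before == digest_after)  # prints: False
```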

However, this shortcoming can be overcome when your files encode RDF instead of procedural code. To demonstrate this, we use $CSV2RDF4LOD_HOME/bin/util/tic.sh to obtain an (incomplete) RDF description of the Python script, including its author.

bash-3.2$ tic.sh mypcurl.py > mypcurl.py.ttl
bash-3.2$ cat mypcurl.py.ttl | grep "doap:developer"
    doap:developer twi:JamesMcCusker ;

Changing the serialization of the RDF describing mypcurl.py (here, from Turtle to RDF/XML with rapper) results in a new frbr:Manifestation. However, the new frbr:Item is associated with the same frbr:Expression as the first.

bash-3.2$ rapper -q -g -o rdfxml-abbrev mypcurl.py.ttl > mypcurl.py.ttl.rdf
bash-3.2$ fstack.py --no-paths mypcurl.py.ttl
bash-3.2$ fstack.py --no-paths mypcurl.py.ttl.rdf
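The sketch below illustrates why RDF content can keep its Expression across serializations. It hashes a graph's triples rather than its bytes. It parses simple N-Triples, normalizes the spacing between terms, drops duplicate triples, and sorts them before hashing. This simplified technique was chosen for illustration and is not necessarily how fstack.py computes Expression URIs. It has three further limitations: it rejects blank nodes, which need a proper graph canonicalization; it does not normalize literal lexical forms; and its example IRIs and function names are hypothetical.

```python
import base64
import hashlib
import re

# One N-Triples term: an IRI, or a literal with an optional language tag or datatype.
TERM = re.compile(r'<[^>]*>|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?')


def b64_sha256(data: bytes) -> str:
    return base64.urlsafe_b64encode(hashlib.sha256(data).digest()).decode("ascii")


def canonical_lines(ntriples: str) -> list[str]:
    """Normalize each triple's spacing, drop duplicates, and sort."""
    triples = set()
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        terms = TERM.findall(line)
        if len(terms) != 3:
            # Blank nodes (_:b0) and other syntax are out of scope for this sketch.
            raise ValueError(f"unsupported triple: {line}")
        triples.add(" ".join(terms) + " .")
    return sorted(triples)


def expression_digest(ntriples: str) -> str:
    """Digest of the triples, independent of statement order and spacing."""
    return b64_sha256("\n".join(canonical_lines(ntriples)).encode("utf-8"))


def manifestation_digest(text: str) -> str:
    """Digest of the exact bytes, as for any other file."""
    return b64_sha256(text.encode("utf-8"))


# Hypothetical graph, serialized twice with different order and spacing.
first = (
    '<http://example.org/tool> <http://example.org/developer> <http://example.org/jim> .\n'
    '<http://example.org/tool> <http://example.org/name> "pcurl" .\n'
)
second = (
    '<http://example.org/tool>  <http://example.org/name>  "pcurl" .\n'
    '\n'
    '<http://example.org/tool> <http://example.org/developer> <http://example.org/jim> .\n'
)

print(manifestation_digest(first) == manifestation_digest(second))  # prints: False
print(expression_digest(first) == expression_digest(second))  # prints: True
```

rapper can also serialize to N-Triples (`-o ntriples`), which would be a natural input for a function like `expression_digest`.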
