
Script: pcurl.py

timrdf edited this page Jan 3, 2012 · 46 revisions

$CSV2RDF4LOD_HOME/bin/util/pcurl.py is Jim McCusker's reimplementation of pcurl.sh, extended to include FRBR stacks and HTTP-in-RDF. He has included it as part of csv2rdf4lod-automation. Applications of this utility are described in the following publications:

Usage

bash-3.2$ pcurl.py --help
usage: pcurl.py [--help|-h] [--format|-f xml|turtle|n3|nt] [url ...]

Download a URL and compute Functional Requirements for Bibliographic Resources
(FRBR) stacks using cryptograhic digests for the resulting content.

Refer to http://purl.org/twc/pub/mccusker2012parallel
for more information and examples.

optional arguments:
 url            url to compute a FRBR stack for.
 -h, --help     Show this help message and exit,
 -f, --format   File format for FRBR stacks. One of xml, turtle, n3, or nt.

Example

The following command retrieves the latest pcurl.py script and stores it in a file in your current directory. The script also writes a second file describing the provenance of the one retrieved.

bash-3.2$ pcurl.py https://raw.github.com/timrdf/csv2rdf4lod-automation/master/bin/util/pcurl.py
bash-3.2$ ls
pcurl.py.prov.ttl		pcurl.py

If something happens to the file you retrieved (e.g., it is copied or renamed), $CSV2RDF4LOD_HOME/bin/util/fstack.py can be used to recognize the association between the downloaded file and the one we see now:

bash-3.2$ cp pcurl.py mypcurl.py
bash-3.2$ fstack.py mypcurl.py
bash-3.2$ ls
pcurl.py.prov.ttl	pcurl.py		mypcurl.py		mypcurl.py.prov.ttl

pcurl.py and mypcurl.py are different frbr:Items with the same frbr:Manifestation and frbr:Expression. If any character of mypcurl.py changes, the derived frbr:Item will have a different frbr:Manifestation and frbr:Expression from those of pcurl.py, because these more abstract notions cannot be identified automatically for procedural Python instructions.
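The behavior described above follows from how content digests work in general. The sketch below is only an illustration, not the code in pcurl.py or fstack.py: the choice of SHA-256, the helper name content_digest, and the sample bytes are all assumptions made for the example. It shows that an exact copy yields the same digest, while a single changed character yields a different one.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return a hex digest of the bytes.

    SHA-256 is an assumption for illustration; this page does not state
    which digest algorithm pcurl.py or fstack.py actually uses.
    """
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for file contents (not the real pcurl.py source).
original = b"print('hello')\n"                 # plays the role of pcurl.py
copy = bytes(original)                         # plays the role of mypcurl.py after cp
edited = original.replace(b"hello", b"hellp")  # one character changed

same_after_copy = content_digest(original) == content_digest(copy)
same_after_edit = content_digest(original) == content_digest(edited)

print(same_after_copy, same_after_edit)
```

A byte-level digest cannot tell a meaningful change from a trivial one, which is why any edit to a procedural script produces a digest that no longer matches.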

However, this shortcoming can be overcome when your files encode RDF instead of procedural code. To demonstrate this, we use $CSV2RDF4LOD_HOME/bin/util/tic.sh to obtain an (incomplete) RDF description of the Python script, such as its author and homepage.

bash-3.2$ tic.sh mypcurl.py > mypcurl.py.ttl
bash-3.2$ cat mypcurl.py.ttl | grep "doap:developer"
    doap:developer twi:JamesMcCusker ;
