11[ ![ Join the chat at https://gitter.im/rdfhdt ] ( https://badges.gitter.im/Join%20Chat.svg )] ( https://gitter.im/rdfhdt )
22[ ![ DOI] ( https://zenodo.org/badge/DOI/10.5281/zenodo.580298.svg )] ( https://doi.org/10.5281/zenodo.580298 )
33
4- # C++ library for the HDT triple format
4+ # C++ implementation of the HDT compression format
55
6- HDT keeps big RDF datasets compressed while maintaining efficient search and browse operations.
6+ Header Dictionary Triples (HDT) is a compression format for RDF data
7+ that can also be queried for Triple Patterns.
78
89## Getting Started
10+
911### Prerequisites
1012
11- The implementation has the following dependencies:
12- - [ Serd v0.28+] ( http://drobilla.net/software/serd/ ) This enables importing RDF data in the Turtle and N-Triples serialization formats specifically. The dependency is activated by default.
13- - [ libz] ( http://www.zlib.net/ ) Enables loading N-Triples files compressed with GZIP (e.g., ` file.nt.gz ` ) and gzipped HDTs (` file.hdt.gz ` ). The dependency is activated by default.
14- - [ Kyoto Cabinet] ( http://fallabs.com/kyotocabinet/ ) (optional) Enables generating big RDF datasets on machines without much RAM memory, by creating a temporary Kyoto Cabinet database. The dependency is deactivated by default; to activate it, call ` configure ` with ` --with-kyoto=yes ` flag during installation.
13+ In order to compile this library, you need to have the following
14+ dependencies installed:
15+
16+ - [ GNU Autoconf] ( https://www.gnu.org/software/autoconf/autoconf.html )
17+
18+ - ` sudo apt install autoconf ` on Debian-based distros (e.g., Ubuntu)
19+ - ` sudo dnf install autoconf ` on Red Hat-based distros (e.g.,
20+ Fedora)
21+
22+ - [ GNU Libtool] ( https://www.gnu.org/software/libtool/ )
23+
24+ - ` sudo apt install libtool ` on Debian-based distros (e.g., Ubuntu)
25+ - ` sudo dnf install libtool ` on Red Hat-based distros (e.g., Fedora)
26+
27+ - [ GNU zip (gzip)] ( http://www.zlib.net/ ) Allows GNU zipped RDF input
28+ files to be ingested, and allows GNU zipped HDT files to be loaded.
29+
30+ - ` sudo apt install gzip ` on Debian-based distros (e.g., Ubuntu).
31+ - ` sudo dnf install gzip ` on Red Hat-based distros (e.g., Fedora).
32+
33+ - [ Serd v0.28+] ( https://github.com/drobilla/serd ) The default parser
34+ that is used to process RDF input files. It supports the N-Quads,
35+ N-Triples, TriG, and Turtle serialization formats.
1536
16- The installation process has the following dependencies:
37+ - ` sudo apt install libserd-0-0 libserd-dev ` on Debian-based distros
38+ (e.g., Ubuntu).
39+ - ` sudo dnf install serd serd-devel ` on Red Hat-based distros (e.g.,
40+ Fedora).
1741
18- - [ autoconf] ( https://www.gnu.org/software/autoconf/autoconf.html )
19- - [ libtool] ( https://www.gnu.org/software/libtool/ )
42+ ### Installation
2043
21- The following commands should install both packages:
44+ To compile and install, run the following commands under the directory
45+ ` hdt-cpp ` . This will also compile and install some handy tools.
2246
23- sudo apt-get update
24- sudo apt-get install autoconf libtool
47+ ```
48+ ./autogen.sh
49+ ./configure
50+ make -j2
51+ sudo make install
52+ ```
2553
26- ### Installing
54+ ### Complications
2755
28- To compile and install, run the following commands under the directory ` hdt-cpp ` . This will generate the library and tools.
56+ Here we record complications, and possible workarounds, that people
57+ have found while performing the standard installation documented
58+ above.
2959
30- First run the following script to generate all necessary installation files with autotools:
60+ #### ` ./configure ` cannot find Serd
3161
32- ./autogen.sh
62+ While running ` ./configure ` you get a message similar to the
63+ following:
3364
34- Then, run:
65+ ```
66+ Package 'serd-0', required by 'virtual:world', not found
67+ ```
3568
36- ./configure
37- make -j2
69+ This means that ` ./configure ` cannot find the location of the
70+ ` serd-0.pc ` file on your computer. You have to find this location
71+ yourself, e.g., in the following way:
3872
39- ## Running
73+ ```
74+ find /usr/ -name serd-0.pc
75+ ```
4076
41- After building, these are the typical operations that you will perform:
77+ Once you have found the directory containing the ` serd-0.pc ` file, you
78+ have to inform the ` ./configure ` script about this location by setting
79+ the following environment variable (where directory
80+ ` /usr/local/lib/pkgconfig/ ` is adapted to your situation):
4281
43- - Convert your RDF data to HDT:
44-
45- NB: the input stream is assumed to be valid RDF, so you should validate your data before feeding it into rdf2hdt.
46-
47- ```
48- $ tools/rdf2hdt data/test.nt data/test.hdt
49- ```
82+ ```
83+ export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig/
84+ ```
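
The two manual steps above (locating `serd-0.pc`, then exporting `PKG_CONFIG_PATH`) can be combined. The following is a minimal sketch, not part of `hdt-cpp`: the function name `add_serd_pc_dir` is hypothetical, and it assumes that the first match returned by `find` is the Serd installation you want.

```shell
# Hypothetical helper (not shipped with hdt-cpp): find serd-0.pc under a
# search root (default: /usr) and prepend its directory to PKG_CONFIG_PATH.
add_serd_pc_dir() {
  search_root="${1:-/usr}"
  # Take the first match only; assumption: it is the intended Serd install.
  pc_file="$(find "$search_root" -name serd-0.pc 2>/dev/null | head -n 1)"
  if [ -z "$pc_file" ]; then
    echo "serd-0.pc not found under $search_root" >&2
    return 1
  fi
  pc_dir="$(dirname "$pc_file")"
  # Keep any existing entries, separated by a colon.
  export PKG_CONFIG_PATH="$pc_dir${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
}
```

After defining the function, `add_serd_pc_dir /usr && ./configure` would rerun the configuration with the adjusted search path.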
5085
51- - Create only the index of an HDT file:
86+ ## Using HDT
5287
53- ```
54- $ tools/hdtSearch -q 0 data/test.hdt
55- ```
88+ After compiling and installing, you can use the handy tools that are
89+ located in ` hdt-cpp/libhdt/tools ` . We show some common tasks that can
90+ be performed with these tools.
5691
57- - Convert an HDT to another RDF serialization format, such as N-Triples:
92+ ### RDF-2-HDT: Creating an HDT
5893
59- ```
60- $ tools/hdt2rdf data/test.hdt data/test.hdtexport.nt
61- ```
94+ HDT files can only be created for standards-compliant RDF input files.
95+ If your input file is not standards-compliant RDF, it is not possible
96+ to create an HDT file out of it.
6297
63- - Open a terminal to search triple patterns within an HDT file:
98+ ```
99+ $ ./rdf2hdt data.nt data.hdt
100+ ```
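
To try the conversion without a dataset at hand, a small N-Triples file is enough. The sketch below writes two triples (the same ones that appear in the `hdtSearch` output in this README) to a file named `example.nt`, a name chosen here for illustration. It only calls `rdf2hdt` if that tool is present in the current directory.

```shell
# Write a two-triple N-Triples file; each line is "<subject> <predicate> <object> ."
cat > example.nt <<'EOF'
<http://example.org/uri3> <http://example.org/predicate3> <http://example.org/uri4> .
<http://example.org/uri3> <http://example.org/predicate3> <http://example.org/uri5> .
EOF

# Convert it only when rdf2hdt has been built in the current directory.
if [ -x ./rdf2hdt ]; then
  ./rdf2hdt example.nt example.hdt
fi
```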
64101
65- ```
66- $ tools/hdtSearch data/test.hdt
102+ ### HDT-2-RDF: Exporting an HDT
103+
104+ You can export an HDT file to an RDF file in one of the supported
105+ serialization formats (currently: N-Quads, N-Triples, TriG, and
106+ Turtle). The default serialization format for exporting is N-Triples.
107+
108+ ```
109+ $ ./hdt2rdf data.hdt data.nt
110+ ```
111+
112+ ### Querying for Triple Patterns
113+
114+ You can issue Triple Pattern (TP) queries in the terminal by
115+ specifying a subject, predicate, and/or object term. The question
116+ mark (` ? ` ) denotes an uninstantiated term. For example, you can
117+ retrieve _all_ the triples by querying for the TP ` ? ? ? ` :
118+
119+ $ ./hdtSearch data.hdt
67120 >> ? ? ?
68121 http://example.org/uri3 http://example.org/predicate3 http://example.org/uri4
69122 http://example.org/uri3 http://example.org/predicate3 http://example.org/uri5
@@ -83,24 +136,36 @@ After building, these are the typical operations that you will perform:
83136 2 results shown.
84137
85138 >> exit
86- ```
87139
88- - Extract the Header of an HDT file:
140+ ### Exporting the header
141+
142+ The header component of an HDT contains metadata describing the data
143+ contained in the HDT, as well as the creation metadata about the HDT
144+ itself. The contents of the header can be exported to an N-Triples
145+ file:
146+
147+ ```
148+ $ ./hdtInfo data.hdt > header.nt
149+ ```
89150
90- ```
91- $ tools/hdtInfo data/test.hdt > header.nt
92- ```
151+ ### Replacing the Header
93152
94- - Replace the Header of an HDT file with a new one. For example, by editing the existing one as extracted using `hdtInfo`:
153+ It can be useful to update the header information of an HDT. This can
154+ be done by generating a new HDT file (` new.hdt ` ) out of an existing
155+ HDT file (` old.hdt ` ) and an N-Triples file (` new-header.nt ` ) that
156+ contains the new header information:
95157
96- ```
97- $ tools/replaceHeader data/test.hdt data/testOutput.hdt newHeader.nt
98- ```
158+ ```
159+ $ ./replaceHeader old.hdt new.hdt new-header.nt
160+ ```
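
Exporting and replacing the header can be chained into one edit cycle. The sketch below is an illustration under stated assumptions, not a tool shipped with `hdt-cpp`: the function name `update_header` and the `TOOLS` variable are hypothetical, and the edit is expressed as a `sed` expression that you supply. It uses `hdtInfo` and `replaceHeader` with the argument order shown in this README.

```shell
# Hypothetical helper: export the header of an HDT, edit it with sed, and
# write a new HDT that carries the edited header.
#   $1 = existing HDT file, $2 = new HDT file, $3 = sed expression
# TOOLS (assumption) is the directory holding hdtInfo and replaceHeader.
update_header() {
  tools="${TOOLS:-.}"
  header="$(mktemp)" || return 1
  "$tools/hdtInfo" "$1" > "$header" &&
    sed "$3" "$header" > "$header.new" &&
    "$tools/replaceHeader" "$1" "$2" "$header.new"
  status=$?
  # Remove the temporary header files whether or not the steps succeeded.
  rm -f "$header" "$header.new"
  return $status
}
```

For example, `update_header old.hdt new.hdt 's/Old title/New title/'` would leave `old.hdt` untouched and write the result to `new.hdt`.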
99161
100162## Contributing
101163
102- Contributions and PRs should be sent to the `develop` branch, and not to `master`.
164+ Contributions are welcome! Please base your contributions and pull
165+ requests (PRs) on the ` develop ` branch, and not on the ` master `
166+ branch.
103167
104168## License
105169
106- `hdt-cpp` is free software licensed as GNU Lesser General Public License. See `libhdt/COPYRIGHT`
170+ ` hdt-cpp ` is free software licensed under the GNU Lesser General
171+ Public License (LGPL). See ` libhdt/COPYRIGHT ` .