11[ ![ Join the chat at https://gitter.im/rdfhdt ] ( https://badges.gitter.im/Join%20Chat.svg )] ( https://gitter.im/rdfhdt )
22[ ![ DOI] ( https://zenodo.org/badge/DOI/10.5281/zenodo.580298.svg )] ( https://doi.org/10.5281/zenodo.580298 )
33
4- # C++ library for the HDT triple format
4+ # C++ implementation of the HDT compression format
55
6- HDT keeps big RDF datasets compressed while maintaining efficient search and browse operations.
6+ Header Dictionary Triples (HDT) is a compression format for RDF data
7+ that can also be queried for Triple Patterns.
78
89## Getting Started
10+
911### Prerequisites
1012
11- The implementation has the following dependencies:
12- - [ Serd v0.28+] ( http://drobilla.net/software/serd/ ) This enables importing RDF data in the Turtle and N-Triples serialization formats specifically. The dependency is activated by default.
13- - [ libz] ( http://www.zlib.net/ ) Enables loading N-Triples files compressed with GZIP (e.g., ` file.nt.gz ` ) and gzipped HDTs (` file.hdt.gz ` ). The dependency is activated by default.
13+ In order to compile this library, you need to have the following
14+ dependencies installed:
15+
16+ - [ GNU Autoconf] ( https://www.gnu.org/software/autoconf/autoconf.html )
17+
18+ - ` sudo apt install autoconf ` on Debian-based distros (e.g., Ubuntu)
19+ - ` sudo dnf install autoconf ` on Red Hat-based distros (e.g.,
20+ Fedora)
21+
22+ - [ GNU Libtool] ( https://www.gnu.org/software/libtool/ )
1423
15- The installation process has the following dependencies:
24+ - ` sudo apt install libtool ` on Debian-based distros (e.g., Ubuntu)
25+ - ` sudo dnf install libtool ` on Red Hat-based distros (e.g., Fedora)
26+
27+ - [ GNU zip (gzip)] ( http://www.zlib.net/ ) Allows GNU zipped RDF input
28+ files to be ingested, and allows GNU zipped HDT files to be loaded.
29+
30+ - ` sudo apt install gzip ` on Debian-based distros (e.g., Ubuntu)
31+ - ` sudo dnf install gzip ` on Red Hat-based distros (e.g., Fedora)
1632
17- - [ autoconf] ( https://www.gnu.org/software/autoconf/autoconf.html )
18- - [ libtool] ( https://www.gnu.org/software/libtool/ )
1933- [ pkg-config] ( https://www.freedesktop.org/wiki/Software/pkg-config/ )
34+ A helper tool for compiling applications and libraries.
35+
36+ - ` sudo apt install pkg-config ` on Debian-based distros (e.g.,
37+ Ubuntu)
38+ - ` sudo dnf install pkgconf-pkg-config ` on Red Hat-based distros
39+ (e.g., Fedora)
2040
21- The following commands should install all packages:
41+ - [ Serd v0.28+] ( https://github.com/drobilla/serd ) The default parser
42+ that is used to process RDF input files. It supports the N-Quads,
43+ N-Triples, TriG, and Turtle serialization formats.
2244
23- sudo apt-get update
24- sudo apt-get install autoconf libtool pkg-config
45+ - ` sudo apt install libserd-0-0 libserd-dev ` on Debian-based distros
46+ (e.g., Ubuntu)
47+ - ` sudo dnf install serd serd-devel ` on Red Hat-based distros (e.g.,
48+ Fedora)
2549
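Before building, it can help to confirm that the build tools are actually on your `PATH`. The following is a minimal sketch, not part of `hdt-cpp`: `missing_tools` is a hypothetical helper, and the command names passed to it (for example `libtoolize` for GNU Libtool) are assumptions that may differ per distribution.

```sh
# Hypothetical helper (not shipped with hdt-cpp):
# print the name of every given command that is not on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        # `command -v` succeeds only if the shell can resolve the name.
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}
```

Calling `missing_tools autoconf libtoolize pkg-config` prints nothing when all three commands are available; any name it prints still needs to be installed.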
26- ### Installing
50+ ### Installation
2751
28- To compile and install, run the following commands under the directory ` hdt-cpp ` . This will generate the library and tools.
52+ To compile and install, run the following commands under the directory
53+ ` hdt-cpp ` . This will also compile and install some handy tools.
2954
30- First run the following script to generate all necessary installation files with autotools:
55+ ```
56+ ./autogen.sh
57+ ./configure
58+ make -j2
59+ sudo make install
60+ ```
3161
32- ./autogen.sh
62+ ### Installation issues
3363
34- Then, run:
64+ Sometimes, the above instructions do not result in a working HDT
65+ installation. This section enumerates common issues and their
66+ workarounds.
3567
36- ./configure
37- make -j2
68+ #### ` ./configure ` cannot find Serd
3869
39- If you get the error ` No package 'serd-0' found ` with ` ./configure ` , you must install Serd manually (the last command may require ` sudo ` ):
70+ While running ` ./configure ` you get a message similar to the
71+ following:
4072
41- ``` shell
42- wget https://github.com/drobilla/serd/archive/v0.28.0.tar.gz && \
43- tar -xvzf * .tar.gz && rm * .tar.gz && cd serd-* && \
44- ./waf configure && ./waf && \
45- ./waf install
4673```
47- ## Running
74+ Package 'serd-0', required by 'virtual:world', not found
75+ ```
4876
49- After building, these are the typical operations that you will perform:
77+ This means that ` ./configure ` cannot find the location of the
78+ ` serd-0.pc ` file on your computer. You have to find this location
79+ yourself, e.g., in the following way:
5080
51- - Convert your RDF data to HDT:
81+ ``` sh
82+ find /usr/ -name serd-0.pc
83+ ```
5284
53- NB: the input stream is assumed to be valid RDF, so you should validate your data before feeding it into rdf2hdt.
85+ Once you have found the directory containing the ` serd-0.pc ` file, you
86+ have to inform the ` ./configure ` script about this location by setting
87+ the following environment variable (where directory
88+ ` /usr/local/lib/pkgconfig/ ` is adapted to your situation):
5489
55- ```
56- $ libhdt/tools/rdf2hdt data/test.nt data/test.hdt
57- ```
90+ ``` sh
91+ export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig/
92+ ```
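The two steps above (locating `serd-0.pc` and exporting its directory) can be combined. This is a sketch under assumptions: `pc_dir` and `prepend_pkg_config_path` are hypothetical helper names, and prepending rather than overwriting is a design choice that keeps any entries already present in `PKG_CONFIG_PATH`.

```sh
# Hypothetical helper: print the directory of the first file named $2
# found under $1. Prints nothing and returns non-zero if none is found.
pc_dir() {
    found=$(find "$1" -name "$2" 2>/dev/null | head -n 1)
    [ -n "$found" ] && dirname "$found"
}

# Hypothetical helper: put directory $1 in front of PKG_CONFIG_PATH,
# keeping existing entries (the ':' is added only if the variable is set).
prepend_pkg_config_path() {
    export PKG_CONFIG_PATH="$1${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
}
```

A possible use is `dir=$(pc_dir /usr serd-0.pc) && prepend_pkg_config_path "$dir"`, followed by running `./configure` again in the same shell.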
5893
59- - Create only the index of an HDT file:
94+ ## Using HDT
6095
61- ```
62- $ libhdt/tools/hdtSearch -q 0 data/test.hdt
63- ```
96+ After compiling and installing, you can use the handy tools that are
97+ located in ` hdt-cpp/libhdt/tools ` . We show some common tasks that can
98+ be performed with these tools.
6499
65- - Convert an HDT to another RDF serialization format, such as N-Triples:
100+ ### RDF-2-HDT: Creating an HDT
66101
67- ```
68- $ libhdt/tools/hdt2rdf data/test.hdt data/test.hdtexport.nt
69- ```
102+ HDT files can only be created for standards-compliant RDF input files.
103+ If your input file is not standards-compliant RDF, it is not possible
104+ to create an HDT file out of it.
70105
71- - Open a terminal to search triple patterns within an HDT file:
106+ ```
107+ $ ./rdf2hdt data.nt data.hdt
108+ ```
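Because the input must be valid RDF, one option is to run a syntax check first and convert only if it passes. The sketch below makes several assumptions: `convert_if_valid` is a hypothetical wrapper, the default validator is the `serdi` tool from the Serd project (which may be packaged separately from the library), and `serdi` is assumed to exit with a non-zero status on a syntax error. The `VALIDATOR` and `RDF2HDT` variables are illustrative overrides, not options of `rdf2hdt`.

```sh
# Hypothetical wrapper: convert $1 to the HDT file $2 only if validation passes.
# VALIDATOR defaults to `serdi -i ntriples` (assumed to fail on invalid input);
# RDF2HDT defaults to ./rdf2hdt as used in this README.
convert_if_valid() {
    input=$1
    output=$2
    # The validator's own output is discarded; only its exit status matters.
    if ${VALIDATOR:-serdi -i ntriples} "$input" > /dev/null; then
        ${RDF2HDT:-./rdf2hdt} "$input" "$output"
    else
        echo "refusing to convert: $input failed validation" >&2
        return 1
    fi
}
```

With the defaults, `convert_if_valid data.nt data.hdt` behaves like the command above but stops early when the validator rejects the file.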
72109
73- ```
74- $ libhdt/tools/hdtSearch data/test.hdt
110+ ### HDT-2-RDF: Exporting an HDT
111+
112+ You can export an HDT file to an RDF file in one of the supported
113+ serialization formats (currently: N-Quads, N-Triples, TriG, and
114+ Turtle). The default serialization format for exporting is N-Triples.
115+
116+ ```
117+ $ ./hdt2rdf data.hdt data.nt
118+ ```
119+
120+ ### Querying for Triple Patterns
121+
122+ You can issue Triple Pattern (TP) queries in the terminal by
123+ specifying a subject, predicate, and/or object term. The question
124+ mark (` ? ` ) denotes an uninstantiated term. For example, you can
125+ retrieve _all_ the triples by querying for the TP ` ? ? ? ` :
126+
127+ $ ./hdtSearch data.hdt
75128 >> ? ? ?
76129 http://example.org/uri3 http://example.org/predicate3 http://example.org/uri4
77130 http://example.org/uri3 http://example.org/predicate3 http://example.org/uri5
@@ -91,24 +144,36 @@ After building, these are the typical operations that you will perform:
91144 2 results shown.
92145
93146 >> exit
94- ```
95147
96- - Extract the Header of an HDT file:
148+ ### Exporting the header
97149
98- ```
99- $ libhdt/tools/hdtInfo data/test.hdt > header.nt
100- ```
150+ The header component of an HDT contains metadata describing the data
151+ contained in the HDT, as well as the creation metadata about the HDT
152+ itself. The contents of the header can be exported to an N-Triples
153+ file:
101154
102- - Replace the Header of an HDT file with a new one. For example, by editing the existing one as extracted using `hdtInfo`:
155+ ```
156+ $ ./hdtInfo data.hdt > header.nt
157+ ```
158+
159+ ### Replacing the header
103160
104- ```
105- $ libhdt/tools/replaceHeader data/test.hdt data/testOutput.hdt newHeader.nt
106- ```
161+ It can be useful to update the header information of an HDT. This can
162+ be done by generating a new HDT file (` new.hdt ` ) out of an existing
163+ HDT file (` old.hdt ` ) and an N-Triples file (` new-header.nt ` ) that
164+ contains the new header information:
165+
166+ ```
167+ $ ./replaceHeader old.hdt new.hdt new-header.nt
168+ ```
107169
108170## Contributing
109171
110- Contributions and PRs should be sent to the `develop` branch, and not to `master`.
172+ Contributions are welcome! Please base your contributions and pull
173+ requests (PRs) on the ` develop ` branch, and not on the ` master `
174+ branch.
111175
112176## License
113177
114- `hdt-cpp` is free software licensed as GNU Lesser General Public License. See `libhdt/COPYRIGHT`
178+ ` hdt-cpp ` is free software licensed under the GNU Lesser General Public
179+ License (LGPL). See ` libhdt/COPYRIGHT ` .