Commit a2e8f57

DOC: Rewrote the documentation to improve readability.
1 parent 58ef9f3 commit a2e8f57

1 file changed

Lines changed: 115 additions & 50 deletions

README.md

Original file line number | Diff line number | Diff line change
@@ -1,69 +1,122 @@
11
[![Join the chat at https://gitter.im/rdfhdt](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/rdfhdt)
22
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.580298.svg)](https://doi.org/10.5281/zenodo.580298)
33

4-
# C++ library for the HDT triple format
4+
# C++ implementation of the HDT compression format
55

6-
HDT keeps big RDF datasets compressed while maintaining efficient search and browse operations.
6+
Header Dictionary Triples (HDT) is a compression format for RDF data
7+
that can also be queried for Triple Patterns.
78

89
## Getting Started
10+
911
### Prerequisites
1012

11-
The implementation has the following dependencies:
12-
- [Serd v0.28+](http://drobilla.net/software/serd/) This enables importing RDF data in the Turtle and N-Triples serialization formats specifically. The dependency is activated by default.
13-
- [libz](http://www.zlib.net/) Enables loading N-Triples files compressed with GZIP (e.g., `file.nt.gz`) and gzipped HDTs (`file.hdt.gz`). The dependency is activated by default.
14-
- [Kyoto Cabinet](http://fallabs.com/kyotocabinet/) (optional) Enables generating big RDF datasets on machines without much RAM memory, by creating a temporary Kyoto Cabinet database. The dependency is deactivated by default; to activate it, call `configure` with `--with-kyoto=yes` flag during installation.
13+
In order to compile this library, you need to have the following
14+
dependencies installed:
15+
16+
- [GNU Autoconf](https://www.gnu.org/software/autoconf/autoconf.html)
17+
18+
- `sudo apt install autoconf` on Debian-based distros (e.g., Ubuntu)
19+
- `sudo dnf install autoconf` on Red Hat-based distros (e.g.,
20+
Fedora)
21+
22+
- [GNU Libtool](https://www.gnu.org/software/libtool/)
23+
24+
- `sudo apt install libtool` on Debian-based distros (e.g., Ubuntu)
25+
- `sudo dnf install libtool` on Red Hat-based distros (e.g., Fedora)
26+
27+
- [GNU zip (gzip)](http://www.zlib.net/) Allows GNU zipped RDF input
28+
files to be ingested, and allows GNU zipped HDT files to be loaded.
29+
30+
- `sudo apt install gzip` on Debian-based distros (e.g., Ubuntu).
31+
  - `sudo dnf install gzip` on Red Hat-based distros (e.g., Fedora).
32+
33+
- [Serd v0.28+](https://github.com/drobilla/serd) The default parser
34+
that is used to process RDF input files. It supports the N-Quads,
35+
N-Triples, TriG, and Turtle serialization formats.
1536

16-
The installation process has the following dependencies:
37+
- `sudo apt install libserd-0-0 libserd-dev` on Debian-based distros
38+
(e.g., Ubuntu).
39+
- `sudo dnf install serd serd-devel` on Red Hat-based distros (e.g.,
40+
Fedora).
1741

18-
- [autoconf](https://www.gnu.org/software/autoconf/autoconf.html)
19-
- [libtool](https://www.gnu.org/software/libtool/)
42+
### Installation
2043

21-
The following commands should install both packages:
44+
To compile and install, run the following commands under the directory
45+
`hdt-cpp`. This will also compile and install some handy tools.
2246

23-
sudo apt-get update
24-
sudo apt-get install autoconf libtool
47+
```
48+
./autogen.sh
49+
./configure
50+
make -j2
51+
sudo make install
52+
```
2553

26-
### Installing
54+
### Complications
2755

28-
To compile and install, run the following commands under the directory `hdt-cpp`. This will generate the library and tools.
56+
Here we record complications, and possible workarounds, that people
57+
have found while performing the standard installation documented
58+
above.
2959

30-
First run the following script to generate all necessary installation files with autotools:
60+
#### `./configure` cannot find Serd
3161

32-
./autogen.sh
62+
While running `./configure` you get a message similar to the
63+
following:
3364

34-
Then, run:
65+
```
66+
Package 'serd-0', required by 'virtual:world', not found
67+
```
3568

36-
./configure
37-
make -j2
69+
This means that `./configure` cannot find the location of the
70+
`serd-0.pc` file on your computer. You have to find this location
71+
yourself, e.g., in the following way:
3872

39-
## Running
73+
```
74+
find /usr/ -name serd-0.pc
75+
```
4076

41-
After building, these are the typical operations that you will perform:
77+
Once you have found the directory containing the `serd-0.pc` file, you
78+
have to inform the `./configure` script about this location by setting
79+
the following environment variable (where directory
80+
`/usr/local/lib/pkgconfig/` is adapted to your situation):
4281

43-
- Convert your RDF data to HDT:
44-
45-
NB: the input stream is assumed to be valid RDF, so you should validate your data before feeding it into rdf2hdt.
46-
47-
```
48-
$ tools/rdf2hdt data/test.nt data/test.hdt
49-
```
82+
```
83+
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig/
84+
```
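The two manual steps above (locating `serd-0.pc`, then exporting its directory) can be combined. The following is a minimal sketch, not part of `hdt-cpp`: the helper name `find_pc_dir` is hypothetical, and it assumes that the first match reported by `find` is the one you want.

```shell
#!/bin/sh
# find_pc_dir ROOT NAME
# Hypothetical helper (not shipped with hdt-cpp): prints the directory of
# the first file called NAME found under ROOT, or fails if there is none.
find_pc_dir() {
    root=$1
    name=$2
    # Permission errors from find are discarded; only the first hit is kept.
    match=$(find "$root" -name "$name" 2>/dev/null | head -n 1)
    [ -n "$match" ] || return 1
    dirname "$match"
}

# Prepend the directory to PKG_CONFIG_PATH, keeping any existing entries.
if dir=$(find_pc_dir /usr serd-0.pc); then
    export PKG_CONFIG_PATH="$dir${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
    echo "PKG_CONFIG_PATH=$PKG_CONFIG_PATH"
else
    echo "serd-0.pc not found under /usr" >&2
fi
```

To affect a later `./configure` call, the snippet has to be sourced into (or pasted in) the same shell session, since an exported variable does not propagate back from a child script.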
5085

51-
- Create only the index of an HDT file:
86+
## Using HDT
5287

53-
```
54-
$ tools/hdtSearch -q 0 data/test.hdt
55-
```
88+
After compiling and installing, you can use the handy tools that are
89+
located in `hdt-cpp/libhdt/tools`. We show some common tasks that can
90+
be performed with these tools.
5691

57-
- Convert an HDT to another RDF serialization format, such as N-Triples:
92+
### RDF-2-HDT: Creating an HDT
5893

59-
```
60-
$ tools/hdt2rdf data/test.hdt data/test.hdtexport.nt
61-
```
94+
HDT files can only be created for standards-compliant RDF input files.
95+
If your input file is not standards-compliant RDF, it is not possible
96+
to create an HDT file out of it.
6297

63-
- Open a terminal to search triple patterns within an HDT file:
98+
```
99+
$ ./rdf2hdt data.nt data.hdt
100+
```
64101

65-
```
66-
$ tools/hdtSearch data/test.hdt
102+
### HDT-2-RDF: Exporting an HDT
103+
104+
You can export an HDT file to an RDF file in one of the supported
105+
serialization formats (currently: N-Quads, N-Triples, TriG, and
106+
Turtle). The default serialization format for exporting is N-Triples.
107+
108+
```
109+
$ ./hdt2rdf data.hdt data.nt
110+
```
111+
112+
### Querying for Triple Patterns
113+
114+
You can issue Triple Pattern (TP) queries in the terminal by
115+
specifying a subject, predicate, and/or object term. The question
116+
mark (`?`) denotes an uninstantiated term. For example, you can
117+
retrieve _all_ the triples by querying for the TP `? ? ?`:
118+
119+
$ ./hdtSearch data.hdt
67120
>> ? ? ?
68121
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri4
69122
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri5
@@ -83,24 +136,36 @@ After building, these are the typical operations that you will perform:
83136
2 results shown.
84137

85138
>> exit
86-
```
87139

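To make the matching rule concrete, here is a small sketch of Triple Pattern semantics in plain shell and `awk`. It does not use `hdtSearch` or the HDT format at all: the function name `match_tp` is hypothetical, the third sample triple is made up for illustration, and the sketch assumes one triple per line with whitespace-separated terms that contain no spaces.

```shell
#!/bin/sh
# match_tp SUBJECT PREDICATE OBJECT
# Hypothetical illustration of Triple Pattern matching, not an hdt-cpp tool.
# Reads "subject predicate object" lines from standard input and prints the
# lines that match; a "?" in any position matches every term.
match_tp() {
    awk -v s="$1" -v p="$2" -v o="$3" '
        (s == "?" || $1 == s) &&
        (p == "?" || $2 == p) &&
        (o == "?" || $3 == o)
    '
}

# Select all triples that use predicate3, whatever the subject and object.
# The last input line is invented sample data and does not match.
match_tp '?' 'http://example.org/predicate3' '?' <<'EOF'
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri4
http://example.org/uri3 http://example.org/predicate3 http://example.org/uri5
http://example.org/uri1 http://example.org/predicate1 http://example.org/uri2
EOF
```

Calling `match_tp '?' '?' '?'` prints every input line, which mirrors the `? ? ?` query shown in the session above.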
88-
- Extract the Header of an HDT file:
140+
### Exporting the Header
141+
142+
The header component of an HDT contains metadata describing the data
143+
contained in the HDT, as well as the creation metadata about the HDT
144+
itself. The contents of the header can be exported to an N-Triples
145+
file:
146+
147+
```
148+
$ ./hdtInfo data.hdt > header.nt
149+
```
89150

90-
```
91-
$ tools/hdtInfo data/test.hdt > header.nt
92-
```
151+
### Replacing the Header
93152

94-
- Replace the Header of an HDT file with a new one. For example, by editing the existing one as extracted using `hdtInfo`:
153+
It can be useful to update the header information of an HDT. This can
154+
be done by generating a new HDT file (`new.hdt`) out of an existing
155+
HDT file (`old.hdt`) and an N-Triples file (`new-header.nt`) that
156+
contains the new header information:
95157

96-
```
97-
$ tools/replaceHeader data/test.hdt data/testOutput.hdt newHeader.nt
98-
```
158+
```
159+
$ ./replaceHeader old.hdt new.hdt new-header.nt
160+
```
99161

100162
## Contributing
101163

102-
Contributions and PRs should be sent to the `develop` branch, and not to `master`.
164+
Contributions are welcome! Please base your contributions and pull
165+
requests (PRs) on the `develop` branch, and not on the `master`
166+
branch.
103167

104168
## License
105169

106-
`hdt-cpp` is free software licensed as GNU Lesser General Public License. See `libhdt/COPYRIGHT`
170+
`hdt-cpp` is free software licensed under the GNU Lesser General Public
171+
License (LGPL). See `libhdt/COPYRIGHT`.
