Commit a8aafe3

Merge branch 'develop-64' into develop
2 parents 16968ce + 70e8fa4 commit a8aafe3

95 files changed

Lines changed: 1244 additions & 878 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -54,8 +54,11 @@ stamp-h1
 
 *.hdt
 *.hdt.index
+*.nq
 *.nt
 *.rdf
+*.trig
+*.ttl
 *.a
 **/examples/*
 **/tests/*

README.md

Lines changed: 118 additions & 53 deletions
@@ -1,77 +1,130 @@
 [![Join the chat at https://gitter.im/rdfhdt](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/rdfhdt)
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.580298.svg)](https://doi.org/10.5281/zenodo.580298)
 
-# C++ library for the HDT triple format
+# C++ implementation of the HDT compression format
 
-HDT keeps big RDF datasets compressed while maintaining efficient search and browse operations.
+Header Dictionary Triples (HDT) is a compression format for RDF data
+that can also be queried for Triple Patterns.
 
 ## Getting Started
+
 ### Prerequisites
 
-The implementation has the following dependencies:
-- [Serd v0.28+](http://drobilla.net/software/serd/) This enables importing RDF data in the Turtle and N-Triples serialization formats specifically. The dependency is activated by default.
-- [libz](http://www.zlib.net/) Enables loading N-Triples files compressed with GZIP (e.g., `file.nt.gz`) and gzipped HDTs (`file.hdt.gz`). The dependency is activated by default.
+In order to compile this library, you need to have the following
+dependencies installed:
+
+- [GNU Autoconf](https://www.gnu.org/software/autoconf/autoconf.html)
+
+  - `sudo apt install autoconf` on Debian-based distros (e.g., Ubuntu)
+  - `sudo dnf install autoconf` on Red Hat-based distros (e.g.,
+    Fedora)
+
+- [GNU Libtool](https://www.gnu.org/software/libtool/)
 
-The installation process has the following dependencies:
+  - `sudo apt install libtool` on Debian-based distros (e.g., Ubuntu)
+  - `sudo dnf install libtool` on Red Hat-based distros (e.g., Fedora)
+
+- [GNU zip (gzip)](http://www.zlib.net/) Allows GNU zipped RDF input
+  files to be ingested, and allows GNU zipped HDT files to be loaded.
+
+  - `sudo apt install gzip` on Debian-based distros (e.g., Ubuntu)
+  - `sudo dnf install gzip` on Red Hat-based distros (e.g., Fedora)
 
-- [autoconf](https://www.gnu.org/software/autoconf/autoconf.html)
-- [libtool](https://www.gnu.org/software/libtool/)
 - [pkg-config](https://www.freedesktop.org/wiki/Software/pkg-config/)
+  A helper tool for compiling applications and libraries.
+
+  - `sudo apt install pkg-config` on Debian-based distros (e.g.,
+    Ubuntu)
+  - `sudo dnf install pkgconf-pkg-config` on Red Hat-based distros
+    (e.g., Fedora)
 
-The following commands should install all packages:
+- [Serd v0.28+](https://github.com/drobilla/serd) The default parser
+  that is used to process RDF input files. It supports the N-Quads,
+  N-Triples, TriG, and Turtle serialization formats.
 
-sudo apt-get update
-sudo apt-get install autoconf libtool pkg-config
+  - `sudo apt install libserd-0-0 libserd-dev` on Debian-based distros
+    (e.g., Ubuntu)
+  - `sudo dnf install serd serd-devel` on Red Hat-based distros (e.g.,
+    Fedora)
 
-### Installing
+### Installation
 
-To compile and install, run the following commands under the directory `hdt-cpp`. This will generate the library and tools.
+To compile and install, run the following commands under the directory
+`hdt-cpp`. This will also compile and install some handy tools.
 
-First run the following script to generate all necessary installation files with autotools:
+```
+./autogen.sh
+./configure
+make -j2
+sudo make install
+```
 
-./autogen.sh
+### Installation issues
 
-Then, run:
+Sometimes, the above instructions do not result in a working HDT
+installation. This section enumerates common issues and their
+workaround.
 
-./configure
-make -j2
+#### `./configure` cannot find Serd
 
-If you get the error `No package 'serd-0' found` with `./configure`, you must install Serd manually (the last command may require `sudo`):
+While running `./configure` you get a message similar to the
+following:
 
-```shell
-wget https://github.com/drobilla/serd/archive/v0.28.0.tar.gz &&\
-tar -xvzf *.tar.gz && rm *.tar.gz && cd serd-* &&\
-./waf configure && ./waf &&\
-./waf install
 ```
-## Running
+Package 'serd-0', required by 'virtual:world', not found
+```
 
-After building, these are the typical operations that you will perform:
+This means that `./configure` cannot find the location of the
+`serd-0.pc` file on your computer. You have to find this location
+yourself, e.g., in the following way:
 
-- Convert your RDF data to HDT:
+```sh
+find /usr/ -name serd-0.pc
+```
 
-NB: the input stream is assumed to be valid RDF, so you should validate your data before feeding it into rdf2hdt.
+Once you have found the directory containing the `serd-0.pc` file, you
+have to inform the `./configure` script about this location by setting
+the following environment variable (where directory
+`/usr/local/lib/pkgconfig/` is adapted to your situation):
 
-```
-$ libhdt/tools/rdf2hdt data/test.nt data/test.hdt
-```
+```sh
+export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig/
+```
 
-- Create only the index of an HDT file:
+## Using HDT
 
-```
-$ libhdt/tools/hdtSearch -q 0 data/test.hdt
-```
+After compiling and installing, you can use the handy tools that are
+located in `hdt-cpp/libhdt/tools`. We show some common tasks that can
+be performed with these tools.
 
-- Convert an HDT to another RDF serialization format, such as N-Triples:
+### RDF-2-HDT: Creating an HDT
 
-```
-$ libhdt/tools/hdt2rdf data/test.hdt data/test.hdtexport.nt
-```
+HDT files can only be created for standards-compliant RDF input files.
+If your input file is not standards-compliant RDF, it is not possible
+to create an HDT file out of it.
 
-- Open a terminal to search triple patterns within an HDT file:
+```
+$ ./rdf2hdt data.nt data.hdt
+```
 
-```
-$ libhdt/tools/hdtSearch data/test.hdt
+### HDT-2-RDF: Exporting an HDT
+
+You can export an HDT file to an RDF file in one of the supported
+serialization formats (currently: N-Quads, N-Triples, TriG, and
+Turtle). The default serialization format for exporting is N-Triples.
+
+```
+$ ./hdt2rdf data.hdt data.nt
+```
+
+### Querying for Triple Patterns
+
+You can issue Triple Pattern (TP) queries in the terminal by
+specifying a subject, predicate, and/or object term. The question
+mark (`?`) denotes an uninstantiated term. For example, you can
+retrieve _all_ the triples by querying for the TP `? ? ?`:
+
+$ ./hdtSearch data.hdt
 >> ? ? ?
 http://example.org/uri3 http://example.org/predicate3 http://example.org/uri4
 http://example.org/uri3 http://example.org/predicate3 http://example.org/uri5
@@ -91,24 +144,36 @@ After building, these are the typical operations that you will perform:
 2 results shown.
 
 >> exit
-```
 
-- Extract the Header of an HDT file:
+### Exporting the header
 
-```
-$ libhdt/tools/hdtInfo data/test.hdt > header.nt
-```
+The header component of an HDT contains metadata describing the data
+contained in the HDT, as well as the creation metadata about the HDT
+itself. The contents of the header can be exported to an N-Triples
+file:
 
-- Replace the Header of an HDT file with a new one. For example, by editing the existing one as extracted using `hdtInfo`:
+```
+$ ./hdtInfo data.hdt > header.nt
+```
+
+### Replacing the Header
 
-```
-$ libhdt/tools/replaceHeader data/test.hdt data/testOutput.hdt newHeader.nt
-```
+It can be useful to update the header information of an HDT. This can
+be done by generating a new HDT file (`new.hdt`) out of an existing
+HDT file (`old.hdt`) and an N-Triples file (`new-header.nt`) that
+contains the new header information:
+
+```
+$ ./replaceHeader old.hdt new.hdt new-header.nt
+```
 
 ## Contributing
 
-Contributions and PRs should be sent to the `develop` branch, and not to `master`.
+Contributions are welcome! Please base your contributions and pull
+requests (PRs) on the `develop` branch, and not on the `master`
+branch.
 
 ## License
 
-`hdt-cpp` is free software licensed as GNU Lesser General Public License. See `libhdt/COPYRIGHT`
+`hdt-cpp` is free software licensed as GNU Lesser General Public
+License (LGPL). See `libhdt/COPYRIGHT`.

configure.ac

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ AC_MSG_CHECKING(whether to enable optimizations)
 AC_ARG_ENABLE([optimization],
     AS_HELP_STRING([--enable-optimization],[Build library with optimization parameters [default=yes]]),
     [AC_MSG_RESULT(${enableval})],
-    [CXXFLAGS="${CXXFLAGS} -g -O2"]
+    [CXXFLAGS="${CXXFLAGS} -g -O2 -std=c++11"]
     [AC_MSG_RESULT(yes)])
 
 AC_MSG_CHECKING(whether to build libcds)

hdt-it/hdtcachedinfo.cpp

Lines changed: 7 additions & 7 deletions
@@ -105,17 +105,17 @@ void HDTCachedInfo::generateMatrix(hdt::ProgressListener *listener)
 
 }
 
-Color * HDTCachedInfo::getPredicateColor(unsigned int npred)
+Color * HDTCachedInfo::getPredicateColor(size_t npred)
 {
     return &predicateColors[npred];
 }
 
-unsigned int HDTCachedInfo::getPredicateUsages(unsigned int predicate)
+size_t HDTCachedInfo::getPredicateUsages(size_t predicate)
 {
     return predicateCount[predicate];
 }
 
-unsigned int HDTCachedInfo::getMaxPredicateCount()
+size_t HDTCachedInfo::getMaxPredicateCount()
 {
     return maxPredicateCount;
 }
@@ -130,8 +130,8 @@ void HDTCachedInfo::save(QString &fileName, hdt::ProgressListener *listener)
     // Only save info of files bigger than 2M triples. Otherwise is fast to create from scratch.
     if(hdt->getTriples()->getNumberOfElements()>2000000) {
         std::ofstream out(fileName.toLatin1(), ios::binary);
-        unsigned int numTriples = triples.size();
-        out.write((char *)&numTriples, sizeof(unsigned int));
+        uint64_t numTriples = triples.size();
+        out.write((char *)&numTriples, sizeof(uint64_t));
         out.write((char *)&triples[0], sizeof(hdt::TripleID)*numTriples);
         out.close();
     }
@@ -143,8 +143,8 @@ void HDTCachedInfo::load(QString &fileName, hdt::ProgressListener *listener)
 
     std::ifstream in(fileName.toLatin1(), ios::binary);
     if(in.good()) {
-        unsigned int numTriples;
-        in.read((char *)&numTriples, sizeof(unsigned int));
+        uint64_t numTriples;
+        in.read((char *)&numTriples, sizeof(uint64_t));
         triples.resize(numTriples);
         in.read((char *)&triples[0], sizeof(hdt::TripleID)*numTriples);
         in.close();
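
The `save`/`load` hunks above widen the on-disk triple count from `unsigned int` to `uint64_t`. The following is a minimal, self-contained sketch of that pattern, not the project's code: `Triple`, `saveTriples`, and `loadTriples` are hypothetical names, and `Triple` is only an assumed stand-in for `hdt::TripleID`. It shows a fixed-width count written first, then the raw records, and read back with the same width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical stand-in for hdt::TripleID: three fixed-width component IDs.
struct Triple {
    std::uint64_t subject;
    std::uint64_t predicate;
    std::uint64_t object;
};

// Write a 64-bit record count followed by the raw records.
void saveTriples(std::ostream &out, const std::vector<Triple> &triples) {
    const std::uint64_t count = triples.size();
    out.write(reinterpret_cast<const char *>(&count), sizeof(count));
    if (count > 0) {
        out.write(reinterpret_cast<const char *>(triples.data()),
                  static_cast<std::streamsize>(sizeof(Triple) * triples.size()));
    }
}

// Read the count with the same width it was written with, then the records.
// Returns an empty vector if the stream ends early.
std::vector<Triple> loadTriples(std::istream &in) {
    std::uint64_t count = 0;
    if (!in.read(reinterpret_cast<char *>(&count), sizeof(count))) {
        return {};
    }
    std::vector<Triple> triples(static_cast<std::size_t>(count));
    if (count > 0 &&
        !in.read(reinterpret_cast<char *>(triples.data()),
                 static_cast<std::streamsize>(sizeof(Triple) * triples.size()))) {
        return {};
    }
    return triples;
}
```

Writing two triples into a `std::stringstream` with `saveTriples` and reading them back with `loadTriples` should return the same two records. One consequence worth noting: on platforms where `unsigned int` is 32 bits, the count field grows from 4 to 8 bytes, so a cache file written with the old layout would presumably not be readable with the new one.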

hdt-it/hdtcachedinfo.hpp

Lines changed: 5 additions & 5 deletions
@@ -17,15 +17,15 @@ class HDTCachedInfo
     vector<hdt::TripleID> triples;
 
     vector<Color> predicateColors;
-    unsigned int maxPredicateCount;
-    vector<unsigned int> predicateCount;
+    size_t maxPredicateCount;
+    vector<size_t> predicateCount;
 
 public:
     HDTCachedInfo(hdt::HDT *hdt);
 
-    Color *getPredicateColor(unsigned int npred);
-    unsigned int getPredicateUsages(unsigned int predicate);
-    unsigned int getMaxPredicateCount();
+    Color *getPredicateColor(size_t npred);
+    size_t getPredicateUsages(size_t predicate);
+    size_t getMaxPredicateCount();
     vector<hdt::TripleID> &getTriples();
 
     void generateGeneralInfo(hdt::ProgressListener *listener=NULL);
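
The header change above moves the per-predicate counters and their maximum to `size_t`. As a rough illustration of the bookkeeping those members suggest, here is a small sketch that tallies predicate usages with `size_t` counters. `PredicateStats` and `countPredicates` are hypothetical names, and the 0-based predicate IDs are an assumption made for the example; the diff does not show how the real class fills `predicateCount`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type mirroring predicateCount and maxPredicateCount.
struct PredicateStats {
    std::vector<std::size_t> counts;  // counts[p] = number of triples using predicate p
    std::size_t maxCount = 0;         // largest value stored in counts
};

// Tally how often each predicate ID occurs.
// predicateIds holds one predicate ID per triple (assumed 0-based in this sketch).
PredicateStats countPredicates(const std::vector<std::size_t> &predicateIds,
                               std::size_t numPredicates) {
    PredicateStats stats;
    stats.counts.assign(numPredicates, 0);
    for (std::size_t id : predicateIds) {
        assert(id < numPredicates);  // guard against out-of-range IDs in debug builds
        const std::size_t n = ++stats.counts[id];
        if (n > stats.maxCount) {
            stats.maxCount = n;
        }
    }
    return stats;
}
```

Because both the counters and the IDs are `size_t`, the tally cannot wrap at the 32-bit boundary on 64-bit platforms, which appears to be the motivation for the type change in this merge.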

hdt-it/headermodel.cpp

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ HeaderModel::~HeaderModel()
 int HeaderModel::rowCount(const QModelIndex &parent) const
 {
     if(hdtController->hasHDT()) {
-        return hdtController->getHDT()->getHeader()->getNumberOfElements();
+        return (int)hdtController->getHDT()->getHeader()->getNumberOfElements();
     }
     return 0;
 }
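
The explicit `(int)` cast above silences the narrowing from a wider element count to the `int` that `rowCount` must return. If the count could ever exceed `INT_MAX`, a plain cast would yield an implementation-defined value. The sketch below shows two possible alternatives, a clamping conversion and a debug-checked one. `clampToInt` and `toIntChecked` are hypothetical helpers, not functions from this repository.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Convert an unsigned element count to int, saturating at INT_MAX.
int clampToInt(std::size_t n) {
    const std::size_t maxInt =
        static_cast<std::size_t>(std::numeric_limits<int>::max());
    return n > maxInt ? std::numeric_limits<int>::max() : static_cast<int>(n);
}

// Convert an unsigned element count to int, asserting in debug builds
// that the value fits.
int toIntChecked(std::size_t n) {
    assert(n <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    return static_cast<int>(n);
}
```

For a header with a handful of entries either variant behaves like the cast in the diff. They only differ for counts beyond the range of `int`, where clamping shows as many rows as the view API can represent and the checked version fails loudly during development.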
