Commit d71bc90

Merge pull request #171 from OP-TED/feature/TEDSWS-237/TED9-19_outputs_remove-hash-rewrite-uris
Outputs for hash removal and URI pattern rewrite
2 parents 2dbb443 + a1248f8 commit d71bc90

2,217 files changed

Lines changed: 403956 additions & 378243 deletions

README.md

Lines changed: 12 additions & 16 deletions
@@ -35,22 +35,18 @@ instantiation](https://github.com/RMLio/rmlmapper-java/issues/236) (which was la
3535

3636
## RDF URI Scheme
3737

38-
The eForms RML mappings use the URI scheme `{ns}id_{notice-id}_{concept}_{trailer}`, where:
39-
40-
- `{ns}` is a base namespace, in this case `http://data.europa.eu/a4g/resource/`
41-
- `{concept}` is either (i) an ontology fragment label or (ii) source element label, with a suffix or prefix
42-
- `{trailer}` is either (i) an ID value (if the resource has one) or (ii) an _online_ computed, deterministic hash
43-
- Root concepts such as `epo:Notice` end up to only the `{concept}`
44-
45-
Expanding on some of the components for further clarity:
46-
47-
- Whether a `concept` is an ontology fragment or source element label, and whether this label has a suffix (rarely) or prefix, depends on the subjective (human) evaluation of whether only having the class name is sufficient hint of what the URI represents.
48-
- The trailer, when a hash, is computed (seeded) with the XPath named element (e.g. `cbc:ID`) or (often relative) path (e.g. `path(cbc:ID)`) of what is being mapped, and therefore lends a unique identity to the URI. This yields reproducible URIs across RML TripleMaps, in case a resource needed to be instantiated at different XPaths, for whatever purpose.
49-
- A Lot or any other resource with an inherent ID, would simply have its `cbc:ID` value as the trailer, for e.g. `epd:id_14549263-b47b-4e59-96a1-2d0d13e19343_Lot_LOT-0001`, which is very useful for linking purposes at orthogonal XPaths (e.g. wherever an `id-ref` is concerned, that ID could simply be used to produce a linkable URI without having to navigate XPaths).
50-
- Any other resource where there is no inherent ID would have a hash that is unique to the XPath it represents, e.g. an `epo:Purpose` instance, if instantiated at different XPaths for associating different attributes, would have the same URI across those instantiations, resulting in one unique instance and no duplication due to multiple mappings.
51-
- The `adms:Identifier`, although having an ID, may still get a hash instead of ID in its trailer, as it may not have a short ID that is sensible to use/read (however we may not have enforced this rule strongly)
52-
53-
Note: Wherever _URI_ is mentioned, [IRI](https://www.w3.org/2001/Talks/0912-IUC-IRI/paper.html#:~:text=In%20principle%2C%20the%20definition%20of,us%2Dascii%20characters%20in%20URIs) is meant. Also, the generation of hashes is done _online_ against a remote HTTP web API endpoint offering this function, during transformation (which can otherwise be an offline process).
38+
The eForms RML mappings use the URI scheme `{ns}/{notice}/{concept}/{trailer}`, where:
39+
40+
- `{ns}` is a base namespace, in this case `http://data.europa.eu/a4g/resource/` (prefixed `epd:`)
41+
- `{notice}` is the shared context for all entities in the document, composed of two parts `{notice-id}-{notice-version}`; together with `{ns}` it forms the base `ns-notice` or _notice segment_ of the URI, e.g. `epd:14549263-b47b-4e59-96a1-2d0d13e19343-01`
42+
- `{concept}` is either (i) an ontology fragment label, i.e. the class name or (ii) a source element label, i.e. the XML element name (without any prefix), depending on which provides better context for the resource being represented
43+
- `{trailer}` is either (i) an ID value (if the resource has one) or, in the absence of a usable or reliable ID, (ii) a re-encoded and normalized XPath (to ensure uniqueness within the document), in which case it is preceded by a dollar symbol (`$`) rather than a slash (`/`) (to facilitate future rewriting or hashing), resulting in the scheme `{ns-notice}/{concept}${reencoded-xpath}`, e.g. `epd:af0b8395-7498-4d0e-b5eb-3d1a4636eb1a-01/Procedure$_ContractAwardNotice1_TenderingProcess1_ProcessJustification1`
44+
- Root concepts such as `epo:Notice` end at the `{concept}`, and their identifier simply appends `/Identifier`, resulting in the scheme `{ns-notice}/Notice/Identifier` (to avoid redundant repetition of the ID value which is already represented in the notice segment)
45+
- Identifier instances, if not _technical identifiers_ with recognizable ID patterns according to the [eForms specification](https://docs.ted.europa.eu/eforms/latest/schema/identifiers.html) (e.g. `LOT-XXX`), may be preceded by the parent class and followed by the ID value, resulting in the scheme `{ns-notice}/{parent-class}/Identifier/{id-value}`
46+
- In some cases, the `{trailer}` may be an aggregate of multiple values to produce uniqueness, e.g. when the ID is combined with its `schemeName`
47+
- In the case of externally referenced resources (e.g. a referenced notice or a child entity thereof), the notice segment is extended with the context of the referred notice, resulting in the scheme `{ns-notice}/Notice/{external-notice-id}/{concept}/{trailer}`
48+
49+
Note: Wherever _URI_ is mentioned, [IRI](https://www.w3.org/2001/Talks/0912-IUC-IRI/paper.html#:~:text=In%20principle%2C%20the%20definition%20of,us%2Dascii%20characters%20in%20URIs) is meant.
5450

5551
## RML Files Organization
5652

docs/antora/modules/ROOT/pages/methodology.adoc

Lines changed: 28 additions & 61 deletions
@@ -89,75 +89,42 @@ The TriplesMaps in the various RML modules, especially those that represent vers
8989
[[ref:uri-scheme]]
9090
=== RDF URI Scheme
9191

92-
The eForms RML mappings use the following URI scheme for the representing ePO ontology instances:
92+
= eForms RML Mappings URI Scheme
9393

94-
```
95-
{ns}id_{notice-id}_{concept}_{trailer}
96-
```
94+
The eForms RML mappings use the URI scheme:
95+
96+
`{ns}/{notice}/{concept}/{trailer}`
97+
98+
where:
99+
100+
* `{ns}` is a base namespace, in this case `http://data.europa.eu/a4g/resource/` (prefixed `epd:`)
101+
* `{notice}` is the shared context for all entities in the document, composed of two parts `{notice-id}-{notice-version}`; together with `{ns}` it forms the base `ns-notice` or _notice segment_ of the URI, e.g. `epd:14549263-b47b-4e59-96a1-2d0d13e19343-01`
102+
* `{concept}` is either:
103+
** (i) an ontology fragment label, i.e. the class name, or
104+
** (ii) a source element label, i.e. the XML element name (without any prefix), depending on which provides better context for the resource being represented
105+
* `{trailer}` is either:
106+
** (i) an ID value (if the resource has one), or
107+
** (ii) a re-encoded and normalized XPath (to ensure uniqueness within the document), in which case it is preceded by a dollar symbol (`$`) and not slash (`/`) (to facilitate future rewriting or hashing), resulting in the scheme `{ns-notice}/{concept}${reencoded-xpath}`, e.g. `epd:af0b8395-7498-4d0e-b5eb-3d1a4636eb1a-01/Procedure$_ContractAwardNotice1_TenderingProcess1_ProcessJustification1`
108+
* Root concepts such as `epo:Notice` end at the `{concept}`, and their identifier simply appends `/Identifier`, resulting in the scheme `{ns-notice}/Notice/Identifier` (to avoid redundant repetition of the ID value which is already represented in the notice segment)
109+
* Identifier instances, if not _technical identifiers_ with recognizable ID patterns according to the https://docs.ted.europa.eu/eforms/latest/schema/identifiers.html[eForms specification] (e.g. `LOT-XXX`), may be preceded by the parent class and followed by the ID value, resulting in the scheme `{ns-notice}/{parent-class}/Identifier/{id-value}`
110+
* In some cases, the `{trailer}` may be an aggregate of multiple values to produce uniqueness, e.g. when the ID is combined with its `schemeName`
111+
* In the case of externally referenced resources (e.g. a referenced notice or a child entity thereof), the notice segment is extended with the context of the referred notice, resulting in the scheme `{ns-notice}/Notice/{external-notice-id}/{concept}/{trailer}`
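
The exact re-encoding rules for the XPath trailer are not spelled out here. The Python sketch below shows one normalization that happens to reproduce the `Procedure$...` example above, assuming that namespace prefixes are dropped, positional predicates such as `[1]` are folded into the step name, and steps are joined with underscores. The function `reencode_xpath` and the input XPath are hypothetical; the actual mappings may normalize differently.

```python
import re


def reencode_xpath(xpath: str) -> str:
    """Hypothetical normalization of an absolute XPath into a URI-safe trailer."""
    parts = []
    for step in xpath.strip("/").split("/"):
        name = step.split(":", 1)[-1]              # drop a prefix such as cac:
        name = re.sub(r"\[(\d+)\]", r"\1", name)   # TenderingProcess[1] -> TenderingProcess1
        parts.append(name)
    return "_" + "_".join(parts)


# Assumed source XPath, inferred from the example trailer in the text.
xpath = "/ContractAwardNotice[1]/cac:TenderingProcess[1]/cac:ProcessJustification[1]"
trailer = reencode_xpath(xpath)
uri = f"epd:af0b8395-7498-4d0e-b5eb-3d1a4636eb1a-01/Procedure${trailer}"
```

Because the trailer is derived only from the element's position in the document, the same element yields the same URI wherever it is mapped, without any remote hashing call.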
97112

98-
, where:
99-
100-
* `{ns}` is a base namespace, in this case
101-
`http://data.europa.eu/a4g/resource/`
102-
* `{concept}` is either (i) an ontology fragment label or (ii) source
103-
element label, with a suffix or prefix
104-
* `{trailer}` is either (i) an ID value (if the resource has one) or
105-
(ii) an _online_ computed, deterministic hash
106-
* Root concepts such as `epo:Notice` end up to only the `{concept}`
107-
108-
Expanding on some of the components for further clarity:
109-
110-
* Whether a `concept` is an ontology fragment or source element label,
111-
and whether this label has a suffix (rarely) or prefix, depends on the
112-
subjective (human) evaluation of whether only having the class name is
113-
sufficient hint of what the URI represents.
114-
* The trailer, when a hash, is computed (seeded) with the XPath named
115-
element (e.g. `cbc:ID`) or (often relative) path (e.g. `path(cbc:ID)`)
116-
of what is being mapped, and therefore lends a unique identity to the
117-
URI. This yields reproducible URIs across RML TripleMaps, in case a
118-
resource needed to be instantiated at different XPaths, for whatever
119-
purpose.
120-
** A Lot or any other resource with an inherent ID, would simply have
121-
its `cbc:ID` value as the trailer, for
122-
e.g. `epd:id_14549263-b47b-4e59-96a1-2d0d13e19343_Lot_LOT-0001`, which
123-
is very useful for linking purposes at orthogonal XPaths (e.g. wherever
124-
an `id-ref` is concerned, that ID could simply be used to produce a
125-
linkable URI without having to navigate XPaths).
126-
** Any other resource where there is no inherent ID would have a hash
127-
that is unique to the XPath it represents, e.g. an `epo:Purpose`
128-
instance, if instantiated at different XPaths for associating different
129-
attributes, would have the same URI across those instantiations,
130-
resulting in one unique instance and no duplication due to multiple
131-
mappings.
132-
*** The `adms:Identifier`, although having an ID, may still get a hash
133-
instead of ID in its trailer, as it may not have a short ID that is
134-
sensible to use/read (however we may not have enforced this rule
135-
strongly)
136-
137-
There are exceptions to this policy, namely in the _trailer_ segment, as that
138-
is what lends uniqueness to a resource, and determines whether instances being
139-
created from subject URI templates in the technical RML rules are correct. The
140-
following are such exceptions:
113+
The following are some examples of exceptions to the rule:
141114

142115
1. `epo:AgentInRole` instances, which
143116
https://github.com/OP-TED/ted-rdf-mapping-eforms/issues/31[require a carefully
144117
constructed URI] seeded with information about the related party (a
145118
`foaf:Agent`).
146119

147-
2. `epo:AwardDecision` instances, which are hashed on the `cbc:AwardDate` to
148-
yield the same instance for awards on the same date across possibly repeating
149-
elements.
120+
2. `epo:AwardDecision` instances, which have a trailer based on the `cbc:AwardDate` to yield the same instance for awards on the same date across possibly repeating elements.
150121

151-
3. External notices that are referred to, whose base IRI involves the ID of the
152-
respective notice, not the current one in scope.
122+
3. External resources that cannot be identified, such as the Framework
123+
Agreement contract representing `OPT-100-Contract Framework Notice Identifier`, for which a proxy `epo:FrameworkAgreement` is created _without_ a trailer.
153124

154-
4. External resources that cannot be identified, such as the Framework
155-
Agreement contract representing `OPT-100-Contract Framework Notice Identifier`,
156-
for whom a proxy `epo:FrameworkAgreement` is created _without_ a trailer.
157-
158-
**Note:** Wherever _URI_ is mentioned,
125+
[NOTE]
126+
====
127+
Wherever _URI_ is mentioned,
159128
https://www.w3.org/2001/Talks/0912-IUC-IRI/paper.html#:~:text=In%20principle%2C%20the%20definition%20of,us%2Dascii%20characters%20in%20URIs[IRI]
160-
is meant. Also, the generation of hashes is done _online_ against a
161-
remote HTTP web API endpoint offering this function, during
162-
transformation (which can otherwise be an offline process).
163-
129+
is meant.
130+
====

mappings/package_eforms_sdk1.10_epo4.0/metadata.json

Lines changed: 1 addition & 1 deletion
@@ -66,5 +66,5 @@
6666
]
6767
}
6868
},
69-
"mapping_suite_hash_digest": "e77ab8d6df840bcdaf76cfcfbad003aff1b592384da662b534e9e8c7afd417e8"
69+
"mapping_suite_hash_digest": "4d2bc48083fef57a3cd7166f70bb96ed9f46dcda286d2e07d9cf34058e50ab06"
7070
}
