feat: new hash removal techniques with translate() + sample output
After removing the remote hashing call, the next step was to replace the
URL-encoding. However, applying a single, consistent technique to both
`rr:template` and `rml:reference` proved challenging, given the
following substitution requirements:
- `Q\{[^}]+\}` with nothing
- `\[(\d+)\]` with `_\1`
- `/` with `.`
- `encode-for-uri()` function with `replace()`
- adding periods (`.`) in the concat functions (revisiting the need for it, first)
- checking any double-encoding problems (`rr:template` already does URI-encoding)
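As a rough illustration of the first three substitutions, here is a
minimal Python sketch. It is not the XPath used in the mappings; the
function name `rewrite_path` and the sample namespace URI are
hypothetical, and the input merely imitates the shape of a `path()`
result.

```python
import re

def rewrite_path(xpath: str) -> str:
    """Apply the three substitutions in order (illustrative only)."""
    # 1. Drop namespace qualifiers first: the URI inside Q{...} may
    #    itself contain '/', which step 3 would otherwise rewrite.
    out = re.sub(r"Q\{[^}]+\}", "", xpath)
    # 2. Turn a positional predicate [n] into _n.
    out = re.sub(r"\[(\d+)\]", r"_\1", out)
    # 3. Turn the path separator '/' into '.'.
    return out.replace("/", ".")

# Hypothetical sample input shaped like the output of path().
sample = "/Q{http://example.org/ns}root[1]/Q{http://example.org/ns}item[2]"
print(rewrite_path(sample))  # .root_1.item_2
```

The order matters in this sketch: running the `/` replacement before
the `Q{...}` removal would also rewrite the slashes inside the
namespace URI.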
String substitution techniques like these rely on "regular expressions",
aka regex. In regex, certain characters have special meaning, so
"escaping" them (typically with a backslash `\`) is necessary to force
literal interpretation by the regex engine.
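A small Python sketch of what escaping changes, using Python's `re`
module purely as a stand-in for a regex engine (Saxon's XPath regex
flavor differs in details):

```python
import re

# Unescaped '.' matches any character, so every character is replaced.
print(re.sub(r".", "_", "a.b"))   # ___

# Escaped '\.' matches only a literal period.
print(re.sub(r"\.", "_", "a.b"))  # a_b

# An unescaped '[' opens a character class; left unclosed, the
# pattern is rejected outright.
try:
    re.compile("[")
except re.error as exc:
    print("invalid pattern:", exc)
```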
This is made more complicated when multiple processing subsystems are
involved -- in this case the RML engine (RMLMapper), the RML
specifications it follows, Saxon the XPath processor, and Turtle the RDF
syntax used to write the RML rules.
Unfortunately, it did not seem possible to use characters such as curly
braces `{}` and square brackets `[]` in `rr:template`, whether escaped
or unescaped. Fortunately, after several rounds of failed experiments,
a trick with a pair of unassuming XPath functions was discovered.
With `translate()` and `codepoints-to-string()`, one is able to replace
certain characters with alternatives: the latter turns a sequence of
Unicode codepoints into the corresponding string, and the former maps
each character of one string to the character at the same position in
another (dropping it when there is no counterpart). With this we could
finally replace the stubborn `Q{}` namespace qualifier in the XPath
returned by `path()`, when in an `rr:template`.
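To make the character-level behaviour concrete, here is a Python
emulation of the two XPath functions. The helper names
`codepoints_to_string` and `xpath_translate` are hypothetical stand-ins,
and the sample string is invented; the sketch only shows how reserved
characters can be named by codepoint instead of being typed literally,
not the exact expression used in the mappings.

```python
def codepoints_to_string(codepoints):
    """Emulate fn:codepoints-to-string: codepoints -> string."""
    return "".join(chr(cp) for cp in codepoints)

def xpath_translate(value, map_string, trans_string):
    """Emulate fn:translate: map characters position by position.

    A character in map_string with no counterpart in trans_string is
    removed; if a character occurs twice in map_string, the first
    occurrence wins.
    """
    table = {}
    for i, ch in enumerate(map_string):
        if ch in table:
            continue
        table[ch] = trans_string[i] if i < len(trans_string) else None
    out = []
    for ch in value:
        if ch not in table:
            out.append(ch)
        elif table[ch] is not None:
            out.append(table[ch])
    return "".join(out)

# 91 = '[', 47 = '/', 93 = ']': none of them is typed literally.
reserved = codepoints_to_string([91, 47, 93])
# '[' -> '_', '/' -> '.', ']' has no counterpart and is dropped.
print(xpath_translate("a[1]/b[2]", reserved, "_."))  # a_1.b_2

# 123 = '{', 125 = '}': the characters rr:template reserves.
print(codepoints_to_string([123, 125]))  # {}
```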
We develop here two separate implementations of URI rewrites: (i) a
standard regex substitution (with three levels of replacements) for
`rml:reference`, and (ii) a clever one using the aforementioned
technique for `rr:template` (which reserves the `{}` characters for
itself and cannot interpret backslash escapes).
P.S.: This technique will NOT work if the source contains more than one
XML element with the same name but a different prefix.
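One likely reason, stated here as an assumption rather than something
verified against the mappings: once the namespace qualifier is removed,
elements that differ only by namespace produce the same string. A
minimal Python sketch with hypothetical namespace URIs:

```python
import re

def strip_namespace(xpath: str) -> str:
    """Remove Q{...} qualifiers (illustrative helper, not from the repo)."""
    return re.sub(r"Q\{[^}]+\}", "", xpath)

# Same local name, different (hypothetical) namespaces.
a = "/Q{http://example.org/a}record[1]"
b = "/Q{http://example.org/b}record[1]"

# Both collapse to "/record[1]", so the rewritten values collide.
print(strip_namespace(a) == strip_namespace(b))  # True
```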
Co-authored-by: csnyulas <csongor.nyulas@gmail.com>