Commit 8d8fd7f

feat: Clarify case-folding behavior
This change explicitly defines "lowercase" in terms of the **culture-invariant full case mapping** specified in the Unicode Standard. This avoids potential issues caused by locale-dependent casing operations and ensures consistent behavior across implementations. Fixes package-url#437
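
The distinction this message draws can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the commit: `ascii_lower` is a hypothetical helper, and the sample strings are made up. The sketch relies on Python's built-in `str.lower()`, which applies the Unicode default (untailored) case mapping and does not consult the process locale.

```python
def ascii_lower(text: str) -> str:
    """Hypothetical helper: lowercase only A-Z, leave everything else as is."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in text
    )


# Made-up component values, used only for illustration.
samples = ["PyPI", "Django_1.0", "a-b.C~d"]

for value in samples:
    # On pure ASCII input, the untailored Unicode mapping and the
    # ASCII-only mapping give the same result.
    assert value.lower() == ascii_lower(value)

# str.lower() is not locale-tailored: "I" maps to "i" even where a
# Turkish-tailored mapping would produce dotless "ı" (U+0131).
assert "I".lower() == "i"

# The *full* case mapping may change the length of a string:
# U+0130 (capital I with dot above) lowercases to "i" + U+0307.
assert "\u0130".lower() == "i\u0307"
assert len("\u0130".lower()) == 2
```

The practical risk the commit guards against sits in APIs whose default lowercasing depends on the current culture or locale; an implementation using such an API would need to request the invariant behavior explicitly.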
1 parent b8a50ec commit 8d8fd7f

1 file changed

Lines changed: 26 additions & 13 deletions

PURL-SPECIFICATION.rst

@@ -55,7 +55,7 @@ sometimes look like a ``host`` but its interpretation is specific to a ``type``.
 
 
 Some ``purl`` examples
-~~~~~~~~~~~~~~~~~~~~~~
+----------------------
 
 ::
 
@@ -72,7 +72,7 @@ Some ``purl`` examples
 
 
 A ``purl`` is a URL
-~~~~~~~~~~~~~~~~~~~
+-------------------
 
 - A ``purl`` is a valid URL and URI that conforms to the URL definitions or
 specifications at:
@@ -110,15 +110,16 @@ A ``purl`` is a URL
 
 
 Rules for each ``purl`` component
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+---------------------------------
 
 A ``purl`` string is an ASCII URL string composed of seven components.
 
 Except as expressly stated otherwise in this section, each component:
 
-- MAY be composed of any of the characters defined in the "Permitted
-characters" section
-- MUST be encoded as defined in the "Character encoding" section
+- MAY be composed of any of the characters defined in the "`Permitted characters`_" section
+- MUST be encoded as defined in the "`Character encoding`_" section
+
+The "lowercase" rules are defined in the "`Case folding`_" section.
 
 The rules for each component are:
 
@@ -225,6 +226,8 @@ The rules for each component are:
 
 - The ``subpath`` MUST be interpreted as relative to the root of the package
 
+Characters and transformations
+------------------------------
 
 Permitted characters
 ~~~~~~~~~~~~~~~~~~~~
@@ -286,9 +289,19 @@ Character encoding
 - With the exception of the percent-encoding mechanism, the rules regarding
 percent-encoding are defined by this specification alone.
 
+Case folding
+~~~~~~~~~~~~
+
+References to "lowercase" in this specification refer to the **culture-invariant**
+full case mapping defined in
+`Section 3.13.2 of the Unicode Standard <https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G34078>`_.
+
+When applied to the ASCII character set, this operation converts uppercase
+Latin letters (``A``–``Z``) to their corresponding lowercase forms (``a``–``z``).
+All other ASCII characters remain unchanged.
 
 How to build ``purl`` string from its components
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+------------------------------------------------
 
 Building a ``purl`` ASCII string works from left to right, from ``type`` to
 ``subpath``.
@@ -363,7 +376,7 @@ To build a ``purl`` string from its components:
 
 
 How to parse a ``purl`` string in its components
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+------------------------------------------------
 
 Parsing a ``purl`` ASCII string into its components works from right to left,
 from ``subpath`` to ``type``.
@@ -443,13 +456,13 @@ To parse a ``purl`` string in its components:
 
 
 Known ``purl`` types
-~~~~~~~~~~~~~~~~~~~~
+--------------------
 
 There are several known ``purl`` package type definitions tracked in the
 separate `<PURL-TYPES.rst>`_ document.
 
 Known ``qualifiers`` key/value pairs
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+------------------------------------
 
 Note: Do not abuse ``qualifiers``: it can be tempting to use many qualifier
 keys but their usage should be limited to the bare minimum for proper package
@@ -491,7 +504,7 @@ all package types:
 
 
 Tests
-~~~~~
+-----
 
 To support the language-neutral testing of ``purl`` implementations, a test suite
 is provided as JSON document named ``test-suite-data.json``. This JSON document
@@ -526,12 +539,12 @@ every listed test object, run these tests:
 
 
 License
-~~~~~~~
+-------
 
 This document is licensed under the MIT license
 
 Definitions
-~~~~~~~~~~~
+-----------
 
 [ASCII] See, e.g.,
 