@@ -114,9 +114,11 @@ Rules for each ``purl`` component
114114
115115A ``purl `` string is an ASCII URL string composed of seven components.
116116
117- Some components are allowed to use other characters beyond ASCII: these
118- components must then be UTF-8-encoded strings and percent-encoded as defined in
119- the "Character encoding" section.
117+ Except as expressly stated otherwise in this section, each component:
118+
119+ - MAY be composed of any of the characters defined in the "Permitted
120+ characters" section
121+ - MUST be encoded as defined in the "Character encoding" section
120122
121123The rules for each component are:
122124
@@ -140,38 +142,43 @@ The rules for each component are:
140142
141143- **namespace **:
142144
143- - The optional ``namespace `` contains zero or more segments, separated by slash
144- '/'
145- - Leading and trailing slashes '/' are not significant and should be stripped
146- in the canonical form. They are not part of the ``namespace ``
147- - Each ``namespace `` segment must be a percent-encoded string
145+ - The ``namespace `` is optional, unless required by the package's ``type `` definition.
146+ - If present, the ``namespace `` MAY contain one or more segments, separated
147+ by a single unencoded slash '/' character.
148+ - All leading and trailing slashes '/' are not significant and SHOULD be
149+ stripped in the canonical form. They are not part of the ``namespace ``.
150+ - Each ``namespace `` segment MUST be a percent-encoded string.
148151 - When percent-decoded, a segment:
149152
150- - must not contain a '/'
151- - must not be empty
153+ - MUST NOT contain any slash '/' characters
154+ - MUST NOT be empty
155+ - MAY contain any Unicode character other than '/' unless the package's
156+ ``type `` definition provides otherwise.
152157
153- - A URL host or Authority must NOT be used as a ``namespace ``. Use instead a
158+ - A URL host or Authority MUST NOT be used as a ``namespace ``. Use instead a
154159 ``repository_url `` qualifier. Note however that for some types, the
155160 ``namespace `` may look like a host.
156161
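The ``namespace`` rules above can be sketched as a small validator. This is a minimal Python illustration, not part of the specification. The function name ``decode_namespace`` is hypothetical, and rejecting empty interior segments (rather than discarding them, as the parsing steps do) is an assumption of this sketch. Type-specific restrictions are not modeled.

```python
from urllib.parse import unquote


def decode_namespace(encoded: str) -> list[str]:
    """Return the percent-decoded segments of an encoded namespace.

    Hypothetical helper: strict about empty segments by assumption.
    """
    # Leading and trailing slashes are not significant.
    stripped = encoded.strip("/")
    if not stripped:
        return []  # the namespace is absent
    segments = []
    for raw in stripped.split("/"):
        segment = unquote(raw)  # percent-decode, reading octets as UTF-8
        if not segment:
            raise ValueError("namespace segment must not be empty")
        if "/" in segment:
            raise ValueError("decoded namespace segment must not contain '/'")
        segments.append(segment)
    return segments
```

For example, ``decode_namespace("/apache/commons/")`` yields the two segments ``apache`` and ``commons``, while ``a%2Fb`` is rejected because the decoded segment would contain a slash.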
157162
158163- **name **:
159164
160- - The ``name `` is prefixed by a '/' separator when the ``namespace `` is not empty
161- - This '/' is not part of the ``name ``
162- - A ``name `` must be a percent-encoded string
165+ - The ``name `` is prefixed by a single slash '/' separator when the
166+ ``namespace `` is not empty.
167+ - All leading and trailing slashes '/' are not significant and SHOULD be
168+ stripped in the canonical form. They are not part of the ``name ``.
169+ - A ``name `` MUST be a percent-encoded string.
170+ - When percent-decoded, a ``name `` MAY contain any Unicode character unless
171+ prohibited by the package's ``type `` definition in `<PURL-TYPES.rst >`_.
163172
164173
165174- **version **:
166175
167- - The ``version `` is prefixed by a '@' separator when not empty
168- - This '@' is not part of the ``version ``
169- - A ``version `` must be a percent-encoded string
170-
171- - A ``version `` is a plain and opaque string. Some package ``types `` use versioning
172- conventions such as SemVer for NPMs or NEVRA conventions for RPMS. A ``type ``
173- may define a procedure to compare and sort versions, but there is no
174- reliable and uniform way to do such comparison consistently.
176+ - The ``version `` is prefixed by a '@' separator when not empty.
177+ - This '@' is not part of the ``version ``.
178+ - A ``version `` MUST be a percent-encoded string.
179+ - When percent-decoded, a ``version `` MAY contain any Unicode character unless
180+ the package's ``type `` definition provides otherwise.
181+ - A ``version `` is a plain and opaque string.
175182
176183
177184- **qualifiers **:
@@ -219,30 +226,24 @@ The rules for each component are:
219226 - The ``subpath `` MUST be interpreted as relative to the root of the package
220227
221228
222- Character encoding
223- ~~~~~~~~~~~~~~~~~~
224-
225229Permitted characters
226- --------------------
227-
228- A canonical ``purl `` is an ASCII string composed of these characters:
230+ ~~~~~~~~~~~~~~~~~~~~
229231
230- - alphanumeric characters ``A to Z ``, ``a to z ``, ``0 to 9 ``,
231- - the ``purl `` separators ``:/@?=&# `` (colon ':', slash '/', at sign '@',
232- question mark '?', equal sign '=', ampersand '&' and pound sign '#'), and
233- - these punctuation marks ``%.-_~ `` (percent sign '%', period '.', dash '-',
234- underscore '_' and tilde '~').
232+ A canonical ``purl `` is composed of these permitted ASCII characters:
235233
236- All other characters MUST be encoded as UTF-8 and then percent-encoded.
237- In addition, each component specifies its permitted characters and
238- its percent-encoding rules.
234+ - the Alphanumeric Characters: ``A to Z ``, ``a to z ``, ``0 to 9 ``,
235+ - the Punctuation Characters: ``.-_~ `` (period '.',
236+ dash '-', underscore '_' and tilde '~'),
237+ - the Plus Character: ``+ `` (plus '+'),
238+ - the Percent Character: ``% `` (percent sign '%'), and
239+ - the Separator Characters: ``:/@?=&# `` (colon ':', slash '/', at sign '@',
240+ question mark '?', equal sign '=', ampersand '&' and pound sign '#').
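The permitted character set listed above could be checked with a single regular expression. This is a rough Python sketch under stated assumptions: the name ``uses_only_permitted_characters`` is hypothetical, and the check covers only the character set, not the structure of the ``purl`` or the per-component rules.

```python
import re

# Alphanumerics, punctuation ".-_~", plus, percent, and separators ":/@?=&#".
_PERMITTED = re.compile(r"[A-Za-z0-9.\-_~+%:/@?=&#]*")


def uses_only_permitted_characters(purl: str) -> bool:
    """Return True if every character is in the permitted ASCII set."""
    return _PERMITTED.fullmatch(purl) is not None
```

A string containing a raw space or a non-ASCII character fails this check, because such characters must be percent-encoded first.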
239241
240242
241243``purl `` separators
242- -------------------
244+ ~~~~~~~~~~~~~~~~~~~
243245
244- These ``purl `` separator characters MUST NOT be percent-encoded when used as
245- ``purl `` separators:
246+ This is how each of the Separator Characters is used:
246247
247248- ':' (colon) is the separator between ``scheme `` and ``type ``
248249- '/' (slash) is the separator between ``type ``, ``namespace `` and ``name ``
@@ -256,17 +257,34 @@ These ``purl`` separator characters MUST NOT be percent-encoded when used as
256257- '#' (number sign) is the separator before ``subpath ``
257258
258259
259- Percent-encoding rules
260- ----------------------
260+ Character encoding
261+ ~~~~~~~~~~~~~~~~~~
262+
263+ - In the "Rules for each ``purl `` component" section, each component
264+ defines when and how to apply percent-encoding and decoding to its content.
265+ - When percent-encoding is required by a component definition, the component
266+ string MUST first be encoded as UTF-8.
267+ - In the component string, each "data octet" MUST be replaced by the
268+ percent-encoded "character triplet" applying the percent-encoding mechanism
269+ defined in RFC 3986 section 2.1 (https://datatracker.ietf.org/doc/html/rfc3986#section-2.1),
270+ including the RFC definition of "data octet" and "character triplet",
271+ and using these definitions for RFC's "allowed set" and "delimiters":
272+
273+ - "allowed set" is composed of the Alphanumeric Characters and the
274+ Punctuation Characters
275+ - "delimiters" is composed of the Separator Characters
261276
262- When applying percent-encoding or decoding to a string, use the rules of RFC
263- 3986 section 2 (https://datatracker.ietf.org/doc/html/rfc3986#section-2).
277+ - The following characters MUST NOT be percent-encoded:
264278
265- Each component defines when and how to apply percent-encoding and decoding to
266- its content.
279+ - the Alphanumeric Characters,
280+ - the Punctuation Characters,
281+ - the Separator Characters when being used as ``purl `` separators,
282+ - the colon ':', whether used as a Separator Character or otherwise, and
283+ - the percent sign '%' when used to represent a percent-encoded character.
267284
268- When percent-encoding is required, all characters MUST be encoded except for
269- the colon ':'.
285+ - Where the space ' ' is permitted, it MUST be percent-encoded as '%20'.
286+ - With the exception of the percent-encoding mechanism, the rules regarding
287+ percent-encoding are defined by this specification alone.
270288
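The encoding rules above can be approximated with the Python standard library. This is a minimal sketch, not a reference implementation: the name ``percent_encode`` is hypothetical, and the input is assumed to be an already decoded component string. Note that it also encodes the plus '+' as ``%2B``, which is an assumption of this sketch, since the rules above do not list '+' among the characters that must stay unencoded.

```python
from urllib.parse import quote


def percent_encode(component: str) -> str:
    """Percent-encode one decoded component string (hypothetical helper)."""
    # quote() UTF-8-encodes the string first, never encodes A-Z, a-z, 0-9
    # or the characters "_.-~", and emits uppercase hex triplets.
    # safe=":" additionally leaves the colon unencoded.
    return quote(component, safe=":", encoding="utf-8")
```

With this sketch, a space becomes ``%20``, a slash inside a component becomes ``%2F``, and a colon is left as is.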
271289
272290How to build ``purl `` string from its components
@@ -405,13 +423,17 @@ To parse a ``purl`` string in its components:
405423- Split the ``remainder `` once from right on '/'
406424
407425 - The left side is the ``remainder ``
426+ - Strip all leading '/' characters (e.g., '/', '//' and so on)
427+ from the right side
408428 - Percent-decode the right side. This is the ``name ``
409429 - UTF-8-decode this ``name `` if needed in your programming language
410430 - Apply type-specific normalization to the ``name `` if needed
411431 - This is the ``name ``
412432
413433- Split the ``remainder `` on '/'
414434
435+ - Strip all leading '/' characters (e.g., '/', '//' and so on)
436+ from that split
415437 - Discard any empty segment from that split
416438 - Percent-decode each segment
417439 - UTF-8-decode each segment if needed in your programming language
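The ``name`` and ``namespace`` steps shown in this hunk can be sketched as follows. This is an illustrative Python sketch only: the function name ``split_namespace_and_name`` is hypothetical, the ``remainder`` is assumed to have had the scheme, type, qualifiers, subpath and version removed already (along with any trailing slashes), and type-specific normalization is left out.

```python
from urllib.parse import unquote


def split_namespace_and_name(remainder: str) -> tuple[list[str], str]:
    """Split a remainder into decoded namespace segments and a decoded name."""
    # Split once from the right on '/': the left side is the new remainder,
    # the right side is the encoded name.
    left, _, right = remainder.rpartition("/")
    # Mirror the "strip leading '/'" step; after rpartition this is a no-op.
    name = unquote(right.lstrip("/"))
    # Split the remainder on '/', discard empty segments, decode the rest.
    namespace = [unquote(segment) for segment in left.split("/") if segment]
    return namespace, name
```

Because empty segments are discarded, a remainder such as ``//a//b/c`` produces the same namespace segments as ``a/b/c``.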