Commit 5052b42

Merge branch 'main' into main
2 parents 0d43ad0 + a59196c commit 5052b42

2 files changed

Lines changed: 81 additions & 47 deletions

PURL-SPECIFICATION.rst

Lines changed: 69 additions & 47 deletions
@@ -114,9 +114,11 @@ Rules for each ``purl`` component
 
 A ``purl`` string is an ASCII URL string composed of seven components.
 
-Some components are allowed to use other characters beyond ASCII: these
-components must then be UTF-8-encoded strings and percent-encoded as defined in
-the "Character encoding" section.
+Except as expressly stated otherwise in this section, each component:
+
+- MAY be composed of any of the characters defined in the "Permitted
+characters" section
+- MUST be encoded as defined in the "Character encoding" section
 
 The rules for each component are:
 
@@ -140,38 +142,43 @@ The rules for each component are:
 
 - **namespace**:
 
-- The optional ``namespace`` contains zero or more segments, separated by slash
-'/'
-- Leading and trailing slashes '/' are not significant and should be stripped
-in the canonical form. They are not part of the ``namespace``
-- Each ``namespace`` segment must be a percent-encoded string
+- The ``namespace`` is optional, unless required by the package's ``type`` definition.
+- If present, the ``namespace`` MAY contain one or more segments, separated
+by a single unencoded slash '/' character.
+- All leading and trailing slashes '/' are not significant and SHOULD be
+stripped in the canonical form. They are not part of the ``namespace``.
+- Each ``namespace`` segment MUST be a percent-encoded string.
 - When percent-decoded, a segment:
 
-- must not contain a '/'
-- must not be empty
+- MUST NOT contain any slash '/' characters
+- MUST NOT be empty
+- MAY contain any Unicode character other than '/' unless the package's
+``type`` definition provides otherwise.
 
-- A URL host or Authority must NOT be used as a ``namespace``. Use instead a
+- A URL host or Authority MUST NOT be used as a ``namespace``. Use instead a
 ``repository_url`` qualifier. Note however that for some types, the
 ``namespace`` may look like a host.
 
 
 - **name**:
 
-- The ``name`` is prefixed by a '/' separator when the ``namespace`` is not empty
-- This '/' is not part of the ``name``
-- A ``name`` must be a percent-encoded string
+- The ``name`` is prefixed by a single slash '/' separator when the
+``namespace`` is not empty.
+- All leading and trailing slashes '/' are not significant and SHOULD be
+stripped in the canonical form. They are not part of the ``name``.
+- A ``name`` MUST be a percent-encoded string.
+- When percent-decoded, a ``name`` MAY contain any Unicode character unless
+prohibited by the package's ``type`` definition in `<PURL-TYPES.rst>`_.
 
 
 - **version**:
 
-- The ``version`` is prefixed by a '@' separator when not empty
-- This '@' is not part of the ``version``
-- A ``version`` must be a percent-encoded string
-
-- A ``version`` is a plain and opaque string. Some package ``types`` use versioning
-conventions such as SemVer for NPMs or NEVRA conventions for RPMS. A ``type``
-may define a procedure to compare and sort versions, but there is no
-reliable and uniform way to do such comparison consistently.
+- The ``version`` is prefixed by a '@' separator when not empty.
+- This '@' is not part of the ``version``.
+- A ``version`` MUST be a percent-encoded string.
+- When percent-decoded, a ``version`` MAY contain any Unicode character unless
+the package's ``type`` definition provides otherwise.
+- A ``version`` is a plain and opaque string.
 
 
 - **qualifiers**:
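
As a rough illustration of the revised ``namespace`` rules in the hunk above, the following Python sketch strips leading and trailing slashes, drops empty segments, percent-decodes each segment, and rejects a decoded segment that contains a '/'. The function name ``parse_namespace`` is hypothetical and not part of the specification, and type-specific rules from the ``type`` definition are deliberately left out.

```python
from urllib.parse import unquote


def parse_namespace(raw: str) -> list[str]:
    """Return the decoded segments of a percent-encoded namespace string.

    Hypothetical helper: a sketch of the general rules only, with no
    type-specific normalization applied.
    """
    segments = []
    # Leading and trailing slashes are not significant, so strip them first.
    for encoded in raw.strip("/").split("/"):
        if not encoded:
            # Empty segments (e.g. from a doubled slash) are discarded.
            continue
        segment = unquote(encoded)  # percent-decode, then UTF-8-decode
        if "/" in segment:
            # A decoded segment MUST NOT contain a slash.
            raise ValueError(f"namespace segment contains '/': {segment!r}")
        segments.append(segment)
    return segments
```

Under these assumptions, ``parse_namespace("/apache//commons/")`` yields the two segments ``apache`` and ``commons``, while an input such as ``a%2Fb`` is rejected because the slash reappears after decoding.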
@@ -219,30 +226,24 @@ The rules for each component are:
 - The ``subpath`` MUST be interpreted as relative to the root of the package
 
 
-Character encoding
-~~~~~~~~~~~~~~~~~~
-
 Permitted characters
---------------------
-
-A canonical ``purl`` is an ASCII string composed of these characters:
+~~~~~~~~~~~~~~~~~~~~
 
-- alphanumeric characters ``A to Z``, ``a to z``, ``0 to 9``,
-- the ``purl`` separators ``:/@?=&#`` (colon ':', slash '/', at sign '@',
-question mark '?', equal sign '=', ampersand '&' and pound sign '#'), and
-- these punctuation marks ``%.-_~`` (percent sign '%', period '.', dash '-',
-underscore '_' and tilde '~').
+A canonical ``purl`` is composed of these permitted ASCII characters:
 
-All other characters MUST be encoded as UTF-8 and then percent-encoded.
-In addition, each component specifies its permitted characters and
-its percent-encoding rules.
+- the Alphanumeric Characters: ``A to Z``, ``a to z``, ``0 to 9``,
+- the Punctuation Characters: ``.-_~`` (period '.',
+dash '-', underscore '_' and tilde '~'),
+- the Plus Character: ``+`` (plus '+'),
+- the Percent Character: ``%`` (percent sign '%'), and
+- the Separator Characters ``:/@?=&#`` (colon ':', slash '/', at sign '@',
+question mark '?', equal sign '=', ampersand '&' and pound sign '#').
 
 
 ``purl`` separators
--------------------
+~~~~~~~~~~~~~~~~~~~
 
-These ``purl`` separator characters MUST NOT be percent-encoded when used as
-``purl`` separators:
+This is how each of the Separator Characters is used:
 
 - ':' (colon) is the separator between ``scheme`` and ``type``
 - '/' (slash) is the separator between ``type``, ``namespace`` and ``name``
@@ -256,17 +257,34 @@ These ``purl`` separator characters MUST NOT be percent-encoded when used as
 - '#' (number sign) is the separator before ``subpath``
 
 
-Percent-encoding rules
-----------------------
+Character encoding
+~~~~~~~~~~~~~~~~~~
+
+- In the "Rules for each ``purl`` component" section, each component
+defines when and how to apply percent-encoding and decoding to its content.
+- When percent-encoding is required by a component definition, the component
+string MUST first be encoded as UTF-8.
+- In the component string, each "data octet" MUST be replaced by the
+percent-encoded "character triplet" applying the percent-encoding mechanism
+defined in RFC 3986 section 2.1 (https://datatracker.ietf.org/doc/html/rfc3986#section-2.1),
+including the RFC definition of "data octet" and "character triplet",
+and using these definitions for RFC's "allowed set" and "delimiters":
+
+- "allowed set" is composed of the Alphanumeric Characters and the
+Punctuation Characters
+- "delimiters" is composed of the Separator Characters
 
-When applying percent-encoding or decoding to a string, use the rules of RFC
-3986 section 2 (https://datatracker.ietf.org/doc/html/rfc3986#section-2).
+- The following characters MUST NOT be percent-encoded:
 
-Each component defines when and how to apply percent-encoding and decoding to
-its content.
+- the Alphanumeric Characters,
+- the Punctuation Characters,
+- the Separator Characters when being used as ``purl`` separators,
+- the colon ':', whether used as a Separator Character or otherwise, and
+- the percent sign '%' when used to represent a percent-encoded character.
 
-When percent-encoding is required, all characters MUST be encoded except for
-the colon ':'.
+- Where the space ' ' is permitted, it MUST be percent-encoded as '%20'.
+- With the exception of the percent-encoding mechanism, the rules regarding
+percent-encoding are defined by this specification alone.
 
 
 How to build ``purl`` string from its components
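
One possible reading of the new "Character encoding" rules, sketched in Python: the decoded component is UTF-8-encoded, and every octet is percent-encoded except the Alphanumeric Characters, the Punctuation Characters ``.-_~`` and the colon ':'. The helper names ``encode_component`` and ``decode_component`` are hypothetical. The sketch also assumes that '+' is percent-encoded because it is not in the "allowed set"; the diff does not settle that point explicitly, so treat it as an assumption.

```python
from urllib.parse import quote, unquote


def encode_component(value: str) -> str:
    """Percent-encode one decoded purl component (hypothetical helper).

    quote() always leaves ASCII letters, digits and '_.-~' unencoded;
    safe=":" additionally keeps the colon. Everything else, including
    '/', ' ', '+' and '%', becomes a UTF-8 based %XX triplet.
    """
    return quote(value, safe=":", encoding="utf-8")


def decode_component(value: str) -> str:
    """Reverse encode_component: percent-decode, then UTF-8-decode."""
    return unquote(value, encoding="utf-8")
```

With this sketch a space becomes ``%20`` rather than '+', and a slash inside a component becomes ``%2F``, so it cannot be confused with a ``purl`` separator.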
@@ -405,13 +423,17 @@ To parse a ``purl`` string in its components:
 - Split the ``remainder`` once from right on '/'
 
 - The left side is the ``remainder``
+- Strip all leading characters (e.g., '/', '//' and so on)
+from the right side
 - Percent-decode the right side. This is the ``name``
 - UTF-8-decode this ``name`` if needed in your programming language
 - Apply type-specific normalization to the ``name`` if needed
 - This is the ``name``
 
 - Split the ``remainder`` on '/'
 
+- Strip all leading '/' characters (e.g., '/', '//' and so on)
+from that split
 - Discard any empty segment from that split
 - Percent-decode each segment
 - UTF-8-decode each segment if needed in your programming language
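
The two parsing steps touched by this hunk can be sketched roughly as follows, assuming the earlier parsing steps (not shown in this diff) have already removed the scheme, ``type``, ``qualifiers``, ``subpath`` and ``version`` from the ``remainder``. The function name ``split_namespace_and_name`` is hypothetical, and type-specific normalization is omitted.

```python
from urllib.parse import unquote


def split_namespace_and_name(remainder: str) -> tuple[list[str], str]:
    """Split 'namespace/name' into decoded namespace segments and a name.

    Hypothetical helper illustrating the general steps only.
    """
    # Split once from the right on '/': left is the new remainder,
    # right is the encoded name.
    left, _, right = remainder.rpartition("/")
    # Strip leading slashes from the right side. After a right split the
    # right side normally contains no '/', so this is a defensive step.
    name = unquote(right.lstrip("/"))
    # Strip leading slashes from the remainder, split on '/', discard
    # empty segments, then percent-decode each remaining segment.
    namespace = [unquote(seg) for seg in left.lstrip("/").split("/") if seg]
    return namespace, name
```

For a ``remainder`` with no slash at all, ``rpartition`` returns an empty left side, so the namespace comes back as an empty list and the whole string is treated as the ``name``.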

faq.rst

Lines changed: 12 additions & 0 deletions
@@ -46,3 +46,15 @@ package ``type``
 
 As a result, a purl spec implementation must return an error when encountering
 a ``type`` that contains a prohibited character.
+
+
+Version
+~~~~~~~
+
+**QUESTION**: How do package ``types`` handle the comparison and sorting of
+versions?
+
+**ANSWER**: Some package ``types`` use versioning conventions such as SemVer
+for NPMs or NEVRA conventions for RPMS. A ``type`` may define a procedure to
+compare and sort versions, but there is no reliable and uniform way to do such
+comparison consistently.
