Commit 74c0372

Merge branch 'main' into datetime
2 parents f1c1d8c + b62eb1f commit 74c0372

2 files changed

Lines changed: 72 additions & 31 deletions

PURL-SPECIFICATION.rst

Lines changed: 17 additions & 9 deletions
@@ -134,7 +134,7 @@ The rules for each component are:
134134
- **type**:
135135

136136
- The package ``type`` MUST be composed only of ASCII letters and numbers,
137-
period '.', plus '+', and dash '-'.
137+
period '.', and dash '-'.
138138
- The ``type`` MUST start with an ASCII letter.
139139
- The ``type`` MUST NOT be percent-encoded.
140140
- The ``type`` is case insensitive. The canonical form is lowercase.
@@ -168,7 +168,7 @@ The rules for each component are:
168168
stripped in the canonical form. They are not part of the ``name``.
169169
- A ``name`` MUST be a percent-encoded string.
170170
- When percent-decoded, a ``name`` MAY contain any Unicode character unless
171-
prohibited by the package's ``type`` definition in `<PURL-TYPES.rst>`_.
171+
the package's ``type`` definition provides otherwise.
172172

173173

174174
- **version**:
@@ -205,7 +205,7 @@ The rules for each component are:
205205
- A ``key`` MUST NOT be percent-encoded.
206206
- Each ``key`` MUST be unique among all the keys of the ``qualifiers``
207207
component.
208-
- A ``value`` MAY be composed of any character and all characters MUST be
208+
- A ``value`` MAY contain any Unicode character and all characters MUST be
209209
encoded as described in the "Character encoding" section.
210210

211211

@@ -219,9 +219,11 @@ The rules for each component are:
219219
- Each ``subpath`` segment MUST be a percent-encoded string
220220
- When percent-decoded, a segment:
221221

222-
- MUST NOT contain a '/'
223-
- MUST NOT be any of '..' or '.'
222+
- MUST NOT contain any slash '/' characters
224223
- MUST NOT be empty
224+
- MUST NOT be any of '..' or '.'
225+
- MAY contain any Unicode character other than '/' unless the package's
226+
``type`` definition provides otherwise.
225227

226228
- The ``subpath`` MUST be interpreted as relative to the root of the package
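The segment rules in this hunk can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: ``validate_subpath`` is a hypothetical helper name, it applies only the generic rules shown above, and it rejects bad segments outright where a more lenient parser might choose to drop them.

```python
from urllib.parse import unquote


def validate_subpath(subpath: str) -> list:
    """Return the percent-decoded segments of an encoded subpath.

    Hypothetical helper: raises ValueError when a segment breaks one of
    the generic rules. Type-specific restrictions are not checked.
    """
    segments = []
    for raw in subpath.split("/"):
        segment = unquote(raw)  # each segment is percent-encoded in the purl
        if segment == "":
            raise ValueError("subpath segment MUST NOT be empty")
        if segment in (".", ".."):
            raise ValueError("subpath segment MUST NOT be '.' or '..'")
        if "/" in segment:
            raise ValueError("decoded subpath segment MUST NOT contain '/'")
        segments.append(segment)
    return segments
```

For instance, ``validate_subpath("src/main%20dir/file.py")`` yields the three decoded segments, while ``a/../b``, ``a//b`` and ``a%2Fb`` are all rejected.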
227229

@@ -234,7 +236,6 @@ A canonical ``purl`` is composed of these permitted ASCII characters:
234236
- the Alphanumeric Characters: ``A to Z``, ``a to z``, ``0 to 9``,
235237
- the Punctuation Characters: ``.-_~`` (period '.',
236238
dash '-', underscore '_' and tilde '~'),
237-
- the Plus Character: ``+`` (plus '+'),
238239
- the Percent Character: ``%`` (percent sign '%'), and
239240
- the Separator Characters ``:/@?=&#`` (colon ':', slash '/', at sign '@',
240241
question mark '?', equal sign '=', ampersand '&' and pound sign '#').
@@ -340,7 +341,7 @@ To build a ``purl`` string from its components:
340341

341342
- Discard any pair where the ``value`` is empty.
342343
- UTF-8-encode each ``value`` if needed in your programming language
343-
- If the ``key`` is ``checksums`` and this is a list of ``checksums`` join this
344+
- If the ``key`` is ``checksum`` and this is a list of checksums join this
344345
list with a ',' to create this qualifier ``value``
345346
- Create a string by joining the lowercased ``key``, the equal '=' sign and
346347
the percent-encoded ``value`` to create a qualifier
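The build steps in this hunk might look like the sketch below. ``build_qualifiers`` is a hypothetical name. Sorting the pairs by key and joining them with '&' are assumptions taken from steps outside this hunk, and ``quote(..., safe="")`` is a deliberately conservative encoding choice: the "Character encoding" section, not this sketch, decides which characters may stay literal.

```python
from urllib.parse import quote


def build_qualifiers(qualifiers: dict) -> str:
    """Join key/value pairs into a qualifiers string (illustrative only)."""
    parts = []
    # Assumption: pairs are emitted sorted by their lowercased key.
    for key, value in sorted(qualifiers.items(), key=lambda kv: kv[0].lower()):
        if key.lower() == "checksum" and isinstance(value, (list, tuple)):
            value = ",".join(value)  # a list of checksums becomes one value
        if not value:
            continue  # discard any pair whose value is empty
        # quote() UTF-8-encodes the value, then percent-encodes it.
        parts.append(key.lower() + "=" + quote(value, safe=""))
    return "&".join(parts)
```

With this conservative encoding, a ``checksum`` list of ``sha1:abc`` and ``sha256:def`` is emitted as ``checksum=sha1%3Aabc%2Csha256%3Adef``.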
@@ -396,8 +397,8 @@ To parse a ``purl`` string in its components:
396397
- The ``value`` is the percent-decoded right side
397398
- UTF-8-decode the ``value`` if needed in your programming language
398399
- Discard any key/value pairs where the value is empty
399-
- If the ``key`` is ``checksums``, split the ``value`` on ',' to create
400-
a list of ``checksums``
400+
- If the ``key`` is ``checksum``, split the ``value`` on ',' to create
401+
a list of checksums
401402

402403
- This list of key/value is the ``qualifiers`` object
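A rough Python rendering of these parse steps is shown here. ``parse_qualifiers`` is a hypothetical name, and splitting the raw string on '&' and on the first '=' is assumed from the parts of the procedure that this hunk does not show.

```python
from urllib.parse import unquote


def parse_qualifiers(query: str) -> dict:
    """Split a raw qualifiers string into a dict (illustrative only)."""
    qualifiers = {}
    for pair in query.split("&"):
        key, _, raw_value = pair.partition("=")
        value = unquote(raw_value)  # percent-decode, then UTF-8-decode
        if not value:
            continue  # discard pairs where the value is empty
        key = key.lower()
        # Only the ``checksum`` value is split into a list.
        qualifiers[key] = value.split(",") if key == "checksum" else value
    return qualifiers
```

Every key other than ``checksum`` keeps its decoded value as a plain string.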
403404

@@ -463,6 +464,13 @@ download URL, VCS URL or checksums in an API, database or web form.
463464
With this warning, the known ``key`` and ``value`` defined here are valid for use in
464465
all package types:
465466

467+
- ``vers`` allows the specification of a version range.
468+
The value MUST adhere to the `Version Range Specification <VERSION-RANGE-SPEC.rst>`_.
469+
This qualifier is mutually exclusive with the ``version`` component.
470+
For example::
471+
472+
pkg:pypi/django?vers=vers:pypi%2F%3E%3D1.11.0%7C%21%3D1.11.1%7C%3C2.0.0
473+
466474
- ``repository_url`` is an extra URL for an alternative, non-default package
467475
repository or registry. When a package does not come from the default public
468476
package repository for its ``type`` a ``purl`` may be qualified with this extra
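To make the ``vers`` example added earlier in this hunk easier to read, the following sketch percent-decodes the qualifier value and checks the mutual-exclusion rule. Both helper names are hypothetical, and the sketch does not validate the range itself against the Version Range Specification.

```python
from urllib.parse import unquote


def decode_vers(value: str) -> str:
    """Percent-decode a ``vers`` qualifier value (illustrative helper)."""
    return unquote(value)


def check_vers_exclusive(version, qualifiers: dict) -> None:
    """Raise if both a version and a ``vers`` qualifier are present."""
    if version and qualifiers.get("vers"):
        raise ValueError("'vers' is mutually exclusive with the version component")


print(decode_vers("vers:pypi%2F%3E%3D1.11.0%7C%21%3D1.11.1%7C%3C2.0.0"))
```

The printed value is ``vers:pypi/>=1.11.0|!=1.11.1|<2.0.0``.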

faq.rst

Lines changed: 55 additions & 22 deletions
@@ -6,19 +6,28 @@ Scheme
66

77
**QUESTION**: Can the ``scheme`` component be followed by a colon and two slashes, like a URI?
88

9-
**ANSWER**: No. Since a ``purl`` never contains a URL Authority, its ``scheme`` should not be suffixed with double slash as in 'pkg://' and should use 'pkg:' instead. Otherwise this would be an invalid URI per RFC 3986 at https://tools.ietf.org/html/rfc3986#section-3.3::
9+
**ANSWER**: No. Since a ``purl`` never contains a URL Authority, its ``scheme`` should not be
10+
suffixed with double slash as in 'pkg://' and should use 'pkg:' instead. Otherwise this would be an
11+
invalid URI per RFC 3986 at https://tools.ietf.org/html/rfc3986#section-3.3::
1012

1113
If a URI does not contain an authority component, then the path
1214
cannot begin with two slash characters ("//").
1315

14-
This rule applies to all slash '/' characters between the ``scheme``'s colon separator and the ``type`` component, e.g., ':/', '://', ':///' et al.
16+
This rule applies to all slash '/' characters between the ``scheme``'s colon separator and the
17+
``type`` component, e.g., ':/', '://', ':///' et al.
1518

16-
In its canonical form, a ``purl`` must not use any such ':/' ``scheme`` suffix and may only use ':' as a ``scheme`` suffix. This means that:
19+
In its canonical form, a ``purl`` must not use any such ':/' ``scheme`` suffix and may only use ':'
20+
as a ``scheme`` suffix. This means that:
1721

18-
- ``purl`` parsers must accept URLs such as 'pkg://' and must ignore and remove all such '/' characters.
19-
- ``purl`` builders should not create invalid URLs with one or more slash '/' characters between 'pkg:' and the ``type`` component.
22+
- ``purl`` parsers must accept URLs such as 'pkg://' and must ignore and remove all such '/'
23+
characters.
2024

21-
For example, although these two purls are strictly equivalent, the first is in canonical form, while the second -- with a '//' between 'pkg:' and the ``type`` 'gem' -- is an acceptable purl but is an invalid URI/URL per RFC 3986::
25+
- ``purl`` builders should not create invalid URLs with one or more slash '/' characters between
26+
'pkg:' and the ``type`` component.
27+
28+
For example, although these two purls are strictly equivalent, the first is in canonical form, while
29+
the second -- with a '//' between 'pkg:' and the ``type`` 'gem' -- is an acceptable purl but is an
30+
invalid URI/URL per RFC 3986::
2231

2332
pkg:gem/ruby-advisory-db-check@0.12.4
2433

@@ -27,34 +36,58 @@ For example, although these two purls are strictly equivalent, the first is in c
2736

2837
**QUESTION**: Is the colon between ``scheme`` and ``type`` encoded? Can it be encoded? If yes, how?
2938

30-
**ANSWER**: The "Rules for each ``purl`` component" section provides that the ``scheme`` MUST be followed by an unencoded colon ':'.
39+
**ANSWER**: The "Rules for each ``purl`` component" section provides that the ``scheme`` MUST be
40+
followed by an unencoded colon ':'.
3141

32-
In this case, the colon ':' between ``scheme`` and ``type`` is being used as a separator, and consequently should be used as-is, never encoded and never requiring any decoding. Moreover, it should be a parsing error if the colon ':' does not come directly after 'pkg'. Tools are welcome to recover from this error to help with malformed purls, but that's not a requirement.
42+
In this case, the colon ':' between ``scheme`` and ``type`` is being used as a separator, and
43+
consequently should be used as-is, never encoded and never requiring any decoding. Moreover, it
44+
should be a parsing error if the colon ':' does not come directly after 'pkg'. Tools are welcome to
45+
recover from this error to help with malformed purls, but that's not a requirement.
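The two scheme answers above can be combined into one small parsing step. The sketch below uses a hypothetical ``split_scheme`` helper: it treats a missing or encoded colon after 'pkg' as an error and silently drops any slashes that follow the colon. Recovery from malformed input, which the FAQ permits but does not require, is not attempted.

```python
def split_scheme(purl: str) -> str:
    """Return what follows 'pkg:' with any leading slashes removed.

    Hypothetical helper: raises ValueError when 'pkg' is not directly
    followed by an unencoded colon.
    """
    scheme, sep, remainder = purl.partition(":")
    if not sep or scheme != "pkg":
        raise ValueError("a purl must start with 'pkg:'")
    # Parsers must accept 'pkg:/', 'pkg://', 'pkg:///' and ignore the slashes.
    return remainder.lstrip("/")
```

Both ``pkg:gem/ruby-advisory-db-check@0.12.4`` and the same purl written with ``pkg://`` return the same remainder, so a builder working from the parsed result emits only the canonical ``pkg:`` form.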
3346

3447

3548
Type
3649
~~~~
3750

38-
**QUESTION**: What behavior is expected from a purl spec implementation if a
39-
``type`` contains a character like a slash '/' or a colon ':'?
51+
**QUESTION**: What behavior is expected from a purl spec implementation if a ``type`` contains a
52+
character like a slash '/' or a colon ':'?
4053

41-
**ANSWER**: The "Rules for each purl component" section provides that the
42-
package ``type``
54+
**ANSWER**: The "Rules for each purl component" section lists the characters allowed in a package
55+
``type``:
4356

44-
MUST be composed only of ASCII letters and numbers, period '.', plus '+',
45-
and dash '-'.
57+
MUST be composed only of ASCII letters and numbers, period '.', and dash '-'.
4658

47-
As a result, a purl spec implementation must return an error when encountering
48-
a ``type`` that contains a prohibited character.
59+
As a result, a purl spec implementation must return an error when encountering a ``type`` that
60+
contains a prohibited character.
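One way to enforce this rule is a single regular expression, sketched below with a hypothetical ``normalize_type`` helper. The pattern also encodes two rules from the specification: a ``type`` starts with an ASCII letter, and its canonical form is lowercase.

```python
import re

# ASCII letters, digits, '.' and '-', starting with a letter.
# The plus sign '+' is deliberately absent from the allowed set.
_TYPE_RE = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")


def normalize_type(package_type: str) -> str:
    """Return the canonical lowercase type or raise ValueError."""
    if not _TYPE_RE.fullmatch(package_type):
        raise ValueError("invalid purl type: %r" % package_type)
    return package_type.lower()
```

Under this sketch ``normalize_type("PyPI")`` returns ``pypi``, while ``a/b``, ``a:b`` and ``c++`` raise an error.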
4961

5062

5163
Version
5264
~~~~~~~
5365

54-
**QUESTION**: How do package ``types`` handle the comparison and sorting of
55-
versions?
66+
**QUESTION**: How do package ``types`` handle the comparison and sorting of versions?
67+
68+
**ANSWER**: Some package ``types`` use versioning conventions such as SemVer for NPMs or NEVRA
69+
conventions for RPMS. A ``type`` may define a procedure to compare and sort versions, but there is
70+
no reliable and uniform way to do such comparison consistently.
71+
72+
73+
Plus
74+
~~~~
75+
76+
**QUESTION**: Can a PURL contain a plus character '+'?
77+
78+
**ANSWER**: Decoded individual components can contain a plus. The encoded, canonical form can never
79+
contain an unencoded plus.
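As an illustration, Python's standard ``urllib.parse.quote`` percent-encodes a plus sign when it is not listed as safe, so a decoded name containing '+' never leaks an unencoded plus into the encoded form. The package name used here is only an example value.

```python
from urllib.parse import quote, unquote

decoded_name = "libstdc++"  # example value; a decoded component may contain '+'
encoded_name = quote(decoded_name, safe="")  # '+' becomes '%2B'

print(encoded_name)
# unquote() restores the plus signs; unquote_plus() would instead turn a
# literal '+' into a space, so it is not the right tool for this round trip.
print(unquote(encoded_name))
```

The encoded form is ``libstdc%2B%2B`` and decoding it returns ``libstdc++``.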
80+
81+
82+
Qualifiers
83+
~~~~~~~~~~~
84+
85+
**QUESTION**: What is the qualifier for a checksum like a SHA1?
86+
87+
**ANSWER**: The spec was originally ambiguous and used ``checksum`` (singular) in one place and
88+
``checksums`` (plural) in other places. This has been discussed extensively in issues and PRs such as
89+
https://github.com/package-url/purl-spec/issues/73 and
90+
https://github.com/package-url/purl-spec/pull/209 and the official form is ``checksum`` (singular). When writing a lenient parser,
91+
consider accepting both ``checksum`` (singular) and ``checksums`` (plural) when reading a PURL, and
92+
always emit ``checksum`` (singular) when writing a PURL.
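The lenient behavior suggested in this answer could be implemented as a small normalization pass over already-parsed qualifiers. ``normalize_checksum_key`` is a hypothetical name, and preferring the official ``checksum`` value when both keys are present is a design choice of this sketch, not something the FAQ prescribes.

```python
def normalize_checksum_key(qualifiers: dict) -> dict:
    """Fold a legacy 'checksums' key into 'checksum' (lenient reading)."""
    normalized = dict(qualifiers)  # leave the caller's dict untouched
    if "checksums" in normalized:
        legacy = normalized.pop("checksums")
        # Keep an existing 'checksum' value if both keys were supplied.
        normalized.setdefault("checksum", legacy)
    return normalized
```

A writer that serializes the normalized dict emits only ``checksum`` (singular).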
5693

57-
**ANSWER**: Some package ``types`` use versioning conventions such as SemVer
58-
for NPMs or NEVRA conventions for RPMS. A ``type`` may define a procedure to
59-
compare and sort versions, but there is no reliable and uniform way to do such
60-
comparison consistently.
