-
Notifications
You must be signed in to change notification settings - Fork 225
Update 'qualifiers' rule in core spec #382 #398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
1c1fb31
b621d65
17685b2
654e1b9
9d849b9
ccf07c5
2a13760
42e7ec0
9a7089b
898c64b
3d2a128
a990290
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,5 +1,5 @@ | ||||||||||
| Package URL specification v1.0.X | ||||||||||
| ================================ | ||||||||||
| Package URL specification v1.0.X (from qualifiers PR) | ||||||||||
| ===================================================== | ||||||||||
|
|
||||||||||
| The Package URL core specification defines a versioned and formalized format, | ||||||||||
| syntax, and rules used to represent and validate ``purl``. | ||||||||||
|
|
@@ -136,6 +136,8 @@ The rules for each component are: | |||||||||
| - The ``type`` MUST start with an ASCII letter. | ||||||||||
| - The ``type`` MUST NOT be percent-encoded. | ||||||||||
| - The ``type`` is case insensitive. The canonical form is lowercase. | ||||||||||
| - ``purl`` parsers MUST return an error when the ``type`` contains a prohibited | ||||||||||
|
johnmhoran marked this conversation as resolved.
Outdated
|
||||||||||
| character. | ||||||||||
|
|
||||||||||
|
|
||||||||||
| - **namespace**: | ||||||||||
|
|
@@ -176,25 +178,106 @@ The rules for each component are: | |||||||||
|
|
||||||||||
| - **qualifiers**: | ||||||||||
|
|
||||||||||
| - The ``qualifiers`` string is prefixed by a '?' separator when not empty | ||||||||||
| - This '?' is not part of the ``qualifiers`` | ||||||||||
| - This is a query string composed of zero or more ``key=value`` pairs each | ||||||||||
| separated by a '&' ampersand. A ``key`` and ``value`` are separated by the equal | ||||||||||
| '=' character | ||||||||||
| - These '&' are not part of the ``key=value`` pairs. | ||||||||||
| - ``key`` must be unique within the keys of the ``qualifiers`` string | ||||||||||
| - ``value`` cannot be an empty string: a ``key=value`` pair with an empty ``value`` | ||||||||||
| is the same as no key/value at all for this key | ||||||||||
| - For each pair of ``key`` = ``value``: | ||||||||||
|
|
||||||||||
| - The ``key`` must be composed only of ASCII letters and numbers, '.', '-' and | ||||||||||
| '_' (period, dash and underscore) | ||||||||||
| - A ``key`` cannot start with a number | ||||||||||
| - A ``key`` must NOT be percent-encoded | ||||||||||
| - A ``key`` is case insensitive. The canonical form is lowercase | ||||||||||
| - A ``key`` cannot contain spaces | ||||||||||
| - A ``value`` must be a percent-encoded string | ||||||||||
| - The '=' separator is neither part of the ``key`` nor of the ``value`` | ||||||||||
| - The ``qualifiers`` component MUST be prefixed by a '?' separator when not empty. | ||||||||||
| - The '?' separator is not part of the ``qualifiers`` component. | ||||||||||
| - The ``qualifiers`` component is a query string composed of one or more ``key=value`` | ||||||||||
| pairs, each of which is separated by an ampersand '&'. A ``key`` and ``value`` | ||||||||||
|
pombredanne marked this conversation as resolved.
Outdated
|
||||||||||
| are separated by the equal '=' character. | ||||||||||
|
johnmhoran marked this conversation as resolved.
Outdated
|
||||||||||
| - The '&' separator is not part of the ``key`` or the ``value``. | ||||||||||
|
pombredanne marked this conversation as resolved.
Outdated
|
||||||||||
| - The '=' separator is not part of the ``key`` or the ``value``. | ||||||||||
|
pombredanne marked this conversation as resolved.
Outdated
|
||||||||||
| - The ``key`` MUST be unique among the keys of the ``qualifiers`` string. | ||||||||||
|
pombredanne marked this conversation as resolved.
Outdated
|
||||||||||
| - The ``value`` MUST NOT be an empty string: a ``key=value`` pair with an empty ``value`` | ||||||||||
|
johnmhoran marked this conversation as resolved.
Outdated
|
||||||||||
| is the same as no ``key=value`` pair at all for this ``key``. | ||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good clarification -- done. |
||||||||||
|
|
||||||||||
| - For each ``key=value`` pair: | ||||||||||
|
|
||||||||||
| - ``key`` | ||||||||||
|
|
||||||||||
| - The ``key`` MUST be composed only of ASCII letters and numbers, '.', '-' and | ||||||||||
| '_' (period, dash and underscore). | ||||||||||
| - A ``key`` MUST start with an ASCII letter. | ||||||||||
|
johnmhoran marked this conversation as resolved.
Outdated
|
||||||||||
| - A ``key`` MUST NOT be percent-encoded. | ||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think this is wrong, and against other existing spec purl-spec/PURL-SPECIFICATION.rst Lines 245 to 248 in 8040ff0
I think a key can be percent-encoded. and at some points, it must be percent encoded. I might be wrong, please help me understand.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The current spec already spells this out clearly, so this is not changing anything: Now since the % sign is not allowed in a key, technically, this is implied already and we could remove this sentence entirely. I like this to be explicit though.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Given the comments thus far ^, I'm keeping as is unless/until I hear otherwise.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 - keep as you have it right now |
||||||||||
| - A ``key`` is case insensitive. The canonical form is lowercase. | ||||||||||
|
|
||||||||||
| - ``value`` | ||||||||||
|
|
||||||||||
| - The ``value`` MUST be composed only of the following characters, encoded | ||||||||||
| as described below and in keeping with RFC 3986 Section 2.2: | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is confusing and self contradictory. "The
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks @matt-phylum -- I don't think it's confusing but in any event the revised update will be simplified and clarified. |
||||||||||
|
|
||||||||||
| "If data for a URI component would conflict with a reserved character's | ||||||||||
| purpose as a delimiter, then the conflicting data must be percent-encoded | ||||||||||
| before the URI is formed." | ||||||||||
| https://datatracker.ietf.org/doc/html/rfc3986#section-2.2 | ||||||||||
|
|
||||||||||
| 1. **All US-ASCII characters defined as "unreserved"** in RFC 3986 Section 2.3 | ||||||||||
| (https://datatracker.ietf.org/doc/html/rfc3986#section-2.3): | ||||||||||
|
|
||||||||||
| .. code-block:: none | ||||||||||
|
|
||||||||||
| 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', '~' | ||||||||||
|
|
||||||||||
| - These 66 characters do not need to be percent-encoded. | ||||||||||
|
|
||||||||||
| 2. **All US-ASCII characters defined as "sub-delims"**, a subset of | ||||||||||
| the "reserved" characters, in RFC 3986 Section 2.2 | ||||||||||
| (https://datatracker.ietf.org/doc/html/rfc3986#section-2.2): | ||||||||||
|
|
||||||||||
| .. code-block:: none | ||||||||||
|
|
||||||||||
| '!', '$', '&', ''', '(', ')', '*', '+', ',', ';', '=' | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The previous item lists characters that should not be encoded (MUST NOT for canonical PURLs), but this item lists some characters that MUST be encoded and some characters that should not be encoded together.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks, will look forward to your comments on the revised update once I've pushed it. |
||||||||||
|
|
||||||||||
| - The '&' MUST be percent-encoded to avoid being incorrectly parsed | ||||||||||
| as a separator between multiple key-value pairs. See "How to parse | ||||||||||
| a purl string in its components" ("Split the qualifiers on '&'. | ||||||||||
| Each part is a key=value pair"). | ||||||||||
|
|
||||||||||
| - The other 10 characters do not need to be percent-encoded, including | ||||||||||
| the '=' -- the parser will not mistakenly treat a '=' in the value | ||||||||||
| as a separator because it splits each key-value pair just once, | ||||||||||
| from left-to-right, on the first '=' it encounters, and thus there | ||||||||||
| is no conflict: | ||||||||||
|
|
||||||||||
| .. code-block:: none | ||||||||||
|
|
||||||||||
| - For each pair, split the key=value once from left on '=': | ||||||||||
| - The key is the lowercase left side | ||||||||||
| - The value is the percent-decoded right side | ||||||||||
|
|
||||||||||
| 3. **Four additional US-ASCII characters** identified in the "query" | ||||||||||
| definition in RFC 3986 Section 3.4 (https://datatracker.ietf.org/doc/html/rfc3986#section-3.4) | ||||||||||
| and the "pchar" definition in RFC 3986 Appendix A (https://datatracker.ietf.org/doc/html/rfc3986#appendix-A): | ||||||||||
|
|
||||||||||
| .. code-block:: none | ||||||||||
|
|
||||||||||
| ':', '@', '/', '?' | ||||||||||
|
|
||||||||||
| - The '?' MUST be percent-encoded to avoid being incorrectly parsed | ||||||||||
| as a ``qualifiers`` separator -- in the right-to-left parsing | ||||||||||
| (see "How to parse a purl string in its components"), an unencoded | ||||||||||
| '?' in the ``value`` would be the first '?' encountered by the | ||||||||||
| parser and incorrectly treated as a ``qualifiers`` separator. | ||||||||||
|
|
||||||||||
| - The other three characters do not need to be percent-encoded. | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This contradicts preexisting rules: purl-spec/PURL-SPECIFICATION.rst Lines 238 to 241 in 8040ff0
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will be clarified in the update. |
||||||||||
|
|
||||||||||
| 4. **All other US-ASCII characters**. | ||||||||||
|
|
||||||||||
| .. code-block:: none | ||||||||||
|
|
||||||||||
| - 33 control characters (ASCII codes 0-31 and 127) | ||||||||||
|
|
||||||||||
| - the 14 US-ASCII characters not covered in the preceding groups of US-ASCII characters: | ||||||||||
|
|
||||||||||
| ' ' [space], '"', '#', '%', '<', '>', '[', '\', ']', '^', '`', '{', '|', '}' | ||||||||||
|
|
||||||||||
| - Each of these 47 US-ASCII characters MUST be percent-encoded. | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a change to the PURL canonicalization rules and will cause problems for anyone who is using canonicalized PURLs as keys.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will be clarified in the update.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @matt-phylum As you'll see elsewhere in these comments, I've deleted what was intended as a simple placeholder rule hoping people would provide language they wanted. Failing that, it's been deleted. |
||||||||||
|
|
||||||||||
| 5. **Any character that is not a US-ASCII character** | ||||||||||
| (i.e., characters with ASCII code > 127 and non-ASCII characters). | ||||||||||
|
|
||||||||||
| - All such characters MUST be UTF-8 encoded and then percent-encoded. | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is way too complicated to be one bullet point in the qualifiers section. The encoding rules are:
Plus should also be encoded to avoid interoperability problems: #261
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will be clarified in the update. |
||||||||||
|
|
||||||||||
| - ``purl`` parsers MUST return an error when the ``key`` or ``value`` contains | ||||||||||
| a prohibited character. | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is incorrect and incompatible with the preexisting spec. In the unescaped form, no characters are prohibited, so you could have a valid PURL In the escaped form, some characters are escaped to avoid problems unrelated to the URL.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems to me that inclusion of a prohibited character could only be "normalized" by removing/discarding such a character, which does not sound to me like mere normalization. I look to feedback from @pombredanne and others on this point.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe I'm misunderstanding. There are no characters that are prohibited in the unencoded form of a qualifier value. In the encoded form, there are characters that are listed above as requiring escaping that don't actually require escaping in order for the PURL to be parsed successfully. If it can be unambiguously parsed and the result can be formatted into a PURL with the same meaning, I don't think that should be a "MUST return an error" condition. The characters do not need to be removed or discarded.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmmm. I must admit that I don't yet have a good understanding of when a violation of the spec should be normalized (allowing the parsing process to continue) vs. treating a spec violation as invalid (i.e., I'd be interested in reading any authorities/resources you could point me to on this important topic -- and in all events I'll happily defer to whatever approach you, @pombredanne, @jkowalleck, @mprpic and others in the community ultimately agree to. |
||||||||||
|
|
||||||||||
|
|
||||||||||
| - **subpath**: | ||||||||||
|
|
@@ -206,9 +289,11 @@ The rules for each component are: | |||||||||
| in the canonical form | ||||||||||
| - Each ``subpath`` segment MUST be a percent-encoded string | ||||||||||
| - When percent-decoded, a segment: | ||||||||||
|
|
||||||||||
| - MUST NOT contain a '/' | ||||||||||
| - MUST NOT be any of '..' or '.' | ||||||||||
| - MUST NOT be empty | ||||||||||
|
|
||||||||||
| - The ``subpath`` MUST be interpreted as relative to the root of the package | ||||||||||
|
|
||||||||||
|
|
||||||||||
|
|
||||||||||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -6,7 +6,7 @@ Scheme | |||||
|
|
||||||
| **QUESTION**: Can the ``scheme`` component be followed by a colon and two slashes, like a URI? | ||||||
|
|
||||||
| No. Since a ``purl`` never contains a URL Authority, its ``scheme`` should not be suffixed with double slash as in 'pkg://' and should use 'pkg:' instead. Otherwise this would be an invalid URI per RFC 3986 at https://tools.ietf.org/html/rfc3986#section-3.3:: | ||||||
| **ANSWER**: No. Since a ``purl`` never contains a URL Authority, its ``scheme`` should not be suffixed with double slash as in 'pkg://' and should use 'pkg:' instead. Otherwise this would be an invalid URI per RFC 3986 at https://tools.ietf.org/html/rfc3986#section-3.3:: | ||||||
|
|
||||||
| If a URI does not contain an authority component, then the path | ||||||
| cannot begin with two slash characters ("//"). | ||||||
|
|
@@ -24,9 +24,10 @@ For example, although these two purls are strictly equivalent, the first is in c | |||||
|
|
||||||
| pkg://gem/ruby-advisory-db-check@0.12.4 | ||||||
|
|
||||||
|
|
||||||
| **QUESTION**: Is the colon between ``scheme`` and ``type`` encoded? Can it be encoded? If yes, how? | ||||||
|
|
||||||
| The "Rules for each ``purl`` component" section provides that "[t]he ``scheme`` MUST be followed by an unencoded colon ':'. | ||||||
| **ANSWER**: The "Rules for each ``purl`` component" section provides that "[t]he ``scheme`` MUST be followed by an unencoded colon ':'. | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point -- fixed. |
||||||
|
|
||||||
| In this case, the colon ':' between ``scheme`` and ``type`` is being used as a separator, and consequently should be used as-is, never encoded and never requiring any decoding. Moreover, it should be a parsing error if the colon ':' does not come directly after 'pkg'. Tools are welcome to recover from this error to help with malformed purls, but that's not a requirement. | ||||||
|
|
||||||
|
|
@@ -37,7 +38,7 @@ Type | |||||
| **QUESTION**: What behavior is expected from a purl spec implementation if a | ||||||
| ``type`` contains a character like a slash '/' or a colon ':'? | ||||||
|
|
||||||
| The "Rules for each purl component" section provides that | ||||||
| **ANSWER**: The "Rules for each purl component" section provides that | ||||||
|
|
||||||
| [t]he package ``type`` MUST be composed only of ASCII letters and numbers, | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @johnmhoran Can we refine this with the new wording? and remove the the weird square brackets in
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @pombredanne I've fixed the use of square brackets (thanks for catching that) and will commit and push these updates. I'm not sure what you are referring to by "the new wording" aside from the square brackets -- please clarify as needed once the revised |
||||||
| '.', '+' and '-' (period, plus, and dash) | ||||||
|
|
||||||
Uh oh!
There was an error while loading. Please reload this page.