This document defines the JSON format for SecID registry files. The registry currently uses YAML+Markdown (see REGISTRY-FORMAT.md) for flexibility during exploration. This document specifies the target JSON format for v1.0+.
SecID is about labeling and finding things. That's it.
The registry contains:
- Identity - What is this thing called?
- Resolution - How do I find/access it?
- Disambiguation - How do I tell similar things apart?
The registry does NOT contain:
- Enrichment - Metadata about the thing (authors, categories, relationships)
- Judgments - Quality assessments, trust scores, recommendations
- Relationships - How things connect to each other
Enrichment and relationships belong in separate data layers that reference SecIDs.
Registry match_nodes[].patterns are canonicalized to ECMAScript RegExp syntax because production resolution runs in Cloudflare Workers (JavaScript runtime).
- Store one canonical pattern set in the registry. Do not store per-engine variants (ecmascript, pcre, python, etc.) in registry data.
- Non-JS runtimes should use tooling that translates/validates from canonical ECMAScript patterns.
- Keep patterns in a portable subset when possible.
- Legacy inline-flag patterns (for example (?i)^cve$) may still exist during migration; new/updated patterns should use ECMAScript-compatible syntax.
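A migration check along these lines could flag patterns that still need attention. This is a minimal sketch, not registry tooling: `checkPattern` is a hypothetical helper, compiling with the `u` flag is an assumption, and treating only a leading `(?flags)` group as "legacy" is a simplification.

```javascript
// Hypothetical helper: classify a registry pattern for migration review.
function checkPattern(pattern) {
  // A leading inline flag group such as (?i) is PCRE/Python style.
  const legacyInlineFlag = /^\(\?[a-zA-Z]+\)/.test(pattern);
  let compiles = true;
  try {
    new RegExp(pattern, "u"); // throws SyntaxError if the engine rejects it
  } catch (err) {
    compiles = false;
  }
  return { legacyInlineFlag, compiles };
}
```

A pattern reported with `legacyInlineFlag: true` would be a candidate for rewriting as explicit aliases.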
This section explains how a SecID string is resolved to URLs using registry data.
Important: SecID parsing requires registry access. The registry defines what types, namespaces, and names are valid. This eliminates the need for a complex "banned characters" list - if it's not in the registry, it's not valid.
Parsing uses the registry to identify components:
secid:advisory/github.com/advisories/ghsa#GHSA-1234-5678-abcd
      ───┬──── ──────────┬────────── ─┬── ─────────┬─────────
         │               │            │            └─ subpath
         │               │            └─ name (registry lookup, longest match)
         │               └─ namespace (domain, optionally with /path segments)
         └─ type (known list)
| Step | Component | How to Parse |
|---|---|---|
| 1 | scheme | Literal secid: |
| 2 | type | Match against 10 known values |
| 3 | namespace | Shortest-to-longest matching against registry. Namespaces can contain / (e.g., github.com/advisories). See SPEC.md Section 4.3. |
| 4 | name | Match remaining path against name-level pattern nodes in match_nodes |
| 5 | version | If @ present after name, match against version-level children |
| 6 | source qualifiers | Parse ?... until # |
| 7 | subpath | If # present, match against subpath-level children |
| 8 | item_version | If @ follows matched subpath pattern, match against deeper children for item version |
| 9 | item qualifiers | If ? follows the item version (or matched identifier), parse as item-level qualifiers. |
Why registry-aware? Names can contain any characters (including #, @, ?, :). The registry defines what names exist, and longest-match resolves ambiguity.
Shortest-to-longest namespace resolution: Since namespaces can contain /, the parser tries shortest namespace first against the registry, then progressively longer matches. See SPEC.md Section 4.3 for details.
Input: secid:advisory/github.com/advisories/ghsa#GHSA-xxxx
After extracting type "advisory", remaining path: github.com/advisories/ghsa#GHSA-xxxx
Try namespace matches (shortest first):
1. "github.com" → exists in registry? Yes → candidate
2. "github.com/advisories" → exists in registry? Yes → longer candidate
3. "github.com/advisories/ghsa" → exists? No → stop
Longest matching namespace: "github.com/advisories"
Remaining: "ghsa#GHSA-xxxx" → name="ghsa", subpath="GHSA-xxxx"
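The trace above can be sketched as a small function. This is an illustration under stated assumptions, not the SPEC.md algorithm itself: `resolveNamespace` is a hypothetical name, the registry is modeled as a plain `Set` of namespace strings, and the loop stops at the first prefix that is not registered, as in step 3 of the trace.

```javascript
// Sketch: find the longest registered namespace by growing the prefix
// one "/"-separated segment at a time (shortest first).
function resolveNamespace(remaining, knownNamespaces) {
  const segments = remaining.split("/");
  let best = null;
  for (let i = 1; i <= segments.length; i++) {
    const candidate = segments.slice(0, i).join("/");
    if (!knownNamespaces.has(candidate)) break; // first miss: stop
    best = { namespace: candidate, rest: segments.slice(i).join("/") };
  }
  return best; // null when even the shortest prefix is unknown
}
```

The `rest` string is handed to name matching unchanged, since names are resolved against the registry rather than split on delimiters.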
Example with special characters:
secid:advisory/vendor.com/weird#name:here#ID-2024
If the registry has a source named weird#name:here in advisory/vendor.com, then:
- name = weird#name:here
- subpath = ID-2024
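One way to picture longest-match name resolution is to try every delimiter position as a possible end of the name and keep the longest candidate that a name-level pattern accepts. This is a sketch only: `matchName` is a hypothetical helper, and treating `#`, `@`, and `?` as the candidate boundaries is an assumption drawn from the parsing table.

```javascript
// Sketch: pick the longest prefix of `rest` that a name-level pattern accepts.
// Candidate boundaries are the end of the string or a "#", "@", "?" character.
function matchName(rest, namePatterns) {
  let best = null;
  for (let end = 1; end <= rest.length; end++) {
    const atBoundary = end === rest.length || "#@?".includes(rest[end]);
    if (!atBoundary) continue;
    const candidate = rest.slice(0, end);
    if (namePatterns.some((p) => new RegExp(p, "u").test(candidate))) {
      best = { name: candidate, remainder: rest.slice(end) };
    }
  }
  return best;
}
```

With patterns for both `weird` and `weird#name:here` registered, the longer name wins and `#ID-2024` is left for subpath matching.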
Using type, namespace, and name, find the source definition:
registry[type][namespace][name] → registry["advisory"]["mitre.org"]["cve"]
Filesystem mapping: The abstract registry[type][namespace] maps to a filesystem path via the reverse-DNS algorithm (see SPEC.md Section 4.0):
| Lookup | Filesystem Path |
|---|---|
| registry["advisory"]["mitre.org"] | registry/advisory/org/mitre.json |
| registry["advisory"]["github.com/advisories"] | registry/advisory/com/github/advisories.json |
| registry["control"]["cloudsecurityalliance.org"] | registry/control/org/cloudsecurityalliance.json |
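The mapping in the table can be reproduced with a few lines. This sketch is inferred from the three rows above, not taken from SPEC.md Section 4.0: `registryPath` is a hypothetical name, and the handling of domains with more than two labels is an assumption.

```javascript
// Sketch: reverse the DNS labels of the domain, then append any
// "/"-separated sub-namespace segments and the .json extension.
function registryPath(type, namespace) {
  const [domain, ...subPath] = namespace.split("/");
  const reversedLabels = domain.split(".").reverse();
  return ["registry", type, ...reversedLabels, ...subPath].join("/") + ".json";
}
```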
The resolver walks the pattern tree (match_nodes), matching each portion of the SecID against the corresponding tree level. At each level, all sibling patterns are tested — all matches are traversed to completion, not just the first.
secid:advisory/redhat.com/errata#RHSA-2026:1234
1. Name "errata" → match against name-level nodes → "^errata$" matches
2. No @version → skip version-level children
3. Subpath "RHSA-2026:1234" → match against subpath-level children → "^RHSA-\\d{4}:\\d+$" matches
4. Return data from both levels (source info + specific advisory URL)
Chop and pass: Each regex only sees its portion of the string. The resolver splits at grammar boundaries (@, #) and hands each piece to the appropriate tree level. No backtracking, no lookahead across levels.
All matches traversed: The resolver doesn't stop at the first match — it traverses all matching nodes to completion. Multiple matches are all returned (with weights). When sibling patterns overlap, weight helps consumers choose.
Every level returns data. Query secid:advisory/redhat.com/errata → returns errata info from the name-level node. Query secid:advisory/redhat.com/errata#RHSA-2026:1234 → returns both the source info AND the specific advisory URL. Incomplete queries get the data available at their depth.
Patterns match the complete input at each level, not a substring. Patterns should be anchored with ^...$.
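The traversal rules can be sketched as a recursive walk. This is a simplified illustration: `walk` and `exampleTree` are hypothetical, the version level is omitted (so `parts` is just name followed by subpath), and the URL in the tree is a placeholder rather than Red Hat's real endpoint.

```javascript
// Sketch: parts[0] is the name, parts[1] the subpath (version level omitted).
function walk(nodes, parts, depth = 0) {
  const results = [];
  if (depth >= parts.length) return results;
  for (const node of nodes) {
    // Each regex sees only its own portion of the SecID.
    const matched = node.patterns.some((p) => new RegExp(p, "u").test(parts[depth]));
    if (!matched) continue;
    results.push({ depth, weight: node.weight ?? 0, data: node.data ?? null });
    // Keep going: every matching sibling is traversed, not just the first.
    results.push(...walk(node.children ?? [], parts, depth + 1));
  }
  return results;
}

const exampleTree = [
  {
    patterns: ["^errata$"],
    weight: 100,
    data: { description: "placeholder source info" },
    children: [
      {
        patterns: ["^RHSA-\\d{4}:\\d+$"],
        weight: 100,
        data: { url: "https://example.com/errata/{id}" }, // hypothetical URL
      },
    ],
  },
];
```

Calling `walk(exampleTree, ["errata"])` returns only the name-level entry, while adding the subpath returns both levels, which mirrors the "every level returns data" rule.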
For simple cases, the subpath is used directly as {id} in the URL template:
{"type": "lookup", "url": "https://cve.org/CVERecord?id={id}"}

For complex URL structures where parts of the ID need transformation, patterns can specify a variables object:
{
"pattern": "^CWE-\\d+$",
"url": "https://cwe.mitre.org/data/definitions/{number}.html",
"variables": {
"number": {
"extract": "^CWE-(\\d+)$",
"description": "Numeric ID portion (e.g., '79' from 'CWE-79')"
}
}
}

Each variable has:
- extract - Regex applied to the subpath. Capture groups () are numbered {1}, {2}, etc.
- format - (Optional) How to combine capture groups. Defaults to {1} (first group). Can include literals.
- description - Explains what this variable represents and how it's derived.
Substitute variables into the URL template:
| Placeholder | Source | Example |
|---|---|---|
| {id} | Full subpath | CVE-2024-1234 |
| {version} | From @version component | 4.0 |
| {year} | Extracted from subpath (if in variables) | 2024 |
| {number} | Extracted from subpath (if in variables) | 1234 |
Result: https://cve.org/CVERecord?id=CVE-2024-1234
For CWE, the lookup URL needs just the number, not the full ID:
{
"pattern": "^CWE-\\d+$",
"description": "CWE weakness ID",
"url": "https://cwe.mitre.org/data/definitions/{number}.html",
"variables": {
"number": {
"extract": "^CWE-(\\d+)$",
"description": "Numeric ID portion (e.g., '79' from 'CWE-79')"
}
}
}

Resolution of secid:weakness/mitre.org/cwe#CWE-79:
- Subpath: CWE-79
- Pattern matches: ^CWE-\d+$ ✓
- Extract variables: apply number.extract regex → first capture group (\d+) captures 79
- Build URL: https://cwe.mitre.org/data/definitions/79.html
For the CVE GitHub repository, files are organized by year and a "bucket" (all but last 3 digits + xxx):
{
"pattern": "^CVE-\\d{4}-\\d{4,}$",
"description": "CVE JSON record on GitHub",
"url": "https://github.com/CVEProject/cvelistV5/blob/main/cves/{year}/{bucket}/{id}.json",
"variables": {
"year": {
"extract": "^CVE-(\\d{4})-\\d+$",
"description": "4-digit year (e.g., '2026' from 'CVE-2026-25010')"
},
"bucket": {
"extract": "^CVE-\\d{4}-(\\d+)\\d{3}$",
"format": "{1}xxx",
"description": "All but last 3 digits + 'xxx' (e.g., '25xxx' from 'CVE-2026-25010')"
},
"id": {
"extract": "^(CVE-\\d{4}-\\d+)$",
"description": "Full CVE ID"
}
}
}

Resolution of secid:advisory/mitre.org/cve#CVE-2026-25010:
- Subpath: CVE-2026-25010
- Pattern matches ✓
- Extract variables:
  - year: extract (2026) → 2026
  - bucket: extract (25) from before last 3 digits, format {1}xxx → 25xxx
  - id: extract (CVE-2026-25010) → CVE-2026-25010
- Build URL: https://github.com/CVEProject/cvelistV5/blob/main/cves/2026/25xxx/CVE-2026-25010.json
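Both worked examples follow the same steps: match the pattern, run each variable's extract regex, apply its format, and substitute into the URL template. A minimal sketch of those steps, assuming a flat entry object with `pattern`, `url`, and `variables` as in the JSON above; `buildUrl` is a hypothetical name, and `{version}` handling is left out.

```javascript
// Sketch: resolve a subpath to a URL using pattern + variables + template.
function buildUrl(entry, subpath) {
  if (!new RegExp(entry.pattern, "u").test(subpath)) return null;
  const values = { id: subpath }; // {id} defaults to the full subpath
  for (const [name, spec] of Object.entries(entry.variables ?? {})) {
    const match = new RegExp(spec.extract, "u").exec(subpath);
    if (!match) return null;
    const format = spec.format ?? "{1}"; // default: first capture group
    values[name] = format.replace(/\{(\d+)\}/g, (_, n) => match[Number(n)] ?? "");
  }
  // Unknown placeholders (for example {version}) are left untouched.
  return entry.url.replace(/\{(\w+)\}/g, (placeholder, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : placeholder
  );
}

const cweEntry = {
  pattern: "^CWE-\\d+$",
  url: "https://cwe.mitre.org/data/definitions/{number}.html",
  variables: { number: { extract: "^CWE-(\\d+)$" } },
};
// buildUrl(cweEntry, "CWE-79") → "https://cwe.mitre.org/data/definitions/79.html"
```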
Traditional data formats were optimized for software that needed deterministic, single values. SecID takes an AI-first approach:
- Provide options with context rather than forcing single "canonical" choices
- Let AI reason about which option fits the current need
- Include metadata that aids decision-making
Example: Instead of one lookup URL, provide multiple with context about when each is appropriate.
Use the right pattern for the data:
| Situation | Pattern | Example |
|---|---|---|
| Fixed, small set of categories | Named fields | official_name, common_name, alternate_names |
| Open-ended, numerous categories | Arrays with type/context | urls, match_nodes |
| Identity/classification | Singular values | namespace, type, status |
Why? Named fields are self-documenting. An AI reads official_name and immediately knows what it is. Arrays with type require understanding a schema to interpret.
Distinguish between "no data exists" and "not yet researched":
| State | Representation | Meaning |
|---|---|---|
| Has data | "field": "value" | We have the information |
| No data exists | "field": null | We looked, nothing to find |
| Not researched | field absent | We haven't looked yet |
For arrays:
- [] (empty array) = we looked, there are none
- null = we looked, not applicable to this source
- absent = not yet researched
Why? This lets us track completeness. An absent field signals work to be done. A null signals confirmed absence.
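A completeness report can distinguish these states with an own-property check. This is a small sketch: `fieldState` and the state labels it returns are hypothetical names chosen for illustration.

```javascript
// Sketch: classify a field as researched-with-data, confirmed empty,
// confirmed absent (null), or not yet researched (key missing).
function fieldState(entry, field) {
  if (!Object.prototype.hasOwnProperty.call(entry, field)) return "not_researched";
  const value = entry[field];
  if (value === null) return "confirmed_absent";
  if (Array.isArray(value) && value.length === 0) return "confirmed_empty";
  return "has_data";
}
```

Checking for the key itself, rather than testing the value for truthiness, is what keeps null and absent apart.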
Three optional metadata fields record when data was verified and what was observed:
| Field | Meaning | Changes when |
|---|---|---|
| checked | Date someone last verified the value is still accurate | Every verification pass, even if nothing changed |
| updated | Date the value last materially changed | Only when the actual data changes |
| note | Free-text observation about what was found during verification | When the observation changes |
Date format: YYYY-MM-DD (ISO 8601, no time component — day granularity is sufficient).
| Context | Fields | Example |
|---|---|---|
| Source-level (top of file) | checked, updated, note | "checked": "2026-03-06" |
| Attached to a scalar field | field_checked, field_updated, field_note | "security_txt_checked": "2026-03-06" |
| Inside objects (URL entries, etc.) | checked, updated, note | {"url": "...", "checked": "2026-03-06"} |
Source-level checked/updated apply to the entire registry entry. Inside objects, the fields are scoped by the object. The _checked/_updated/_note suffix "attaches" metadata to the scalar field it describes.
Timestamps make null values strictly more informative:
| State | Meaning |
|---|---|
| "field": null | We looked and found nothing (existing) |
| "field": null, "field_checked": "2026-03-06" | We looked on this date and found nothing |
| "field" absent | Not yet researched (existing, unchanged) |
Existing files without timestamps remain valid — absent timestamps mean "not yet tracked."
Source-level timestamps:
{
"schema_version": "1.0",
"namespace": "redhat.com",
"type": "entity",
"status": "draft",
"checked": "2026-03-06",
"updated": "2026-03-06",
...
}

URL object with verification note:
"urls": [
{
"type": "security",
"url": "https://access.redhat.com/security/",
"checked": "2026-03-06",
"updated": "2026-03-06"
}
]

Scalar field - confirmed positive:
{
"security_txt": "https://security.access.redhat.com/data/meta/v1/security.txt",
"security_txt_checked": "2026-03-06",
"security_txt_updated": "2026-03-06",
"security_txt_note": "PGP signed, RFC 9116 compliant. Expires 2026-06-04."
}

Scalar field - confirmed negative:
{
"security_txt": null,
"security_txt_checked": "2026-03-06",
"security_txt_updated": "2026-03-06",
"security_txt_note": "/.well-known/security.txt redirects to homepage"
}

See TIMESTAMP-FIELDS.md for full rationale, backwards compatibility analysis, and pilot files.
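The suffix convention makes it straightforward to find scalar fields that are due for re-verification. A minimal sketch, assuming a flat entry object; `staleScalarFields` and the cutoff parameter are hypothetical and not part of the format.

```javascript
// Sketch: list scalar fields whose *_checked date is older than a cutoff.
function staleScalarFields(entry, cutoff) {
  const suffix = "_checked";
  const stale = [];
  for (const [key, value] of Object.entries(entry)) {
    if (!key.endsWith(suffix)) continue;
    // YYYY-MM-DD strings sort chronologically, so string comparison is enough.
    if (typeof value === "string" && value < cutoff) {
      stale.push(key.slice(0, -suffix.length));
    }
  }
  return stale;
}
```

Fields without a `_checked` companion are simply not reported, which matches the "not yet tracked" reading of absent timestamps.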
Registry URL objects carry optional metadata describing the data format available at each URL. This serves three purposes:
- Client filtering — API clients can request only structured (machine-readable) results
- v2.0 data serving — the service needs to know how to fetch and parse each source
- Provenance — documents how registry entries were derived from raw source data
Four optional fields appear on both source-level URL objects and per-item match_node children:
- parsability: "structured" (machine-readable with a schema) or "scraped" (HTML/unstructured). Describes data format only; access patterns (API, bulk, search) are captured in the URL type field.
- schema: A SecID string referencing the schema (e.g., secid:reference/cve.org/cve-schema@5.2.0). Schemas are reference registry entries: versioned, resolvable. Absent for scraped sources.
- parsing_instructions: A SecID string referencing a parsing instruction document (e.g., secid:reference/cloudsecurityalliance.org/secid-parsers#cve-json-5). CSA-authored documents covering field mappings, access patterns, and provenance.
- auth: Free text describing authentication requirements. Ranges from "none" to multi-paragraph instructions for complex access processes.
All four are optional. Absent means "not yet documented" — entries can be annotated incrementally.
A formal JSON Schema file is ideal, but API documentation qualifies too. If the NVD API returns structured JSON defined by their API docs, schema points at a reference entry for those docs. The field means "what defines this data's structure" — not "is there a .json schema file."
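Client filtering on these fields could look like the following. This is a sketch with hypothetical helper names; how the API actually exposes the filter is not specified here.

```javascript
// Sketch: keep only URL objects documented as machine-readable.
function structuredOnly(urlObjects) {
  return urlObjects.filter((u) => u.parsability === "structured");
}

// Sketch: URL objects whose parsability has not been documented yet.
function undocumented(urlObjects) {
  return urlObjects.filter((u) => u.parsability === undefined);
}
```

Keeping the undocumented set separate preserves the "absent means not yet documented" convention instead of lumping it in with scraped sources.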
| Level | Where | Purpose |
|---|---|---|
| Source-level urls | match_nodes[].data.urls[] | Access methods for the source as a whole |
| Per-item child data | match_nodes[].children[].data | Resolution URLs for specific items |
{
"schema_version": "1.0",
"namespace": "mitre.org",
"type": "advisory",
"status": "published",
"checked": "2026-03-06",
"updated": "2026-01-15",
"official_name": "MITRE Corporation",
"common_name": "MITRE",
"alternate_names": ["MITRE Corp"],
"notes": "MITRE is a US nonprofit that operates FFRDCs. Created and maintains CVE, CWE, ATT&CK, CAPEC, and ATLAS. CISA contracts MITRE to operate the CVE Program. NVD (NIST) enriches CVE records with CVSS, CPE, CWE data. CNAs can assign CVE IDs under MITRE's program.",
"urls": [
{"type": "website", "url": "https://www.mitre.org"}
],
"match_nodes": [
{ "patterns": ["^cve$"], "data": { ... }, "children": [ ... ] }
]
}

| Field | Type | Description |
|---|---|---|
| schema_version | string | JSON schema version for this file |
| namespace | string | Organization identifier: domain name (used in SecIDs). See namespace validation below. |
| type | string | SecID type: advisory, weakness, ttp, control, capability, methodology, disclosure, regulation, entity, reference |
| status | string | Registry entry status (see below) |
| status_notes | string \| null | Optional context about status (blockers, gaps, guidance for contributors) |
| notes | string \| null | Free-form context for AI and human readers (see Notes Fields below) |
| alias_of | string \| null | If present, this is an alias stub: the namespace redirects to the value. No sources needed. |
Namespaces must be safe for filesystems, shells, and URLs while supporting international names.
Allowed characters:
- a-z (lowercase ASCII letters)
- 0-9 (ASCII digits)
- - (hyphen, not at start/end of DNS labels)
- . (period, as DNS label separator)
- Unicode letters (\p{L}) and numbers (\p{N})
Validation regex: ^[\p{L}\p{N}]([\p{L}\p{N}.-]*[\p{L}\p{N}])?$
Not allowed within a segment: Spaces, punctuation (except - and .), shell metacharacters.
Per-segment validation: Namespaces are domain names, optionally with /-separated path segments for platform sub-namespaces (e.g., github.com/advisories). / separates segments but is not allowed within a segment. Each segment between / must match the regex above.
Examples:
mitre.org ✓ Domain name
nist.gov ✓ Government domain
github.com/advisories ✓ Platform sub-namespace
aws.amazon.com ✓ Subdomain
字节跳动.com ✓ Unicode domain (ByteDance)
red_hat.com ✗ Underscore not allowed in segment
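Per-segment validation can be sketched with a Unicode-aware ECMAScript regex. This is an illustration, not the reference validator: `isValidNamespace` is a hypothetical name, the character class follows the allowed-characters list (letters, numbers, hyphen, period, no underscore), and the sketch does not enforce the lowercase or per-DNS-label hyphen rules.

```javascript
// Sketch: each "/"-separated segment must start and end with a letter or
// number; hyphens and periods are allowed in between. Requires the "u" flag
// for \p{L} and \p{N}.
const SEGMENT = /^[\p{L}\p{N}]([\p{L}\p{N}.-]*[\p{L}\p{N}])?$/u;

function isValidNamespace(namespace) {
  return namespace.split("/").every((segment) => SEGMENT.test(segment));
}
```

An empty segment (for example from a doubled slash) fails because the regex requires at least one character.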
Alias stubs: When alias_of is present, the entry is a redirect. Resolvers follow it to the target namespace. Used for Punycode/Unicode IDN equivalence (e.g., xn--mnchen-3ya.de → münchen.de). See EDGE-CASES.md for details.
Why these rules:
- Filesystem safety - Namespace segments become file paths (registry/advisory/org/mitre.json). Sub-namespaces become directories (registry/advisory/com/github/advisories.json). Avoiding shell metacharacters ensures repos work in Git across all platforms.
- Domain names are globally unique - DNS already provides authoritative, collision-free identifiers. No centralized namespace assignment needed.
- Unicode for internationalization - Organizations worldwide should use native language names. Unicode letter/number categories include all alphabets while excluding dangerous punctuation.
Registry entry status reflects documentation completeness and review state:
| Status | Meaning | Field Requirements |
|---|---|---|
| proposed | Suggested, minimal info | namespace, type, status, official_name required |
| draft | Being worked on | Any fields, actively researching |
| pending | Awaiting review | All fields present (value, null, or []) - nothing absent |
| published | Reviewed and approved | Same as pending, but reviewed |
Key principle: published doesn't mean "complete" - it means "reviewed." Empty arrays and null values are valid and valuable - they show we looked and couldn't find anything, which exposes gaps and invites contribution.
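The field requirements per status could be checked mechanically. This is a sketch under assumptions: `missingForStatus` is a hypothetical helper, the full field list for pending/published is passed in by the caller because it is not enumerated here, and "required" for proposed is read as present and non-null.

```javascript
// Fields the status table lists as required for "proposed".
const PROPOSED_REQUIRED = ["namespace", "type", "status", "official_name"];

// Sketch: return the fields that block an entry at its declared status.
function missingForStatus(entry, allFields) {
  const has = (f) => Object.prototype.hasOwnProperty.call(entry, f);
  if (entry.status === "proposed") {
    return PROPOSED_REQUIRED.filter((f) => !has(f) || entry[f] === null);
  }
  if (entry.status === "pending" || entry.status === "published") {
    // Null and [] count as present; only a missing key is a gap.
    return allFields.filter((f) => !has(f));
  }
  return []; // draft: any fields allowed
}
```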
Examples:
"status": "published",
"status_notes": "Vendor has no public security page - urls intentionally empty"

"status": "draft",
"status_notes": "Waiting for vendor response about official URL"

| Field | Type | Description |
|---|---|---|
| wikidata | string[] | Wikidata Q-numbers for entity disambiguation (e.g., ["Q1116236"]) |
| wikipedia | string[] | Wikipedia article URLs for direct access |
Why arrays? Entities can map to multiple Wikidata entries (mergers, name changes, historical entries) or have multiple relevant Wikipedia articles (different languages, related topics). Arrays handle 0, 1, or more consistently.
Why both fields?
- wikidata - Stable, language-neutral identifiers. Links to all Wikipedia versions. Preferred for disambiguation.
- wikipedia - Direct access to human-readable context. Convenience for AI/humans without extra lookup. Fallback when no Wikidata exists.
| Field | Type | Description |
|---|---|---|
| official_name | string | Official/legal name of the organization |
| common_name | string \| null | Common short name (e.g., "MITRE", "NIST") |
| alternate_names | string[] \| null | Other names for search/matching |
Why separate fields? Fixed, small set of name categories. Named fields are self-documenting and easier for AI to generate correctly.
The notes field provides free-form context that doesn't fit into structured fields. It exists at two levels:
Top-level notes — context about the organization/namespace:
- History and background ("MITRE created and operates many canonical security identifier systems")
- Relationships to other organizations ("CISA contracts MITRE to operate the CVE Program")
- Why this namespace matters for security practitioners
- Organizational context that helps AI understand the source's role
Source-level notes — operational context about a specific data source:
- Resolution quirks ("Bugzilla accepts both bug IDs and CVE aliases; CVE aliases redirect")
- Data quality notes ("Quality of descriptions varies by CNA")
- Usage guidance ("The cvelistV5 GitHub repo has raw JSON records organized by year/bucket")
- Processing context ("NVD enriches CVE records but has processing backlogs")
- Historical context about format changes or migrations
notes vs description:
| Field | Purpose | Example |
|---|---|---|
| description | What the source is (1-3 sentences) | "Red Hat publishes three types of errata: RHSA, RHBA, and RHEA." |
| notes | Everything else an AI needs to use it well | "RHSA advisories reference CVEs but may bundle multiple CVEs per advisory. Errata IDs contain colons (RHSA-2024:1234); preserve the colon in subpaths. Red Hat's API requires authentication for some endpoints." |
Format: Markdown-allowed string. Can be multiple paragraphs. Keep it concise but don't artificially truncate — if an AI needs to know it to resolve or understand this source, put it here.
Null vs absent: Same convention as other fields. null means "we looked, nothing noteworthy." Absent means "not yet researched."
What goes in notes:
- Context migrated from YAML+Markdown body content
- Operational knowledge for resolution
- Quirks, edge cases, known issues
- Relationships to other sources (informational, not machine-readable)
What does NOT go in notes:
- Structured data that belongs in other fields (URLs, patterns, examples)
- Enrichment data (severity, affected products, authors)
- Relationship data that should be machine-readable (belongs in the relationship layer)
Two URL mechanisms exist in the registry:
1. urls array — used at top-level, in source-level data, and optionally on child nodes. For documentation, reference links, API endpoints, downloads — any URL that provides context.
"urls": [
{"type": "website", "url": "https://www.mitre.org"},
{"type": "docs", "url": "https://docs.aws.amazon.com/...", "note": "Security chapter"},
{"type": "bulk_data", "url": "https://example.com/data.zip", "format": "zip"},
{"type": "docs", "url": "https://eur-lex.europa.eu/...", "lang": "fr", "note": "French text"}
]

2. data.url string - on child match_nodes only. THE resolution URL template with {id} variable substitution. One per child. For multiple resolution URLs (e.g., HTML page + JSON API), use multiple children matching the same pattern with different weights.
"children": [
{"patterns": ["^CVE-\\d{4}-\\d{4,}$"], "weight": 100, "data": {"url": "https://www.cve.org/CVERecord?id={id}"}},
{"patterns": ["^CVE-\\d{4}-\\d{4,}$"], "weight": 50, "data": {"url": "https://api.example.com/{id}", "content_type": "application/json"}}
]

URL entry fields:
| Field | Required | Description |
|---|---|---|
| url | Yes | The actual URL |
| type | Yes | Category (see below) |
| note | No | Human/AI readable context explaining what this URL is for |
| lang | No | ISO 639-1 language code (e.g., "en", "fr", "de") |
| format | No | Expected content format: "html", "json", "pdf", "xml", "csv", "zip" |
Common type values (not a strict enum — use descriptive note for specifics):
| Type | Usage |
|---|---|
| website | Main website or product page |
| docs | Documentation, guides, reference pages |
| api | API endpoint or API reference |
| bulk_data | Downloadable dataset (ZIP, JSON, XML, CSV) |
| lookup | Search/lookup URL for finding specific items |
| security | Security-specific page |
| security_txt | RFC 9116 security.txt file |
| paper | Research paper or publication |
Other type values are acceptable. The note field carries the real context for AI consumption — don't over-enumerate types.
The match_nodes array replaces the old sources block. Each node in the tree matches a portion of the SecID string, returns data if matched, and optionally has children for deeper matching.
"match_nodes": [
{
"patterns": ["^cve$"],
"description": "Common Vulnerabilities and Exposures",
"weight": 100,
"data": {
"official_name": "Common Vulnerabilities and Exposures",
"common_name": "CVE",
"alternate_names": null,
"description": "...",
"notes": "...",
"urls": [ ... ],
"version_required": false,
"unversioned_behavior": "current",
"version_disambiguation": null,
"versions_available": null,
"examples": [ ... ]
},
"children": [
{
"patterns": ["^CVE-\\d{4}-\\d{4,}$"],
"description": "Standard CVE ID format",
"weight": 100,
"data": {
"url": "https://www.cve.org/CVERecord?id={id}"
}
}
]
}
]

The name-level pattern (e.g., ^cve$) replaces the literal source key. This is matched against the name component of the SecID: secid:advisory/mitre.org/cve#CVE-2024-1234 → name cve matches ^cve$.
| Field | Type | Required | Description |
|---|---|---|---|
| patterns | string[] | yes | One or more regex patterns (OR alternatives). All share the same children and data. |
| description | string | no | Human/AI-readable description of what this node matches |
| weight | integer | no | 0-200, default 0. Higher = more preferred. Returned with results, consumer decides. |
| data | object | no | Result data returned when this node matches (see below) |
| children | array | no | Child nodes for matching the next portion of the string (recursive) |
Multiple patterns per node: A node can have multiple regex alternatives. All share the same children and data. Used when a source is known by multiple names (e.g., ["^top10$", "^top-10$", "^owasp-top-10$"]).
Case sensitivity: Patterns are case-sensitive by default. For case-insensitive matching, add explicit aliases in patterns (for common variants) rather than engine-specific inline flags. Convention: keep canonical lowercase name-level patterns and add targeted aliases only where needed. Subpath patterns should match canonical source case. No lossy normalization of input — the original is always preserved.
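The alias convention can be illustrated with the multi-name node mentioned above. This is a sketch: `nodeMatches` and `top10Node` are hypothetical names, and compiling with the `u` flag is an assumption.

```javascript
// Sketch: a node matches when any of its patterns accepts the input.
function nodeMatches(node, namePart) {
  return node.patterns.some((p) => new RegExp(p, "u").test(namePart));
}

// Explicit aliases instead of an engine-specific (?i) flag.
const top10Node = { patterns: ["^top10$", "^top-10$", "^owasp-top-10$"] };
```

Here `top-10` matches through its explicit alias, while `Top10` does not match unless a pattern for that spelling is added.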
The data object at each level contains whatever result information is appropriate for that depth. Common fields:
Name-level data (source metadata):
| Field | Type | Description |
|---|---|---|
| official_name | string | Official name of the source |
| common_name | string \| null | Common short name |
| alternate_names | string[] \| null | Other names for search/matching |
| description | string | Brief summary of what this source is |
| notes | string \| null | Operational context for AI/human readers |
| urls | array | Source-level URLs (website, API, bulk_data) |
| version_required | boolean | See Version Resolution Fields |
| unversioned_behavior | string | See Version Resolution Fields |
| version_disambiguation | string \| null | See Version Resolution Fields |
| versions_available | array \| null | See Version Resolution Fields |
| examples | (string \| ExampleObject)[] | Representative identifier examples (see Examples) |
Source-level URL object fields:
Each object in the urls array has a type and url field. The following optional fields can be added to characterize the data available at that URL:
| Field | Type | Description |
|---|---|---|
| type | string | Access pattern identifier: website, api, bulk_data, search, github, download, lookup. Additional types can be added as encountered. |
| url | string | The URL |
| parsability | string \| null | "structured" or "scraped". Same semantics as subpath-level. |
| schema | string \| null | SecID schema reference. Same semantics as subpath-level. |
| parsing_instructions | string \| null | SecID parsing instructions reference. Same semantics as subpath-level. |
| auth | string \| null | Free-text auth description. Same semantics as subpath-level. |
| notes | string \| null | Additional context about this access method. |
| format | string \| null | Short format hint (e.g., "json", "xml"). Legacy field; prefer parsability + schema for new entries. |
| note | string \| null | Legacy alias for notes. Use notes for new entries. |
Subpath-level data (pattern-specific resolution):
| Field | Type | Description |
|---|---|---|
| url | string | Lookup URL with {id} placeholder |
| format | string | Response format (json, html, xml) |
| content_type | string | Full MIME type from HTTP Content-Type header (e.g., text/html, application/json). Used by the ?content_type= qualifier to filter results by format. |
| parsability | string \| null | Data format: "structured" (machine-readable, has schema) or "scraped" (HTML/unstructured). Absent means not yet documented. |
| schema | string \| null | SecID reference to the schema this data conforms to (e.g., secid:reference/cve.org/cve-schema@5.2.0). Absent for scraped sources. A formal JSON Schema is ideal but API documentation also qualifies; the field means "what defines this data's structure." |
| parsing_instructions | string \| null | SecID reference to a parsing instruction document (e.g., secid:reference/cloudsecurityalliance.org/secid-parsers#cve-json-5). Covers field mappings, access patterns, and provenance notes. |
| auth | string \| null | Free-text description of how to authenticate/access this URL. Ranges from "none" to multi-paragraph explanations. Absent means not yet documented. |
| lang | LangConfig | Language availability and URL substitution config. See Language Resolution below. |
| note | string | Context for when/why to use this URL |
| type | string | Category when source has multiple ID types |
| known_values | object | Enumeration of finite, stable values (see Known Values) |
| lookup_table | object | Map of IDs to URLs for non-computable URLs (see Lookup Table) |
| variables | object | Variable extraction for complex URL building (see Variable Extraction) |
| examples | (string \| ExampleObject)[] | Test fixtures with expected outputs (see Examples) |
Capability-type data (product security features — type: capability):
| Field | Type | Description |
|---|---|---|
| options | array | Configuration options. Each entry has value, name, description, and optionally setting, type, range, default. |
| default | object \| string | Default value/configuration. Object with value, since (date of change), note. String for simple defaults. |
| vendor_recommendation | string | What the vendor recommends for this capability. Labeled as the vendor's opinion, not a universal requirement. |
| audit | object | Commands to check current configuration. Keys: cli (CLI command), api (API operation), console (UI path). May have additional keys like cli_root, cli_all for variations. |
| configure | object | Commands to set/enable the capability. Keys: cli, api, console, terraform (resource name), cloudformation (resource type). May have additional keys like cli_create, cli_delete. |
| cross_references | string[] | SecID strings for related capabilities in other services (e.g., ["secid:capability/amazon.com/aws/kms"]). |
| limits | string | Service limits or quotas relevant to this capability. |
| recent_changes | string | Notable recent changes to defaults or behavior. |
Example:
{
"description": "S3 bucket default server-side encryption",
"options": [
{"value": "AES256", "name": "SSE-S3", "description": "Amazon S3 managed keys"},
{"value": "aws:kms", "name": "SSE-KMS", "description": "AWS KMS managed keys"}
],
"default": {"value": "AES256", "since": "2023-01-05", "note": "Enabled by default since January 2023"},
"vendor_recommendation": "Use SSE-KMS for sensitive data",
"audit": {
"cli": "aws s3api get-bucket-encryption --bucket {bucket}",
"api": "GetBucketEncryption",
"console": "S3 → Bucket → Properties → Default encryption"
},
"configure": {
"cli": "aws s3api put-bucket-encryption --bucket {bucket} --server-side-encryption-configuration ...",
"terraform": "aws_s3_bucket_server_side_encryption_configuration",
"cloudformation": "AWS::S3::Bucket BucketEncryption"
},
"urls": [
{"type": "docs", "url": "https://docs.aws.amazon.com/...", "note": "AWS documentation"}
]
}

Registry-type data (a discoverability/index entry, not itself a framework; type: registry):
| Field | Type | Description |
|---|---|---|
| type | string | Set to "registry" to flag that this entry is an index/lookup rather than a control framework, capability set, weakness taxonomy, etc. The entry lives in its parent type for discoverability; the things being indexed are defined elsewhere (often in other namespaces or other sources within the same namespace). |
Used when an organization publishes a registry of submissions, attestations, or third-party data that does not itself define new identifiers. Such entries typically have no match_nodes.children because they are leaves — there is no child ID system to match. URLs should distinguish program/policy pages from the searchable registry surface(s) using type: "website" vs type: "lookup".
The type: "registry" flag tells AI agents and downstream consumers: "look elsewhere for the actual controls/identifiers — this entry is a pointer, not a definition."
Example (from registry/control/org/cloudsecurityalliance.json, the STAR entry):
{
"patterns": ["(?i)^star$"],
"description": "CSA STAR Registry — public registry of CAIQ submissions and third-party assessments. NOT a control framework.",
"data": {
"type": "registry",
"official_name": "Security, Trust, Assurance and Risk Registry",
"common_name": "STAR",
"description": "Public registry of cloud provider security assessments. The largest public collection of CAIQ submissions. Listed here for discoverability — STAR is an index/registry of assessments, not itself a set of controls.",
"urls": [
{"type": "website", "url": "https://cloudsecurityalliance.org/star", "note": "STAR program homepage"},
{"type": "lookup", "url": "https://cloudsecurityalliance.org/star/registry", "note": "Public searchable registry"}
]
}
}
Disclosure-type data (vulnerability reporting; type: disclosure):
| Field | Type | Description |
|---|---|---|
| scope | string | What products/projects this disclosure program covers. The key field: it answers "does this program cover my product?" |
| cve_program_role | string | Role in the CVE Program (e.g., "CNA", "Root", "CNA-LR", "Top-Level Root", "Secretariat"). |
| organization_type | string | Organization classification (e.g., "Vendor", "Open Source", "CERT", "Bug Bounty Provider"). |
| contacts | array \| null | Reporting contacts. Each entry: type ("email", "web", "github_pvr"), value (address/URL), note, optionally preferred (boolean). |
Example:
{
"scope": "Vulnerabilities in open source projects affecting Red Hat software",
"cve_program_role": "CNA (reports to Red Hat Root)",
"organization_type": "Vendor, Open Source",
"contacts": [
{"type": "email", "value": "secalert@redhat.com", "note": "CNA contact email"},
{"type": "web", "value": "https://access.redhat.com/security/team/contact", "note": "Security contact page"}
],
"urls": [
{"type": "docs", "url": "https://access.redhat.com/articles/...", "note": "Disclosure policy"}
]
}
The content_type field records the MIME type that the URL's HTTP server actually returns in its Content-Type header. Values should be verifiable: CI can HEAD each URL and compare the header to the registry value.
Common values:
- text/html: web pages (cve.org record pages, GitHub blob views)
- application/json: JSON APIs and raw JSON files
- application/pdf: PDF documents (ISO standards, compliance reports)
- text/xml or application/xml: XML feeds and OVAL definitions
content_type vs format: The existing format field describes the data format of the content (e.g., a GitHub blob page has "format": "json" because it displays JSON data, but "content_type": "text/html" because the HTTP response is HTML). content_type reflects what the HTTP server returns; format reflects what the underlying data is. Both can coexist on the same node.
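A minimal sketch of the comparison such a CI check might make, assuming the check ignores header parameters such as charset and compares case-insensitively. The function name media_type_matches is hypothetical, and no HTTP request is made here; the header value is passed in as a plain string.

```python
def media_type_matches(header_value: str, registry_value: str) -> bool:
    """Compare a Content-Type header against a registry content_type value.

    Assumption: parameters (e.g. "; charset=utf-8") are ignored and the
    comparison is case-insensitive.
    """
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == registry_value.strip().lower()


# A GitHub blob page is served as HTML even though it displays JSON data,
# so its node carries "format": "json" and "content_type": "text/html".
print(media_type_matches("text/html; charset=utf-8", "text/html"))         # True
print(media_type_matches("text/html; charset=utf-8", "application/json"))  # False
```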
The lang field declares that a child node's URL is available in multiple languages. It uses the LangConfig schema:
| Field | Type | Required | Description |
|---|---|---|---|
| available | string[] | Yes | ISO 639-1 language codes (e.g., ["en", "de", "fr"]) |
| default | string | Yes | Default language code (e.g., "en") |
| url_transform | string | No | Transform applied to the language code in the URL. "uppercase" → "EN". Absent/null → as-is. |
The URL template uses {lang} as a placeholder:
{
"patterns": ["^art-\\d+(\\.[a-z])?$"],
"description": "GDPR article reference",
"weight": 100,
"data": {
"url": "https://eur-lex.europa.eu/legal-content/{lang}/TXT/HTML/?uri=CELEX:32016R0679",
"content_type": "text/html",
"lang": {
"available": ["en", "de", "fr", "es", "it", "nl", "pt", "pl", "ro", "cs", "da", "el", "et", "fi", "ga", "hr", "hu", "lt", "lv", "mt", "sk", "sl", "sv", "bg"],
"default": "en",
"url_transform": "uppercase"
}
}
}
Resolution behavior:
- ?lang=de → substitute {lang} with DE (uppercase transform), return URL with lang: "de" on result
- No ?lang= → use default (en), substitute {lang} with EN, return with lang: "en" and +1 weight nudge
- ?lang=xx (not in available) → not_found with available languages listed
Why url_transform? Some services use uppercase language codes in URLs (EUR-Lex uses /legal-content/EN/...). The transform lets the registry declare this so the API consumer always receives standard lowercase ISO 639-1 codes regardless of the upstream URL format.
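The language handling above can be sketched roughly as follows. This is an illustration under stated assumptions, not the production resolver: the function name resolve_lang and the (url, lang) return shape are hypothetical, the language list is shortened, and the +1 weight nudge for the default language is not modeled.

```python
def resolve_lang(url_template, lang_config, requested=None):
    """Return (url, lang), or None when the requested language is unavailable."""
    lang = requested if requested is not None else lang_config["default"]
    if lang not in lang_config["available"]:
        # The registry describes a not_found result listing available languages.
        return None
    # Only the "uppercase" transform is described; anything else is as-is.
    code = lang.upper() if lang_config.get("url_transform") == "uppercase" else lang
    return url_template.replace("{lang}", code), lang


TEMPLATE = "https://eur-lex.europa.eu/legal-content/{lang}/TXT/HTML/?uri=CELEX:32016R0679"
LANG = {"available": ["en", "de", "fr"], "default": "en", "url_transform": "uppercase"}

print(resolve_lang(TEMPLATE, LANG, "de"))  # URL contains /DE/, reported lang is "de"
print(resolve_lang(TEMPLATE, LANG))        # falls back to the default, /EN/
print(resolve_lang(TEMPLATE, LANG, "xx"))  # None
```

Note that the caller always receives the lowercase code, while only the URL carries the transformed form.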
The description field provides a brief summary of what this source is. The notes field provides deeper operational context:
"sources": {
"errata": {
"official_name": "Red Hat Security Advisories",
"description": "Red Hat publishes three types of errata: RHSA (Security Advisory) for security fixes, RHBA (Bug Advisory) for bug fixes, and RHEA (Enhancement Advisory) for new features. Most security work focuses on RHSA.",
"notes": "Errata IDs contain colons (e.g., RHSA-2024:1234) — preserve the colon in subpaths. A single RHSA may bundle fixes for multiple CVEs. Red Hat's API at access.redhat.com/hydra/rest/securitydata provides machine-readable advisory data. Errata are also linked from Bugzilla entries. Numbering resets annually — the number after the colon is sequential within a year.",
...
}
}
description (what the source is, 1-3 sentences):
- Classes of objects the source contains (what is an RHSA vs RHBA vs RHEA?)
- When to use this source vs similar ones
notes (everything else an AI needs to use it well):
- Resolution quirks and edge cases
- Data quality observations
- Format details and gotchas
- Relationships to other sources (informational)
- Historical context about migrations or format changes
- Processing notes (backlogs, update frequency, authentication requirements)
What does NOT go in either field:
- Every individual instance (don't describe CVE-2024-1234)
- Data enrichment (severity, affected products, authors)
- Machine-readable relationships (belongs in the relationship layer)
Rule of thumb: description answers "what is this?" in a sentence. notes answers "what do I need to know to work with this effectively?"
"urls": [
{"type": "website", "url": "https://cve.org"},
{"type": "lookup", "url": "https://cve.org/CVERecord?id={id}", "note": "Human-readable page"},
{"type": "lookup", "url": "https://cveawg.mitre.org/api/cve/{id}", "format": "json", "note": "API, richer data"},
{"type": "bulk_data", "url": "https://github.com/CVEProject/cvelistV5", "format": "json"},
{"type": "api", "url": "https://cveawg.mitre.org/api"}
]
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | yes | URL category (see below) |
| url | string | yes | The URL, may contain {placeholder} templates |
| format | string | no | Response format: json, html, xml, csv, pdf |
| note | string | no | Context for AI: when/why to use, access instructions, auth requirements, download hints |
URL type vocabulary:
| Type | Description |
|---|---|
| website | Main website for humans |
| docs | Documentation pages |
| search | Search interface (human or programmatic) |
| lookup | Resolution URL with {id} placeholder |
| api | API endpoint |
| bulk_data | Bulk download location |
| github | GitHub repository |
| paper | Academic paper |
| secid_api | SecID REST API for this source (if different from main) |
| secid_mcp | SecID MCP endpoint for this source (if different from main) |
Why an array? Multiple URLs of the same type are common (e.g., primary and fallback lookup endpoints, multiple mirrors). The note field provides context to help AI choose appropriately.
URLs may contain placeholders for dynamic resolution:
| Placeholder | Description | Example |
|---|---|---|
| {id} | Full identifier from subpath | CVE-2024-1234 |
| {num} | Numeric portion of identifier | 1234 |
| {year} | Year component of identifier | 2024 |
| {version} | Version from @version component | 4.0 |
| {item_version} | Item version from @item_version after subpath | a1b2c3d |
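A minimal sketch of placeholder substitution, assuming the values for each placeholder have already been derived from the SecID. The helper name fill_template is hypothetical, and leaving unknown placeholders untouched is a choice made for this illustration rather than documented resolver behavior.

```python
import re


def fill_template(url_template: str, values: dict) -> str:
    """Replace {name} placeholders; placeholders without a value stay as-is."""
    def substitute(match):
        name = match.group(1)
        return values.get(name, match.group(0))
    return re.sub(r"\{(\w+)\}", substitute, url_template)


print(fill_template("https://cve.org/CVERecord?id={id}", {"id": "CVE-2024-1234"}))
# https://cve.org/CVERecord?id=CVE-2024-1234

print(fill_template(
    "https://github.com/github/advisory-database/blob/{item_version}/advisories/github-reviewed/{id}.json",
    {"id": "GHSA-jfh8-c2jp-5v3q", "item_version": "a1b2c3d"},
))
```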
The resolver walks the tree level by level, matching each portion of the SecID string:
- Name level: Match the name component against patterns in each top-level match_nodes entry. All matching nodes are traversed.
- Version level: If @version is present, match against children of the name-level node. If no version children exist, the version is passed through as {version} for URL templates.
- Subpath level: If #subpath is present, match against children at the next level. These are the equivalent of the old id_patterns.
- Item version level: If @item_version follows a matched subpath pattern, match against deeper children.
At each level, the node's data is collected into the result set. The resolver returns data from every matched level, not just the deepest.
Key properties:
- Chop and pass. Each regex only sees its portion of the string. The resolver splits at grammar boundaries (@, #) and passes each piece to the appropriate tree level. No backtracking, no lookahead across levels.
- All matches traversed. The resolver doesn't stop at the first match; all matching sibling nodes are traversed to completion. Multiple matches are returned with weights.
- Case sensitivity per-pattern. Use explicit alias patterns when case-insensitive behavior is needed. No lossy normalization of input.
- Mutual exclusivity is checkable. At each level, you can validate that sibling patterns don't overlap. When they do overlap, weight disambiguates.
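The walk described above can be sketched as a small recursive function. This is an illustration only: it assumes the SecID has already been split at @ and # into one piece per tree level, it ignores the version pass-through case, and it uses Python's re module on patterns that fall within the portable subset (the canonical syntax is ECMAScript). The name walk and the tuple result shape are hypothetical.

```python
import re


def walk(nodes, pieces, results=None):
    """Collect (weight, description, data) from every matching node at every level."""
    if results is None:
        results = []
    if not pieces:
        return results
    piece, rest = pieces[0], pieces[1:]
    for node in nodes:
        # Each pattern sees only its own piece of the string (chop and pass).
        if any(re.search(pattern, piece) for pattern in node["patterns"]):
            results.append((node.get("weight"), node["description"], node.get("data", {})))
            # Keep going: all matching siblings are traversed, not just the first.
            walk(node.get("children", []), rest, results)
    return results


TREE = [{
    "patterns": [r"^cve$"],
    "description": "Common Vulnerabilities and Exposures",
    "weight": 100,
    "data": {"common_name": "CVE"},
    "children": [
        {"patterns": [r"^CVE-\d{4}-\d{4,}$"], "description": "Standard CVE ID format",
         "weight": 100, "data": {"url": "https://cve.org/CVERecord?id={id}"}},
        {"patterns": [r"^CVE-\d{4}-\d{4,}$"], "description": "CVE JSON via API",
         "weight": 50, "data": {"url": "https://cveawg.mitre.org/api/cve/{id}"}},
    ],
}]

for weight, description, data in walk(TREE, ["cve", "CVE-2024-1234"]):
    print(weight, description)
```

Both CVE children match the same input, so both appear in the result and the weight field is what lets a consumer rank them.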
For sources with multiple subpath types (old id_patterns with type field), each type becomes a sibling child node:
"children": [
{
"patterns": ["^T\\d{4}(\\.\\d{3})?$"],
"description": "ATT&CK technique",
"data": {"type": "technique", "url": "https://attack.mitre.org/techniques/{id}/"}
},
{
"patterns": ["^TA\\d{4}$"],
"description": "ATT&CK tactic",
"data": {"type": "tactic", "url": "https://attack.mitre.org/tactics/{id}/"}
},
{
"patterns": ["^G\\d{4}$"],
"description": "Threat group",
"data": {"type": "group", "url": "https://attack.mitre.org/groups/{id}/"}
}
]
For sources where different subpath patterns need different lookup URLs:
"children": [
{
"patterns": ["^ALAS-\\d{4}-\\d+$"],
"description": "Amazon Linux 1",
"data": {"url": "https://alas.aws.amazon.com/{id}.html"}
},
{
"patterns": ["^ALAS2-\\d{4}-\\d+$"],
"description": "Amazon Linux 2",
"data": {"url": "https://alas.aws.amazon.com/AL2/{id}.html"}
},
{
"patterns": ["^ALAS2023-\\d{4}-\\d+$"],
"description": "Amazon Linux 2023",
"data": {"url": "https://alas.aws.amazon.com/AL2023/{id}.html"}
}
]
For complex URL structures where parts of the ID need transformation, a node's data can include a variables object:
Each key in variables is a placeholder name (e.g., number, year). The value is an object:
| Field | Type | Required | Description |
|---|---|---|---|
| extract | string | yes | Regex with capture groups. Groups are numbered {1}, {2}, etc. |
| format | string | no | How to assemble the value from capture groups. Defaults to {1}. Can include literals. |
| description | string | yes | Explains what this variable is and how it's derived from the ID. |
Simple example (single capture group, default format):
"variables": {
"number": {
"extract": "^CWE-(\\d+)$",
"description": "Numeric ID portion (e.g., '79' from 'CWE-79')"
}
}
Example with format (appending literal text):
"variables": {
"bucket": {
"extract": "^CVE-\\d{4}-(\\d+)\\d{3}$",
"format": "{1}xxx",
"description": "All but last 3 digits + 'xxx' (e.g., '25xxx' from 'CVE-2026-25010')"
}
}
Why anchored patterns? Anchored patterns (^CVE-\d{4}-\d{4,}$) ensure the entire input at each level must match. Unanchored patterns would match substrings, allowing invalid input.
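To make the variables mechanism and the anchoring rule concrete, here is a rough sketch of how an extract/format pair might be applied. The function name extract_variable is hypothetical, and Python's re module stands in for the ECMAScript engine, which behaves the same way for these particular patterns.

```python
import re


def extract_variable(spec: dict, identifier: str):
    """Apply one variables entry; return None when extract does not match."""
    match = re.search(spec["extract"], identifier)
    if match is None:
        return None
    template = spec.get("format", "{1}")  # format defaults to the first group
    # Replace {1}, {2}, ... with the corresponding capture group.
    return re.sub(r"\{(\d+)\}", lambda m: match.group(int(m.group(1))) or "", template)


number = {"extract": r"^CWE-(\d+)$"}
bucket = {"extract": r"^CVE-\d{4}-(\d+)\d{3}$", "format": "{1}xxx"}

print(extract_variable(number, "CWE-79"))                # 79
print(extract_variable(bucket, "CVE-2026-25010"))        # 25xxx
print(extract_variable(bucket, "CVE-2026-25010-extra"))  # None (anchors reject it)

# Without anchors, a substring of invalid input would still match.
print(re.search(r"CVE-\d{4}-\d{4,}", "not-a-CVE-2024-1234-id") is not None)    # True
print(re.search(r"^CVE-\d{4}-\d{4,}$", "not-a-CVE-2024-1234-id") is not None)  # False
```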
Patterns match the human-readable (unencoded) form. Write patterns matching what you see in the source documentation. ^Auditing Guidelines$ with a literal space, not ^Auditing%20Guidelines$. Resolvers are responsible for decoding percent-encoded input before matching against patterns (see SPEC.md Section 8.3).
Sibling patterns are independent. All sibling patterns at each level are tested independently. All matching patterns contribute results. When siblings overlap on the same input, weight helps consumers choose.
Format patterns, not validity checks. A pattern like CVE-\d{4}-\d{4,} tells you "this looks like a CVE ID" — whether that specific CVE actually exists is only known when you try to resolve it.
For patterns with finite, stable value sets, use known_values in the node's data to enumerate them with descriptions:
{
"patterns": ["^[A-Z]{2,3}$"],
"description": "Control domain. Contains multiple controls.",
"data": {
"type": "domain",
"known_values": {
"IAM": "Identity & Access Management",
"DSP": "Data Security & Privacy Lifecycle Management",
"GRC": "Governance, Risk & Compliance",
"SEF": "Security Incident Management, E-Discovery & Forensics"
}
},
"children": [
{
"patterns": ["^[A-Z]{2,3}-\\d{2}$"],
"description": "Specific control (e.g., IAM-12). Belongs to a domain.",
"data": {"type": "control", "url": "https://ccm.cloudsecurityalliance.org/control/{id}"}
}
]
}
When to use known_values:
- Finite, stable sets (control domains, advisory types, document categories)
- Classes that need disambiguation (what is IAM vs DSP vs GRC?)
- Important individual items worth enumerating (ISO standard numbers with their titles)
When NOT to use:
- Open-ended or growing sets (individual CVEs, specific controls)
- Values that are obvious from context (years, sequential numbers)
Examples of good candidates:
Control framework domains:
"known_values": {
"IAM": "Identity & Access Management",
"DSP": "Data Security & Privacy Lifecycle Management"
}
Advisory types (Red Hat errata):
"known_values": {
"RHSA": "Security Advisory - security fixes, most commonly referenced",
"RHBA": "Bug Advisory - non-security bug fixes",
"RHEA": "Enhancement Advisory - new features"
}
ISO standard numbers with titles:
"known_values": {
"27001": "Information security management systems — Requirements",
"27002": "Information security controls",
"42001": "Artificial intelligence — Management system"
}
Rule of thumb: Ask "is this a class of objects?" If yes, describe it. For individual instances, only include in known_values when they're distinct enough to need disambiguation (ISO 27001 vs 42001) or when the set is small and stable.
When URLs can't be computed from the ID pattern alone — because the source uses inconsistent slugs, human-readable paths, or other non-derivable URL components — use lookup_table in the node's data to map each ID directly to its URL.
{
"patterns": ["^LLM\\d{2}$"],
"description": "LLM Top 10 item number",
"data": {
"lookup_table": {
"LLM01": {"url": "https://genai.owasp.org/llmrisk/llm01-prompt-injection/", "title": "Prompt Injection"},
"LLM02": {"url": "https://genai.owasp.org/llmrisk/llm022025-sensitive-information-disclosure/", "title": "Sensitive Information Disclosure"},
"LLM03": {"url": "https://genai.owasp.org/llmrisk/llm032025-supply-chain/", "title": "Supply Chain"},
"LLM04": {"url": "https://genai.owasp.org/llmrisk/llm042025-data-and-model-poisoning/", "title": "Data and Model Poisoning"},
"LLM05": {"url": "https://genai.owasp.org/llmrisk/llm052025-improper-output-handling/", "title": "Improper Output Handling"},
"LLM06": {"url": "https://genai.owasp.org/llmrisk/llm062025-excessive-agency/", "title": "Excessive Agency"},
"LLM07": {"url": "https://genai.owasp.org/llmrisk/llm072025-system-prompt-leakage/", "title": "System Prompt Leakage"},
"LLM08": {"url": "https://genai.owasp.org/llmrisk/llm082025-vector-and-embedding-weaknesses/", "title": "Vector and Embedding Weaknesses"},
"LLM09": {"url": "https://genai.owasp.org/llmrisk/llm092025-misinformation/", "title": "Misinformation"},
"LLM10": {"url": "https://genai.owasp.org/llmrisk/llm102025-unbounded-consumption/", "title": "Unbounded Consumption"}
},
"provenance": {
"method": "Searched genai.owasp.org/llm-top-10/ listing page, then verified each individual URL. LLM01 slug lacks the year prefix that all other entries have — confirmed this is how OWASP published it, not a data entry error.",
"date": "2026-02-22",
"source_url": "https://genai.owasp.org/llm-top-10/"
}
}
}
Each lookup_table entry maps an ID to:
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | yes | The actual URL for this specific ID |
| title | string | no | Human-readable title (useful when it differs from the ID) |
The provenance object documents how the lookup table was built:
| Field | Type | Required | Description |
|---|---|---|---|
| method | string | yes | How the URLs were found and verified (searched site, scraped listing page, read documentation, etc.) |
| date | string | yes | When the lookup table was last verified (ISO 8601 date) |
| source_url | string | no | The page or API used to build the table |
When to use lookup_table:
- URLs contain human-readable slugs not derivable from the ID (llm01-prompt-injection)
- The source uses inconsistent URL patterns (LLM01 has no year, LLM02-10 do)
- URL structure changed between entries and can't be expressed as a single template
- Small, finite sets where enumerating every URL is practical
When NOT to use:
- URLs follow a consistent, computable pattern (use url with {id} template instead)
- The set is open-ended or very large (thousands of entries)
Relationship to known_values: A node's data can have both. known_values provides descriptions for disambiguation. lookup_table provides URLs for resolution. If you have lookup_table with title fields, known_values is redundant — but including both is fine since they serve different purposes (description vs resolution).
Relationship to url: If a node's data has both a url template and a lookup_table, the lookup_table takes priority for IDs it contains. The url template serves as a fallback for IDs not in the table (useful when most IDs follow a pattern but some exceptions exist).
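The priority rule between lookup_table and url can be sketched as below. The function name resolve_url is hypothetical, and the example.org fallback template is invented purely to show the fallback path; it is not part of the OWASP entry.

```python
def resolve_url(data: dict, identifier: str):
    """Prefer an exact lookup_table entry, then fall back to the url template."""
    entry = data.get("lookup_table", {}).get(identifier)
    if entry is not None:
        return entry["url"]
    template = data.get("url")
    if template is not None:
        return template.replace("{id}", identifier)
    return None


DATA = {
    # Hypothetical fallback template, for illustration only.
    "url": "https://example.org/items/{id}",
    "lookup_table": {
        "LLM01": {"url": "https://genai.owasp.org/llmrisk/llm01-prompt-injection/",
                  "title": "Prompt Injection"},
    },
}

print(resolve_url(DATA, "LLM01"))  # the table entry wins
print(resolve_url(DATA, "LLM11"))  # not in the table, so the template is used
```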
Why provenance? Registry data is only trustworthy if you can verify it. Provenance records how the data was gathered so reviewers (human or AI) can re-verify it, and future maintainers know where to check for updates. Sources change their URL structures — provenance tells you where to look when that happens.
For sources where different versions have different URL structures, add version-level children to the name-level node. These match against the @version component:
{
"patterns": ["^ccm$"],
"description": "Cloud Controls Matrix",
"data": { "..." : "..." },
"children": [
{
"patterns": ["^4\\..*$"],
"description": "Version 4.x",
"data": {"url": "https://ccm.cloudsecurityalliance.org/v4/control/{id}"},
"children": [
{
"patterns": ["^[A-Z]{2,3}-\\d{2}$"],
"description": "Specific control",
"data": {"type": "control"}
}
]
},
{
"patterns": ["^3\\..*$"],
"description": "Version 3.x (legacy)",
"data": {"url": "https://cloudsecurityalliance.org/artifacts/ccm-v3/{id}"}
}
]
}
Resolution example: secid:control/cloudsecurityalliance.org/ccm@4.0.1#IAM-12
- Name ccm → matches name-level node → returns source metadata
- Version 4.0.1 → matches version-level child ^4\..*$ → returns v4 URL template
- Subpath IAM-12 → matches subpath-level child ^[A-Z]{2,3}-\d{2}$ → returns control type
When not needed: Most sources don't need version-level children. Use when:
- Different major versions have incompatible URL structures
- Legacy versions are hosted on different infrastructure
If URLs are predictable (just substitute {version}), use the {version} placeholder in the name-level data's URLs instead.
For sources where individual items can be versioned independently (e.g., git-backed databases, advisory revision histories), add deeper children within subpath-level nodes:
{
"patterns": ["^GHSA-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}$"],
"description": "GitHub Security Advisory ID",
"data": {"url": "https://github.com/advisories/{id}"},
"children": [
{
"patterns": ["^[0-9a-f]{7,40}$"],
"description": "Git commit hash (short or full)",
"data": {"url": "https://github.com/github/advisory-database/blob/{item_version}/advisories/github-reviewed/{id}.json"}
}
]
}
Resolution example: secid:advisory/github.com/advisories/ghsa#GHSA-jfh8-c2jp-5v3q@a1b2c3d
- Subpath GHSA-jfh8-c2jp-5v3q → matches subpath-level node → returns advisory URL
- Item version a1b2c3d → matches deeper child ^[0-9a-f]{7,40}$ → returns commit-specific URL
When not needed: Most sources don't need item version children. Use when:
- The source is git-backed and items change over time (GHSA, CVE list repo)
- Advisory revisions are tracked independently (Red Hat errata revisions)
- Content is wiki-like with edit history
When not appropriate:
- The version is already part of the ID itself (arXiv 2303.08774v2)
- The whole source is versioned as a unit (OWASP Top 10 @2021)
- Items are immutable once published
These fields live in the name-level node's data object. They control what happens when a SecID omits the @version component. Most sources don't need them — the default behavior ("return current") is correct for sources like CVE where IDs are unique across all versions.
| Field | Type | Description |
|---|---|---|
| version_required | boolean, optional | true if unversioned references are ambiguous. Default: false. When true, the resolver should not silently return a single version. |
| unversioned_behavior | string, optional | One of "current" (default), "current_with_history", "all_with_guidance". How the resolver should respond when version is omitted. |
| version_disambiguation | string, optional | AI-readable instructions for determining which version was intended based on available context (publication date, ID format, surrounding references, etc.). |
| versions_available | array, optional | Array of objects documenting known versions. Each object has: version (string, required), release_date (string, ISO date, optional), status (string: "current", "superseded", "draft", optional), note (string, optional). |
| Value | Resolver Response | Use When |
|---|---|---|
| "current" | Return the current/latest version. No ambiguity signal. | IDs are unique across all versions, or the source doesn't meaningfully version (CVE, CWE, GHSA). This is the default. |
| "current_with_history" | Return the current version, plus a note that other versions exist. | The current version is a sensible default, but older versions are still actively referenced (CCM, ISO 27001). |
| "all_with_guidance" | Return all matching versions with disambiguation instructions from version_disambiguation. | Item identifiers are reused across versions with different meanings (OWASP Top 10: A01 means something different in each edition). |
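A rough sketch of how a resolver might branch on these three values when @version is omitted. The function name unversioned_response and the response shape (versions, note, guidance keys) are assumptions made for illustration; the registry format does not prescribe them.

```python
def unversioned_response(data: dict) -> dict:
    """Build a response for a SecID that omits @version."""
    versions = data.get("versions_available", [])
    behavior = data.get("unversioned_behavior", "current")  # documented default
    if behavior == "all_with_guidance":
        return {
            "versions": [v["version"] for v in versions],
            "guidance": data.get("version_disambiguation"),
        }
    response = {"versions": [v["version"] for v in versions if v.get("status") == "current"]}
    if behavior == "current_with_history":
        others = [v["version"] for v in versions if v.get("status") != "current"]
        response["note"] = "Other versions exist: " + ", ".join(others)
    return response


CCM = {
    "unversioned_behavior": "current_with_history",
    "versions_available": [
        {"version": "4.0", "status": "current"},
        {"version": "3.0.1", "status": "superseded"},
    ],
}
print(unversioned_response(CCM))
# {'versions': ['4.0'], 'note': 'Other versions exist: 3.0.1'}
```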
The version_disambiguation field provides instructions for AI clients to determine the intended version from surrounding context. Write it as if explaining to another AI agent that has access to the referring document but doesn't know which version was meant:
"version_disambiguation": "Versions are released by year. Match the version whose release year is closest to but not after the referring document's publication date. If no date context is available, use the latest version (2021). Note: item numbering restarts with each version — A01 in one version is unrelated to A01 in another."
This implements the "AI on both ends" pattern: the registry provides reasoning guidance (server side), and the AI client applies it to the local context it has access to (publication dates, surrounding references, document age). Neither side alone can resolve the ambiguity.
"versions_available": [
{
"version": "2021",
"release_date": "2021-09-24",
"status": "current",
"note": "Major restructuring from 2017. A01 changed from Injection to Broken Access Control."
},
{
"version": "2017",
"release_date": "2017-11-20",
"status": "superseded",
"note": "Still widely referenced in existing documentation and certifications."
}
]
OWASP Top 10 (all_with_guidance):
{
"official_name": "OWASP Top 10",
"version_required": true,
"unversioned_behavior": "all_with_guidance",
"version_disambiguation": "Versions are released by year. Match the version whose release year is closest to but not after the referring document's publication date. If no date context is available, use the latest version (2021). Note: item numbering restarts with each version — A01 in one version is unrelated to A01 in another.",
"versions_available": [
{"version": "2021", "release_date": "2021-09-24", "status": "current"},
{"version": "2017", "release_date": "2017-11-20", "status": "superseded"},
{"version": "2013", "release_date": "2013-06-12", "status": "superseded"}
]
}
CCM (current_with_history):
{
"official_name": "Cloud Controls Matrix",
"version_required": false,
"unversioned_behavior": "current_with_history",
"versions_available": [
{"version": "4.0", "release_date": "2021-06-01", "status": "current"},
{"version": "3.0.1", "release_date": "2017-06-01", "status": "superseded", "note": "Still referenced in older compliance documentation."}
]
}
CVE (default, no fields needed): When version_required and unversioned_behavior are absent, the default behavior is "current": the resolver simply resolves the identifier. CVE IDs are globally unique and don't need version context.
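As an illustration of the client side of the OWASP Top 10 guidance above, the following sketch shows one way an AI client or script might turn that prose rule into a selection over versions_available. The function name pick_version is hypothetical, and returning None for documents older than every known version is an assumption; the guidance text does not cover that case.

```python
def pick_version(versions_available, document_year=None):
    """Pick the newest version released in or before document_year."""
    # ISO dates sort correctly as strings, so newest-first is a reverse sort.
    ordered = sorted(versions_available, key=lambda v: v["release_date"], reverse=True)
    if document_year is None:
        return ordered[0]["version"]  # no date context: use the latest version
    for version in ordered:
        if int(version["release_date"][:4]) <= document_year:
            return version["version"]
    return None  # the document predates every known version


OWASP_VERSIONS = [
    {"version": "2021", "release_date": "2021-09-24", "status": "current"},
    {"version": "2017", "release_date": "2017-11-20", "status": "superseded"},
    {"version": "2013", "release_date": "2013-06-12", "status": "superseded"},
]

print(pick_version(OWASP_VERSIONS, 2019))  # 2017
print(pick_version(OWASP_VERSIONS))        # 2021
```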
The examples field accepts both bare strings and structured ExampleObject entries:
"examples": ["CVE-2024-1234", "CVE-2021-44228", "CVE-2023-44487"]
Bare strings show valid ID formats. They help humans and AI understand what identifiers look like.
Structured ExampleObject adds expected outputs, turning examples into test fixtures:
"examples": [
{
"input": "DSA-5678-1",
"variables": {"num": "5678", "year": "2024"},
"url": "https://www.debian.org/security/2024/dsa-5678",
"note": "Year 2024: 5678 >= 5593 (2024 start) and < 5839 (2025 start)"
}
]
ExampleObject Fields:
| Field | Type | Required | Description |
|---|---|---|---|
| input | string | yes | The identifier string to resolve |
| version | string | no | Version context for versioned sources (e.g., "2.0" for CSF) |
| variables | object | no | Expected variable extraction results (keys match the variables definition) |
| url | string | no | Expected resolved URL |
| note | string | no | Human/AI context explaining resolution logic or notable behavior |
Placement conventions:
- Source-level data.examples: bare strings (representative samples of valid identifiers)
- Child-level (leaf pattern nodes) data.examples: structured ExampleObjects where variables, URLs, or notes add value. Bare strings remain valid at any level.
Structured examples support future resolver conformance testing — each ExampleObject is a positive test case with expected outputs that a resolver implementation can verify against.
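A sketch of what such a conformance check might look like, assuming a resolver exposed as a callable that returns a dict with optional variables and url keys. Both check_example and toy_resolve are hypothetical names, the toy resolver only handles the cve.org lookup URL, and the version field is not exercised here.

```python
def check_example(example, resolve):
    """Return a list of mismatches between an example and the resolver output."""
    if isinstance(example, str):
        example = {"input": example}  # bare strings carry no expected outputs
    result = resolve(example["input"])
    failures = []
    for field in ("variables", "url"):
        if field in example and result.get(field) != example[field]:
            failures.append(f"{field}: expected {example[field]!r}, got {result.get(field)!r}")
    return failures


def toy_resolve(identifier):
    """Stand-in resolver for illustration: only builds the cve.org record URL."""
    return {"url": "https://cve.org/CVERecord?id=" + identifier}


good = {"input": "CVE-2024-1234", "url": "https://cve.org/CVERecord?id=CVE-2024-1234"}
bad = {"input": "CVE-2024-1234", "url": "https://example.org/wrong"}

print(check_example(good, toy_resolve))  # []
print(check_example(bad, toy_resolve))   # one mismatch reported for url
```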
For type: reference (documents, papers, standards), additional fields help with identity:
{
"type": "reference",
"namespace": "nist.gov",
"title": "AI RMF",
"full_title": "Artificial Intelligence Risk Management Framework",
"sources": { ... }
}
| Field | Type | Description |
|---|---|---|
| title | string | Short/common title |
| full_title | string \| null | Complete formal title |
Note: Standard identifier systems (DOI, ISBN, ISSN, arXiv, etc.) are namespaces, not fields:
secid:reference/doi.org/10.6028/NIST.AI.100-1
secid:reference/isbn.org/978-0-123456-78-9
secid:reference/arxiv.org/2303.08774
secid:reference/ietf.org/rfc9110
If a document has both a human-readable reference (secid:reference/nist.gov/ai-rmf) and a DOI (secid:reference/doi.org/10.6028/NIST.AI.100-1), the equivalence relationship between them belongs in the relationship layer, not the registry.
Entity files describe organizations and their products/services. They use the same match_nodes structure as other types — the resolver walks the same tree for secid:entity/redhat.com/openshift as it does for secid:advisory/redhat.com/errata#RHSA-2024:1234.
Entity match_nodes typically use literal patterns (^openshift$) rather than regex patterns, since entity names are fixed strings rather than structured identifier formats. However, the same tree shape enables hierarchical navigation — products can have sub-products as children.
Entity match_nodes may include these additional fields in data:
| Field | Type | Description |
|---|---|---|
| issues_type | string | SecID type this entity issues ("advisory", "weakness", "ttp", "control") |
| issues_namespace | string | SecID namespace for those identifiers |
| established | integer | Year the organization was established |
{
"namespace": "redhat.com",
"type": "entity",
"official_name": "Red Hat, Inc.",
"common_name": "Red Hat",
"notes": "Enterprise open source company, part of IBM since 2019.",
"urls": [
{"type": "website", "url": "https://www.redhat.com"},
{"type": "security", "url": "https://access.redhat.com/security/"}
],
"match_nodes": [
{
"patterns": ["^openshift$"],
"description": "Red Hat OpenShift",
"weight": 100,
"data": {
"official_name": "Red Hat OpenShift",
"description": "Kubernetes-based container platform (general/umbrella term)",
"urls": [
{"type": "website", "url": "https://www.redhat.com/en/technologies/cloud-computing/openshift"},
{"type": "docs", "url": "https://docs.openshift.com"}
],
"examples": ["openshift"]
},
"children": [
{
"patterns": ["^rosa$"],
"description": "Red Hat OpenShift Service on AWS",
"weight": 100,
"data": {
"official_name": "Red Hat OpenShift Service on AWS",
"common_name": "ROSA",
"description": "Managed OpenShift on AWS (jointly operated with AWS)",
"urls": [
{"type": "website", "url": "https://www.redhat.com/en/technologies/cloud-computing/openshift/aws"},
{"type": "aws", "url": "https://aws.amazon.com/rosa/"}
],
"examples": ["rosa"]
}
}
]
},
{
"patterns": ["^rhel$"],
"description": "Red Hat Enterprise Linux",
"weight": 100,
"data": {
"official_name": "Red Hat Enterprise Linux",
"description": "Enterprise Linux distribution",
"urls": [
{"type": "website", "url": "https://www.redhat.com/en/technologies/linux-platforms/enterprise-linux"}
],
"examples": ["rhel"]
}
}
]
}
The match_nodes approach means the resolver doesn't need entity-specific logic. secid:entity/redhat.com/openshift walks the tree the same way secid:advisory/redhat.com/errata does. Products with variants (OpenShift → ROSA, ARO) naturally become parent → children relationships.
Cross-references between entities and their publications (e.g., "MITRE operates CVE") are documented via issues_type and issues_namespace fields in entity data, but the formal equivalence belongs in the relationship layer.
{
"schema_version": "1.0",
"namespace": "mitre.org",
"type": "advisory",
"status": "published",
"status_notes": null,
"official_name": "MITRE Corporation",
"common_name": "MITRE",
"alternate_names": ["The MITRE Corporation"],
"notes": "MITRE is a US nonprofit that operates federally funded research and development centers (FFRDCs). In cybersecurity, MITRE created and maintains CVE, CWE, ATT&CK, CAPEC, and ATLAS — foundational identifier systems and frameworks used across the industry. CISA contracts MITRE to operate the CVE Program. CNAs (CVE Numbering Authorities) can assign CVE IDs under MITRE's program.",
"wikidata": ["Q1116236"],
"wikipedia": ["https://en.wikipedia.org/wiki/Mitre_Corporation"],
"urls": [
{"type": "website", "url": "https://www.mitre.org"}
],
"match_nodes": [
{
"patterns": ["^cve$"],
"description": "Common Vulnerabilities and Exposures",
"weight": 100,
"data": {
"official_name": "Common Vulnerabilities and Exposures",
"common_name": "CVE",
"alternate_names": null,
"description": "The canonical vulnerability identifier system, operated by MITRE under contract with CISA.",
"notes": "CVE is the canonical identifier — other advisories cross-reference CVEs. NVD (NIST) enriches CVE records with CVSS scores, CPE entries, and CWE mappings, but NVD enrichment has processing backlogs. Quality of CVE descriptions varies by CNA — some provide detailed technical analysis, others provide minimal information. The cvelistV5 GitHub repo contains raw JSON records organized by year and bucket directories.",
"urls": [
{"type": "website", "url": "https://cve.org"},
{"type": "api", "url": "https://cveawg.mitre.org/api"},
{"type": "bulk_data", "url": "https://github.com/CVEProject/cvelistV5"}
],
"examples": ["CVE-2024-1234", "CVE-2021-44228", "CVE-2026-25010"]
},
"children": [
{
"patterns": ["^CVE-\\d{4}-\\d{4,}$"],
"description": "Standard CVE ID format",
"weight": 100,
"data": {
"url": "https://cve.org/CVERecord?id={id}",
"content_type": "text/html"
}
},
{
"patterns": ["^CVE-\\d{4}-\\d{4,}$"],
"description": "CVE JSON record on GitHub (web view)",
"weight": 50,
"data": {
"url": "https://github.com/CVEProject/cvelistV5/blob/main/cves/{year}/{bucket}/{id}.json",
"format": "json",
"content_type": "text/html",
"note": "GitHub web page showing CVE JSON record from cvelistV5 repository",
"variables": {
"year": {
"extract": "^CVE-(\\d{4})-\\d+$",
"description": "4-digit year (e.g., '2026' from 'CVE-2026-25010')"
},
"bucket": {
"extract": "^CVE-\\d{4}-(\\d+)\\d{3}$",
"format": "{1}xxx",
"description": "All but last 3 digits + 'xxx' (e.g., '25xxx' from 'CVE-2026-25010')"
},
"id": {
"extract": "^(CVE-\\d{4}-\\d+)$",
"description": "Full CVE ID"
}
}
}
},
{
"patterns": ["^CVE-\\d{4}-\\d{4,}$"],
"description": "CVE raw JSON from GitHub",
"weight": 45,
"data": {
"url": "https://raw.githubusercontent.com/CVEProject/cvelistV5/main/cves/{year}/{bucket}/{id}.json",
"format": "json",
"content_type": "application/json",
"note": "Raw CVE JSON file — direct download, no HTML wrapper",
"variables": {
"year": {
"extract": "^CVE-(\\d{4})-\\d+$",
"description": "4-digit year (e.g., '2026' from 'CVE-2026-25010')"
},
"bucket": {
"extract": "^CVE-\\d{4}-(\\d+)\\d{3}$",
"format": "{1}xxx",
"description": "All but last 3 digits + 'xxx' (e.g., '25xxx' from 'CVE-2026-25010')"
},
"id": {
"extract": "^(CVE-\\d{4}-\\d+)$",
"description": "Full CVE ID"
}
}
}
},
{
"patterns": ["^CVE-\\d{4}-\\d{4,}$"],
"description": "CVE JSON via API",
"weight": 50,
"data": {
"url": "https://cveawg.mitre.org/api/cve/{id}",
"format": "json",
"content_type": "application/json",
"note": "API endpoint, richer data than web page"
}
}
]
}
]
}

This example shows a namespace with a `/`-separated path portion (`github.com/advisories`). The namespace maps to `registry/advisory/com/github/advisories.json` via the reverse-DNS algorithm.
{
"schema_version": "1.0",
"namespace": "github.com/advisories",
"type": "advisory",
"status": "draft",
"status_notes": null,
"official_name": "GitHub Advisory Database",
"common_name": "GitHub Advisories",
"alternate_names": null,
"notes": "GitHub's advisory database aggregates vulnerabilities across package ecosystems. Acquired npm's advisory database. Advisories are community-editable and cross-reference CVEs.",
"urls": [
{"type": "website", "url": "https://github.com/advisories"}
],
"match_nodes": [
{
"patterns": ["^ghsa$"],
"description": "GitHub Security Advisories",
"weight": 100,
"data": {
"official_name": "GitHub Security Advisories",
"common_name": "GHSA",
"alternate_names": null,
"description": "GitHub-native security advisory identifiers for vulnerabilities in open source packages.",
"notes": "GHSA IDs use a base-32 encoding scheme (lowercase letters and digits). Most GHSAs have a corresponding CVE, but some ecosystem-specific advisories may not. The advisory-database GitHub repo contains the raw advisory data in OSV format.",
"urls": [
{"type": "website", "url": "https://github.com/advisories"},
{"type": "api", "url": "https://api.github.com/advisories"},
{"type": "bulk_data", "url": "https://github.com/github/advisory-database"}
],
"examples": ["GHSA-jfh8-c2jp-5v3q", "GHSA-8v63-cqqc-6r2c"]
},
"children": [
{
"patterns": ["^GHSA-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}$"],
"description": "GitHub Security Advisory ID",
"weight": 100,
"data": {"url": "https://github.com/advisories/{id}"}
}
]
}
]
}

Key differences from simple namespace:
- `namespace` includes a path: `github.com/advisories` (not just `github.com`)
- Filesystem path uses reverse-DNS for domain + appended path: `registry/advisory/com/github/advisories.json`
- SecID references use the full namespace: `secid:advisory/github.com/advisories/ghsa#GHSA-jfh8-c2jp-5v3q`
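The namespace-to-file mapping described above can be sketched as follows. This is a minimal illustration, not the registry's actual tooling: the function name `namespace_to_registry_path` is hypothetical, and the rule (reverse the domain labels, then append any path segments unchanged) is inferred from the `github.com/advisories` example.

```python
def namespace_to_registry_path(secid_type, namespace):
    """Map a SecID type and namespace to a registry file path (illustrative only)."""
    # Split the domain from the optional /-separated path portion.
    domain, _, path = namespace.partition("/")
    # Reverse the domain labels: github.com -> com/github
    reversed_domain = "/".join(reversed(domain.split(".")))
    parts = ["registry", secid_type, reversed_domain]
    if path:
        # Path segments keep their original order.
        parts.append(path.strip("/"))
    return "/".join(parts) + ".json"


print(namespace_to_registry_path("advisory", "github.com/advisories"))
# registry/advisory/com/github/advisories.json
```

Under the same assumed rule, a plain domain namespace such as `mitre.org` would map to `registry/advisory/org/mitre.json`.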
The current YAML frontmatter maps to JSON as follows:
| YAML Field | JSON Field | Notes |
|---|---|---|
| `full_name` | `official_name` | Renamed for clarity |
| `website` | `urls[]` where `type=website` | Now array with context |
| `sources` (keyed object) | `match_nodes` (array of pattern nodes) | Literal keys become `patterns` regex values (ECMAScript-compatible) |
| `id_pattern` / `id_patterns` | `children` on name-level nodes | Subpath patterns become child nodes |
| `id_routing` | `data.url` on child nodes | Merged into node data |
| `version_patterns` | Version-level children | Intermediate tree level between name and subpath |
| `item_version_patterns` | Deeper children within subpath nodes | Same nesting, just deeper in the tree |
| `urls.lookup` | `urls[]` where `type=lookup` | Now array with context |
| `wikidata` | `wikidata[]` | Now array |
| `wikipedia` | `wikipedia[]` | New field, array |
| `status` | `status` | New values: `proposed`, `draft`, `pending`, `published` |
| `status_notes` | `status_notes` | New field |
| Markdown body | `notes` (top-level and/or in node data) | Narrative content migrates to `notes` fields |
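The `id_routing` row says routing logic is merged into node `data`. The CVE child nodes earlier in this document show the result: a `url` template plus a `variables` object whose entries have an `extract` regex and an optional `format`. The sketch below shows one way a resolver might expand such a template. The names `expand_url` and `CVE_VARIABLES` are hypothetical, and three behaviors are assumptions inferred from the example rather than stated rules: `{id}` defaults to the matched identifier, a variable without `format` takes capture group 1, and `{1}` in `format` refers to capture group 1.

```python
import re

# Variables copied from the cvelistV5 child nodes (JSON escapes removed).
CVE_VARIABLES = {
    "year": {"extract": r"^CVE-(\d{4})-\d+$"},
    "bucket": {"extract": r"^CVE-\d{4}-(\d+)\d{3}$", "format": "{1}xxx"},
    "id": {"extract": r"^(CVE-\d{4}-\d+)$"},
}


def expand_url(template, identifier, variables=None):
    """Fill {name} placeholders in a URL template (illustrative sketch)."""
    # Assumption: {id} is the full matched identifier unless overridden.
    values = {"id": identifier}
    for name, spec in (variables or {}).items():
        found = re.search(spec["extract"], identifier)
        if found is None:
            raise ValueError(f"{name!r} could not be extracted from {identifier!r}")
        # Assumption: without "format", the value is capture group 1.
        fmt = spec.get("format", "{1}")
        values[name] = re.sub(
            r"\{(\d+)\}", lambda m: found.group(int(m.group(1))), fmt
        )
    return re.sub(r"\{(\w+)\}", lambda m: values[m.group(1)], template)


print(expand_url(
    "https://raw.githubusercontent.com/CVEProject/cvelistV5/main/cves/{year}/{bucket}/{id}.json",
    "CVE-2026-25010",
    CVE_VARIABLES,
))
# https://raw.githubusercontent.com/CVEProject/cvelistV5/main/cves/2026/25xxx/CVE-2026-25010.json
```

The `bucket` regex relies on greedy matching with backtracking: `(\d+)` captures everything except the last three digits, which is why `25010` yields `25xxx`. Production resolution uses ECMAScript regexes; these particular patterns behave the same in Python.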
The following fields were considered but belong in the enrichment/relationship data layer, not the registry:
| Field | Reason |
|---|---|
| `operator` | Relationship (who operates what) |
| `superseded_by` | Relationship + judgment (X replaced Y) |
| `deprecated_by` | Relationship (source X replaced by Y) |
| `deprecated_date` | Temporal enrichment |
| `established` | Temporal enrichment |
| `versions[]` | Replaced by version-level children in the pattern tree for resolution; version catalog is enrichment |
The registry focuses on identity, resolution, and disambiguation. Relationships and lifecycle metadata belong in separate data layers that reference SecIDs.
The Markdown body content (narrative documentation) migrates to `notes` fields: top-level `notes` for organizational context, and source-level `notes` (in node `data`) for source-specific operational knowledge. No companion `.md` files are needed; everything lives in one `.json` file.
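As a rough illustration of the top-level part of this migration, the sketch below converts an already-parsed YAML frontmatter dictionary plus a Markdown body into the JSON shape. It covers only the simple rows of the mapping table; converting `sources` into `match_nodes` is out of scope here. The function name `migrate_frontmatter` and the assumed frontmatter shape (for example, `urls.lookup` as a nested key) are hypothetical.

```python
def migrate_frontmatter(frontmatter, body):
    """Convert top-level YAML fields and the Markdown body (illustrative sketch)."""

    def as_list(value):
        # Scalars become one-element arrays; missing values stay None.
        if value is None:
            return None
        return value if isinstance(value, list) else [value]

    entry = {
        "schema_version": "1.0",
        "official_name": frontmatter.get("full_name"),  # renamed field
        "status": frontmatter.get("status"),
        "status_notes": frontmatter.get("status_notes"),
        "wikidata": as_list(frontmatter.get("wikidata")),
        "wikipedia": as_list(frontmatter.get("wikipedia")),
        "urls": [],
        # The narrative Markdown body moves into the top-level notes field.
        "notes": body.strip() or None,
    }
    if frontmatter.get("website"):
        entry["urls"].append({"type": "website", "url": frontmatter["website"]})
    lookup = (frontmatter.get("urls") or {}).get("lookup")
    if lookup:
        entry["urls"].append({"type": "lookup", "url": lookup})
    return entry
```

Fields listed as excluded above (`operator`, `superseded_by`, and so on) are simply not copied, which matches their move to the enrichment layer.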
For sources with hierarchical identifiers, the tree naturally mirrors the hierarchy. Domain → control → section becomes parent → child → grandchild:
{
"schema_version": "1.0",
"namespace": "cloudsecurityalliance.org",
"type": "control",
"status": "published",
"official_name": "Cloud Security Alliance",
"common_name": "CSA",
"notes": "CSA is a nonprofit focused on cloud security best practices. Publishes multiple control frameworks (CCM, AICM) and runs the STAR certification program. Also publishes research on AI security through its AI Safety Initiative.",
"wikidata": ["Q5135329"],
"urls": [
{"type": "website", "url": "https://cloudsecurityalliance.org"}
],
"match_nodes": [
{
"patterns": ["^ccm$"],
"description": "Cloud Controls Matrix",
"weight": 100,
"data": {
"official_name": "Cloud Controls Matrix",
"common_name": "CCM",
"description": "Security controls framework organized by domains. Domains contain controls, controls may have implementation sections.",
"notes": "CCM v4 has 17 domains and 197 controls. Domain codes are 2-3 uppercase letters (e.g., IAM, DSP). Control IDs append a dash and two-digit number (e.g., IAM-12). Some controls have implementation sections with a dot suffix (e.g., IAM-12.1). CCM is available as a spreadsheet download — no direct per-control URL for all versions.",
"urls": [
{"type": "website", "url": "https://cloudsecurityalliance.org/research/cloud-controls-matrix"},
{"type": "docs", "url": "https://cloudsecurityalliance.org/artifacts/cloud-controls-matrix-v4"}
],
"version_required": false,
"unversioned_behavior": "current_with_history",
"versions_available": [
{"version": "4.0", "release_date": "2021-06-01", "status": "current"},
{"version": "3.0.1", "release_date": "2017-06-01", "status": "superseded", "note": "Still referenced in older compliance documentation."}
],
"examples": [
"secid:control/cloudsecurityalliance.org/ccm#IAM",
"secid:control/cloudsecurityalliance.org/ccm#IAM-12",
"secid:control/cloudsecurityalliance.org/ccm@4.0#IAM-12",
"secid:control/cloudsecurityalliance.org/ccm#IAM-12.1"
]
},
"children": [
{
"patterns": ["^4\\..*$"],
"description": "Version 4.x",
"data": {"url": "https://ccm.cloudsecurityalliance.org/v4/control/{id}"},
"children": [
{
"patterns": ["^[A-Z]{2,3}$"],
"description": "Control domain (e.g., IAM). Contains multiple controls.",
"data": {
"type": "domain",
"known_values": {
"A&A": "Audit & Assurance",
"AIS": "Application & Interface Security",
"BCR": "Business Continuity Management & Operational Resilience",
"CCC": "Change Control & Configuration Management",
"CEK": "Cryptography, Encryption & Key Management",
"DCS": "Datacenter Security",
"DSP": "Data Security & Privacy Lifecycle Management",
"GRC": "Governance, Risk & Compliance",
"HRS": "Human Resources",
"IAM": "Identity & Access Management",
"IPY": "Interoperability & Portability",
"IVS": "Infrastructure & Virtualization Security",
"LOG": "Logging & Monitoring",
"SEF": "Security Incident Management, E-Discovery & Forensics",
"STA": "Supply Chain Management, Transparency & Accountability",
"TVM": "Threat & Vulnerability Management",
"UEM": "Universal Endpoint Management"
}
}
},
{
"patterns": ["^[A-Z]{2,3}-\\d{2}$"],
"description": "Specific control (e.g., IAM-12). Belongs to a domain.",
"data": {"type": "control", "url": "https://ccm.cloudsecurityalliance.org/v4/control/{id}"}
},
{
"patterns": ["^[A-Z]{2,3}-\\d{2}\\.\\d{1,2}$"],
"description": "Control section (e.g., IAM-12.1). Implementation detail within a control.",
"data": {"type": "section"}
}
]
},
{
"patterns": ["^3\\..*$"],
"description": "Version 3.x (legacy)",
"data": {"url": "https://cloudsecurityalliance.org/artifacts/ccm-v3/{id}"}
},
{
"patterns": ["^[A-Z]{2,3}$"],
"description": "Control domain (unversioned fallback)",
"data": {"type": "domain"}
},
{
"patterns": ["^[A-Z]{2,3}-\\d{2}$"],
"description": "Specific control (unversioned fallback)",
"data": {"type": "control"}
},
{
"patterns": ["^[A-Z]{2,3}-\\d{2}\\.\\d{1,2}$"],
"description": "Control section (unversioned fallback)",
"data": {"type": "section"}
}
]
}
]
}

Key points:
- The tree mirrors the hierarchy naturally: name → version → subpath
- `known_values` on the domain-level node (finite, stable set)
- Version-level children route to different URL structures (v4 vs v3)
- Subpath children within version children for version-specific resolution
- Unversioned fallback children handle queries without `@version`
- Not every node needs a URL (domain-level returns `known_values` without a lookup URL)
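The name → version → subpath walk can be sketched with a trimmed copy of the CCM node. This is an illustration under stated assumptions, not the production resolver: the helper names `matching_children` and `resolve` are hypothetical, a missing `weight` is treated as 0, only the highest-weight match is returned, and inheritance of `data` from parent nodes is not modeled.

```python
import re

# Trimmed copy of the CCM name-level node from the example above.
CCM = {
    "patterns": [r"^ccm$"],
    "children": [
        {
            "patterns": [r"^4\..*$"],
            "children": [
                {"patterns": [r"^[A-Z]{2,3}$"], "data": {"type": "domain"}},
                {
                    "patterns": [r"^[A-Z]{2,3}-\d{2}$"],
                    "data": {
                        "type": "control",
                        "url": "https://ccm.cloudsecurityalliance.org/v4/control/{id}",
                    },
                },
            ],
        },
        # Unversioned fallback for queries without @version.
        {"patterns": [r"^[A-Z]{2,3}-\d{2}$"], "data": {"type": "control"}},
    ],
}


def matching_children(node, value):
    """Return children with a pattern matching value, highest weight first."""
    hits = [
        child for child in node.get("children", [])
        if any(re.search(p, value) for p in child["patterns"])
    ]
    return sorted(hits, key=lambda c: c.get("weight", 0), reverse=True)


def resolve(name_node, subpath, version=None):
    """Walk name -> (version) -> subpath and return the matched node's data."""
    scope = name_node
    if version is not None:
        versions = matching_children(name_node, version)
        if not versions:
            return None
        scope = versions[0]
    hits = matching_children(scope, subpath)
    if not hits:
        return None
    data = dict(hits[0].get("data", {}))
    if "url" in data:
        data["url"] = data["url"].replace("{id}", subpath)
    return data


print(resolve(CCM, "IAM-12", version="4.0"))
print(resolve(CCM, "IAM-12"))
```

With a version, the control resolves to the v4 URL; without one, the unversioned fallback returns only the node type, which reflects the point that not every node needs a URL.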
The schema_version field allows for future evolution. Parsers should check this field and handle unknown versions gracefully.
Current version: 1.0 (draft)
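A parser-side version check might look like the sketch below. What "gracefully" means is left open by this document; here it is assumed to mean returning an error message instead of raising, and accepting any `1.x` version. The names `load_registry_entry` and `SUPPORTED_MAJOR` are hypothetical.

```python
import json

SUPPORTED_MAJOR = 1  # assumption: a parser accepts any 1.x schema_version


def load_registry_entry(text):
    """Parse a registry file and check schema_version (illustrative sketch).

    Returns (entry, None) on success or (None, error_message) otherwise.
    """
    entry = json.loads(text)
    version = entry.get("schema_version")
    major = None
    if isinstance(version, str):
        head = version.split(".")[0]
        if head.isdigit():
            major = int(head)
    if major != SUPPORTED_MAJOR:
        # Unknown or missing version: report it rather than guess at the layout.
        return None, f"unsupported schema_version: {version!r}"
    return entry, None


entry, error = load_registry_entry('{"schema_version": "1.0", "namespace": "mitre.org"}')
print(entry["namespace"], error)
```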