CI Enforcement

How detection content integrity is gated in the operational repo.


The problem CI is solving

Detection-as-code only works if the CI pipeline is the canonical decision point for what is deployable. If CI is advisory — "we'll look at it later" — then rule content drifts from its validated state, duplicates creep in, placeholders replace real values, and the library slowly becomes unreliable. The whole methodology collapses.

So CI is not advisory here. It is the gate.

The three checks that matter

1. Content validation

validate_detection_content.py is a Python script that parses every Sigma YAML rule and every Wazuh XML rule in the library and enforces:

  • Schema presence. Required fields (title, id, logsource, detection, level) are non-empty.
  • UUID uniqueness. Every Sigma id is unique across the library. Duplicates are hard errors.
  • UUIDv4 compliance. The id must be a real version-4 UUID, not a shape-regex-compliant placeholder. The third group must start with 4; the fourth group must start with 8, 9, a, or b.
  • Logsource specificity. Sigma rules must declare product at minimum; category or service is strongly preferred.
  • Wazuh rule IDs. Custom rule IDs stay within the locally allocated range. Out-of-range rules are hard errors.
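The UUID checks above can be sketched in a few lines. This is illustrative only, not the real validator: the actual script parses full Sigma YAML, while this sketch assumes the ids have already been extracted as (file path, id) pairs.

```python
import re

# Version-4 UUID shape: third group starts with 4, fourth group starts
# with 8, 9, a, or b (the variant bits). A placeholder that merely looks
# UUID-shaped fails this pattern.
UUID_V4 = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
)

def check_unique_uuid4(ids):
    """ids: iterable of (path, rule_id). Returns a list of error strings
    covering non-v4 ids and cross-library collisions."""
    errors = []
    seen = {}
    for path, rule_id in ids:
        rid = rule_id.lower()
        if not UUID_V4.match(rid):
            errors.append(f"{path}: id {rule_id!r} is not a version-4 UUID")
        if rid in seen:
            errors.append(f"{path}: id collides with {seen[rid]}")
        else:
            seen[rid] = path
    return errors
```

Both conditions are hard errors: a rule with a duplicate or non-v4 id fails the build rather than producing a warning.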

This validator is the one that caught the 33-collision / 69-file placeholder UUID incident on first run. Before it existed, the library passed all prior checks and the collisions were invisible. The story of writing the validator and running it once is exactly the story of why CI content validation is worth the engineering cost.

2. Count integrity

A separate check compares the live detection count against a locked snapshot stored alongside the library. If the count drifts unexpectedly — files deleted, directories renamed, rules silently lost — CI fails. This catches the class of mistake where content vanishes without anyone noticing until an incident exposes the gap.

The snapshot is updated deliberately, as a discrete commit, when counts should change. The snapshot commit is the audit record of "we chose to change this" — an unchosen change is indistinguishable from a loss.
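A minimal sketch of the count-integrity gate, assuming a JSON snapshot of per-format counts (the real snapshot's filename and format are not specified in this doc):

```python
import json
from pathlib import Path

def check_counts(library_dir: str, snapshot_file: str) -> list[str]:
    """Compare live rule-file counts against the locked snapshot.
    Any non-empty return value is a CI failure: counts only change
    via a deliberate snapshot-update commit."""
    live = {
        "sigma": len(list(Path(library_dir).rglob("*.yml"))),
        "wazuh": len(list(Path(library_dir).rglob("*.xml"))),
    }
    locked = json.loads(Path(snapshot_file).read_text())
    return [
        f"{kind}: expected {locked[kind]}, found {live.get(kind, 0)}"
        for kind in locked
        if live.get(kind, 0) != locked[kind]
    ]
```

The check is symmetric on purpose: a count that is too low (silent loss) fails just like a count that is too high (unreviewed additions).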

3. Deploy gating

deploy-wazuh-pack.yml only runs after verify.yml passes. It builds the rule pack, SSHes into the Wazuh manager, places the pack, reloads, and runs a post-deploy verification. If any step fails, the deploy aborts and the prior pack stays in place. There is no path to deploy unverified content.
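One common way to wire this kind of gating in GitHub Actions is the workflow_run trigger; the fragment below is illustrative, and the actual wiring in deploy-wazuh-pack.yml may differ.

```yaml
# Illustrative fragment only: gate the deploy workflow on the verify
# workflow having completed successfully.
name: deploy-wazuh-pack
on:
  workflow_run:
    workflows: ["verify"]      # the workflow name declared in verify.yml
    types: [completed]

jobs:
  deploy:
    # A failed or cancelled verify run never reaches the deploy steps,
    # so the prior pack stays in place.
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ...build the pack, SSH to the manager, reload, post-deploy verify...
```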

Allowlists are temporary scaffolding

The one acceptable use of an allowlist is during remediation of a known, bounded problem. When the 33 Sigma duplicate UUIDs were discovered, the validator was allowed to downgrade those specific collisions to warnings — but only if the collision set matched the allowlist file-for-file. If a new file joined a collision group, or if an allowlisted file was renamed, the validator failed hard.

The allowlist was emptied the day the remediation completed. It did not grow. It did not persist. It did not become the new normal.

This is the shape an allowlist has to have to be acceptable:

  1. Time-bounded — scheduled to be emptied
  2. File-scoped — cannot silently absorb new violations
  3. Auto-failing on drift — any change to the collision set is a hard error
  4. Owned — someone is on the hook for closing it

An allowlist without those properties is a loophole, not scaffolding.
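The "auto-failing on drift" property can be sketched as follows. The allowlist structure here (uuid mapped to the exact set of colliding files) is an assumption for illustration; the point is the equality check, which refuses to downgrade a collision group unless it matches file-for-file.

```python
def classify_collisions(collisions, allowlist):
    """collisions / allowlist: dict mapping uuid -> set of file paths.
    Returns (hard_errors, warnings) as lists of uuids."""
    errors, warnings = [], []
    for uuid, files in collisions.items():
        if allowlist.get(uuid) == files:
            # Known, bounded, being remediated: warn, don't fail.
            warnings.append(uuid)
        else:
            # A new file joined the group, an allowlisted file was
            # renamed, or the collision is new: hard error.
            errors.append(uuid)
    return errors, warnings
```

Because the comparison is set equality, not membership, the allowlist cannot silently absorb new violations: any drift in either direction turns the warning back into a failure.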

Failure modes CI cannot catch

Worth being honest about. CI enforces what can be verified statically from the content in the repo. It cannot catch:

  • Rules that are syntactically correct but semantically wrong (a logsource change that accidentally narrows the match)
  • Rules whose filters are too broad in a specific environment
  • Rules whose technique reference is stale because MITRE deprecated the sub-technique
  • Noise ratios, false-positive rates, or signal-to-noise metrics
  • Runtime errors in the Wazuh manager after the pack is deployed

Those are caught by live observation, tuning sessions, and post-deploy verification — not by CI. The Wazuh process telemetry tuning case study is an example of the kind of work that lives in that layer.

The invariant CI protects

If the validator passes, then:

  • Every rule in the library has a unique, real UUIDv4.
  • Every rule declares its log source.
  • Every rule has a severity level.
  • The custom-rule-ID ranges are clean.
  • The count matches the locked snapshot.

That is a narrow invariant, and it is deliberately narrow. Broad invariants aren't enforceable. Narrow ones are, and a narrow enforced invariant is worth more than a broad aspirational one.