ci: add no-public-api-changes label based on public olean diff by kim-em · Pull Request #38230 · leanprover-community/mathlib4

kim-em · 2026-04-19T03:37:13Z

This PR adds a CI step that compares the public .olean files produced by the PR build against master. When they are byte-identical, the label no-public-api-changes is applied; otherwise the label is removed.

It also introduces a contributor/reviewer-controlled label expect-no-public-api-changes; when present, CI fails unless no-public-api-changes also applies. Together these are intended to make proof-only golfing PRs (see the mathlib-maintainers thread) cheap to review with confidence that nothing downstream of the touched files is affected.

⚠️ Open question before merge

The current implementation fetches master's cached oleans via a second lake exe cache get in the existing tools-branch checkout. I'd like feedback on whether the steady-state cost is acceptable before investing further.

Measured cost (sampled from master run 24620430633, 2026-04-19):

	Per PR push
`cache get` (8,241 files from Azure)	~8 s wall-clock (parallel download, pipelined decompress)
`lake build --no-build Mathlib` (baseline validation)	a few seconds
`public_olean_diff.sh` (byte-comparison of ~8k files)	a few seconds
Extra wall-clock total	≈ 15–30 s
Additional runner disk	≈ 5.8 GB (a second full `.lake/build/lib/lean/Mathlib`)
Additional Azure egress	~1 GB compressed

So the wall-clock hit is small, but disk usage on PR runners roughly doubles (on top of the ~5.8 GB the PR build already materializes).

Three concrete questions for reviewers:

Is this disk overhead acceptable as always-on automation? Are any PR runners at risk of hitting their disk ceiling?
If not, do we prefer weaker automation (make the check opt-in via expect-no-public-api-changes, losing the informational no-public-api-changes label on most PRs) or delaying this PR until we extend lake exe cache to fetch public *.olean only (and skip .olean.private/.olean.server/.ir.olean, which would cut egress and disk substantially)?
Is there interest in a follow-up that moves the diff out of the build job and into a dedicated post-build job with its own checkout — same cost story but clearer ownership?

How it works

With the module system, a Mathlib build produces three files per module: Foo.olean (public/exported), Foo.olean.server (server data), and Foo.olean.private (full private data). Theorems are exported as axioms in the public .olean (see addDecl at src/Lean/AddDecl.lean:100-104), so a proof-only change leaves Foo.olean byte-identical and only updates Foo.olean.private. A byte-level diff of public oleans therefore certifies "nothing exported changed" — signatures, exposed definition bodies, and public environment-extension state all included.

New pieces:

scripts/public_olean_diff.sh — compares two .lake/build/lib/lean trees, ignoring .olean.private, .olean.server, and .ir.olean. Tri-state exit: 0=match, 1=differs, 2=error/unavailable.
New steps in .github/workflows/build_template.yml:
1. Fetch master oleans via lake exe cache get inside tools-branch.
2. Validate the baseline with lake build --no-build Mathlib over the full diff scope (not just Mathlib.Init, which would only prove a prefix is cached).
3. Run the diff; upload the result as an artifact.
4. Look up the PR number from the (repo-qualified) head branch, and verify the PR head still points at this run's SHA (TOCTOU guard against fast successive pushes).
5. Add/remove the no-public-api-changes label (only when the diff state is match or differs; unavailable leaves existing labels alone).
6. Fail the job if expect-no-public-api-changes is set but the state is differs.

Gated to same-repo PR-branch pushes on leanprover-community/mathlib4, excluding master, bors staging/trying, and any run overriding tools_branch_ref.

Caveats

Only main-repo PRs are covered. build_fork.yml needs an analogous change in a follow-up.
Cold master cache or failed baseline validation → the state is unavailable and labels are left untouched (no spurious flipping).
Label updates are best-effort; they skip silently if PR lookup, the TOCTOU SHA check, or token operations fail.
The two labels must be created once on the repo (no-public-api-changes, expect-no-public-api-changes).
Not tested on actual CI yet.

🤖 Prepared with Claude Code

This PR adds a CI step that compares the public `.olean` files produced by the PR build against `master`. When they are byte-identical, the label `no-public-api-changes` is applied; otherwise it is removed. It also introduces a contributor/reviewer-controlled label `expect-no-public-api-changes`; when present, CI fails unless `no-public-api-changes` is applicable. This is intended to make proof-only golfing PRs cheap to review with confidence that nothing downstream of the touched files is affected. With the module system, theorems are exported as axioms in the public `.olean`, so proof-body changes only affect `.olean.private`. See `scripts/public_olean_diff.sh` for the comparison logic, and the discussion at https://leanprover.zulipchat.com/#narrow/channel/180721-mathlib-maintainers/topic/yuanyi-350.20golfing.20PRs for context. Caveats: - runs only on main-repo PRs (build_fork.yml is not yet updated); - silently skips when the master cache is cold; - labels `no-public-api-changes` and `expect-no-public-api-changes` must be created on the repo manually. 🤖 Prepared with Claude Code

github-actions · 2026-04-19T03:38:17Z

PR summary caf1dee8e5

Import changes for modified files

No significant changes to the import graph

Import changes for all files

Files	Import difference

Declarations diff

No declarations were harmed in the making of this PR! 🐙

You can run this locally as follows

## summary with just the declaration names:
./scripts/pr_summary/declarations_diff.sh <optional_commit>

## more verbose report:
./scripts/pr_summary/declarations_diff.sh long <optional_commit>

The doc-module for scripts/pr_summary/declarations_diff.sh contains some details about this script.

No changes to technical debt.

You can run this locally as

./scripts/reporting/technical-debt-metrics.sh pr_summary

The relative value is the weighted sum of the differences with weight given by the inverse of the current value of the statistic.
The absolute value is the relative value divided by the total sum of the inverses of the current values (i.e. the weighted average of the differences).

⚠️ Workflow documentation reminder

This PR modifies files under .github/workflows/.
Please update docs/workflows.md if the workflow inventory, triggers, or behavior changed.

Modified workflow files:

.github/workflows/build_template.yml

⚠️ Scripts folder reminder

This PR adds files under scripts/.
Please consider whether each added script belongs in this repository or in leanprover-community/mathlib-ci.

A script belongs in mathlib-ci if it is a CI automation script that interacts with GitHub (e.g. managing labels, posting comments, triggering bots), runs from a trusted external checkout in CI, or requires access to secrets.

A script belongs in this repository (scripts/) if it is a developer or maintainer tool to be run locally, a code maintenance or analysis utility, a style linting tool, or a data file used by the library's own linters.

See the mathlib-ci README for more details.

Added scripts files:

scripts/public_olean_diff.sh

* Tri-state diff result (match/differs/unavailable) so script errors don't masquerade as public-API diffs. * Validate the fetched master baseline with `lake build --no-build Mathlib.Init`; a bare `cache get` isn't sufficient evidence of a warm cache. Skip the diff entirely when validation fails. * Only label when the PR head still points at this run's SHA, to avoid stale-commit TOCTOU after a fast follow-up push. * Tighten the guard to same-repo PR-branch pushes on leanprover-community/mathlib4: exclude master, bors `staging`/`trying`, and any run that overrides `tools_branch_ref`. 🤖 Prepared with Claude Code

Address second-round review on public-olean diff: * Validate the cached baseline over `Mathlib` (the full diff scope) instead of just `Mathlib.Init`. A `--no-build` check of a single module only proves a prefix is cached; a partial cache would otherwise produce spurious added:/removed: entries in the diff. * Qualify the PR lookup as `${owner}:${branch}` so we cannot match a same-named head branch on an unrelated fork. The TOCTOU SHA check also catches this, but explicitness is cheap. 🤖 Prepared with Claude Code

SnirBroshi · 2026-04-19T14:42:45Z

Since this only requires a boolean answer, how about comparing sha256 hashes of the public oleans rather than their contents?
This means that if CI publishes a list of olean hashes as an artifact (which is probably pretty cheap - perhaps the oleans themselves even contain some CRC field) we can avoid downloading the cache and instead download the hash list to compare against. This should avoid the time and disk usage increases you mentioned.

bryangingechen · 2026-04-20T21:35:52Z

I think the disk usage is not a big deal since all our self-hosted runners are ephemeral now; everything will get cleaned up after each run. However, after discussing with @jcommelin, I think we might want a different version of this:

it'd be better to post an "olean report" listing the modules where there were differences (and what types of differences); that way reviewers could focus their attention on the files with nontrivial API changes (or ask for those to be split out)
instad of running on every push, make it run on request (via a special comment or label application?).

I'm working on that now in a branch based off of this one.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

kim-em · 2026-04-21T03:23:56Z

@bryangingechen, please feel free to close this once you have an alternative up!

github-actions Bot added the CI Modifies the continuous integration setup or other automation label Apr 19, 2026

kim-em added 2 commits April 19, 2026 06:35

add script documentation to scripts/README.md

46367d1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mathlib-triage Bot assigned jcommelin Apr 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: add no-public-api-changes label based on public olean diff#38230

ci: add no-public-api-changes label based on public olean diff#38230
kim-em wants to merge 4 commits intoleanprover-community:masterfrom
kim-em:no-public-api-changes-label

kim-em commented Apr 19, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 19, 2026 •

edited

Loading

Uh oh!

SnirBroshi commented Apr 19, 2026

Uh oh!

bryangingechen commented Apr 20, 2026

Uh oh!

kim-em commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

kim-em commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Open question before merge

How it works

Caveats

Uh oh!

github-actions Bot commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR summary caf1dee8e5

Import changes for modified files

Declarations diff

⚠️ Workflow documentation reminder

⚠️ Scripts folder reminder

Uh oh!

SnirBroshi commented Apr 19, 2026

Uh oh!

bryangingechen commented Apr 20, 2026

Uh oh!

kim-em commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

kim-em commented Apr 19, 2026 •

edited

Loading

github-actions Bot commented Apr 19, 2026 •

edited

Loading